Fill This Form To Receive Instant Help
Homework answers / question archive / COMP 2150 – Winter 2022 – Assignment 4 compression (as used in zip and rar, for instance) is the process of taking large files and making them smaller by analyzing their contents
COMP 2150 – Winter 2022 – Assignment 4
compression (as used in zip and rar, for instance) is the process of taking large files and making them smaller by analyzing their contents. In this assignment the algorithm we will use is called Huffman encoding, and it works by building a binary tree that encodes each character as a sequence of 0s and 1s. The encoding is variable-length: some characters will be encoded as a short sequence of 0s and 1s, while other (more rare) characters will be encoded with longer sequences.
Part 1: Hash Table Dictionary
Create a javascript class called Dictionary that is implemented as a hash table. In the
Dictionary, each entry is a Key-Value pair, and there can't be two Key-Value pairs in a dictionary
with the same Key (that is, keys are unique in a dictionary). The Key will be an object that has a
hashVal method (see below for the Hashable class and hashVal information). A Value can be
anything (except undefined).
When initialized, the Dictionary should be given a size as a parameter1
. The Dictionary will
create an array of that size as a field. When adding or searching for a Key-Value pair in the
Dictionary, the position of the pair in the hash table will be the key’s hashVal modulo the length
of the array.
The hash table should use separate chaining. If there is a collision between two elements based
on their hashVal, the hash table should place both elements in a linked list of elements at the
array position. The order of elements in the linked list at a position is not important.
The Dictionary should implement the following methods:
1. put(k,v): takes a Hashable key k and a value v and adds it to the dictionary, if it does not exist. If a key-value pair with k as the key already exists, the current value is replaced with v. The method does not have a return value.
2. get(k): takes a Hashable key k and returns the value v associated with it, if it appears in the dictionary. The method should return undefined if the key does not appear in the dictionary.
3. contains(k): takes a Hashable key k and determines if it is a key in the dictionary.
Returns a boolean value.
4. isEmpty(): returns a boolean depending on whether the dictionary is empty or not.
For each of the first three methods, the method should ensure that the key parameter has type
Hashable (or has a hashVal() method, and an equals() method, as appropriate), and it should throw an error if the condition does not hold.
Hashable
You should also implement a hierarchy of Hashable classes that can be stored as Keys in the
hash table. The Hashable class should be abstract (using the techniques shown in class) with
an abstract hashVal() and an abstract equals(x) method. You should implement at least the
following two Hashable subclasses:
• IntHash, whose hashVal function is the value of the integer. Two IntHash objects are
equal if they contain the same integer value.
• StringHash, whose hashVal function is the following expression:
s[0]*pn-1 + s[1]*pn-2 + ...+ s[n-3]*p2 + s[n-2]*p + s[n-1]
where s[i] is the ASCII value of the i-th character of the string, n is the length of the
string, and p is a prime.2 You can assume that all characters in the strings are ASCII characters. Two StringHash objects are equal if they contain the same strings.
Note: to get the ASCII value of a character from a string, you can use .charCodeAt(): x.charCodeAt(i) gives the ASCII value of the character at position i of the string x. he subclasses can have additional methods if you need them, including gette
Part 3: Huffman Encoding
After creating your Dictionary and Huffman Tree classes, use them to implement Huffman
encoding. Huffman encoding is used to compress text files by replacing characters in the text
with sequences of bits (0-1).4
The algorithm works as follows:
1. The input to the algorithm is a text file.
2. Before encoding anything, the entire file should be read, and character frequencies are
calculated for each character that appears in the file. You then calculate, for each
character X in the file “what is the percentage of the file that is the character X?” (as a
number between 0 and 1). Store this information in a Dictionary from Part 1. This
calculation should be done for all characters, including spaces, newlines, etc.
3. Next, the Huffman Encoding is constructed. The Huffman Encoding is built by
constructing a set of Huffman trees, and then joining these trees until one final Huffman
tree remains.
a. Initially, we construct a tree for each character in the file. The weight used for
these initial trees are the percentages (0-1) calculated in step 2.
b. At each step, we choose the two Huffman Trees in the set that are smallest,
according to the order given by the compareTo algorithm in Part 2. We join
these two trees, using the joining algorithm in Part 2. (The left tree should be the
smaller tree using compareTo). Remove the two trees that were joined from the
set of Huffman Trees and add the new tree (thus, the set will have one fewer tree
at each step).
c. Repeat this until all of the Huffman Trees are combined into one tree, which we
call T.
To support this step, you may need to construct additional javascript classes.
4. Then, the Huffman code is calculated. For each different character X that appears in the file, calculate the path from the root of T to the leaf that is labelled with the character X.
Build a string that represents this path to X: Each time the path moves from a node to
the left child, add a zero to the string and each time it moves to the right child, add a 1 to
the string.
5. Read the characters from the input file again. For each character in the input file, write the binary encoding sequence from step 4 to the output file. Add spaces between each symbol but no newline characters, except for one at the end.
See an example run of Huffman encoding at the end of the assignment.
Your Huffman Encoder class should have a constructor and one method called encode(). The
constructor should take the name of the file that should be encoded. The encode() method should open a new output file (by adding – not replacing - the “.huff” extension to the original file
name), open the input file, and write the codes for the Huffman algorithm to the output file. (You
should write the values as strings, with a single space separating the codes. Do not output any newline symbols, except for one at the end. Yes, the output file will be larger than the input file.)
You do not need to implement a decoder.
Hand-in
Submit all source code for all classes. Submit all files on umlearn.
You will be provided a test text file for your encoder before the deadline. Submit your output on
thatfile with your submission.
You MUST submit all of your files in a zip file. Additionally, you MUST follow these rules:
• All of the files required for your project should be in a single directory.
• Include a README.TXT file that describes exactly how to run your code from the
command line. The markers will be using these instructions exactly and if your code
does not run directly, you will lose marks.
The easier it is for your assignment to mark, the more marks you are likely to get. Do yourself a
favour.
Example Huffman encoding
Consider the string “TOBEORNOTTOBETHATISTHEBANANA”. The Huffman algorithm first
calculates the following weights:
Character Weight
T 0.21428571428571427
O 0.14285714285714285
B 0.10714285714285714
E 0.10714285714285714
R 0.03571428571428571
N 0.10714285714285714
H 0.07142857142857142
A 0.14285714285714285
I 0.03571428571428571
S 0.03571428571428