Homework answers / question archive / COMP 2150 – Winter 2022 – Assignment 4 compression (as used in zip and rar, for instance) is the process of taking large files and making them smaller by analyzing their contents

COMP 2150 – Winter 2022 – Assignment 4 compression (as used in zip and rar, for instance) is the process of taking large files and making them smaller by analyzing their contents

Computer Science

Share With

COMP 2150 – Winter 2022 – Assignment 4

compression (as used in zip and rar, for instance) is the process of taking large files and making them smaller by analyzing their contents. In this assignment the algorithm we will use is called Huffman encoding, and it works by building a binary tree that encodes each character as a sequence of 0s and 1s. The encoding is variable-length: some characters will be encoded as a short sequence of 0s and 1s, while other (more rare) characters will be encoded with longer sequences.

Part 1: Hash Table Dictionary

Create a javascript class called Dictionary that is implemented as a hash table. In the

Dictionary, each entry is a Key-Value pair, and there can't be two Key-Value pairs in a dictionary

with the same Key (that is, keys are unique in a dictionary). The Key will be an object that has a

hashVal method (see below for the Hashable class and hashVal information). A Value can be

anything (except undefined).

When initialized, the Dictionary should be given a size as a parameter1

. The Dictionary will

create an array of that size as a field. When adding or searching for a Key-Value pair in the

Dictionary, the position of the pair in the hash table will be the key’s hashVal modulo the length

of the array.

The hash table should use separate chaining. If there is a collision between two elements based

on their hashVal, the hash table should place both elements in a linked list of elements at the

array position. The order of elements in the linked list at a position is not important.

The Dictionary should implement the following methods:

1. put(k,v): takes a Hashable key k and a value v and adds it to the dictionary, if it does not exist. If a key-value pair with k as the key already exists, the current value is replaced with v. The method does not have a return value.

2. get(k): takes a Hashable key k and returns the value v associated with it, if it appears in the dictionary. The method should return undefined if the key does not appear in the dictionary.

3. contains(k): takes a Hashable key k and determines if it is a key in the dictionary.

Returns a boolean value.

4. isEmpty(): returns a boolean depending on whether the dictionary is empty or not.

For each of the first three methods, the method should ensure that the key parameter has type

Hashable (or has a hashVal() method, and an equals() method, as appropriate), and it should throw an error if the condition does not hold.

Hashable

You should also implement a hierarchy of Hashable classes that can be stored as Keys in the

hash table. The Hashable class should be abstract (using the techniques shown in class) with

an abstract hashVal() and an abstract equals(x) method. You should implement at least the

following two Hashable subclasses:

• IntHash, whose hashVal function is the value of the integer. Two IntHash objects are

equal if they contain the same integer value.

• StringHash, whose hashVal function is the following expression:

s[0]*pn-1 + s[1]*pn-2 + ...+ s[n-3]*p2 + s[n-2]*p + s[n-1]

where s[i] is the ASCII value of the i-th character of the string, n is the length of the

string, and p is a prime.2 You can assume that all characters in the strings are ASCII characters. Two StringHash objects are equal if they contain the same strings.

Note: to get the ASCII value of a character from a string, you can use .charCodeAt(): x.charCodeAt(i) gives the ASCII value of the character at position i of the string x. he subclasses can have additional methods if you need them, including gette
Part 3: Huffman Encoding

After creating your Dictionary and Huffman Tree classes, use them to implement Huffman

encoding. Huffman encoding is used to compress text files by replacing characters in the text

with sequences of bits (0-1).4

The algorithm works as follows:

1. The input to the algorithm is a text file.

2. Before encoding anything, the entire file should be read, and character frequencies are

calculated for each character that appears in the file. You then calculate, for each

character X in the file “what is the percentage of the file that is the character X?” (as a

number between 0 and 1). Store this information in a Dictionary from Part 1. This

calculation should be done for all characters, including spaces, newlines, etc.

3. Next, the Huffman Encoding is constructed. The Huffman Encoding is built by

constructing a set of Huffman trees, and then joining these trees until one final Huffman

tree remains.

a. Initially, we construct a tree for each character in the file. The weight used for

these initial trees are the percentages (0-1) calculated in step 2.

b. At each step, we choose the two Huffman Trees in the set that are smallest,

according to the order given by the compareTo algorithm in Part 2. We join

these two trees, using the joining algorithm in Part 2. (The left tree should be the

smaller tree using compareTo). Remove the two trees that were joined from the

set of Huffman Trees and add the new tree (thus, the set will have one fewer tree

at each step).

c. Repeat this until all of the Huffman Trees are combined into one tree, which we

call T.

To support this step, you may need to construct additional javascript classes.

4. Then, the Huffman code is calculated. For each different character X that appears in the file, calculate the path from the root of T to the leaf that is labelled with the character X.

Build a string that represents this path to X: Each time the path moves from a node to

the left child, add a zero to the string and each time it moves to the right child, add a 1 to

the string.

5. Read the characters from the input file again. For each character in the input file, write the binary encoding sequence from step 4 to the output file. Add spaces between each symbol but no newline characters, except for one at the end.

See an example run of Huffman encoding at the end of the assignment.

Your Huffman Encoder class should have a constructor and one method called encode(). The

constructor should take the name of the file that should be encoded. The encode() method should open a new output file (by adding – not replacing - the “.huff” extension to the original file

name), open the input file, and write the codes for the Huffman algorithm to the output file. (You

should write the values as strings, with a single space separating the codes. Do not output any newline symbols, except for one at the end. Yes, the output file will be larger than the input file.)

You do not need to implement a decoder.

Hand-in

Submit all source code for all classes. Submit all files on umlearn.

You will be provided a test text file for your encoder before the deadline. Submit your output on

thatfile with your submission.

You MUST submit all of your files in a zip file. Additionally, you MUST follow these rules:

• All of the files required for your project should be in a single directory.

• Include a README.TXT file that describes exactly how to run your code from the

command line. The markers will be using these instructions exactly and if your code

does not run directly, you will lose marks.

The easier it is for your assignment to mark, the more marks you are likely to get. Do yourself a

favour.

Example Huffman encoding

Consider the string “TOBEORNOTTOBETHATISTHEBANANA”. The Huffman algorithm first

calculates the following weights:

Character Weight

T 0.21428571428571427

O 0.14285714285714285

B 0.10714285714285714

E 0.10714285714285714

R 0.03571428571428571

N 0.10714285714285714

H 0.07142857142857142

A 0.14285714285714285

I 0.03571428571428571

S 0.03571428571428

pur-new-sol

Purchase A New Answer

Custom new solution created by our subject matter experts

GET A QUOTE

comp2150-w2022-A4.pdf

Answer Preview

PFA

COMP 2150 – Winter 2022 – Assignment 4 compression (as used in zip and rar, for instance) is the process of taking large files and making them smaller by analyzing their contents

Computer Science

Purchase A New Answer

Custom new solution created by our subject matter experts

GET A QUOTE

Answer Preview

Download Attached File

Sitejabber (5.0)

BBC (5.0)

Trustpilot (4.9)

Google (5.0)

Related Questions

menu

COMP 2150 – Winter 2022 – Assignment 4 compression (as used in zip and rar, for instance) is the process of taking large files and making them smaller by analyzing their contents

Computer Science

Purchase A New Answer

Custom new solution created by our subject matter experts

GET A QUOTE

Answer Preview

Download Attached File

Sitejabber (5.0)

BBC (5.0)

Trustpilot (4.9)

Google (5.0)

Related Questions