
CS1026: Assignment 3 - Sentiment Analysis


Due: November 11th, 2020, 9:00pm.
Weight: 12%

>Learning Outcomes
By completing this assignment, you will gain skills relating to:
- Using functions
- Complex data structures
- Text processing
- File input and output
- Exceptions in Python
- Using Python modules
- Testing programs and developing test cases; adhering to specifications
- Writing code that is used by other programs

>Background
With the emergence of Internet companies such as Google, Facebook, and Twitter, more and more data accessible online is comprised of text. Textual data and the computational means of processing it and extracting information is also increasingly more important in areas such as business, humanities, social sciences, etc. In this assignment, you will deal with textual analysis.

Twitter has become very popular, with many people "tweeting" aspects of their daily lives. This "flow of tweets" has recently become a way to study or guess how people feel about various aspects of the world or their own life. For example, analysis of tweets has been used to try to determine how certain geographical regions may be voting - this is done by analyzing the content, the words, and phrases, in tweets. Similarly, analysis of keywords or phrases in tweets can be used to determine how popular or unpopular a movie might be. This is often referred to as sentiment analysis.

>Task
In this assignment, you will write a Python module, called sentiment_analysis.py (this is the name of the file that you should use) and a main program, main.py, that uses the module to analyze Twitter information. In the module sentiment_analysis.py, you will create a function that will perform simple sentiment analysis on Twitter data. The Twitter data contains comments from individuals about how they feel about their lives and comes from individuals across the continental United States. The objective is to determine which timezone (Eastern, Central, Mountain, Pacific; see below for more information on how to do this) is the "happiest". To do this, your program will need to:

1. Analyze each individual tweet to determine a score - a "happiness score" - for the individual tweet.
2. The "happiness score" for a single tweet is found by looking for certain keywords (which are given) in a tweet and for each keyword found in that tweet totaling their "sentiment values". In this assignment, each value is an integer from 1 to 10. The happiness score for the tweet is simply the sum of the "sentiment values" for keywords found in the tweet divided by the number of keywords found in the tweet. If there are none of the given keywords in a tweet, it is just ignored, i.e., you do NOT count it.
   To determine the words in a tweet, you should do the following:
   - Separate a tweet into words based on white space. A "word" is any sequence of characters surrounded by white space (blank, tab, end of line, etc.).
   - You should remove any punctuation from the beginning or end of the word (do NOT worry about punctuation within a word). So, "#lonely" would become "lonely" and "happy!!" would become "happy"; but "not-so-happy" is just "not-so-happy".
   - You should convert the "word" into just lower case letters. This gives you a "word" from the tweet.
   If you match the "word" to any of the sentiment keywords (see below), you add the score of that sentiment keyword to a total for the tweet; you should just do exact matches. For example, if the word "hats" is in the tweet and the word "hat" is a sentiment keyword, then they DO NOT MATCH. Of course, if "hats" is in the list of sentiment keywords, then there is a match.
3. For each region, you should count the number of tweets in that region and you should count the number of tweets with keywords - these are called "keyword tweets". A "keyword tweet" is a tweet in the region in which there was at least one matched keyword.
   [Note: the number of "keyword tweets" is always less than or equal to the total number of tweets in a region.]

4. The "happiness score" for a timezone is just the total of the happiness scores for all the keyword tweets in the region divided by the number of keyword tweets in that region; again, if a tweet has NO keywords, then it is NOT to be counted as a "keyword tweet" in that timezone, i.e., it is just skipped as a "keyword tweet" but counted in the total number of tweets in that region.

A file called tweets.txt contains the tweets and a file called keywords.txt contains keywords and scores for determining the "sentiment" of an individual tweet. These files are described in more detail below.

>File tweets.txt
The file tweets.txt contains the tweets, one per line (some lines are quite long). The format of a tweet is:

[lat, long] value date time text

where:
- [lat, long] - the latitude and longitude of where the tweet originated. You will need these values to determine the timezone in which the tweet originated.
- value - not used; this can be skipped.
- date - the date of the tweet; not used, this can be skipped.
- time - the time of day that the tweet was sent; not used, this can be skipped.
- text - the text in the tweet.

>File keywords.txt
The file keywords.txt contains sentiment keywords and their "happiness scores", one per line. The format of a line is:

keyword, value

where:
- keyword - the keyword to look for.
- value - the value of the keyword; values are from 1 to 10, where 1 represents very "unhappy" and 10 represents "very happy".

(You are free to explore different sets of keywords and values at your leisure for the sheer fun of it!)

>Determining timezones across the continental United States
Given a latitude and longitude, the task of determining exactly the location that it corresponds to can be very challenging given the geographical boundaries of the United States. For this assignment, we simply approximate the regions corresponding to the timezones by rectangular areas defined by latitude and longitude points. Our approximation looks like:

p9          p7          p5          p3          p1
   Pacific    Mountain     Central     Eastern
p10         p8          p6          p4          p2

So, the Eastern timezone, for example, is defined by latitude-longitude points p1, p2, p3, and p4. To determine the origin of a tweet, then, one simply has to determine in which region the latitude and longitude of the tweet belongs. The values of the points are:

p1  = (49.189787, -67.444574)
p2  = (24.660845, -67.444574)
p3  = (49.189787, -87.518395)
p4  = (24.660845, -87.518395)
p5  = (49.189787, -101.998892)
p6  = (24.660845, -101.998892)
p7  = (49.189787, -115.236428)
p8  = (24.660845, -115.236428)
p9  = (49.189787, -125.242264)
p10 = (24.660845, -125.242264)

Note: if the latitude-longitude of a tweet is outside of all these regions, it is to be skipped; if a tweet is on the border between regions, then choose one of the regions.

>Functional Specifications
Developing code for the processing of the tweets and sentiment analysis.
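The word-handling rules in item 2 of the Task section could be sketched roughly as follows. This is only one possible reading of the specification: the helper names `clean_word` and `tweet_happiness` and the sample keyword values are made up for illustration, and the sketch assumes that every occurrence of a keyword in a tweet counts toward the total.

```python
import string


def clean_word(raw_word):
    """Strip punctuation from both ends of a word and convert it to lower case."""
    # str.strip only touches the ends, so "not-so-happy" keeps its inner hyphens.
    return raw_word.strip(string.punctuation).lower()


def tweet_happiness(text, keywords):
    """Return the average sentiment value of the matched keywords, or None if none match."""
    total = 0
    matches = 0
    for raw_word in text.split():  # split() with no argument splits on any white space
        word = clean_word(raw_word)
        if word in keywords:  # exact match only: "hats" does not match "hat"
            total += keywords[word]
            matches += 1
    if matches == 0:
        return None  # not a "keyword tweet"
    return total / matches


# Hypothetical keyword values, chosen only for this example.
sample_keywords = {"happy": 9, "lonely": 2, "hat": 5}
print(tweet_happiness("So #lonely but Happy!! today", sample_keywords))
```

With these sample values the tweet matches "lonely" (2) and "happy" (9), so its score is (2 + 9) / 2.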

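The tweet line format and the rectangular timezone approximation could be handled along these lines. The function names `parse_tweet_line` and `find_region` and the sample tweet line are hypothetical; the boundary values come from points p1 to p10. As an assumption, a tweet lying exactly on a border is assigned to the more eastern region, which is one way of "choosing one of the regions".

```python
# Latitude band shared by all four regions (from p2 and p1).
LAT_MIN, LAT_MAX = 24.660845, 49.189787

# (region name, western longitude, eastern longitude), listed east to west.
REGIONS = [
    ("Eastern", -87.518395, -67.444574),
    ("Central", -101.998892, -87.518395),
    ("Mountain", -115.236428, -101.998892),
    ("Pacific", -125.242264, -115.236428),
]


def parse_tweet_line(line):
    """Split one line of the form '[lat, long] value date time text' into (lat, lon, text)."""
    coords, rest = line.split("]", 1)
    lat_str, lon_str = coords.strip().lstrip("[").split(",")
    # rest holds "value date time text"; the first three fields are not used.
    fields = rest.split(maxsplit=3)
    text = fields[3] if len(fields) > 3 else ""
    return float(lat_str), float(lon_str), text


def find_region(lat, lon):
    """Return the region name for a coordinate, or None if it lies outside all regions."""
    if not (LAT_MIN <= lat <= LAT_MAX):
        return None
    for name, west, east in REGIONS:
        # The first match wins, so a point on a shared border goes to the eastern neighbour.
        if west <= lon <= east:
            return name
    return None


# A made-up line in the documented format.
sample_line = "[41.298669, -81.915329] 6 2011-08-28 19:02:36 Work needs to fly by"
lat, lon, text = parse_tweet_line(sample_line)
print(find_region(lat, lon), "-", text)
```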
1. Your module sentiment_analysis.py must include a function compute_tweets that has two parameters. The first parameter will be the name of the file with the tweets and the second parameter will be the name of the file with the keywords. This function will use these two files to process the tweets and output the results. This function should also check to make sure that both files exist and if either does not exist, then your program should generate an exception and the function compute_tweets should return an empty list (see part 1.c below).

   The function should input the keywords and their "happiness values" and store them in a data structure in your program (the data structure is of your choice).

   Your function should then process the file of tweets, computing the "happiness score" for each tweet and computing the "happiness score" for each timezone. You will need to read the file of tweets line by line as text and break it apart. The string processing functions in Python are very useful for doing this. Your program should not duplicate code. It is important to determine places that code can be reused and create functions. Your program should ignore tweets from outside the time zones.

   Your function, compute_tweets, should return a list of tuples:
   a. The list should contain the results in a tuple for each of the regions, in order: Eastern, Central, Mountain, Pacific.
   b. Each tuple should contain three values: (average, count of keyword tweets, count of tweets), where average is the average "happiness value" of that region, count of keyword tweets is the number of tweets found in that region with keywords and count of tweets is the number of tweets found in that region. These values should be in the order specified.
   c. Note: if there is an exception from a file name that does not exist, then an empty list should be returned.

2. Your main program, main.py, will prompt the user for the name of the two files - the file containing the keywords and the file containing the tweets. It will then call the function compute_tweets with the two files to process the tweets using the given keywords. Your main program will get the results from compute_tweets and print the results; it should print the results in a readable fashion (i.e., not just numbers).

3. You are also given a program, driver.py, and some test files. The test files are small files of tweets and keywords that driver.py uses to test your program - that is, it will import your program, sentiment_analysis.py, and will make use of the function compute_tweets. The files tweets1.txt and tweets2.txt are small files with tweets and the files key1.txt and key2.txt contain keywords and "happiness values". The program driver.py will use these to test your function; these files are small enough that you can compute the results by hand to test your program. You should use the program and these files to test your code. Note: while driver.py does some testing, it is your responsibility to design your own test cases to test it thoroughly.

>Additional Information
For both files, it is advised that when you read in the files you use one of the lines below to avoid encoding errors:

open("fileName.txt", "r", encoding="utf-8")
or
open('fileName.txt', encoding='utf-8', errors='ignore')

>Non-functional Specifications
1. Include brief comments in your code identifying yourself, describing the program, and describing key portions of the code.
2. Assignments are to be done individually and must be your own work. Software may be used to detect cheating.
3. Use Python coding conventions and good programming techniques, for example:
   - Meaningful variable names
   - Conventions for naming variables and constants
   - Use of constants where appropriate
   - Readability: indentation, white space, consistency

You should submit the files main.py and sentiment_analysis.py (others are not required). Make sure you upload your Python file to your assignment; DO NOT put the code inline in the textbox.

>What You Will Be Marked On
1. Your program will be executed by an automated testing program. This testing program assumes that:
   - The modules are named main.py and sentiment_analysis.py.
   - You are using Python 3.8.
   - You have submitted it via OWL by uploading it.
   Failure to adhere to these constraints will likely cause the testing program to fail. This may require a remarking of your program which will include a 20% penalty.

2. Functional specifications:
   - Is the program named correctly for testing, i.e., is the module correctly named sentiment_analysis.py?
   - Is there a function compute_tweets and are the parameters in the correct order?
   - Is there a program main.py which imports and makes use of the module sentiment_analysis.py?
   - Does the program behave according to specifications?
   - Does it work with the test program, driver.py?
   - Does the program handle incorrect function names?
   - Is there an effective use of functions beyond compute_tweets?
   - Is the output according to specifications?
   Note: A program like driver.py and other test files will be used to test your program as well.
3. Non-functional specifications: as described above.
4. Assignment submission: via OWL, through the assignment submission in OWL.
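The file-handling and return-value requirements for compute_tweets (an empty list when a file is missing, otherwise one tuple per region in the order Eastern, Central, Mountain, Pacific) might be organized as in the outline below. It is a partial sketch, not a full solution: the names `load_keywords`, `build_results`, and `compute_tweets_outline` are hypothetical, the per-tweet scoring step is deliberately left out, and reporting an average of 0 for a region with no keyword tweets is an assumption, since the specification does not say what to return in that case.

```python
REGION_ORDER = ["Eastern", "Central", "Mountain", "Pacific"]


def load_keywords(keywords_file):
    """Read 'keyword, value' lines into a dict (raises FileNotFoundError if the file is missing)."""
    keywords = {}
    with open(keywords_file, "r", encoding="utf-8") as handle:
        for line in handle:
            if "," not in line:
                continue  # skip blank or malformed lines
            word, value = line.rsplit(",", 1)
            keywords[word.strip().lower()] = int(value)
    return keywords


def build_results(tallies):
    """Convert per-region tallies [score sum, keyword tweets, total tweets] into result tuples."""
    results = []
    for region in REGION_ORDER:
        score_sum, keyword_tweets, total_tweets = tallies[region]
        # Assumption: a region without keyword tweets reports an average of 0.
        average = score_sum / keyword_tweets if keyword_tweets else 0
        results.append((average, keyword_tweets, total_tweets))
    return results


def compute_tweets_outline(tweets_file, keywords_file):
    """Outline of the required function: file checks and result shape only."""
    try:
        keywords = load_keywords(keywords_file)
        with open(tweets_file, "r", encoding="utf-8", errors="ignore") as handle:
            tweet_lines = handle.readlines()
    except FileNotFoundError:
        return []  # either file is missing
    tallies = {region: [0, 0, 0] for region in REGION_ORDER}
    # Per-tweet processing is omitted in this sketch: each line in tweet_lines would be
    # parsed, scored against keywords, and added to the tallies of its region.
    return build_results(tallies)


print(compute_tweets_outline("no_such_tweets.txt", "no_such_keywords.txt"))
```

Keeping the tallies in a dictionary keyed by region name and converting to the ordered list only at the end is one way to keep the required ordering in a single place.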

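For the main program, printing the results "in a readable fashion" could look something like the sketch below. The helper `format_results`, the prompt wording, and the output layout are all assumptions; the specification only fixes the order of the regions and of the three values in each tuple.

```python
REGION_NAMES = ["Eastern", "Central", "Mountain", "Pacific"]


def format_results(results):
    """Turn the list of (average, keyword tweets, tweets) tuples into labelled text lines."""
    if not results:
        return ["No results: one of the input files could not be opened."]
    lines = []
    for name, (average, keyword_tweets, total_tweets) in zip(REGION_NAMES, results):
        lines.append(
            f"{name}: average happiness {average:.3f}, "
            f"{keyword_tweets} keyword tweets, {total_tweets} tweets in total"
        )
    return lines


def main():
    # Imported here so that this sketch still loads when sentiment_analysis.py is absent.
    from sentiment_analysis import compute_tweets

    keywords_file = input("Enter the name of the keywords file: ")
    tweets_file = input("Enter the name of the tweets file: ")
    # compute_tweets takes the tweets file first and the keywords file second.
    for line in format_results(compute_tweets(tweets_file, keywords_file)):
        print(line)
```

In an actual main.py the import would normally sit at the top of the file and `main()` would be called at the bottom; it is not called here so that the sketch does not block waiting for input.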