
INTRODUCTION TO BIG DATA
SUBJECT CODE: BDA400S1A

Introduction to Big Data BDA400SA
1) (10 marks) Discuss five points of comparison between Python and R from a Big Data
programming perspective. Show samples of Python code and samples of R code to illustrate your
points.
2) (10 marks) List ten Apache open source projects that are widely used in Hadoop
environments and explain, in one sentence, what each is used for. Beside each one, name a
proprietary component that accomplishes a similar task.
3) (10 marks) Using R, write a program that creates a histogram with the following:
i. X-axis label of “Horsepower”
ii. Y-axis label of “Number of Cars”
iii. Overall title of “Car Comparisons”
iv. Bar width of 20mm
v. Bar border which is black
vi. Bar fill which is blue
4) (10 marks) When it comes to Db2 Big SQL:
i. List and describe eight file types it supports, with an example of when to use each
ii. Describe how the data types supported by Db2 Big SQL map to SQL standard types
and to Hadoop.
iii. Write a program which accomplishes the following:
1. brings data into a HADOOP table from an existing traditional relational data
source
2. assume the data includes provincial-level data; include a table that is
partitioned by province.
3. store all the data in Parquet files
4. create a view over the table showing a subset of the data
5. select data from that view
6. use a schema called BDA400
5) (10 marks) When it comes to Machine Learning, describe the following:
a. Two key learning methods
b. The concept of being influenced by the wrong factors
c. The concept of accuracy
d. The continuous learning cycle
e. The evolution from mining to machine learning
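For part (c), accuracy is commonly defined as the fraction of predictions that match the true labels. The Python sketch below computes it alongside recall for the positive class. The 95/5 label split and the "always predict the majority class" model are hypothetical, chosen only to show how a model can score high accuracy while missing every positive case.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positive cases that the model predicted as positive.

    Returns 0.0 when there are no actual positives (a simplifying assumption).
    """
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    hits = sum(1 for p in preds_for_positives if p == positive)
    return hits / len(preds_for_positives)


# Hypothetical data: 95 negative cases and 5 positive cases (e.g. fraud).
y_true = [0] * 95 + [1] * 5
# A naive model that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```

The gap between the two numbers is why accuracy is usually reported together with other measures when classes are imbalanced.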
6) (10 marks) With the emerging workload of event processing, describe the following:
a. Key characteristics of the workload
b. Architecture needed to provide an event processing solution
c. Components for a completely open source solution
d. Components used in the Db2 Event Store
e. A business example of where event processing would be used
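A typical characteristic of event processing workloads is that timestamped events arrive continuously and are aggregated over time windows. The Python sketch below shows one such aggregation, a tumbling (fixed, non-overlapping) window count per key. It is a simplified illustration under stated assumptions: it processes a finite in-memory list rather than an unbounded stream, and the point-of-sale events and store ids are hypothetical.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_in_seconds, key) pairs.
    Returns {window_start: {key: count}} ordered by window start.
    """
    windows = defaultdict(Counter)
    for ts, key in events:
        # Align each timestamp to the start of the window containing it.
        start = int(ts // window_seconds) * window_seconds
        windows[start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical point-of-sale events: (seconds since start, store id).
events = [(0.5, "store_a"), (3.2, "store_b"), (9.9, "store_a"),
          (10.0, "store_a"), (14.1, "store_b")]
print(tumbling_window_counts(events, window_seconds=10))
# {0: {'store_a': 2, 'store_b': 1}, 10: {'store_a': 1, 'store_b': 1}}
```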
7) (10 marks) Using Python, write a program that opens a text file and loads the data into a
dictionary where the key is the line number from the file and the value is the line of text from the file. For
instance, if the file looks like:
Hello,
Welcome to Introduction to Big Data
It is good to meet everyone.
Hope you enjoy the class.
The dictionary should look like:
KEY VALUE
1 Hello,
2 Welcome to Introduction to Big Data
3 It is good to meet everyone.
4 Hope you enjoy the class.
Then convert it to a list, reverse the order of the lines of text, and save the result to a new text file.
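One possible Python sketch of this task is shown below. The function names `load_lines` and `save_reversed` and the file names `input.txt` and `reversed.txt` are hypothetical; the demo writes the sample text into a temporary directory so it runs without any existing file. Line numbers start at 1 to match the example table, and the trailing newline is stripped from each stored value.

```python
import os
import tempfile


def load_lines(path):
    """Read a text file into {line_number: line_text}, numbering from 1."""
    with open(path, encoding="utf-8") as f:
        # rstrip("\n") removes only the line terminator, not other characters.
        return {num: line.rstrip("\n") for num, line in enumerate(f, start=1)}


def save_reversed(lines_by_num, out_path):
    """Convert the dictionary to a list, reverse it, and write it to out_path."""
    # Order by key so the list follows the original line numbering.
    lines = [lines_by_num[num] for num in sorted(lines_by_num)]
    lines.reverse()
    with open(out_path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
    return lines


if __name__ == "__main__":
    sample = ("Hello,\n"
              "Welcome to Introduction to Big Data\n"
              "It is good to meet everyone.\n"
              "Hope you enjoy the class.\n")
    # Use a temporary directory so the demo is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.txt")
        dst = os.path.join(tmp, "reversed.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.write(sample)

        lines_by_num = load_lines(src)
        for num, text in lines_by_num.items():
            print(num, text)

        save_reversed(lines_by_num, dst)
        with open(dst, encoding="utf-8") as f:
            print(f.read(), end="")
```

Sorting the keys before building the list keeps the output correct even if the dictionary were built in a different order.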
8) (10 marks) Describe the process of text analytics using a common example of where
text analytics is used in the market. Make sure you include the following:
a. Information about the language used to perform Information Extraction
b. Common Extractors vs Advanced Extractors
c. Where you would likely have to define application-specific extractors
d. Where open source technologies can help in the process
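To illustrate the difference between common and application-specific extractors (parts b and c), the Python sketch below uses regular expressions as a stand-in for a dedicated information extraction language; it is not the language the question refers to. The email and phone patterns play the role of generic "common" extractors, while the `SKU-#####` product-code pattern and the customer-support message are hypothetical examples of an application-specific rule.

```python
import re

# Hypothetical "common" extractors: generic entity types found in many domains.
COMMON_EXTRACTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Hypothetical application-specific extractor: a retailer's product codes.
APP_EXTRACTORS = {
    "sku": re.compile(r"\bSKU-\d{5}\b"),
}


def extract(text, extractors):
    """Run each named pattern over the text and collect all matches per type."""
    return {name: pattern.findall(text) for name, pattern in extractors.items()}


# Hypothetical customer-support message.
message = "Email support@example.com or call 416-555-0199 about order SKU-10442."
results = extract(message, {**COMMON_EXTRACTORS, **APP_EXTRACTORS})
print(results)
# {'email': ['support@example.com'], 'phone': ['416-555-0199'], 'sku': ['SKU-10442']}
```

The common patterns could be reused across projects, whereas the SKU rule only makes sense for one business, which is where custom extractor definitions typically come in.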
9) (10 marks) Describe how relational database management systems are expanding their
capabilities in the marketplace to address the growing need for multi-model, NoSQL, and NewSQL support:
a. Define what multi-model is and provide examples
b. Define NoSQL and give examples of how RDBMSs provide NoSQL capabilities
c. Define NewSQL and give examples of how RDBMSs provide NewSQL capabilities
10) (10 marks) Logical Data Lakes are becoming a key architecture for many companies. Describe
the following:
a. Data Lake
b. Physical Data Lake vs Logical Data Lake
c. What bad behaviors create a “Data Swamp”
d. In a Logical Data Lake architecture, what role does Data Virtualization play and why is it
important
e. In a Logical Data Lake architecture, what considerations determine which data remains in its current location and which data is still brought locally into the Data Lake
