Computer Science

Steps to do

This task involves interacting with the GCP console and a Dataproc cluster: log in to the master node and perform the following operations.

I have created a bucket, and that bucket contains multiple CSV files.
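For context, reaching the master node and confirming the CSV files might look like the following. This is a sketch with placeholder names: `my-cluster`, `us-central1-a`, and `my-bucket` are assumptions, not taken from the assignment.

```shell
# Placeholder names throughout: my-cluster, us-central1-a, my-bucket.
# On Dataproc, the master node is the VM instance named <cluster-name>-m.
gcloud compute ssh my-cluster-m --zone=us-central1-a

# Once on the master node, list the CSV files in the bucket:
gsutil ls gs://my-bucket/*.csv
```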


On the master VM instance, do the following:

  1. Build a wrapper, i.e. a shell script.
  2. The shell script should call a Python file.
  3. That Python file should be a DataFrame/spark-submit job that takes input arguments: the CSV file name and the GCS location to read the file from.
  4. In that spark-submit job, use a DataFrame to load the CSV into a Hive table.
  5. Read the data from the Hive table and write it to an external table.
  6. Now convert the data from the CSV table to Parquet format.
  7. Show the table storage sizes for the CSV and Parquet tables.
  8. Run one aggregate query and show the time difference between CSV and Parquet.
  9. Run one join operation and show the difference between CSV and Parquet.
  10. Do the same for the Avro format.
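Steps 1–2 ask for a shell wrapper around the Python file. A minimal sketch, where the file names `run_load.sh` and `load_csv_to_hive.py` and the spark-avro version are assumptions, not from the assignment:

```shell
#!/bin/bash
# run_load.sh -- wrapper that forwards its arguments to the PySpark job.
# Usage: ./run_load.sh <csv-file-name> <gcs-location>
# (file names and package version below are illustrative assumptions)
set -euo pipefail

if [ "$#" -ne 2 ]; then
  echo "Usage: $0 <csv-file-name> <gcs-location>" >&2
  exit 1
fi

CSV_FILE="$1"
GCS_LOCATION="$2"

# --packages adds spark-avro for the Avro step; the version must match the
# cluster's Spark/Scala build (3.1.2 / Scala 2.12 here is an assumption).
spark-submit \
  --packages org.apache.spark:spark-avro_2.12:3.1.2 \
  load_csv_to_hive.py "$CSV_FILE" "$GCS_LOCATION"
```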

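Steps 3–10 could then be sketched as a single PySpark job. This is a sketch under assumptions: the file name `load_csv_to_hive.py`, the column names `amount` and `id` used in the aggregate and join, and the external-table location are all illustrative, not taken from the assignment.

```python
#!/usr/bin/env python3
"""Sketch of the PySpark job invoked by the shell wrapper.

Run on the Dataproc master node, e.g.:
    spark-submit load_csv_to_hive.py sales.csv gs://my-bucket/data
All table, column, and bucket names below are illustrative assumptions.
"""
import sys
import time


def table_name_for(csv_file):
    """Derive a Hive table name from a CSV file name: 'Sales-2021.csv' -> 'sales_2021_csv'."""
    base = csv_file.rsplit("/", 1)[-1]
    return base.replace(".", "_").replace("-", "_").lower()


def timed(fn):
    """Run fn() and return (result, elapsed seconds) for the format comparisons."""
    start = time.time()
    result = fn()
    return result, time.time() - start


def main(csv_file, gcs_location):
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("csv-to-hive")
             .enableHiveSupport()
             .getOrCreate())

    base = table_name_for(csv_file)

    # Steps 3-4: read the CSV from GCS into a DataFrame, load it into a Hive table.
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(gcs_location.rstrip("/") + "/" + csv_file))
    df.write.mode("overwrite").saveAsTable(base)

    # Step 5: write the same data out as an external table (supplying an
    # explicit "path" makes saveAsTable create an EXTERNAL table).
    managed = spark.table(base)
    (managed.write.mode("overwrite")
     .option("path", gcs_location.rstrip("/") + "/external/" + base)
     .saveAsTable(base + "_ext"))

    # Steps 6 and 10: store the data in Parquet and Avro formats too
    # (Avro requires the spark-avro package on the classpath).
    managed.write.mode("overwrite").format("parquet").saveAsTable(base + "_parquet")
    managed.write.mode("overwrite").format("avro").saveAsTable(base + "_avro")

    tables = [base, base + "_parquet", base + "_avro"]

    # Step 7: storage sizes -- visible as "totalSize" in DESCRIBE FORMATTED
    # after ANALYZE (or via `hdfs dfs -du -s -h` on the warehouse path).
    for t in tables:
        spark.sql(f"ANALYZE TABLE {t} COMPUTE STATISTICS")
        spark.sql(f"DESCRIBE FORMATTED {t}").show(100, truncate=False)

    # Step 8: the same aggregate on each format, timed ('amount' is an assumed column).
    for t in tables:
        _, secs = timed(lambda t=t: spark.sql(
            f"SELECT COUNT(*), SUM(amount) FROM {t}").collect())
        print(f"aggregate on {t}: {secs:.2f}s")

    # Step 9: a self-join on each format, timed ('id' is an assumed key column).
    for t in tables:
        _, secs = timed(lambda t=t: spark.sql(
            f"SELECT COUNT(*) FROM {t} a JOIN {t} b ON a.id = b.id").collect())
        print(f"join on {t}: {secs:.2f}s")

    spark.stop()


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

The wrapper from steps 1–2 would invoke this file through spark-submit; the timed aggregate and join loops at the end produce the CSV-vs-Parquet-vs-Avro comparisons.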