ASSIGNMENT [Engineering]: Senior Backend Engineer
In this task, you are required to build 2 components of a micro service which obtains information on active containers travelling on sea using scrapers and a work-management system for handling failed scraping attempts.
Please use Python 3.6 and follow best practices.
Create a Dockerfile that packages your code into a Docker image, and a corresponding docker-compose.yml file that orchestrates the whole process, including setting up the local database and the queue.
To evaluate the result, we expect the chosen components to be up and running after executing docker-compose up -d in the project directory, so please make sure this works before submitting the code.
Data is the backbone of ML/AI products, so you should be able to demonstrate the data challenges you encountered and how you handled them in your code.
Create a scraper using the Scrapy framework; its use is mandatory. You will be scraping https://www.msc.com/track-a-shipment for container numbers and bill of lading numbers.
As a stretch goal you can build a ParseHub scraper for your own understanding, but this is not part of the assignment.
Create a Prefect flow that integrates the above scraper as a data pipeline task to obtain container and bill of lading information. (https://www.prefect.io/cloud/)
Store the scraped data in the PostgreSQL database.
Create a Prefect schedule in the same flow to run the scraper every 5 minutes using the queue you have set up. Also pass arguments to @task in your flow so that failed attempts are retried.
Create a PostgreSQL database schema and a simple REST API with FastAPI (https://fastapi.tiangolo.com/) to query the scraped data stored in the database.
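One possible shape for the schema and the query behind such an API is sketched below. It is only an illustration under stated assumptions: the table name, column names, and helper functions are hypothetical, and the standard-library sqlite3 module stands in for PostgreSQL so that the sketch runs without a database server. A FastAPI route would wrap a function like events_for_container and return its result.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
# In PostgreSQL, id would typically be SERIAL and event_date a TIMESTAMPTZ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tracking_events (
    id               INTEGER PRIMARY KEY,
    container_number TEXT NOT NULL,
    bill_of_lading   TEXT,
    location         TEXT NOT NULL,
    description      TEXT NOT NULL,
    event_date       TEXT NOT NULL,
    UNIQUE (container_number, event_date, description)
)
"""

COLUMNS = (
    "container_number",
    "bill_of_lading",
    "location",
    "description",
    "event_date",
)


def init_db(conn):
    """Create the table if it does not exist yet."""
    conn.execute(SCHEMA)


def save_event(conn, event):
    """Insert one scraped event; re-scraped duplicates are skipped."""
    # sqlite3 uses "?" placeholders; psycopg2 would use "%s" instead.
    conn.execute(
        "INSERT INTO tracking_events "
        "(container_number, bill_of_lading, location, description, event_date) "
        "VALUES (?, ?, ?, ?, ?) ON CONFLICT DO NOTHING",
        tuple(event[column] for column in COLUMNS),
    )


def events_for_container(conn, container_number):
    """Return all stored events for one container, oldest first."""
    rows = conn.execute(
        "SELECT container_number, bill_of_lading, location, description, event_date "
        "FROM tracking_events WHERE container_number = ? ORDER BY event_date",
        (container_number,),
    ).fetchall()
    return [dict(zip(COLUMNS, row)) for row in rows]
```

The unique constraint is a design choice for a scraper that runs every 5 minutes: the same tracking event will be seen on many runs, and the constraint keeps repeated runs from storing it more than once.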
Use Git and share the private GitHub repo after completion. Include a simple README so that I can reproduce the project.
Study the entries of each of the data points in the search results and try to understand the meaning of each.
Pay special attention to the following:
Tracking of container numbers and bill of lading numbers
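As one example of a data challenge around these identifiers, container numbers that follow ISO 6346 end in a check digit, so malformed or mistyped values can be rejected before they are scraped or stored. The assignment does not ask for this check; the sketch below is an optional illustration, its function names are hypothetical, and it covers container numbers only, not bill of lading numbers.

```python
import re


def _letter_values():
    """Build the ISO 6346 letter table: A=10 upward, skipping multiples of 11."""
    values, n = {}, 10
    for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if n % 11 == 0:
            n += 1
        values[letter] = n
        n += 1
    return values


LETTER_VALUES = _letter_values()


def normalize(raw):
    """Strip whitespace and hyphens and upper-case the input."""
    return re.sub(r"[\s-]", "", raw).upper()


def check_digit(first_ten):
    """Compute the check digit from the first ten characters."""
    total = sum(
        (LETTER_VALUES[ch] if ch.isalpha() else int(ch)) * 2 ** position
        for position, ch in enumerate(first_ten)
    )
    # A remainder of 10 maps to check digit 0.
    return total % 11 % 10


def is_valid_container_number(raw):
    """Return True if raw is four letters plus seven digits with a correct check digit."""
    number = normalize(raw)
    if not re.fullmatch(r"[A-Z]{4}[0-9]{7}", number):
        return False
    return check_digit(number[:10]) == int(number[10])
```

For instance, is_valid_container_number("CSQU3054383") returns True, while the same number with the last digit changed to 4 returns False.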