
This is a Big Data Management Project

This is a Big Data Management project. I want to know the price for the first and the second deliverable, please.

First deliverable - March: 3/23/2025
Second deliverable - April: 18/5/2025

The project requires showing progress to the professor, who will give us feedback, and it has a weight of 40% in the final grade. I need progress for this Sunday the 16th: specifically, the ingestion of the unstructured data into the data lake. You have to code an entire pipeline from scratch in Python.

I am sharing the following folders and files with you:

1. Domain and Data Sources and Contextualization (folder)
2. Project Requirements (folder)
3. Project Structure (folder)
4. Project example of previous year (folder)
5. Explanation of the folders (image)
6. Document I attached to the email sent to the professor (Word)
7. Mail - Consultations on BDM project to professor (PDF)

For the first deliverable, this is what you have to do:

- Start the whole coding project from scratch; it is an entire pipeline.
- Use the data sources as described in the file BDM_Project_P1 (Domain and Data Sources and Contextualization folder).
- Code all of P1. See the images of the pipeline design named PipelineDesign - P1 (a diagram of what you should do) and PipelineDesign (Domain and Data Sources and Contextualization folder).
- See the email from my professor, where he answers some questions I asked him (Mail - Consultations on BDM project to professor - PDF, and Document I attached to the email sent to professor - Word):
  - First, he agreed with the structure of the project (Project Structure folder).
  - Second, he agreed that this is what has to be done in the project:

For P1: First, Apache Airflow is used to define a DAG (Directed Acyclic Graph) that schedules and executes tasks at predefined intervals. The DAG calls a PythonOperator, which runs a Spark job using Apache Spark. This job loads raw data from an external source such as an API, database, or file storage. The data is stored in Delta Lake, which ensures ACID transactions, schema enforcement, and versioning. Metadata must also be added. (See the sketch at the end of this post.)

For P2: Later, Apache Spark can retrieve data from Delta Lake for further processing, analysis, or querying. Finally, Apache Airflow continues to monitor and orchestrate the pipeline, ensuring that each step runs reliably and on schedule.

For the second deliverable: I do not know specifically what has to be done; the professor has not given us instructions yet. I know we have to use TensorFlow, machine learning, Apache Airflow, and probably Apache Spark or Hadoop (I am not sure about Hadoop), but I am not sure yet.

I am also giving you the report of deliverable P1, which is almost finished. The only things I need you to add are:

- How to run the project.
- How the project itself is built.

You can also look at the project example I shared (Project example of previous year folder). It does not have the same requirements and tools, but it is a previous project done in this subject; you can see its report document and the way it is built.

If you need any clarifications, please let me know.
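
For reference, here is a minimal sketch of the P1 flow the professor described. It is only a sketch under my own assumptions: the DAG id, schedule, source URL, Delta table path, and package version are placeholders I made up, and the real data sources and paths are the ones described in BDM_Project_P1 and the PipelineDesign diagrams.

```python
# Sketch of P1: an Airflow DAG whose PythonOperator runs a Spark job that
# lands raw data in Delta Lake. All names below are hypothetical placeholders.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_raw_data():
    """Spark job: load raw data from an external source into Delta Lake."""
    from pyspark.sql import SparkSession, functions as F

    # Delta Lake needs the delta-spark package plus these two settings
    # (the package coordinates depend on your Spark/Delta versions).
    spark = (
        SparkSession.builder.appName("bdm_p1_ingestion")
        .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.1.0")
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # Placeholder external source; the real APIs/files are described in
    # BDM_Project_P1.
    records = requests.get("https://example.com/api/raw-data", timeout=30).json()
    df = spark.createDataFrame(records)

    # Attach ingestion metadata before landing the data in the lake.
    df = (df.withColumn("ingestion_ts", F.current_timestamp())
            .withColumn("source", F.lit("example_api")))

    # Delta Lake provides ACID transactions, schema enforcement, versioning.
    df.write.format("delta").mode("append").save("/data_lake/raw/example_api")


with DAG(
    dag_id="bdm_p1_ingestion",          # hypothetical DAG id
    start_date=datetime(2025, 3, 1),
    schedule="@daily",                  # runs at a predefined interval
    catchup=False,
) as dag:
    PythonOperator(
        task_id="ingest_raw_data",
        python_callable=ingest_raw_data,
    )
```

For P2, Spark can later read the same table back with spark.read.format("delta").load("/data_lake/raw/example_api") for further processing or querying, while Airflow keeps orchestrating the schedule.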
