
Data Management and Exploratory Data Analysis

Outline

This coursework assignment is designed to give you experience building a data analytics pipeline to process, query and gain insights into a dataset provided to you. You will build a real-world data analysis pipeline making use of the technologies introduced during the lecture series and practicals as part of the CSC8631 module.

This coursework assignment provides you with an opportunity to work on an entire data analysis pipeline, from data sanitisation to querying and report generation. Consequently, you will have the opportunity to make appropriate use of a wide range of technologies, each of which we have seen during lectures and practical exercises. This is an individual assignment, but collaboration with colleagues to discuss the problem and possible solutions is strongly encouraged.

Scenario

Learning Analytics, a rapidly growing application area for Data Science, is defined as “the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environment in which it occurs”¹.

Existing mechanisms to record student engagement (e.g. attendance monitoring) fail to capture the extent and quality of engagement outside of the classroom environment. Further complementary sources of data are routinely collected about our learners (e.g. use of on-campus facilities, Virtual Learning Environment (VLE) and ReCap access, and student wellbeing referrals); however, these currently reside in a number of silos.

Learning Analytics seeks to aggregate these sources of data to derive shared insights and provide effective measures of engagement. Insights may inform learning design, inform intervention processes for at-risk students, and improve student attainment.

The most complete introduction is available in the government policy report “From Bricks to Clicks”². The report is quite extensive, but there are some nice case studies from Nottingham Trent and the OU to give you a flavour of the types of projects in this area.

Challenge

In this project we will emulate a very familiar process undertaken by data analysts. We will take a dataset provided to us and develop a suite of tools that allow us to extract interesting insights from this data in a quick, reliable and repeatable manner. The datasets you are expected to interpret as a data analyst are often previously unseen, so building a pipeline is an exploratory process. Consequently, you will be expected to review and interrogate the data to gain an understanding of its structure and composition.

In this coursework you will develop a data analysis pipeline to explore a given dataset. There are no formal requirements for the functionality or focus of your analysis; your analysis should follow the routes of enquiry that are of greatest interest to you. There is therefore scope for a great deal of flexibility, and we anticipate that solutions to this challenge will vary.

We encourage you to pursue ambitious analysis, but just as importantly we are looking for good programming practice (see the ‘Best-practice development’ section of this document). When developing large systems such as these, it is important that you write your code incrementally and test it carefully before adding further functionality.

Best-practice development

Throughout this coursework we are not simply interested in a solution which achieves some desired functionality. You will also be assessed on the following:

1. You should make use of Git for version control. As a rule of thumb: “if it isn’t visible in the version control logs, it didn’t happen”. You will be expected to push to your remote repository (on GitHub) regularly throughout the project, and you will be assessed on this.

2. All source code and programs as part of your solution should be well documented.

3. You should consider the reproducibility of your analysis, making use of ProjectTemplate for R.

4. You should produce much of your written documentation using the ‘literate programming’ framework RMarkdown for R.

Design and Implementation Reports

You should prepare a short report (no more than two pages), submitted via NESS, summarising the work carried out and offering a critical reflection³ on your experience of using the tools and techniques introduced on this module to complete the coursework assignment. We are particularly interested in any assumptions you made about the data, and how they motivated your design decisions.

³ You should consider the relative merits and limitations of the approaches. No one methodology is perfectly suited to every project, so it is completely reasonable for you to identify limitations of CRISP-DM as a methodology.

You should also produce additional documentation detailing the findings from your exploratory analysis. You should include documentation of all analyses undertaken, whether or not they produced ‘successful’ findings. You should follow the principles of the CRISP-DM⁴ methodology and use the literate programming framework RMarkdown⁵ to align analytic code with narrative text.

⁴ Chapman et al. (2000). CRISP-DM 1.0: Step-by-step data mining guide. SPSS.

You should submit the source file(s) for the notebook(s) as well as output saved in PDF format. Your report should be a maximum of 20 pages and should be structured in a way that guides the reader through the steps of your analysis.

  • Technical

— Documentation, including code comments and a README document.

— Effective use of Version Control

— Use of ProjectTemplate to achieve reproducibility in your project.

  • Methodology

— Documentation, including code comments and a README document.

— Effective use of Version Control

— Use of ProjectTemplate to achieve reproducibility in your project.

  • Project reporting

— Critical Reflection

— Quality of the communication of your work and the accompanying rationale/significance

Presentation (20% of the marks for the module)

You will pre-record a short presentation focusing on your key findings. We recommend you record a Zoom meeting where you share your screen; audio only is perfectly acceptable if you prefer not to use video for your recording.

Your presentation will be five minutes in duration. We will allow a leeway of 30 seconds (5 minutes + 30 seconds in total). Your marker will not watch any content beyond 5:30, and any content appearing after 5:30 will not be marked. Marks for your presentation will be divided equally between Content and Delivery.

  • Content (50% of the Presentation marks)

— The motivation for your analysis is clear.

— Clear statement of the data used to support your analysis.

— A clear description of the analysis work you undertook.

— A clear description of key findings from your analysis.

— Concluding remarks, relating the findings of your analysis back to their implications for the business context.

  • Delivery (50% of the Presentation marks)

— The slides and speech are clearly understandable⁶.

— The presentation is well structured and has a natural flow.

— Time (5 minutes + 30 seconds)

— The slides are well presented with thought given to formatting and aesthetics

— Effort is taken to talk around the slides

⁶ If you have concerns about being able to record audio, please get in touch with Matt and Joe at your earliest convenience. In our experience, microphones included with your laptop, mobile device or headphones are perfectly suited to this task.

Clarification of requirements

It is often necessary to clarify client requirements throughout the course of a project. Matt and Joe will be happy to help clarify any questions you have about the deliverables you are supposed to produce, and to resolve any ambiguities that may arise as you explore the provided dataset.

Deliverables and Online Submission

You will submit your assignment electronically via NESS⁷ by 16:30 on Friday 19th November 2021. You are required to submit several ‘deliverables’ to NESS.

Source code: You are expected to submit all source code developed in the coursework. You should also provide a README.txt document clearly stating which files relate to which part of the coursework solution. Your README.txt file should also provide instructions for running your analyses; these instructions should be sufficient for someone else to run the analysis. The process should be automated as much as possible, and any non-automated configuration or installation steps should be clearly documented.

Written report: Written reports should be submitted in PDF format, and should clearly indicate your name and student number within the document and in the file name.

Presentation: Slides in PDF format and in their source format (e.g. RMarkdown, Keynote, PowerPoint), plus the presentation video (e.g. MP4).

Zip files: You will often be submitting a number of files at once, so you will likely find it most convenient to zip these files up before submitting them to NESS. Please ensure any zip files contain your student number and module code in the filename.

Version control log: In this coursework assignment we emphasise the importance of correctly using version control. You should include a copy of your Git log with your coursework submission. This may be easily obtained using the following command:

git log > [yourstudentnumber]GitLogFile.txt
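If you would rather not type the log command and the archive name by hand, they can be generated by a small script. The following is a minimal sketch, assuming a POSIX shell with Git available. The student number is a hypothetical placeholder, and the `_coursework.zip` suffix is an illustrative naming choice, not a format required by NESS.

```shell
#!/bin/sh
# Minimal sketch: export the Git log and derive a submission archive name.
# STUDENT_NUMBER is a hypothetical placeholder; replace it with your own.
STUDENT_NUMBER="123456789"
MODULE_CODE="CSC8631"

# Log file named as the brief suggests: [yourstudentnumber]GitLogFile.txt
LOG_FILE="${STUDENT_NUMBER}GitLogFile.txt"

# Archive name containing both the student number and the module code.
# The "_coursework.zip" suffix is an illustrative choice only.
ZIP_NAME="${STUDENT_NUMBER}_${MODULE_CODE}_coursework.zip"

# Only export the log when run inside a Git working tree that has commits.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1 && git log > "$LOG_FILE"; then
  echo "Wrote $LOG_FILE"
else
  echo "Could not export the Git log (not a Git repository, or no commits yet)" >&2
fi

echo "Archive name: $ZIP_NAME"
# If the 'zip' utility is installed, the files could then be bundled with,
# for example:  zip -r "$ZIP_NAME" README.txt "$LOG_FILE" report.pdf
```

Running the script from the root of your repository leaves the log file alongside your other deliverables, ready to be added to the archive.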

Questions?

If you have any queries about this coursework exercise or what you are required to submit, do not hesitate to ask us during one of the lectures or practical sessions, or email both Matt and Joe at matthew.forshaw@ncl.ac.uk and joe.matthews@ncl.ac.uk.
