Project: A Tutorial
Motivation
Instead, students are asked to submit a tutorial that walks the reader through the Data Science pipeline. The subject matter of this tutorial is far less important than the ability to communicate the approach throughout and to offer a meaningful discussion of the implications and interpretations of the final results. For the purposes of this tutorial, we will assume that ‘The Data Science Pipeline’ has the following phases:
1. Data collection/curation + parsing (if necessary)
2. Data management/representation
3. Exploratory data analysis
4. Hypothesis testing and machine learning
5. Communication of insights attained
Each tutorial must be a self-contained artifact that combines Markdown and Python code within a Jupyter Notebook. This artifact should be publicly available on the web.
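The five phases listed above can be sketched end to end in a few lines of Python. This is only a minimal illustration, not a template for a submission: the inline CSV data is made up, the function names (`parse`, `organize`, `summarize`, `welch_t`, `report`) are hypothetical, and Welch's t statistic is just one possible choice for the hypothesis-testing phase. A real tutorial would likely use richer data and libraries.

```python
import csv
import io
import math
import statistics

# Phase 1: data collection and parsing.
# RAW_CSV is made-up illustrative data standing in for a real download or scrape.
RAW_CSV = """group,value
a,1.0
a,2.0
a,3.0
b,4.0
b,5.0
b,6.0
"""


def parse(raw_text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def organize(rows):
    """Phase 2: represent the data as a mapping from group name to float values."""
    groups = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(float(row["value"]))
    return groups


def summarize(groups):
    """Phase 3: exploratory summary statistics for each group."""
    return {
        name: {
            "n": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
        }
        for name, values in groups.items()
    }


def welch_t(x, y):
    """Phase 4: Welch's t statistic for the difference between two sample means."""
    standard_error = math.sqrt(
        statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    )
    return (statistics.mean(x) - statistics.mean(y)) / standard_error


def report(summary, t_stat):
    """Phase 5: communicate the findings as readable text."""
    lines = [
        f"group {name}: n={s['n']}, mean={s['mean']:.2f}, stdev={s['stdev']:.2f}"
        for name, s in sorted(summary.items())
    ]
    lines.append(f"Welch t statistic (a vs b): {t_stat:.3f}")
    return "\n".join(lines)


if __name__ == "__main__":
    groups = organize(parse(RAW_CSV))
    print(report(summarize(groups), welch_t(groups["a"], groups["b"])))
```

Keeping each phase in its own small function makes it easier to surround the code with prose in a notebook: each cell can explain why a step exists before showing how it is done.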
1 Expectations
In general we would expect a good submission to provide the following at a minimum:
2 Examples
The following are links to final projects from past semesters. They should be seen as a rough guide to what is expected and to the variety of topics that can be pursued, not as examples of the highest-scoring submissions.
Analysis of Freddie Mac’s Single Family Loan-Level data
Alzheimer’s
3.1 Format of your deliverable
The formatting for the majority of the deliverable is left to your discretion.
However, each submission must begin with the title of the tutorial, which should give a rough idea of the topic, followed by your name (and the names of all group members, if applicable).
4 Assessment
Each submission will be given a rating between 1 and 10 on each of the following dimensions:
1. Motivation
2. Understanding
3. Resources
4. Prose
5. Code
6. Communication of Approach
7. Subjective Evaluation
Motivation: each tutorial should be sufficiently motivated. If there is no motivation for the analysis, why would we ‘do data science’ on this topic?
Understanding: the reader of the tutorial should walk away with some new understanding of the topic at hand. If it’s not possible for a reader to state ‘what they learned’ from reading your tutorial, then why do the analysis?
Resources: tutorials should help the reader learn a skill, but they should also provide a launching pad for the reader to further develop that skill. The tutorial should link to additional resources wherever appropriate, so that a well-motivated reader can read further on techniques that have been used in the tutorial.
Prose: it’s very easy to write the literal English for what the Python code is doing, but that’s not very useful. The prose should enhance the tutorial, adding context and insight.
Code: code should be clear and commented. Function definitions should be described and given context/motivation. If the prose helps the reader understand why, the code should be sufficient for the reader to learn how.
Communication of Approach: every technical choice has alternatives, so why did you choose the approach taken in the tutorial? A reader should walk away with some idea of what the trade-offs may be.
Subjective Evaluation: does the tutorial seem polished and ‘publishable’, or haphazard and quickly thrown together? The tutorial should read as well put together, having undergone a few iterations of editing and refinement. This should be the easiest of the dimensions.
4.1 Grades
Once each tutorial has been rated along each dimension, the score for each dimension will be scaled according to the following rubric:
Category | Points Available
Motivation | 10
Understanding | 10
Resources | 10
Prose | 20
Code | 20
Communication of Approach | 20
Subjective Evaluation | 10
Total Points | 100
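The scaling step can be illustrated with a short Python sketch. The handout does not spell out the exact formula, so this assumes a simple linear scaling in which a rating of 10 earns all of a category's available points; the names `POINTS_AVAILABLE`, `MAX_RATING`, and `scaled_total` are hypothetical and chosen only for this example.

```python
# Points available per category, copied from the rubric above.
POINTS_AVAILABLE = {
    "Motivation": 10,
    "Understanding": 10,
    "Resources": 10,
    "Prose": 20,
    "Code": 20,
    "Communication of Approach": 20,
    "Subjective Evaluation": 10,
}

MAX_RATING = 10  # ratings are given on a scale from 1 to 10


def scaled_total(ratings):
    """Return the total score out of 100 for a dict of 1-10 ratings.

    Assumption: linear scaling, i.e. rating / MAX_RATING * points available.
    """
    missing = set(POINTS_AVAILABLE) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")

    total = 0.0
    for category, points in POINTS_AVAILABLE.items():
        rating = ratings[category]
        if not 1 <= rating <= MAX_RATING:
            raise ValueError(f"rating for {category} must be between 1 and {MAX_RATING}")
        # Multiply before dividing to keep the arithmetic exact for whole ratings.
        total += rating * points / MAX_RATING
    return total


if __name__ == "__main__":
    perfect = {category: 10 for category in POINTS_AVAILABLE}
    print(scaled_total(perfect))  # 100.0
```

Under this assumption, Prose, Code, and Communication of Approach each count twice as much as the other dimensions, so a one-point change in any of those ratings moves the total by two points.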