Trusted by Students Everywhere

Google USA

Why Choose Us?

0% AI Guarantee

Human-written only.

24/7 Support

Anytime, anywhere.

Plagiarism Free

100% Original.

Expert Tutors

Masters & PhDs.

100% Confidential

Your privacy matters.

On-Time Delivery

Never miss a deadline.

Predictive Modelling Extended Project Dear Participants, Please find below the Predictive Modelling Extended project instructions: • You have to submit 2 files : Business Report: In this, you need to submit all the answers to all the questions in a sequential manner

Name: Predictive Modelling Extended Project Dear Participants, Please find below the Predictive Modelling Extended project
Brand: Help In Homework
SKU: Q-83349
Price: 24.99 USD
Availability: InStock
Rating: 5 (10 reviews)

Statistics Dec 12, 2021

Predictive Modelling Extended Project

Dear Participants,

Please find below the Predictive Modelling Extended project instructions:

• You have to submit 2 files :

Business Report: In this, you need to submit all the answers to all the questions in a sequential manner. Your answer should include detailed explanations & inferences to all the questions. Your report should not be filled with codes. You will be evaluated based on the business report.

Note: In the business report, there should be a proper interpretation of all the tasks performed along with actionable insights. Only the presence of interpretation of the models is not sufficient to be eligible for full marks in each of the criteria mentioned in the rubric. Marks will be deducted wherever inferences are not clearly mentioned.

Jupyter Notebook file: This is a must and will be used for reference while evaluating.

Any assignment found copied/ plagiarized with another person will not be graded and marked as zero. Please ensure timely submission as a post-deadline assignment will not be accepted.

Problem 1: Linear Regression

You are a part of an investment firm and your work is to do research about these 759 firms.

You are provided with the dataset containing the sales and other attributes of these 759 firms. Predict the sales of these firms on the bases of the details given in the dataset so as to help your company in investing consciously. Also, provide them with 5 attributes that are most important.

Questions for Problem 1:

1.1) Read the data and do exploratory data analysis. Describe the data briefly. (Check the null values, data types, shape, EDA). Perform Univariate and Bivariate Analysis. (8 marks)

1.2) Impute null values if present? Do you think scaling is necessary in this case? (8 marks)

1.3) Encode the data (having string values) for Modelling. Data Split: Split the data into test and train (70:30). Apply Linear regression. Performance Metrics: Check the performance of

Predictions on Train and Test sets using R-square, RMSE. (8 marks)

1.4) Inference: Based on these predictions, what are the business insights and recommendations. (6 marks)

Data Dictionary for Firm_level_data:

sales: Sales (in millions of dollars).
capital: Net stock of property, plant, and equipment.
patents: Granted patents.
randd: R&D stock (in millions of dollars).
employment: Employment (in 1000s).
sp500: Membership of firms in the S&P 500 index. S&P is a stock market index that measures the stock performance of 500 large companies listed on stock exchanges in the United States
tobinq: Tobin's q (also known as q ratio and Kaldor's v) is the ratio between a physical asset's market value and its replacement value.
value: Stock market value.
institutions: Proportion of stock owned by institutions.

Dataset for Problem 1: Firm_level_data.csv

Problem 2: Logistic Regression and Linear Discriminant Analysis

You are hired by the Government to do an analysis of car crashes. You are provided details of car crashes, among which some people survived and some didn't. You have to help the government in predicting whether a person will survive or not on the basis of the information given in the data set so as to provide insights that will help the government to make stronger laws for car manufacturers to ensure safety measures. Also, find out the important factors on the basis of which you made your predictions.

Questions for Problem 2:

1. Data Ingestion: Read the dataset. Do the descriptive statistics and do null value condition check, write an inference on it. Perform Univariate and Bivariate Analysis. Do exploratory data analysis. (8 marks)
2. Encode the data (having string values) for Modelling. Data Split: Split the data into train and test (70:30). Apply Logistic Regression and LDA (linear discriminant analysis). (8 marks)
3. Performance Metrics: Check the performance of Predictions on Train and Test sets usingAccuracy, Confusion Matrix, Plot ROC curve and get ROC_AUC score for each model.

Compare both the models and write inferences, which model is best/optimized. (8 marks)

1. Inference: Based on these predictions, what are the insights and recommendations. (6 marks)

Data Dictionary for Car_Crash

dvcat: factor with levels (estimated impact speeds) 1-9km/h, 10-24, 25-39, 40-54, 55+
weight: Observation weights, albeit of uncertain accuracy, designed to account for varying sampling probabilities. (The inverse probability weighting estimator can be used to demonstrate causality when the researcher cannot conduct a controlled experiment but has observed data to model)
Survived: factor with levels Survived or not_survived
airbag: a factor with levels none or airbag
seatbelt: a factor with levels none or belted
frontal: a numeric vector; 0 = non-frontal, 1=frontal impact
sex: a factor with levels f: Female or m: Male
ageOFocc: age of occupant in years
yearacc: year of accident
yearVeh: Year of model of vehicle; a numeric vector
abcat: Did one or more (driver or passenger) airbag(s) deploy? This factor has levels deploy, nodeploy and unavail
occRole: a factor with levels driver or pass: passenger
deploy: a numeric vector: 0 if an airbag was unavailable or did not deploy; 1 if one or more bags deployed.
injSeverity: a numeric vector; 0: none, 1: possible injury, 2: no incapacity, 3: incapacity, 4: killed; 5: unknown, 6: prior death
caseid: character, created by pasting together the populations sampling unit, the case number, and the vehicle number. Within each year, use this to uniquely identify the vehicle.

Dataset for Problem 2: Car_Crash.csv

Expert Solution

Buy This Solution

24.99 USD

Instant Access

Already a member? Sign In

Important Note: This solution is from our archive and has been purchased by others. Submitting it as-is may trigger plagiarism detection. Use it for reference only.

For ready-to-submit work, please order a fresh solution below.

Or get a fresh solution

Get Custom Quote