- Program the ALS algorithm for optimal scaling regression and test it on a dataset, without including all possible optimal scaling levels (the numeric scaling level is used for the outcome variable).
- It should work on any dataset.
- Scaling levels are indicated per predictor variable.
- The algorithm updates β_j and v_j for each predictor j separately.
- The function should give the iteration number, the current loss, the decrease in loss, etc. (see the document).
I can provide the lectures and solutions of other homework assignments that can help.
Dataset on medical costs
The dataset that we use for this assignment is the “Medical Cost Personal
Dataset” used in the book Machine Learning with R by Brett Lantz. It contains the following variables:
age: age of primary beneficiary
sex: insurance contractor sex: female or male
bmi: Body mass index
children: Number of children covered by health insurance.
smoker: Smoking: yes or no
region: the beneficiary's residential area in the US: northeast, southeast,
southwest, northwest.
charges: Individual medical costs billed by health insurance
In this assignment we focus on predicting an individual's medical costs
billed by health insurance (charges) from the other variables.
Part 1: Program your own ALS algorithm
In this first part you will program the ALS algorithm for Optimal Scaling
Regression and test it on the Medical Costs dataset. Your function does
NOT include all possible optimal scaling levels. Instead, a numeric scaling
level is always used for the outcome variable, and for each predictor
variable either a numeric or a nominal scaling level is used, depending on
the user specification.
Note that we have also programmed the ALS algorithm for numeric scaling
levels in class. In the optimal scaling approach, however, it is done a little differently: numeric variables are treated as categorical variables with many categories. Each uniquely observed value is interpreted
as a separate category, and the quantifications of the observations are then
represented by the indicator matrix G_j and the quantification vector v_j, so that the quantified variable is G_j v_j.
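A small R sketch of this representation, assuming hypothetical versions of the `indicator` and `autoscale` helpers (the ones from the exercises may differ) and standardization with divisor N, so that a quantified variable has mean 0 and sum of squares N. For a numeric variable, taking v_j as the sorted unique standardized values makes G_j v_j reproduce the standardized variable exactly:

```r
# Hypothetical helpers; the course's own indicator/autoscale may differ.

# Standardize to mean 0 and sum of squares N (divisor N, not N - 1)
autoscale <- function(x) {
  xc <- x - mean(x)
  xc / sqrt(mean(xc^2))
}

# Indicator matrix: one column per sorted unique value of x,
# G[i, k] = 1 if observation i has the k-th value
indicator <- function(x) {
  cats <- sort(unique(x))
  G <- outer(x, cats, "==") * 1
  colnames(G) <- as.character(cats)
  G
}

x <- c(19, 33, 19, 52, 33)        # toy numeric predictor with 3 unique values
G <- indicator(x)                  # 5 x 3 indicator matrix
# autoscale is increasing, so sorted standardized values line up with columns of G
v <- sort(unique(autoscale(x)))    # numeric quantifications
q <- as.vector(G %*% v)            # quantified variable G v
all.equal(q, autoscale(x))         # TRUE: G v reproduces the standardized variable
```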
Provided code
For this assignment, we provide the R markdown file HA1.Rmd which you
will extend with your own code (fill in the gaps) and the application to the
data set. Some instructions:
• The function should work on any dataset, i.e. not only on the Medical
Costs dataset. You will only use that for testing.
• The scaling levels are indicated per predictor variable via the argument
scalinglevels, which is a vector with a length equal to the number
of predictor variables.
• Start by writing the subfunctions that are used in the main function,
then continue with the main function. You may use (and adjust) the
indicator and autoscale functions that you (or we) made during the
exercises.
• The starting values for β_j, v_j and the loss are given. The v_j's for
the nominal scaling levels are initialised as vectors of zeros and are
updated in the iterations. For the numeric scaling level, the v_j's are
defined as the unique values of the standardized variable x_j, so that they
satisfy the numeric scaling restrictions.
• In the algorithm, update β_j and v_j (if applicable) for each predictor
variable j separately, as was explained in the lectures.
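One common way to write these per-predictor updates, as a sketch that assumes the loss is the unnormalized residual sum of squares of the standardized outcome z_y (with z_y'z_y = N), D_j = G_j'G_j, and m predictors (the lectures may use a different normalization):

```latex
\begin{aligned}
\sigma(\beta, v) &= \Big\| z_y - \sum_{k=1}^{m} \beta_k G_k v_k \Big\|^2, \\
u_j &= z_y - \sum_{k \neq j} \beta_k G_k v_k
  \quad \text{(partial residual)}, \\
\tilde v_j &= D_j^{-1} G_j^\top u_j
  \quad \text{(nominal only: category means of } u_j\text{)}, \\
v_j &= \sqrt{N}\, \frac{\tilde v_j - c_j \mathbf{1}}{\big\| G_j (\tilde v_j - c_j \mathbf{1}) \big\|},
  \qquad c_j = \tfrac{1}{N}\, \mathbf{1}^\top G_j \tilde v_j, \\
\beta_j &= \tfrac{1}{N}\, v_j^\top G_j^\top u_j .
\end{aligned}
```

For a numeric predictor only the last line applies, because v_j stays fixed at the standardized unique values; the normalization keeps G_j v_j centered with sum of squares N, which is why β_j reduces to a simple inner product.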
Function specifications
Your function should give the following output while running:
• the iteration number
• the current loss
• the decrease in loss
Furthermore, your function should return as final output a list object with
• the regression coefficients
• the category quantifications of each predictor and their original values/labels
• the regression sum-of-squares
• total sum-of-squares
• apparent prediction error
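A compact base-R sketch of how the pieces could fit together. The function name `als_osr`, the `maxiter` safeguard, the starting value β = 0 and the definition of the loss as the residual sum of squares of the standardized outcome are hypothetical choices, not the provided HA1.Rmd skeleton. It prints the iteration number, current loss and decrease in loss every iteration and returns the list of outputs listed in the specifications:

```r
# Minimal ALS sketch for optimal scaling regression (hypothetical names:
# als_osr, maxiter). Assumptions: y numeric; X a data frame of predictors;
# scalinglevels holds "numeric" or "nominal" per column of X; beta starts
# at 0; loss = residual sum of squares of the standardized outcome.
als_osr <- function(y, X, scalinglevels, crititer = 1e-9, maxiter = 1000) {
  # Standardize to mean 0 and sum of squares N (divisor N)
  autoscale <- function(x) { xc <- x - mean(x); xc / sqrt(mean(xc^2)) }
  # Indicator matrix with one column per sorted unique value
  indicator <- function(x) outer(x, sort(unique(x)), "==") * 1

  stopifnot(length(scalinglevels) == ncol(X))
  N  <- length(y)
  p  <- ncol(X)
  zy <- autoscale(y)
  G  <- lapply(X, indicator)
  # Starting values: standardized unique values (numeric) or zeros (nominal)
  v <- lapply(seq_len(p), function(j) {
    if (scalinglevels[j] == "numeric") {
      sort(unique(autoscale(as.numeric(X[[j]]))))
    } else {
      rep(0, ncol(G[[j]]))
    }
  })
  beta <- rep(0, p)
  Q    <- sapply(seq_len(p), function(j) as.vector(G[[j]] %*% v[[j]]))
  loss <- sum((zy - Q %*% beta)^2)

  for (iter in seq_len(maxiter)) {
    for (j in seq_len(p)) {
      # Partial residual: outcome minus the fit of all other predictors
      u <- as.vector(zy - Q[, -j, drop = FALSE] %*% beta[-j])
      if (scalinglevels[j] == "nominal") {
        vt <- as.vector(crossprod(G[[j]], u)) / colSums(G[[j]])  # category means
        vt <- vt - mean(G[[j]] %*% vt)                         # center G_j v_j
        v[[j]] <- vt * sqrt(N / sum((G[[j]] %*% vt)^2))         # sum of squares N
        Q[, j] <- G[[j]] %*% v[[j]]
      }
      beta[j] <- sum(Q[, j] * u) / N  # least-squares slope, since q_j'q_j = N
    }
    newloss <- sum((zy - Q %*% beta)^2)
    cat(sprintf("Iteration %d: loss = %.10f, decrease = %.3e\n",
                iter, newloss, loss - newloss))
    converged <- (loss - newloss) < crititer
    loss <- newloss
    if (converged) break
  }

  fitted <- as.vector(Q %*% beta)
  quant <- lapply(seq_len(p), function(j)
    data.frame(value = sort(unique(X[[j]])), quantification = v[[j]]))
  names(quant) <- names(X)
  names(beta)  <- names(X)
  list(beta            = beta,
       quantifications = quant,
       ssreg           = sum(fitted^2),
       sstot           = sum(zy^2),          # equals N for a standardized outcome
       ape             = loss / sum(zy^2))   # apparent prediction error
}
```

On the assignment data this might be called as `als_osr(dat$charges, dat[, c("age", "sex", "bmi", "region")], c("numeric", "numeric", "numeric", "nominal"))`, where `dat` is a hypothetical name for the data frame loaded in HA1.Rmd; a factor such as sex is converted to its integer codes for the numeric level. Because every update minimizes the loss exactly over β_j (and v_j for nominal predictors), the printed decrease should not become negative beyond rounding error.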
Test your function
Check your algorithm by applying your function to a selection of variables
in the “MedicalCosts.sav” dataset. In HA1.Rmd, code is given to load the
data. Use the variable charges as outcome, and the following variables as
predictors (the scaling level is indicated):
• age: numeric scaling level
• sex: numeric scaling level (dichotomous)
• bmi: numeric scaling level
• region: nominal scaling level
Use crititer = 0.000000001 (the default). Do not worry if your
function already converges after a few iterations. Show the requested output
of your function.
Compare in your report the results obtained with your function to the results of categorical regression using SPSS. You can, for example, compare the values of the regression coefficients, the category quantifications, and the apparent prediction error.
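While debugging, two base-R reference values can serve as a rough extra cross-check next to SPSS. This is a sketch on simulated data (not the MedicalCosts data), and it rests on two properties of optimal scaling regression with a numeric outcome: with only numeric scaling levels it reduces to linear regression on standardized variables, and nominal quantifications can reproduce any centered pattern over the categories, so the fit equals that of lm() with those predictors as factors. The apparent prediction error should then be close to 1 - R^2:

```r
# Hypothetical cross-check on simulated data (variable names mimic the
# MedicalCosts data, but the values are random, not real).
set.seed(1)
n       <- 100
age     <- round(runif(n, 18, 64))
bmi     <- runif(n, 18, 45)
region  <- sample(c("northeast", "northwest", "southeast", "southwest"),
                  n, replace = TRUE)
charges <- 250 * age + 300 * bmi + ifelse(region == "southeast", 2000, 0) +
  rnorm(n, sd = 3000)

# Standardize with divisor N (N vs N - 1 does not change standardized slopes)
sc <- function(x) (x - mean(x)) / sqrt(mean((x - mean(x))^2))

# All predictors numeric: ALS coefficients should approach these slopes
beta_ref <- coef(lm(sc(charges) ~ sc(age) + sc(bmi) - 1))

# Region nominal: the APE should approach 1 - R^2 of OLS with region as a factor
ape_ref <- 1 - summary(lm(charges ~ age + bmi + factor(region)))$r.squared

beta_ref
ape_ref
```

These lm() calls are only meant for checking during development; the function itself should still be written with base R code as the instructions require.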
Instructions for R
Make sure that:
• you include enough comments in your code, so we can understand it;
• the data set that you use is called exactly “MedicalCosts.sav”. Be
careful, do not use “MedicalCosts(1).sav”. Also, do not add a path to
the data set (this is not needed when the data set is in the same folder
as your R markdown file).
• You may only use the R packages foreign and/or haven to open the dataset into your R environment. You may not use other R packages in your code. This is to ensure that the instructors can knit the .Rmd file easily and without the need to install packages on their computers.
• Preferably knit the .Rmd file to a pdf document (this is the default). If
that does not work for you, you may knit the file to an html document.