
#### Program the ALS algorithm for optimal scaling regression and test it on a dataset, without including all possible optimal scaling levels (a numeric scaling level is used for the outcome variable). The function should work on any dataset; scaling levels are indicated per predictor variable; the algorithm updates beta_j and vector v_j for each predictor j separately

###### Computer Science

- The function should report the iteration number, the current loss, the decrease in loss, etc.; see document 'n'. I can provide lectures and solutions of other homework assignments that can help.

Dataset on medical costs

The dataset that we use for this assignment is the “Medical Cost Personal Dataset” used in the book Machine Learning with R by Brett Lantz. It contains the following variables:

- age: age of the primary beneficiary
- sex: sex of the insurance contractor (female or male)
- bmi: body mass index
- children: number of children covered by health insurance
- smoker: smoking status (yes or no)
- region: the beneficiary's residential area in the US (northeast, southeast, southwest, northwest)
- charges: individual medical costs billed by health insurance

In this assignment we focus on predicting an individual's medical costs billed by health insurance (charges) from the other variables.

Part 1: Program your own ALS algorithm

In this first part you will program the ALS algorithm for Optimal Scaling Regression and test it on the Medical Costs dataset. Your function does NOT include all possible optimal scaling levels. Instead, a numeric scaling level is always used for the outcome variable, and for each predictor variable either a numeric or a nominal scaling level is used, depending on the user specification.

Note that we have also programmed the ALS algorithm for numeric scaling levels in class. However, in the optimal scaling approach it is done a little differently: numeric variables are interpreted as categorical variables with many categories. Each uniquely observed value is treated as a separate category, and the quantifications of the observations are then represented by the indicator matrix G_j and the quantification vector v_j.
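The following is a minimal sketch of that representation for a single numeric predictor. The helper names `indicator` and `autoscale` echo the ones mentioned in the assignment, but their bodies here are assumptions (in particular, standardizing with divisor n rather than n - 1), and the toy values are not taken from the dataset.

```r
# Hypothetical versions of the helpers; the course's own functions may differ.
autoscale <- function(x) {
  # Centre and scale to mean 0 and variance 1 (assumption: divisor n)
  x <- x - mean(x)
  x / sqrt(mean(x^2))
}

indicator <- function(x) {
  # One column per uniquely observed value, in sorted order
  cats <- sort(unique(x))
  G <- outer(x, cats, "==") * 1
  colnames(G) <- cats
  G
}

x <- c(23, 31, 23, 45, 31, 52)  # toy values for illustration only
z <- autoscale(x)               # standardized variable
G <- indicator(z)               # 6 x 4 indicator matrix, one 1 per row
v <- sort(unique(z))            # numeric-level quantifications
q <- drop(G %*% v)              # quantified variable

all.equal(q, z)                 # G v reproduces the standardized variable
```

Because each row of G contains a single 1, the product G v simply looks up the quantification of the category that each observation falls in.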

Provided code

For this assignment, we provide the R markdown file HA1.Rmd, which you will extend with your own code (fill in the gaps) and the application to the data set. Some instructions:

- The function should work on any dataset, i.e. not only on the Medical Costs dataset. You will only use that dataset for testing.

- The scaling levels are indicated per predictor variable via the argument scalinglevels, which is a vector with a length equal to the number of predictor variables.

- Start by writing the subfunctions that are used in the main function, then continue with the main function. You may use (and adjust) the indicator and autoscale functions that you (or we) made during the exercises.

- The starting values for β, v_j and the loss are given. The v_j's for the nominal scaling levels are initialised as vectors of zeros and will be updated in the iterations. For the numeric scaling level, the v_j's are defined as the unique values of the standardized variable x_j, so that they satisfy the numeric scaling restrictions.

- In the algorithm, update β_j and v_j (if applicable) for each predictor variable j separately, as was explained in the lectures.
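As a rough illustration of the last instruction, the sketch below shows one possible update of β_j and v_j for a single predictor. It is one common formulation (category means of the partial residual, then centring and normalizing), not necessarily the one from the lectures. The function name `update_predictor`, its arguments, and the normalization with divisor n are all assumptions.

```r
# Hypothetical single-predictor update; names and normalization are assumptions.
# y     : standardized outcome (numeric scaling level)
# G     : indicator matrix of predictor j (n x k)
# v     : current quantifications of predictor j (length k)
# other : sum of beta_l * G_l v_l over all predictors l != j
# level : "numeric" or "nominal"
update_predictor <- function(y, G, v, other, level) {
  n <- length(y)
  r <- y - other                      # partial residual with predictor j removed
  if (level == "nominal") {
    d <- colSums(G)                   # category frequencies
    v <- drop(crossprod(G, r)) / d    # unrestricted update: category means of r
    v <- v - sum(d * v) / n           # centre so that G v has mean 0
    v <- v / sqrt(sum(d * v^2) / n)   # scale so that G v has variance 1
  }
  q <- drop(G %*% v)                  # quantified predictor (v unchanged if numeric)
  beta <- sum(q * r) / sum(q^2)       # least-squares coefficient given q
  list(beta = beta, v = v, q = q)
}

# Toy usage with simulated data (not the Medical Costs dataset)
set.seed(1)
n <- 200
g <- sample(c("a", "b", "c"), n, replace = TRUE)
G <- outer(g, sort(unique(g)), "==") * 1
y <- c(a = -1, b = 0, c = 2)[g] + rnorm(n)
y <- (y - mean(y)) / sqrt(mean((y - mean(y))^2))
res <- update_predictor(y, G, v = rep(0, 3), other = rep(0, n), level = "nominal")
res$beta
res$v
```

For a numeric scaling level the quantifications stay fixed, so only the coefficient is updated. The sketch does not guard against a category whose partial residuals all have the same mean, in which case the scaling step would divide by zero.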

Function specifications

Your function should give the following output while running:
- the iteration number

- the current loss

- the decrease in loss
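The progress report and stopping rule can be sketched as below. To keep the example short it handles numeric predictors only, so each step updates β_j alone. The function name `als_numeric`, the `maxiter` argument, and the zero starting values are assumptions; the provided HA1.Rmd supplies its own starting values. Only `crititer` comes from the assignment.

```r
# Hypothetical loop skeleton: numeric predictors only, columns of Q standardized.
als_numeric <- function(y, Q, crititer = 1e-9, maxiter = 100) {
  p <- ncol(Q)
  beta <- rep(0, p)                    # assumed starting values
  loss <- sum(y^2)                     # loss at the starting values
  for (iter in seq_len(maxiter)) {
    for (j in seq_len(p)) {
      # Partial residual with predictor j removed
      r <- y - Q[, -j, drop = FALSE] %*% beta[-j]
      beta[j] <- sum(Q[, j] * r) / sum(Q[, j]^2)
    }
    newloss  <- sum((y - Q %*% beta)^2)
    decrease <- loss - newloss
    cat(sprintf("iteration %3d  loss %.10f  decrease %.10f\n",
                iter, newloss, decrease))
    loss <- newloss
    if (decrease < crititer) break     # stop once the loss barely changes
  }
  list(beta = beta, loss = loss, iterations = iter)
}

# Toy usage with simulated data
set.seed(2)
Q <- scale(matrix(rnorm(200), 100, 2))
y <- drop(scale(Q %*% c(0.5, -0.3) + rnorm(100)))
fit <- als_numeric(y, Q)
```

Each inner step is a least-squares update, so the loss cannot increase from one iteration to the next and the printed decrease should never be negative beyond rounding error.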

Furthermore, your function should return as final output a list object with

- the regression coefficients

- the category quantifications of each predictor and their original values/labels

- the regression sum-of-squares

- the total sum-of-squares

- the apparent prediction error
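The fit statistics in this list could be computed along the following lines. This is a sketch under the assumption that the apparent prediction error is the mean squared residual on the training data; the course may define it differently. The function name `fit_stats` and the toy vectors are hypothetical.

```r
# Hypothetical helper; assumes APE = residual sum-of-squares / n.
fit_stats <- function(y, yhat) {
  ss_total <- sum((y - mean(y))^2)     # total sum-of-squares
  ss_regr  <- sum((yhat - mean(y))^2)  # regression sum-of-squares
  ss_resid <- sum((y - yhat)^2)        # residual sum-of-squares
  list(ss_regression = ss_regr,
       ss_total      = ss_total,
       ape           = ss_resid / length(y))
}

# Toy usage with made-up values
y    <- c(-1.2, -0.4, 0.1, 0.5, 1.0)
yhat <- c(-1.0, -0.5, 0.0, 0.6, 0.9)
s <- fit_stats(y, yhat)
```

When the outcome is standardized to variance 1 with divisor n, the total sum-of-squares equals n, so this version of the apparent prediction error is the residual sum-of-squares divided by the total sum-of-squares.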

Check your algorithm by applying your function to a selection of variables in the “MedicalCosts.sav” dataset. In HA1.Rmd, code is given to load the data. Use the variable charges as outcome, and the following variables as predictors (scaling level is indicated):

- age: numeric scaling level

- sex: numeric scaling level (dichotomous)

- bmi: numeric scaling level

- region: nominal scaling level

Use as value for crititer: 0.000000001 (the default). Do not worry if your function already converges after a few iterations. Show the requested output.

Compare in your report the results obtained with your function to the results of categorical regression using SPSS. You can, for example, compare the values of the regression coefficients, the category quantifications, and the apparent prediction error.

Instructions for R

Make sure that:

- you include enough comments in your code, so we can understand it;

- the data set that you use is called exactly “MedicalCosts.sav”. Be careful, do not use “MedicalCosts(1).sav”. Also, do not add a path to the data set (this is not needed when the data set is in the same folder).

- You may only use the R packages foreign and/or haven to open the dataset in your R environment. You may not use other R packages in your code. This is to ensure that the instructors can knit the .Rmd file easily and without the need to install packages on their computers.

- Preferably knit the .Rmd file to a pdf document (this is the default). If that does not work for you, you may knit the file to an html document.