CLC - Calibration and External Validation Literature Review

Step 1:


Review the article, "Validate to Bring Out the Real Value of Visual Analytics," (https://www.infosysblogs.com/testing-services/2016/04/validate_to_bring_out_the_real.html), for an in-depth understanding of expert validation, predictive validation, external validation, and cross validation.

Step 2:

Conduct a literature review of similar research to compare the model that you completed in Topic 5.

Create a draft outline that addresses the following items in your literature review.

  • Complete an external and cross validation.
  • Explain whether your validation method is still sufficient, and discuss whether the model results are consistent with theories in your field.
  • What are the next steps for your model? Be specific.
  • Is there a need for a model revision? If so, describe what shortcomings you encountered. If not, describe why.
  • What future recommendations would you make to your model if you had another opportunity? What would you do differently?
  • You are required to include at least three scholarly peer-reviewed sources.
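For the external and cross validation items above, a minimal k-fold sketch may help clarify the mechanics. The fold splitter and the mean-only baseline "model" here are illustrative stand-ins, not the Topic 5 model itself:

```python
from statistics import mean

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cross_validate_mae(y, k=5):
    """Average out-of-fold mean absolute error of a mean-only baseline."""
    errors = []
    for train, test in k_fold_indices(len(y), k):
        yhat = mean(y[j] for j in train)          # "fit" the baseline on the training fold
        errors.append(mean(abs(y[j] - yhat) for j in test))
    return mean(errors)
```

Replacing the mean-only baseline with the fitted Topic 5 model would give a genuine cross-validated error estimate; external validation, by contrast, applies the already-fitted model unchanged to a dataset it has never seen.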

Step 3:

Synthesize the information from the draft outline to complete the relevant components of the External Model Verification and Calibration section of the "Capstone Project Thesis Template." This section should be 750-1,000 words.

Submission Requirements:

Draft Outline

Updated "Capstone Project Thesis Template" - External Model Verification and Calibration section

Step 4: Reflection Questions (separate from above)

Due: 5/15/2021

Question 1 - Before adopting and implementing your prediction model, the generalizability of the model needs to be assessed by an external validation. Discuss the strength of your model and issues you might encounter in the external validation process.

Due 5/17/2021

Question 2 - What changes to your model would you entertain if all of the external literature reached recommendations diametrically opposed to those of your modeling effort?

 

Running head: TITLE OF PAPER

Project Title

MIS-690 Capstone Project Thesis

Submitted to Grand Canyon University Graduate Faculty of the Colangelo College of Business in Partial Fulfillment of the Requirements for the Degree of

by

Approved by:

Professor                                        Date

Abstract

Beginning with the next line, write a concise summary of the key points of your research. (Do not indent.) Your abstract should contain at least your research topic, research questions, participants, methods, results, data analysis, and conclusions. You may also include possible implications of your research and future work you see connected with your findings. Your abstract should be a single paragraph, double-spaced. Your abstract should typically be no more than 250 words.

Keywords: List keywords from your paper in your abstract. To do this, indent as you would if you were starting a new paragraph, type Keywords: and then list your keywords. Listing your keywords will help researchers find your work in databases.

Table of Contents

List of Tables
List of Figures
Business Problem Identification
  Background
  Business Problem Statement
  Analytics Assumptions
  Analytics Problem Statement
Data Understanding, Acquisition, and Preprocessing
  Collection of Initial Data
  Description of Data
  Exploration of Data
  Verify Data Quality
Data Diagnostics and Descriptive Summary
  Exploratory Data Analysis
  Trends Analysis
  Simpson Paradox
  Descriptive Analytics
  Conclusion
Methodology Approach and Model Building
  Modeling Methods
  Test Design
  Model Building
  Conclusion
Model Evaluation
  Evaluation Process Justification
  Validation Results
  Conclusion
External Model Verification and Calibration
  Literature Review
  Calibration
  Future Recommendations
Model Deployment and Model Life Cycle
  Deployment Cost
  Schedule, Training, and Risk
  Benefits
  Recommendations
  Conclusions
References
Appendix A: Data Set

List of Tables

Table 1. Title of Table

List of Figures

Figure 1. Title of figure
Business Problem Identification

In the United States, a broad consensus holds that the health care financing system has failed in service delivery. Health care costs have been rising, and the system has been unable to accommodate the growing uninsured population. Most Americans agree that the government should provide medical insurance cover to the larger part of the population. The political focus is on whether the government should supplement the private insurance system with a larger public program covering those who cannot afford private coverage; the political class also debates whether to replace private insurance entirely with a universal government health insurance program covering the whole population.

A large part of the United States population has health insurance cover. Medicare covers the entire eligible population, and the program pays for half of their medical care costs; the other half is covered by private insurance and by welfare-based programs that cater to those who qualify for federal assistance. Quality of life is not measured chiefly by material possessions; health insurance is an important part of every individual's life because it protects against the high costs that accompany most illnesses.

Like most insurance covers, health care cover also faces challenges that cost insurance companies and individuals millions of dollars in losses. This can happen through health care fraud: an individual or company defrauding an insurer or a public health care program. Other examples of fraud include drug and medical fraud. Health care fraud has undermined the integrity of the health care system, since patients suffering from illness cannot be sure whether they will be taken advantage of.

Patients, contractors, and medical professionals all participate in health care fraud. Patients participate by acquiring and providing false medical certificates, evading medical charges, and committing prescription fraud. Contractors participate through corruption and insurance fraud, and by billing for community- and home-based health care services that should be free. Medical professionals defraud people through overpriced consultations, facility charges, medical tests, and medicines.

Background

Medicare is a public health insurance program founded in 1966 under the umbrella of the Social Security Administration (SSA). The Centers for Medicare and Medicaid Services (CMS) currently administers Medicare. It mainly offers medical insurance for senior citizens together with younger people with disabilities, and it also covers patients with amyotrophic lateral sclerosis and end-stage renal disease. Initially, Medicare was a program offering health care to the families of military personnel as part of the Dependents' Medical Care Act of 1956. In July 1965, Medicare was enacted under the Social Security Act to provide medical cover to senior citizens without regard to income or medical history. Before Medicare began, only 60% of elderly people had medical insurance; coverage was often unavailable or unaffordable to the rest, because older adults paid three times more than their younger counterparts for health insurance. Since it began operating, Medicare has undergone several changes, with its provisions expanding to include physical, speech, and chiropractic therapy for its members, as well as the option of hospice care to aid the elderly temporarily.

Fraudsters existed before the formation of insurance companies, with false claims made by government officials against commoners. This is seen in the qui tam law in England and, later, the Lincoln Law (Whistleblowing Protection, 2021). Health fraud influences health care systems by affecting access to medical care, medicines, and general services.
Under the False Claims Act, Medicare has seen its fair share of fraudulent activity from hospitals and individuals. Hospitals have defrauded Medicare of billions of dollars through kickbacks and overpriced medicines. Health fraud affects everyone. Stakeholders who can be interviewed include health workers, patients, and insurance companies, as well as the Healthcare Fraud Prevention Partnership (HFPP), a public-private partnership between the national government, anti-fraud organizations, state and local government agencies, private health insurers, law enforcement, and employer organizations. This group will help in accessing more data on the effects of health fraud on organizations. Health care insurers and companies need data analytics to identify losses and fraud across all networks. A predictive analytics strategy helps insurers determine which providers have a history of fraudulent activity and which exhibit behaviors that likely indicate increased fraud risk.

Business Problem Statement

Fraud in general is already a menace in society, and health care, for all its importance, is plagued by it: individuals and insurance companies are defrauded, leading to mistrust and to losses of billions of dollars annually. The question, therefore, is: what can be done to solve health care fraud and mitigate the losses in the health care system?

Analytics Assumptions

Data analytics can prevent fraud; with the right tools, the data provided should be a basis for predicting the occurrence of fraud. Losses accompany health care fraud: fraud costs the global healthcare sector $487 billion per year, and an analytics-based solution could free up $195 billion yearly.

1. Sampling - effective when it involves data from a large population, but for fraud detection it may be less effective than its counterparts because it examines only small parts of the data.
2. Ad hoc - a hypothesis is used to test transactions, determining whether there are opportunities for fraudulent activity; the results determine whether the events need to be investigated further (TechAhead, 2021).
3. Repetitive analysis - writing command scripts that go through large volumes of data (TechAhead, 2021) to identify fraudulent activities that might occur over time; because the analysis repeats, it provides periodic information on fraud, making the process effective and constant.
4. Analytics techniques - focused on identifying irregularities to detect fraud, for instance discovering values that deviate from the average by more than the standard deviation. Unusually high or low values flag potential fraud. This method can also group data by geographic area or type of event according to specific criteria.

Health care fraud prevention is essential for access to better health care. The scope of the project is to help set preventive measures and models against health care fraud, and to address the security issues that come with data analytics. Medicaid, together with the HFPP, provides the data necessary for the project; case studies will offer additional information on the financial effects of health care fraud over the years.

Analytics Problem Statement

To arrive at solutions best suited to solving health care fraud and avoiding the losses that occur annually, an analytics-based solution should be put in place. Security also becomes an issue when using analytics: what measures are suitable to prevent health care fraud and to address the security issues associated with the new technologies?

Data Understanding, Acquisition, and Preprocessing

The business problem identified in Topic 1 of the thesis template involves fraud in the United States health sector and the security issues involved in the data analysis.
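As a rough sketch of the standard-deviation screen described in the analytics techniques above; the claim amounts and the two-sigma threshold below are invented for illustration, not drawn from the project data:

```python
from statistics import mean, stdev

def flag_unusual_amounts(amounts, k=2.0):
    """Return amounts lying more than k standard deviations from the mean,
    a simple screen for potentially fraudulent claims."""
    mu, sigma = mean(amounts), stdev(amounts)
    return [a for a in amounts if abs(a - mu) > k * sigma]

# Hypothetical claim amounts in dollars; one is suspiciously large.
claims = [120, 130, 110, 125, 115, 128, 122, 5000]
suspects = flag_unusual_amounts(claims)   # suspects == [5000]
```

Running the same screen within each geographic or provider group, as the fourth technique suggests, only requires computing the mean and standard deviation per group.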
Data analysis is used to reduce the losses incurred in the medical sector, so proper security for it is essential. The data source is US data; the samples used are the health costs incurred by the nation and by individuals, together with indicator code SI.POV.LMIC, which shows poverty levels resulting from medical costs. Other samples are the number of patients using medical insurance, the amount in USD lost to health care fraud, and the rates of cybercrime involved in data analysis. The sample data is thus useful for characterizing the business problem.

Collection of Initial Data

The dataset from which the samples are drawn comes from the World Bank's development indicators collection and was last updated in March 2021. Collection involved gathering initial data from correspondents, conducting interviews, and administering questionnaires. The dataset records the indicator name, the indicator code, and the time period over which the data was collected (since 1960). Data is missing for some years, so samples must be drawn from consecutive years.

Description of Data

The continuous variables are the annual cost of health in billions for the past 10 years and the cost of data-analysis fraud for the past 4 years. These continuous variables express the extent to which fraud has affected the medical sector.

Exploration of Data

The dataset contains many other variables, including the number of patients using medical insurance, the amount of money lost to fraudulent health activities (a discrete variable in this case), the total number of patients affected by health fraud, and the level of security the government applies to data analysis. To reduce health care fraud, the data analysis should provide descriptive summaries of the service providers, who include doctors and health ministry personnel. Insurance subscribers, such as patients and their employers, also need to be monitored for descriptive data, and insurance carriers (the insurance companies and their personnel) are likewise relevant to the analysis.

Descriptive analysis is then conducted on the data. A line-graph analysis shows that the costs incurred in the health sector by patients and the government are rising exponentially, which underscores the need to examine the service providers who might be driving the rise in medical costs. For the period 2008 to 2018, health costs per year rose from an initial $7,897 to $11,172 ten years later.

Figure 1. Graph on Healthcare Cost Per Year.

The number of patients taking out medical insurance is also rising, and those without medical insurance continue to decline, owing to the increased health care costs. This suggests a possible loophole benefiting the insurance carriers. Over the same period, the amount of money lost to fraud arising from the analysis of data has increased. This indicates growing security problems and the need for proper measures to curb the menace. Major solutions involve applying very strong security to data-analysis devices, such as one-time passwords and biometrics, and protecting data analysts from fraud. The government should closely monitor data-analysis work and set up a special data-analysis committee of highly trained personnel who are cautious about improper data exposure. Enacting a law that protects the data against cybercrime and fraud is a proper solution, and any data-analysis fraud should be dealt with accordingly.

Verify Data Quality

Proper data tables were produced in the data analysis.
The data is of very high quality, so the solutions to the problem are close to optimal. The table showing fraud costs from 2013 to 2016 is shown below.

Table 1. Cost of Frauds

Data Diagnostics and Descriptive Summary

The Capstone Project Thesis Template identified the business analytics problem affecting the United States health sector: increased fraudulent activity, a result of the cost of medication becoming expensive. Patients therefore seek medical insurance, and the number of patients covered by insurance rises steadily. This leads to many transactions within the health sector involving the service providers, many of them doctors and patients, and the insurance carriers. This calls for data analysis to minimize the analytics problem; insecurity of the data analysis adds to it.

The dataset used for the analytics is attached to this document as an Excel file. It contains many variables, ranging from discrete to continuous. For this analytics problem, a number of related variables were sampled from the dataset: health expenditure by the US government, the individual annual cost of health, the cost of insurance activities (a subset of medical insurance), the number of data-analysis articles generated per year, and the value lost to fraud and vandalism resulting from the data analysis. All of these variables are closely related to our analytics problem and were collected from the same sampling population.

Exploratory Data Analysis

To analyze and summarize the samples, several data-analysis techniques are used: exploratory data analysis, which involves generating statistical summaries using quantitative displays such as line and bar graphs; conducting relevant hypothesis tests involving the variables of study; and providing relevant conclusions. The best data visualizations are described below.

From the health care cost variable, a clustered bar graph was generated showing the individual cost of health per year from 2008 to 2018. The lowest cost was recorded in 2008 and the highest in 2018; the cost rose progressively, with the highest values in the later years. The increased cost per head would in turn drive up total government health care expenditure.

The cost of health per individual is directly proportional to expenditure per capita. The latter variable was tracked in the dataset from 2000 to 2020, with annual expenditure rising in recent years. The rising cost of medication drives the number of patients taking out medical insurance. The insurance services variable is plotted on a line graph, with the percentage of commercial services plotted against the period. Data was monitored from 2000 to 2019. The line graph is roughly symmetrical, with the highest value recorded in 2009. This reflects the number of transactions between the insurance providers and the health departments; high volumes of money and bill transfers may create loopholes for data fraud.

The fourth variable of study is data-analysis services, covering all activities in the line of data analysis needed to determine optimal results. These are plotted in a clustered bar graph from 2000 to 2018. The annual reports show bars for the number of science and technical journal articles recorded each year. The longest bar is for 2014 and the shortest for 2000; the longest bars are concentrated in the later years, indicating an increasing number of data-analysis articles over time.

The fifth variable is the value lost to fraud and vandalism. As data-analysis reports and articles proliferate, the security of the data becomes wanting.
The growing number of articles makes it easier for fraudsters to access the data analysis for a business analytics problem. The data on the amount lost to fraudulent activities is plotted in a pie chart, with the segments showing percentages of total annual revenue. The data was collected over 4 consecutive years (2013-2016). The area of each segment is distinct for every year: the largest marks the most fraudulent year, 2016, and the smallest the least fraudulent year, 2013. The intensity of fraud and vandalism is directly proportional to the number of data-analysis articles reported in a year; a large data-analysis scope opens loopholes for exploitation by health practitioners, medical insurance personnel, and others wanting to exploit the data-analysis reports.

A hypothesis test was conducted on two variables, health expenditure and the number of data-analysis articles. The null hypothesis (H0) is that health expenditure and the number of data-analysis articles do not increase; the alternative hypothesis (H1) is that they increase. The computed t statistic of 2.306004 exceeds the t critical value of 1.859548, so we reject the null hypothesis and conclude that health expenditure and the number of data-analysis articles increase.

Trends Analysis

Trends analysis gives investors and the government a glimpse of the trajectory of the data in order to predict future costs, transactions, and revenue allocation schemes. Data trends typically take the form of uptrends, downtrends, and horizontal trends, and researchers must identify them correctly for optimal results. From the dataset, the bar graph for health care costs indicates a progressive increase in medical costs.
This weighs on the US economy and thus represents a downtrend. The second variable, health care expenditure, likewise shows a downtrend for the economy. The insurance and financial services variable is plotted on a line graph; the data is roughly symmetrical, with a series of ups and downs that make it unpredictable. This indicates a horizontal trend, which keeps researchers from effectively predicting future values. The fraud data sample, plotted on a line graph, shows a progressive increase in fraudulent activity, again a downtrend for the economy. Researchers must therefore be wary of the data trends.

Simpson Paradox

Simpson's paradox can arise when a dataset is disaggregated into a number of subsamples, producing an imbalance between one or more variables of study. When Simpson's paradox exists, non-optimal solutions are generated that would not address the analytics problem; the dataset therefore needs to be free of Simpson's paradox for effectiveness and efficiency. To test for Simpson's paradox between two variables, we build regression models and compare the signs of the coefficients. For our test, we examine the insurance and financial services variable and the health cost per head. The coefficients are both positive, at 8.290816 and 379.1281, so the variables are judged free of Simpson's paradox and thus efficient.

Descriptive Analytics

To mitigate erroneous data samples and results, descriptive analysis comes in handy. It involves examining relationships involving individual variables (for instance, the annual cost of health) or combinations of variables (for example, the annual cost of health for individuals and the health expenditure for the whole population). Descriptive analysis brings with it the relevance of grouping data samples into two categories: quantitative variables and qualitative variables.
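The coefficient-sign comparison used in the Simpson's paradox test above can be sketched as follows; the toy data is invented purely to exhibit a paradox (the project's own variables showed none):

```python
def slope(xs, ys):
    """Least-squares slope b1 = Sxy / Sxx for a simple regression."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

# Two subgroups that each trend DOWN but trend UP when pooled.
groups = {
    "A": ([1, 2, 3], [10, 9, 8]),
    "B": ([6, 7, 8], [20, 19, 18]),
}
pooled_x = [x for xs, _ in groups.values() for x in xs]
pooled_y = [y for _, ys in groups.values() for y in ys]

group_signs = {name: slope(xs, ys) > 0 for name, (xs, ys) in groups.items()}
pooled_sign = slope(pooled_x, pooled_y) > 0
paradox = any(sign != pooled_sign for sign in group_signs.values())
```

When the pooled slope and every subgroup slope share the same sign, as with the two positive coefficients reported above, the disaggregation shows no paradox.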
A good number of our study variables are quantitative, since they involve amounts, time periods, and percentages. Data-analysis techniques include drawing histograms, scatter plots, and box-and-whisker plots. If the data is balanced, we conclude by reporting the mean and standard deviation; when a quantitative variable is skewed, positively or negatively, we instead report the median value, the quartiles, and the extremes (maximum and minimum). When comparing more than one variable, we use scatter plots.

The variable cost of health is quantitative, and a histogram of the data is plotted. The data is positively skewed, since most of it is concentrated on the left. The median of the data is $9,121. The lower quartile is 8,528 and the interquartile range is 1,643, which puts the upper quartile at 10,171. The upper limit is 12,635.5 and the lower limit is 6,063.5. The data has no outliers.

A box-and-whisker plot is drawn for the data-analysis articles, which is also a quantitative variable. The plot shows that the data is negatively skewed, or skewed to the left. We compute the median value, the upper and lower quartiles, the interquartile range, and the upper and lower limits to determine whether there are any outliers in the data. The lower quartile is 369,213.2 and the upper quartile is 427,630.7. The interquartile range, the difference between the upper and lower quartiles, is 58,417.5. The upper limit, calculated as the upper quartile plus 1.5 times the interquartile range, is 515,257; the lower limit, the lower quartile minus 1.5 times the interquartile range, is 281,587. The data has no outliers, since all values fall within the closed interval between the lower and upper limits.

Conclusion

The study required optimal solutions to curb the data-analytics problem identified in the health sector. A proper dataset was chosen and several variables were studied; the variables are related and thus effective. In exploratory data analysis, we conducted hypothesis tests of the relevance of the variables. Trend analysis was also vital in revealing economic trends on several occasions. We evaluated the data for Simpson's paradox and found none. Descriptive analysis used data-description tools to test whether there were any outliers in the data. The data is therefore effective and efficient for producing optimal analytics solutions.

Methodology Approach and Model Building

There are three general parts to the approach of building an analytic model: the strategy, the technique to implement that strategy, and the decision criteria used within the technique (Grace-Martin, 2018). These choices depend on a list of things including, but not limited to:

- Your research questions: what information about the variables are you trying to glean from the model?
- The specific type of model you are running: ANOVA, logistic regression, linear mixed model, etc.
- Issues in the data: how many predictors do you have, and how related are they?
- The purpose of the model: is it purely predictive, or are you testing specific effects?

Utilizing the information from the previous topics, I was able to build a simple regression model to address the problem stated in the business analytics section: the increased fraudulent activity affecting the United States health sector. Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables (Corporate Finance Institute, 2020).
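The 1.5 × IQR fence check used in the descriptive analysis can be sketched as follows, assuming the median-exclusive quartile convention (one of several in common use, so exact quartile values may differ from a spreadsheet's):

```python
def median(values):
    """Middle value of a sample (average of the two middle values for even n)."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def iqr_outliers(data):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    s = sorted(data)
    half = len(s) // 2
    q1 = median(s[:half])                  # lower half (median excluded for odd n)
    q3 = median(s[half + len(s) % 2:])     # upper half
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lo or x > hi]

# Hypothetical sample: only the extreme value falls outside the fences.
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))   # -> [100]
```

An empty result, as with the health-cost and articles variables above, means every observation lies within the fences.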
It can be utilized to assess the strength of the relationship between variables and for modeling the future relationship between them (Corporate Finance Institute, 2020). Modeling Methods There are many methods of building models. The approaches vary depending on the research design adopted and the nature of the data. For the purpose of this research, we consider the various model building approaches that are suitable for quantitative descriptive research techniques. The first approach is the step up technique, this where we start with an empty model and progressively add the variables to it until a desired a significant model is attained. This approach is an equivalent of forward selection method of model building. The second approach is the top down method where we start with a full model and eliminate predictors that aren’t helping the model. This however requires that we make use of hypothesis that are clearly defined so as to be able to remove variables that are not significant. This study aims at adoption of the top down method since the business problem is clearly TITLE OF PAPER 25 identified and any variable that does not help in answering to the business problem will have to be eliminated. Moreover, the study uses a linear regression approach and applies the ordinary least squares as the estimation technique. Other existing techniques include non linear regression, general least squares, exponential smoothing and logistic regression. Test Design This qualitative approach is necessary in determination of whether the model selected is suitable and that the variable’s included fit the model. It is necessary in determination of the relevance of the coefficients and in reaching the necessary decisions. The commonly used test designs are: equivalent partitioning, boundary value Analysis and decision table testing. The last two approaches are significant in this study as decision tables will be used and boundary level analysis. 
The significance of the variables is tested at the 95% confidence level, implying that the level of significance adopted in this study is 5%. The tests will use the Student's t distribution and the Fisher (F) distribution.

Model Building

The model is of the form:

Y = f(Xi)

where Y is the dependent variable and the Xi are the independent variables. The model will have the structure:

Y = β0 + β1X1 + β2X2 + β3X3 + …

The βi's are the regression coefficients to be estimated.

Conclusion

Is the regression model supporting the analytics problem statement? The methodology employed to analyze the data in this study answers the identified business problem and provides the best analytics tools. The data collected from the World Bank helped prepare this analysis. The study uses a descriptive research design, and the model selected was a simple linear regression model. The model is significant because its analysis will provide a solution to the business problem. The level of significance for testing the significance of variables should be 5%; otherwise, variables have to be re-examined and removed if necessary.

Model Evaluation

The Capstone Project Thesis Template identified the business analytics problem affecting the United States health sector: increased fraudulent activity. This is a result of the rising cost of medication. Patients therefore seek medical insurance, so the number of patients covered by insurance rises steadily. This leads to a high volume of transactions within the health sector involving service providers and insurance carriers, which calls for data analysis in order to minimize the analytic problem. From the draft on model evaluation, an evaluation model was selected and its validation described.
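The 5% decision rule used throughout these tests can be sketched as follows. The function and its inputs are illustrative: in practice the critical value would be read from a t table for the relevant degrees of freedom, and the computed t* would come from the regression output.

```python
def t_decision(t_star, t_crit):
    """Decision rule for a two-sided significance test:
    fail to reject H0 when |t*| is below the critical value."""
    return "fail to reject H0" if abs(t_star) < t_crit else "reject H0"

# Illustrative values only (about 1.96 is the large-sample 5% two-sided cutoff):
print(t_decision(1.20, 1.96))  # fail to reject H0
print(t_decision(2.50, 1.96))  # reject H0
```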
Evaluation Process Justification

The best approaches for the data modeling evaluation are the step-up technique and the top-down method. The approach chosen is the top-down method, where we start with a full model consisting of the whole dataset and eliminate variables that are not useful to the model. Variables that were eliminated include the total number of patients affected by health frauds. The evaluation model used is the simple linear regression model, consisting of a study variable (y) and several explanatory variables (xi's). The slope parameter (B1) and the intercept term (B0) are also stated. The model was selected because it addresses the business analytics problem, as evidenced by the correlation of the independent variables used. The y variable is taken as the health care costs, and the other four variables are treated as explanatory variables. The model takes the form y = B0 + B1x1 + B2x2 + B3x3 + ….

Validation Results

The model can be validated using the ordinary least squares method, which gives the estimates (OLSE) b0 and b1 of B0 and B1 respectively. The OLSE b0 is given by b0 = ybar − b1*xbar, and b1 is given by b1 = Sxy / Sxx, where Sxx is the sum of squared deviations of the xi from xbar, and Sxy is the sum of products of the deviations of the xi from xbar and the deviations of the yi from ybar. The least squares estimators possess the crucial properties of being unbiased, consistent, and sufficient. Since the estimators are linear in the study variable and possess these properties, the model is considered validated.

The model is also validated using hypothesis tests. Here we conduct a paired two-sample t-test for means to test the null hypothesis (H0) against the alternative hypothesis (H1). In this study the null hypothesis is that variable 1 and variable 2 are related, against the alternative hypothesis that variable 1 and variable 2 are not related.
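The ordinary least squares estimates can be computed directly from the standard formulas, b1 = Sxy / Sxx and b0 = ybar − b1*xbar. A minimal sketch on made-up data (not the study's World Bank dataset):

```python
def ols_simple(x, y):
    """Simple linear regression by ordinary least squares.
    Returns (b0, b1) with b1 = Sxy / Sxx and b0 = ybar - b1 * xbar."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)                      # sum of squared x-deviations
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # sum of cross-deviations
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Illustrative data lying exactly on the line y = 1 + 2x:
b0, b1 = ols_simple([1, 2, 3, 4], [3, 5, 7, 9])
# b0 == 1.0, b1 == 2.0
```

Because the example points fall exactly on a line, the recovered intercept and slope reproduce it; with real data the fit would minimize, rather than eliminate, the residuals.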
From the model we conduct a hypothesis test involving two variables: health expenditure and the number of data analyses conducted. The null hypothesis (H0) is that health expenditure and the number of data analyses increase together, against the alternative hypothesis (H1) that they do not. If t* is less than the t critical value, then we fail to reject the null hypothesis and conclude that the two variables are related and are therefore retained in the modeling. From the above hypothesis test, the one-tail t critical value is less than the two-tail t critical value; thus we fail to reject H0, and the validation of the variables is complete.

The validation requires the use of regression techniques in the evaluation of the model. The regression analysis summarizes the t statistic, the F statistic, and the analysis of variance. In the ANOVA, the sum of squares due to residuals, the sum of squares due to regression, and the total sum of squares are computed; the total is the sum of the residual and regression components. The test is conducted at the 95% confidence level (5% significance) with n − 1 degrees of freedom. The F statistic is given by MSreg / MSE, where MSE is the mean square due to error and MSreg is the mean square due to regression. We fail to reject the null hypothesis when F0 is less than the critical F value. R squared, the coefficient of determination, shows the proportion of the total variation that is explained by the regression rather than by the residuals. The model is then tested for heteroscedasticity, a problem that occurs when the variance of the model's error terms is not constant across observations.

The regression output indicates the coefficients determined from the dataset's variables: the intercept term and the other four variables.
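The ANOVA quantities can be sketched in code using the conventional definitions SST = SSreg + SSE, F = MSreg / MSE, and R² = SSreg / SST. The fitted values below are illustrative placeholders, not the model's actual output.

```python
def anova_summary(y, y_hat, p):
    """ANOVA for a regression with p predictors fitted to n observations.
    Decomposes SST into SSreg + SSE, then forms F = MSreg / MSE and
    R^2 = SSreg / SST (the share of total variation explained)."""
    n = len(y)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)                 # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual variation
    ssreg = sst - sse                                       # explained variation
    f_stat = (ssreg / p) / (sse / (n - p - 1))
    r2 = ssreg / sst
    return f_stat, r2

# Illustrative data: a near-perfect single-predictor fit (p = 1).
f_stat, r2 = anova_summary([3, 5, 7, 9], [3.1, 4.9, 7.1, 8.9], p=1)
```

A large F statistic and an R² close to 1 indicate, as in the text, that the regression explains most of the total variation and the residual component is small.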
The model is then generated using the linear regression formula y = B0 + B1x1 + B2x2 + B3x3 + B4x4, where B0 is the intercept, 10632.02; B1 is −5.52; B2 is 0.0056; B3 is 0.0121; and B4 is 0.1353. The linear regression model is thus y = 10632.02 − 5.52x1 + 0.0056x2 + 0.0121x3 + 0.1353x4.

Conclusion

In conclusion, the problem statement required the generation of optimal solutions for the business problem in the US health sector. Upon proper collection of the dataset, selection of the variables, and preparation of the data analytic visual representations, an evaluation model was generated using ordinary least squares, hypothesis tests, and regression modeling. As a result, the evaluation gives optimal solutions for the business problem statement; therefore, the model supports the problem statement.

External Model Verification and Calibration

Review the article, "Validate to Bring Out the Real Value of Visual Analytics," provided in the study materials, for an in-depth understanding of expert validation, predictive validation, external validation, and cross validation. Conduct a literature review of similar research to compare the model that you completed in Topic 5.

Literature Review

Existing literature for qualitative validation…

Calibration

Model revisions

Future Recommendations

Model Deployment and Model Life Cycle

Review and reference the Topic 7 study materials related to models and cost and benefit analysis when addressing Model Deployment and Model Life Cycle questions here.

Deployment Cost Schedule, Training, and Risk

The first potential limitation is that some of the…

Benefits

Recommendations

Recommendations for practice. The responses to the research question revealed…

The subheading (third level heading) above (and below) is placed at the beginning of the paragraph, bolded, in regular sentence case, and followed by a period.
The two required third level headings of the Capstone Project Thesis are already provided in this template. These subheadings are not set to populate in the Table of Contents.

Recommendations for future research. Based on the findings from this study and current literature on the topic, the first recommendation for future research…

Conclusions

This quantitative study addressed the problem that …

References

Bulmer, M. G. (1979). Principles of statistics (3rd ed.). Dover Publications.

Corporate Finance Institute. (2020, February 24). Regression analysis. https://corporatefinanceinstitute.com/resources/knowledge/finance/regression-analysis/

Engel, T. (2018, August). Big data and business analytic concepts: A literature review. ResearchGate.

Grace-Martin, K. (2018, January 5). Model building strategies: Step up and top down. The Analysis Factor. https://www.theanalysisfactor.com/model-building-strategies/

Healthcare Fraud Prevention Partnership (HFPP). (2021). Retrieved from https://www.cms.gov/hfpp

TechAhead. (2021, February 5). Use data analytics for fraud prevention & detection. https://www.techaheadcorp.com/blog/data-analytics-fraud-prevention/

Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley. http://www.ru.ac.bd/wpcontent/uploads/sites/25/2019/03/102_05_01_Tukey-Exploratory-Data-Analysis1977.pdf

Vasarhelyi, Q. L. (n.d.). Healthcare fraud detection: A survey and a clustering model incorporating geo-location information. Rutgers University.

Whistleblowing Protection. (2021). Qui tam: A definition, a history. Retrieved from http://www.whistleblowingprotection.org/?q=node/69

World Bank Open Data | Data. (n.d.). The World Bank. https://data.worldbank.org/

World Development Indicators (WDI) | Data Catalog. (2021, March 19). The World Bank.
https://datacatalog.worldbank.org/dataset/world-development-indicators

Appendix A: Data Set

Cha Nesha Griffin_MIS 690_Data File.xlsb
