Please follow the attached example to the letter.


EFFECTS OF SMOKING ON NON-ACCIDENTAL DEATH RATES

EC 315 – Quantitative Research Methods

Russ Miller

Fall II 2006


TABLE OF CONTENTS

BACKGROUND
REGRESSION ANALYSIS
CONCLUSIONS
BIBLIOGRAPHY
APPENDIX


I. Background

It is widely accepted that tobacco use presents serious health risks. In fact, the Center for Disease Control and Prevention states that “Tobacco use, including cigarette smoking, cigar smoking, and smokeless tobacco use, is the single leading preventable cause of death in the United States” (Center for Disease Control and Prevention, 2006). The purpose of this analysis is to determine the effect of tobacco use (SMOKE) on the non-accidental death rate (DEATH) while holding the effects of alcohol consumption (ALCOHOL), drug use (DRUG), and health insurance (INSUR) constant. This study uses cross-sectional data from the 50 states for the combined 2002-2003 period. The model (less constant and coefficients) is:

DEATH = SMOKE + ALCOHOL + DRUG + INSUR

The dependent variable, DEATH, is defined as the death rate per 100,000 population for major causes of death in the United States, excluding non-health-related causes such as automobile accidents and homicide, and is taken from the National Vital Statistics Reports (2006).

Data for SMOKE, ALCOHOL, and DRUG were taken from the Census Bureau’s Statistical Abstract (2006) and are based on results from the National Survey on Drug Use and Health (NSDUH). SMOKE is defined as the number of people over 12 years of age (in thousands) who had smoked a cigarette at least once in the month prior to the study. ALCOHOL and DRUG are similarly defined, with ALCOHOL representing binge drinking and DRUG representing the use of any illicit drug. These three variables were selected because the CDC has stated in the Morbidity and Mortality Weekly Report that “tobacco, alcohol, and other drug use is associated with the leading causes of morbidity and mortality…” (Center for Disease Control and Prevention, 1992). The relationships between DEATH and SMOKE, ALCOHOL, and DRUG should all be positive, since the use of tobacco, drugs, and alcohol is harmful to health.

The independent variable INSUR is defined as the number of people (in thousands) without health insurance; the data for this variable were also taken from the Census Bureau’s Statistical Abstract (2006). This variable was selected because an MIT Sloan study indicated that automobile accident victims without health insurance were more likely to die than their insured counterparts because of differences in the medical treatment received (MIT Sloan Management [MIT], 2003). Although this study deals with non-accidental deaths rather than automobile accident victims, it is assumed that the implied lower standard of care for uninsured patients may apply to other causes of death as well. The relationship between DEATH and INSUR should be positive, since not having insurance has been linked to lower-quality health care.

II. Regression Analysis

The model was estimated, and the results are shown in Table 1.

Table 1. Original Regression Results

Dependent Variable: DEATH

Independent Variables   Coefficients   t Statistic   P-Value
SMOKE                   25.4186        7.1866        0.0000
ALCOHOL                 -5.9164        -1.0634       0.2932
DRUG                    37.3578        4.1175        0.0002
INSUR                   -2.1520        -1.1743       0.2463

Adjusted R2 = 0.9802    n = 51

You should now discuss these results: is your R2 good, bad, etc.? What percentage of the variation is explained by the regression? Are the coefficient signs as you expected? Which variables are statistically significant?
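The quantities asked about here (coefficients, t statistics, R2, and adjusted R2) can be reproduced outside Excel. The following is a minimal sketch, assuming ordinary least squares with an intercept. The function name ols_summary, the regressor labels x1 to x4, and the data are all hypothetical: the data are randomly generated stand-ins, not the report's data.

```python
import numpy as np

def ols_summary(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; return key statistics."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ beta
    sse = float(resid @ resid)                    # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    sigma2 = sse / (n - k - 1)                    # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return {"coef": beta, "se": se, "t": beta / se, "r2": r2, "adj_r2": adj_r2}

# Synthetic stand-in data (NOT the report's data): 51 observations, 4 regressors.
# Only x1 and x3 truly affect y, so x2 and x4 should show small t statistics.
rng = np.random.default_rng(0)
X = rng.normal(size=(51, 4))
y = 2.0 + X @ np.array([3.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=51)

result = ols_summary(X, y)
for name, b, t in zip(["const", "x1", "x2", "x3", "x4"], result["coef"], result["t"]):
    print(f"{name:>5}  coef={b:8.4f}  t={t:8.4f}")
print(f"R2={result['r2']:.4f}  adjusted R2={result['adj_r2']:.4f}")
```

R2 is the share of the variation in the dependent variable explained by the regression, and a common rough screen for significance is an absolute t statistic above about 2 (equivalently, a p-value below 0.05).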


Next, test for multicollinearity. Insert the correlation matrix and comment on the results, and then compute the Variance Inflation Factors.

Table 2. Cross Correlation Matrix

Independent Variables   SMOKE   ALCOHOL   DRUG   INSUR
SMOKE                   X
ALCOHOL                         X
DRUG                                      X
INSUR                                            X

The rule of thumb for the cross correlation is that all coefficients should lie between -0.7 and +0.7. Values in the above table outside that range are problematic. Comment on your specific results.
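A minimal sketch of the correlation check follows. It assumes the four independent variables are stored as the columns of an array, and it uses randomly generated stand-in data rather than the report's data; the shared "size" factor is an illustrative assumption about why count variables measured by state might move together.

```python
import numpy as np

names = ["SMOKE", "ALCOHOL", "DRUG", "INSUR"]

# Synthetic stand-in data (NOT the report's data). All four columns share a
# common "size" factor, mimicking counts that grow with state population.
rng = np.random.default_rng(1)
size = rng.lognormal(mean=0.0, sigma=1.0, size=51)
X = np.column_stack([size * rng.uniform(0.8, 1.2, size=51) for _ in names])

corr = np.corrcoef(X, rowvar=False)   # 4 x 4 matrix of pairwise correlations

# Report each pair once (upper triangle) and flag those outside [-0.7, +0.7].
flagged = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = corr[i, j]
        if abs(r) > 0.7:
            flagged.append((names[i], names[j]))
        print(f"{names[i]:>8} vs {names[j]:<8} r = {r:+.3f}")
print("pairs outside the rule of thumb:", flagged)
```

The diagonal of the matrix is always 1 (each variable with itself), which is why only the off-diagonal entries need comment.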

Variance Inflation Factors

SMOKE   ALCOHOL   DRUG   INSUR

The rule of thumb for VIFs is that they should be less than 10. Comment on your specific results.
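One common way to compute a Variance Inflation Factor is to regress each independent variable on the remaining ones and take VIF = 1 / (1 - R2) from that auxiliary regression. The sketch below assumes this definition; the function name variance_inflation_factors, the column labels a, b, c, and the data are hypothetical and generated only for illustration.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic stand-in data (NOT the report's data).
rng = np.random.default_rng(2)
a = rng.normal(size=51)
b = rng.normal(size=51)                 # unrelated to a
c = a + 0.1 * rng.normal(size=51)       # nearly a copy of a
X = np.column_stack([a, b, c])

vifs = variance_inflation_factors(X)
for name, v in zip(["a", "b", "c"], vifs):
    print(f"{name}: VIF = {v:7.2f}{'  (above 10)' if v > 10 else ''}")
```

In this constructed example the two nearly identical columns produce large VIFs while the unrelated column stays near 1, which is the pattern the rule of thumb is meant to detect.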

Based on your findings above regarding the statistical significance of the independent variables and any multicollinearity issues, attempt to improve the regression by removing independent variables if appropriate. Remove only one variable at a time and see whether the regression is better or worse. Try various combinations as appropriate. If all of your variables are statistically significant and there are no multicollinearity problems, try lagging (time-series data only) or logging the model and see if you can improve the regression. Even if you do end up removing some independent variables, you can still try lagging or logging for additional improvement.
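Logging the model can be sketched as follows, assuming it means taking natural logs of the dependent variable and of every independent variable and then re-running the regression. The helper adjusted_r2 and the data are hypothetical; the data are generated from a multiplicative relationship purely for illustration and are not the report's data.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R2 of an OLS fit of y on X with an intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Synthetic stand-in data (NOT the report's data): y depends on the
# regressors multiplicatively, so the relationship is linear only in logs.
rng = np.random.default_rng(3)
X = rng.lognormal(mean=3.0, sigma=1.0, size=(51, 2))
y = 5.0 * X[:, 0] ** 0.9 * X[:, 1] ** 0.2 * rng.lognormal(sigma=0.1, size=51)

levels = adjusted_r2(X, y)
logged = adjusted_r2(np.log(X), np.log(y))   # natural logs; all values must be > 0
print(f"adjusted R2, levels: {levels:.4f}")
print(f"adjusted R2, logged: {logged:.4f}")
```

Because the logged regression explains variation in log(y) rather than in y, the two adjusted R2 values are measured on different scales and should be compared only as a rough guide. In a logged model each coefficient reads as an approximate percentage response: a 1% change in the regressor is associated with roughly that many percent change in the dependent variable.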


After all attempts to improve the regression, compare the “original” and “final” regressions and discuss the results.

                        Original Regression        Final Regression
                        Adjusted R2 = 0.9802       Adjusted R2 = 0.9901
Independent Variables   Coefficient   P-Value      Coefficient   P-Value      Comments
SMOKE
ALCOHOL
DRUG
INSUR

Explain why you selected the final regression that you did. Why is it better? Were there any trade-offs?

III. Conclusions

List and discuss your final model.

DEATH = -858.13 + 25.42*SMOKE – 5.92*ALCOHOL + 37.36*DRUG – 2.15*INSUR

What type of relationship did you establish between your primary independent variable and the dependent variable (e.g., strong positive, strong negative, weak positive, moderate positive, none)? Based on your final regression, quantify the impact of a change in your primary independent variable (e.g., “The SMOKE coefficient of 25.42 indicates that for every 1,000 additional smokers approximately 25 additional deaths would occur.”). If your regression was not very good, what are some possible explanations for the poor fit? If your regression is near perfect, why is that? If you were to research this topic further, what would you change?
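The coefficient interpretation quoted above can be checked by evaluating the fitted equation at two points that differ only in SMOKE. This is a small sketch: the coefficients are copied from the equation in this section, while the function name predict_death and the baseline values are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the fitted equation in the Conclusions section.
COEF = {"const": -858.13, "SMOKE": 25.42, "ALCOHOL": -5.92, "DRUG": 37.36, "INSUR": -2.15}

def predict_death(smoke, alcohol, drug, insur):
    """Evaluate the fitted equation; inputs are in thousands of people."""
    return (COEF["const"] + COEF["SMOKE"] * smoke + COEF["ALCOHOL"] * alcohol
            + COEF["DRUG"] * drug + COEF["INSUR"] * insur)

# Hypothetical baseline values (in thousands), chosen only for illustration.
base = dict(smoke=1000.0, alcohol=900.0, drug=300.0, insur=600.0)
more_smokers = dict(base, smoke=base["smoke"] + 1.0)   # one more unit = 1,000 smokers

delta = predict_death(**more_smokers) - predict_death(**base)
print(f"change in predicted DEATH: {delta:.2f}")
```

Because the model is linear, the change equals the SMOKE coefficient regardless of the baseline chosen, which is what "holding the other variables constant" means in practice.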

References

Center for Disease Control and Prevention, United States Department of Health and Human Services. (Last reviewed August 3, 2006). Healthy Youth! Health Topics: Tobacco Use. Retrieved December 2, 2006 from http://www.cdc.gov/HealthyYouth/tobacco/index.htm

Center for Disease Control and Prevention, United States Department of Health and Human Services. (1992, September 18). Morbidity and Mortality Weekly Report: Tobacco, Alcohol, and Other Drug Use Among High School Students – United States, 1991. Retrieved December 2, 2006 from http://www.cdc.gov/mmwr/preview/mmwrhtml/00017652.htm

Center for Disease Control and Prevention, United States Department of Health and Human Services. (2006, April 19). National Vital Statistics Report, Volume 54, Number 13, Table 29. Retrieved December 2, 2006 from http://www.cdc.gov/nchs/data/nvsr/nvsr54/nvsr54_13.pdf

MIT Sloan Management News Room Press Releases. (2003, January 22). Uninsured auto crash victims face 37% higher death rate, says MIT Sloan study. Retrieved December 2, 2006 from http://mitsloan.mit.edu/newsroom/2003-doyle.php

United States Census Bureau. (n.d.). The 2006 Statistical Abstract, Table 195: Estimated Use of Selected Drugs by State: 2002-2003. Retrieved December 2, 2006 from http://www.census.gov/compendia/statab/health_nutrition/health_risk_factors/

United States Census Bureau. (n.d.). The 2006 Statistical Abstract, Table 143: Persons With and Without Health Insurance Coverage By State: 2003. Retrieved December 2, 2006 from http://www.census.gov/compendia/statab/health_nutrition/health_insurance/


Appendix

1. Excel Summary output for original regression:

SUMMARY OUTPUT

[Figure: residual plot for NO_INS; vertical axis: Residuals]

Regression Statistics
Multiple R          0.9908
R Square            0.9818
Adjusted R Square   0.9802
Standard Error      5247.6630
Observations        51

ANOVA
             df   SS            MS            F            Significance F
Regression   4    68271117105   17067779276   619.790823   2.30853E-39
Residual     46   1266746485    27537967.07
Total        50   69537863591

            Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   -858.1337      1134.6000        -0.7563   0.4533    -3141.9628   1425.6954   -3141.9628    1425.6954
NO_INS      -2.1520        1.8327           -1.1743   0.2463    -5.8410      1.5369      -5.8410       1.5369
TOBACCO     25.4186        3.5370           7.1866    0.0000    18.2991      32.5382     18.2991       32.5382
DRUG        37.3578        9.0729           4.1175    0.0002    19.0950      55.6206     19.0950       55.6206
ALCOHOL     -5.9164        5.5638           -1.0634   0.2932    -17.1156     5.2829      -17.1156      5.2829

2. Excel Summary output for final regression:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9919
R Square            0.9839
Adjusted R Square   0.9825
Standard Error      0.1417
Observations        51

ANOVA
             df   SS            MS            F             Significance F
Regression   4    56.57493941   14.14373485   704.3991372   1.28213E-40
Residual     46   0.923640829   0.020079148
Total        50   57.49858024

            Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   3.0154         0.1655           18.2249   0.0000    2.6824      3.3485      2.6824        3.3485
LTOBACCO    1.0707         0.1363           7.8548    0.0000    0.7963      1.3451      0.7963        1.3451
LDRUG       -0.1750        0.1248           -1.4018   0.1677    -0.4262     0.0763      -0.4262       0.0763
LALCOHOL    0.1581         0.1398           1.1306    0.2641    -0.1234     0.4396      -0.1234       0.4396
LNO_INS     -0.0298        0.0821           -0.3625   0.7186    -0.1951     0.1356      -0.1951       0.1356
