MAST90007: Statistics for Research Workers 2021

This assignment contains three (3) questions worth a total of 20 marks. There is some general advice on the assignment at the end of this document, on page 8. 
The overall requirement for this assignment is to carry out and report on data analyses that address three questions about the data from the Framingham Heart Study. 
You may know about this study from your general knowledge; it is one of the most famous studies in epidemiology. You can learn about the study from Wikipedia (https://en.wikipedia.org/wiki/Framingham_Heart_Study) and from these references: 
Levy, D., National Heart, Lung, and Blood Institute, et al. (1999). 50 years of discovery: medical milestones from the National Heart, Lung, and Blood Institute’s Framingham Heart Study. Hackensack, N.J., Center for Bio-Medical Communication Inc. 
Mahmood, S. S., Levy, D., Vasan, R. S., & Wang, T. J. (2014). The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. The Lancet, 383(9921), 999-1008. 
Oppenheimer, G. M. (2005). Becoming the Framingham study 1947–1950. American Journal of Public Health, 95(4), 602-610. 
You may also find your own useful references. You are not required to read these references for the purposes of the assignment. 
The data file contains some information from long-term follow-up as well as baseline measures. The file contains records for 5,209 people, all the participants in the original cohort of the study. The participants were followed up every 2 years. The data file includes information from baseline, the 2nd examination (one variable), and the 16th examination (30 years after baseline). 
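The timing described above can be checked with a short calculation. This is only a sketch: it assumes examinations ran on an exact two-year schedule with baseline counted as examination 1, whereas actual visit dates will have varied. The function name years_since_baseline is illustrative and is not part of the data file.

```python
def years_since_baseline(exam_number: int, interval_years: int = 2) -> int:
    """Approximate years between baseline (examination 1) and a later examination.

    Assumes a fixed interval between examinations; real visit dates varied.
    """
    if exam_number < 1:
        raise ValueError("examination numbers start at 1")
    return (exam_number - 1) * interval_years


# The 2nd examination falls about 2 years after baseline,
# and the 16th about 30 years after baseline.
print(years_since_baseline(2))
print(years_since_baseline(16))
```

Under the same assumption, the function gives a rough follow-up time for any examination number.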
 
SRW MAST90007 2021 Major assignment 
  
The data file includes:
Age at baseline (years)
Weight at baseline (pounds)
Sex: Female / Male
Diastolic blood pressure at baseline (mmHg)
Systolic blood pressure at baseline (mmHg)
Serum cholesterol (mg/100ml) examination 2: Serum cholesterol (mg/100ml) at the 2nd examination; this variable has 626 missing values.
Metropolitan Relative Weight at baseline: A measure of the percentage of actual weight to desirable weight; a measure very similar to BMI.
Smoker at baseline: Smoker / Non-smoker
Number cigarettes smoked per day at baseline
Survived at last examination: 0 = alive at 16th examination; 1 = died prior to 16th examination
 
Serum cholesterol (mg/100ml) examination 1: Serum cholesterol (mg/100ml) at baseline; this variable has 2,037 missing values.
Height at baseline (inches)
Body Mass Index at baseline (kg/m²)
Serum cholesterol (mg/100ml) baseline: Baseline serum cholesterol at examination 1, or, when missing at examination 1, the serum cholesterol at the second examination.
Last examination number: Number of the last examination that the person participated in.
  
Cause of death:
0 = still alive
1 = sudden death from coronary heart disease (CHD)
2 = other coronary heart disease
3 = stroke (cerebrovascular accident, CVA)
4 = other cerebral vascular disease
5 = cancer
6 = other causes of death
9 = cause unknown
 
Examination at which CHD diagnosed, if