Homework 2 BIOS 6111
This 4 page Class Notes was uploaded by an elite notetaker on Thursday July 28, 2016. The Class Notes belongs to BIOS 6111 at University of Colorado Anschutz Medical Campus taught by Dr. David in Summer 2016. Since its upload, it has received 15 views. For similar materials see Applied Biostatistics I in BIOS - Biostatistics at University of Colorado Anschutz Medical Campus.
Date Created: 07/28/16
Bios6601 Homework #2

Instructions: The purpose of this homework is to use the skills we have developed for summarizing proportions and survival data to make descriptive graphs of a dataset and construct a “table 1” of the descriptive statistics. In addition, you will write a summary paragraph describing the descriptive statistics.

The dataset we will investigate is a modification of BREAST.DAT from Rosner’s 7th edition textbook. This dataset is available on the homework page in Canvas or on Rosner’s companion website (http://www.cengagebrain.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&product_isbn_issn=9780538733496&token=). I have provided it as a Stata dataset and a .csv file. The website also has other formats such as SAS and SPSS. I also provide some suggested ways to make Stata give you the information that you need; these are at the end of the document with screenshots. Other screenshots are in your notes.

Dataset description: 1200 women from the NHS (national health study). The women were identified in 1990 and were postmenopausal and free of any cancer as of 1990. Some were current postmenopausal hormone users (PMH==3) and others had never used PMH (PMH==2). The objective of the study was to investigate the association between PMH use and breast cancer incidence. Thus, breast cancer is the event in our analysis, along with time to breast cancer. Follow-up time (FOLUPTM) is the time to breast cancer or the last time the woman provided information. Event = 1 if the woman developed breast cancer and 0 if not. Other characteristics collected include age (agege55 = 1 if age >=55; 0 otherwise), presence/absence of benign breast disease (bbd = 1 if yes, 0 otherwise), family history of breast cancer (famhx; 1=yes, 0=no), obesity (obese; 1=yes, 0=no), and smoking status (csmk = 1 if current smoker, 0 otherwise; psmk = 1 if past smoker, 0 otherwise; 0 for both variables means that they never smoked).
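The smoking coding described above can be collapsed into a single three-level category before tabulating. The following is a minimal Python sketch, not part of the assignment's Stata workflow; the function name `smoking_status` is hypothetical, and treating csmk = 1 together with psmk = 1 as an error is an assumption, since the dataset description does not say whether that combination can occur.

```python
def smoking_status(csmk, psmk):
    """Map the two 0/1 indicators to 'current', 'past', or 'never'.

    Assumption: a record cannot be both a current and a past smoker,
    so that combination is rejected rather than silently recoded.
    """
    if csmk not in (0, 1) or psmk not in (0, 1):
        raise ValueError("csmk and psmk must be coded 0 or 1")
    if csmk == 1 and psmk == 1:
        raise ValueError("csmk and psmk are both 1; check the record")
    if csmk == 1:
        return "current"
    if psmk == 1:
        return "past"
    return "never"  # 0 for both variables means never smoked


# Hypothetical (csmk, psmk) pairs, only to show the mapping.
records = [(1, 0), (0, 1), (0, 0)]
labels = [smoking_status(c, p) for c, p in records]
print(labels)
```

Because the never-smoker group is implied by zeros in both indicators, no information beyond the two given variables is needed to build the three smoking rows of a descriptive table.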
There are other variables in the dataset that we will not use at this time and are not defined here.

Submission instructions: To submit your information for grading: 1) please submit your table, figures, and summary paragraph to assignment 3 in Canvas and 2) complete the quiz for assessing the concept-based questions.

Part 1: Table and graph creation

1) Using Stata or another statistical software program, fill in the following table, including replacing “XXX” with the right value. For an example of the contents of this table, follow this link to Hildreth, Kohrt, and Moreau’s article on oxidative stress: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4318646/

Table 1: Descriptive statistics for those that developed breast cancer and those that did not in a sample of XXX women.

Characteristic                   Developed BRCA                Did not develop BRCA
(N in sample: N=XXXX)            N     Proportion   95% CI     N      Proportion   95% CI
BRCA (see instruction below)     54    0.045                   1146   0.955
Age >=55 (yes)                   8     0.148                   288    0.251
Benign breast disease (yes)      21    0.389                   376    0.328
Family history (yes)             11    0.204                   143    0.123
Obese (yes; BMI>=30)
PMH user (current)               15    0.278
PMH user (never)                 39    0.722
Smoking
  Current                        2     0.0370                  237    0.207
  Past                           25    0.463                   464    0.405
  Never                          27    0.500                   445    0.388

BRCA instruction: This variable acts differently than the other variables in this table. N = the number who developed BRCA; the proportion is out of the total sample. This is different than the others. For example, Age>=55 under the Developed BRCA columns would be the proportion with Age>=55 among those who developed BRCA.

2) Write a paragraph summarizing this table. It should be no more than ½ of a page in 12-point Arial or Times New Roman font. We are looking for general communication ideas, not that you got every single number interpreted! Do not worry about writing about patterns without statistical evidence; we learn that later.
It means it is okay to say things like: in this sample there was a higher proportion of obese women in the group that developed breast cancer (proportion: 95% CI) compared to the group that did not develop breast cancer (proportion: 95% CI). See the Participant Characteristics section of the Hildreth article for an example.

3) Using the barchart command in Stata (or another program), create bar charts of the proportion of age>=55 in those with breast cancer events and those without. Note that in Stata the statistic to use is the mean. If you select this value, it will result in the proportion being on the Y-axis of your graph. Make sure your y-label is correct (use the Y-axis menu).

4) Write a caption for this figure so I can interpret the graph. (submit with graph)

5) Using Stata survival graphics or other software, create a survival curve with a 95% confidence band for the total sample.

6) Write a caption for this figure so I can interpret the graph. (submit with graph)

7) Create a set of survival curves with 95% confidence bands for the different levels of PMH.

8) Write a caption for this figure so I can interpret the graph. (submit with graph)

Part 2: Interpreting your table and graphs. (Quiz questions to be answered online, as well)

2) Why is the confidence interval on the proportion who are aged 55 or older in those who develop breast cancer wider than the confidence interval in those who do not develop breast cancer?

The confidence interval is wider in the group that developed breast cancer than in the group that did not because the group that developed breast cancer is much smaller (54 women versus 1146 in Table 1), so its proportion is estimated with a larger standard error.

3) Will a 95% confidence interval be wider or narrower than a 90% confidence interval calculated on the same sample?

A 95% confidence interval will be wider than a 90% confidence interval. To be more confident that the interval captures the true proportion, the interval has to cover a wider range of values: with a normal approximation the 95% interval uses a critical value of about 1.96, compared with about 1.645 for the 90% interval.
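The two answers above can be checked numerically with the Age >=55 counts from Table 1 (8 of 54 and 288 of 1146). This is a minimal Python sketch that assumes the normal-approximation (Wald) interval; the function names are hypothetical, and your statistical software may use a different method (for example an exact binomial interval), so its numbers can differ somewhat, especially in the small group.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(events, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = events / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 1.645 for 90%
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def width(ci):
    """Length of an interval given as (lower, upper)."""
    return ci[1] - ci[0]


# Age >= 55 counts taken from Table 1.
brca_95 = wald_ci(8, 54, 0.95)         # developed breast cancer
no_brca_95 = wald_ci(288, 1146, 0.95)  # did not develop breast cancer
brca_90 = wald_ci(8, 54, 0.90)         # same group, lower confidence level

print("BRCA, 95%:    width", round(width(brca_95), 3))
print("no BRCA, 95%: width", round(width(no_brca_95), 3))
print("BRCA, 90%:    width", round(width(brca_90), 3))
```

The interval width is driven by the sample size in the denominator of the standard error and by the critical value, which is why the smaller group and the higher confidence level both give wider intervals.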
4) Could you get all the smoking information from the variables given or did you have to do some data management (yes/no)?

Yes, all the smoking information was retrieved from the variables given.

5) If you try to obtain the median survival and 95% confidence interval for the entire sample, you will get no answer. Why?

6) What does censoring need to be independent from for these curves and confidence intervals to be correct?

7) The 95% confidence bands will get narrower with a larger sample size (yes/no)?

8) Why is the estimated proportion surviving at a particular time not usually equal to the number that are alive at that time divided by the total sample in the study?
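Questions 5 and 8 both concern how a survival curve is estimated when some follow-up times are censored. The sketch below is a small pure-Python Kaplan-Meier (product-limit) calculation on six made-up follow-up times; the data and the function names are hypothetical and are not taken from BREAST.DAT. It is meant only to illustrate the mechanics, not to reproduce the output of Stata's survival commands.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_leaving  # events and censored both leave the risk set
    return curve


def median_survival(curve):
    """First time the curve is at or below 0.5, or None if it never gets there."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up times; 1 = event observed, 0 = censored.
times = [2, 3, 4, 5, 6, 6]
events = [1, 0, 1, 0, 0, 0]

curve = kaplan_meier(times, events)
# Naive fraction: people still under observation after time 4 over the total.
naive_at_4 = sum(1 for t in times if t > 4) / len(times)

print(curve)
print("naive fraction after time 4:", naive_at_4)
print("median survival:", median_survival(curve))
```

In this toy example the product-limit estimate after time 4 is larger than the naive fraction, because the person censored at time 3 is dropped from the risk set rather than counted as an event. The curve also never falls to 0.5, so no median can be read off it; a similar situation is plausible in a sample where only 54 of 1200 women have the event.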