
Introduction to Statistics

by: Kamren McLaughlin

Introduction to Statistics STAT 201




These 18-page class notes were uploaded by Kamren McLaughlin on Monday, October 26, 2015. They belong to STAT 201 at the University of Tennessee - Knoxville, taught by S. McGuire in Fall. Since their upload, they have received 20 views. For similar materials see /class/229892/stat-201-university-of-tennessee-knoxville in Statistics at University of Tennessee - Knoxville.



Date Created: 10/26/15
Project 1: Learning About Statistics 201 Students
By Seth Williams, 03/09/2011
McGuire, Section 003

1. My Student ID ends in 4, 0. Therefore my sample subset will be 400 + 40 = 440 students.

49 Age First Cell Phone
Quantiles: 100.0% maximum 25; 90.0% 15; 25.0% quartile 13; 10.0% 12; 2.5% 11; 0.5% 10; 0.0% minimum 7
Moments: Mean 14.070455; Upper 95% Mean 14.231492; Lower 95% Mean 13.909417; N 440

2. From the survey I chose to evaluate the categorical variable "Prefer to Sit in Class" (#10 in the questionnaire). Below are the pie chart and bar chart that were used to evaluate the variable.

[Pie chart and bar chart: #10 Prefer to Sit in Class; categories: Front Row, Near Front, Middle, Near Back, Back Row, Varies]

Interpretation: After looking at the pie and bar charts, I learned that while the most popular choice is the middle of the class, almost as many students prefer to sit near the front. It didn't surprise me, because of the studies that have been done on how much better grades are when you sit closer to the teacher.

3. [Bar charts: #18 "I'm a Gamer", shown separately for Extravert and Introvert students; categories: Strongly Disagree, Somewhat Disagree, Neutral, Somewhat Agree, Strongly Agree]

Overall, both distributions are similarly skewed, with a small percentage of Statistics 201 students who strongly agree that they are a gamer and a much larger percentage who strongly disagree that they are a gamer. As far as differences go, the Somewhat Agree to Somewhat Disagree part of the scale is opposite between the two groups. More introverted students somewhat disagree that they are a gamer, while with the extroverted students more students somewhat disagree that they are a gamer.

4. I chose the quantitative
variable "Hours per Week Playing Video Games" (#12 in the questionnaire).

A. [Histogram and box plot: #12 Hours per Week Playing Video Games; vertical axis: Count]

B. Overall, the shape of the histogram/box plot is extremely skewed to the right, showing that students in Statistics usually do not spend more than ten hours per week playing video games. There are many outliers, and I would say there are people who are very into video games and play an extreme amount. Even so, I believe that once you go over 30 hours per week, some students may have lied or stretched their game time a little. Because the distribution is skewed and not symmetric, you would use the median and IQR to find the center and spread, respectively. The median is 2 hours. To find the IQR you subtract the lower quartile from the upper quartile (5 - 0 = 5), making the IQR 5. The median is the middle value of the ordered data; it is not the same as the average. The IQR is the distance between the lower and upper quartiles.

C. [Normal quantile plot: #12 Hours per Week Playing Video Games]

Because the points plotted on the Normal Probability Plot do not form an approximately straight line, it wouldn't be appropriate to conclude that hours played is normally distributed.

D. Goodness-of-Fit Test
Shapiro-Wilk W Test: W = 0.661791, Prob<W = .0001
Note: H0 = The data is from the Normal distribution. Small p-values reject H0.

Because the value of Prob<W is less than .05, the distribution wouldn't be considered normal, so it could not be approximated by a Normal distribution.

5. For this question I chose the quantitative variable GPA (#9 in the questionnaire) and compared it across the categorical variable Gender (#2 in the questionnaire).

A. [Histograms of GPA: Distributions, #2 Gender = Male and #2 Gender = Female]

B. [Side-by-side box plots of GPA by #2 Gender (Male, Female)]

C. Both the female and male GPA histograms are skewed to the left and unimodal. There are no outliers in the male GPA, but in the female GPA there does seem to
be a couple of outliers. One outlier has a GPA between 0 and 0.5; this may be a typo made when someone was filling out the survey. The IQR for the male histogram (3.545 - 2.855) is 0.69; the IQR for the female histogram (3.63 - 3) is 0.63. The difference is only 0.06, so they are extremely similar. The medians on the box plots are also very close. Overall, females and males seem to have similar GPA distributions. The females do seem to have a slightly higher average, but looking at the counts there also seems to be a larger number of females in Statistics.

6. I chose the two categorical variables "Do you own an e-reader?" (#32 in the questionnaire) and "Are you in a frat/sorority?" (#8 in the questionnaire) to compare.

A. I think that overall, people in a frat or sorority will be less likely to own an e-reader. I would think that if you are involved in Greek life you have less time for leisure activities like reading.

B. [Mosaic plot: #32 Own an e-Reader by #8 Frat/Sorority Member]
[Contingency table: #8 Frat/Sorority Member by #32 Own an e-Reader]

C. The mosaic plot looks almost completely even, and the relationship seems to be almost absent. However, after looking at the contingency table, there is a small difference (1.9%): people involved with Greek life are slightly more likely to own an e-reader, which does not match my expectations.

7. [Multivariate, #2 Gender = Male: correlations (0.5529) and scatterplot matrix for #34 "% at UT You Are More Attractive Than" and #36 "% at UT You Are More Athletic Than"]
[Multivariate, #2 Gender = Female: correlations and scatterplot matrix for #34 "% at UT You Are More Attractive Than" and #36 "% at UT You Are More Athletic Than"]

B. Suspicious data excluded:
[Multivariate, suspicious data excluded, #2 Gender = Male and #2 Gender = Female: correlations and scatterplot matrices for #34 "% at UT You Are More Attractive Than" and #36 "% at UT You Are More Athletic Than"]

A. I would assume that more females see themselves as less attractive or less athletic than other students, compared with how males would consider themselves. Women seem not to give themselves credit, while men usually don't care enough to worry about it or are confident either way. Men also would not admit it if they did consider themselves less attractive or athletic than others, which could be reflected in the survey.

The strongest correlation is among men: men who see themselves as more attractive than other UT students also tend to see themselves as more athletic. Women show the same correlation, but it is not as strong. This suggests that men have more confidence in themselves, but women would tend to be more honest on a survey than men would be in saying they are less attractive or athletic than others.

Yes, I would think that the data shows the difference, but I wouldn't trust the data completely. As I stated previously, I don't think that many men would be open about, or admit, that they consider themselves less attractive, and especially less athletic, than others, whereas women often struggle with self-image issues more openly than men.

8. A. [Scatterplot with linear fit: #3 Height (Inches) vs. #12 Hours per Week Playing Video Games]

Linear Fit: #3 Height (Inches) = 67.163984 + 0.213609 * #12 Hours per Week Playing Video Games

Summary of Fit
RSquare                      0.091263
RSquare Adj                  0.089188
Root Mean Square Error       4.200408
Mean of Response             68.03011
Observations (or Sum Wgts)   440

Analysis of Variance
Source     DF    Sum of Squares   Mean Square   F Ratio
Model      1     776.0917         776.092       43.9876
Error      438   7727.8218        17.643        Prob > F
C. Total   439   8503.9135                      <.0001*

Parameter Estimates
Term                                       Estimate    Std Error
Intercept                                  67.163984   0.239067
#12 Hours per Week Playing Video Games     0.213609    0.032207

B. [Residual plot and normal quantile plot of the residuals for #3 Height (Inches)]

Equation of the least squares regression line: #3 Height (Inches) = 67.163984 + 0.213609 * #12 Hours per Week Playing Video Games

The slope is positive, which suggests that the taller you are, the more you probably play video games. But men are usually taller than women, and men usually like to play video games a little more than women, so that might have something to do with it.

Assumptions
- Linearity: The relationship between height and video games is linear.
- Independence: Both variables are independent and would not affect each other.
- Homoscedasticity: The relationship of the two variables is apparent; they can relate to each other.
- Normality: The distribution of the two variables is normal.

C. RSquare (0.091) is far from 1, so the correlation, though slightly present, is far from a strong one.

D. Linear Fit (suspicious data excluded): #3 Height (Inches) = 67.104405 + 0.235937 * #12 Hours per Week Playing Video Games

Summary of Fit
RSquare                      0.088832
RSquare Adj                  0.086728
Root Mean Square Error       4.206379
Mean of Response             67.98908
Observations (or Sum Wgts)   435

Analysis of Variance
Source     DF    Sum of Squares   Mean Square   F Ratio
Model      1     746.9228         746.923       42.2142
Error      433   7661.3378        17.694        Prob > F
C. Total   434   8408.1605                      <.0001*

Parameter Estimates
Term                                       Estimate    Std Error   t Ratio   Prob>|t|
Intercept                                  67.104405   0.243341    275.76    <.0001*
#12 Hours per Week Playing Video Games     0.235937    0.036313    6.50      <.0001*

[Scatterplot: #3 Height (Inches) vs. #12 Hours per Week Playing Video Games, with the fit after exclusions shown as a green line]

The highlighted material above repeats parts A through C. Though there is a change, it is very slight and not enough to drastically change the output.

E. As stated earlier, I would say it has a lot to do with girls' vs. guys' height. Men on average are taller than women, and men
also usually enjoy video games more than women.

Statistics 201 Project 2: Learning More About Statistics 201 Students
By Seth Williams, April 29, 2010

1. Differences with respondents and others

A. One fundamental difference between the respondents and college students as a whole is that the respondents were strictly students at the University of Tennessee, Knoxville.

B. One fundamental difference between the respondents and UT students as a whole is that only UT-Knoxville students were represented; there are 5 campuses statewide, four of which were not represented. Statistics 201 is also a class that is only required by certain majors, which would lead other majors to be excluded from participating.

C. The most obvious reason why this survey might not represent all N = 1000 students in Statistics 201 is that no one was forced to take it, leading some students not to participate. A second possibility is that not everyone who took it took it seriously and was completely honest with their answers.

2. Confidence Interval for a Proportion

I chose to analyze the categorical variable "Where do you sit in class?" (#14). I chose students who sit Near Back during Stats 201. The proportion is 11.90%.

A. [Bar chart: #14 Where Do You Sit In Class; categories: Front Row, Near Front, Middle, Near Back, Back Row, Different All The Time]

Frequencies (full data set)
Level                     Count   Prob
Front Row                 39
Near Front                281
Middle                    347
Back Row                  31
Different All The Time    35      0.04207
Total                     832     1.00000
N Missing 0; 6 Levels

B. My student ID ends with a 0; therefore my sample size is 73. What the following graph shows about the interval is that you can be 90% confident that the true proportion of students who classify their seats as Near Back will be between 4.31% and 15.12%.

[Bar chart: #14 Where Do You Sit In Class, sample subset]

Frequencies (sample subset)
Level                     Count   Prob
Front Row                 3       0.04110
Near Front                21      0.28767
Middle                    36      0.49315
Back Row                  2       0.02740
Different All The Time    5       0.06849
Total                     73      1.00000
N Missing 0; 6 Levels

Confidence Intervals
Level                     Count   Prob      Lower CI   Upper CI
Front Row                 3       0.04110   0.016541   0.098451
Near Front                21      0.28767   0.209348   0.381171
Middle                    36      0.49315   0.39888    0.587909
Back Row                  2       0.02740   0.009108   0.079466
Different All The Time    5       0.06849   0.033735   0.134093
Total                     73
Note: Computed using score confidence intervals.

C. np = 73(0.082) = 5.99 < 10
nq = 73(0.918) = 67.01 > 10
Because np is below 10 while nq is above 10, the success/failure condition is not met, so the confidence interval is not appropriate.

D. The confidence interval is (4.31%, 15.12%) and the population proportion is 11.9%. 11.9% is contained in the confidence interval.

3. Confidence Interval for a Mean

The quantitative variable I chose was "How many hours are you taking this semester?" (#13).

A. Using the full data set, the population average value μ is 15.08 hours. There are no outliers, because a student taking this class (which would have to be the case to take the survey) would have to have at least 3 hours, and with a waiver from the Bursar's Office you are allowed to take any number of hours permitted.

#13 Hours This Semester (full data set)
Quantiles: 100.0% maximum 24; 99.5% 20.835; 97.5% 19; 90.0% 17.7; 75.0% quartile 16; 50.0% median 15; 25.0% quartile 13; 10.0% 13; 2.5% 12; 0.5% 7.495; 0.0% minimum 3
Moments: Mean 15.082933; Std Dev 2.0479445; Std Err Mean 0.0709997; Upper 95% Mean 15.222293; Lower 95% Mean 14.943573

B. Using the subset (random sample) data set, the mean is 14.93 hours. The histogram is unimodal and symmetric, with a subset of 73, which is more than 40, so the book says that this test alone can be used to check the Nearly Normal Condition, which this graph meets.

#13 Hours This Semester (sample subset)
Quantiles: 100.0% maximum 19; 99.5% 19; 97.5% 19; 90.0% 18; 75.0% quartile 15; 50.0% median 15; 25.0% quartile 13; 10.0% 13; 2.5% 12; 0.5% 12; 0.0% minimum 12
Moments: Mean 14.931507; Std Dev 1.9531226; Std Err Mean 0.2285957; Upper 95% Mean 15.387204; Lower 95% Mean 14.47581; N 73

C. Using my sample set again, the 98% confidence interval for the population mean is (14.39, 15.48), meaning that I am 98% confident that the population mean is between 14.39 and 15.48. Being that the shown
below sample mean is 14.93 and the population mean from part A (15.08) also lies inside the interval, the confidence interval captures the population mean.

Confidence Intervals
Parameter   Estimate   Lower CI   Upper CI   1-Alpha
Mean        14.93151   14.38762   15.4754    0.980
Std Dev     1.953123   1.634424   2.416079   0.980

4. Hypothesis Test Regarding the Difference in Means for Independent Samples

I chose Gender (#2) as my categorical variable and Texts Sent Per Day (#21) as my quantitative variable.

A. Distributions, #2 Gender = female, #21 Texts Sent Per Day
Quantiles: 100.0% maximum 500; 99.5% 500; 97.5% 500; 90.0% 200; 75.0% quartile 125; 25.0% quartile 30; 2.5% 4; 0.5% 4; 0.0% minimum 4
Moments: Mean 95.378375; Std Dev 111.37405; Std Err Mean 18.309756; Upper 95% Mean 132.51235; Lower 95% Mean 58.244411

Distributions, #2 Gender = male, #21 Texts Sent Per Day
Quantiles: 100.0% maximum 500; 99.5% 500; 97.5% 500; 90.0% 100; 75.0% quartile 50; 50.0% median 30; 25.0% quartile 15; 0.0% minimum 0
Moments: Upper 95% Mean 81.482645; Lower 95% Mean 24.239577

A (continued). In both the male and the female results there were two outliers, but only one of them was extreme; the other lies closer to the rest of the results and can be looked at along with them. The distributions with the extreme points excluded are below.

Distributions, #2 Gender = female, #21 Texts Sent Per Day (extreme outlier excluded)
Quantiles: 100.0% maximum 300; 99.5% 300; 97.5% 300; 90.0% 200; 75.0% quartile 100; 50.0% median 50; 25.0% quartile 30; 2.5% 4; 0.5% 4; 0.0% minimum 4
Moments: Upper 95% Mean 105.94922; Lower 95% Mean 55.773001; N 36

Distributions, #2 Gender = male, #21 Texts Sent Per Day (extreme outlier excluded)
Quantiles: 100.0% maximum 150; 99.5% 150; 97.5% 150; 90.0% 100; 75.0% quartile 50; 50.0% median 30; 25.0% quartile 15; 2.5% 0; 0.5% 0; 0.0% minimum 0
Moments: Mean 40.085714; Std Dev 36.30052; Upper 95% Mean 52.555379; Lower 95% Mean 27.616049; N 35

After looking at the graphs that were produced after the extreme outlier was deleted, they appear normal enough to perform a
2-sample t-test.

B. My observations of the distributions reveal that females send, on average, more than double the number of texts per day that males send. I want to know whether females on average send more texts per day than males.

My hypothesis pair would be:
Null Hypothesis: The mean number of texts sent per day is independent of gender.
Alternative Hypothesis: There is an association between the mean number of texts sent per day and gender.

The difference when comparing the females' mean (80.36) to the males' mean (40.09) is 40.27 texts per day.
The standard error of the difference is 13.58.
The P-value is .0019.
To conclude: .0019 < α = .05. Because my P-value is lower than the alpha level, I reject the null hypothesis and can conclude that the difference in mean texts sent per day (female minus male) is greater than 0.

[t Test of #21 Texts by #2 Gender, male-female, assuming unequal variances: Prob > |t| 0.0037*; Prob > t 0.998; Prob < t 0.0019*; Confidence 0.95]

C. A Type II error occurs when the null hypothesis is false but the test fails to reject it. We would have to reduce β to decrease this type of error. This would actually cause an increase in Type I errors, but they are not as serious as Type II errors.

5. Hypothesis Test Regarding an Association Between Two Categorical Variables

My two categorical variables are "Did one or both of your parents graduate from college?" (#26) and "Are your parents married?" (#11).

A. [Mosaic plot and contingency table (Count, Expected, Cell Chi^2): One or Both Parents Graduate from College (No, Yes) by #11 Parents Married (No, Yes); N = 832]
[Tests: DF 1; Likelihood Ratio and Pearson ChiSquare, Prob>ChiSq <.0001*; Fisher's Exact Test]

B. The largest Cell Chi^2 value is for students who had at least one parent graduate from college and whose parents are still married. A large Chi^2 value usually means that the null hypothesis is to be rejected.

C. Null Hypothesis: Whether the students' parents are married is independent of whether at least one of the parents graduated from college.
Alternative Hypothesis: There is an association between the parents being married and at least one of the parents graduating from college.
P-value: .0001
Because α = .05 and the P-value, at .0001, is lower than that, my null hypothesis is rejected.
Conclusion: Among the students that participated, there is an association between their parents being married and at least one of the parents graduating from college.

D. This would be a Type I error. The probability of making this mistake is equal to α, which is .05 for this problem.

6. Using Questions 20 and 5 as directed

A. [Scatterplots before exclusions and after exclusions; horizontal axis: #5 Desired Weight (lbs)]

RSquare = 3.7%, which means that, according to our model, 3.7% of the variability in the fastest speed is accounted for by the desired weight.

Summary of Fit
RSquare                      0.037016
RSquare Adj                  0.02306
Root Mean Square Error       21.54495
Observations (or Sum Wgts)   71

B. The 95% confidence interval for the slope is (-0.03, 0.33).

Parameter Estimates
Term                       Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%
Intercept                  86.75958    13.68967    6.34      <.0001*    59.449441   114.06972
#5 Desired Weight (lbs)    0.1468848   0.090192    1.63      0.1080     -0.033043   0.3268128
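
As a rough check of the last Parameter Estimates table, the sketch below recomputes the t ratio and the 95% confidence interval for the Desired Weight slope from the reported estimate (0.1468848) and standard error (0.090192) with 71 observations. The critical value 1.994945 is an assumption taken from a t table for 71 - 2 = 69 degrees of freedom, and the function names are only illustrative; this is not how JMP itself computes the output.

```python
# Values copied from the Parameter Estimates output for #5 Desired Weight (lbs).
N_OBS = 71
SLOPE = 0.1468848
SLOPE_SE = 0.090192

# Assumption: two-sided 95% critical value of Student's t with
# N_OBS - 2 = 69 degrees of freedom, looked up in a t table.
T_CRIT_DF69 = 1.994945


def t_ratio(estimate, std_error):
    """Test statistic for H0: the coefficient equals 0."""
    return estimate / std_error


def confidence_interval(estimate, std_error, t_crit):
    """Interval estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


ratio = t_ratio(SLOPE, SLOPE_SE)
lower, upper = confidence_interval(SLOPE, SLOPE_SE, T_CRIT_DF69)

print(f"degrees of freedom = {N_OBS - 2}")
print(f"t ratio = {ratio:.2f}")
print(f"95% CI for slope = ({lower:.4f}, {upper:.4f})")
print("interval contains 0:", lower < 0 < upper)
```

The recomputed interval agrees with the reported (-0.033043, 0.3268128). Because it contains 0, and the reported Prob>|t| of 0.1080 is above .05, the slope is not significantly different from 0 at the 5% level, which fits the small RSquare of 3.7%.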

