## Statistics 3090

by: Hannah Stephens


These notes cover week 3 of class, along with notes from the homework assignment.
These 4 pages of class notes were uploaded by Hannah Stephens on Friday, January 22, 2016. The notes belong to Stat 3090-005 at Clemson University, taught by Paul J. Cubre in Spring 2016.

Date Created: 01/22/16
## Ch. 2 Definitions

Confounding Variable: A variable that was not controlled or accounted for by the researcher and that damages the integrity of the experiment.

Descriptive Statistics: Focuses on exploratory methods for examining data.

Inferential Statistics: Develops theories to test a hypothesis, using data collected from an experiment to make formal conclusions about a population or parameter.

Procedure: The Decision-Making Method

1. Clearly define the problem and any influential variables.
2. Decide upon objectives and decision criteria for choosing a solution.
3. Create alternative solutions.
4. Compare the alternatives using the criteria established in the second step.
5. Implement the chosen alternative.
6. Check the results to make sure the desired results are achieved.

Response Variable: The variable of interest in an experiment.

Explanatory Variable: A variable that affects the variable of interest (the response variable) in an experiment.

Placebos: Fake treatments.

Double Blind Study: A study where neither the evaluators nor the subjects are told whether the subjects are in the treatment group or the control group.

Level of Measurement: The quality of data.

Nominal Data: Data that represents whether a variable possesses some characteristic.
e.g., sex (male or female) and hair color (blonde, brunette, or redhead)

Ordinal Data: Represents categories that have some associated order.
i.e., the data have the property of ordinality (or ranking)

Interval Data: Data that is ordered and for which the arithmetic difference is meaningful.
e.g., the difference between 48 and 45 degrees is the same as the difference between 73 and 70 degrees (3 degrees)

Ratio Data: Similar to interval data, but it has a meaningful zero point and the ratio of two points is meaningful.
i.e., the operations of addition, subtraction, multiplication, and division are all reasonable on ratio data

Qualitative Data: Measurements that can change in kind, but not in degree. Measured on the nominal and ordinal scales. Usually labels or descriptions. Do not have naturally occurring numerical values.
e.g., gender (M/F), occupation, eye color

Quantitative Data: Measurements that change in magnitude from trial to trial, where some order or ranking can be used. Can be measured using a numerical scale.
e.g., number of students in a class, number of dependents claimed on a tax return, number of fans at a sporting event, time it takes to complete a task

Discrete Data: Data in which the observations are restricted to a set of values (such as 1, 2, 3, 4) that possesses gaps. Can assume decimal values.

Continuous Data: Data that can take on any value within some interval.

## Statistics Ch. 2 Notes

- Bar Charts: A display where the length of a bar corresponds to the frequency or number of observations in a category.
- Pie Charts: Each slice is proportional to the amount in each category.
- Relative Frequency: The proportion of observations that fall in a class, calculated as relative frequency = (number in the class) / (total number of observations).
- Cumulative Frequency: The sum of the frequencies of a particular class and all preceding classes.
- Cumulative Relative Frequency: The cumulative frequency made relative, that is, divided by the total number of observations.

Example:
Assets of the 10 largest insurance companies, in billions of dollars: 148.4, 110.8, 55.6, 52.4, 50.4, 42.7, 41.7, 36.3, 35.7, 35.7

| Assets | Frequency | Relative Frequency | Cumulative Frequency | Cumulative Relative Frequency |
|---|---|---|---|---|
| \$30-\$59 | 8 | 8/10 = 0.8 | 8 | 8/10 = 0.8 |
| \$60-\$89 | 0 | 0/10 = 0 | 8 | 8/10 = 0.8 |
| \$90-\$119 | 1 | 1/10 = 0.1 | 9 | 9/10 = 0.9 |
| \$120-\$149 | 1 | 1/10 = 0.1 | 10 | 10/10 = 1 |

- Histogram: A bar graph of frequency or relative frequency.

Algorithm

1. Determine the number of classes.
2. Find the smallest and largest values.
3. Class width = (largest - smallest) / (number of classes)
4. The first class usually contains the smallest value and starts at a multiple of the class width.
5. Find the class boundaries, which are the midpoints between adjacent classes.
   a. lower class boundary < $x_i$ < upper class boundary
6. Calculate the frequency or relative frequency of each class.
7. Create a bar graph.

**Class boundaries are sometimes called cut points. Classes are sometimes called bins.

- Ordered Array: A list of all data points in order
  - Rank Order: Increasing order
  - Reverse Rank Order: Decreasing order
- Dot Plot: A graph where each data point is a point above a horizontal axis (usually a number line). If multiple entries have the same value, they are stacked.
- Probability Distribution: Assigns a probability to each of a set of possible outcomes.
- Symmetric Distribution: If one were to draw a line down the middle of the distribution, the two sides would mirror each other.
- Skewed (asymmetrical) Distribution: Not symmetric; the observations are not distributed equally on both sides.
  1. Left Skewed: The left side (tail) is longer.
  2. Right Skewed: The right side (tail) is longer.
- Unimodal: The distribution has exactly one "peak".
- Bimodal: The distribution has exactly two "peaks".
- Multimodal: The distribution has more than one "peak".

How do we find the "center" of data?

Suppose you have 2 values, a and b. What is the center? Take the average:

$$\text{avg} = \frac{a + b}{2}$$

- Mode: The value that occurs most frequently
  - Not necessarily unique
  - One mode is unimodal, two is bimodal, at least two is multimodal
- Mean: The average
  1. Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i$ is the $i$th point in a sample of size $n$.
  2. Population Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, where $N$ is the number of elements in the population.

     **Aside: Sigma ($\Sigma$) notation means "add things up".
  3. Weighted Mean: Suppose the $i$th observation is given a weight $w_i$. Then $\bar{x}_w = \frac{\sum_{i} w_i x_i}{\sum_{i} w_i}$.
  4. Trimmed Mean: Ignores equal percentages of the highest and lowest data points.
- Median: The data value in the center of an ordered list.
- Outlier: A data point that is extremely small or large relative to the rest of the data set.
- Resistant: Statistics that are not affected by outliers are called resistant.
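The frequency table for the insurance example can be checked with a short Python sketch. It computes the frequency, relative frequency, cumulative frequency, and cumulative relative frequency for each class. The function name `frequency_table` and the choice to treat each class as "lower limit up to, but not including, the next lower limit" are assumptions made for illustration, not something the notes specify.

```python
from itertools import accumulate

# Assets (in billions) of the 10 largest insurance companies, from the example.
assets = [148.4, 110.8, 55.6, 52.4, 50.4, 42.7, 41.7, 36.3, 35.7, 35.7]

# Classes from the table, written as (lower, upper) with the upper end excluded.
# Assumption: a value such as 59.5 would be counted in the 30-59 class.
classes = [(30, 60), (60, 90), (90, 120), (120, 150)]


def frequency_table(data, classes):
    """Return one row per class with the four frequency columns."""
    n = len(data)
    # Frequency: how many observations fall in each class.
    freqs = [sum(1 for x in data if lo <= x < hi) for lo, hi in classes]
    # Cumulative frequency: running total of this class and all preceding ones.
    cum_freqs = list(accumulate(freqs))
    rows = []
    for (lo, hi), f, cf in zip(classes, freqs, cum_freqs):
        rows.append({
            "class": (lo, hi),
            "frequency": f,
            "relative": f / n,              # (number in the class) / total
            "cumulative": cf,
            "cumulative_relative": cf / n,  # cumulative frequency / total
        })
    return rows


table = frequency_table(assets, classes)
for row in table:
    print(row)
```

The last row should always have a cumulative relative frequency of 1, which is a quick way to check that no observation was missed.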
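The histogram algorithm can also be sketched in code. The version below follows steps 1 through 4 and step 6 for the insurance data: it computes the raw class width, rounds it up to a convenient number, starts the first class at a multiple of the width, and counts the observations per class. Rounding the width up to a multiple of 10 (the `round_to` parameter) is an assumption chosen so the result matches the classes in the table; the notes do not say how the width of 30 was picked. The function name `class_edges` is likewise hypothetical.

```python
import math

# Assets (in billions) of the 10 largest insurance companies, from the example.
assets = [148.4, 110.8, 55.6, 52.4, 50.4, 42.7, 41.7, 36.3, 35.7, 35.7]


def class_edges(data, num_classes, round_to=10):
    """Return (raw_width, width, edges) for a histogram of the data."""
    smallest, largest = min(data), max(data)           # step 2
    raw_width = (largest - smallest) / num_classes     # step 3
    # Assumption: round the width up to the next multiple of round_to.
    width = math.ceil(raw_width / round_to) * round_to
    # Step 4: start the first class at a multiple of the class width.
    start = math.floor(smallest / width) * width
    # Add edges until the largest value is covered. For some data sets this
    # can give one more class than requested.
    edges = [start]
    while edges[-1] <= largest:
        edges.append(edges[-1] + width)
    return raw_width, width, edges


raw_width, width, edges = class_edges(assets, num_classes=4)

# Step 6: frequency of each class, counting lower edge <= x < upper edge.
counts = [sum(1 for x in assets if lo <= x < hi)
          for lo, hi in zip(edges, edges[1:])]

print(raw_width, width, edges, counts)
```

For this data the raw width is roughly 28.2, which rounds up to 30 and gives classes starting at 30, 60, 90, and 120, the same classes used in the table.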
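To see the measures of center side by side, here is a minimal Python sketch using the standard `statistics` module on the insurance data. The helpers `weighted_mean` and `trimmed_mean` are written for illustration, and the values and weights passed to `weighted_mean` are made-up numbers, not data from the notes. The trimmed mean here drops the same fraction of points from each end, which is one common reading of "equal percentages".

```python
import statistics

# Assets (in billions) of the 10 largest insurance companies, from the example.
assets = [148.4, 110.8, 55.6, 52.4, 50.4, 42.7, 41.7, 36.3, 35.7, 35.7]


def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the points from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # number of points cut per side
    return statistics.mean(ordered[k:len(ordered) - k])


mean = statistics.mean(assets)           # about 60.97
median = statistics.median(assets)       # about 46.55
modes = statistics.multimode(assets)     # list of every most-frequent value
trimmed = trimmed_mean(assets, 0.10)     # drops 1 point per side, about 53.2

# Hypothetical values and weights, only to show the weighted mean formula.
example_weighted = weighted_mean([80, 90], [1, 3])   # (80*1 + 90*3) / 4

print(mean, median, modes, trimmed, example_weighted)
```

The two largest companies pull the mean (about 60.97) well above the median (about 46.55), which illustrates why the median is called resistant and the mean is not. Note that `statistics.multimode` requires Python 3.8 or later.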

