# 649 Class Note for STAT 59800 with Professor Neville at Purdue

These 34 pages of class notes were uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Purdue University taught in the fall. Since upload, they have received 14 views.

Date Created: 02/06/15

## Populations and samples

- In data mining we often work with a sample of data from the population of interest.
- Estimation techniques allow inferences about population properties from sample data.
- If we had the population, we could calculate the properties of interest directly.

[Figure: sampling takes us from the population to a sample; inference runs from the sample statistic ($b = 0.692$) back to the population parameter ($\beta = 0.546$).]

## Statistical inference

- Infer properties of an unknown distribution with sample data generated from that distribution.
- Parameter estimation: infer the value of a population parameter based on a sample statistic (e.g., estimate the mean).
- Hypothesis testing: infer the answer to a question about a population parameter based on a sample statistic (e.g., is the mean nonzero?).

## Parameter estimation

- Infer the value of model parameters $\theta$ from data.
- $\theta$ can take values in the parameter space $\Theta$.
- Example approaches:
  - Least-squares estimation (regression)
  - Maximum likelihood estimation (MLE)
  - Maximum a posteriori estimation (MAP)

## Likelihood

- Let $D = \{x_1, \ldots, x_n\}$.
- Assume the data $D$ are independently sampled from the same distribution $p(X \mid \theta)$.
- The likelihood function represents the probability of the data as a function of the model parameters:

$$L(\theta \mid D) = L(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid \theta)$$

## Likelihood (cont.)

- Likelihood is not a probability distribution.
- It gives the relative probability of the data given a parameter.
- The numerical value of $L$ is not relevant; only the ratio of two scores is relevant, e.g., $L(\theta_1 \mid D) / L(\theta_2 \mid D)$.
- A probability distribution allows us to predict unknown outcomes based on known parameters.
- A likelihood function allows us to determine unknown parameters based on known outcomes.

## ML estimation

- Most widely used method of parameter estimation.
- Choose the value of $\theta$ that maximizes the likelihood:

$$\hat{\theta}_{MLE} = \arg\max_{\theta} L(\theta \mid D)$$

- It is often easier to work with the log-likelihood:

$$l(\theta \mid D) = \log L(\theta \mid D) = \log \prod_{i=1}^{n} p(x_i \mid \theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta)$$

## Maximum likelihood estimation

- Define the likelihood.
- Take the derivative, set it to 0, and solve.

Example: toss a weighted coin 100 times and observe 30 heads.

- What is the MLE estimate for the $p$ parameter of the Binomial distribution that generated the data?

$$L(p) = P(H = 30 \mid n = 100, p) = \binom{100}{30} p^{30} (1 - p)^{70}$$

## Example (cont.)

Taking the log, differentiating, and setting the derivative to 0:

$$l(p) = \log \binom{100}{30} + 30 \log p + 70 \log (1 - p)$$

$$\frac{dl}{dp} = \frac{30}{p} - \frac{70}{1 - p} = 0 \quad\Rightarrow\quad \hat{p}_{MLE} = \frac{30}{100} = 0.3$$

## Sufficiency

- A statistic is a function $T(D)$ of the data.
- A statistic is sufficient if we can calculate the likelihood knowing only $T(D)$, i.e., $L(\theta \mid D) = L(\theta \mid T(D))$.
- For large datasets, we can compute and store sufficient statistics rather than the entire dataset.
- Example: $D = \{x_1, \ldots, x_n\}$ with $x_i \sim \text{Bernoulli}(p)$:

$$L(p \mid D) = p^{T} (1 - p)^{n - T}, \quad \text{where } T = \sum_{i=1}^{n} x_i$$

## Bayesian estimation

- Frequentist approach
  - Population parameters are fixed but unknown.
  - Data is a random sample from the population.
- Bayesian approach
  - Parameters are random variables with a distribution of possible values.
  - Data is fixed and known, and provides evidence for different parameter values.

## Bayesian estimation (cont.)

- Prior distribution: distribution over values for $\theta$ before observing the data.
- Posterior distribution: updated distribution for $\theta$ given the observed data.
- Use Bayes' theorem to compute the posterior:

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \propto p(D \mid \theta)\, p(\theta)$$

## Maximum a posteriori (MAP) estimation

- The posterior specifies a distribution over $\theta$.
- To get a single point estimate of $\theta$, we calculate the value that maximizes the posterior:

$$\hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta \mid D)$$

$$\log p(\theta \mid D) = \log p(D \mid \theta) + \log p(\theta) - \log p(D)$$

- The last term does not depend on $\theta$, so maximizing the posterior means maximizing the log-likelihood plus the log-prior. The prior term acts as regularization.
- MLE is a special case of MAP with a uniform prior.

## Where's the search?

[Figure: not recoverable from the extracted text.]

## Hypothesis testing

Example: a new search engine company, Xerus, claims that its query response time is 21 ms.

- Is this good?
- What if you are told that Google's average response time is 14 ms?

## Sampling distributions

[Figure: population, sampling, and the resulting sampling distribution.]

The distribution of parameter values, determined empirically or analytically, can be used to evaluate the significance of observations.

## Hypothesis testing (cont.)

- Statistical hypotheses are statements about population parameters.
- Null hypothesis $H_0$: presumed true until statistical inference indicates otherwise; set up to be refuted by the alternative.
- Alternative hypothesis $H_1$: rival hypothesis that we conjecture is true.

## Example hypotheses

- Algorithm/model A works.
- Algorithm/model A works better than algorithm/model B.
- Algorithm/model A works better because it has ability X.
- Ability X improves model A because it handles condition W better than ability Y.
- etc.

## Hypothesis testing strategy

1. Formulate the null and alternative hypotheses.
   - $H_0$: Xerus mean response time $=$ Google's mean response time
   - $H_1$: Xerus mean response time $\neq$ Google's mean response time
2. Gather a sample statistic, e.g., $\hat{\mu}$, an estimate of Xerus's average query response time.
3. Determine the sampling distribution for the statistic under the null hypothesis.
4. Use the sampling distribution to calculate the probability of obtaining $\hat{\mu}$ given $H_0$.
5. If the probability is low, reject $H_0$ in favor of $H_1$.

## Rejecting the null hypothesis

[Figure: sampling distribution under $H_0$ with the observed value marked.]

If $p < 0.05$, then reject $H_0$.

## Statistical significance

- $\alpha = p(\text{reject } H_0 \mid H_0 \text{ true}) = p(\text{type 1 error})$
- A value of a statistic is statistically significant if it is unlikely to occur under the null hypothesis.

## Errors

| Decision | $H_0$ true | $H_0$ false |
| --- | --- | --- |
| Reject $H_0$ | Type 1 error | Correct |
| Accept $H_0$ | Correct | Type 2 error |

- Type 1: the null is rejected when it is true.
  - E.g., conclude that a cancer drug increases life expectancy when in fact it doesn't.
  - Generally considered to be the most serious error.
- Type 2: the null is accepted when it is false.

## Statistical power

- Lack of statistical significance does not necessarily imply that $H_0$ is true.
- The test could have low statistical power.
- Power is $1 - \beta$, where $\beta = p(\text{accept } H_0 \mid H_0 \text{ false}) = p(\text{type 2 error})$.

## How to increase power

- Increase sample size.
- Decrease sample variability: matching sample selection, control for confounding variables, increase precision of measurements.
- Increase effect size: more extreme experimental conditions, avoid ceiling/floor effects.
- Increase alpha.

## Parametric tests

- Classical hypothesis tests:
  - Difference of proportions
  - Difference of means
  - Difference of correlations
  - Ratio of variances
- The sampling distribution is calculated analytically based on known distributions (e.g., Student's t).

## Example: t-test

- The two-sample t-test assesses a difference in means.
- Assumptions:
  - The population is normally distributed.
  - The variances of the two populations are equal.
  - Samples are independent random draws from the population.
- The t-statistic follows a Student's t distribution. With equal sample sizes $n$ and pooled standard deviation $s_{X_1 X_2}$:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{X_1 X_2} \sqrt{2/n}}$$

## Non-parametric tests

- Computationally intensive statistical tests.
- Use randomization or resampling techniques to get an empirical estimate of the sampling distribution and estimate the probability under $H_0$.
- Compare relative locations of probability distributions rather than specific parameters.
- Many use relative ranks of sample observations rather than numerical values.
- E.g., McNemar's test:
  - $H_0$: CI is as likely as IC (the two discordant outcomes of a paired comparison).
  - Use the binomial distribution.

## How to choose a test

| | Continuous | Non-parametric | Categorical |
| --- | --- | --- | --- |
| One sample | One-sample t-test | Wilcoxon test | Chi-square or binomial test |
| Unpaired | Unpaired t-test | Mann-Whitney test | Fisher's or chi-square test |
| Paired | Paired t-test | Wilcoxon test | McNemar's test |

## Importance

- Statistical significance is a necessary but not sufficient condition for importance.
- Effects can be:
  - Very small but statistically significant
  - Very large but not statistically significant
- Hypothesis tests protect against type 1 errors only.

## Next class

- Reading: Chapter 6, PDM
- Topic: Predictive modeling overview and representation
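The coin-toss example from the maximum likelihood notes above (100 tosses, 30 heads) can be checked numerically. The following is a minimal sketch, not taken from the lecture: it evaluates the binomial log-likelihood on a grid of candidate values of p and compares the maximizer with the closed-form answer heads/n. It then repeats the search with a log-prior added, which is the MAP estimate. The function names, the grid resolution, and the Beta(2, 2) prior are illustrative assumptions.

```python
import math


def binomial_log_likelihood(p, heads, n):
    """Log-likelihood of p given `heads` successes in `n` independent tosses."""
    log_coeff = math.log(math.comb(n, heads))
    return log_coeff + heads * math.log(p) + (n - heads) * math.log(1 - p)


def log_beta_prior(p, a, b):
    """Unnormalized log-density of a Beta(a, b) prior (an assumed prior)."""
    return (a - 1) * math.log(p) + (b - 1) * math.log(1 - p)


def grid_argmax(objective, steps=999):
    """Return the grid point in (0, 1) that maximizes `objective`."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]  # 0.001 ... 0.999
    return max(grid, key=objective)


heads, n = 30, 100
a, b = 2.0, 2.0  # hypothetical prior parameters, chosen for illustration

# MLE: maximize the log-likelihood alone.
p_mle = grid_argmax(lambda p: binomial_log_likelihood(p, heads, n))

# MAP: maximize log-likelihood + log-prior.
p_map = grid_argmax(
    lambda p: binomial_log_likelihood(p, heads, n) + log_beta_prior(p, a, b)
)

# A uniform prior, Beta(1, 1), adds nothing to the objective, so MAP equals MLE.
p_map_uniform = grid_argmax(
    lambda p: binomial_log_likelihood(p, heads, n) + log_beta_prior(p, 1.0, 1.0)
)

# Closed-form answers for comparison.
p_mle_closed = heads / n
p_map_closed = (heads + a - 1) / (n + a + b - 2)  # mode of the Beta posterior

print("MLE (grid, closed form):", p_mle, p_mle_closed)
print("MAP (grid, closed form):", p_map, p_map_closed)
print("MAP with uniform prior:", p_map_uniform)
```

The grid search stands in for the "take derivative, set to 0, and solve" step, so the two answers should agree up to the grid spacing. The Beta(2, 2) prior pulls the MAP estimate slightly toward 0.5, which illustrates the regularizing effect of the prior.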
