by: Shlomo Oved


# NYU Tandon School of Engineering Data Analysis MA-UY 2224 Notes for the Whole Semester


These notes cover essentially everything we went over during the semester, with a focus on formulas and problem-solving strategies.
COURSE: Data Analysis
PROF.: Dr. Qian
TYPE: Bundle
PAGES: 23


This 23-page Bundle was uploaded by Shlomo Oved on Monday, August 22, 2016. The Bundle belongs to MA-UY 2224 at New York University, taught by Dr. Qian in Fall 2016. Since its upload, it has received 16 views. For similar materials, see Data Analysis in Math at New York University.


Date Created: 08/22/16
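## Data Analysis Formulas

### Midterm 1

- Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$
- Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$
- Complement rule: $P(E) = 1 - P(E^c)$, and likewise $P(B) + P(B^c) = 1$
- Conditional probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
- Total probability: $P(A) = P(A \mid B)P(B) + P(A \mid B^c)P(B^c)$
- Independence: $P(A \cap B) = P(A)\,P(B)$
- Union (holds in general; under independence substitute $P(A \cap B) = P(A)P(B)$): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
- Mutually exclusive: $P(A \cap B) = 0$, so $P(A \cup B) = P(A) + P(B)$
- Chebyshev's inequality (inner bound): the fraction of data within $k$ standard deviations of the mean is at least $1 - \dfrac{1}{k^2}$
- One-sided Chebyshev (outer bound): the fraction of data more than $k$ standard deviations above the mean is at most $\dfrac{1}{k^2+1}$

#### Normal Data Set: Empirical Rules

1. If data are normally distributed, about 68% of data points fall within 1 standard deviation of the mean.
2. If data are normally distributed, about 95% of data points fall within 2 standard deviations of the mean.
3. If data are normally distributed, about 99.7% of data points fall within 3 standard deviations of the mean.

#### Correlation Coefficient

- $r = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x s_y}$
- $s_x = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$, $\quad s_y = \sqrt{\dfrac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}}$
- If $r$ is positive, the correlation is positive (the closer to 1, the stronger the positive correlation).
- If $r$ is negative, the correlation is negative (the closer to $-1$, the stronger the negative correlation).
- If $r$ is 0, there is no linear correlation.
- $r$ always lies between $-1$ and $1$.

#### Combinations and Permutations

- Permutations: ${}_nP_r = \dfrac{n!}{(n-r)!}$
- Combinations: ${}_nC_r = \dfrac{n!}{(n-r)!\,r!}$

#### Random Variables

PMF = probability mass function (discrete random variables):

- $f(x) = P(X = x)$ for each $x \in S$
- $f(x) > 0$ for $x \in S$
- $\sum_{x \in S} f(x) = 1$ (all the probabilities of the pmf must add up to one)
- $P(A) = \sum_{x \in A} f(x)$

CDF = cumulative distribution function:

- $F(x) = P(X \le x) = \sum_{y \le x} P(X = y) = \sum_{y \le x} f(y)$ in the discrete case; in the continuous case the sum becomes an integral of the pdf.
- The cdf only gives "less than or equal to" probabilities. If a question asks for "greater than", use $P(X > n) = 1 - P(X \le n)$.
- The derivative of the cdf is the pdf: $\dfrac{dF(x)}{dx} = f(x)$, so $\int_a^b f(x)\,dx = F(b) - F(a)$.

PDF = probability density function (continuous random variables):

- $f(x) \ge 0$ for $x \in S$
- $\int_S f(x)\,dx = 1$
- $P(a \le X \le b) = \int_a^b f(x)\,dx$
- Example (a valid pdf: it is nonnegative and integrates to 1):

$$f(x) = \begin{cases} x+1, & -1 \le x \le 0 \\ 1-x, & 0 \le x \le 1 \end{cases}$$

### Midterm 2

#### Expected Value

For a random variable $X$, $E(X)$ is the expected value (average):

- Discrete case: $E(X) = \sum_x x\,f(x)$
- Continuous case: $E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$
- $E(aX) = a\,E(X)$
- $E(aX + b) = a\,E(X) + b$
- $E(aX^2 + b) = a\,E(X^2) + b$
- Discrete case: $E(X^2) = \sum_x x^2 f(x)$
- Continuous case: $E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$

#### Variance

For a random variable $X$ with mean $\mu = E(X)$:

- $\mathrm{Var}(X) = E(X^2) - \mu^2$
- $\mathrm{Var}(X) = E\big[(X-\mu)^2\big]$
- $E(X^2) \ge \mu^2$, with equality only when $\mathrm{Var}(X) = 0$; in general $E(X^2) \ne \mu^2$.
- $\mathrm{Var}(aX + b) = \mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
- $\mathrm{Var}(2X) = 4\,\mathrm{Var}(X)$
- $\mathrm{Var}(X + X) \ne \mathrm{Var}(X) + \mathrm{Var}(X)$, because $X + X = 2X$
- $\mathrm{Var}(-X) = \mathrm{Var}(X)$

#### Covariance and Variance of Sums of Random Variables

- If $X$ and $Y$ are independent, the joint pdf factors: $f(x, y) = f_X(x)\,f_Y(y)$
- $\mathrm{Cov}(X, Y) = E(XY) - \mu_x \mu_y$
- If $X$ and $Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$ and $E(XY) = \mu_x \mu_y$. (The converse does not hold in general: zero covariance alone does not imply independence.)

In general:

- $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$
- $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$

When $X$ and $Y$ are independent:

- $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
- $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$
- $E(XY) = E(X)\,E(Y)$
- $E(X^2 Y^2) = E(X^2)\,E(Y^2)$

#### Chebyshev's Inequality

- Outer bound: $P(|X-\mu| \ge k) \le \dfrac{\sigma^2}{k^2}$, equivalently $P(|X-\mu| \ge k\sigma) \le \dfrac{1}{k^2}$
- The two forms are linked by the substitution $t = \dfrac{k}{\sigma}$, $k = t\sigma$: $\dfrac{\sigma^2}{(t\sigma)^2} = \dfrac{1}{t^2}$
- Inner bound: $P(|X-\mu| \le k) \ge 1 - \dfrac{\sigma^2}{k^2}$, equivalently $P(|X-\mu| \le k\sigma) \ge 1 - \dfrac{1}{k^2}$

### Distributions

#### Bernoulli

Only two possible outcomes: success or failure.

$$f(x) = \begin{cases} p, & x = 1 \text{ (success)} \\ q = 1-p, & x = 0 \text{ (failure)} \end{cases}$$

- $f(x) = p^x q^{1-x}$
- $E(X) = p$
- $\mathrm{Var}(X) = pq$

#### Binomial

Repetition of a Bernoulli trial (with success probability $p$) $n$ times. If $X$ is the number of successes, then $X$ has a binomial distribution with parameters $n$ and $p$.

- $X \sim \mathrm{Bin}(n, p)$, $x = 0, 1, 2, \ldots, n$
- $f(x) = \dbinom{n}{x} p^x q^{n-x}$
- $E(X) = np$
- $\mathrm{Var}(X) = npq$
- When using the Poisson to approximate the binomial ($n$ large and $p$ small), set $\lambda = np$.

#### Hypergeometric

- $N$ objects of 2 distinct types, $n_1$ of type 1 and $n_2$ of type 2, with $n_1 + n_2 = N$
- Randomly select $n$ from $N$ ($n \le N$)
- Let $X$ be the number of type 1 objects selected: $X \sim \mathrm{HypGeom}(N, n_1, n)$
- $f(x) = \dfrac{\dbinom{n_1}{x}\dbinom{n_2}{n-x}}{\dbinom{N}{n}}$
- $E(X) = n\,\dfrac{n_1}{N}$

#### Poisson

- Can be used to approximate a binomial when $n$ is large and $p$ is small, with $\lambda = np$
- $f(x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$, $x = 0, 1, 2, \ldots$
- $E(X) = \lambda$
- $\mathrm{Var}(X) = \lambda$

#### Uniform

- $X \sim U[a, b]$
- $f(x) = \dfrac{1}{b-a}$ for $a \le x \le b$
- $E(X) = \dfrac{a+b}{2}$
- $\mathrm{Var}(X) = \dfrac{(b-a)^2}{12}$

#### Normal

- $X \sim N(\mu, \sigma^2)$
- To standardize: $Z = \dfrac{X-\mu}{\sigma}$, then use the normal distribution table.

#### Exponential

- $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
- $E(X) = \dfrac{1}{\lambda}$
- $\mathrm{Var}(X) = \dfrac{1}{\lambda^2}$
- Lack of memory: $P(X > s) = P(X > t + s \mid X > t)$

#### Distributions Arising from the Normal

Chi-squared distribution with $r$ degrees of freedom:

- $X \sim \chi^2(r)$
- $E(X) = r$
- $\mathrm{Var}(X) = 2r$
- See the chi-squared table.

t-distribution with $n$ degrees of freedom:

- $T_n = \dfrac{Z}{\sqrt{\chi^2_n / n}}$
- $T_n$ is symmetric about $x = 0$.
- See the t-distribution table.

### Midterm 3

#### Central Limit Theorem

Sample mean distribution (approximate, for large $n$):

- $\bar{X} \approx N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$
- $Y = \sum_{i=1}^{n} X_i \approx N(n\mu, n\sigma^2)$

Sampling distribution from a normal population:

- $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$
- $E(S^2) = \sigma^2$
- $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
- $E(X^2) = \sigma^2 + \mu^2$
- $\bar{X}$ and $S^2$ are independent random variables.

Theorem: let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then:

1. $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim Z$
2. $\dfrac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1}$
3. $\dfrac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{\sigma^2} = \dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ (with $\mu$ in place of $\bar{X}$, the sum is $\chi^2_n$)

### Parameter Estimation

#### Interval Estimates Using the Z and T Tables

Common critical values: $Z_{.05} = 1.645$, $Z_{.025} = 1.96$, $Z_{.01} = 2.326$.

#### Estimating $\mu$

Case 1 ($\sigma$ known):

- $X_1, X_2, \ldots, X_n$ independent $N(\mu, \sigma^2)$
- Goal: to estimate $\mu$
- Point estimator: $\bar{X}$
- Sampling distribution of the statistic: $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$, so $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim Z$
- $100(1-\alpha)\%$ confidence interval for $\mu$: $\bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
- One-sided CI, upper bound (one-sided lower confidence interval): $\mu < \bar{X} + Z_{\alpha}\,\dfrac{\sigma}{\sqrt{n}}$
- Lower bound (one-sided upper confidence interval): $\mu > \bar{X} - Z_{\alpha}\,\dfrac{\sigma}{\sqrt{n}}$

Case 2 (small sample problem, $n < 30$):

- $X_1, X_2, \ldots, X_n$ independent $N(\mu, \sigma^2)$, $\sigma$ unknown
- Goal: to estimate $\mu$
- Point estimator: $\bar{X}$
- Sampling distribution of the statistic: $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$, so $\dfrac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1}$
- $100(1-\alpha)\%$ confidence interval for $\mu$: $\bar{X} \pm t_{n-1,\,\alpha/2}\,\dfrac{S}{\sqrt{n}}$

Case 3 (large sample problem, unknown distribution):

- $X_1, X_2, \ldots, X_n$ independent with mean $\mu$ and variance $\sigma^2$, $\sigma$ unknown ($\sigma < \infty$)
- Goal: to estimate $\mu$
- Point estimator: $\bar{X}$
- Sampling distribution of the statistic: $\bar{X} \approx N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$, so $\dfrac{\bar{X}-\mu}{S/\sqrt{n}} \approx Z$
- $100(1-\alpha)\%$ confidence interval for $\mu$: $\bar{X} \pm Z_{\alpha/2}\,\dfrac{S}{\sqrt{n}}$

Finding a number $c$:

- Based on Cases 1 to 3, with either $\mu > c$ or $\mu < c$
- Use $\alpha$, not $\alpha/2$
- If $\mu > c$, use $-$
- If $\mu < c$, use $+$

#### Estimating $\sigma^2$

Here $\chi^2_{n-1,\,\alpha}$ denotes the value with area $\alpha$ to its right.

- $\dfrac{(n-1)S^2}{\chi^2_{n-1,\,\alpha/2}} < \sigma^2 < \dfrac{(n-1)S^2}{\chi^2_{n-1,\,1-\alpha/2}}$
- $\sqrt{\dfrac{(n-1)S^2}{\chi^2_{n-1,\,\alpha/2}}} < \sigma < \sqrt{\dfrac{(n-1)S^2}{\chi^2_{n-1,\,1-\alpha/2}}}$

Finding a number $V$ (one-sided, so use $\alpha$, not $\alpha/2$; consistent with the two-sided interval above):

- If $V < \sigma^2$ (lower bound): $V = \dfrac{(n-1)S^2}{\chi^2_{n-1,\,\alpha}}$
- If $V > \sigma^2$ (upper bound): $V = \dfrac{(n-1)S^2}{\chi^2_{n-1,\,1-\alpha}}$

#### Sampling from a Finite Population

- Population: $1, 2, 3, \ldots, N$ (e.g. $N = 60$)
- Number of supporters $= n$ (e.g. $n = 20$)
- Randomly pick 1 voter: $P(\text{support}) = \dfrac{n}{N}$

#### Estimating the Difference in Means of 2 Normal Populations

Case 1 ($\sigma_x^2$ and $\sigma_y^2$ are known):

- $X_1, \ldots, X_n \sim N(\mu_x, \sigma_x^2)$ and $Y_1, \ldots, Y_m \sim N(\mu_y, \sigma_y^2)$
- Goal: to estimate $\mu_x - \mu_y$
- Sampling distribution: $\bar{X}-\bar{Y} \sim N\!\left(\mu_x-\mu_y,\ \dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}\right)$
- $\dfrac{(\bar{X}-\bar{Y}) - (\mu_x-\mu_y)}{\sqrt{\dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}}} \sim Z$
- $100(1-\alpha)\%$ confidence interval for $\mu_x - \mu_y$: $(\bar{X}-\bar{Y}) \pm Z_{\alpha/2}\sqrt{\dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}}$

Case 2 (large sample case, $\sigma$ unknown):

- $X_1, \ldots, X_n \sim N(\mu_x, \sigma_x^2)$ and $Y_1, \ldots, Y_m \sim N(\mu_y, \sigma_y^2)$
- Goal: to estimate $\mu_x - \mu_y$
- Sampling distribution: $\bar{X}-\bar{Y} \approx N\!\left(\mu_x-\mu_y,\ \dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}\right)$
- $\dfrac{(\bar{X}-\bar{Y}) - (\mu_x-\mu_y)}{\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}} \approx Z$
- $100(1-\alpha)\%$ confidence interval for $\mu_x - \mu_y$: $(\bar{X}-\bar{Y}) \pm Z_{\alpha/2}\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}$

Case 3 (small sample case, both underlying distributions normal, variances unknown but equal, $\sigma_x^2 = \sigma_y^2$):

- $X_1, \ldots, X_n \sim N(\mu_x, \sigma^2)$ and $Y_1, \ldots, Y_m \sim N(\mu_y, \sigma^2)$
- Goal: to estimate $\mu_x - \mu_y$
- Sampling distribution: $\bar{X}-\bar{Y} \sim N\!\left(\mu_x-\mu_y,\ \sigma^2\left(\dfrac{1}{n} + \dfrac{1}{m}\right)\right)$, with $\sigma^2$ estimated by $S_p^2$
- Pooled sample variance: $S_p^2 = \dfrac{(n-1)S_x^2 + (m-1)S_y^2}{n+m-2}$
- $\dfrac{(\bar{X}-\bar{Y}) - (\mu_x-\mu_y)}{S_p\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} \sim t_{n+m-2}$
- $100(1-\alpha)\%$ confidence interval for $\mu_x - \mu_y$: $(\bar{X}-\bar{Y}) \pm t_{n+m-2,\,\alpha/2}\,S_p\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}$

#### Approximate CI for the Mean of a Bernoulli Random Variable

$$f(x) = \begin{cases} p, & x = 1 \\ q = 1-p, & x = 0 \end{cases}$$

- $\mu = p$, $\sigma^2 = pq$
- Point estimator: $\hat{p} = \dfrac{Y}{n}$, where $Y$ is the number of successes
- Sampling distribution: $\hat{p} \approx N\!\left(p, \dfrac{pq}{n}\right)$, so $\dfrac{\hat{p}-p}{\sqrt{pq/n}} \approx Z$
- $100(1-\alpha)\%$ confidence interval for $p$: $\hat{p} \pm Z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
- The interval width is at its maximum when $\hat{p} = .5$ and $\hat{q} = .5$; this can be used to estimate the required $n$.

### Hypothesis Testing

- Type 1 error $= \alpha$; Type 2 error $= \beta$
- Find the type 2 error by finding the critical region, then calculating $P(\text{lower bound} < \bar{X} < \text{upper bound})$ under the alternative's mean by standardizing to $Z$.
- $H_0$: null hypothesis, status quo, the claim, simple hypothesis
- $H_1$: alternate hypothesis, challenge to the status quo, opposite of $H_0$
- Ways to test it: p-value, critical region, "confidence interval"
- Use $t$ when $n \le 30$ and $\sigma$ is unknown.
- Use $Z$ when $n > 30$, whether $\sigma$ is unknown or known.

#### P-Value

The p-value is the probability that a random sample is at least as extreme as the one observed, assuming the null hypothesis $H_0$ is true.

- Reject $H_0$ if the p-value is less than $\alpha$.
- Can't reject $H_0$ if the p-value is greater than $\alpha$.
- Two-sided: $2\,P\!\left(Z > \dfrac{|\bar{X}-\mu_0|}{\sigma/\sqrt{n}}\right) = 2\,P\!\left(Z < -\dfrac{|\bar{X}-\mu_0|}{\sigma/\sqrt{n}}\right)$
- One-sided, $H_1: \mu > \mu_0$: $P\!\left(Z > \dfrac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\right)$
- One-sided, $H_1: \mu < \mu_0$: $P\!\left(Z < \dfrac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\right)$
- With the t-distribution: $P(t_{\text{df}} > t_{\text{obs}})$, where $t_{\text{obs}} = \dfrac{\bar{X}-\mu_0}{S/\sqrt{n}}$

#### Critical Region

If the sample (that is, $\bar{X}$) is in the critical region (CR), reject the null hypothesis $H_0$.

- Two-sided, $Z$ (use $S$ or $\sigma$ as the case requires): $\bar{X} > \mu_0 + Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ or $\bar{X} < \mu_0 - Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
- Two-sided, $t$: $\bar{X} > \mu_0 + t_{n-1,\,\alpha/2}\dfrac{S}{\sqrt{n}}$ or $\bar{X} < \mu_0 - t_{n-1,\,\alpha/2}\dfrac{S}{\sqrt{n}}$
- One-sided, $Z$: $\bar{X} > \mu_0 + Z_{\alpha}\dfrac{\sigma}{\sqrt{n}}$ (for $H_1: \mu > \mu_0$) or $\bar{X} < \mu_0 - Z_{\alpha}\dfrac{\sigma}{\sqrt{n}}$ (for $H_1: \mu < \mu_0$)
- One-sided, $t$: $\bar{X} > \mu_0 + t_{n-1,\,\alpha}\dfrac{S}{\sqrt{n}}$ (for $H_1: \mu > \mu_0$) or $\bar{X} < \mu_0 - t_{n-1,\,\alpha}\dfrac{S}{\sqrt{n}}$ (for $H_1: \mu < \mu_0$)

#### "Confidence Interval"

- Create the confidence interval that matches the case.
- If $\mu_0$ is an element of the interval, can't reject $H_0$.
- If $\mu_0$ is not an element of the interval, reject $H_0$.

#### Testing the Equality of Means of 2 Normal Populations

In every case: $H_0: \mu_x = \mu_y$ against $H_1: \mu_x > \mu_y$, $H_1: \mu_x < \mu_y$, or $H_1: \mu_x \ne \mu_y$, with $X_1, \ldots, X_n \sim N(\mu_x, \sigma_x^2)$ and $Y_1, \ldots, Y_m \sim N(\mu_y, \sigma_y^2)$.

Case 1 ($\sigma_x^2$ and $\sigma_y^2$ are known):

- Under $H_0$: $\bar{X}-\bar{Y} \sim N\!\left(0,\ \dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}\right)$
- Test statistic: $Z = \dfrac{\bar{X}-\bar{Y}}{\sqrt{\dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}}}$

Case 2 ($\sigma_x^2$ and $\sigma_y^2$ are unknown, large sample):

- Under $H_0$: $\bar{X}-\bar{Y} \approx N\!\left(0,\ \dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}\right)$
- Test statistic: $Z = \dfrac{\bar{X}-\bar{Y}}{\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}}$

Case 3 ($\sigma_x^2$ and $\sigma_y^2$ are unknown, small sample; assume $\sigma_x^2 = \sigma_y^2$):

- Under $H_0$: $\dfrac{\bar{X}-\bar{Y}}{S_p\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} \sim t_{n+m-2}$
- Test statistic: $t_{\text{obs}} = \dfrac{\bar{X}-\bar{Y}}{S_p\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$
- $S_p = \sqrt{\dfrac{(n-1)S_x^2 + (m-1)S_y^2}{n+m-2}}$
- p-value $= P(t_{n+m-2} > t_{\text{obs}})$ for the upper-tailed alternative

#### Dependent Data / Paired Study (Paired t-Test)

- Find the difference within each pair: $w_i = y_i - x_i$ or $w_i = x_i - y_i$
- Use the differences as your new data.
- $H_0: \mu_w = 0$ against $H_1: \mu_w < 0$, $\mu_w > 0$, or $\mu_w \ne 0$
- Under $H_0$: $\dfrac{\bar{W}}{S_w/\sqrt{n}} \sim t_{n-1}$

#### Hypothesis Tests Concerning the Variance of a Normal Population

With $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$ and $\sigma_0^2$ the hypothesized value:

1. $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 < \sigma_0^2$, $\sigma^2 > \sigma_0^2$, or $\sigma^2 \ne \sigma_0^2$
2. Under $H_0$: $\dfrac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}$; compare $\chi^2_{\text{obs}}$ with the $\chi^2_{n-1}$ table value.

#### Hypothesis Testing for Proportions

- $Y$ (number of successes) and $n$ (sample size) are given.
- $H_0: P = p$ against $H_1: P < p$, $P > p$, or $P \ne p$
- $\hat{p} = \dfrac{Y}{n}$
- Under $H_0$: $Y \sim \mathrm{Bin}(n, p) \rightarrow Y \approx N(np, npq) \rightarrow \hat{p} \approx N\!\left(p, \dfrac{pq}{n}\right)$

To find the p-value (read $\lessgtr$ as "$<$ or $>$", matching $H_1$; $\pm .5$ is the continuity correction):

1. Using $\hat{p} \approx N\!\left(p, \dfrac{pq}{n}\right)$: $P(Y \lessgtr y) \rightarrow P(Y \lessgtr y \pm .5) \rightarrow P\!\left(\hat{p} \lessgtr \dfrac{y \pm .5}{n}\right) \rightarrow P\!\left(Z \lessgtr \dfrac{\frac{y \pm .5}{n} - p}{\sqrt{pq/n}}\right)$
2. Using $Y \approx N(np, npq)$: $P(Y \lessgtr y) \rightarrow P(Y \lessgtr y \pm .5) \rightarrow P\!\left(Z \lessgtr \dfrac{y \pm .5 - np}{\sqrt{npq}}\right)$

#### Comparing 2 Proportions

- $H_0: p_1 = p_2$ against $H_1: p_1 < p_2$, $p_1 > p_2$, or $p_1 \ne p_2$
- Pooled estimate: $\hat{p} = \dfrac{y_1 + y_2}{n_1 + n_2}$
- Under $H_0$: $\hat{p}_1 - \hat{p}_2 \approx N\!\left(0,\ \hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$
- p-value $= P\!\left(Z \lessgtr \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}\right)$

### Chapter 9: Regression

- $x_1, x_2, \ldots, x_r$ (independent variables, input variables) $\rightarrow y$ (dependent variable, output variable)
- $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_r x_r + \varepsilon$
- $\varepsilon$ is the random error, with $E(\varepsilon) = 0$
- If $r > 1$: multivariate regression; if $r = 1$: simple regression
- $H_0: \beta = 0$ (no regression) against $H_1: \beta \ne 0$

#### Least Squares Estimators of the Regression Parameters

- For $r = 1$: $y = \alpha + \beta x + \varepsilon$
- Estimator for $\alpha$: $A$; estimator for $\beta$: $B$
- $\hat{y} = A + Bx$
- Sum of squares (S.S.) $= \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}\big(y_i - (A + Bx_i)\big)^2$
- $A = \bar{y} - B\bar{x}$
- $B = \dfrac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$

#### Distribution of the Estimators

$B$ has a sampling distribution as an estimator of $\beta$.

- $y = \alpha + \beta x + \varepsilon$; assume $\varepsilon \sim N(0, \sigma^2)$
- $y \sim N(\alpha + \beta x, \sigma^2)$, that is, $y_i \sim N(\alpha + \beta x_i, \sigma^2)$
- $B = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})\,y_i}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$
- $E(B) = \beta$
- $\mathrm{Var}(B) = \dfrac{\sigma^2}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$
- $SS_R = \sum_{i=1}^{n}\big(y_i - (A + Bx_i)\big)^2$
- Residuals: $\varepsilon_i = y_i - (A + Bx_i)$
- $\dfrac{SS_R}{\sigma^2} = \dfrac{\sum_{i=1}^{n}\big(y_i - (A + Bx_i)\big)^2}{\sigma^2} \sim \chi^2_{n-2}$
- $E\!\left(\dfrac{SS_R}{\sigma^2}\right) = n-2$, so $\hat{\sigma}^2 = \dfrac{SS_R}{n-2}$
- Standard deviation of $B$: $\sqrt{\dfrac{\sigma^2}{S_{xx}}}$
- Standard error of $B$: $\sqrt{\dfrac{SS_R}{(n-2)S_{xx}}}$, and $\dfrac{B-\beta}{\sqrt{\dfrac{SS_R}{(n-2)S_{xx}}}} \sim t_{n-2}$
- CI for $\beta$: $B \pm t_{n-2,\,\alpha/2}\sqrt{\dfrac{SS_R}{(n-2)S_{xx}}}$
- t-stat $= \dfrac{\text{coefficient}}{\text{standard error}}$

Under ANOVA:

- Regression $= S_{yy} - SS_R$
- Residual $= SS_R$
- Total $= S_{yy}$

#### The Coefficient of Determination and the Correlation Coefficient

- $S_{xy} = \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$
- $S_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$
- $S_{yy} = \sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$
- $B = \dfrac{S_{xy}}{S_{xx}}$, $\quad A = \bar{y} - B\bar{x}$
- $SS_R = \dfrac{S_{xx}S_{yy} - S_{xy}^2}{S_{xx}}$
- Coefficient of determination: $R^2 = \dfrac{S_{yy} - SS_R}{S_{yy}} = 1 - \dfrac{SS_R}{S_{yy}}$
- $S_{yy}$ is the total error.
- $SS_R$ is the random error.
- $S_{yy} - SS_R$ is the error that can be explained by the input variable.

#### Analysis of Residuals

If the residuals are not randomly distributed (a mix of positive and negative values with no pattern), the data are not well fitted by the regression line.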
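As a rough illustration of the Midterm 1 formulas for the sample variance, sample standard deviation, and correlation coefficient, here is a minimal Python sketch. The function names are made up for this example and are not part of the course material.

```python
import math

def sample_variance(xs):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

def correlation(xs, ys):
    """r = sum((x_i - x_bar)(y_i - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * sample_std(xs) * sample_std(ys))
```

For perfectly linear data such as `xs = [1, 2, 3]` and `ys = [2, 4, 6]`, `correlation` should come out at 1 up to floating-point rounding, matching the rule that $r$ close to 1 means a strong positive correlation.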
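The Case 1 confidence interval for $\mu$ ($\sigma$ known) from the Parameter Estimation material can be sketched in Python as follows. This assumes Python 3.8 or later for `statistics.NormalDist`; the helper names and the sample numbers in the usage note are hypothetical.

```python
import math
from statistics import NormalDist

def z_critical(tail_area):
    """Z value with the given area to its right, e.g. tail_area=0.025 gives about 1.96."""
    return NormalDist().inv_cdf(1 - tail_area)

def z_confidence_interval(x_bar, sigma, n, alpha=0.05):
    """Two-sided 100(1 - alpha)% interval: x_bar +/- Z_{alpha/2} * sigma / sqrt(n)."""
    half_width = z_critical(alpha / 2) * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def z_upper_bound(x_bar, sigma, n, alpha=0.05):
    """One-sided bound mu < x_bar + Z_alpha * sigma / sqrt(n); note alpha, not alpha/2."""
    return x_bar + z_critical(alpha) * sigma / math.sqrt(n)
```

For instance, with a made-up sample mean of 10, $\sigma = 2$, and $n = 16$, `z_confidence_interval(10, 2, 16)` gives roughly $10 \pm 0.98$. For Case 2 the same structure applies with $S$ and a $t_{n-1}$ critical value, which the standard library does not provide.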
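The simple regression formulas for $A$, $B$, $SS_R$, and $R^2$ in terms of $S_{xy}$, $S_{xx}$, and $S_{yy}$ can be checked numerically with a short Python sketch. The function name and the tiny data set in the usage note are illustrative assumptions only.

```python
def fit_simple_regression(xs, ys):
    """Least squares fit of y = A + B*x using the S_xy, S_xx, S_yy shortcuts."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    b = s_xy / s_xx                             # B = S_xy / S_xx
    a = y_bar - b * x_bar                       # A = y_bar - B * x_bar
    ss_r = (s_xx * s_yy - s_xy ** 2) / s_xx     # residual sum of squares
    r_squared = 1 - ss_r / s_yy                 # coefficient of determination
    return a, b, ss_r, r_squared
```

With the made-up data `xs = [1, 2, 3]` and `ys = [1, 2, 2]`, the shortcuts give $S_{xy} = 1$, $S_{xx} = 2$, $S_{yy} = 2/3$, so $B = 0.5$, $A = 2/3$, $SS_R = 1/6$, and $R^2 = 0.75$.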
