# Machine Learning CS 5350

This 43-page set of class notes was uploaded by Marian Kertzmann DVM on Monday, October 26, 2015. The notes belong to CS 5350 at the University of Utah, taught by Staff in Fall. For similar materials see /class/229986/cs-5350-university-of-utah in Computer Science at the University of Utah.

Machine Learning CS 5350/CS 6350 — 08 Apr 2008

## Gaussian mixture models

We can think of k-means clustering in a probabilistic framework. Suppose that we have a Gaussian centered at each of the means; then we can get the probability of the data set as

$$p(x_{1:N} \mid \mu_{1:K}) = \prod_n \mathrm{Nor}(x_n \mid \mu_{a_n}, \sigma^2)$$

Here $\mu_k$ is the mean of cluster $k$ and $a_n$ is the cluster assignment of data point $n$. For now we assume a common variance $\sigma^2$. We can think of the clustering problem as trying to find good $\mu_k$s.

**Change of notation.** Instead of $a_n$ being the cluster for data point $n$, let $z_n \in \{0,1\}^K$ be an indicator vector for data point $n$, i.e., $z_{nk} = 1$ if $x_n$ is in cluster $k$ and $0$ otherwise.

**Model.** Generate each data point by first choosing among one of $K$ clusters, each with probability $\pi_k$; then generate the data point from a Gaussian centered at $\mu_k$. In equations:

$$p(x_{1:N}, z_{1:N} \mid \mu_{1:K}, \sigma^2, \pi) = \prod_n \prod_k \left[ \pi_k \, \mathrm{Nor}(x_n \mid \mu_k, \sigma^2) \right]^{z_{nk}} = \prod_n \prod_k \left[ \pi_k \, (2\pi\sigma^2)^{-d/2} \exp\!\left( -\frac{1}{2\sigma^2} \| x_n - \mu_k \|^2 \right) \right]^{z_{nk}}$$

From this we get the likelihood of the data by summing over the unknown $z$s:

$$p(x_{1:N} \mid \mu_{1:K}, \sigma^2, \pi) = \sum_z \prod_n \prod_k \left[ \pi_k (2\pi\sigma^2)^{-d/2} \exp\!\left( -\frac{1}{2\sigma^2} \|x_n - \mu_k\|^2 \right) \right]^{z_{nk}} = \prod_n \sum_k \pi_k (2\pi\sigma^2)^{-d/2} \exp\!\left( -\frac{1}{2\sigma^2} \|x_n - \mu_k\|^2 \right)$$

So now we follow our standard recipe of taking logs and derivatives:

$$\log p(x \mid \mu, \sigma^2, \pi) = \sum_n \log \sum_k \pi_k (2\pi\sigma^2)^{-d/2} \exp\!\left( -\frac{1}{2\sigma^2} \|x_n - \mu_k\|^2 \right)$$

But at this point we get stuck: the log cannot be pushed inside the sum over $k$. If we knew $z$, we could do this easily:

$$\log p(x, z \mid \mu, \sigma^2, \pi) = \sum_n \sum_k z_{nk} \left[ \log \pi_k + \log \mathrm{Nor}(x_n \mid \mu_k, \sigma^2) \right]$$

We call the value with $z$ the "complete log likelihood" and the value without $z$ the "incomplete log likelihood."

The idea for clustering with GMMs is the same as for k-means: we will make an initial guess at $z$ and then try to iteratively refine it. This turns out to be a special case of the expectation-maximization (EM) algorithm, which we will discuss shortly in more generality.

In k-means we made hard guesses at the clusters: the $z$ vector we considered had a one in a single location and zeros everywhere else. In Gaussian mixture models we make soft guesses: the $z$ vector will satisfy $z_{nk} \ge 0$ for all $n, k$ and $\sum_k z_{nk} = 1$ for all $n$. Thus it is a probabilistic guess at the clustering. Given some setting of $\mu$ and $\sigma^2$, we can make guesses at $z$ by just looking at their expectations:

$$\mathbb{E}[z_{nk} \mid x_n, \mu, \sigma^2, \pi] = p(z_{nk} = 1 \mid x_n, \mu, \sigma^2, \pi) = \frac{\pi_k \, \mathrm{Nor}(x_n \mid \mu_k, \sigma^2)}{\sum_{k'} \pi_{k'} \, \mathrm{Nor}(x_n \mid \mu_{k'}, \sigma^2)}$$

These expectations give us a soft clustering of each data point into each of the $K$ clusters. Now, using these guesses, we want to maximize the complete-data log likelihood with respect to $\mu$ and $\sigma^2$. To do this, we take the gradient of the complete likelihood with respect to $\pi$, $\mu$ and $\sigma^2$.

We do $\pi$ first. For this we actually need to introduce a Lagrange multiplier to ensure that the constraints on $\pi$ are satisfied ($\pi_k \ge 0$ and $\sum_k \pi_k = 1$). This gives us an augmented likelihood function of

$$\sum_n \sum_k z_{nk} \log \pi_k - \lambda \left( \sum_k \pi_k - 1 \right)$$

We differentiate this with respect to $\pi_k$ to get

$$\frac{1}{\pi_k} \sum_n z_{nk} - \lambda = 0$$

Summing over all $k$, we get that $\lambda = \sum_n \sum_k z_{nk} = N$, so

$$\pi_k = \frac{1}{N} \sum_n z_{nk}$$

This makes intuitive sense! Next we'll take care of $\mu_k$. These are somewhat easier, since we don't need to worry about constraints, so there are no Lagrange multipliers. Here we take the gradient of the complete log likelihood with respect to $\mu_k$:

$$\nabla_{\mu_k} \log p(x, z \mid \mu, \sigma^2, \pi) = \sum_n z_{nk} \nabla_{\mu_k} \log \mathrm{Nor}(x_n \mid \mu_k, \sigma^2) = -\frac{1}{2\sigma^2} \sum_n z_{nk} \nabla_{\mu_k} \|x_n - \mu_k\|^2 = \frac{1}{\sigma^2} \sum_n z_{nk} (x_n - \mu_k)$$

We equate this to zero to give

$$\sum_n z_{nk} (x_n - \mu_k) = 0 \quad\Longrightarrow\quad \mu_k = \frac{\sum_n z_{nk} \, x_n}{\sum_n z_{nk}}$$

Again, this result is intuitive: each mean is a weighted average of the points softly assigned to it. Finally, we deal with $\sigma^2$:

$$\nabla_{\sigma^2} \log p(x, z \mid \mu, \sigma^2, \pi) = \sum_n \sum_k z_{nk} \nabla_{\sigma^2} \left[ -\frac{d}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \|x_n - \mu_k\|^2 \right] = \sum_n \sum_k z_{nk} \left[ -\frac{d}{2\sigma^2} + \frac{1}{2\sigma^4} \|x_n - \mu_k\|^2 \right]$$

We set this equal to zero to obtain

$$d \sigma^2 \sum_n \sum_k z_{nk} = \sum_n \sum_k z_{nk} \|x_n - \mu_k\|^2 \quad\Longrightarrow\quad \sigma^2 = \frac{1}{dN} \sum_n \sum_k z_{nk} \|x_n - \mu_k\|^2$$

(using $\sum_n \sum_k z_{nk} = N$). Putting this all together, we obtain the following algorithm:

- Initialize cluster centers $\mu_{1:K}$, $\pi$ and $\sigma^2$.
- Iterate $T$ times:
  - Compute the expectations of the $z$ variables by
    $$\mathbb{E}[z_{nk}] = \frac{\pi_k \, \mathrm{Nor}(x_n \mid \mu_k, \sigma^2)}{\sum_{k'} \pi_{k'} \, \mathrm{Nor}(x_n \mid \mu_{k'}, \sigma^2)}$$
  - Compute new values of $\pi$, $\mu$, $\sigma^2$ by
    $$\pi_k = \frac{1}{N} \sum_n z_{nk}, \qquad \mu_k = \frac{\sum_n z_{nk} \, x_n}{\sum_n z_{nk}}, \qquad \sigma^2 = \frac{1}{dN} \sum_n \sum_k z_{nk} \|x_n - \mu_k\|^2$$

One can obtain a more general solution, where we use full covariance matrices and/or cluster-specific covariance matrices.
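The E- and M-steps derived above are short enough to sketch directly in NumPy. This is a minimal illustration of the updates, not production code; the function name, interface, and the optional `mu_init` convenience are invented for the sketch.

```python
import numpy as np

def gmm_em(X, K, T=100, seed=0, mu_init=None):
    """EM for a spherical (common-variance) Gaussian mixture, following the updates above."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Initialize centers at random data points unless explicit centers are given.
    mu = X[rng.choice(N, K, replace=False)] if mu_init is None else np.asarray(mu_init, float)
    pi = np.full(K, 1.0 / K)
    var = X.var()
    for _ in range(T):
        # E-step: z[n,k] proportional to pi_k * Nor(x_n | mu_k, var), computed in log space.
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)          # N x K squared distances
        logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - sq / (2 * var)
        logp -= logp.max(axis=1, keepdims=True)                       # numerical stabilization
        z = np.exp(logp)
        z /= z.sum(axis=1, keepdims=True)
        # M-step: closed-form updates for pi, mu, and the shared variance.
        Nk = z.sum(axis=0)
        pi = Nk / N
        mu = (z.T @ X) / Nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        var = (z * sq).sum() / (d * N)
    return pi, mu, var
```

On well-separated clusters, the recovered means land near the true cluster centers and the shared variance shrinks toward the within-cluster spread.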
Machine Learning CS 5350/CS 6350 — 28 August 2008

## Information Theory

Information theory is the study of the transmission of bits across a noisy channel. Suppose I have an alphabet {A, B, C, D} and I want to encode it in binary: I can use two bits per character (A = 00, B = 01, C = 10, D = 11). Now suppose someone told me that in the messages I send, A is more common than the others, e.g. p(A) = 1/2, p(B) = 1/4, p(C) = p(D) = 1/8. Can we do better? Sure:

- A = 0
- B = 10
- C = 110
- D = 111

This is clearly unambiguous (no codeword is a prefix of another) and gets us an average of 1.75 bits/character. The minimum number of bits is the entropy:

$$H(X) = -\sum_x p(X = x) \log_2 p(X = x)$$

Zero entropy means deterministic; high entropy means close to uniform.

$H(Y \mid X = x)$, the **specific conditional entropy**, is the number of bits needed to send $Y$ given that both the sender and recipient know that $X = x$. It's the same as entropy, but computed only over data points where $X = x$. The full **conditional entropy** $H(Y \mid X)$ is the expected specific conditional entropy:

$$H(Y \mid X) = \sum_x p(X = x) \, H(Y \mid X = x)$$

**Information gain** $IG(Y \mid X)$ asks: if I must send $Y$, how many bits would I save if both ends knew $X$?

$$IG(Y \mid X) = H(Y) - H(Y \mid X)$$

Intuitively, $X$ has a high information gain with respect to $Y$ if, knowing $X$, it takes many fewer bits to transmit $Y$. A few random notes for IG with respect to decision trees:

- We usually don't need to actually compute $H(Y)$, because it's a constant for all features.
- It doesn't matter what log you use, so long as you're consistent; it's just a multiplicative constant.

---

# Structured Prediction

Hal Daumé III — CS 5350/6350 Machine Learning — 02 Dec 2008

## Problem formulation

Informally: given

1. a bunch of inputs, and
2. correct corresponding outputs,

induce a function that maps novel inputs to corresponding outputs.

Formally: given

1. an input space $\mathcal{X}$,
2. an output space $\mathcal{Y}$ that is "hairy",
3. a distribution $D$ over $\mathcal{X} \times \mathcal{Y}$,
4. a loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}^+$,

induce a function $f : \mathcal{X} \to \mathcal{Y}$ with low expected loss with respect to $D$.

## What are complex structures?

[Slide figures: a machine-translation example (an Arabic news passage automatically translated to English) and a syntactic parse over "The man ate a tasty sandwich" with part-of-speech tags NNP, DT, NN, VBD, JJ.]
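The entropy and information-gain formulas above are straightforward to compute from samples; a small Python sketch (the function names are invented here):

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_y p(y) log2 p(y), estimated from a sample of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(xs, ys):
    """IG(Y|X) = H(Y) - sum_x p(X=x) H(Y | X=x), from paired samples."""
    n = len(ys)
    cond = 0.0
    for x in set(xs):
        sub = [y for xi, y in zip(xs, ys) if xi == x]
        cond += (len(sub) / n) * entropy(sub)
    return entropy(ys) - cond
```

On a sample matching the A/B/C/D frequencies above (four As, two Bs, one C, one D), `entropy` returns exactly 1.75 — the average length of the prefix code.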
## Example 1: Sequence labeling

Input:  The man ate the really tasty sandwich
Output: DET NOUN VERB DET ADV ADJ NOUN

Inputs are sequences; outputs are equal-length sequences of labels drawn from a label set. Three approaches:

- **Independent prediction.** Ignore the structure and predict each label independently.
- **Global prediction.** Learn a score $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ so that good $y$s have high score.
- **Productive prediction.** Learn a function that generates the output sequentially.

## Method 1: Independent prediction

Features may depend on any of the input, but must decompose entirely over the output:

$$f(x, y) = \sum_{t=1}^{T} w^\top \Phi(x, y_t)$$

One may use any classifier, for instance:

- logistic regression (a.k.a. maximum-entropy classifiers), or
- support vector machines.

Pros:

- Really efficient at training and test time.
- Can use any off-the-shelf multiclass classifier.

### Independent prediction: enforcing constraints

For instance:

Input:  George Bush spoke to Congress today
Output: B-PER I-PER O O B-ORG O

We must ensure that I-PER always follows B-PER or I-PER.

- At test time, constraints can be enforced by, e.g., integer programming.
- This incurs additional test-time complexity.
- See Punyakanok and Roth, IJCAI 2005.

## Global prediction

Goal: learn a score $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ so that good $y$s have high score. Three issues:

- How to represent $f$?
- How to learn $f$ from a finite sample?
- Given $f$ and a new input $x$, how to find the best $y$?
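The independent-prediction decomposition above — one linear score per position, no interaction between output labels — can be sketched as follows. The feature names and weights are made up for the illustration:

```python
def phi(words, t, tag):
    """Sparse indicator features for position t; feature names are hypothetical."""
    feats = {f"word={words[t]},tag={tag}"}
    if words[t][0].isupper():
        feats.add(f"cap,tag={tag}")
    return feats

def predict_independent(words, tags, w):
    """Label each position by argmax_y w . phi(x, t, y), independently of the others."""
    return [max(tags, key=lambda y: sum(w.get(f, 0.0) for f in phi(words, t, y)))
            for t in range(len(words))]
```

Because each position is scored in isolation, nothing in this sketch can express constraints like "I-PER must follow B-PER" — exactly the limitation the constraint-enforcement slide addresses.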
### Global prediction: function representation

A convenient representation: $y$ is a graph. Define features over the cliques $c$ of the graph, $\Phi(x, c) \in \mathbb{R}^D$, and represent $f$ as linear with weights $w$:

$$f(x, y) = \sum_{c \in y} w^\top \Phi(x, c)$$

### Global prediction: how to predict?

The prediction problem: given a new point $x$ and weights $w$, compute

$$\arg\max_y \sum_{c \in y} w^\top \Phi(x, c)$$

Under what circumstances can we solve this exactly?

- Roughly: if $\Phi$ only models local structure.
- Formally: the cost scales exponentially in the treewidth of the graph.
- For linear chains: Viterbi search, in time $O(T K^{|c|})$.
- For Ising models: NP-hard (graph cuts).
- For bipartite matchings: polynomial time (Hungarian algorithm).

### Function representation: sequence labeling

Input:  The man ate the really tasty sandwich
Output: DET NOUN VERB DET ADV ADJ NOUN

Choose cliques as adjacent labels.

- Allowed features:
  - Is the tag VERB associated with the word "ate"?
  - Do we assign DET to a single-letter word?
  - Does $y$ contain the sequence DET NOUN?
- Disallowed features:
  - Does $y$ contain the sequence DET NOUN VERB?
  - How many VERBs are there in the label sequence?
  - Are all instances of the word "the" assigned the same label?

### Global prediction: how to learn, I

Conditional random fields model outputs probabilistically:

$$p(y \mid x; w) = \frac{1}{Z(x, w)} \exp \sum_{c \in y} w^\top \Phi(x, c), \qquad Z(x, w) = \sum_{y' \in \mathcal{Y}} \exp \sum_{c \in y'} w^\top \Phi(x, c)$$

See Lafferty, McCallum and Pereira, ICML 2001. CRFs can be optimized using, e.g., limited-memory Hessian methods, stochastic gradient descent, etc. The intuition: make the correct output look good and all others look bad. This requires efficient computation of the normalizer $Z$.

### Global prediction: how to learn, II

Instead of forcing all other outputs to look bad, just look at the best. This leads to maximum-margin Markov networks (M3Ns), which ensure a large margin:

$$\min_w \; \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad f(x_n, y_n) - f(x_n, y) \ge \ell(y_n, y) \quad \forall n, y$$

(difference of predicted scores on the left, cost of error on the right). See Taskar, Guestrin and Koller, NIPS 2003. This looks like exponentially many constraints, but they can be reduced (if clever) to depend on the treewidth. Technically, one must also introduce slack variables.

### Global prediction: summary

Your task:

- Represent the output structure as a graph.
- Define features over cliques.

Prediction can be solved efficiently for many reasonable problems: chains, trees, bipartite matchings, graph cuts (sort of). CRF normalization is efficient for many problems: chains, trees. M3N constraints are polynomial for more problems: chains, trees, bipartite matchings.

### Can we trade structure for features?

Input:  The man ate the really tasty sandwich
Output: DET NOUN VERB DET ADV ADJ NOUN

Idea: instead of using VERB-DET as a feature, just add extra input features and use independent predictions. Problems:

- This introduces a ton of new features whose weights we need to estimate.
- We probably don't have enough data to do this reliably.
- CRFs are computationally complex but statistically simple; independent logistic regression (ILR) is computationally simple but statistically complex.

Solution: "Structure Compilation" (Liang, Daumé and Klein, ICML 2008).

### Structure Compilation

Simple idea/algorithm:

- Train a CRF on your labeled data.
- Run this CRF on a large amount of unlabeled data.
- Train an ILR, using more features, on the data labeled by the CRF.

[Slide figure: accuracy of the CRF and ILR classifiers on (a) POS tagging and (b) NER, as a function of the amount of compiled data (2–200 thousand examples).]

## Productive prediction: striking a middle ground

### Structured prediction via search

Key idea: view the prediction task as search.

- A path corresponds to a full output.
- Each decision is over a small set of options.
- Train a classifier to make search predictions.
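The linear-chain decoding problem from the global-prediction discussion above — the argmax over label sequences with per-position and adjacent-label clique scores — can be solved exactly by dynamic programming. A minimal Viterbi sketch over precomputed score tables (the array-based interface is an assumption of this sketch):

```python
import numpy as np

def viterbi(node_scores, edge_scores):
    """argmax over label sequences of sum_t node[t, y_t] + sum_t edge[y_{t-1}, y_t].

    node_scores: T x K array of per-position label scores.
    edge_scores: K x K array of adjacent-label (clique) scores.
    """
    T, K = node_scores.shape
    delta = node_scores[0].copy()          # best score of any prefix ending in each label
    back = np.zeros((T, K), dtype=int)     # backpointers for path recovery
    for t in range(1, T):
        cand = delta[:, None] + edge_scores + node_scores[t][None, :]   # K x K candidates
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0)
    # Follow backpointers from the best final label.
    y = [int(delta.argmax())]
    for t in range(T - 1, 1 - 1, -1):
        y.append(int(back[t][y[-1]]))
    return y[:0:-1] + [y[0]] if False else y[::-1][:T] if False else list(reversed(y))[-T:]
```

(The last line simply reverses the backtraced path; each time step contributes one label, so the run time is $O(T K^2)$ for pairwise cliques, matching the chain case above.)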
How can we train this classifier?

- Train on every node in the search space. This is effectively what CRFs do.
- Train only on the correct path. This incurs the label-bias problem; see Daumé & Marcu, ICML 2005, and Xu and Fern, ICML 2007.
- Train on the subset of states that we are likely to reach. But how do we know what states we are likely to reach? A chicken-and-egg problem.

### Productive prediction: adding history to classifiers

Idea: follow the independent-classifier methodology, but

- predict variables in a prescribed order, and
- allow features to depend on any past decision.

Input:  The man ate the really tasty sandwich
Output: DET NOUN VERB DET ADV ADJ NOUN

At training time:

1. Make a classification example for DET.
2. Make an example for NOUN, knowing DET.
3. Make an example for VERB, knowing DET NOUN.
4. Make an example for DET, knowing DET NOUN VERB.
5. And so on.

At test time:

1. Predict the first label, $y_1$.
2. Predict the second, $y_2$, knowing $y_1$.
3. Predict $y_3$, knowing $y_1, y_2$.
4. And so on.

(What could go wrong in this picture?)

### SEARN: search-based structured prediction

Idea: train on the subset of states that we are likely to reach.

- Iterative algorithm:
  - First train on the correct path → $h_1$.
  - Then train on an interpolation between the best path and $h_1$ → $h_2$.
  - Then on an interpolation between the best path, $h_1$ and $h_2$ → $h_3$.
  - And so on.
- Guaranteed to converge in a polynomial number of iterations to a model whose regret grows only logarithmically in $T$; see Daumé III, Langford and Marcu, MLJ 2007, for the precise bound.

### Experimental comparison

[Slide figure: experimental comparison chart; not recoverable from the extraction.]

### Summary

Problem: learn to predict complex structures from examples. Three solutions:

1. Enumerate over all possible structures. Useful for simple problems or small data sets.
2. Ignore structure entirely. Useful when structure is not particularly useful.
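The train/test-time recipe above is just greedy left-to-right search with history features. A minimal sketch (the feature function, feature names, and weights are hypothetical):

```python
def predict_sequential(words, tags, w, phi):
    """Greedy left-to-right prediction; features may inspect all previous decisions."""
    history = []
    for t in range(len(words)):
        # Score every candidate tag given the input AND the labels predicted so far.
        history.append(max(tags, key=lambda y: sum(
            w.get(f, 0.0) for f in phi(words, t, history, y))))
    return history
```

A history feature like "previous tag was DET" is exactly what independent prediction forbids; the price, as the slide's "what could go wrong?" hints, is that an early mistake changes every later feature vector.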
   (One can recover somewhat through "structure compilation".)
3. Compute structure piecewise. Useful for large data, when the structure can be sequentialized.

### Summary of experimental results

- **Part-of-speech tagging:** similar performance; speed is the differentiator. (Daumé III and Marcu, ICML 2005; Punyakanok and Roth, IJCAI 2005)
- **Named-entity recognition:** independent classifiers worse. (McCallum and Li, CoNLL 2003; Tsochantaridis, Hofmann, Joachims & Altun, JMLR 2005; Liang, Daumé & Klein, ICML 2008; Daumé III, Langford & Marcu, MLJ 2007)
- **Machine translation:** independent classifiers & global impossible. (Liang, Bouchard-Côté, Klein & Taskar, ACL 2006)
- **Coreference resolution/clustering:** global/productive best. (McCallum and Wellner, NIPS 2004; Daumé III and Marcu, NAACL 2005; Finley and Joachims, ICML 2005)
- **Bipartite matching:** CRFs impossible; M3Ns outperform classifiers. (Taskar, Lacoste-Julien & Jordan, JMLR 2006)
- **Image segmentation:** CRFs & M3Ns are state of the art. (Kumar and Hebert, NIPS 2004; Wang and Ji, CVPR 2005)

---

Machine Learning CS 5350/CS 6350 — 09 Jan 2006

## What is machine learning?

A trichotomy:

1. ML = statistics
2. ML = optimization
3. ML = modeling

Three components to any machine learning problem:

1. a task (a model),
2. experience (data), and
3. a reward (loss).

Together these are fed into a learning algorithm, which produces a solution to the task. Examples of machine learning problems:

1. Chess ← reinforcement learning
2. Medical diagnosis ← classification
3. Real-estate pricing ← regression
4. Machine translation ← structured prediction
5. Document clustering ← unsupervised learning

An important realization: how to map your problem into a known machine learning paradigm.

1. **Reinforcement learning**
   a. Agent + world + reward.
   b. Finite horizon: $\sum_{t=1}^{T} R_t$.
   c. Infinite horizon: $\sum_{t=1}^{\infty} \gamma^t R_t$, where $\gamma < 1$.
   d. Examples: blocks world, chess, robotic manipulation, taxi driving, elevator control, etc.
2. **Classification** (supervised learning)
   a. Input → discrete output.
   b. Inputs are typically in $\mathbb{R}^D$.
   c. Output is either binary ($\{0, 1\}$), multiclass ($\{0, \ldots, K-1\}$), or $\{-1, +1\}$ for convenience.
   d. Loss is usually 0/1 loss: $\ell(\hat{y}, y) = \mathbf{1}[\hat{y} \ne y] = 1 - \delta_{\hat{y}, y}$, where $\hat{y}$ is the prediction and $y$ is the truth.
   e. Examples: cancer prediction, document classification, image recognition, etc.
3. **Regression** (also supervised learning)
   a. Input → continuous output.
   b. Inputs are typically in $\mathbb{R}^D$.
   c. Output is in $\mathbb{R}$.
   d. Loss is usually half-squared loss: $\frac{1}{2}(\hat{y} - y)^2$.
   e. Examples: real-estate pricing, time prediction, weather prediction, stock prediction, etc.
4. **Structured prediction** (also supervised learning)
   a. Input → large, "complex" output.
   b. Inputs are typically multiple $\mathbb{R}^D$s (which can be considered just one).
   c. Output is a discrete data structure.
   d. Loss is task-specific, sometimes approximated by Hamming loss: $\sum_i \mathbf{1}[\hat{y}_i \ne y_i]$, with $i$ ranging over parts of the structure.
   e. Examples: sequence labeling, syntactic parsing, protein secondary-structure prediction, machine translation, etc.
5. **Unsupervised learning**
   a. Input is a data set, usually $(\mathbb{R}^D)^N$.
   b. Output is a different representation of the input.
   c. Loss is task-specific.
   d. No labeled training data.
   e. Examples: document clustering, dimensionality reduction, feature selection (sort of), data visualization, language modeling, etc.

### A simple classification example

| Length | Type |
|---|---|
| 179 cm | Ski |
| 176 cm | Ski |
| 174 cm | Ski |
| 169 cm | Ski |
| 162 cm | Ski |
| 160 cm | Ski |
| 156 cm | Ski |
| 152 cm | Ski |
| 158 cm | Snowboard |
| 155 cm | Snowboard |
| 151 cm | Snowboard |
| 150 cm | Snowboard |
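One simple learner for a table like this picks a cutoff length and predicts "ski" above it. A brute-force sketch of choosing that cutoff by minimizing training error (the usage data below are made-up, separable toy numbers, not the table above):

```python
def fit_threshold(lengths, labels):
    """Pick the data-point threshold minimizing training error, by brute force."""
    best_theta, best_errs = None, len(labels) + 1
    for theta in sorted(lengths):            # only data points matter as candidate cutoffs
        errs = sum((("ski" if x >= theta else "snowboard") != y)
                   for x, y in zip(lengths, labels))
        if errs < best_errs:
            best_theta, best_errs = theta, errs
    return best_theta
```

This is "learning" in exactly the sense described in these notes: the hypothesis class is fixed (threshold rules) and only the parameter is fit to data.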
What rule would you use to make a prediction about type, given length? Why not just memorize? For an example like this, it makes sense to consider a linear threshold predictor:

$$\text{type} = \begin{cases} \text{ski} & \text{length} \ge \theta \\ \text{snowboard} & \text{length} < \theta \end{cases} \tag{1}$$

Task: choose $\theta$. This is learning. There's no theoretical reason we couldn't have used a quadratic threshold predictor:

$$\text{type} = \begin{cases} \text{ski} & \text{length} + \alpha \times \text{length}^2 \ge \theta \\ \text{snowboard} & \text{length} + \alpha \times \text{length}^2 < \theta \end{cases} \tag{2}$$

New task: choose $\theta$ and $\alpha$. This is also learning. Many other options are possible.

### A simple (perhaps unrealistic) regression example

| Square footage | Price |
|---|---|
| 1340 | 125 |
| 1390 | 105 |
| 1400 | 130 |
| 1420 | 135 |
| 1500 | 145 |
| 1550 | 160 |
| 1700 | 155 |
| 1900 | 140 |
| 2150 | 130 |
| 2300 | 135 |

Linear prediction model:

$$\text{price} \approx \theta_0 + \theta_1 \times \text{sqft} \tag{3}$$

Quadratic prediction model:

$$\text{price} \approx \theta_0 + \theta_1 \times \text{sqft} + \theta_2 \times \text{sqft}^2 \tag{4}$$

$K$th-order prediction model:

$$\text{price} \approx \theta_0 + \theta_1 \times \text{sqft} + \cdots + \theta_K \times \text{sqft}^K \tag{5}$$

$$= \sum_{k=0}^{K} \theta_k \times \text{sqft}^k \tag{6}$$

What $K$ is preferable?

1. Too low a $K$ ← underfitting.
2. Too high a $K$ ← overfitting.

Regularization! Often tuned with cross-validation, development data, etc.

### Learning theory

Can I guarantee how well I will do in prediction? A learning algorithm $A$ takes training data $(x_1, y_1), \ldots, (x_N, y_N)$ and makes predictions based on it. Can we make any guarantees about how well $A$ will do at this task? The analyses are statistical: what is the probability that $A$ will err? Or, what is its expected error? Is there a best $A$?

### What is this course all about?

1. How to identify learning problems.
2. How to specify a model.
3. How to find good values for parameters in the model.
4. How to select good features to enable learning.
5. How to evaluate the performance of learned systems.
6. How to transform learning problems.
7. How to evaluate the performance of learning algorithms.

---

# Collaborative Filtering

Hal Daumé III (me@hal3.name) — School of Computing, University of Utah — CS 5350

## What do these have in common?

[Slide figure: logos of Amazon, Netflix ("Watch Your Movies & DVDs Instantly"), and Digg.]

## The NetFlix problem

- Given $N$ users and $D$ movies, with ratings on some small fraction of (user, movie) pairs, predict the rating (1–5) of an arbitrary user on an arbitrary movie.
- Real goal: maximize profit. Approximate goal: minimize mean squared error.

## The NetFlix challenge

- Beat the current NetFlix system by 10%.
- Reward: $1M.
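The $K$th-order polynomial models above can be fit by ordinary least squares on a Vandermonde matrix; a minimal NumPy sketch (the function names are invented, and the toy data in the test are not the housing table):

```python
import numpy as np

def fit_poly(x, y, K):
    """Least-squares fit of y ≈ theta_0 + theta_1*x + ... + theta_K*x^K."""
    X = np.vander(np.asarray(x, float), K + 1, increasing=True)   # columns: 1, x, ..., x^K
    theta, *_ = np.linalg.lstsq(X, np.asarray(y, float), rcond=None)
    return theta

def predict_poly(theta, x):
    """Evaluate the fitted polynomial at a new point."""
    return sum(t * x ** k for k, t in enumerate(theta))
```

Raising $K$ until the curve threads every training point is exactly the overfitting failure mode described above; on raw square-footage values the high powers are also numerically ill-conditioned, which is one practical reason to prefer small $K$ or rescaled inputs.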
## Recommendations via regression

[Slide figures: a sparsely observed users × movies ratings matrix (entries 1–5, most missing), with one missing entry highlighted as the prediction target.]

What features could we use to make this prediction?

## An initial attempt

[Slide figures: hand-designed per-movie "genre" values and per-user preference weights over those genres.]

## Initial predictions

$$\mathrm{Score}(n, d) = \sum_k U_{nk} V_{kd}$$

where $U_{nk}$ is user $n$'s preference for genre $k$, and $V_{kd}$ is the degree to which movie $d$ belongs to genre $k$.

## What's wrong?

- What are the right genres? How many genres?
- How does this generalize to Digg/Amazon/Match.com?
- How do we find the $U$ values?
- Labor-intensive creation of the $V$ values.

## A more generic approach

Think of $U$ as an $N \times K$ matrix and $V$ as a $K \times D$ matrix:

$$\mathrm{Score}(n, d) = \sum_{k=1}^{K} U_{nk} V_{kd}, \qquad \mathrm{Score} = U V$$

Instantiated for NetFlix: users × genres and genres × movies. For Amazon: users × genres and genres × books. For Digg: users × topics and topics × stories. For Match.com: men × qualities and qualities × women.

## How many factors can there be?

- What happens as $K$ approaches 1?
- What happens as $K$ approaches $D$?

## The linear algebra approach

- Suppose $X$ were completely known. Task: find $U, V$ such that $X \approx UV$ and $U, V$ have rank $K$. This is essentially a singular value decomposition problem.
- Now suppose $X$ is not completely known. Task: find $U, V$ such that the observed entries of $X$ match $UV$ and $U, V$ have rank $K$. This is a difficult optimization problem: iterate between finding $U$ and finding $V$.

## It's all about regularization

- A solution with $K = D$ will be uninteresting. Why?
- $K$ acts as a regularizer: it prevents overfitting by limiting the complexity of the model.
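The "iterate between finding $U$ and $V$" optimization above can also be approximated by stochastic gradient descent on just the observed entries. A minimal hypothetical sketch (interface and hyperparameters are assumptions of this sketch, and it is not the SDP/trace-norm method mentioned later):

```python
import numpy as np

def factorize(ratings, N, D, K, steps=5000, lr=0.05, reg=0.0, seed=0):
    """Fit X ≈ U V from observed (user, item, rating) triples by SGD on squared error."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((N, K))
    V = 0.1 * rng.standard_normal((K, D))
    for _ in range(steps):
        for n, d, x in ratings:
            err = U[n] @ V[:, d] - x              # signed prediction error on this entry
            gU = err * V[:, d] + reg * U[n]       # gradient w.r.t. user factors
            gV = err * U[n] + reg * V[:, d]       # gradient w.r.t. item factors
            U[n] -= lr * gU
            V[:, d] -= lr * gV
    return U, V
```

Once fitted, the product $UV$ fills in the unobserved entries — which is exactly the recommendation step; a small `reg` plays the complexity-limiting role the slides attribute to choosing a small $K$.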
## Infinite linear-algebra matrices

- As $K$ approaches infinity, bad things happen.
- We need another way of constraining complexity.
- Idea: don't let any entry of $U$ or $V$ get too big. This results in a semidefinite programming problem using the trace norm.

## The probabilistic approach

Pretend $X$ arose because $U$ and $V$ were generated from some distribution, multiplied, and then some noise was added:

$$\Pr(U, V \mid X) = \frac{\Pr(U) \Pr(V) \Pr(X \mid U, V)}{\Pr(X)}$$

Find $U, V$ that maximize their joint probability.

## Regularization as prior beliefs

So long as $\Pr(U)$ and $\Pr(V)$ favor small matrices, we'll be okay:

$$\Pr(U, V \mid X) \propto \Pr(U) \Pr(V) \, \Pr(X \mid U, V)$$

## Infinite probabilistic matrices

- Let $U$ be generated by a Beta process; intuitively, this follows the Indian buffet model.
- Inference over potentially infinite matrices is possible because only a finite portion is active at any time.

## Does it work?

- Team BellKor just won $50k using these techniques.

## Discussion

- Recommender systems are everywhere; the best use fancy machine learning techniques.
- Applications abound, e.g. in biology.
- You could win $1M!
