
# Machine Learning CS 5350


These 6-page class notes were uploaded by Marian Kertzmann DVM on Monday, October 26, 2015. The notes belong to CS 5350 at the University of Utah, taught by Staff in Fall. Since their upload, they have received 31 views. For similar materials, see /class/229986/cs-5350-university-of-utah in Computer Science at the University of Utah.


Date Created: 10/26/15

## Machine Learning CS 5350/CS 6350, 05 Apr 2007: Bayesian Inference II

The key problem with the Monte Carlo techniques we have seen is that they require a proposal distribution $q$ that is good *everywhere*. What we'd like is a proposal distribution that is good *locally*. Of course, the question is: locally to what? In Markov chain Monte Carlo (MCMC) techniques, we maintain a random walk over parameter space. That is, instead of drawing a bunch of samples $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(R)}$ independently from a proposal distribution, we draw $\theta^{(r)}$ conditioned on $\theta^{(r-1)}$. This introduces a few new problems (samples are no longer independent) but will enable us to define a better proposal distribution.

### Metropolis–Hastings

Let $q(\hat\theta \mid \theta)$ be a proposal distribution. MH works as follows:

1. Initialize $\theta^{(0)}$.
2. For $r = 1, \ldots, R-1$:
   1. Draw $\hat\theta$ according to $q(\hat\theta \mid \theta^{(r)})$.
   2. Compute the acceptance probability
      $$a = \min\left\{ 1,\ \frac{p(\hat\theta)\, q(\theta^{(r)} \mid \hat\theta)}{p(\theta^{(r)})\, q(\hat\theta \mid \theta^{(r)})} \right\}$$
   3. With probability $a$, set $\theta^{(r+1)} = \hat\theta$; otherwise set $\theta^{(r+1)} = \theta^{(r)}$.

The idea behind the acceptance probability is as follows. We want to move to $\hat\theta$ if it has high probability under $p$; we want to stay at $\theta^{(r)}$ if *it* has high probability under $p$. If $q$ unnecessarily favors $\hat\theta$, we don't want to move there; if it unnecessarily favors $\theta^{(r)}$, we want to move away.

### Gibbs sampling

Gibbs sampling is a bit different from the other sampling algorithms we've talked about. First, it only works in very particular cases: pretty much, if you've constrained yourself to conjugate distributions, it works. It also only works for multivariate distributions. Let's say our parameters are a vector $\theta = \langle \theta_1, \ldots, \theta_D \rangle$. We'll write $\theta_{-d}$ for $\theta$ without position $d$, namely $\theta_{-d} = \langle \theta_1, \ldots, \theta_{d-1}, \theta_{d+1}, \ldots, \theta_D \rangle$. Now we have to assume that we can sample directly from the distribution $p(\theta_d \mid \theta_{-d})$ for all $d$. While this seems strong, it's actually not that unheard of; we'll shortly see an example. The Gibbs sampler works as follows:

1. Initialize $\theta^{(1)}$.
2. For $r = 2, \ldots, R$:
   1. Set $\theta^{(r)} = \theta^{(r-1)}$.
   2. For each dimension $d$: sample $\theta_d^{(r)} \sim p(\theta_d \mid \theta_{-d}^{(r)})$.
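As a concrete illustration of the MH loop above, here is a minimal sketch using a symmetric Gaussian random-walk proposal (so the $q$ terms in the acceptance ratio cancel) on a standard-normal target; both of these choices are illustrative assumptions, not part of the notes.

```python
import math
import random

def metropolis_hastings(log_p, theta0, n_samples, step=0.5):
    """Random-walk MH: propose theta_hat ~ N(theta, step^2) and accept with
    probability min(1, p(theta_hat)/p(theta)); the proposal is symmetric,
    so the q terms in the acceptance ratio cancel."""
    theta = theta0
    samples = []
    for _ in range(n_samples):
        theta_hat = theta + random.gauss(0.0, step)
        # Accept/reject in log space to avoid numerical under/overflow.
        if math.log(random.random()) < log_p(theta_hat) - log_p(theta):
            theta = theta_hat  # accept: move to the proposal
        samples.append(theta)  # on rejection we stay at the current theta
    return samples

random.seed(0)
# Illustrative target: a standard normal, log-density up to a constant.
chain = metropolis_hastings(lambda t: -0.5 * t * t, theta0=0.0, n_samples=20000)
mean = sum(chain) / len(chain)  # should be close to 0
```

Note that rejected proposals still contribute a (repeated) sample; dropping them would bias the chain.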
## Latent Dirichlet Allocation

We'll now explore one particular model: latent Dirichlet allocation (LDA). LDA is a probabilistic model of text. It posits that a document is composed of a mixture of topics. E.g., we might have a sports topic, an entertainment topic, a tech topic, and a science topic; something about Blu-ray might be a mix of tech and science. The model is formally specified as follows. We have a vocabulary of $V$ words and $K$ topics. For each topic $k$ there is a multinomial $\beta_k$ over the $V$ words; the $\beta$'s have a symmetric Dirichlet prior with concentration $\eta$. The corpus contains $D$ documents. Each word $w_{dn}$ in document $d$ is assigned a discrete latent variable $z_{dn}$ that specifies which topic this word comes from. Documents themselves have a mixture parameter $\theta_d$, a $K$-dimensional multinomial with a symmetric Dirichlet prior with concentration $\alpha$. The generative model is as follows:

1. Choose $\alpha \sim \mathrm{Uni}(0, 10)$.
2. For each topic $k = 1, \ldots, K$: choose $\beta_k \sim \mathrm{Dir}(\eta, \ldots, \eta)$.
3. For each document $d = 1, \ldots, D$:
   1. Choose $\theta_d \sim \mathrm{Dir}(\alpha, \ldots, \alpha)$.
   2. For each word $n = 1, \ldots, N$:
      1. Choose a topic $z_{dn} \sim \mathrm{Mult}(\theta_d)$.
      2. Choose a word $w_{dn} \sim \mathrm{Mult}(\beta_{z_{dn}})$.

Or, written as a hierarchical model:

$$\begin{aligned}
\alpha &\sim \mathrm{Uni}(0, 10) \\
\beta_k \mid \eta &\sim \mathrm{Dir}(\eta, \ldots, \eta) \\
\theta_d \mid \alpha &\sim \mathrm{Dir}(\alpha, \ldots, \alpha) \\
z_{dn} \mid \theta_d &\sim \mathrm{Mult}(\theta_d) \\
w_{dn} \mid z_{dn}, \beta &\sim \mathrm{Mult}(\beta_{z_{dn}})
\end{aligned}$$

It turns out we can construct a simple Gibbs sampler for this model. A useful fact for Gibbs samplers is that in a graphical model, $p(\theta_d \mid \theta_{-d})$ depends only on the Markov blanket of $d$: $d$'s parents, $d$'s children, and the parents of $d$'s children. Using the Markov blanket we can easily see that $\alpha$ depends only on the $\theta$'s, $\beta_k$ depends only on $\eta$, the $w$'s and the $z$'s, etc. We get the following Gibbs updates:

$$\begin{aligned}
p(\alpha \mid \ldots) &\propto \mathrm{Uni}(\alpha \mid 0, 10) \prod_d \mathrm{Dir}(\theta_d \mid \alpha) \\
p(\beta_k \mid \ldots) &\propto \mathrm{Dir}(\beta_k \mid \eta) \prod_d \prod_n \mathrm{Mult}(w_{dn} \mid \beta_k)^{\mathbf{1}[z_{dn} = k]} \\
p(\theta_d \mid \ldots) &\propto \mathrm{Dir}(\theta_d \mid \alpha) \prod_n \mathrm{Mult}(z_{dn} \mid \theta_d) \\
p(z_{dn} \mid \ldots) &\propto \mathrm{Mult}(z_{dn} \mid \theta_d)\, \mathrm{Mult}(w_{dn} \mid \beta_{z_{dn}})
\end{aligned}$$

Now we can apply conjugacy to obtain closed-form expressions for most of these:

$$p(\beta_k \mid \ldots) = \mathrm{Dir}\!\left(\beta_k \;\middle|\; \eta + \textstyle\sum_d \sum_n \mathbf{1}[z_{dn} = k, w_{dn} = 1],\ \ldots,\ \eta + \textstyle\sum_d \sum_n \mathbf{1}[z_{dn} = k, w_{dn} = V]\right)$$

$$p(\theta_d \mid \ldots) = \mathrm{Dir}\!\left(\theta_d \;\middle|\; \alpha + \textstyle\sum_n \mathbf{1}[z_{dn} = 1],\ \ldots,\ \alpha + \textstyle\sum_n \mathbf{1}[z_{dn} = K]\right)$$

For handling $z_{dn}$ we can just compute the probabilities of all $K$ possible values and sample, since $z$ is discrete. For $\alpha$ we can use numeric optimization techniques.
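One sweep of these Gibbs updates can be sketched as follows, assuming documents are given as lists of integer word ids; the Dirichlet draws use the standard Gamma-normalization trick, and the tiny corpus at the bottom is a made-up example. This is an illustrative uncollapsed sampler, not code from the notes.

```python
import random

def dirichlet(alphas):
    """Draw from Dir(alphas) by normalizing independent Gamma(a, 1) draws."""
    g = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

def gibbs_sweep(docs, z, theta, beta, K, V, alpha=1.0, eta=0.1):
    """One sweep: resample every z_dn, then every theta_d, then every beta_k,
    each from its conditional distribution."""
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            # p(z_dn = k | ...) is proportional to theta_d[k] * beta_k[w]
            weights = [theta[d][k] * beta[k][w] for k in range(K)]
            z[d][n] = random.choices(range(K), weights=weights)[0]
    for d, doc in enumerate(docs):
        # theta_d ~ Dir(alpha + count of topic k among z_d)
        counts = [alpha] * K
        for zdn in z[d]:
            counts[zdn] += 1
        theta[d] = dirichlet(counts)
    for k in range(K):
        # beta_k ~ Dir(eta + count of word v assigned to topic k)
        counts = [eta] * V
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                if z[d][n] == k:
                    counts[w] += 1
        beta[k] = dirichlet(counts)

# Tiny synthetic corpus: 2 documents over a 4-word vocabulary, K = 2 topics.
random.seed(0)
K, V = 2, 4
docs = [[0, 1, 0, 2], [3, 2, 3, 1]]
z = [[random.randrange(K) for _ in doc] for doc in docs]
theta = [dirichlet([1.0] * K) for _ in docs]
beta = [dirichlet([1.0] * V) for _ in range(K)]
for _ in range(50):
    gibbs_sweep(docs, z, theta, beta, K, V)
```

Here $\alpha$ is held fixed rather than resampled, since (as noted above) the uniform prior on the concentration has no conjugate update.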
## Machine Learning CS 5350/CS 6350, Mar 2007: Sequence Labeling

We now return to supervised learning for a bit. However, we are interested in solving problems whose output has structure. We will begin with the simplest nontrivial structure: sequences. There are lots of problems from language and biology that fit this paradigm. Sequence labeling is the following problem: we get a sequence of $M$ inputs and want to produce a sequence of $M$ labels. Training data will be $N$ such input/output pairs. Example: "The/det man/noun ate/verb a/det sandwich/noun with/prep mayo/noun". We get the sentence and want to produce the part-of-speech labels. We will usually talk about a label set $\mathcal{Y}$ of size $L$.

Several non-options:

1. Treat it as a classification problem into $L^M$ classes. Doesn't work: too many classes, and $M$ is not constant.
2. Treat it as $M$ separate $L$-class classification problems. Works okay, but doesn't reflect structure (e.g., it is unlikely to have a verb follow a det).

We adopt a Markov model. This model makes two assumptions:

1. The value of a word depends only on its label, and not on surrounding labels or words.
2. Conditioned on the $K$ previous labels, the current label is independent of all others.

Assumption 2 is called a $K$th-order Markov assumption. The generative model looks something like:

1. Choose a label $y_1$.
2. Generate input $x_1$ conditioned on $y_1$.
3. For $m = 2, \ldots, M$:
   1. Choose a label $y_m$ conditioned on $y_{m-1}, \ldots, y_{m-K}$.
   2. Generate input $x_m$ conditioned on $y_m$.

For simplicity we will stick with $K = 1$; the extension is straightforward. For this we need three ingredients:

1. Initial state probabilities $\pi$: a vector of length $L$, all entries positive, summing to 1. These are used to choose $y_1$; $\pi_l$ is the probability that $y_1 = l$.
2. Emission probabilities $\beta$: a matrix of size $L \times V$, where $V$ is the size of the vocabulary; $\beta_{l,v}$ is the probability of emitting word $v$ given that the label is $l$.
3. Transition probabilities $a$: a matrix of size $L \times L$, where $a_{p,l}$ is the probability of transitioning to state $l$ given that we were in state $p$ ($p$ for "previous").
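To make the $K = 1$ generative story concrete, here is a sketch that samples an $(x, y)$ pair from toy parameters; the labels, vocabulary, and all probability values are made-up assumptions for illustration.

```python
import random

labels = ["det", "noun", "verb"]
vocab = ["the", "man", "ate"]
pi = {"det": 0.8, "noun": 0.1, "verb": 0.1}  # pi[l] = p(y_1 = l)
a = {  # a[p][l] = p(y_m = l | y_{m-1} = p)
    "det": {"det": 0.05, "noun": 0.90, "verb": 0.05},
    "noun": {"det": 0.10, "noun": 0.20, "verb": 0.70},
    "verb": {"det": 0.60, "noun": 0.30, "verb": 0.10},
}
beta = {  # beta[l][v] = p(x_m = v | y_m = l)
    "det": {"the": 0.98, "man": 0.01, "ate": 0.01},
    "noun": {"the": 0.02, "man": 0.88, "ate": 0.10},
    "verb": {"the": 0.02, "man": 0.08, "ate": 0.90},
}

def sample_sequence(M):
    """Sample (x, y): draw y_1 from pi, then each y_m | y_{m-1} from a,
    and each x_m | y_m from beta."""
    y = random.choices(labels, weights=[pi[l] for l in labels])
    for m in range(1, M):
        y += random.choices(labels, weights=[a[y[-1]][l] for l in labels])
    x = [random.choices(vocab, weights=[beta[l][v] for v in vocab])[0] for l in y]
    return x, y

random.seed(0)
x, y = sample_sequence(5)
```

Note that each input $x_m$ is drawn using only its own label $y_m$, exactly as assumption 1 above requires.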
Putting this all together, the joint probability of an input sequence $x$ and state sequence $y$, both of length $M$, is

$$p(x, y \mid \pi, a, \beta) = \pi_{y_1}\, \beta_{y_1, x_1} \prod_{m=2}^{M} a_{y_{m-1}, y_m}\, \beta_{y_m, x_m}$$

It becomes frustrating to have to separate out the first element all the time. So what we'll do instead is define $y_0$ to be a state with a unique label $\circ$ and define $a_{\circ, l} = \pi_l$. Then we get

$$p(x, y \mid a, \beta) = \prod_{m=1}^{M} a_{y_{m-1}, y_m}\, \beta_{y_m, x_m}$$

Given data, we can estimate $a$ and $\beta$ exactly as in the naive Bayes classification model:

$$a_{p,l} = \frac{\#\ \text{of times}\ l\ \text{follows}\ p}{\#\ \text{of times}\ p\ \text{occurs}} \qquad \beta_{l,v} = \frac{\#\ \text{of times}\ v\ \text{has label}\ l}{\#\ \text{of times}\ l\ \text{occurs}}$$

We can generalize this to a simple naive Bayes model over emissions, rather than the "label emits word" part; i.e., if $x_m$ is itself a bunch of features $\langle x_{m,1}, \ldots, x_{m,D} \rangle$, then $\beta_l$ becomes a vector of length $D$ instead of $V$, but everything else remains the same.

The key problem in such models at test time is: given a word sequence $x$, find the best label sequence $y$. This question is somewhat ill-posed; there are many possible interpretations of "best":

1. The output sequence $y$ that maximizes $p(x, y \mid a, \beta)$.
2. The output sequence $y$ that maximizes $\prod_m p(x, y_m \mid a, \beta)$.

Typically people consider the first because it is parsimonious. The problem is most easily considered by picturing a lattice of possible states, with connections between adjacent time steps. We ask ourselves: what is the probability that the most likely path passes through state $l$ at time $m$? We store this as $A_{m,l}$ (leaving off the conditioning on $a$ and $\beta$):

$$A_{m,l} = \max_{y_{1:m-1}} p(y_{1:m-1}, x_{1:m-1}, y_m = l)$$

We can compute this probability recursively:

$$\begin{aligned}
A_{m+1,l} &= \max_{y_{1:m}} p(y_{1:m}, x_{1:m}, y_{m+1} = l) \\
&= \max_p \max_{y_{1:m-1}} p(y_{1:m-1}, x_{1:m-1}, y_m = p)\; p(x_m, y_{m+1} = l \mid y_{1:m-1}, x_{1:m-1}, y_m = p) \\
&= \max_p \max_{y_{1:m-1}} p(y_{1:m-1}, x_{1:m-1}, y_m = p)\; p(x_m, y_{m+1} = l \mid y_m = p) \\
&= \max_p \left[ \max_{y_{1:m-1}} p(y_{1:m-1}, x_{1:m-1}, y_m = p) \right] p(x_m, y_{m+1} = l \mid y_m = p) \\
&= \max_p A_{m,p}\; p(x_m, y_{m+1} = l \mid y_m = p) \\
&= \max_p A_{m,p}\; p(y_{m+1} = l \mid y_m = p)\; p(x_m \mid y_m = p) \\
&= \max_p A_{m,p}\; a_{p,l}\; \beta_{p, x_m}
\end{aligned}$$

This analysis leads to an efficient $\Theta(ML^2)$ dynamic programming algorithm for finding the most likely state sequence.
The only extra thing we need to do is, for each position $(m+1, l)$, store a backpointer to where we came from, i.e., to the value of $p$ that maximized the product. We store these in a $\zeta$ matrix:

1. Initialize $A_{0,\circ} = 1$ and $A_{0,l} = 0$ for $l \neq \circ$ (remember $\circ$ is our unique start state).
2. For $m = 0, \ldots, M$:
   1. Compute $A_{m+1,l} = \max_p A_{m,p}\, a_{p,l}\, \beta_{p,x_m}$ (for $m = 0$ there is no input $x_0$, so the emission factor is omitted).
   2. Store $\zeta_{m+1,l} = \operatorname{argmax}_p A_{m,p}\, a_{p,l}\, \beta_{p,x_m}$.

Now all we need to do is extract the most likely state sequence from the lattice. We do this by starting at the end and working our way backward. I.e., first we compute the most likely state to be in at time $M+1$: that is $\operatorname{argmax}_l A_{M+1,l}$; call it $y_{M+1}$. Now we recurse backward: the label for time step $M$ is $\zeta_{M+1, y_{M+1}}$, which gives us $y_M$; the label for time step $M-1$ is $\zeta_{M, y_M}$, which gives us $y_{M-1}$. In general, we compute $y_m = \zeta_{m+1, y_{m+1}}$.
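The forward pass with backpointers and the backward extraction can be sketched as below. To keep array indexing simple, the sketch uses the equivalent convention in which the emission probability for $x_m$ is attached when *entering* state $y_m$ (i.e., $\beta_{l,x_m}$) rather than when leaving it; the toy labels and probabilities at the bottom are made-up assumptions.

```python
def viterbi(x, labels, pi, a, beta):
    """Find argmax_y p(x, y) for a first-order HMM by dynamic programming.
    A[m][l]    = score of the best path ending in label l at position m.
    zeta[m][l] = previous label on that best path (the backpointer)."""
    M = len(x)
    A = [{l: pi[l] * beta[l][x[0]] for l in labels}]
    zeta = [{}]
    for m in range(1, M):
        A.append({})
        zeta.append({})
        for l in labels:
            best_p = max(labels, key=lambda p: A[m - 1][p] * a[p][l])
            zeta[m][l] = best_p
            A[m][l] = A[m - 1][best_p] * a[best_p][l] * beta[l][x[m]]
    # Backward pass: pick the best final label, then follow backpointers.
    y = [max(labels, key=lambda l: A[M - 1][l])]
    for m in range(M - 1, 0, -1):
        y.append(zeta[m][y[-1]])
    y.reverse()
    return y

# Illustrative toy parameters (made up for this example).
labels = ["det", "noun"]
pi = {"det": 0.9, "noun": 0.1}
a = {"det": {"det": 0.1, "noun": 0.9}, "noun": {"det": 0.5, "noun": 0.5}}
beta = {"det": {"the": 0.9, "man": 0.1}, "noun": {"the": 0.1, "man": 0.9}}
print(viterbi(["the", "man"], labels, pi, a, beta))  # → ['det', 'noun']
```

Each position does a max over $L$ predecessors for each of $L$ labels, giving the $\Theta(ML^2)$ running time derived above; in practice the products are usually computed in log space to avoid underflow on long sequences.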
