# Machine Learning CS 5350

This 8-page set of class notes was uploaded by Marian Kertzmann DVM on Monday, October 26, 2015. The notes belong to CS 5350 at the University of Utah, taught by Staff in Fall. Since upload, the notes have received 25 views. For similar materials see /class/229986/cs-5350-university-of-utah in Computer Science at the University of Utah.

Date Created: 10/26/15

## Machine Learning CS 5350/CS 6350 — 4 Mar 2008: Clustering

Now begins our foray into unsupervised learning. Most unsupervised learning problems deal only with input data, typically considered as a matrix $X \in \mathbb{R}^{N \times D}$ over $N$ data points and $D$ dimensions; there are no labels.

Clustering is the problem of grouping data according to similarity. Typically one uses a distance metric to judge similarity, though this is not always required. A metric $d$ must satisfy:

1. $d(x, y) = 0$ iff $x = y$ (isolation)
2. $d(x, y) = d(y, x)$ (distance is symmetric)
3. $d(x, y) + d(y, z) \ge d(x, z)$ (triangle inequality)

If (1) doesn't hold, $d$ is a pseudometric. All of our favorite norms are metrics. Non-real metrics include things like string edit distance.

There are basically two types of clustering:

1. Hierarchical
2. Flat

Hierarchical clustering tries to build a tree over the data, with $N$ leaves (typically binary). Flat clustering just tries to find $K$ clusters of the data.

### Hierarchical Clustering

Bottom-up strategy:

1. Put every element into its own cluster.
2. Merge the two closest clusters.
3. If more than one cluster exists, go to step 2.

How do we measure "closest"? For individual data points, just use the metric $d$. But what is $d(\{x_1, x_2\}, \{x_3, x_4, x_5\})$? Three common linkages:

1. Min-link (aka single link): $d(R, S) = \min_{x_R \in R,\, x_S \in S} d(x_R, x_S)$
2. Max-link (aka complete link): $d(R, S) = \max_{x_R \in R,\, x_S \in S} d(x_R, x_S)$
3. Average-link: $d(R, S) = \frac{1}{|R|\,|S|} \sum_{x_R \in R,\, x_S \in S} d(x_R, x_S)$

These have very different properties: min-link results in chaining, max-link results in very round clusters, and average-link is an intermediary.

Top-down strategy:

1. Put every element into a single cluster.
2. Remove the outsider (or outsiders) from the group.
3. If a cluster of size greater than 1 exists, go to step 2.

Same story as bottom-up with linkage. Here, if we remove a single element at a time we get a funny (unbalanced) tree, and if we want to split clusters in half it can be very computationally expensive. Typically bottom-up is preferred.

### Flat Clustering

We want to break $N$ elements into $K$ groups. How should we choose the groups?

1. We want high intra-cluster similarity: $\sum_k \sum_{n, m \in k} d(x_n, x_m)$ should be low.
2. We want low inter-cluster similarity: $\sum_{k \ne c} \sum_{n \in k} \sum_{m \in c} d(x_n, x_m)$ should be high.

It is too hard to optimize a linear combination of these directly: there are $\frac{1}{K!} \sum_{k=1}^{K} (-1)^{K-k} \binom{K}{k} k^N$ possible clusterings, and for $N = 19$, $K = 4$ this is over $10^{10}$.

Idea: propose clusters, then iteratively fix errors.

1. Initialize cluster centers.
2. Assign each data point to the closest center.
3. Recompute the centers as the means of their assigned data points.
4. If any assignments have changed, go to step 2.

This is the k-means algorithm. It is highly sensitive to initialization but converges quite quickly, and there are various ways to speed it up as well. Ways of initializing the means:

1. Random points drawn from a reasonable input space.
2. Random data points.
3. First a random data point, then a second data point as far away as possible, then a third data point again as far away as possible, etc.

In general it's a good idea to do a few runs of k-means. The k-medoids algorithm is similar but forces each centroid to be a data point; this is advantageous for non-real-valued data.

Choosing the number of clusters:

1. Prior knowledge of the domain.
2. Visual inspection (in low $D$).
3. Try a bunch and use them somehow.
4. Look for the "elbow" in the plot of $k$ versus intra-cluster distance.
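The three linkage rules above can be sketched in a few lines of NumPy. This is an illustrative helper (the function name `linkage_distance` and the choice of Euclidean distance are mine, not from the notes), not an efficient agglomerative implementation:

```python
import numpy as np

def linkage_distance(R, S, link="single"):
    """Distance between clusters R and S (arrays of shape (|R|, D) and
    (|S|, D)) under single-, complete-, or average-link, using Euclidean
    distance as the underlying metric d."""
    # Pairwise distances d(x_R, x_S) for every x_R in R and x_S in S.
    pair = np.linalg.norm(R[:, None, :] - S[None, :, :], axis=-1)
    if link == "single":      # min-link: distance of the closest pair
        return pair.min()
    if link == "complete":    # max-link: distance of the farthest pair
        return pair.max()
    if link == "average":     # average-link: mean over all pairs
        return pair.mean()
    raise ValueError(f"unknown linkage: {link}")

# Two tiny 1-D-ish clusters embedded in the plane.
R = np.array([[0.0, 0.0], [1.0, 0.0]])
S = np.array([[3.0, 0.0], [5.0, 0.0]])
print(linkage_distance(R, S, "single"))    # 2.0 (closest pair: (1,0)-(3,0))
print(linkage_distance(R, S, "complete"))  # 5.0 (farthest pair: (0,0)-(5,0))
print(linkage_distance(R, S, "average"))   # 3.5 (mean of 3, 5, 2, 4)
```

On this toy pair of clusters you can see the chaining tendency directly: single link reports them as much closer than complete link does.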
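The four-step k-means loop is short enough to write out in full. Below is a minimal sketch using the "farthest point" initialization described above; the function name, the iteration cap, and the toy data are my own choices for illustration:

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    """Plain k-means on X (N x D): farthest-first initialization, then
    alternate assign/recompute until assignments stop changing."""
    rng = np.random.default_rng(seed)
    # Farthest-first init: one random data point, then repeatedly the
    # data point farthest from all centers chosen so far.
    centers = [X[rng.integers(len(X))]]
    while len(centers) < K:
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)

    assign = None
    for _ in range(iters):
        # Step 2: assign each point to its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(assign, new_assign):
            break  # step 4: assignments unchanged, so we have converged
        assign = new_assign
        # Step 3: recompute each center as the mean of its points.
        for k in range(K):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return centers, assign

# Two well-separated blobs: points 0,1 end up in one cluster, 2,3 in the other.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, assign = kmeans(X, K=2)
print(assign)
```

Note how the farthest-first initialization makes the sensitivity-to-initialization problem mild on data like this; with purely random centers, a bad draw can put both initial centers in the same blob.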
## Machine Learning CS 5350/CS 6350 — 09 Oct 2008: Maximum margin classifiers

The large margin principle: we want a $w$ that separates the classes maximally. We assume that $w$ separates the data and achieves a functional margin of at least 1; that is, $w^\top x + b \ge 1$ for all positive $x$ and $w^\top x + b \le -1$ for all negative $x$.

We define the geometric margin based on the normalized weight vector: $\gamma = \min_n y_n (u^\top x_n + b)$, where $u = w / \lVert w \rVert$.

Compute the margin as a function of $u$ for a positive point $x^+$ and a negative point $x^-$ that lie on the margin:

$$\gamma = \frac{1}{2}\left[(u^\top x^+ + b) - (u^\top x^- + b)\right] = \frac{1}{2\lVert w \rVert}\left(w^\top x^+ - w^\top x^-\right) = \frac{1}{2\lVert w \rVert}\left[(1 - b) - (-1 - b)\right] = \frac{1}{\lVert w \rVert}$$

This shows that the margin is inversely proportional to the norm of the weight vector and independent of the bias. Moreover, having a large margin is equivalent to having a small weight-vector norm (why does this make intuitive sense?). Now we write the learning problem as an optimization problem:

$$\text{minimize}_w \;\; \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_n (w^\top x_n + b) - 1 \ge 0 \;\; \forall n$$

Now we need to figure out how to solve this. Enter convex optimization and Lagrange theory. Introduce Lagrange multipliers $\alpha_{1:N}$, one for each constraint. This leads to the Lagrangian

$$L(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^2 - \sum_n \alpha_n \left[ y_n (w^\top x_n + b) - 1 \right]$$

Now we want to minimize $L(w, b, \alpha)$ with respect to both $w$ and $b$. Take derivatives:

$$\nabla_w L = w - \sum_n \alpha_n y_n x_n = 0 \;\;\Rightarrow\;\; w = \sum_n \alpha_n y_n x_n$$

$$\frac{\partial L}{\partial b} = -\sum_n \alpha_n y_n = 0 \;\;\Rightarrow\;\; \sum_n \alpha_n y_n = 0$$

So given $\alpha$, $w$ is determined; plug it back in to $L$:

$$L(\alpha) = \frac{1}{2}\sum_m \sum_n \alpha_m \alpha_n y_m y_n x_m^\top x_n - \sum_n \alpha_n y_n \Big(\sum_m \alpha_m y_m x_m\Big)^{\!\top} x_n - b \sum_n \alpha_n y_n + \sum_n \alpha_n = \sum_n \alpha_n - \frac{1}{2}\sum_m \sum_n \alpha_m \alpha_n y_m y_n x_m^\top x_n$$

So now just solve the dual:

$$\text{maximize}_\alpha \;\; \sum_n \alpha_n - \frac{1}{2}\sum_{n,m} y_n y_m \alpha_n \alpha_m x_n^\top x_m \quad \text{subject to} \quad \sum_n y_n \alpha_n = 0, \;\; \alpha_n \ge 0 \;\; \forall n$$

Then compute the bias:

$$b = -\frac{1}{2}\left[\max_{n:\, y_n = -1} w^\top x_n + \min_{n:\, y_n = +1} w^\top x_n\right]$$

This leads to a sparse solution: most $\alpha_n$ are zero. Why? The Karush-Kuhn-Tucker conditions say that at the optimum,

$$\alpha_n \left[ y_n (w^\top x_n + b) - 1 \right] = 0 \quad \forall n$$

This means $\alpha_n \ne 0$ only when the point is right on the margin. These points are the support vectors.

### Non-linearly-separable data

Introduce slack parameters: $\xi_n$ is how far $x_n$ is on the wrong side of its margin. Then

$$\text{minimize}_{w, \xi} \;\; \frac{1}{2}\lVert w \rVert^2 + C \sum_n \xi_n \quad \text{subject to} \quad y_n (w^\top x_n + b) - 1 + \xi_n \ge 0, \;\; \xi_n \ge 0 \;\; \forall n$$

High $C$ means "fit the data," while low $C$ means "have a large margin." Following the same dual formulation, we get

$$L(w, b, \xi, \alpha, r) = \frac{1}{2} w^\top w + C \sum_n \xi_n - \sum_n \alpha_n \left[ y_n (w^\top x_n + b) - 1 + \xi_n \right] - \sum_n r_n \xi_n$$

Differentiate:

$$\nabla_w L = w - \sum_n y_n \alpha_n x_n = 0 \;\;\Rightarrow\;\; w = \sum_n y_n \alpha_n x_n$$

$$\frac{\partial L}{\partial b} = \sum_n y_n \alpha_n = 0$$

$$\frac{\partial L}{\partial \xi_n} = C - \alpha_n - r_n = 0$$

Thus

$$L(\alpha) = \sum_n \alpha_n - \frac{1}{2}\sum_{n,m} y_n y_m \alpha_n \alpha_m x_n^\top x_m$$

which is the same as before, but now we have the constraints $\alpha_n \ge 0$, $r_n \ge 0$, and $C - \alpha_n - r_n = 0$. This means:

1. $\alpha_n \le C$
2. if $r_n = 0$ then $\alpha_n = C$

Geometrically, the support vectors are now also the noisy points. Why large margins? Because they mean simple solutions.
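The claim that a functional margin of 1 implies a geometric margin of $1/\lVert w \rVert$ is easy to check numerically. In this sketch the data and the hyperplane $(w, b)$ are made-up toy values, chosen by hand so the closest points sit exactly on the margin; nothing here comes from a solver:

```python
import numpy as np

# Toy linearly separable data in 2-D.
X = np.array([[2.0, 2.0], [3.0, 3.0],      # positive class
              [0.0, 0.0], [-1.0, -1.0]])   # negative class
y = np.array([1, 1, -1, -1])

# Hand-picked hyperplane: (2,2) and (0,0) land exactly on the margin.
w = np.array([0.5, 0.5])
b = -1.0

functional = y * (X @ w + b)          # y_n (w^T x_n + b), >= 1 for all n
geometric = functional / np.linalg.norm(w)

print(functional.min())               # 1.0: functional margin is exactly 1
print(geometric.min())                # geometric margin gamma
print(1 / np.linalg.norm(w))          # equals gamma, as the derivation says
```

Scaling $(w, b)$ by any positive constant leaves the classifier unchanged but rescales the functional margin, which is exactly why the derivation pins the functional margin to 1 and then minimizes $\lVert w \rVert$.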
