ST Prog Analy & Mechanization (CS 591) — Class Notes
Decision trees and empirical methodology (W Sec. 4.3, 5.1, 5.4)

Review
- Want to find/replicate a target function f
- Candidates come from a hypothesis space H
- Best candidate measured by accuracy (for the moment)
- Decision trees are built by greedy recursive search
- Produces a piecewise-constant, axis-orthogonal (hyperrectangular) model
- Can handle continuous or categorical attributes
- Only categorical class labels
- Learning bias toward small, well-balanced trees

Splitting criteria
- What properties do we want our getBestSplitAttribute function to have?
  - Increase the purity of the data: after the split, the new sets should be closer to uniform labeling than before the split
  - Want the subsets to have roughly the same purity
  - Want the subsets to be as balanced as possible
- These choices are designed to produce small trees
- Definition: learning bias is the tendency to find one class of solution out of H in preference to another

Entropy
- We'll use entropy
- Consider a set of true/false labels: Y = {true, false, false, true, ...}
- Want our measure to be small when the set is pure (all true or all false) and large when the set is almost evenly split between the classes
- Expresses the amount of information in the set
- Later we'll use the negative of this function, so it'll be better if the set is almost pure

Entropy (cont'd)
- Define class fractions (aka class prior probabilities):
  p_true = (# true examples) / (total # examples)
  p_false = (# false examples) / (total # examples) = 1 - p_true
- Define the entropy of a set: E[Y] = -p_true ln(p_true) - p_false ln(p_false)
- In general, for classes C_1, ..., C_k: E[Y] = -Σ_{i=1}^{k} p_{C_i} ln(p_{C_i})

The entropy curve
[Figure: entropy of a two-class set plotted against p(X = 1) for p in [0, 1]; zero at the endpoints, maximum at p = 0.5.]

Entropy of a split
- A split produces a number of sets, one for each branch
- Need a corresponding entropy of a split, i.e., the entropy of a collection of sets
- Definition: the entropy of splitting Y on attribute a into subsets Y_{a_1}, ..., Y_{a_k} is the size-weighted average of the subset entropies:
  E_split(X, Y, a) = E[Y_{a_1}, Y_{a_2}, ..., Y_{a_k}] = Σ_{i=1}^{k} (|Y_{a_i}| / |Y|) E[Y_{a_i}]

Information gain
- The last easy step
- Want to pick the attribute that decreases the information content of the data as much as possible
- Q: Why decrease?
- Define the gain of splitting data set (X, Y) on attribute a:
  gain(X, Y, a) = E[Y] - E_split(X, Y, a) = E[Y] - Σ_{i=1}^{k} (|Y_{a_i}| / |Y|) E[Y_{a_i}]

Final algorithm
- Now we have a complete algorithm for the getBestSplitAttribute function (a runnable sketch appears at the end of these notes):
  Input: InstanceSet X, LabelSet Y
  Output: Attribute
  baseInfo = entropy(Y)
  foreach a in X.attributes:
      X_1..X_k, Y_1..Y_k = splitData(X, Y, a)
      gain[a] = baseInfo - splitEntropy(Y_1, ..., Y_k)
  return argmax_a gain[a]

DTs in practice
- Growing to purity is bad (overfitting)
  [Figure, repeated across several slide builds: scatter plot of two classes (o and x) over petal length (x1) and sepal width (x2), illustrating how growing to purity overfits the data.]
- Remedies:
  - Terminate growth early
  - Grow to purity, then prune back
- Multiway splits are a pain
  - Entropy is biased in favor of more splits
  - Correct with the gain ratio
- Real-valued attributes
  - Rules of the form "if x1 < 3.4"
  - How to pick the 3.4? (see the threshold-selection sketch at the end of these notes)
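
Appendix: runnable sketch of getBestSplitAttribute

The pseudocode in the "Final algorithm" slide is language-agnostic. Below is a minimal Python sketch of the same procedure; the function names mirror the pseudocode (entropy, splitData, splitEntropy, getBestSplitAttribute), but the data representation — instances as dicts mapping attribute names to categorical values, labels as a parallel list — is an assumption made here for illustration, not something specified in the notes.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """E[Y] = -sum_i p_i * ln(p_i), where p_i are the class fractions of `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def split_data(X, Y, a):
    """Partition the labels Y by the value each instance takes on attribute a."""
    groups = defaultdict(list)
    for x, y in zip(X, Y):
        groups[x[a]].append(y)
    return list(groups.values())

def split_entropy(subsets):
    """Size-weighted average entropy of a collection of label subsets."""
    total = sum(len(s) for s in subsets)
    return sum((len(s) / total) * entropy(s) for s in subsets)

def get_best_split_attribute(X, Y, attributes):
    """Return the attribute with maximum information gain = E[Y] - E_split."""
    base_info = entropy(Y)
    gains = {a: base_info - split_entropy(split_data(X, Y, a)) for a in attributes}
    return max(gains, key=gains.get)

# Toy usage (hypothetical data): "outlook" perfectly predicts the label,
# "windy" carries no information, so "outlook" is selected.
X = [{"outlook": "sunny", "windy": True},
     {"outlook": "sunny", "windy": False},
     {"outlook": "rain",  "windy": True},
     {"outlook": "rain",  "windy": False}]
Y = [False, False, True, True]
print(get_best_split_attribute(X, Y, ["outlook", "windy"]))  # -> "outlook"
```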
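Appendix: picking the threshold for a real-valued attribute

The notes leave "how to pick the 3.4?" as an open question. One common answer (not stated in the notes, so treat it as one reasonable choice rather than the course's method) is to sort the attribute's values, take midpoints between consecutive distinct values as candidate thresholds, and keep the candidate whose binary split "value < threshold" has the largest information gain. This sketch reuses entropy and split_entropy from the previous block.

```python
def best_threshold(values, Y):
    """Return (threshold, gain) maximizing information gain for the split
    `value < threshold`; candidates are midpoints between consecutive
    distinct sorted values. Returns (None, -1.0) if all values are equal."""
    base_info = entropy(Y)
    pairs = sorted(zip(values, Y))
    best = (None, -1.0)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [y for v, y in pairs if v < t]
        right = [y for v, y in pairs if v >= t]
        gain = base_info - split_entropy([left, right])
        if gain > best[1]:
            best = (t, gain)
    return best

# Toy usage (hypothetical data): the classes separate cleanly at 2.5,
# so that midpoint wins with gain = ln 2.
print(best_threshold([1.0, 2.0, 3.0, 4.0], [False, False, True, True]))
```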