# Introduction to Artificial Intelligence CSCI 3202


These 72 pages of class notes were uploaded by Allie West II on Thursday, October 29, 2015. The notes belong to CSCI 3202 at the University of Colorado at Boulder, taught by Gregory Grudic in Fall. Since upload, they have received 47 views. For similar materials see /class/231983/csci-3202-university-of-colorado-at-boulder in Computer Science at the University of Colorado at Boulder.


## Lecture: Support Vector Machine (SVM) Classification

*Greg Grudic, Intro to AI*

### Last Class

- Linear separating hyperplanes for binary classification.
- Rosenblatt's Perceptron Algorithm: based on gradient descent. Convergence is theoretically guaranteed if the data is linearly separable, but there are an infinite number of solutions.
- For nonlinear data: map the data into a nonlinear space where it is linearly separable (or almost). However, convergence is still not guaranteed.

### Why Classification?

(Diagram: classification connects uncertainty and computation to symbols — the grounding problem, not typically addressed in CS — and feeds a decisions/planning agent.)

### The Problem Domain for Project/Test 1

Identifying and navigating paths. (Figure, 9/24/2007: an input image is passed through a Mahalanobis-distance-based classifier to produce a path-labeled image.)

### Today's Lecture Goals

- Support Vector Machine (SVM) classification: another algorithm for linear separating hyperplanes.
- A good text on SVMs: Bernhard Schölkopf and Alex Smola, *Learning with Kernels*, MIT Press, Cambridge, MA, 2002.

### SVM Classification

Classification as a problem of finding optimal *canonical* linear hyperplanes:

- in input space, or
- in kernel space (can be nonlinear).

### Linear Separating Hyperplanes

How many lines can separate these points? Which line should we use? Initial assumption: the data is linearly separable. A hyperplane splits the input space into a region where $\mathbf{w} \cdot \mathbf{x} + b > 0$ and a region where $\mathbf{w} \cdot \mathbf{x} + b < 0$.

Given data $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$, finding a separating hyperplane can be posed as a constraint satisfaction problem (CSP): for all $i \in \{1, \ldots, N\}$, find $\mathbf{w}$ and $b$ such that

$$\mathbf{w} \cdot \mathbf{x}_i + b \ge +1 \;\text{ if } y_i = +1, \qquad \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \;\text{ if } y_i = -1,$$

or equivalently

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0 \quad \forall i.$$

If the data is linearly separable, there are an infinite number of hyperplanes that satisfy this CSP.

### The Margin of a Classifier

- Take any hyperplane $P_0$ that separates the data.
- Put a parallel hyperplane $P_1$ on the point in class $+1$ closest to $P_0$.
- Put a second parallel hyperplane $P_2$ on the point in class $-1$ closest to $P_0$.
- The margin $M$ is the perpendicular distance between $P_1$ and $P_2$, measured along a line perpendicular to both.

Remember the signed distance from a point $\mathbf{x}$ to the hyperplane defined by $(\mathbf{w}, b)$:

$$d(\mathbf{x}) = \frac{\mathbf{w} \cdot \mathbf{x} + b}{\|\mathbf{w}\|}.$$

### SVM Constraints on the Model Parameters

The model parameters $(\mathbf{w}, b)$ must be chosen such that, for $\mathbf{x}_1$ on $P_1$ and $\mathbf{x}_2$ on $P_2$:

$$P_1: \; \mathbf{w} \cdot \mathbf{x}_1 + b = +1, \qquad P_2: \; \mathbf{w} \cdot \mathbf{x}_2 + b = -1.$$

For any $P_0$ these constraints are always attainable (this is the *canonical* form). The linear separating boundary then lies halfway between $P_1$ and $P_2$ and is given by $\mathbf{w} \cdot \mathbf{x} + b = 0$, with resulting classifier $\hat{y} = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x} + b)$.

### Calculating the Margin

Subtracting the two constraints gives $\mathbf{w} \cdot (\mathbf{x}_1 - \mathbf{x}_2) = 2$, so taking the absolute value yields the unsigned margin

$$M = |d(\mathbf{x}_1) - d(\mathbf{x}_2)| = \frac{2}{\|\mathbf{w}\|}.$$

Different choices of $P_0$ have different margins.

### How Do SVMs Choose the Optimal Separating Hyperplane?

Find the $(\mathbf{w}, b)$ that maximizes the margin $M = 2/\|\mathbf{w}\|$. Equivalently, given data $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$:

$$\text{minimize } \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0, \; i = 1, \ldots, N.$$

The Lagrange function formulation is used to solve this minimization problem.

### The Lagrange Function Formulation

For every constraint we introduce a Lagrange multiplier $\alpha_i \ge 0$. The Lagrangian is then defined by

$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right],$$

where the primal variables are $(\mathbf{w}, b)$ and the dual variables are $\alpha_1, \ldots, \alpha_N$. Goal: minimize the Lagrangian with respect to the primal variables and maximize it with respect to the dual variables.

### Derivation of the Dual Problem

At the saddle point (extremum with respect to the primal variables),

$$\frac{\partial L}{\partial b} = 0, \qquad \frac{\partial L}{\partial \mathbf{w}} = 0,$$

which gives the conditions

$$\sum_{i=1}^{N} \alpha_i y_i = 0, \qquad \mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i.$$

Substituting back into $L(\mathbf{w}, b, \boldsymbol{\alpha})$ gives the dual problem: maximize

$$W(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, (\mathbf{x}_i \cdot \mathbf{x}_j)$$

subject to $\alpha_i \ge 0$, $i = 1, \ldots, N$, and $\sum_{i=1}^{N} \alpha_i y_i = 0$.

### Properties of the Dual Problem

- Solving the dual gives a solution to the original constrained optimization problem.
- For SVMs, the dual is a quadratic optimization problem, which has a globally optimal solution.
- It gives insight into the *nonlinear* formulation of SVMs.

### Support Vector Expansion

$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i$$

- If $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) > 1$, then $\alpha_i = 0$ and $\mathbf{x}_i$ is irrelevant.
- If $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) = 1$, then $\mathbf{x}_i$ lies on the margin: it is a *support vector*.
- $b$ is also computed from the optimal dual variables $\alpha_i$.
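The canonical constraints and the margin formula above can be checked numerically. This is a minimal pure-Python sketch (not from the lecture; the toy data and function names are mine): it verifies $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0$ for every training point and computes $M = 2/\|\mathbf{w}\|$.

```python
import math

def margin(w):
    """Margin of a canonical separating hyperplane: M = 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

def satisfies_canonical_constraints(w, b, X, y):
    """Check y_i * (w . x_i + b) - 1 >= 0 for every training point."""
    return all(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) - 1 >= 0
        for xi, yi in zip(X, y)
    )

# Toy 2-D data (made up): class +1 on the right, class -1 on the left.
X = [(2.0, 1.0), (3.0, 2.0), (-2.0, -1.0), (-3.0, 0.0)]
y = [1, 1, -1, -1]

# A canonical hyperplane w . x + b = 0 that separates this data.
w, b = (1.0, 0.0), 0.0
print(satisfies_canonical_constraints(w, b, X, y))  # True: all points on or outside the margin
print(margin(w))  # 2.0 -- perpendicular distance between P1 and P2
```

Note the trade-off the SVM optimizes: shrinking $\|\mathbf{w}\|$ (e.g. $\mathbf{w} = (0.4, 0)$) would widen the margin to $5$, but then the canonical constraints fail on the closest points, which is exactly why the problem is "minimize $\frac12\|\mathbf{w}\|^2$ *subject to* the constraints".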
### Support Vector Expansion: The Classifier

Substituting $\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i$ into the classifier:

$$f(\mathbf{x}) = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x} + b) = \mathrm{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i (\mathbf{x}_i \cdot \mathbf{x}) + b \right).$$

The support vectors are the points on the margin once it has been maximized.

### Why Do We Want a Model with Only a Few SVs?

- Leaving out an example that does not become a support vector gives the same solution.
- Theorem (Vapnik and Chervonenkis, 1974): let $\mathrm{SV}(N)$ be the number of SVs obtained by training on $N$ examples randomly drawn from $P(X, Y)$, and let $E$ denote expectation. Then

$$E\left[\text{Prob(test error)}\right] \le \frac{E[\mathrm{SV}(N)]}{N}.$$

### Next Class

Nonlinear data. (Now, a quiz.)

---

## Lecture: Support Vector Machines (continued)

*Greg Grudic; notes borrowed from Bernhard Schölkopf (Canberra, February 2002). Reference: Schölkopf and Smola, Learning with Kernels, MIT Press, 2002.*

This second deck repeats much of the material above — the separating-hyperplane CSP, the canonical margin $M = 2/\|\mathbf{w}\|$ (from $\langle \mathbf{w}, \mathbf{x}_1 - \mathbf{x}_2 \rangle = 2$, so $\langle \mathbf{w}/\|\mathbf{w}\|, \mathbf{x}_1 - \mathbf{x}_2 \rangle = 2/\|\mathbf{w}\|$), the Lagrangian, the dual, and the support vector expansion — and then extends it.

### The Lagrange Function and the KKT Conditions

$L$ has to be minimized with respect to the primal variables $\mathbf{w}, b$ and maximized with respect to the dual variables $\alpha_i$:

- If a constraint is violated, then $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 < 0$, so $\alpha_i$ will grow to increase $L$; in turn, $(\mathbf{w}, b)$ want to decrease $L$, i.e. they have to change such that the constraint is satisfied. If the problem is separable, this ensures $\alpha_i < \infty$.
- Similarly, if $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 > 0$, then $\alpha_i = 0$; otherwise $L$ could be increased by decreasing $\alpha_i$. (These are the KKT conditions.)

### What Happens When Data Is Not Separable? Soft Margin SVM

Add a slack variable $\xi_i$:

$$\xi_i = \begin{cases} 0 & \text{if } \mathbf{x}_i \text{ is correctly classified} \\ \text{distance to the margin} & \text{otherwise.} \end{cases}$$

Given data $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$, minimize

$$E(\mathbf{w}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{subject to } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, N.$$

### Dual Problem (Non-separable Data)

Maximize

$$W(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, (\mathbf{x}_i \cdot \mathbf{x}_j)$$

subject to $0 \le \alpha_i \le C$, $i = 1, \ldots, N$, and $\sum_i \alpha_i y_i = 0$. The decision boundary has the same form:

$$f(\mathbf{x}) = \mathrm{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i (\mathbf{x}_i \cdot \mathbf{x}) + b \right).$$

### Mapping into Nonlinear Space

Example: $\Phi: \mathbb{R}^2 \to \mathbb{R}^3$, $(x_1, x_2) \mapsto (z_1, z_2, z_3) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$. Data that is not linearly separable in the input space $\mathbb{R}^2$ can become linearly separable in the feature space $\mathbb{R}^3$, giving the nonlinear classifier

$$f(\mathbf{x}) = \mathrm{sgn}\left( w_1 x_1^2 + w_2 \sqrt{2}\, x_1 x_2 + w_3 x_2^2 + b \right).$$

### Nonlinear SVMs: The Kernel Trick

KEY IDEA: both the decision boundary and the dual optimization formulation rely on the inputs only through dot products $\langle \mathbf{x}_i, \mathbf{x}_j \rangle$. Replace $\langle \mathbf{x}_i, \mathbf{x}_j \rangle$ with

$$K(\mathbf{x}_i, \mathbf{x}_j) = \langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}_j) \rangle$$

and the same algorithms can be used in a nonlinear kernel space. Maximize

$$W(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, K(\mathbf{x}_i, \mathbf{x}_j),$$

with boundary

$$f(\mathbf{x}) = \mathrm{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \right).$$

### Mercer Kernels and the Gram Matrix

We need *Mercer kernels*, i.e. symmetric kernels: $K(\mathbf{x}_i, \mathbf{x}_j) = \langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}_j) \rangle = \langle \Phi(\mathbf{x}_j), \Phi(\mathbf{x}_i) \rangle = K(\mathbf{x}_j, \mathbf{x}_i)$. Given training data $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$, the Gram (kernel) matrix is

$$K = \begin{bmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & \cdots & K(\mathbf{x}_1, \mathbf{x}_N) \\ \vdots & \ddots & \vdots \\ K(\mathbf{x}_N, \mathbf{x}_1) & \cdots & K(\mathbf{x}_N, \mathbf{x}_N) \end{bmatrix}$$

Properties: an $N \times N$ positive definite, symmetric matrix, positive on the diagonal.

### Commonly Used Mercer Kernels

- Polynomial: $K(\mathbf{x}_i, \mathbf{x}_j) = (\langle \mathbf{x}_i, \mathbf{x}_j \rangle + c)^d$
- Sigmoid: $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\kappa \langle \mathbf{x}_i, \mathbf{x}_j \rangle + \theta)$
- Gaussian: $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left( -\dfrac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2} \right)$

### Soft Margin SVMs: C-SVM and ν-SVM

- C-SVM: for $C > 0$, minimize $\tau(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{m} \xi_i$ subject to $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$ (margin $2/\|\mathbf{w}\|$).
- ν-SVM: for $0 \le \nu < 1$, minimize $\tau(\mathbf{w}, \boldsymbol{\xi}, \rho) = \frac{1}{2}\|\mathbf{w}\|^2 - \nu\rho + \frac{1}{m}\sum_{i=1}^{m} \xi_i$ subject to $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge \rho - \xi_i$, $\xi_i \ge 0$ (margin $2\rho/\|\mathbf{w}\|$).

### Duals Using Kernels

- C-SVM dual: maximize $W(\boldsymbol{\alpha}) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$ subject to $0 \le \alpha_i \le C$, $\sum_i \alpha_i y_i = 0$.
- ν-SVM dual: maximize $W(\boldsymbol{\alpha}) = -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$ subject to $0 \le \alpha_i \le \frac{1}{m}$, $\sum_i \alpha_i y_i = 0$, $\sum_i \alpha_i \ge \nu$.
- In both cases the decision function is $f(\mathbf{x}) = \mathrm{sgn}\left( \sum_i \alpha_i y_i K(\mathbf{x}, \mathbf{x}_i) + b \right)$.

### SVM Training

- Naive approach: the complexity of maximizing $W(\boldsymbol{\alpha})$ scales with the third power of the training set size $m$.
- Only SVs are relevant, so only compute $K(\mathbf{x}_i, \mathbf{x}_j)$ for SVs; extract them iteratively by cycling through the training set in chunks.
- In fact, one can use chunks that do not even contain all SVs; maximize over these sub-problems using your favorite optimizer.
- In the extreme case, by making the sub-problems very small (just two points), one can solve them analytically (sequential minimal optimization).

### MNIST: An SVM Success Story

Handwritten character benchmark: 60,000 training and 10,000 test images, dimension $d = 28 \times 28$. (Figure: sample MNIST digit images.)

Results on test data:

| Classifier | Test error |
| --- | --- |
| Linear classifier | 8.4% |
| 3-nearest-neighbour | 2.4% |
| SVM | 1.4% |
| Tangent distance | 1.1% |
| LeNet4 | 1.1% |
| Boosted LeNet4 | 0.7% |
| Translation-invariant SVM | 0.56% |

The SVM used a polynomial kernel of degree 9.

### SVM Kernel Model Structure

(Diagram: the classifier $f(\mathbf{x}) = \mathrm{sgn}\left( \sum_i \lambda_i k(\mathbf{x}, \mathbf{x}_i) + b \right)$ is a weighted comparison of the input vector $\mathbf{x}$ against the support vectors $\mathbf{x}_i$, e.g. with $k(\mathbf{x}, \mathbf{x}_i) = (\mathbf{x} \cdot \mathbf{x}_i)^d$ or $k(\mathbf{x}, \mathbf{x}_i) = \exp(-\|\mathbf{x} - \mathbf{x}_i\|^2 / c)$.)

---

## Lecture: Nearest Neighbor Classification and Regression

*Greg Grudic; notes borrowed from Thomas G. Dietterich and Tom Mitchell.*

Notes:

- Downloadable machine learning software: many algorithms studied in this class are implemented in Java in the WEKA environment: http://www.cs.waikato.ac.nz/ml/weka
- Homework 1 (equations from MATLAB code) due Sept 21.
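The kernel trick and the Gram-matrix properties above can be illustrated concretely. This is a small pure-Python sketch (not from the lecture; the toy points and function names are mine): it implements the polynomial and Gaussian Mercer kernels from the slides, builds a Gram matrix, checks symmetry and a positive diagonal, and verifies that the degree-2 polynomial kernel (with $c = 0$) really equals a dot product under the feature map $\Phi(x_1, x_2) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$.

```python
import math

def polynomial_kernel(x, z, c=1.0, d=2):
    """K(x, z) = (<x, z> + c)^d"""
    return (sum(a * b for a, b in zip(x, z)) + c) ** d

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))"""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

def gram_matrix(X, kernel):
    """K[i][j] = kernel(x_i, x_j) over the training inputs."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel with c = 0."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

X = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]  # made-up toy inputs
K = gram_matrix(X, gaussian_kernel)

# Mercer-kernel sanity checks: symmetric, positive on the diagonal.
assert all(abs(K[i][j] - K[j][i]) < 1e-12 for i in range(3) for j in range(3))
assert all(K[i][i] > 0 for i in range(3))

# Kernel trick: (x . z)^2 == <phi(x), phi(z)> without ever forming phi explicitly.
x, z = (1.0, 2.0), (3.0, 4.0)
lhs = polynomial_kernel(x, z, c=0.0, d=2)
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))
print(lhs, rhs)  # 121.0 121.00000000000001 (equal up to rounding)
```

The point of the last check is exactly the "KEY IDEA" slide: the dual and the decision boundary only ever see $K(\mathbf{x}_i, \mathbf{x}_j)$, so the three-dimensional feature space is never materialized.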
### Learning Classification Models

- Collect training data; build a model, e.g. happy = f(feature space); make a prediction. (Figure: labeled examples scattered in feature space.)
- Binary classification learning data: $N$ examples with $d$-dimensional inputs and labels $y \in \{+1, -1\}$, e.g. Example 1: $\mathbf{x} = (0.95013, 0.58279)$, $y = +1$.
- Multi-class classification learning data: the same inputs, but $y$ takes one of several class values (e.g. 5, 6, ...).

### Learning Regression Models

Collect training data; build a model, e.g. stock value = f(feature space); make a prediction. The targets $y$ are real-valued.

### The Learning Data

Symbolic representation of $N$ learning examples of $d$-dimensional inputs:

$$\begin{bmatrix} x_{11} & \cdots & x_{1d} & y_1 \\ \vdots & & \vdots & \vdots \\ x_{N1} & \cdots & x_{Nd} & y_N \end{bmatrix}$$

### Nearest Neighbor Algorithm

Given training data $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$, define a distance metric between points in input space. Common measures:

- Euclidean (squared): $D(\mathbf{x}, \mathbf{x}_i) = \sum_{j=1}^{d} (x_j - x_{ij})^2$
- Weighted Euclidean: $D(\mathbf{x}, \mathbf{x}_i) = \sum_{j=1}^{d} w_j (x_j - x_{ij})^2$, with $w_j \ge 0$.

### K-Nearest Neighbor Model

Given a test point $\mathbf{x}$, find the $K$ nearest training inputs to $\mathbf{x}$ under the distance metric $D(\mathbf{x}, \mathbf{x}')$; denote these $(\mathbf{x}^{(1)}, y^{(1)}), \ldots, (\mathbf{x}^{(K)}, y^{(K)})$.

- Regression: $\hat{y} = \frac{1}{K} \sum_{k=1}^{K} y^{(k)}$
- Classification: $\hat{y}$ = the most common class in the set $\{y^{(1)}, \ldots, y^{(K)}\}$.

### Picking K

The goal of supervised learning is accurate prediction on *future* data, so use N-fold cross-validation and pick the $K$ that minimizes the cross-validation error:

- For each of the $N$ training examples: find its $K$ nearest neighbors, make a prediction based on these neighbors (classification or regression), and calculate the error in the prediction (the difference between predicted and actual output).
- Output the average error over all examples; use the $K$ that gives the lowest average error over the $N$ training examples.

### Measuring Model Accuracy

Assume a set of data $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_K, y_K)$ and a model $M$.

Regression accuracy — two commonly used metrics:

- Mean squared error: $\mathrm{error}_{M} = \frac{1}{K} \sum_{i=1}^{K} \left( y_i - M(\mathbf{x}_i) \right)^2$
- Relative error: $\mathrm{error}_{M} = \dfrac{\sum_{i=1}^{K} \left( y_i - M(\mathbf{x}_i) \right)^2}{\sum_{i=1}^{K} \left( \bar{y} - y_i \right)^2}$

Classification accuracy:

$$\mathrm{error}_M = \frac{1}{K}\sum_{i=1}^{K} C(\mathbf{x}_i), \qquad C(\mathbf{x}_i) = \begin{cases} 0 & \text{if } y_i = M(\mathbf{x}_i) \\ 1 & \text{otherwise.} \end{cases}$$

### K-Nearest Neighbor Model Weighted by Distance

- Regression: weight each neighbor's contribution by its distance to the query, so that closer neighbors count more (inverse-distance weighting): $\hat{y} = \sum_k y^{(k)} / D(\mathbf{x}, \mathbf{x}^{(k)}) \,\big/\, \sum_k 1 / D(\mathbf{x}, \mathbf{x}^{(k)})$.
- Classification: $\hat{y}$ = the most common class in the distance-weighted set $\{y^{(1)}, \ldots, y^{(K)}\}$.
- Picking $w_1, \ldots, w_d$: use N-fold cross-validation and pick the values that minimize the cross-validation error. This can be computationally expensive, so dimensionality reduction is often used first.

### Nearest Neighbor Properties

- Class decision boundaries form the Voronoi diagram: each line segment is equidistant between points in opposite classes, and the more points, the more complex the boundaries.
- Universal approximator: can model any many-to-one mapping arbitrarily well.
- Curse of dimensionality: can be easily fooled in high-dimensional spaces; dimensionality-reduction techniques are often used.
- The model can be slow to evaluate for large training sets; kd-trees and selectively storing data points help. More recent optimized NN searches: cover trees (http://hunch.net/~jl/projects/cover_tree/cover_tree.html), fast even for large $d$.

---

## Lecture: Decision Trees (CSCI 3202, Introduction to AI)

*Greg Grudic; notes borrowed from Thomas G. Dietterich and Tom Mitchell.*

### Outline

- Decision tree representations
- ID3 and C4.5 learning algorithms (Quinlan, 1986)
- CART learning algorithm (Breiman et al., 1984)
- Entropy, information gain
- Overfitting

### Training Data Example: Predicting When This Player Will Play Tennis

| Day | Outlook | Temp | Humidity | Wind | PlayTennis |
| --- | --- | --- | --- | --- | --- |
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Hot | High | Weak | Yes |
| D4 | Rain | Mild | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |

### Decision Tree Hypothesis Space

- Internal nodes test the value of particular features $x_j$ and branch according to the results of the test.
- Leaf nodes specify the class $h(\mathbf{x})$.

Example tree (features: Outlook $x_1$, Temperature $x_2$, Humidity $x_3$, Wind $x_4$):

```
Outlook?
  Sunny    -> Humidity?  High -> No | Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind?      Strong -> No | Weak -> Yes
```

The feature vector $\mathbf{x} = (\text{Sunny}, \text{Hot}, \text{High}, \text{Strong})$ will be classified as *No*; the Temperature feature is irrelevant.

If the features are continuous, internal nodes may test the value of a feature against a threshold (e.g. Temperature > 75, Wind > 20).

### Decision Tree Decision Boundaries

Decision trees divide the feature space into axis-parallel rectangles and label each rectangle with one of the $K$ classes.

### Decision Trees Can Represent Any Boolean Function

Example (XOR): root tests $x_1 < 0.5$; each child tests $x_2 < 0.5$. In the worst case, however, the tree will require exponentially many nodes.

### Learning Algorithm for Decision Trees

For binary features and binary labels, $S = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)\}$:

```
GrowTree(S):
    if y = 0 for all (x, y) in S: return new leaf(0)
    else if y = 1 for all (x, y) in S: return new leaf(1)
    else:
        choose the best attribute x_j
        S0 = all (x, y) in S with x_j = 0
        S1 = all (x, y) in S with x_j = 1
        return new node(x_j, GrowTree(S0), GrowTree(S1))
```

What happens if features are not binary? What about regression?

### Choosing the Best Attribute

$A_1$ and $A_2$ are attributes (features/inputs). Which attribute is best? Compare the number of positive/negative examples before and after a split: starting from [29+, 35−], $A_1$ yields [21+, 5−] (true branch) and [8+, 30−] (false branch), while $A_2$ yields [18+, 33−] and [11+, 2−]. Many different frameworks for choosing the BEST attribute have been proposed; we will look at entropy gain.

### Entropy

- $p_\oplus$ is the proportion of positive examples in $S$; $p_\ominus$ is the proportion of negative examples in $S$.
- Entropy measures the impurity of $S$:

$$\mathrm{Entropy}(S) \equiv -p_\oplus \log_2 p_\oplus - p_\ominus \log_2 p_\ominus$$

(Figure: entropy as a function of $p_\oplus$ — 0 at $p_\oplus \in \{0, 1\}$, maximum 1.0 at $p_\oplus = 0.5$.)

Interpretation: $\mathrm{Entropy}(S)$ is the expected number of bits needed to encode the class ($\oplus$ or $\ominus$) of a randomly drawn member of $S$ under the optimal, shortest-length code. Why? Information theory says an optimal-length code assigns $-\log_2 p$ bits to a message having probability $p$, so the expected number of bits to encode the class of a random member of $S$ is $p_\oplus(-\log_2 p_\oplus) + p_\ominus(-\log_2 p_\ominus)$.

### Information Gain

$\mathrm{Gain}(S, A)$ = expected reduction in entropy due to sorting on $A$:

$$\mathrm{Gain}(S, A) \equiv \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v)$$

### Selecting the Next Attribute

Which attribute is the best classifier for the PlayTennis data, $S = [9+, 5-]$ with $\mathrm{Entropy}(S) = 0.940$?

- Humidity: High = [3+, 4−] ($E = 0.985$), Normal = [6+, 1−] ($E = 0.592$); $\mathrm{Gain}(S, \text{Humidity}) = 0.940 - \frac{7}{14}(0.985) - \frac{7}{14}(0.592) = 0.151$.
- Wind: Weak = [6+, 2−] ($E = 0.811$), Strong = [3+, 3−] ($E = 1.0$); $\mathrm{Gain}(S, \text{Wind}) = 0.940 - \frac{8}{14}(0.811) - \frac{6}{14}(1.0) = 0.048$.

After splitting on Outlook: Sunny = {D1, D2, D8, D9, D11} = [2+, 3−], Overcast = {D3, D7, D12, D13} = [4+, 0−], Rain = {D4, D5, D6, D10, D14} = [3+, 2−]. Which attribute should be tested at the Sunny branch ($E = 0.970$)?

- $\mathrm{Gain}(S_{\text{sunny}}, \text{Humidity}) = 0.970 - \frac{3}{5}(0.0) - \frac{2}{5}(0.0) = 0.970$
- $\mathrm{Gain}(S_{\text{sunny}}, \text{Temperature}) = 0.970 - \frac{2}{5}(0.0) - \frac{2}{5}(1.0) - \frac{1}{5}(0.0) = 0.570$
- $\mathrm{Gain}(S_{\text{sunny}}, \text{Wind}) = 0.970 - \frac{2}{5}(1.0) - \frac{3}{5}(0.918) = 0.019$

### Non-Boolean Features

- Features with multiple discrete values: multiway splits; test for one value versus the rest; group values into disjoint sets.
- Real-valued features: use thresholds.
- Regression: splits based on a mean-squared-error metric.

### Hypothesis Space Search

Greedy search over candidate attribute tests ($A_1, A_2, A_3, A_4, \ldots$): you do not get the globally optimal tree, and the search space is exponential.

### Overfitting

Consider the error of hypothesis $h$ over the training data, $\mathrm{error}_{\mathrm{train}}(h)$, and over the entire distribution $D$ of data, $\mathrm{error}_D(h)$. Hypothesis $h \in H$ *overfits* the training data if there is an alternative hypothesis $h' \in H$ such that

$$\mathrm{error}_{\mathrm{train}}(h) < \mathrm{error}_{\mathrm{train}}(h') \quad \text{and} \quad \mathrm{error}_D(h') < \mathrm{error}_D(h).$$

(Figure: accuracy vs. tree size (number of nodes); accuracy on the training data keeps rising while accuracy on test data peaks and then declines.)

Validation data is used to control overfitting: prune the tree to reduce error on the validation set.

### Quiz

1. What is overfitting?
2. Are decision trees universal approximators?
3. What do decision tree boundaries look like?
4. Why are decision trees not globally optimal?
5. Do Support Vector Machines give the globally optimal solution, given the SVM optimization criterion and a specific kernel?
6. Does the K-Nearest Neighbor algorithm give the globally optimal solution if $K$ is allowed to be as large as the number of training examples $N$, N-fold cross-validation is used, and the distance metric is fixed?
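The entropy and information-gain numbers quoted for the PlayTennis data can be reproduced directly. This is a small pure-Python sketch (not from the lecture; the function names are mine), using the class counts from the slides:

```python
import math

def entropy(pos, neg):
    """Entropy(S) = -p+ log2(p+) - p- log2(p-), with 0 log 0 taken as 0."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            e -= p * math.log2(p)
    return e

def information_gain(parent, splits):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v).
    `parent` and each element of `splits` are (pos, neg) counts."""
    n = sum(parent)
    return entropy(*parent) - sum(
        (pos + neg) / n * entropy(pos, neg) for pos, neg in splits
    )

# PlayTennis: S = [9+, 5-]
print(round(entropy(9, 5), 3))  # 0.94

# Humidity: High = [3+, 4-], Normal = [6+, 1-]
print(round(information_gain((9, 5), [(3, 4), (6, 1)]), 3))  # 0.152 (the slides truncate to 0.151)

# Wind: Weak = [6+, 2-], Strong = [3+, 3-]
print(round(information_gain((9, 5), [(6, 2), (3, 3)]), 3))  # 0.048
```

So Humidity has the larger gain at the root, matching the slide's conclusion that it is the better of the two splits.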
