# Machine Learning ECS 271 (UC Davis)

These 51 pages of class notes were uploaded by Ashleigh Dare on Tuesday, September 8, 2015. The notes are for ECS 271 at the University of California, Davis, taught by Staff in Fall.


## Decision Tree Learning

(From the lecture slides for the textbook *Machine Learning*, Tom M. Mitchell, McGraw-Hill, 1997. Read Chapter 3; recommended exercises 3.1, 3.4.)

Topics:

- Decision tree representation
- ID3 learning algorithm
- Entropy, information gain
- Overfitting

### Decision Tree for PlayTennis

[Figure: a decision tree. The root tests Outlook (Sunny / Overcast / Rain); the Sunny branch tests Humidity (High → No, Normal → Yes); Overcast → Yes; the Rain branch tests Wind (Strong → No, Weak → Yes).]

### A Tree to Predict C-Section Risk

Learned from medical records of 1000 women; negative examples are C-sections.

[Figure: a decision tree over attributes such as FetalPresentation, PreviousCsection, Primiparous, FetalDistress, and BirthWeight, with example counts and class proportions at each node, e.g. root [833+, 167−] (.83+, .17−).]

### Decision Trees

Decision tree representation:

- Each internal node tests an attribute
- Each branch corresponds to an attribute value
- Each leaf node assigns a classification

How would we represent:

- ∧, ∨, XOR
- (A ∧ B) ∨ (C ∧ ¬D ∧ E)
- M of N

### When to Consider Decision Trees

- Instances describable by attribute-value pairs
- Target function is discrete-valued
- Disjunctive hypothesis may be required
- Possibly noisy training data

Examples:

- Equipment or medical diagnosis
- Credit risk analysis
- Modeling calendar scheduling preferences

### Top-Down Induction of Decision Trees

Main loop:

1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant of the node
4. Sort training examples to the leaf nodes
5. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes

Which attribute is best?

[Figure: two candidate splits of a sample [29+, 35−]: attribute A1 with branches t → [21+, 5−] and f → [8+, 30−], versus attribute A2 with branches t → [18+, 33−] and f → [11+, 2−].]
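The main loop above can be sketched as a short recursive procedure. This is a minimal illustration, not the textbook's code: the function and variable names are mine, and the attribute-selection rule is passed in as a parameter (the notes define the information-gain criterion for it separately).

```python
def id3(examples, attributes, choose_best):
    # examples: list of (attribute-dict, label) pairs.
    labels = [label for _, label in examples]
    # Step 5: if the examples are perfectly classified, stop with a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left: fall back to the majority label.
    if not attributes:
        return max(set(labels), key=labels.count)
    # Steps 1-2: pick the "best" decision attribute for this node.
    a = choose_best(examples, attributes)
    node = {"attribute": a, "branches": {}}
    # Steps 3-4: one branch per value of a; sort the examples to it.
    for v in {x[a] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[a] == v]
        remaining = [b for b in attributes if b != a]
        node["branches"][v] = id3(subset, remaining, choose_best)
    return node

# Tiny usage example with a trivial "best attribute" rule
# (just take the first remaining attribute):
data = [({"Outlook": "Sunny"}, "No"), ({"Outlook": "Rain"}, "Yes")]
tree = id3(data, ["Outlook"], lambda ex, attrs: attrs[0])
print(tree)
```

Note how the recursion mirrors the loop: each call grows one node, and the stopping test in step 5 becomes the leaf cases at the top of the function.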
### Entropy

[Figure: plot of Entropy(S) versus p⊕, rising from 0 at p⊕ = 0 to 1.0 at p⊕ = 0.5 and falling back to 0 at p⊕ = 1.]

- S is a sample of training examples
- p⊕ is the proportion of positive examples in S
- p⊖ is the proportion of negative examples in S
- Entropy measures the impurity of S:

Entropy(S) ≡ −p⊕ log₂ p⊕ − p⊖ log₂ p⊖

Entropy(S) = the expected number of bits needed to encode the class (⊕ or ⊖) of a randomly drawn member of S (under the optimal, shortest-length code).

Why? Information theory: an optimal-length code assigns −log₂ p bits to a message having probability p. So the expected number of bits to encode ⊕ or ⊖ of a random member of S is

p⊕(−log₂ p⊕) + p⊖(−log₂ p⊖)

### Information Gain

Gain(S, A) = the expected reduction in entropy due to sorting on A:

Gain(S, A) ≡ Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

[Figure: the same two candidate splits of [29+, 35−] on A1 and A2 as above.]

### Training Examples

| Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| D1  | Sunny    | Hot         | High     | Weak   | No         |
| D2  | Sunny    | Hot         | High     | Strong | No         |
| D3  | Overcast | Hot         | High     | Weak   | Yes        |
| D4  | Rain     | Mild        | High     | Weak   | Yes        |
| D5  | Rain     | Cool        | Normal   | Weak   | Yes        |
| D6  | Rain     | Cool        | Normal   | Strong | No         |
| D7  | Overcast | Cool        | Normal   | Strong | Yes        |
| D8  | Sunny    | Mild        | High     | Weak   | No         |
| D9  | Sunny    | Cool        | Normal   | Weak   | Yes        |
| D10 | Rain     | Mild        | Normal   | Weak   | Yes        |
| D11 | Sunny    | Mild        | Normal   | Strong | Yes        |
| D12 | Overcast | Mild        | High     | Strong | Yes        |
| D13 | Overcast | Hot         | Normal   | Weak   | Yes        |
| D14 | Rain     | Mild        | High     | Strong | No         |

### Selecting the Next Attribute

Which attribute is the best classifier?

[Figure: splitting S = [9+, 5−] (E = 0.940) on Humidity gives High [3+, 4−] (E = 0.985) and Normal [6+, 1−] (E = 0.592); splitting on Wind gives Weak [6+, 2−] (E = 0.811) and Strong [3+, 3−] (E = 1.00).]

Gain(S, Humidity) = .940 − (7/14).985 − (7/14).592 = .151
Gain(S, Wind) = .940 − (8/14).811 − (6/14)1.0 = .048
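The entropy and gain numbers quoted for the PlayTennis sample can be checked directly from the definitions; this small sketch (helper names are mine) works from the positive/negative counts in each subset:

```python
# Entropy and information gain from class counts, applied to the
# 14 PlayTennis examples: S = [9+, 5-], split on Humidity and Wind.
from math import log2

def entropy(pos, neg):
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:  # lim p->0 of -p log2 p is 0
            e -= p * log2(p)
    return e

def gain(pos, neg, splits):
    # splits: one (pos_v, neg_v) count pair per attribute value v
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits)
    return entropy(pos, neg) - remainder

print(round(entropy(9, 5), 3))                 # 0.94
print(round(gain(9, 5, [(3, 4), (6, 1)]), 3))  # Humidity: 0.152 (slide rounds to .151)
print(round(gain(9, 5, [(6, 2), (3, 3)]), 3))  # Wind: 0.048
```

So Humidity provides the greater reduction in entropy and would be chosen as the root test.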
[Figure: the partially learned tree. The root Outlook splits S = {D1, …, D14} = [9+, 5−]: Sunny → {D1, D2, D8, D9, D11} [2+, 3−], Overcast → {D3, D7, D12, D13} [4+, 0−], Rain → {D4, D5, D6, D10, D14} [3+, 2−].]

Which attribute should be tested here? With S_sunny = {D1, D2, D8, D9, D11}:

Gain(S_sunny, Humidity) = .970 − (3/5)0.0 − (2/5)0.0 = .970
Gain(S_sunny, Temperature) = .970 − (2/5)0.0 − (2/5)1.0 − (1/5)0.0 = .570
Gain(S_sunny, Wind) = .970 − (2/5)1.0 − (3/5).918 = .019

### Hypothesis Space Search by ID3

[Figure: ID3 searching the space of decision trees, growing the tree by one attribute test at a time.]

- Hypothesis space is complete! (The target function is surely in there.)
- Outputs a single hypothesis. (Which one? Can't play 20 questions.)
- No backtracking. (Local minima...)
- Statistically based search choices. (Robust to noisy data.)
- Inductive bias: approximately "prefer the shortest tree."

### Inductive Bias in ID3

Note: H is the power set of instances X.

→ Unbiased? Not really...

- Preference for short trees, and for those with high-information-gain attributes near the root
- The bias is a *preference* for some hypotheses, rather than a *restriction* of the hypothesis space H
- Occam's razor: prefer the shortest hypothesis that fits the data

### Occam's Razor

Why prefer short hypotheses?

Argument in favor:

- There are fewer short hypotheses than long hypotheses
- → a short hypothesis that fits the data is unlikely to be a coincidence
- → a long hypothesis that fits the data might be a coincidence

Argument opposed:

- There are many ways to define small sets of hypotheses
- e.g., all trees with a prime number of nodes that use attributes beginning with "Z"
- What's so special about small sets based on the size of the hypothesis?

### Overfitting in Decision Trees

Consider adding a noisy training example #15:

(Sunny, Hot, Normal, Strong), PlayTennis = No

What effect would it have on the earlier tree?

[Figure: the PlayTennis tree from before, with Outlook (Sunny / Overcast / Rain) at the root and the Humidity and Wind subtrees below.]
### Overfitting

Consider the error of hypothesis h over:

- the training data: error_train(h)
- the entire distribution D of data: error_D(h)

Hypothesis h ∈ H overfits the training data if there is an alternative hypothesis h′ ∈ H such that

error_train(h) < error_train(h′)  and  error_D(h) > error_D(h′)

### Overfitting in Decision Tree Learning

[Figure: accuracy versus tree size (number of nodes, 0 to 100). Accuracy on the training data keeps rising as the tree grows; accuracy on the test data peaks near 0.9 and then declines.]

### Avoiding Overfitting

How can we avoid overfitting?

- Stop growing when a data split is not statistically significant
- Grow the full tree, then post-prune

How to select the "best" tree:

- Measure performance over the training data
- Measure performance over a separate validation data set
- MDL: minimize size(tree) + size(misclassifications(tree))

### Reduced-Error Pruning

Split the data into training and validation sets.

Do until further pruning is harmful:

1. Evaluate the impact on the validation set of pruning each possible node (plus those below it)
2. Greedily remove the one that most improves validation-set accuracy

- Produces the smallest version of the most accurate subtree
- What if data is limited?

### Effect of Reduced-Error Pruning

[Figure: accuracy versus tree size, with a third curve showing test-set accuracy during pruning rising above the unpruned test accuracy.]

### Rule Post-Pruning

1. Convert the tree to an equivalent set of rules
2. Prune each rule independently of the others
3. Sort the final rules into the desired sequence for use

Perhaps the most frequently used method (e.g., C4.5).
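The reduced-error pruning loop above can be sketched as follows. This is a hypothetical illustration, not the textbook's code: trees are nested dicts with class labels at the leaves, the helper names are mine, and for simplicity a single majority label is passed in rather than the per-node majority a full implementation would use.

```python
import copy

def classify(tree, x):
    # Walk the tree (nested dicts) until a leaf label is reached.
    while isinstance(tree, dict):
        tree = tree["branches"][x[tree["attribute"]]]
    return tree

def accuracy(tree, data):
    return sum(classify(tree, x) == y for x, y in data) / len(data)

def internal_paths(tree, path=()):
    # Yield the branch-value path to every internal node.
    if isinstance(tree, dict):
        yield path
        for v, sub in tree["branches"].items():
            yield from internal_paths(sub, path + (v,))

def replaced(tree, path, leaf):
    # Copy of `tree` with the node at `path` replaced by `leaf`.
    if not path:
        return leaf
    new = copy.deepcopy(tree)
    node = new
    for v in path[:-1]:
        node = node["branches"][v]
    node["branches"][path[-1]] = leaf
    return new

def reduced_error_prune(tree, validation, majority):
    best_acc = accuracy(tree, validation)
    while isinstance(tree, dict):
        # Evaluate pruning every internal node; keep the best candidate.
        candidates = [replaced(tree, p, majority)
                      for p in internal_paths(tree)]
        best = max(candidates, key=lambda t: accuracy(t, validation))
        if accuracy(best, validation) < best_acc:
            break  # further pruning is harmful
        tree, best_acc = best, accuracy(best, validation)
    return tree

# The subtree testing B fits noise; pruning it helps on validation data.
tree = {"attribute": "A", "branches": {
    "t": {"attribute": "B", "branches": {"t": "yes", "f": "no"}},
    "f": "no"}}
validation = [({"A": "t", "B": "t"}, "yes"),
              ({"A": "t", "B": "f"}, "yes"),
              ({"A": "f", "B": "t"}, "no")]
pruned = reduced_error_prune(tree, validation, "yes")
print(pruned)  # {'attribute': 'A', 'branches': {'t': 'yes', 'f': 'no'}}
```

Each accepted step replaces one internal node with a leaf, so the loop must terminate, and by construction it stops exactly when further pruning would hurt validation accuracy.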
### Converting a Tree to Rules

[Figure: the PlayTennis tree: Outlook (Sunny / Overcast / Rain), the Sunny branch testing Humidity (High / Normal), the Rain branch testing Wind (Strong / Weak).]

IF (Outlook = Sunny) ∧ (Humidity = High) THEN PlayTennis = No
IF (Outlook = Sunny) ∧ (Humidity = Normal) THEN PlayTennis = Yes
…

### Continuous-Valued Attributes

Create a discrete attribute to test a continuous one:

- Temperature = 82.5
- (Temperature > 72.3) = t, f

| Temperature | 40 | 48 | 60  | 72  | 80  | 90 |
|-------------|----|----|-----|-----|-----|----|
| PlayTennis  | No | No | Yes | Yes | Yes | No |

### Attributes with Many Values

Problem:

- If an attribute has many values, Gain will select it
- Imagine using Date = Jun_3_1996 as an attribute

One approach: use GainRatio instead:

GainRatio(S, A) ≡ Gain(S, A) / SplitInformation(S, A)

SplitInformation(S, A) ≡ −Σ_{i=1}^{c} (|S_i| / |S|) log₂ (|S_i| / |S|)

where S_i is the subset of S for which A has value v_i.

### Attributes with Costs

Consider:

- medical diagnosis: BloodTest has cost $150
- robotics: Width_from_1ft has cost 23 sec.

How can we learn a consistent tree with low expected cost? One approach: replace gain by

- Tan and Schlimmer (1990): Gain²(S, A) / Cost(A)
- Nunez (1988): (2^{Gain(S, A)} − 1) / (Cost(A) + 1)^w, where w ∈ [0, 1] determines the importance of cost

### Unknown Attribute Values

What if some examples are missing values of A? Use the training example anyway, and sort it through the tree:

- If node n tests A, assign the most common value of A among the other examples sorted to node n
- or assign the most common value of A among the other examples with the same target value
- or assign probability p_i to each possible value v_i of A, and assign fraction p_i of the example to each descendant in the tree

Classify new examples in the same fashion.

## Machine Learning, Week 5: Neural Networks

### The Brain: A Paradox

The brain contains 10^11 neurons, each of which may have up to 10^4
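The many-values problem above is easy to see numerically. A quick sketch (function name is mine) computes SplitInformation from subset sizes on the 14 PlayTennis examples: a Date-like attribute splits them into 14 singletons, while Outlook splits them into subsets of size 5, 4, and 5.

```python
# SplitInformation penalizes many-valued attributes: it is the
# entropy of S with respect to the attribute's values themselves.
from math import log2

def split_information(subset_sizes):
    total = sum(subset_sizes)
    return -sum(s / total * log2(s / total) for s in subset_sizes)

print(round(split_information([1] * 14), 3))   # Date-like: log2(14) = 3.807
print(round(split_information([5, 4, 5]), 3))  # Outlook: 1.577
```

Dividing gain by these denominators in GainRatio therefore cuts a Date-like attribute's score far more than Outlook's, counteracting Gain's bias toward many-valued attributes.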
connections. Each neuron is slow, with a switching time of about 1 msec. Yet the brain is astonishingly fast and reliable at computationally intensive tasks like vision, speech recognition, and retrieving stored knowledge. Neural nets (or "connectionism") is a field based on the assumption that a computational architecture similar to the brain's will duplicate at least some of its wonderful abilities.

### A Brief History of Neural Networks

- 1955-65: Rosenblatt's Perceptron
- Late '60s: Minsky and Papert publish a definitive analysis of perceptrons
- 1975: Werbos's PhD thesis at Harvard, *Beyond Regression*, defines backpropagation
- 1985: the PDP book is published, ushering in the modern era of neural networks
- 1990s: neural networks enter mainstream applications

### ALVINN: A Neural-Network-Based Autonomous Vehicle (Pomerleau)

[Figure: ALVINN's architecture, with a 30×32 sensor input "retina" feeding the network.]

### PAVLOV: A Neural-Net-Based Navigation Architecture (Khaleeli)

[Figure: a 32×32 local occupancy grid feeding a 512-unit input layer, with detectors for left wall, right wall, front opening, and back opening.]

### PAVLOV: Learning to Find Trashcans (Theocharous)

[Figure: example input/output bit patterns, e.g. 001010 → 101100.]

### Problems Suited to Neural Networks

- Input space is high-dimensional and continuous
- Output space is multi-dimensional and discrete or continuous
- Training examples are noisy
- Long training times are feasible
- Explanation of the learned structure is not necessary
- Fast computation of the output given the input

### Single-Layer Perceptrons

[Figure: a perceptron unit with inputs x_1, …, x_n, weights w_1, …, w_n, and a fixed bias input x_0 = 1 with weight w_0, computing net = Σ_{i=0}^{n} w_i x_i and output o = 1 if net > 0, else −1.]

The simplest net over real-valued inputs:

o(x_1, …, x_n) = 1 if Σ_{i=0}^{n} w_i x_i > 0, −1 otherwise

- o = 1 → Class A
- o = −1 → Class B

Example: let N = 2. Then w_0 + w_1 x_1 + w_2 x_2 = 0 defines a line in the (x_1, x_2) plane.
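The unit just defined is a few lines of code. This minimal sketch (names are mine) computes net = Σ w_i x_i with the bias input x_0 = 1 folded in, then thresholds:

```python
# A single perceptron unit: net = sum_i w_i x_i with bias input
# x_0 = 1; output +1 if net > 0 and -1 otherwise.

def perceptron_output(weights, inputs):
    # weights[0] is w_0, the weight on the fixed bias input x_0 = 1.
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if net > 0 else -1

# With N = 2 the decision boundary w0 + w1*x1 + w2*x2 = 0 is a line;
# e.g. weights (-0.5, 1, 1) put (1, 1) on the positive side and
# (0, 0) on the negative side.
print(perceptron_output([-0.5, 1.0, 1.0], [1.0, 1.0]))  # 1
print(perceptron_output([-0.5, 1.0, 1.0], [0.0, 0.0]))  # -1
```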
### The Perceptron Learning Algorithm

1. Initialize weights and threshold: set the weights w_i to small random values.
2. Present input and desired output: set the inputs to the example values x_i, and let the desired output be t.
3. Calculate the actual output: o = sgn(w · x).
4. Adapt weights: if the actual output differs from the desired output, then w_i ← w_i + α(t − o)x_i, where 0 < α < 1 is the learning rate.
5. Repeat from Step 2 until done.

### Decision Surface of a Perceptron

[Figure: points in the plane separated by a line (AND), versus the XOR configuration, which no single line separates.]

- Some functions are linearly separable, e.g., AND
- Not all functions are linearly separable, e.g., XOR

### Gradient Descent in Error Space

[Figure: the error E as a surface over weight space (w_0, w_1), with gradient descent moving downhill.]

Given a set of weights w, the mean squared error over the set D of training instances is

E[w] ≡ (1/2) Σ_{d∈D} (t_d − o_d)²

The function E defines an error surface in weight space. To find the weight vector that yields the lowest error, we can do gradient descent along the error surface. The direction of steepest descent is given by the gradient

∇E[w] ≡ [∂E/∂w_0, ∂E/∂w_1, …, ∂E/∂w_n]

### Learning by Gradient Descent

- The training rule for gradient descent is Δw = −η∇E[w]
- The weight w_i is changed by the amount Δw_i = −η ∂E/∂w_i
- For a linear unit (an unthresholded perceptron), the weight update is Δw_i = η Σ_{d∈D} (t_d − o_d) x_{i,d}

### Incremental (Stochastic) Gradient Descent

Batch mode: do until error < minimum:

1. Compute the gradient ∇E_D[w]
2. w ← w − η∇E_D[w]

Incremental mode: do until error < minimum:

1. For each training example d ∈ D: compute the gradient ∇E_d[w], then w ← w − η∇E_d[w]

E_D[w] ≡ (1/2) Σ_{d∈D} (t_d − o_d)²    E_d[w] ≡ (1/2)(t_d − o_d)²

Given a small enough η, incremental gradient descent can approximate batch gradient descent arbitrarily closely.

### Summary

- A linear training unit uses gradient descent
  - guaranteed to converge to the hypothesis with minimum squared error
  - provided the learning rate η is sufficiently small
  - even when the training data is not describable in H
- The perceptron training rule is guaranteed to succeed if
  - the training examples are linearly separable
  - the learning rate is sufficiently small

### Smooth Differentiable Units

[Figure: output versus weighted input for the sigmoid function and the hyperbolic tangent.]

### Sigmoid Units

[Figure: a sigmoid unit with inputs x_1, …, x_n, weights w_i, bias input x_0 = 1, computing net = Σ_{i=0}^{n} w_i x_i and output o = σ(net).]
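The perceptron training rule above can be run directly on the linearly separable AND function. This is a small sketch under my own conventions (inputs and targets in {−1, +1}; zero initial weights rather than small random ones, for reproducibility):

```python
# Perceptron training rule: w_i <- w_i + alpha * (t - o) * x_i,
# applied only when the unit misclassifies an example.

def sgn(net):
    return 1 if net > 0 else -1

def train_perceptron(data, alpha=0.1, epochs=100):
    w = [0.0, 0.0, 0.0]  # w0 (bias weight), w1, w2
    for _ in range(epochs):
        for (x1, x2), t in data:
            o = sgn(w[0] + w[1] * x1 + w[2] * x2)
            if o != t:  # adapt weights only on a mistake
                w[0] += alpha * (t - o) * 1.0  # bias input x_0 = 1
                w[1] += alpha * (t - o) * x1
                w[2] += alpha * (t - o) * x2
    return w

and_data = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w = train_perceptron(and_data)
print([sgn(w[0] + w[1] * x1 + w[2] * x2) for (x1, x2), _ in and_data])
# [-1, -1, -1, 1]
```

Because AND is linearly separable, the convergence guarantee in the summary applies and the rule settles on a separating line after a handful of weight updates; on XOR the same loop would cycle forever.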
o = σ(net) = 1 / (1 + e^{−net})

### Training Sigmoid Networks

Note that if σ(x) = 1 / (1 + e^{−x}), then dσ(x)/dx = σ(x)(1 − σ(x)).

Error gradient for sigmoid units:

∂E/∂w_i = ∂/∂w_i (1/2) Σ_d (t_d − o_d)² = −Σ_d (t_d − o_d) o_d (1 − o_d) x_{i,d}

### Learning the AND Function with a Sigmoid Unit

[Figure: mean squared error of a sigmoid unit trained on the AND function, decreasing over 1000 epochs.]

### Limitations of Threshold and Perceptron Units

- Perceptrons can only learn linearly separable classes
- Perceptrons cycle if the classes are not linearly separable
- Threshold units always converge to the minimum-MSE hypothesis
- A network of perceptrons: how would we train it?
- A network of threshold units: not necessary (why?)
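The "Learning the AND Function with a Sigmoid Unit" experiment can be reproduced with a few lines of stochastic gradient descent using the error gradient above. This is an illustrative sketch, not the course's code: the learning rate, epoch count, and zero initialization are my choices.

```python
# SGD for one sigmoid unit on AND (targets 0/1). Per-example update
# follows the gradient above: delta_w_i = eta * (t - o) * o * (1 - o) * x_i.
from math import exp

def sigmoid(net):
    return 1.0 / (1.0 + exp(-net))

def train_sigmoid_unit(data, eta=0.5, epochs=5000):
    w = [0.0, 0.0, 0.0]  # bias weight w0, then w1, w2
    for _ in range(epochs):
        for (x1, x2), t in data:
            o = sigmoid(w[0] + w[1] * x1 + w[2] * x2)
            delta = (t - o) * o * (1 - o)
            w[0] += eta * delta * 1.0  # bias input x_0 = 1
            w[1] += eta * delta * x1
            w[2] += eta * delta * x2
    return w

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_sigmoid_unit(and_data)
outputs = [sigmoid(w[0] + w[1] * x1 + w[2] * x2) for (x1, x2), _ in and_data]
print([round(o) for o in outputs])  # [0, 0, 0, 1]
```

As in the MSE-versus-epochs plot described above, the error falls smoothly because the sigmoid is differentiable everywhere, which is precisely what the threshold unit lacks and what makes multi-layer training possible.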
