by: Natasha Orn


# Data DSCI 4520

Natasha Orn
UNT
GPA 3.66

Staff




These 4 pages of class notes were uploaded by Natasha Orn on Sunday, October 25, 2015. The notes belong to DSCI 4520 at the University of North Texas, taught by Staff in Fall. Since upload, they have received 12 views. For similar materials see /class/229147/dsci-4520-university-of-north-texas in Decision Sciences at University of North Texas.


Date Created: 10/25/15
DSCI 4520/5240 Data Mining Lecture Notes. 2007 University of North Texas.

## Lecture 4: Decision Trees I

Some slide material taken from Witten & Frank 2000, Olson & Shi 2007, de Ville 2006, SAS Education 2005.

### A simple example: Weather Data

[Table: the weather data]

### Pseudocode for 1R

- For each attribute:
  - For each value of the attribute, make a rule as follows:
    - count how often each class appears;
    - find the most frequent class;
    - make the rule assign that class to this attribute-value.
  - Calculate the error rate of the rules.
- Choose the rules with the smallest error rate.

Let's apply 1R on the weather data:

- Consider the first (outlook) of the 4 attributes (outlook, temp, humidity, windy). Consider all values (sunny, overcast, rainy) and make 3 corresponding rules.
- Continue until you get all 4 sets of rules.

[Figure: decision tree for the weather data]

### Evaluating the Weather Attributes in 1R

- Outlook: Sunny → No, 2/5 errors; total errors 4/14.
- Humidity: High → No, 3/7 errors.
- Windy: False → Yes, 2/8 errors; True → No*, 3/6 errors; total errors 5/14.

\* The two classes are equally likely for this value, so the choice between them is arbitrary.

### Discretization in 1R

Consider the continuous Temperature data after sorting them in ascending order. The class values in that order are:

Yes No Yes Yes Yes No No Yes Yes Yes No Yes Yes No

Placing a breakpoint wherever the class changes gives:

Yes | No | Yes Yes Yes | No No | Yes Yes Yes | No | Yes Yes | No

To avoid overfitting, 1R adopts the rule that observations of the majority class in each partition be as many as possible, but no more than 3 unless there is a "run":

Yes No Yes Yes Yes | No No Yes Yes Yes | No Yes Yes No

If adjacent partitions have the same majority class, the partitions are merged:

Yes No Yes Yes Yes No No Yes Yes Yes | No Yes Yes No

The final discretization leads to the rule set (the breakpoint 77.5 falls midway between the adjacent sorted values 75 and 80):

- IF temperature < 77.5 THEN Yes
- IF temperature > 77.5 THEN No
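The 1R pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code. The data table did not survive in these notes, so the `WEATHER` rows are an assumption: they are the standard 14-instance weather data set from Witten & Frank. The names `one_r`, `WEATHER`, and `ATTRIBUTES` are hypothetical. Ties between classes are broken by whichever class was seen first.

```python
from collections import Counter, defaultdict

# Assumption: the standard 14-instance weather data (Witten & Frank).
# Columns: outlook, temperature, humidity, windy, play (the class).
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def one_r(rows, attributes):
    """Build one rule set per attribute and pick the one with fewest errors.

    Returns (best_attribute, results), where results maps each attribute
    to a (rules, error_count) pair.
    """
    results = {}
    for col, attr in enumerate(attributes):
        # Count how often each class appears for each attribute value.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[col]][row[-1]] += 1
        # Each value predicts its most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the instances that do not belong to the majority class.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        results[attr] = (rules, errors)
    # On equal error counts, min() keeps the first attribute in the list.
    best = min(attributes, key=lambda a: results[a][1])
    return best, results


best, results = one_r(WEATHER, ATTRIBUTES)
for attr in ATTRIBUTES:
    rules, errors = results[attr]
    print(f"{attr}: {rules} errors {errors}/{len(WEATHER)}")
print("chosen attribute:", best)
```

On this assumed data, outlook and humidity both make 4 errors out of 14, and the sketch keeps outlook only because it comes first in the attribute list.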
### Which attribute to select?

A criterion for attribute selection:

- Which is the best attribute? The one which will result in the smallest tree.
- Heuristic: choose the attribute that produces the "purest" nodes.
- Popular impurity criterion: Information. This is the extra information needed to classify an instance. It takes a low value for pure nodes.
- Attributes are compared using Information Gain = Info(before) − Info(after).
- Information Gain increases with the average purity of the subsets that an attribute produces.

### Computing Information

- Information is measured in bits.
- Given a probability distribution, the info required to predict an event is the distribution's entropy.
- Entropy gives the additional required information (i.e., the information deficit) in bits. This can involve fractions of bits.
- The negative signs in the entropy formula turn all negative logs back into positive values.

Formula for computing the entropy:

$$\text{entropy}(p_1, p_2, \ldots, p_n) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 - \cdots - p_n \log_2 p_n$$

### Weather example: attribute "outlook"

Outlook = Sunny:

$$\text{Info}([2,3]) = \text{entropy}(2/5, 3/5) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} = 0.971 \text{ bits}$$

Outlook = Overcast:

$$\text{Info}([4,0]) = \text{entropy}(1, 0) = -1\log_2 1 - 0\log_2 0 = 0 \text{ bits (by definition)}$$

Outlook = Rainy:

$$\text{Info}([3,2]) = \text{entropy}(3/5, 2/5) = -\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5} = 0.971 \text{ bits}$$

Expected information for attribute Outlook:

$$\text{Info}([2,3],[4,0],[3,2]) = \tfrac{5}{14} \times 0.971 + \tfrac{4}{14} \times 0 + \tfrac{5}{14} \times 0.971 = 0.693 \text{ bits}$$

### Computing the Information Gain

Information Gain = Information Before − Information After:

$$\text{Gain}(\text{Outlook}) = \text{Info}([9,5]) - \text{Info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 \text{ bits}$$

Information Gain for attributes from the Weather Data:

- Gain(Outlook) = 0.247 bits
- Gain(Temperature) = 0.029 bits
- Gain(Humidity) = 0.152 bits
- Gain(Windy) = 0.048 bits

### Continuing to split

Within the Outlook = Sunny branch:

- Gain(Temperature) = 0.571 bits
- Gain(Humidity) = 0.971 bits
- Gain(Windy) = 0.020 bits
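The entropy and information gain arithmetic for Outlook can be checked with a short Python sketch. It uses only the class counts quoted in these notes: [9, 5] before the split, and [2, 3], [4, 0], [3, 2] for the three Outlook values. The function names `entropy`, `expected_info`, and `information_gain` are hypothetical helpers, and logs are taken base 2 on the assumption that information is measured in bits.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a class-count list such as [9, 5].

    Zero counts are skipped, which matches the convention that
    0 * log(0) is taken to be 0.
    """
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def expected_info(partitions):
    """Weighted average entropy of the subsets produced by a split."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * entropy(p) for p in partitions)


def information_gain(before, partitions):
    """Info(before) minus Info(after) for one candidate split."""
    return entropy(before) - expected_info(partitions)


before = [9, 5]                      # class counts before splitting
outlook = [[2, 3], [4, 0], [3, 2]]   # class counts per Outlook value

print(f"Info before:  {entropy(before):.3f} bits")
print(f"Info after:   {expected_info(outlook):.3f} bits")
print(f"Gain(Outlook): {information_gain(before, outlook):.3f} bits")
```

Working from class counts rather than raw rows keeps the sketch independent of the data table, so the same three functions can be reused for any other attribute once its counts are known.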
