
# DataMining.pdf 4390



These notes will cover what the midterm will be over!
- COURSE: Special Topics in Computer Science
- PROF.: MOHAMMAD HOSSAIN
- TYPE: Class Notes
- PAGES: 6
- KARMA: Free


These 6-page Class Notes were uploaded by Brian Madunezim on Sunday, February 21, 2016. The notes belong to course 4390 at the University of Texas at El Paso, taught by MOHAMMAD HOSSAIN in Spring 2016. Since their upload, they have received 24 views. For similar materials, see Special Topics in Computer Science in Computer Science at the University of Texas at El Paso.


Date Created: 02/21/16
### 1/21/16: Association Rules

- What is an association? When one thing shares a common trait with another: A is connected with B; if A and B occur, then C occurs.
- Given a set of transactions D, find rules that predict the occurrence of an item based on the occurrences of other items in the transaction.
- What is a transaction? A list of items. Ex. a receipt from Walmart.
- The strength of an association depends on how many transactions it appears in, i.e. on the frequency of transactions.
- Typical (average) length of a transaction: no more than 100 items.
- FREQUENT ITEMSETS: combinations of items that occur frequently. Ex. {Bread, Milk, Diaper} is an itemset of size 3. The count is always binary (an item is either in a transaction or not). Frequent itemsets can help summarize the data.
- Confidence: how probable it is that an item occurs given another item or set of items.
- Two frequent-pattern tasks: 1. frequent itemset mining and 2. association rule mining.
- SUPPORT COUNT: the number of transactions in which an itemset appears.
- Frequent itemsets are defined in terms of support.
- SUPPORT: the fraction of transactions in which an itemset appears.
- Minsup: the support threshold.
- Frequent itemset: an itemset whose support is greater than or equal to minsup.
- Frequent itemsets capture only positive combinations.
- Frequent itemset mining is aimed at the domain products, such as bread, milk, etc., not at incidental items like gum or magazines.
- Finding frequent itemsets: given the transactions and a minsup, find the item combinations whose frequency meets the minsup.
- To count as a frequent itemset, an itemset must meet the minsup threshold.
- If minsup is 0, then everything is frequent; that means you don't care about what is frequent, which is not useful information.
- Therefore, the larger the minsup, the harder it is for an itemset to qualify as frequent.
- How to find all frequent itemsets? Brute-force algorithm: generate all candidate itemsets and count each one's support. With d items there are 2^d combinations, which can be a very, very large number.
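The brute-force approach above can be sketched in a few lines of Python. This is a minimal illustration, assuming transactions are represented as sets of item names; the function and variable names are my own, not from the lecture:

```python
from itertools import combinations

def brute_force_frequent_itemsets(transactions, minsup):
    """Enumerate every non-empty candidate itemset (2^d - 1 of them for
    d distinct items) and keep those whose support, i.e. the fraction of
    transactions containing the itemset, meets the minsup threshold."""
    items = sorted({item for t in transactions for item in t})
    n = len(transactions)
    frequent = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            # Support count: number of transactions containing the candidate.
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / n
            if support >= minsup:
                frequent[candidate] = support
    return frequent

transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
]
print(brute_force_frequent_itemsets(transactions, minsup=0.5))
```

Even in this toy example the inner loop touches every candidate, which is why the notes warn that 2^d explodes for realistic item counts; Apriori-style pruning (next lecture) exists to avoid exactly this.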
- Reducing the number of candidates uses the Apriori principle (good for finding frequent sets): if {a} is not frequent, then {a, b} cannot be frequent. If a subset is not frequent, then its supersets cannot be frequent (support of a subset is always at least the support of any superset).
- Anti-monotone: the support of an itemset never exceeds the support of any of its subsets.
- If a superset contains an itemset that is not frequent, then as a result the superset cannot be frequent either.
- The Apriori algorithm was when we went over the black and white strips representing whether or not an itemset was frequent.

### 2/2: FP-Growth

- FP-Growth (Frequent Itemset Growth). Advantages: not much data needs to be read and not a lot of work. (Disadvantages: left blank in the notes.)
- FP-tree representation: a compressed representation of the input data. At first, arrange the items in decreasing order of support.
- If you are expecting some patterns, then the tree should have fewer child nodes extending from the null node.
- The size of the FP-tree also depends on the repetitions occurring.
- What is the difference between decreasing order of support and increasing order of support? The increasing order of support has more child nodes extending from the null node.
- Prefix path: paths ending in whatever item you want them to end in. So if you want them to end in "e", then you would say there is a prefix path ending in "e". See also the conditional FP-tree.
- Frequent itemset mining in the graph domain: edges are considered items; graphs are considered transactions.
- Frequent subgraph mining: deals with connected components.

### 2/4: Conditional FP-Trees and Project Info

- The FP-tree will help you find a particular itemset with a specific suffix.
- In a conditional FP-tree you always ignore the item that you are looking for. Ex. "e".

PROJECT INFO:
- Yelp Dataset Challenge: www.yelp.com/dataset challenge
- Cultural trends
- Location mining and urban planning
- Seasonal trends (e.g. more people eat steak in the winter)
- Infer categories
- Natural language processing (English, Spanish, French)
- Change points and events (finding trends, seeing whether they are good or bad and why; you have to find out why there was a drastic change)
- Social graph mining (what type of reviews people write based on their location, their sentiment, etc.)

PROJECT PRESENTATION:
- Can be Thursday, Feb. 11, or Tuesday, Feb. 16.
- 10-minute presentation:
  - Some analysis of the data (make sure you download the data, understand it, analyze it, and then come up with a target problem)
  - A research idea (can't have your topic already categorized or published)
  - A tentative solution plan
- Submit a 2-page report after the presentation. The beginning of the paper should be about the big picture of the topic.

Frequent subgraph mining (Apriori algorithm):
- Core subgraph
- Cost of comparison
- Hash code
- FSG for frequent subgraph mining
- Subgraph extension generation (SEG) uses a super (master) graph

Clustering algorithms:
- What is clustering? Similar items should go together: similar items, similar documents, or similar images.
- Clustering vs. classification: with clustering you don't really know what group you should be putting items in, whereas with classification you actually put the items together and categorize them. Which comes first? Clustering.
- Clustering is unsupervised learning.
- High intra-class similarity: points within the same cluster are similar. Low inter-class similarity: points from different clusters are dissimilar.

### 2/18: Homework and K-Means

Homework assignment:
- Needs two parameters.
- Name the class RandIndex: [RI] = RandIndex(c1, c2) { }
- You should be comparing the rows.

- The goal of the k-means clustering algorithm is to move each centroid to the center of its cluster.
- Major limitations of k-means:
  - You have to define the number of clusters that exist in advance.
  - It always assumes that clusters nested inside another cluster all belong together, when in actuality that is not the case.
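The homework's RandIndex(c1, c2) routine can be sketched as follows. This is a hedged illustration only: the assignment's exact interface (and "comparing the rows") may differ, and I assume here that c1 and c2 are flat lists of cluster labels over the same points:

```python
from itertools import combinations

def rand_index(c1, c2):
    """Rand index between two clusterings given as label lists: the
    fraction of point pairs on which the clusterings agree, i.e. both
    put the pair in the same cluster or both put it in different ones."""
    assert len(c1) == len(c2)
    pairs = list(combinations(range(len(c1)), 2))
    agree = 0
    for i, j in pairs:
        same1 = c1[i] == c1[j]   # together in clustering 1?
        same2 = c2[i] == c2[j]   # together in clustering 2?
        if same1 == same2:
            agree += 1
    return agree / len(pairs)

# Identical partitions (labels merely renamed) score 1.0:
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # -> 1.0
```

Note that the index depends only on which points are grouped together, not on the label values themselves, which is why the renamed partition above still scores 1.0.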
The second limitation above is known as the Assumption of Convex Region.

The solution to the Assumption of Convex Region is Density-Based Clustering!
- Clustering based on density.
- Each cluster has a considerably higher density of points.
- Two global parameters:
  - Eps: maximum radius of the neighborhood.
  - MinPts: minimum number of points in an Eps-neighborhood of that point.
- Core object: an object whose Eps-neighborhood contains at least MinPts points.
- Border object/point: an object on the border of a cluster; it does not have MinPts elements (five, in the class example) within the radius.
- Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. Eps and MinPts if:
  1. p belongs to N_Eps(q), and
  2. |N_Eps(q)| >= MinPts (the core point condition).
- Density-reachable: p is density-reachable from q if there is a chain of points p1 = q, ..., pn = p such that each point in the chain is directly density-reachable from the previous one.
- Density-connected: p and q are density-connected if there is a point o from which both p and q are density-reachable (w.r.t. Eps and MinPts).
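The core-point condition above is easy to check directly. A minimal sketch, assuming 2-D points and Euclidean distance (the function names and sample coordinates are my own, not from the lecture):

```python
import math

def eps_neighborhood(points, p, eps):
    """N_Eps(p): all points within distance eps of p, including p itself."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_core(points, p, eps, min_pts):
    """Core point condition from the notes: |N_Eps(p)| >= MinPts."""
    return len(eps_neighborhood(points, p, eps)) >= min_pts

# Four points form a dense corner; one point sits alone far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
print(is_core(points, (0, 0), eps=1.5, min_pts=4))  # dense corner -> True
print(is_core(points, (5, 5), eps=1.5, min_pts=4))  # isolated point -> False
```

A point that fails this check but falls inside some core point's Eps-neighborhood is a border point in the sense defined above; a point that is neither core nor border would be treated as noise.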
