# 624 Note for STAT 597C at PSU

These 9-page class notes were uploaded by an elite notetaker on Friday, February 6, 2015. They belong to a course at Pennsylvania State University taught in the Fall. Since upload, they have received 25 views.

### Adding a cluster neighboring structure (topology) to K-means: Self-Organizing Maps (SOM)

**A SOM-type clustering algorithm**

- Choose a distance function for points, say $d(x_1, x_2)$.
- Choose a $K$.
- Now we want to specify a topology (neighboring structure) for the clusters. We do this by choosing a $K \times K$ matrix with nonnegative entries that expresses distances among $K$ latent "learning units", one for each cluster: $u_1, \ldots, u_K$; $\delta(u_i, u_j)$, $i, j = 1, \ldots, K$. The topology does not necessarily have to correspond to a physical similarity/dissimilarity or proximity; what it ends up mapping is the clusters' relatedness in the fitting process.
- The next thing we need to specify is a learning propagation mechanism. We do this through a kernel function $h(\delta)$ that takes as argument distances between learning units.
- Initialize $K$ $T$-dimensional weight vectors, one for each learning unit; these are points in the space of the data and will function as centroids, e.g. $K$ among the $N$ data points selected at random: $w_1(0), \ldots, w_K(0)$.
- For each $x$:
  1. Compute the distances from the current centroids: $d(x, w_k)$, $k = 1, \ldots, K$.
  2. Update the current cluster membership, identifying the closest ("winning") centroid: $m(x) = \arg\min_{k = 1, \ldots, K} d(x, w_k)$.
  3. Update all centroids, based on the proximity of the corresponding units to the winner's unit and on the learning propagation mechanism: $w_k \leftarrow w_k + h(\delta(u_{m(x)}, u_k))\,(x - w_k)$, $k = 1, \ldots, K$.
- Repeat until cluster centroids (and thus memberships) have stabilized, possibly cycling through the observations several times.

Loosely speaking, this algorithm pursues a minimum of the objective function

$$A(K, \mathrm{start}) = \sum_x \sum_{k=1}^{K} h(\delta(u_{m(x)}, u_k))\, d^2(x, w_k)$$

where the centroids are again the optimization variables, and the minimum we reach depends on the starting $w_1(0), \ldots, w_K(0)$. In real SOMs the kernel further depends on a time parameter $t$ that represents the number of iterations, $h(\delta, t)$: as $t$ increases, the kernel becomes smaller all over and more concentrated about $\delta = 0$ (see later). In the objective function we have replaced an indicator with a function that comes from composing a postulated learning propagation mechanism (the kernel) with a postulated neighboring structure among clusters (the distance in the learning units' space).
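The steps above can be sketched in Python. This is a minimal sketch, not the canonical SOM implementation: the data `X`, the unit-distance matrix `delta`, and the kernel `h` are supplied by the caller, and the function name and defaults are illustrative choices.

```python
import numpy as np

def som_cluster(X, K, delta, h, n_epochs=20, seed=0):
    """SOM-type clustering sketch: online updates of K centroids,
    one per learning unit, coupled through the kernel h."""
    rng = np.random.default_rng(seed)
    # initialize centroids as K of the N data points chosen at random
    w = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:   # randomized presentation order
            d = np.linalg.norm(w - x, axis=1)  # distances to current centroids
            m = int(np.argmin(d))              # winning unit m(x)
            # w_k <- w_k + h(delta(u_m, u_k)) * (x - w_k), for every k
            w += h(delta[m])[:, None] * (x - w)
    # final hard memberships: nearest centroid
    labels = np.argmin(
        np.linalg.norm(X[:, None, :] - w[None, :, :], axis=2), axis=1)
    return w, labels
```

For instance, with units on a line one could take `delta = np.abs(np.subtract.outer(np.arange(K), np.arange(K)))` and a Gaussian-decay kernel `h = lambda d: 0.1 * np.exp(-d**2)`.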
Whatever the learning topology may be, SOM reduces to an algorithm similar to K-means if there is no learning propagation, which corresponds to a kernel that is positive only at the winner: $h(\delta) > 0$ only for $\delta = 0$. Similarly, whatever the learning propagation mechanism may be, SOM reduces to an algorithm similar to K-means if we postulate no neighboring structure among the clusters, which corresponds to $\delta(u_i, u_j) = \infty$ for $j \neq i$.

**Real SOM algorithms**

These are slightly different, because they are devised to train a net for future recognition purposes, not explicitly for clustering:

- The data points are a collection of training data ("input patterns") that are fed to the net, so it may or may not make sense to present them more than once.
- The number of learning units $K$ does not necessarily correspond, in any sense, to the number of groups one expects in the training data. Sometimes it is selected to be as large as the number of available input patterns, or even larger.
- The neighboring structure is automatically defined by the fact that the learning units are located on a 1- or 2-dimensional (square) grid: distances are distances on the grid.
- In order to have the net stabilize as input patterns are presented to it, the learning mechanism (kernel) has a further parameter expressing time: as time goes by, the shape of $h(\delta, t)$ decreases in height and width, reducing both the learning effect and the learning propagation. Note that with this change one cannot rigorously write an objective function corresponding to the training of the net anymore. Imagine a situation in which input patterns are presented only once, and one does not necessarily have a large enough number of them to guarantee convergence of the algorithm: tapering the learning has a stabilizing effect.
- The order in which the input patterns are presented may matter a great deal (need to randomize).

Geometrically speaking, the weight vectors of the adapted units are moved a bit towards the input pattern. The amount of weight-vector movement, and the number of units affected by adaptation, are determined by the kernel function and decrease in time.
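The two ingredients just described — grid distances between units and a kernel whose height and width both taper with time — can be sketched as follows. The Gaussian form and the exponential decay with constant `tau` are illustrative assumptions, not prescribed by the notes.

```python
import numpy as np

def grid_distances(rows, cols):
    """delta(u_i, u_j): Euclidean distances between units placed
    on a 2-D (rows x cols) grid, returned as a (K, K) matrix."""
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kernel(delta, t, a0=0.5, s0=2.0, tau=100.0):
    """h(delta, t): Gaussian in the unit-space distance delta, whose
    height a(t) (learning effect) and width s(t) (learning propagation)
    both shrink as t grows, so the net can stabilize."""
    a = a0 * np.exp(-t / tau)   # height tapers: less learning over time
    s = s0 * np.exp(-t / tau)   # width tapers: less propagation over time
    return a * np.exp(-delta ** 2 / (2 * s ** 2))
```

With this time dependence the update rule stays the same, but, as noted above, no single objective function corresponds to the training anymore.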
In time, each weight vector comes to be similar to a type (group) of input patterns; each unit is thus more likely to win at future presentations of a pattern of that type. Adapting not only the winner unit but also a number of units in the neighborhood of the winner leads to the following correspondence:

- close-by input pattern types in data space (with its distance $d$);
- close-by weight vectors in data space (with its distance $d$);
- close-by units in units space (with its distance $\delta$ on the lattice).

The training process of the self-organizing map thus describes a topology-preserving mapping from a high-dimensional data space onto a low-dimensional units space: patterns that are similar in terms of the data space are mapped to geographically close locations in the units lattice.

### SOMs for visualization

Start with a number of learning units larger than or equal to the number of data points (input patterns). Each unit will come to represent a data point, yielding a proximity-preserving, nonlinear projection from the data space ($T$-dimensional, high) to a 2-D grid.

Example from http://www.ifs.tuwien.ac.at/~andi/somlib/som.html: a SOM consisting of a square arrangement of 7-by-7 learning units. The black circle indicates the unit that was selected as the winner for the presentation of input pattern $x(t)$. The weight vector of the winner, $w(t)$, is moved towards the input pattern, and thus $w(t+1)$ is nearer to $x(t)$ than was $w(t)$. Similar, yet less strong, adaptation is performed on a number of units in the vicinity of the winner; these units are marked as shaded circles. The degree of shading corresponds to the strength of adaptation: the weight vectors of units shown with a darker shading are moved closer to $x(t)$ than those of units shown with a lighter shading. As $t$ increases, the shading fades.

*(Figure: architecture of a self-organizing map; the $m$'s in the figure are the $w$'s in our symbols.)*

As a result of the training process, similar input data are mapped onto neighboring regions of the map.

### Fuzzy K-means

If the data are not "lumpy", i.e. if the variation in the data space is gradual instead of abrupt, disjoint clusters may provide a poor description. An approach with fuzzy clusters seems more appropriate.
**Objective function (one version):**

$$A(K, \mathrm{start}) = \sum_x \sum_{k=1}^{K} m(x,k)^f\, d^2(x, c_k)$$

The $m(x,k) \in (0,1)$ are membership functions ("fuzzy memberships"); they sum to 1 over $k$, and to a strictly positive number over $x$. Here $f \geq 1$ is the "fuzziness" exponent: it controls the degree of fuzziness of the final solution, i.e. the degree of overlap between groups. With $f = 1$ the solution is a hard partition; as $f$ approaches infinity, the solution approaches its highest degree of fuzziness.

**Corresponding algorithm (one version):**

- Choose a distance function for points, say $d(x_1, x_2)$.
- Choose a $K$.
- Choose a fuzziness exponent $f \geq 1$.
- Initialize $K$ $T$-dimensional centroids, e.g. $K$ among the $N$ data points selected at random: $c_1(0), \ldots, c_K(0)$ (alternatively, initialize the memberships).
- Iterate the following:

$$m(x,k) = \frac{d^2(x, c_k)^{-1/(f-1)}}{\sum_{k'=1}^{K} d^2(x, c_{k'})^{-1/(f-1)}}, \quad \forall x,\; k = 1, \ldots, K$$

$$c_k = \frac{\sum_x m(x,k)^f\, x}{\sum_x m(x,k)^f}$$

- Repeat until cluster centroids (and thus memberships) have stabilized, possibly cycling through the observations several times.

Data points located between clusters will have fairly spread membership functions. How about outliers, i.e. points outside the main body of the data? Fuzzy K-means "with extragrades" adds an extra class for outliers (De Gruijter and McBratney, 1988).
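The alternating membership and centroid updates can be sketched as follows. This is a minimal sketch assuming Euclidean distance and $f > 1$; the `1e-12` floor on squared distances is an implementation convenience (to avoid division by zero), not part of the notes.

```python
import numpy as np

def fuzzy_kmeans(X, K, f=2.0, n_iter=100, seed=0):
    """Fuzzy K-means (one version): alternate the membership and
    centroid updates until they stabilize."""
    rng = np.random.default_rng(seed)
    # initialize centroids as K of the N data points chosen at random
    c = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # squared distances d^2(x, c_k), shape (N, K)
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
        d2 = np.maximum(d2, 1e-12)            # guard against zero distance
        # membership update: m(x,k) proportional to d^2(x,c_k)^(-1/(f-1))
        m = d2 ** (-1.0 / (f - 1.0))
        m /= m.sum(axis=1, keepdims=True)     # memberships sum to 1 over k
        # centroid update: mean of the data weighted by m(x,k)^f
        mf = m ** f
        c = (mf.T @ X) / mf.sum(axis=0)[:, None]
    return c, m
```

With $f$ close to 1 the memberships harden toward a K-means-style partition; larger $f$ spreads them, matching the fuzziness discussion above.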
### References

Kohonen, T. (1997). *Self-Organizing Maps*, 2nd edition. Springer-Verlag.

Heskes, T., Spanjers, J., Wiegerinck, W. (1999). EM algorithms for self-organizing maps. http://citeseer.nj.nec.com/280386.html. Abstract: "Self-organizing maps are popular algorithms for unsupervised learning and data visualization. Exploiting the link between vector quantization and mixture modeling, we derive EM algorithms for self-organizing maps with and without missing values. We compare self-organizing maps with the elastic-net approach and explain why the former is better suited for the visualization of high-dimensional data. Several extensions and improvements are discussed. [...] To derive an error function for the self-organizing map we will follow the vector quantization interpretation given in, among [...]"

Flexer, A. (1999). On the use of self-organizing maps for clustering and visualization. In *Principles of Data Mining and Knowledge Discovery*. http://citeseer.nj.nec.com/105424.html. Abstract: "We will show that the number of output units used in a self-organizing map (SOM) influences its applicability for either clustering or visualization. By reviewing the appropriate literature and theory, as well as our own empirical results, we demonstrate that SOMs can be used for clustering or visualization separately, for simultaneous clustering and visualization, and even for clustering via visualization. For all these different kinds of application, SOM is compared to other statistical approaches. This will show SOM to be a very flexible tool which can be used for various forms of explorative data analysis, but it will also be made obvious that this flexibility comes with a price in terms [...]"

Bezdek, J.C. (1981). *Pattern Recognition with Fuzzy Objective Function Algorithms*. Plenum Press, New York.

De Gruijter, J.J., McBratney, A.B. (1988). A modified fuzzy k-means for predictive classification. In Bock, H.H. (ed.), *Classification and Related Methods of Data Analysis*, pp. 97-104. Elsevier Science, Amsterdam.

Roubens, M. (1982). Fuzzy clustering algorithms and their cluster validity. *European Journal of Operational Research*, 10, 294-301.

Xie, X.L., Beni, G. (1991). A validity measure for fuzzy clustering. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 13, 841-847.
