Class Note for CMPSCI 683 at UMass (15)
Neural Networks
Victor Lesser, CMPSCI 683, Fall 2004

- Representing functions using networks of simple arithmetic computing elements.
- Learning such representations from examples.

Neurally inspired computation
- The brain contains roughly 10^11 neurons, with about 10^4 synapses (connections) per neuron.
- A neuron fires when its inputs exceed a threshold; inputs are weighted and can be excitatory or inhibitory.
- Artificial neural networks are faithful to these coarse neural constraints, but they are not models of real neurons.
- Computational architectures and cognitive models that are neurally inspired: large numbers of simple neuron-like processing units interconnected through weighted links.
- They do not compute by transmitting symbolically coded messages; the "program" resides in the structure of the interconnections.
- The brain performs many tasks much faster than a computer (scene recognition takes about 0.1 second) even though an individual neuron is slow (switching time about 0.001 second): the aggregate bandwidth is very high, and there is massive parallelism with no centralized control.
- Learning and graceful degradation.

Why neural networks?
- Ability to bring large numbers of interacting soft constraints to bear on problem solving.
- Noise resistance, error tolerance, graceful degradation.
- Ability to do complex multi-layer recognition with a large number of inputs/outputs quickly.
- Learning with generalization.
- Biological plausibility.
- Potential for speed of processing through fine-grained parallelism.

Applications
- Automobile automatic guidance systems.
- Credit application evaluation, mortgage screening, real estate appraisal.
- Object recognition (faces, characters).
- Speech recognition and voice synthesis.
- Market forecasting, automatic bond trading.
- Robot control, process control.
- Breast cancer cell analysis.
- Oil and gas exploration.
- Image and data compression.

Network structure
- A network is composed of nodes (units) connected by links; each link has a numeric weight associated with it.
- Each processing unit computes the weighted sum of its inputs and then applies a threshold (activation) function (a code sketch of this computation appears after the feed-forward section below):
  - linear component: the input function combines the weighted inputs, in_i = Σ_j W_{j,i} a_j;
  - nonlinear component: the activation function g transforms the combined input into the activation value a_i = g(in_i), which is sent along the output links.

Activation functions
- (a) step function, (b) sign function, (c) sigmoid function.
- Each can act as a threshold unit that outputs 1 when the input exceeds the threshold; the threshold itself can be folded into the weights through a dummy (bias) link.
- Figure 19.6: units with a step function can compute the basic Boolean functions given appropriate weights and thresholds, but XOR requires a multilayer network.
- Neural networks are a robust approach to approximating real-valued, discrete-valued, and vector-valued target functions.
- Learning covers both the weights and the connectivity (W_{j,i} = 0 implies no connection between nodes j and i).

Feed-forward networks
- Unidirectional links, no cycles (a DAG); no internal state other than the weights.
- Layered feed-forward: each unit is linked only to units in the next layer, so information moves synchronously from layer to layer.
- The relatively well understood case. Example: a very simple two-layer feed-forward network with two inputs, two hidden nodes, and one output node.
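To make the unit computation above concrete, here is a minimal sketch of a single sigmoid unit and of the simple two-input, two-hidden-unit, one-output feed-forward network just described. It is not part of the original notes: the particular weight values and the convention of pairing the bias weight with a dummy input fixed at -1 are illustrative assumptions.

```python
import math

def sigmoid(x):
    """Sigmoid activation g(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, inputs):
    """One processing unit: weighted sum of the inputs followed by the
    nonlinear activation function.  weights[0] is the bias weight,
    paired with a dummy input fixed at -1 (an assumed convention)."""
    in_i = weights[0] * -1.0 + sum(w * a for w, a in zip(weights[1:], inputs))
    return sigmoid(in_i)

# A two-input, two-hidden-unit, one-output layered feed-forward network.
# The weight values below are made up purely for illustration.
hidden_weights = [[0.2, 0.7, -0.4],   # bias, weight from input 1, weight from input 2
                  [-0.3, 0.5, 0.9]]
output_weights = [0.1, 1.2, -0.8]     # bias, weight from hidden 1, weight from hidden 2

def forward(x):
    hidden = [unit_output(w, x) for w in hidden_weights]
    return unit_output(output_weights, hidden)

print(forward([1.0, 0.0]))
```

Swapping `sigmoid` for a step or sign function gives the other two activation functions listed above.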
Hopfield networks
- Bidirectional connections with symmetric weights; all units are both input and output units.
- The activation function is the sign function, so activations can only be +1 or -1.
- Functions as an associative memory: a new example causes the net to settle into the training pattern that most closely resembles the new example.
- Example: a training set of photographs. Each weight is a partial encoding of all the photographs; given a new stimulus (a small piece of one of the trained photographs), the activation levels of the units settle to reflect the complete photograph.

Recurrent networks
- Arbitrary links: activation is fed back to the units that caused it.
- Internal state is stored in the activation levels.
- Can be unstable, oscillate, etc.
- Can represent more complex functions than feed-forward networks.

Perceptrons
- Single-layered feed-forward networks, studied in the late 1950's.
- Output: o(x1, ..., xn) = 1 if w0 + w1 x1 + ... + wn xn > 0, and -1 otherwise.
- Represents some useful functions (those that are linearly separable), but some functions are not representable (e.g. XOR).

Input encoding
- Local encoding: each attribute maps to a single input; pick an appropriate number of distinct values to correspond to the distinct symbolic attribute values.
- Distributed encoding: one input for each value of the attribute; the input is one or zero depending on whether the example has that value. For example, an attribute x between 0 and 3 gets 4 distinct inputs y1, y2, y3, y4.

Perceptron learning rule
- wi ← wi + Δwi, chosen to reduce the difference between the observed and predicted value:
  Δwi = α (t - o) xi
  where t is the target value of the training example, o is the perceptron output, and α is a small constant (e.g. 0.1) called the learning rate.
- Start out with randomly assigned weights, e.g. between -0.5 and 0.5. (A code sketch of this training loop appears below, after the gradient-descent discussion.)
- The perceptron learning rule will converge to a set of weights that correctly represents the examples, as long as the examples represent a linearly separable function and α is sufficiently small.
- Why does it work? The perceptron is doing gradient descent in a weight space that has no local minima: optimization in weight space based on the sum of squared errors.

[Figure: learning curves comparing a perceptron and a decision tree — (1) learning the majority function of 11 inputs, (2) learning the WillWait predicate; proportion correct on the test set vs. training set size.]

Delta rule
- wi ← wi + α (t - o) xi, where o is the unthresholded linear output w · x.
- Does the best it can over the entire set of training examples even if they are not linearly separable: the delta rule converges towards the best-fit approximation to the target concept.
- Uses gradient descent to search the hypothesis space of possible weight vectors for the weights that best fit the training examples: start from an arbitrary initial weight vector; at each step, alter the weight vector in the direction that produces the steepest descent along the error surface.

[Figure: the error surface over the hypothesis space spanned by weights w0 and w1; the arrow shows the negated gradient at one point, i.e. the direction of steepest descent along the error surface toward the minimum.]

Gradient descent on the LMS error
- The error is the least-mean-square error over all training examples, E(w) = 1/2 Σ_d (t_d - o_d)^2.
- Convergence is guaranteed because this error surface contains only a single global minimum, provided the learning rate is sufficiently small (which may require a large number of iterations).
- A larger learning rate may overshoot the minimum in the error surface; a larger rate can be used if its value is gradually reduced over time (similar to simulated annealing).

Incremental (stochastic) gradient descent
- Update the weights after each example: wi ← wi + α (t - o) xi.
- Looks similar to the perceptron rule, but o is not the thresholded perceptron output; it is the unthresholded linear combination of inputs w · x.
- Reduces the cost of each update cycle, but needs a smaller learning rate and more update cycles than batch gradient descent.
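As a companion to the perceptron learning rule above, here is a minimal training-loop sketch. It is not from the original notes: the +1/-1 encoding of the OR function used as data, the epoch count, and the bias weight w0 with an implicit input of 1 are illustrative assumptions.

```python
import random

def perceptron_output(weights, x):
    """Threshold unit: +1 if w0 + w1*x1 + ... + wn*xn > 0, else -1."""
    s = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if s > 0 else -1

def train_perceptron(examples, alpha=0.1, epochs=100):
    """Perceptron learning rule: w_i <- w_i + alpha * (t - o) * x_i,
    starting from random weights in [-0.5, 0.5].
    `examples` is a list of (inputs, target) pairs with target in {+1, -1}."""
    n = len(examples[0][0])
    weights = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]  # weights[0] is the bias
    for _ in range(epochs):
        for x, t in examples:
            o = perceptron_output(weights, x)
            weights[0] += alpha * (t - o)              # bias input is implicitly 1
            for i, xi in enumerate(x):
                weights[i + 1] += alpha * (t - o) * xi
    return weights

# A linearly separable function (logical OR under a +1/-1 encoding) converges;
# trying XOR instead would not, as discussed above.
data = [([-1, -1], -1), ([-1, 1], 1), ([1, -1], 1), ([1, 1], 1)]
w = train_perceptron(data)
print([perceptron_output(w, x) for x, _ in data])   # expected: [-1, 1, 1, 1]
```

The delta rule variant replaces the thresholded output o with the unthresholded sum w · x in the update; the loop structure stays the same.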
From perceptrons to multilayer networks
- The problem with perceptrons is coverage: many functions cannot be represented by a single sum/threshold unit.
- With one hidden layer and the sigmoid threshold function, a network can represent any continuous function; with two hidden layers, any discontinuous function.
- Choosing the right number of hidden units is still not well understood.

Back-propagation learning
- The key problem: how to assess the blame for an error and divide it among the contributing weights, at the same and at different layers?
- Back-propagation is gradient descent over the network weight vector.
- It divides the calculation of the gradient among the units, so that the change in each weight can be calculated by the unit to which the weight is attached, using only local information.
- Based on minimizing E = 1/2 Σ_i (T_i - O_i)^2 over the (possibly multiple) output units.

Weight updates
- First level (back-propagation to the hidden layer): W_{j,i} ← W_{j,i} + α a_j Δ_i, where Δ_i = Err_i g'(in_i); the term a_j Δ_i follows the gradient of E with respect to W_{j,i}.
- Second level (back-propagation to the input layer): W_{k,j} ← W_{k,j} + α a_k Δ_j, where Δ_j = g'(in_j) Σ_i W_{j,i} Δ_i: sum the error terms for each output unit i influenced by W_{k,j}, weighting each by W_{j,i}, the degree to which hidden unit j is responsible for the error in output i.

Back-propagation algorithm
- Compute the delta values for the output units using the observed error.
- Starting with the output layer, repeat the following for each layer in the network:
  - propagate the delta values back to the previous layer;
  - update the weights between the two layers.
- Typically uses the sigmoid function g(x) = 1/(1 + e^-x), which has the nice property g'(x) = g(x)(1 - g(x)).

[Figure: learning curves for the WillWait predicate — proportion correct on the test set vs. training set size, for a multilayer network and a decision tree.]

The algorithm in update-rule form (a code sketch appears at the end of these back-propagation notes):
- Initialize all weights to small random numbers.
- Repeat until satisfied; for each training example:
  1. Compute the network outputs.
  2. For each output unit k: δ_k = o_k (1 - o_k)(t_k - o_k).
  3. For each hidden unit h: δ_h = o_h (1 - o_h) Σ_k w_{h,k} δ_k.
  4. Update each weight: w_{i,j} ← w_{i,j} + Δw_{i,j}, where Δw_{i,j} = α δ_j x_{i,j}.

Hypothesis space
- The N-dimensional Euclidean space of network weights: continuous, in contrast with the discrete hypothesis space of decision trees.
- The error measure is differentiable with respect to the continuous parameters, yielding a well-defined error gradient that provides a useful structure for organizing the search for the best hypothesis.

Overfitting
- Backprop is susceptible to overfitting: after the initial learning, the weights are tuned to fit the idiosyncrasies of the training examples and the noise, and overly complex decision surfaces are constructed.
- The preferred alternative is smooth interpolation between data points: smoothly varying decision regions that tend to label points lying between positive examples as positive when there are no negative examples nearby.
- Weight decay: decrease each weight by some small factor during each iteration through the data; keeping weight values small biases learning against complex decision surfaces.
- Exploit a validation set: keep track of the error on the validation set during the search, use the weight setting that minimizes that error, and use it as a stopping criterion.

Local minima and momentum
- The error surface of a multilayer network can have multiple local minima; backprop is guaranteed to converge only to a local minimum.
- Momentum makes the weight update partially dependent on the update from the previous iteration:
  Δw_{j,i}(n) = α δ_j x_{j,i} + β Δw_{j,i}(n-1), with 0 ≤ β < 1.
- This helps avoid getting stuck in a local minimum by gradually increasing the step size of the search in regions where the gradient is unchanging.

Summary of back-propagation
- Gradient descent over the network weight vector; easily generalizes to any directed graph.
- Will find a local, not necessarily global, error minimum.
- Minimizes error over the training examples; will it generalize well to subsequent examples?
- Training is slow and can take thousands of iterations; using the network after training is very fast.

[Figure: a hidden-layer representation — a network trained to reproduce the identity mapping over its training examples; after training, the hidden units develop a compact encoding of the distinct inputs.]
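The following sketch implements the update-rule form of back-propagation given above (stochastic gradient descent with a single sigmoid hidden layer). It is not from the original notes: the bias handling (weight index 0 with an implicit input of 1), the learning rate, the epoch count, and the XOR training data are illustrative assumptions. Because the error surface has local minima, an unlucky random initialization can occasionally fail to reach the global minimum, as the notes point out.

```python
import math
import random

def g(x):
    """Sigmoid activation; its derivative is g(x) * (1 - g(x))."""
    return 1.0 / (1.0 + math.exp(-x))

class Net:
    """One-hidden-layer sigmoid network trained with the back-propagation
    update rules from the notes, one example at a time."""
    def __init__(self, n_in, n_hidden, n_out):
        def rnd():
            return random.uniform(-0.5, 0.5)       # small random initial weights
        # w_hidden[h][0] is the bias of hidden unit h; w_hidden[h][i+1] comes from input i.
        self.w_hidden = [[rnd() for _ in range(n_in + 1)] for _ in range(n_hidden)]
        # w_out[k][0] is the bias of output unit k; w_out[k][h+1] comes from hidden unit h.
        self.w_out = [[rnd() for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        h = [g(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in self.w_hidden]
        o = [g(w[0] + sum(wh * hh for wh, hh in zip(w[1:], h))) for w in self.w_out]
        return h, o

    def train_example(self, x, t, alpha=0.5):
        h, o = self.forward(x)
        # Output deltas: delta_k = o_k (1 - o_k)(t_k - o_k).
        delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
        # Hidden deltas: delta_h = o_h (1 - o_h) sum_k w_{h,k} delta_k.
        delta_h = [hh * (1 - hh) * sum(self.w_out[k][j + 1] * delta_o[k]
                                       for k in range(len(o)))
                   for j, hh in enumerate(h)]
        # Weight updates: w <- w + alpha * delta * input (bias input is 1).
        for k, w in enumerate(self.w_out):
            w[0] += alpha * delta_o[k]
            for j, hh in enumerate(h):
                w[j + 1] += alpha * delta_o[k] * hh
        for j, w in enumerate(self.w_hidden):
            w[0] += alpha * delta_h[j]
            for i, xi in enumerate(x):
                w[i + 1] += alpha * delta_h[j] * xi

# XOR: the function a single perceptron cannot represent.
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
net = Net(2, 2, 1)
for _ in range(10000):
    for x, t in data:
        net.train_example(x, t)
print([round(net.forward(x)[1][0], 2) for x, _ in data])   # usually close to [0, 1, 1, 0]
```

Adding momentum would amount to remembering the previous Δw for each weight and adding β times that value to each update.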
When to consider neural networks
- Instances are represented by many attribute-value pairs.
- The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
- The training examples may contain errors.
- Long training times are acceptable.
- Fast evaluation of the learned target function may be required.
- The ability of humans to understand the learned target function is not important.

Next lecture: Reinforcement Learning.