Neural Models of Bayesian Belief Propagation

Rajesh P. N. Rao

11.1 Introduction

Animals are constantly faced with the challenge of interpreting signals from noisy sensors and acting in the face of incomplete knowledge about the environment. A rigorous approach to handling uncertainty is to characterize and process information using probabilities. Having estimates of the probabilities of objects and events allows one to make intelligent decisions in the presence of uncertainty. A prey could decide whether to keep foraging or to flee based on the probability that an observed movement or sound was caused by a predator. Probabilistic estimates are also essential ingredients of more sophisticated decision-making routines, such as those based on expected reward or utility.

An important component of a probabilistic system is a method for reasoning based on combining prior knowledge about the world with current input data. Such methods are typically based on some form of Bayesian inference, involving the computation of the posterior probability distribution of one or more random variables of interest, given input data.

In this chapter, we describe how neural circuits could implement a general algorithm for Bayesian inference known as belief propagation. The belief propagation algorithm involves passing messages (probabilities) between the nodes of a graphical model that captures the causal structure of the environment. We review the basic notion of graphical models and illustrate the belief propagation algorithm with an example. We investigate potential neural implementations of the algorithm based on networks of leaky integrator neurons and describe how such networks can perform sequential and hierarchical Bayesian inference. Simulation results are presented for comparison with neurobiological data. We conclude the chapter by discussing other recent models of inference in neural circuits and suggest directions for future research. Some of the ideas reviewed in this chapter have appeared in prior publications [30, 31, 32, 42]; these may be consulted for additional details and results not included in this chapter.

11.2 Bayesian Inference through Belief Propagation

Consider the problem of an animal deciding whether to flee or keep feeding based on the cry of another animal from a different species. Suppose it is often the case that the other animal emits the cry whenever there is a predator in the vicinity. However, the animal sometimes also emits the same cry when a potential mate is in the area. The probabilistic relationship between a cry and its probable causes can be captured using a graphical model, as shown in figure 11.1. The circles (or nodes) represent the two causes and the observation as random variables: R (Predator), M (Mate), and C (Cry heard). We assume these random variables are binary and can take on the values 1 and 0 for presence and absence, respectively, although this can be generalized to multiple values. The arcs connecting the nodes represent the probabilistic causal relationships, as characterized by the probability table P(C|R, M).

Figure 11.1 An Example of a Graphical Model. Each circle represents a node denoting a random variable (R = Predator, M = Mate, C = Cry heard). Arrows represent probabilistic dependencies, as specified by the probability table P(C|R, M).

For the above problem, the decision to flee or not can be based on the posterior probability P(R|C) of a predator, given that a cry was heard (C = 1). This probability can be calculated directly as:

  P(R = 1|C = 1) = \sum_M P(R = 1, M|C = 1)
                 = k \sum_M P(C = 1|R = 1, M) P(R = 1) P(M),    (11.1)

where we used Bayes' rule to obtain the second equation from the first, with k being the normalization constant 1/[\sum_{R,M} P(C = 1|R, M) P(R) P(M)].

The above calculation required summing over the random variable M, which was irrelevant to the problem at hand. In a general scenario, one would need to sum over all irrelevant random variables, an operation which scales exponentially with the total number of variables, quickly becoming intractable.
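As a concrete sketch of the direct calculation in equation (11.1), the following Python fragment sums out the irrelevant variable M. The priors and the conditional table P(C|R, M) are made-up illustrative values, not taken from the chapter:

```python
# Direct marginalization for the feed-or-flee example (equation 11.1).
# All probability values below are assumed for illustration only.
P_R = {0: 0.9, 1: 0.1}          # prior over Predator
P_M = {0: 0.8, 1: 0.2}          # prior over Mate
# P(C=1 | R, M): a cry is likely if a predator or a mate is present
P_C1 = {(0, 0): 0.01, (0, 1): 0.7, (1, 0): 0.9, (1, 1): 0.95}

def posterior_predator():
    """P(R=1 | C=1), summing out the irrelevant variable M."""
    def evidence(r):
        return sum(P_C1[(r, m)] * P_R[r] * P_M[m] for m in (0, 1))
    k = 1.0 / (evidence(0) + evidence(1))   # normalization constant
    return k * evidence(1)

print(round(posterior_predator(), 3))
```

With more hidden variables, the analogous sum ranges over every irrelevant variable, which is exactly the exponential blowup noted above.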
Fortunately, there exists an alternate method, known as belief propagation or probability propagation [26], that involves passing messages (probability vectors) between the nodes of the graphical model and summing over local products of messages, an operation that can be tractable. The belief propagation algorithm involves two types of operations: summation over local joint distributions and multiplication of local marginal probabilities. Because the operations are local, the algorithm is also well suited to neural implementation, as we shall discuss below. The algorithm is provably correct for singly connected graphs (i.e., those with no undirected cycles) [26], although it has been used with some success in some graphical models with cycles as well [25].

11.2.1 A Simple Example

We illustrate the belief propagation algorithm using the feed-or-flee problem above. The nodes R and M first generate the messages P(R) and P(M), respectively, which are vectors of length two storing the prior probabilities for R = 0 and 1, and for M = 0 and 1, respectively. These messages are sent to node C. Since a cry was heard, the value of C is known (C = 1), and therefore the messages from R and M do not affect node C. We are interested in computing the marginal probabilities for the two hidden nodes R and M.

The node C generates the message m_{C→R} = m_{C→M} = [0, 1], i.e., the probability of absence of a cry is 0 and the probability of presence of a cry is 1, since a cry was heard. This message is passed on to the nodes R and M. Each node performs a marginalization over variables other than itself, using the local conditional probability table and the incoming messages. For example, in the case of node R, this is \sum_{M,C} P(C|R, M) P(M) m(C) = \sum_M P(C = 1|R, M) P(M), since C is known to be 1. Similarly, the node M performs the marginalization \sum_{R,C} P(C|R, M) P(R) m(C) = \sum_R P(C = 1|R, M) P(R). The final step involves combining these marginalized messages with the other messages received (in this case, P(R) and P(M), respectively) to yield, after normalization, the posterior probabilities of R and M
given the observation C = 1:

  P(R|C = 1) = \alpha \left[ \sum_M P(C = 1|R, M) P(M) \right] P(R),    (11.2)
  P(M|C = 1) = \beta \left[ \sum_R P(C = 1|R, M) P(R) \right] P(M),    (11.3)

where \alpha and \beta are normalization constants. Note that equation (11.2) above yields the same expression for P(R = 1|C = 1) as equation (11.1), which was derived using Bayes' rule. In general, belief propagation allows efficient computation of the posterior probabilities of unknown random variables in singly connected graphical models, given any available evidence in the form of observed values for any subset of the random variables.

11.2.2 Belief Propagation over Time

Belief propagation can also be applied to graphical models evolving over time. A simple but widely used model is the hidden Markov model (HMM) shown in figure 11.2A. The input that is observed at time t = 1, 2, ... is represented by the random variable I(t), which can either be discrete-valued or a real-valued vector, such as an image or a speech signal. The input is assumed to be generated by a hidden cause or state \theta(t), which can assume one of N discrete values 1, ..., N. The state \theta(t) evolves over time in a Markovian manner, depending only on the previous state according to the transition probabilities P(\theta(t) = i | \theta(t-1) = j) = P(\theta_t^i | \theta_{t-1}^j) for i, j = 1, ..., N. The observation I(t) is generated according to the probability P(I(t)|\theta(t)).

The belief propagation algorithm can be used to compute the posterior probability of the state, given current and past inputs (we consider here only the forward propagation case, corresponding to online state estimation). As in the previous example, the node \theta(t) performs a marginalization over neighboring variables, in this case \theta(t-1) and I(t). The first marginalization results in a probability vector whose ith component is \sum_j P(\theta_t^i | \theta_{t-1}^j) m_{t-1,t}^j, where m_{t-1,t}^j is the jth component of the message from node \theta(t-1) to \theta(t). The second marginalization is from node I(t) and is given by \sum_{I(t)} P(I(t)|\theta_t^i) P(I(t)). If a particular input I' is observed, this sum becomes \sum_{I(t)} P(I(t)|\theta_t^i) \delta(I(t), I') = P(I'|\theta_t^i), where \delta is the delta function, which evaluates
to 1 if its two arguments are equal and 0 otherwise. The two messages resulting from the marginalizations along the arcs from \theta(t-1) and I(t) can be multiplied at node \theta(t) to yield the following message to \theta(t+1):

  m_{t,t+1}^i = P(I'|\theta_t^i) \sum_j P(\theta_t^i | \theta_{t-1}^j) m_{t-1,t}^j.    (11.4)

If m_{0,1}^i = P(\theta_1^i), the prior distribution over states, then it is easy to show using Bayes' rule that m_{t,t+1}^i = P(\theta_t^i, I(t), ..., I(1)).

Rather than computing the joint probability, one is typically interested in calculating the posterior probability of the state, given current and past inputs, i.e., P(\theta_t^i | I(t), ..., I(1)). This can be done by incorporating a normalization step at each time step. Define, for t = 1, 2, ...,

  \tilde{m}_t^i = P(I'|\theta_t^i) \sum_j P(\theta_t^i | \theta_{t-1}^j) m_{t-1,t}^j,    (11.5)
  m_{t,t+1}^i = \tilde{m}_t^i / n_t,    (11.6)

where n_t = \sum_i \tilde{m}_t^i. If m_{0,1}^i = P(\theta_1^i), the prior distribution over states, then it is easy to see that

  m_{t,t+1}^i = P(\theta_t^i | I(t), ..., I(1)).    (11.7)

This method has the additional advantage that the normalization at each time step promotes stability (an important consideration for recurrent neuronal networks) and allows the likelihood function P(I'|\theta) to be defined in proportional terms, without the need for explicitly calculating its normalization factor (see section 11.4 for an example).

Figure 11.2 Graphical Model for an HMM and its Neural Implementation. (A) Dynamic graphical model for a hidden Markov model (HMM). Each circle represents a node denoting the state variable \theta(t), which can take on values 1, ..., N. (B) Recurrent network for implementing online belief propagation for the graphical model in (A). Each circle represents a neuron encoding a state i. Arrows represent synaptic connections. The probability distribution over state values at each time step is represented by the entire population.
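The normalized forward recursion of equations (11.5)-(11.7) can be sketched in a few lines of Python. The transition matrix, likelihoods, and number of states below are made-up illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                   # number of hidden states (assumed)
T = rng.random((N, N))                  # T[i, j] ~ P(theta_t = i | theta_{t-1} = j)
T /= T.sum(axis=0, keepdims=True)       # each column is a distribution

def forward_step(m_prev, likelihood):
    """One step of equations (11.5)-(11.6): multiply the transition
    marginalization by the input likelihood, then normalize."""
    m_tilde = likelihood * (T @ m_prev)         # eq. (11.5)
    return m_tilde / m_tilde.sum()              # eq. (11.6)

m = np.full(N, 1.0 / N)                 # m_{0,1}: uniform prior over states
for _ in range(10):                     # ten steps of made-up evidence
    likelihood = rng.random(N)          # P(I(t)|theta) need only be proportional
    m = forward_step(m, likelihood)

print(m, m.sum())                       # a proper posterior over states
```

Note that the likelihood vector is deliberately left unnormalized: as stated above, the per-step normalization by n_t makes that harmless.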
Figure 11.3 A Hierarchical Graphical Model for Images and its Neural Implementation. (A) Three-level graphical model for generating simple images containing one of many possible features at a particular location. (B) Three-level network for implementing online belief propagation for the graphical model in (A): location-coding neurons and feature-coding neurons project to an intermediate representation C, which receives the image I. Arrows represent synaptic connections in the direction pointed by the arrowheads; lines without arrowheads represent bidirectional connections.

11.2.3 Hierarchical Belief Propagation

As a third example of belief propagation, consider the three-level graphical model shown in figure 11.3A. The model describes a simple process for generating images based on two random variables: L, denoting spatial locations, and F, denoting visual features (a more realistic model would involve a hierarchy of such features, subfeatures, and locations). Both random variables are assumed to be discrete, with L assuming one of n values L_1, ..., L_n, and F assuming one of m different values F_1, ..., F_m. The node C denotes different combinations of features and locations, each of its values C_1, ..., C_p encoding a specific feature at a specific location. Representing all possible combinations is infeasible, but it is sufficient to represent those that occur frequently and to map each feature-location combination (L, F) to the closest C using an appropriate distribution P(C|L, F) (see section 11.4 for an example). An image with a specific feature at a specific location is generated according to the image likelihood P(I|C).

Given the above graphical model for images, we are interested in computing the posterior probabilities of features (more generally, objects or object parts) and their locations in an input image. This can be done using belief propagation. Given the model in figure 11.3A and a specific input image I', belief propagation prescribes that the following messages (probabilities) be transmitted from one node to another, as given by the arrows in the subscripts:

  m_{L→C} = P(L),    (11.8)
  m_{F→C} = P(F),    (11.9)
  m_{I→C} = P(I'|C),    (11.10)
  m_{C→L} = \sum_F \sum_C P(C|L, F) P(F) P(I'|C),    (11.11)
  m_{C→F} = \sum_L \sum_C P(C|L, F) P(L) P(I'|C).    (11.12)

The first three messages above are simply prior probabilities encoding beliefs
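The message computations of equations (11.8)-(11.12), and their combination into posteriors as in equations (11.13)-(11.15), can be sketched with NumPy. All table sizes and probability values below are made-up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_L, n_F, n_C = 3, 2, 6                 # locations, features, combinations (assumed)

P_L = np.full(n_L, 1.0 / n_L)           # prior over locations, m_{L->C}
P_F = np.full(n_F, 1.0 / n_F)           # prior over features,  m_{F->C}
P_C_given_LF = rng.random((n_C, n_L, n_F))
P_C_given_LF /= P_C_given_LF.sum(axis=0, keepdims=True)
lik = rng.random(n_C)                   # m_{I->C} = P(I'|C), unnormalized

# Posterior over C: image likelihood times the prior messages from L and F.
post_C = lik * np.einsum('clf,l,f->c', P_C_given_LF, P_L, P_F)
post_C /= post_C.sum()

# Messages to L and F (eqs. 11.11, 11.12), combined with the priors.
m_C_to_L = np.einsum('clf,f,c->l', P_C_given_LF, P_F, lik)
m_C_to_F = np.einsum('clf,l,c->f', P_C_given_LF, P_L, lik)
post_L = m_C_to_L * P_L; post_L /= post_L.sum()
post_F = m_C_to_F * P_F; post_F /= post_F.sum()
```

Increasing an entry of P_L here and recomputing post_F reproduces, in miniature, the multiplicative modulation by spatial priors discussed later in the chapter.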
about locations and features. The posterior probabilities of the unknown variables C, L, and F, given the input image I', are calculated by combining the messages at each node as follows:

  P(C|I') = \alpha P(I'|C) \sum_F \sum_L P(C|L, F) m_{L→C} m_{F→C},    (11.13)
  P(L|I') = \beta m_{C→L} P(L),    (11.14)
  P(F|I') = \gamma m_{C→F} P(F),    (11.15)

where \alpha, \beta, and \gamma are normalization constants that make each of the above probabilities sum to 1. Note how the prior P(L) multiplicatively modulates the posterior probability of a feature in equation (11.15), via equation (11.12). This observation plays an important role in section 11.4 below, where we simulate spatial attention by increasing P(L) for a desired location.

11.3 Neural Implementations of Belief Propagation

11.3.1 Approximate Inference in Linear Recurrent Networks

We begin by considering a commonly used neural architecture for modeling cortical response properties, namely, a linear recurrent network with firing-rate dynamics (see, for example, [5]). Let I denote the vector of input firing rates to the network and let v represent the output firing rates of N recurrently connected neurons in the network. Let W represent the feedforward synaptic weight matrix and U the recurrent weight matrix. The following equation describes the dynamics of the network:

  \tau \frac{dv}{dt} = -v + W I + U v,    (11.16)

where \tau is a time constant. The equation can be written in discrete form as follows:

  v_i(t+1) = v_i(t) + \epsilon \left( -v_i(t) + W_i I(t) + \sum_j u_{ij} v_j(t) \right),    (11.17)

where \epsilon is the integration rate, v_i is the ith component of the vector v, W_i is the ith row of the matrix W, and u_{ij} is the element of U in the ith row and jth column. The above equation can be rewritten as:

  v_i(t+1) = \epsilon W_i I(t) + \sum_j U_{ij} v_j(t),    (11.18)

where U_{ij} = \epsilon u_{ij} for i \neq j and U_{ii} = 1 + \epsilon(u_{ii} - 1). Comparing the belief propagation equation (11.5) for an HMM with equation (11.18) above, it can be seen that both involve propagation of quantities over time, with contributions from the input and activity from the previous time step. However, the belief propagation equation involves multiplication of these contributions,
while the leaky integrator equation above involves addition.

Now consider belief propagation in the log domain. Taking the logarithm of both sides of equation (11.5), we get:

  \log \tilde{m}_t^i = \log P(I'|\theta_t^i) + \log \sum_j P(\theta_t^i | \theta_{t-1}^j) m_{t-1,t}^j.    (11.19)

This equation is much more conducive to neural implementation via equation (11.18). In particular, equation (11.18) can implement equation (11.19) if:

  v_i(t+1) = \log \tilde{m}_t^i,    (11.20)
  \epsilon W_i I(t) = \log P(I'|\theta_t^i),    (11.21)
  \sum_j U_{ij} v_j(t) = \log \sum_j P(\theta_t^i | \theta_{t-1}^j) m_{t-1,t}^j.    (11.22)

The normalization step (equation (11.6)) can be computed by a separate group of neurons, representing m_{t,t+1}^i, that receive \log \tilde{m}_t^i as excitatory input and \log n_t = \log \sum_j \tilde{m}_t^j as inhibitory input:

  \log m_{t,t+1}^i = \log \tilde{m}_t^i - \log n_t.    (11.23)

These neurons convey the normalized posterior probabilities m_{t,t+1}^i back to the neurons implementing equation (11.19), so that \tilde{m}_{t+1}^i may be computed at the next time step. Note that the normalization step makes the overall network nonlinear.

In equation (11.21), the log-likelihood \log P(I'|\theta_t^i) is calculated using a linear operation \epsilon W_i I(t) (see also [45]). Since the messages are normalized at each time step, one can relax the equality in equation (11.21) and make \log P(I'|\theta_t^i) \propto \epsilon W_i I(t) for some linear filter \epsilon W_i. This avoids the problem of calculating the normalization factor for P(I'|\theta), which can be especially hard when I takes on continuous values, such as in an image. A more challenging problem is to pick recurrent weights U_{ij} such that equation (11.22) holds true. For equation (11.22) to hold true, we need to approximate a log-sum with a sum-of-logs. One approach is to generate a set of random probabilities x_j(t) for t = 1, ..., T and find a set of weights U_{ij} that satisfy:

  \sum_j U_{ij} \log x_j(t) \approx \log \sum_j P(\theta_t^i | \theta_{t-1}^j) x_j(t)    (11.24)

for all i and t. This can be done by minimizing the squared error in equation (11.24) with respect to the recurrent weights U_{ij}. This empirical approach, followed in [30], is used in some of the experiments below. An alternative approach is to exploit the nonlinear properties of dendrites, as suggested in the following section.
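The least-squares fit implied by equation (11.24) can be sketched as follows. The network size, number of samples, and transition matrix are made-up values, and the fit is done in one `lstsq` call rather than by iterative minimization:

```python
import numpy as np

# Fit recurrent weights U so that sum_j U_ij log x_j approximates
# log sum_j P_ij x_j over random probability vectors (equation 11.24).
rng = np.random.default_rng(2)
N, T = 8, 500                           # assumed sizes
P = rng.random((N, N))
P /= P.sum(axis=0, keepdims=True)       # P[i, j] ~ P(theta_i | theta_j)

X = rng.dirichlet(np.ones(N), size=T)   # T random probability vectors x(t)
A = np.log(X)                           # predictors: log x_j(t)
B = np.log(X @ P.T)                     # targets: log sum_j P_ij x_j(t)

# Least-squares solution of A @ U.T ~= B (all rows i fitted at once).
U = np.linalg.lstsq(A, B, rcond=None)[0].T

err = np.abs(A @ U.T - B).mean()        # quality of the sum-of-logs fit
print(f"mean absolute error of the approximation: {err:.3f}")
```

The residual error is what makes this implementation "approximate"; the nonlinear-dendrite model of the next section removes the need for this approximation.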
11.3.2 Exact Inference in Nonlinear Networks

A firing-rate model that takes into account some of the effects of nonlinear filtering in dendrites can be obtained by generalizing equation (11.18) as follows:

  v_i(t+1) = f(W_i I(t)) + g\left( \sum_j U_{ij} v_j(t) \right),    (11.25)

where f and g model nonlinear dendritic filtering functions for feedforward and recurrent inputs. By comparing this equation with the belief propagation equation in the log domain (equation (11.19)), it can be seen that the first equation can implement the second if:

  v_i(t+1) = \log \tilde{m}_t^i,    (11.26)
  f(W_i I(t)) = \log P(I'|\theta_t^i),    (11.27)
  g\left( \sum_j U_{ij} v_j(t) \right) = \log \sum_j P(\theta_t^i | \theta_{t-1}^j) m_{t-1,t}^j.    (11.28)

In this model (figure 11.2B), N neurons represent \log \tilde{m}_t^i (i = 1, ..., N) in their firing rates. The dendritic filtering functions f and g approximate the logarithm function, the feedforward weights W act as a linear filter on the input to yield the likelihood P(I'|\theta), and the recurrent synaptic weights U_{ij} directly encode the transition probabilities P(\theta_t^i | \theta_{t-1}^j). The normalization step is computed as in equation (11.23), using a separate group of neurons that represent the log posterior probabilities \log m_{t,t+1}^i and that convey these probabilities for use in equation (11.28) by the neurons computing \log \tilde{m}_t^i.

11.3.3 Inference Using Noisy Spiking Neurons

Spiking Neuron Model

The models above were based on firing rates of neurons, but a slight modification allows an interpretation in terms of noisy spiking neurons. Consider a variant of equation (11.16) where v represents the membrane potential values of neurons rather than their firing rates. We then obtain the classic equation describing the dynamics of the membrane potential v_i of neuron i in a recurrent network of leaky integrate-and-fire neurons:

  \tau \frac{dv_i}{dt} = -v_i + \sum_j w_{ij} I_j + \sum_j u_{ij} v'_j,    (11.29)

where \tau is the membrane time constant, I_j denotes the synaptic current due to input neuron j, w_{ij} represents the strength of the synapse from input j to recurrent neuron i, v'_j denotes the synaptic current due to recurrent neuron j, and u_{ij} represents the corresponding synaptic strength. If v_i crosses a
threshold T, the neuron fires a spike, and v_i is reset to the potential v_{reset}. Equation (11.29) can be rewritten in discrete form as:

  v_i(t+1) = v_i(t) + \epsilon \left( -v_i(t) + \sum_j w_{ij} I_j(t) + \sum_j u_{ij} v'_j(t) \right)    (11.30)
           = \epsilon \sum_j w_{ij} I_j(t) + \sum_j U_{ij} v'_j(t),    (11.31)

where \epsilon is the integration rate, U_{ii} = 1 + \epsilon(u_{ii} - 1), and, for i \neq j, U_{ij} = \epsilon u_{ij}. The nonlinear variant of the above equation, which includes dendritic filtering of input currents in the dynamics of the membrane potential, is given by:

  v_i(t+1) = f\left( \sum_j w_{ij} I_j(t) \right) + g\left( \sum_j U_{ij} v'_j(t) \right),    (11.32)

where f and g are nonlinear dendritic filtering functions for feedforward and recurrent inputs.

We can model the effects of background inputs and the random openings of membrane channels by adding a Gaussian white noise term to the right-hand side of equations (11.31) and (11.32). This makes the spiking of neurons in the recurrent network stochastic. Plesser and Gerstner [27] and Gerstner [11] have shown that, under reasonable assumptions, the probability of spiking in such noisy neurons can be approximated by an "escape function" (or hazard function) that depends only on the distance between the (noise-free) membrane potential v_i and the threshold T. Several different escape functions were found to yield similar results. We use the following exponential function, suggested in [11], for noisy integrate-and-fire networks:

  P(\text{neuron } i \text{ spikes at time } t) = k e^{v_i(t) - T},    (11.33)

where k is an arbitrary constant. We use a model that combines equations (11.32) and (11.33) to generate spikes.

Inference in Spiking Networks

By comparing the membrane potential equation (11.32) with the belief propagation equation in the log domain (equation (11.19)), we can postulate the following correspondences:

  v_i(t+1) = \log \tilde{m}_t^i,    (11.34)
  f\left( \sum_j w_{ij} I_j(t) \right) = \log P(I'|\theta_t^i),    (11.35)
  g\left( \sum_j U_{ij} v'_j(t) \right) = \log \sum_j P(\theta_t^i | \theta_{t-1}^j) m_{t-1,t}^j.    (11.36)

The dendritic filtering functions f and g approximate the logarithm function, the synaptic currents I_j(t) and v'_j(t) are approximated by the corresponding instantaneous firing rates, and the recurrent synaptic weights U_{ij} encode the transition probabilities P(\theta_t^i | \theta_{t-1}^j).
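The escape-function spike generation of equation (11.33) can be sketched in a few lines. The constants k and T below are made-up illustrative values, and the exponential is clipped so it remains a valid probability:

```python
import numpy as np

rng = np.random.default_rng(3)
k, T_thresh = 0.5, 1.0                  # assumed escape-rate constant, threshold

def spike_prob(v):
    """Escape function of eq. (11.33): P(spike) = k * exp(v - T),
    clipped to [0, 1] so it can be used as a Bernoulli parameter."""
    return np.clip(k * np.exp(v - T_thresh), 0.0, 1.0)

# A potential near threshold yields a much higher spiking probability
# than one far below it.
p_far, p_near = spike_prob(-1.0), spike_prob(0.9)
spikes = rng.random(1000) < p_near      # sample a Bernoulli spike train
print(p_far, p_near, spikes.mean())
```

Combined with the correspondence v_i(t+1) = log m̃_t^i above, this is exactly why the spiking probability becomes proportional to the message m̃_t^i in the next section.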
Since the membrane potential v_i(t+1) is assumed to be equal to \log \tilde{m}_t^i (equation (11.34)), we can use equation (11.33) to calculate the probability of spiking for each neuron i as:

  P(\text{neuron } i \text{ spikes at time } t+1) \propto e^{v_i(t+1) - T}    (11.37)
                                                  \propto e^{\log \tilde{m}_t^i}    (11.38)
                                                  \propto \tilde{m}_t^i.    (11.39)

Thus, the probability of spiking (or, equivalently, the instantaneous firing rate) for neuron i in the recurrent network is directly proportional to the message \tilde{m}_t^i, which represents the posterior probability of the neuron's preferred state and current input, given past inputs. Similarly, the instantaneous firing rates of the group of neurons representing \log m_{t,t+1}^i are proportional to m_{t,t+1}^i, which is precisely the input required by equation (11.36).

11.4 Results

11.4.1 Example 1: Detecting Visual Motion

We first illustrate the application of the linear firing-rate-based model (section 11.3.1) to the problem of detecting visual motion. A prominent property of visual cortical cells in areas such as V1 and MT is selectivity to the direction of visual motion. We show how the activity of such cells can be interpreted as representing the posterior probability of stimulus motion in a particular direction, given a series of input images.

For simplicity, we focus on the case of 1D motion in an image consisting of X pixels, with two possible motion directions: leftward (L) or rightward (R). Let the state \theta_{ij} represent a motion direction j \in \{L, R\} at spatial location i. Consider a network of N neurons, each representing a particular state \theta_{ij} (figure 11.4A). The feedforward weight vectors for the states \theta_{iL} and \theta_{iR} are assumed to be identical Gaussians centered at location i with a standard deviation \sigma. Figure 11.4B depicts the feedforward weights for a network of 30 neurons, 15 encoding leftward and 15 encoding rightward motion.

Figure 11.4 Recurrent Network for Motion Detection (from [30]). (A) A recurrent network of neurons, shown for clarity as two chains selective for leftward and rightward motion, respectively. The
feedforward synaptic weights for neuron i in the leftward (or rightward) chain are determined by the Gaussian weight profiles shown in (B). The recurrent weights reflect the transition probabilities P(\theta_{iR}|\theta_{jR}) and P(\theta_{iL}|\theta_{jL}). (B) Feedforward weights for neurons i = 1, ..., 15 (rightward chain); the feedforward weights for neurons i = 16, ..., 30 (leftward chain) are identical. (C) Transition probabilities P(\theta_t|\theta_{t-1}); probability values are proportional to pixel brightness. (D) Recurrent weights U_{ij}, computed from the transition probabilities in (C) using equation (11.24).

We model visual motion using an HMM. The transition probabilities P(\theta_t|\theta_{t-1}) are selected to reflect both the direction of motion and the speed of the moving stimulus. The transition probabilities for rightward motion from the state \theta_{kR}, i.e., P(\theta_{iR}|\theta_{kR}), were set according to a Gaussian centered at location k + x, where x is a parameter determined by stimulus speed. The transition probabilities for leftward motion from the state \theta_{kL} were likewise set to Gaussian values centered at k - x. The transition probabilities from states near the two boundaries (i = 1 and i = X) were chosen to be uniformly random values. Figure 11.4C shows the matrix of transition probabilities.

Recurrent Network Model

To detect motion using Bayesian inference in the above HMM, consider first a model based on the linear recurrent network, as in equation (11.18), but with normalization as in equation (11.23), which makes the network nonlinear. We can compute the recurrent weights U_{ij} for the transition probabilities given above using the approximation method in equation (11.24) (see figure 11.4D). The resulting network then implements approximate belief propagation for the HMM, based on equations (11.20)-(11.23). Figure 11.5 shows the output of the network in the middle of a sequence of input images depicting a bar moving either leftward or rightward. As shown in the figure, for a leftward-moving bar at a particular location i, the highest network output is for the neuron representing location i and direction L, while for a rightward-moving bar, the neuron representing location i and direction R has the highest output.
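The shifted-Gaussian transition structure described above can be constructed as a short sketch. The image width X, the Gaussian width, and the speed parameter x are made-up values, and the boundary columns here simply truncate and renormalize rather than using the uniform boundary values of the chapter:

```python
import numpy as np

X, sigma, x = 15, 1.0, 2          # assumed: locations per chain, width, speed

def shifted_gaussian_transitions(shift):
    """T[i, k] ~ P(location i at time t | location k at time t-1),
    a Gaussian centered at k + shift (truncated at the boundaries)."""
    locs = np.arange(X)
    T = np.exp(-(locs[:, None] - (locs[None, :] + shift)) ** 2
               / (2 * sigma ** 2))
    T /= T.sum(axis=0, keepdims=True)   # columns are distributions
    return T

T_right = shifted_gaussian_transitions(+x)   # rightward chain: k -> k + x
T_left = shifted_gaussian_transitions(-x)    # leftward chain:  k -> k - x

# A stimulus at location 5 most likely moves to 5 + x under the
# rightward chain's dynamics.
print(T_right[:, 5].argmax())
```

The asymmetry between T_right and T_left is what lets the network distinguish the two motion directions despite identical feedforward weights.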
The output firing rates were computed from the log probabilities \log m_{t,t+1}^i using a simple linear encoding model: f_i = [c \, v_i + F]_+, where c is a positive constant (12 for this plot), F is the maximum firing rate of the neuron (100 in this example), and [\cdot]_+ denotes rectification. Note that even though the log-likelihoods are the same for leftward- and rightward-moving inputs, the asymmetric recurrent weights, which represent the transition probabilities, allow the network to distinguish between leftward- and rightward-moving stimuli. The posterior probabilities m_{t,t+1}^i are shown in figure 11.5 (lowest panels). The network correctly computes posterior probabilities close to 1 for the states \theta_{iL} and \theta_{iR} for leftward and rightward motion, respectively, at location i.

Figure 11.5 Network Output for a Moving Stimulus (from [30]). Left panel: the four plots depict, respectively, the log-likelihoods, log posteriors, neural firing rates, and posterior probabilities observed in the network for a rightward-moving bar when it arrives at the central image location. Note that the log-likelihoods are the same for the rightward- and leftward-selective neurons (the first 15 and last 15 neurons, respectively), as dictated by the feedforward weights in figure 11.4B, but the outputs of these neurons correctly reflect the direction of motion as a result of recurrent interactions. Right panel: the same four plots for a leftward-moving bar as it reaches the central location.

Nonlinear Spiking Model

The motion detection task can also be solved using a nonlinear network with spiking neurons, as described in section 11.3.3. A single-level recurrent network of 30 neurons, as in the previous section, was used. The feedforward weights were the same as in figure 11.4B. The recurrent
connections directly encoded transition probabilities for leftward motion (see figure 11.4C). As seen in figure 11.6A, neurons in the network exhibited direction selectivity. Furthermore, the spiking probability of neurons reflects the probability of motion direction at a given location, as in equation (11.39) (figure 11.6B), suggesting a probabilistic interpretation of direction-selective spiking responses in visual cortical areas such as V1 and MT.

Figure 11.6 Responses from the Spiking Motion Detection Network. (A) Spiking responses of three of the first 15 neurons in the recurrent network (neurons 8, 10, and 12). As is evident, these neurons have become selective for rightward motion as a consequence of the recurrent connections (transition probabilities) specified in figure 11.4C. (B) Posterior probabilities over time of motion direction at a given location, encoded by the three neurons, for rightward and leftward motion.

11.4.2 Example 2: Bayesian Decision-Making in a Random-Dots Task

To establish a connection to behavioral data, we consider the well-known random-dots motion discrimination task (see, for example, [41]). The stimulus consists of an image sequence showing a group of moving dots, a fixed fraction of which are randomly selected at each frame and moved in a fixed direction (for example, either left or right). The rest of the dots are moved in random directions. The fraction of dots moving in the same direction is called the coherence of the stimulus. Figure 11.7A depicts the stimulus for two different levels of coherence. The task is to decide the direction of motion of the coherently moving dots for a given input sequence. A wealth of data exists on the psychophysical performance of humans and monkeys, as well as on the neural responses in brain areas such as the middle temporal (MT) and lateral intraparietal (LIP) areas, in monkeys performing the task (see [41] and references
therein). Our goal is to explore the extent to which the proposed models for neural belief propagation can explain the existing data for this task.

The nonlinear motion detection network in the previous section computes the posterior probabilities P(\theta_{iL}|I(t), ..., I(1)) and P(\theta_{iR}|I(t), ..., I(1)) of leftward and rightward motion at different locations i. These outputs can be used to decide the direction of coherent motion by computing the posterior probabilities for leftward and rightward motion, irrespective of location, given the input images. These probabilities can be computed by marginalizing the posterior distribution computed by the neurons for leftward (L) and rightward (R) motion over all spatial positions i:

  P(L|I(t), ..., I(1)) = \sum_i P(\theta_{iL}|I(t), ..., I(1)),    (11.40)
  P(R|I(t), ..., I(1)) = \sum_i P(\theta_{iR}|I(t), ..., I(1)).    (11.41)

To decide the overall direction of motion in a random-dots stimulus, there exist two options: (1) view the decision process as a race between the two probabilities above to a prechosen threshold (this also generalizes to more than two choices), or (2) compute the log of the ratio between the two probabilities above and compare this log-posterior ratio to a prechosen threshold. We use the latter method, to allow comparison to the results of Shadlen and colleagues, who postulate a ratio-based model in area LIP in primate parietal cortex [12]. The log-posterior ratio r(t) of leftward over rightward motion can be defined as:

  r(t) = \log P(L|I(t), ..., I(1)) - \log P(R|I(t), ..., I(1))    (11.42)
       = \log \frac{P(L|I(t), ..., I(1))}{P(R|I(t), ..., I(1))}.    (11.43)

If r(t) > 0, the evidence seen so far favors leftward motion, and vice versa for r(t) < 0. The instantaneous ratio r(t) is susceptible to rapid fluctuations due to the noisy stimulus. We therefore use the following decision variable d_L(t) to track the running average of the log-posterior ratio of L over R:

  d_L(t+1) = d_L(t) + \alpha (r(t) - d_L(t)),    (11.44)

and likewise for d_R(t); the parameter \alpha is between 0 and 1. We assume that the decision variables are computed by a separate set of decision neurons that receive inputs from the motion detection network.
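The accumulate-to-threshold behavior of equation (11.44) can be sketched as follows. Rather than running the full motion network, the log-posterior-ratio trace r(t) is simulated here as a noisy drift whose slope stands in for stimulus coherence; the threshold, leak rate, and noise level are all made-up values:

```python
import numpy as np

rng = np.random.default_rng(4)

def decision_time(drift, alpha=0.1, c=1.0, max_t=2000):
    """Steps until the running average d(t) of the noisy log-posterior
    ratio r(t) crosses the confidence threshold c (eq. 11.44)."""
    d = 0.0
    for t in range(1, max_t + 1):
        r = drift + rng.normal(scale=2.0)     # noisy instantaneous ratio
        d += alpha * (r - d)                  # leaky running average
        if abs(d) >= c:
            return t
    return max_t

# A larger drift (standing in for higher coherence) should reach the
# threshold faster on average, mirroring figure 11.7D.
fast = np.mean([decision_time(3.0) for _ in range(50)])
slow = np.mean([decision_time(1.5) for _ in range(50)])
print(fast, slow)
```

This reproduces, qualitatively, the shorter reaction times for more coherent stimuli discussed below.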
These neurons are once again leaky-integrator neurons, as described by equation (11.44), with the driving inputs r(t) determined by inhibition between the summed inputs from the two chains in the motion detection network (as in equation (11.42)). The output of the model is L if d_L(t) > c, and R if d_R(t) > c, where c is a confidence threshold that depends on task constraints (for example, accuracy vs. speed requirements [35]).

Figure 11.7B and C shows the responses of the two decision neurons over time for two different directions of motion and two levels of coherence. Besides correctly computing the direction of coherent motion in each case, the model also responds faster when the stimulus has higher coherence. This phenomenon can be appreciated more clearly in figure 11.7D, which predicts progressively shorter reaction times for increasingly coherent stimuli (dotted arrows).

Comparison to Neurophysiological Data

The relationship between faster rates of evidence accumulation and shorter reaction times has received experimental support from a number of studies. Figure 11.7E shows the activity of a neuron in the frontal eye fields (FEF) for fast, medium, and slow responses to a visual target [39, 40]. Schall and collaborators have shown that the distribution of monkey response times can be reproduced using the time taken by neural activity in FEF to reach a fixed threshold [15]. A similar rise-to-threshold model by Carpenter and colleagues has received strong support in human studies manipulating the prior probabilities of targets [3] and the urgency of the task [35].

In the case of the random-dots task, Shadlen and collaborators have shown that, in primates, one of the cortical areas involved in making the decision regarding coherent motion direction is area LIP. The activities of many neurons in this area progressively increase during the motion-viewing period, with faster rates of rise for more coherent stimuli (see figure 11.7F) [37]. This behavior is similar to the responses of decision neurons in the model (figure 11.7B), suggesting
that the outputs of the recorded LIP neurons could be interpreted as representing the log-posterior ratio of one task alternative over another (see [3, 12] for related suggestions).

11.4.3 Example 3: Attention in the Visual Cortex

The responses of neurons in cortical areas V2 and V4 can be significantly modulated by attention to particular locations within an input image. McAdams and Maunsell [23] showed that the tuning curve of a neuron in cortical area V4 is multiplied by an approximately constant factor when the monkey focuses attention on a stimulus within the neuron's receptive field. Reynolds et al. [36] have shown that focusing attention on a target in the presence of distractors causes the response of a V2 or V4 neuron to closely approximate the response elicited when the target appears alone. Finally, a study by Connor et al. [4] demonstrated that responses to unattended stimuli can be affected by spatial attention to nearby locations.

All three types of response modulation described above can be explained in terms of Bayesian inference, using the hierarchical graphical model for images given in section 11.2.3 (figure 11.3). Each V4 neuron is assumed to encode a feature F as its preferred stimulus. A separate group of neurons (e.g., in the parietal cortex) is assumed to encode spatial locations L irrespective of feature values. Lower-level neurons (for example, in V2 and V1) are assumed to represent the intermediate representations C_i. Figure 11.3B depicts the corresponding network for neural belief propagation. Note that this network architecture mimics the division of labor between the ventral object processing ("what") stream and the dorsal spatial processing ("where") stream in the visual cortex [24]. The initial firing rates of location- and feature-coding neurons represent the prior probabilities P(L) and P(F), respectively, assumed to be set by task-dependent
feedback from higher areas, such as those in prefrontal cortex. The input likelihood P(I | C) is set to Σ_j w_ij I_j, where the weights w_ij represent the attributes of a specific feature at a specific location. Here, we set these weights to spatially localized, oriented Gabor filters. Equations (11.11) and (11.12) are assumed to be computed by feedforward neurons in the location-coding and feature-coding parts of the network, with their synapses encoding P(C | L, F). Taking the logarithm of both sides of equations (11.13)-(11.15), we obtain equations that can be computed using leaky integrator neurons as in equation (11.32); f and g are assumed to approximate a logarithmic transformation. Recurrent connections in equation (11.32) are used to implement the inhibitory component corresponding to the negative logarithm of the normalization constants. Furthermore, since the membrane potential v_i(t) is now equal to the log of the posterior probability, i.e., v_i(t) = log P(F_i | I(t), ..., I(1)) (and similarly for L and C), we obtain, using equation (11.33):

P(feature-coding neuron i spikes at time t) ∝ P(F_i | I(t), ..., I(1))    (11.45)

This provides a new interpretation of the spiking probability (or instantaneous firing rate) of a V4 neuron as representing the posterior probability of a preferred feature in an image, irrespective of spatial location.

To model the three primate experiments discussed above [4, 23, 36], we used horizontal and vertical bars that could appear at nine different locations in the input image (figure 11.8A). All results were obtained using a network with a single set of parameters. P(C | L, F) was chosen such that for any given value of L and F, say location L_j and feature F_k, the value of C closest to the combination (L_j, F_k) received the highest probability, with decreasing probabilities for neighboring locations (see figure 11.8B).

Multiplicative Modulation of Responses

We simulated the attentional task of McAdams and Maunsell [23] by presenting a vertical bar and a horizontal bar simultaneously in an input image. Attention to a location L_i containing one of the bars was simulated by setting
a high value for P(L), corresponding to a higher firing rate for the neuron coding for that location.

Figure 11.9A depicts the orientation tuning curves of the vertical-feature-coding model V4 neuron in the presence and absence of attention (squares and circles, respectively). The plotted points represent the neuron's firing rate, encoding the posterior probability P(F | I(t), ..., I(1)), F being the vertical feature. Attention in the model approximately multiplies the unattended responses by a constant factor, similar to V4 neurons (figure 11.9B). This is due to the change in the prior P(L) between the two modes, which affects equations (11.12) and (11.15) multiplicatively.

Effects of Attention on Responses in the Presence of Distractors

To simulate the experiments of Reynolds et al. [36], a single vertical bar ("Reference") was presented in the input image and the responses of the vertical-feature-coding model neuron were recorded over time. As seen in figure 11.10A (top panel, dotted line), the neuron's firing rate reflects a posterior probability close to 1 for the vertical stimulus. When a horizontal bar ("Probe") alone is presented at a different location, the neuron's response drops dramatically (solid line), since its preferred stimulus is a vertical bar, not a horizontal bar. When the horizontal and vertical bars are simultaneously presented ("Pair"), the firing rate drops to almost half the value elicited for the vertical bar alone (dashed line), signaling increased uncertainty about the stimulus compared to the Reference-only case. However, when attention is turned on by increasing P(L) for the vertical bar location (figure 11.10A, bottom panel), the firing rate is restored back to its original value and a posterior probability close to 1 is signaled (topmost plot, dot-dashed line). Thus, attention acts to reduce uncertainty about the stimulus given a location of interest. Such behavior closely mimics the effect of spatial attention in areas V2 and V4 [36] (figure 11.10B).

Effects of Attention on Neighboring Spatial Locations

We
simulated the experiments of Connor et al. [4] using an input image containing four fixed horizontal bars, as shown in figure 11.11A. A vertical bar was flashed at one of five different locations in the center (figure 11.11A, positions 1-5). Each bar plot in figure 11.11B shows the responses of the vertical-feature-coding model V4 neuron as a function of vertical bar location (bar positions 1 through 5) when attention is focused on one of the horizontal bars (left, right, upper, or lower). Attention was again simulated by assigning a high prior probability to the location of interest.

As seen in figure 11.11B, there is a pronounced effect of proximity to the locus of attention: the unattended stimulus (vertical bar) produces higher responses when it is closer to the attended location than further away (see, for example, "Attend Left"). This effect is due to the spatial spread in the conditional probability P(C | L, F) (see figure 11.8B) and its effect on equations (11.12) and (11.15). The larger responses near the attended location reflect a reduction in uncertainty at locations closer to the focus of attention compared to locations farther away. For comparison, the responses from a V4 neuron are shown in figure 11.11C (from [4]).

11.5 Discussion

This chapter described models for neurally implementing the belief propagation algorithm for Bayesian inference in arbitrary graphical models. Linear and nonlinear models based on firing rate dynamics, as well as a model based on noisy spiking neurons, were presented. We illustrated the suggested approach in two domains: (1) inference over time using an HMM and its application to visual motion detection and decision-making, and (2) inference in a hierarchical graphical model and its application to understanding attentional effects in the primate visual cortex.

The approach suggests an interpretation of cortical neurons as computing the posterior probability of their preferred state, given current and past inputs. In particular, the
spiking probability (or instantaneous firing rate) of a neuron can be shown to be directly proportional to the posterior probability of the preferred state. The model also ascribes a functional role to local recurrent connections (lateral/horizontal connections) in the neocortex: connections from excitatory neurons are assumed to encode transition probabilities between states from one time step to the next, while inhibitory connections are used for probability normalization (see equation (11.23)). Similarly, feedback connections from higher to lower areas are assumed to convey prior probabilities reflecting prior knowledge or task constraints, as used in the attention model in section 11.4.3.

11.5.1 Related Models

A number of models have been proposed for probabilistic computation in networks of neuron-like elements. These range from early models based on statistical mechanics, such as the Boltzmann machine [19, 20], to more recent models that explicitly rely on probabilistic generative or causal models [6, 10, 29, 33, 43, 44, 45]. We review in more detail some of the models that are closely related to the approach presented in this chapter.

Models Based on Log-Likelihood Ratios

Gold and Shadlen [12] have proposed a model for neurons in area LIP that interprets their responses as representing the log-likelihood ratio between two alternatives. Their model is inspired by neurophysiological results from Shadlen's group and others showing that the responses of neurons in area LIP exhibit a behavior similar to a random walk to a fixed threshold. The neuron's response increases given evidence in favor of the neuron's preferred hypothesis and decreases when given evidence against that hypothesis, resulting in an evidence accumulation process similar to computing a log-likelihood ratio over time (see section 11.4.2). Gold and Shadlen develop a mathematical model [12] to formalize this intuition. They show how the log-likelihood ratio can be propagated over time as evidence trickles in at each time instant.
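To make the accumulate-to-bound idea concrete, the following is a minimal sketch (not from the chapter) of sequential evidence accumulation as a log-likelihood ratio test; the Gaussian observation model, the function name, and all parameter values are illustrative assumptions, not Gold and Shadlen's actual model.

```python
def sprt_decision(samples, mu_a=1.0, mu_b=-1.0, sigma=2.0, threshold=3.0):
    """Accumulate the log-likelihood ratio of hypothesis A over B,
    assuming Gaussian observations, until it crosses +/- threshold."""
    llr = 0.0
    for t, x in enumerate(samples, start=1):
        # log N(x; mu_a, sigma) - log N(x; mu_b, sigma); the shared
        # normalization terms cancel in the ratio.
        llr += ((x - mu_b) ** 2 - (x - mu_a) ** 2) / (2 * sigma ** 2)
        if llr >= threshold:
            return ("A", t)   # evidence favors hypothesis A
        if llr <= -threshold:
            return ("B", t)   # evidence favors hypothesis B
    return (None, len(samples))  # no decision within the sample budget

# Stronger evidence per sample (analogous to higher motion coherence)
# reaches the bound sooner, predicting shorter reaction times.
print(sprt_decision([2.0] * 10))  # -> ('A', 3)
print(sprt_decision([1.0] * 10))  # -> ('A', 6)
```

As in the model of section 11.4.2, the decision time is set by how quickly the accumulated ratio reaches a fixed threshold, so more reliable evidence yields faster decisions.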
This model is similar to the one proposed above involving log-posterior ratios for decision making. The main difference is in the representation of probabilities: while we explicitly maintain a representation of probability distributions over relevant states using populations of neurons, the model of Gold and Shadlen relies on the argument that input firing rates can be directly interpreted as log-likelihood ratios, without the need for an explicit representation of probabilities.

An extension of the Gold and Shadlen model to the case of spiking neurons was recently proposed by Deneve [8]. In this model, each neuron is assumed to represent the log odds ratio for a preferred binary-valued state, i.e., the logarithm of the probability that the preferred state is 1 over the probability that the preferred state is 0, given all inputs seen thus far. To promote efficiency, each neuron fires only when the difference between its log-odds ratio and a prediction of the log-odds ratio (based on the output spikes emitted thus far) reaches a certain threshold.

Models based on log-probability ratios, such as the ones described above, have several favorable properties. First, since only ratios are represented, one may not need to normalize responses at each step to ensure probabilities sum to 1, as in an explicit probability code. Second, the ratio representation lends itself naturally to some decision-making procedures, such as the one postulated by Gold and Shadlen. However, the log-probability ratio representation also suffers from some potential shortcomings. Because it is a ratio, it is susceptible to instability when the probability in the denominator approaches zero (a log-probability code also suffers from a similar problem, although this can be handled using bounds on what can be represented by the neural code). Also, the approach becomes inefficient when the number of hypotheses being considered is large, given the large number of ratios that may need to be represented, corresponding to different combinations of hypotheses. Finally,
the lack of an explicit probability representation means that many useful operations in probability calculus, such as marginalization or uncertainty estimation in specific dimensions, could become complicated to implement.

Inference Using Distributional Codes

There has been considerable research on methods for encoding and decoding information from populations of neurons. One class of methods uses basis functions (or kernels) to represent probability distributions within neuronal ensembles [1, 2, 9]. In this approach, a distribution P(x) over stimulus x is represented using a linear combination of basis functions:

P(x) = Σ_i r_i b_i(x)    (11.46)

where r_i is the normalized response (firing rate) and b_i the implicit basis function associated with neuron i in the population. The basis function of each neuron is assumed to be linearly related to the tuning function of the neuron, as measured in physiological experiments. The basis function approach is similar to the approach described in this chapter in that the stimulus space is spanned by a limited number of neurons with preferred stimuli or state vectors. The two approaches differ in how probability distributions are represented by neural responses: one uses an additive method and the other a logarithmic transformation, either in the firing rate representation (sections 11.3.1 and 11.3.2) or in the membrane potential representation (section 11.3.3).

A limitation of the basis function approach is that, due to its additive nature, it cannot represent distributions that are sharper than the component distributions. A second class of models addresses this problem using a generative approach, where an encoding model (e.g., Poisson) is first assumed and a Bayesian decoding model is used to estimate the stimulus x, or its distribution, given a set of responses r [28, 46, 48, 49, 51]. For example, in the distributional population coding (DPC) method [48, 49], the responses are assumed to depend on general distributions
P(x), and a maximum a posteriori (MAP) probability distribution over possible distributions over x is computed. The best estimate in this method is not a single value of x but an entire distribution over x, which is assumed to be represented by the neural population. The underlying goal of representing entire distributions within neural populations is common to both the DPC approach and the models presented in this chapter. However, the approaches differ in how they achieve this goal: the DPC method assumes prespecified tuning functions for the neurons and a sophisticated, nonneural decoding operation, whereas the method introduced in this chapter directly instantiates a probabilistic generative model, with an exponential or linear decoding operation. Sahani and Dayan have recently extended the DPC method to the case where there is uncertainty as well as simultaneous multiple stimuli present in the input [38]. Their approach, known as doubly distributional population coding (DDPC), is based on encoding probability distributions over a function m(x) of the input x, rather than distributions over x itself. Needless to say, the greater representational capacity of this method comes at the expense of more complex encoding and decoding schemes.

The distributional coding models discussed above were geared primarily toward representing probability distributions. More recent work by Zemel and colleagues [50] has explored how distributional codes could be used for inference as well. In their approach, a recurrent network of leaky integrate-and-fire neurons is trained to capture the probabilistic dynamics of a hidden variable X(t) by minimizing the Kullback-Leibler (KL) divergence between an input encoding distribution P(X(t) | R(t)) and an output decoding distribution Q(X(t) | S(t)), where R(t) and S(t) are the input and output spike trains, respectively. The advantage of this approach over the models presented in this chapter is that the decoding process may allow a higher-fidelity representation of the output
distribution than the direct representational scheme used in this chapter. On the other hand, since the probability representation is implicit in the neural population, it becomes harder to map inference algorithms such as belief propagation to neural circuitry.

Hierarchical Inference

There has been considerable interest in neural implementation of hierarchical models for inference. Part of this interest stems from the fact that hierarchical models often capture the multiscale structure of input signals, such as images, in a very natural way (e.g., objects are composed of parts, which are composed of subparts, which are composed of edges). A hierarchical decomposition often results in greater efficiency, both in terms of representation (e.g., a large number of objects can be represented by combining the same set of parts in many different ways) and in terms of learning. A second motivation for hierarchical models has been the evidence from anatomical and physiological studies that many regions of the primate cortex are hierarchically organized (e.g., the visual cortex, motor cortex, etc.).

Hinton and colleagues investigated a hierarchical network called the Helmholtz machine [16] that uses feedback connections from higher to lower levels to instantiate a probabilistic generative model of its inputs (see also [18]). An interesting learning algorithm, termed the wake-sleep algorithm, was proposed that involved learning the feedback weights during a "wake" phase based on inputs, and the feedforward weights in a "sleep" phase based on "fantasy" data produced by the feedback model. Although the model employs feedback connections, these are used only for bootstrapping the learning of the feedforward weights (via fantasy data). Perception involves a single feedforward pass through the network, and the feedback connections are not used for inference or top-down modulation of lower-level activities.

A hierarchical network that does employ feedback for inference was explored by Lewicki and Sejnowski [22] (see
also [17] for a related model). The Lewicki-Sejnowski model is a Bayesian belief network in which each unit encodes a binary state, and the probability that a unit's state S_i is equal to 1 depends on the states of its parents pa(S_i) via:

P(S_i = 1 | pa(S_i), W) = h(Σ_j w_ji S_j)    (11.47)

where W is the matrix of weights, w_ji is the weight from S_j to S_i (w_ji = 0 for j < i), and h is the noisy-OR function h(x) = 1 - e^(-x), x ≥ 0. Rather than inferring a posterior distribution over states, as in the models presented in this chapter, Gibbs sampling is used to obtain samples of states from the posterior; the sampled states are then used to learn the weights w_ji.

Rao and Ballard proposed a hierarchical generative model for images and explored an implementation of inference in this model based on predictive coding [34]. Unlike the models presented in this chapter, the predictive coding model focuses on estimating the MAP value of states rather than an entire distribution. More recently, Lee and Mumford sketched an abstract hierarchical model [21] for probabilistic inference in the visual cortex based on an inference method known as particle filtering. The model is similar to our approach in that inference involves message passing between different levels, but whereas the particle-filtering method assumes continuous random variables, our approach uses discrete random variables. The latter choice allows a concrete model for neural implementation, while it is unclear how a biologically plausible network of neurons could implement the different components of the particle filtering algorithm.

The hierarchical model for attention described in section 11.4.3 bears some similarities to a recent Bayesian model proposed by Yu and Dayan [47] (see also [7, 32]). Yu and Dayan use a five-layer neural architecture and a log-probability encoding scheme, as in [30], to model reaction time effects and multiplicative response modulation. Their model, however, does not use an intermediate representation to
factor input images into separate feature and location attributes. It therefore cannot explain effects such as the influence of attention on neighboring unattended locations [4]. A number of other neural models exist for attention (e.g., models by Grossberg and colleagues [13, 14]) that are much more detailed in specifying how various components of the model fit with cortical architecture and circuitry. The approach presented in this chapter may be viewed as a first step toward bridging the gap between detailed neural models and more abstract Bayesian theories of perception.

11.5.2 Open Problems and Future Challenges

An important open problem not addressed in this chapter is learning and adaptation. How are the various conditional probability distributions in a graphical model learned by a network implementing Bayesian inference? For instance, in the case of the HMM model used in section 11.4.1, how can the transition probabilities between states from one time step to the next be learned? Can well-known biologically plausible learning rules, such as Hebbian learning or the Bienenstock-Cooper-Munro (BCM) rule (e.g., see [5]), be used to learn conditional probabilities? What are the implications of spike-timing-dependent plasticity (STDP) and short-term plasticity for probabilistic representations in neural populations?

A second open question is the use of spikes in probabilistic representations. The models described above were based, directly or indirectly, on instantaneous firing rates. Even the noisy spiking model proposed in section 11.3.3 can be regarded as encoding posterior probabilities in terms of instantaneous firing rates; spikes in this model are used only as a mechanism for communicating information about firing rates over long distances. An intriguing alternate possibility worth exploring is whether probability distributions can be encoded using spike-timing-based codes. Such codes may be intimately linked to timing-based learning mechanisms such as STDP. Another interesting issue is
how the dendritic nonlinearities known to exist in cortical neurons could be exploited to implement belief propagation, as in equation (11.19). This could be studied systematically with a biophysical compartmental model of a cortical neuron by varying the distribution and densities of various ionic channels along the dendrites.

Finally, this chapter explored neural implementations of Bayesian inference in only two simple graphical models: HMMs and a three-level hierarchical model. Neuroanatomical data gathered over the past several decades provide a rich set of clues regarding the types of graphical models implicit in brain structure. For instance, the fact that visual processing in the primate brain involves two hierarchical but interconnected pathways devoted to spatial and object vision (the "what" and "where" streams [24]) suggests a multilevel graphical model wherein the input image is factored into progressively complex sets of object features and their transformations. Similarly, the existence of multimodal areas in the inferotemporal cortex suggests graphical models that incorporate a common, modality-independent representation at the highest level that is causally related to modality-dependent representations at lower levels. Exploring such graphical models inspired by neurobiology could not only shed new light on brain function, but also furnish novel architectures for solving fundamental problems in machine vision and robotics.

Acknowledgments This work was supported by grants from the ONR Adaptive Neural Systems program, NSF, NGA, the Sloan Foundation, and the Packard Foundation.

References

[1] Anderson CH (1995) Unifying perspectives on neuronal codes and processing. In 19th International Workshop on Condensed Matter Theories, Caracas, Venezuela.
[2] Anderson CH, Van Essen DC (1994) Neurobiological computational systems. In Zurada M, Marks II R, Robinson C, eds., Computational Intelligence: Imitating Life, pages 213-222. New York: IEEE Press.
[3] Carpenter RHS, Williams MLL
(1995) Neural computation of log likelihood in control of saccadic eye movements. Nature 377:59-62.
[4] Connor CE, Preddie DC, Gallant JL, Van Essen DC (1997) Spatial attention effects in macaque area V4. Journal of Neuroscience 17:3201-3214.
[5] Dayan P, Abbott LF (2001) Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. Cambridge, MA: MIT Press.
[6] Dayan P, Hinton G, Neal R, Zemel R (1995) The Helmholtz machine. Neural Computation 7:889-904.
[7] Dayan P, Zemel R (1999) Statistical models and sensory attention. In Willshaw D, Murray A, eds., Proceedings of the International Conference on Artificial Neural Networks (ICANN), pages 1017-1022. London: IEEE Press.
[8] Deneve S (2005) Bayesian inference in spiking neurons. In Saul LK, Weiss Y, Bottou L, eds., Advances in Neural Information Processing Systems 17, pages 353-360. Cambridge, MA: MIT Press.
[9] Eliasmith C, Anderson CH (2003) Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems. Cambridge, MA: MIT Press.
[10] Freeman WT, Haddon J, Pasztor EC (2002) Learning motion analysis. In Probabilistic Models of the Brain: Perception and Neural Function, pages 97-115. Cambridge, MA: MIT Press.
[11] Gerstner W (2000) Population dynamics of spiking neurons: fast transients, asynchronous states, and locking. Neural Computation 12:43-89.
[12] Gold JI, Shadlen MN (2001) Neural computations that underlie decisions about sensory stimuli. Trends in Cognitive Sciences 5:10-16.
[13] Grossberg S (2005) Linking attention to learning, expectation, competition, and consciousness. In Neurobiology of Attention, pages 652-662. San Diego: Elsevier.
[14] Grossberg S, Raizada R (2000) Contrast-sensitive perceptual grouping and object-based attention in the laminar circuits of primary visual cortex. Vision Research 40:1413-1432.
[15] Hanes DP, Schall JD (1996) Neural control of voluntary movement initiation. Science 274:427-430.
[16] Hinton G, Dayan P, Frey B, Neal R (1995) The wake-sleep algorithm for unsupervised neural networks. Science 268:1158-1161.
[17] Hinton G, Ghahramani Z (1997)
Generative models for discovering sparse distributed representations. Philosophical Transactions of the Royal Society of London, Series B 352:1177-1190.
[18] Hinton G, Osindero S, Teh Y (2006) A fast learning algorithm for deep belief nets. Neural Computation 18:1527-1554.
[19] Hinton G, Sejnowski T (1983) Optimal perceptual inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 448-453, Washington, DC. New York: IEEE Press.
[20] Hinton G, Sejnowski T (1986) Learning and relearning in Boltzmann machines. In Rumelhart D, McClelland J, eds., Parallel Distributed Processing, volume 1, chapter 7, pages 282-317. Cambridge, MA: MIT Press.
[21] Lee TS, Mumford D (2003) Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A 20:1434-1448.
[22] Lewicki MS, Sejnowski T (1997) Bayesian unsupervised learning of higher order structure. In Mozer M, Jordan M, Petsche T, eds., Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press.
[23] McAdams C, Maunsell JHR (1999) Effects of attention on orientation-tuning functions of single neurons in macaque cortical area V4. Journal of Neuroscience 19:431-441.
[24] Mishkin M, Ungerleider LG, Macko KA (1983) Object vision and spatial vision: two cortical pathways. Trends in Neuroscience 6:414-417.
[25] Murphy K, Weiss Y, Jordan M (1999) Loopy belief propagation for approximate inference: an empirical study. In Laskey K, Prade H, eds., Proceedings of UAI (Uncertainty in AI), pages 467-475. San Francisco: Morgan Kaufmann.
[26] Pearl J (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann.
[27] Plesser HE, Gerstner W (2000) Noise in integrate-and-fire neurons: from stochastic input to escape rates. Neural Computation 12:367-384.
[28] Pouget A, Zhang K, Deneve S, Latham PE (1998) Statistically efficient estimation using population coding. Neural Computation 10:373-401.
[29] Rao RPN (1999) An optimal estimation approach to visual
perception and learning. Vision Research 39:1963-1989.
[30] Rao RPN (2004) Bayesian computation in recurrent neural circuits. Neural Computation 16:1-38.
[31] Rao RPN (2005) Bayesian inference and attentional modulation in the visual cortex. Neuroreport 16:1843-1848.
[32] Rao RPN (2005) Hierarchical Bayesian inference in networks of spiking neurons. In Saul LK, Weiss Y, Bottou L, eds., Advances in Neural Information Processing Systems 17, pages 1113-1120. Cambridge, MA: MIT Press.
[33] Rao RPN, Ballard DH (1997) Dynamic model of visual recognition predicts neural response properties in the visual cortex. Neural Computation 9:721-763.
[34] Rao RPN, Ballard DH (1999) Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive field effects. Nature Neuroscience 2:79-87.
[35] Reddi BA, Carpenter RH (2000) The influence of urgency on decision time. Nature Neuroscience 3:827-830.
[36] Reynolds JH, Chelazzi L, Desimone R (1999) Competitive mechanisms subserve attention in macaque areas V2 and V4. Journal of Neuroscience 19:1736-1753.
[37] Roitman JD, Shadlen MN (2002) Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. Journal of Neuroscience 22:9475-9489.
[38] Sahani M, Dayan P (2003) Doubly distributional population codes: simultaneous representation of uncertainty and multiplicity. Neural Computation 15:2255-2279.
[39] Schall JD, Hanes DP (1998) Neural mechanisms of selection and control of visually guided eye movements. Neural Networks 11:1241-1251.
[40] Schall JD, Thompson KG (1999) Neural selection and control of visually guided eye movements. Annual Review of Neuroscience 22:241-259.
[41] Shadlen MN, Newsome WT (2001) Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. Journal of Neurophysiology 86:1916-1936.
[42] Shon AP, Rao RPN (2005) Implementing belief propagation in neural circuits. Neurocomputing 65-66:393-399.
[43] Simoncelli EP (1993) Distributed Representation and Analysis of Visual Motion. PhD thesis, Department of Electrical Engineering and Computer Science, MIT,
Cambridge, MA.
[44] Weiss Y, Fleet D (2002) Velocity likelihoods in biological and machine vision. In Rao RPN, Olshausen BA, Lewicki MS, eds., Probabilistic Models of the Brain: Perception and Neural Function, pages 77-96. Cambridge, MA: MIT Press.
[45] Weiss Y, Simoncelli EP, Adelson EH (2002) Motion illusions as optimal percepts. Nature Neuroscience 5:598-604.
[46] Wu S, Chen D, Niranjan M, Amari SI (2003) Sequential Bayesian decoding with a population of neurons. Neural Computation 15.
[47] Yu A, Dayan P (2005) Inference, attention, and decision in a Bayesian neural architecture. In Saul LK, Weiss Y, Bottou L, eds., Advances in Neural Information Processing Systems 17, pages 1577-1584. Cambridge, MA: MIT Press.
[48] Zemel RS, Dayan P (1999) Distributional population codes and multiple motion models. In Kearns MS, Solla SA, Cohn DA, eds., Advances in Neural Information Processing Systems 11, pages 174-180. Cambridge, MA: MIT Press.
[49] Zemel RS, Dayan P, Pouget A (1998) Probabilistic interpretation of population codes. Neural Computation 10:403-430.
[50] Zemel RS, Huys QJM, Natarajan R, Dayan P (2005) Probabilistic computation in spiking populations. In Saul LK, Weiss Y, Bottou L, eds., Advances in Neural Information Processing Systems 17, pages 1609-1616. Cambridge, MA: MIT Press.
[51] Zhang K, Ginzburg I, McNaughton BL, Sejnowski T (1998) Interpreting neuronal population activity by reconstruction: a unified framework with application to hippocampal place cells. Journal of Neurophysiology 79:1017-1044.

Figure 11.7 Output of Decision Neurons in the Model. (A) Depiction of the random-dots task. Two different levels of motion coherence (50% and 100%) are shown. A 1-D version of this stimulus was used in the model simulations. (B & C) Outputs d_L(t) and d_R(t) of model decision neurons for two different directions of motion. The decision threshold is labeled "c". (D) Outputs of decision neurons for three different levels of motion coherence. Note the increase in rate of evidence accumulation at higher coherencies. For a fixed decision threshold, the model predicts faster reaction times for higher coherencies (dotted arrows). (E) Activity of a neuron in area FEF for a monkey performing an eye movement task (from [40], with permission). Faster reaction times were associated with a more rapid rise to a fixed threshold (see the three different neural activity profiles). The arrows point to the initiation of eye movements, which are depicted at the top. (F) Averaged firing rate over time of 54 neurons in area LIP during the random-dots task, plotted as a function of motion coherence (from [37], with permission). Solid and dashed curves represent trials in which the monkey judged motion direction toward and away from the receptive field of a given neuron, respectively. The slope of the response is affected by motion coherence (compare, for example, responses for 51.2% and 25.6%) in a manner similar to the model responses shown in (D).

Figure 11.8 Input Image Configuration and Conditional Probabilities Used in the Attention Experiments. (A) Example image locations (labeled 1-5, and Up, Dn, Lt, and Rt for up, down, left, and right) relevant to the experiments discussed in the paper. (B) Each bar plot shows P(C | L, F) for a fixed value of L (= Lt, Rt, Up, or Dn) and an arbitrary fixed value of F. Each bar represents the probability for the feature-location combination C_i encoding one of the locations 1-5.
0 30 60 90 039050 39 e39o 39 3390 39 6 39 339o 39 60 90 Relative Orientation Orientation in degrees A B Figure 119 Multiplicative Modulation due to Attention A Orientation tuning curve of a feature coding model neuron with a preferred stimulus orientation of 0 degrees with filled squares and without unfilled circles attention from 31 B Orienta tion tuning curves of a V4 neuron with filled squares and without attention unfilled circles from 23 115 Discussion 263 V4 Neuron V 4 SEMI 08 06 04 Spikes per second 02 50 100 150 200 250 300 350 me from stimulus onset Normalized Firing Rate SEMI Pair 390 A Att Ref 087 C 25 O 8 20 Paquot 0 67 m y A Away CD Q 15 D Probe 047 q AttAway 10 6 027 E G 50 10 150 200 250 300 350 0 ms rom stimulus onset Time steps A B Figure 1110 Attentional Response Restoration in the presence of Distractors A Top Panel The three line plots represent the vertical feature coding neuron s response to a vertical bar Reference a horizontal bar at a different position quotProbequot and both bars presented simultaneously quotPairquot In each case the input lasted 30 time steps beginning at time step 20 Bottom Panel When attention depicted as a white oval is focused on the vertical bar the firing rate for the Pair stimulus approximates the firing rate obtained for the Reference alone from 31 B Top Panel Responses from a V4 neuron without attention Bottom Panel Responses from the same neuron when attending to the vertical bar see condition Pair Att Ref from 36 264 11 Neural Models of Bayesian Belief Propagation Rajeslz P N Rao A 1 2 3 4 5 Normalized firing rate 0 Response rate spikessec 90 60 Attend Left 1 2 3 4 5 30 12345 Attend Upper 1 2 3 4 5 Attend Right 1 2 3 4 5 Attend Lower 1 2 3 4 5 Bar position 12345 12345 Bar position spacing 19 Figure 1111 Spatial Distribution of Attention A Example trial based on Connor et al s experiments 4 showing five images each containing four horizontal bars and one vertical bar Attention was focused on a horizontal bar 
(upper, lower, left, or right) while the vertical bar's position was varied. (B) Responses of the vertical feature-coding model neuron: each plot shows five responses (as a function of bar position), one for each location of the vertical bar, as attention was focused on the upper, lower, left, or right horizontal bar (from [31]). (C) Responses of a V4 neuron under the same conditions (from [4]).
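The decision-neuron behavior described above, evidence accumulating toward a fixed threshold with a steeper rise (and hence a shorter reaction time) at higher motion coherence, can be sketched as a simple noisy accumulator. This is an illustrative toy, not the chapter's actual network of leaky integrator neurons; the gain, threshold, and noise level below are arbitrary assumptions.

```python
import random

def reaction_time(coherence, threshold=1.0, gain=0.02,
                  noise_sd=0.05, max_steps=10000, seed=0):
    """Toy drift-to-threshold accumulator: integrate noisy evidence
    whose mean drift grows with motion coherence, and return the
    time step at which the accumulated evidence first crosses the
    fixed decision threshold (a proxy for reaction time)."""
    rng = random.Random(seed)
    d = 0.0  # accumulated evidence, cf. the decision outputs d(t)
    for t in range(1, max_steps + 1):
        d += gain * coherence + rng.gauss(0.0, noise_sd)
        if d >= threshold:
            return t
    return max_steps

# Higher coherence -> faster rise to the same fixed threshold,
# hence a shorter "reaction time" (cf. the dotted arrows in panel D).
rt_low = reaction_time(coherence=0.256)
rt_high = reaction_time(coherence=0.512)
assert rt_high <= rt_low
```

Because both runs share the same seed (hence the same noise sequence), the higher-coherence trace dominates the lower-coherence one at every time step, so its threshold crossing can never come later; this mirrors the fixed-threshold reaction-time prediction illustrated in the figure.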