Class Note for C&PE 940 at KU (2)
NONPARAMETRIC CLASSIFICATION TECHNIQUES

C&PE 940, 30 November 2005

Geoff Bohling
Assistant Scientist
Kansas Geological Survey
geoff@kgs.ku.edu
864-2093

Overheads and other resources available at http://people.ku.edu/gbohling/cpe940

Modeling Categorical Variables: Classification

In classification applications we are trying to develop a model for predicting a categorical response variable, G, from one or more predictor variables, X. That is, if we know that an observation arises from one of K different mutually exclusive classes or groups, G_k, then we are trying to estimate the probability of occurrence of each group at each point in the predictor space:

    P_k(x) = \mathrm{Prob}(G_k \mid X = x)

We could then assign each estimation point to the class with the highest probability at that point, segmenting the predictor space into regions assigned to the different classes.

The fact that the probabilities are continuous variables immediately suggests the possibility of using the nonparametric regression techniques we discussed last time to model them. However, we will have to apply transforms to ensure that the estimated P_k(x) values represent a legitimate set of probabilities for mutually exclusive events, meaning

    0 \le P_k(x) \le 1, \quad k = 1, \ldots, K
    \sum_{k=1}^{K} P_k(x) = 1

To use this approach, the group memberships in the training dataset are coded as a set of indicator variables, one for each group, with y_{ik} = 1 for the group to which data point i belongs and 0 for all other groups; these indicators are the set of probabilities corresponding to our certain knowledge of the group membership for each training data point.

Another approach to the classification problem is to model the probability density function for each group, f_k(x), and then plug the density estimates, along with those pesky prior probabilities, q_k, into Bayes' theorem to get

    P_k(x) = f_k(x) q_k \Big/ \sum_{j=1}^{K} f_j(x) q_j

In this case we automatically get a legitimate set of probabilities, but our models for the probability density functions should obey the constraints

    f_k(x) \ge 0, \quad \int f_k(x)\,dx = 1

We can employ a variety of nonparametric density estimation techniques to accomplish this task.
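The Bayes' theorem combination above is simple enough to sketch in a few lines. This is my own minimal illustration, not part of the original notes; the density and prior values are made up:

```python
import numpy as np

def posterior_probs(densities, priors):
    """Combine group density values f_k(x) and priors q_k via Bayes' theorem.

    densities: f_k(x) evaluated at one point x, one entry per group
    priors:    prior probabilities q_k (should sum to 1)
    Returns the posterior probabilities P_k(x), which sum to 1.
    """
    densities = np.asarray(densities, dtype=float)
    priors = np.asarray(priors, dtype=float)
    joint = densities * priors      # f_k(x) * q_k for each group
    return joint / joint.sum()      # normalize over the K groups

# Hypothetical two-group example with equal priors:
p = posterior_probs([0.30, 0.10], [0.5, 0.5])
# p is [0.75, 0.25]: the first group is three times as likely at this point
```

Note that with equal priors, as assumed for the density-based methods below, the priors cancel and the posterior is driven entirely by the density estimates.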
Not surprisingly, these techniques are closely related to nonparametric regression techniques, with counts of data points taking the place of the continuous-valued response variable.

We will look at a few procedures following both of these approaches: modeling density functions, or directly modeling probabilities. For the density-function-based approaches we will assume that the prior probabilities are equal, so that they cancel out in Bayes' formula. Again, we will only consider supervised learning, using a training dataset with known group memberships, and will assume that the X values are known without error.

Example Data

We will look at predicting facies from logs for a Cretaceous section in north-central Kansas.

[Figure: location map of the study area (scale bar 100 km).]

Facies assignments from core are available from the Jones well, along with a suite of logs including neutron and density porosity, photoelectric factor, and thorium, uranium, and potassium components of the spectral gamma ray log. We will recast the density porosity as apparent matrix density (Rhomaa) and the photoelectric factor as apparent matrix volumetric photoelectric absorption (Umaa), so that the six logs employed for discrimination are Th, U, K, Rhomaa, Umaa, and neutron porosity. The six facies picked from core are marine, paralic, floodplain, channel, splay, and paleosol.

[Figure: Jones well log display (neutron-density porosity, gamma ray, spectral gamma ray components, photoelectric factor) with core facies column.]

So, we will train on this data from the Jones well and look at predictions both in the Jones well and the Kenyon well. For the sake of illustration we will also look at a two-dimensional, two-group subexample, trying to discriminate marine and paralic facies.

[Figure: Rhomaa-Umaa crossplot of the marine (circles) and paralic (triangles) data points; Umaa roughly 4-14, Rhomaa roughly 25-30.]

Nearest-Neighbor Classification

This is basically the same as nearest-neighbor averaging, except that instead of taking the average of y values at neighboring points, we are assigning the class at each point based on a majority vote of the class memberships at neighboring points.
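The majority-vote idea can be sketched as follows. This is a toy illustration of mine, not code from the notes; the function name and the two-group data are hypothetical:

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, labels, x, k):
    """Classify point x by majority vote among its k nearest training points.

    Assumes the predictor variables have been scaled to comparable
    ranges, so that Euclidean distance is meaningful.
    """
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to each training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy two-group example in two (scaled) dimensions:
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array(["paralic", "paralic", "marine", "marine", "marine"])
pred = knn_classify(X, y, np.array([0.95, 1.0]), k=3)
# pred is "marine": all 3 nearest neighbors belong to the marine group
```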
Alternatively, one can compute a vector of group membership probabilities by dividing the count for each group by the total number of neighboring points. As usual, we would want to scale the predictor variables to comparable ranges before computing distances to neighboring points.

Applying this approach to the two-facies Rhomaa-Umaa example, and predicting over a grid of Rhomaa-Umaa values using 15 nearest neighbors, we get the following map of probability of membership in the marine facies:

[Figure: probability of marine facies over the Rhomaa-Umaa grid, 15 nearest neighbors.]

Applying nearest-neighbor classification with 20 neighbors to the entire Jones well dataset, using all six logs and predicting all six facies, we get the following table of predicted (columns) versus actual (rows) facies:

Core        Predicted Facies
Facies       Marine  Paralic  Floodplain  Channel  Splay  Paleosol
Marine          112        2           8        0      0         6
Paralic           2      110          19       19      3         3
Floodplain        5        7         206        7      4         3
Channel           0        1           5      254      2         0
Splay             0        0          21        2     15         0
Paleosol          2        5          25        0      0        34

Overall, 82% of the intervals are classified correctly. The predicted facies versus depth look a little noisy:

[Figure: nearest-neighbor predicted facies and core facies versus depth in the Jones well.]

The predicted facies versus depth in the Kenyon well are very ratty:

[Figure: nearest-neighbor predicted facies versus depth in the Kenyon well.]

Kernel Density Estimation

This basically uses the same procedure as kernel regression, using kernel basis functions to produce smooth estimates of the group-specific density functions, f_k(x). Here the kernel functions are not serving to smooth out the y values, but are in a sense spreading out the location of each training data point in predictor space, turning a delta (spike) function at its exact location into a Gaussian or similar curve centered at that location. The kernel density estimate at each location is basically a sum of the surrounding kernels, rescaled to represent a density estimate.
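A one-dimensional Gaussian kernel density estimate can be sketched directly from this description. The bandwidth and data values below are my own hypothetical choices:

```python
import numpy as np

def kde_1d(data, x, bandwidth):
    """Gaussian kernel density estimate at points x from 1-D training data.

    Each data point contributes a Gaussian bump centered at its location;
    the sum is rescaled by 1/(n*h) so the estimate integrates to 1.
    """
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    u = (x[:, None] - data[None, :]) / bandwidth           # standardized offsets
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)     # Gaussian kernel values
    return kernels.sum(axis=1) / (len(data) * bandwidth)   # rescaled sum

# Check that the estimate integrates to (approximately) 1:
data = [1.0, 1.5, 3.2, 3.4]
grid = np.linspace(-5, 10, 2001)
dens = kde_1d(data, grid, bandwidth=0.5)
area = np.trapz(dens, grid)   # close to 1
```

The choice of bandwidth plays the same smoothing role here as it does in kernel regression.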
In general, we would have to estimate fully multivariate density functions, using high-dimensional kernel functions. However, if we make the simplifying assumption of independence among the variables, then the multivariate density function for each group is given by the product of the density functions of the individual variables for that group:

    f_k(X) = f_k(X_1)\, f_k(X_2) \cdots f_k(X_d)

This simplification means that we only need to develop one-dimensional density function estimates. Plugging these density estimates into Bayes' theorem leads to a method called naive Bayes.

Applying the naive Bayes approach to the Rhomaa-Umaa example, we get the following density estimates for each group, multiplying the one-dimensional kernel density estimates along each axis to get the combined density estimates:

[Figure: 3D surface plots of the combined Rhomaa-Umaa density estimates for the marine and paralic facies. Note that the Rhomaa axis in these 3D plots points in the opposite direction from the Rhomaa axis in the 2D plots, so the marine cluster plots towards the back rather than towards the front, like it should.]

As contour plots, the two densities look like:

[Figure: contour plots of the marine and paralic density estimates in the Rhomaa-Umaa plane.]

Combining the density estimates for the two groups in Bayes' theorem gives the following probability for the marine facies:

[Figure: probability of marine facies over the Rhomaa-Umaa plane from naive Bayes.]

Applying the naive Bayes procedure to the full Jones dataset, we get the following table of results, with a 72% correct classification rate overall:

Core        Predicted Facies
Facies       Marine  Paralic  Floodplain  Channel  Splay  Paleosol
Marine          116        0           6        0      0         6
Paralic           4      113          18        8      3        10
Floodplain       20       13         156        0     16        27
Channel           0       84           1      174      3         0
Splay             0        1           9        0     28         0
Paleosol          4        5          10        0      0        47

The predicted facies versus depth look like:

[Figure: naive Bayes predicted facies and core facies versus depth in the Jones well.]
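Under the independence assumption, the per-group density is just a product of one-dimensional estimates, which then feeds into the same Bayes' theorem combination as before. A sketch, where the per-variable density values stand in for one-dimensional kernel density estimates like those above (the numbers are made up):

```python
import numpy as np

def naive_bayes_posterior(per_dim_densities, priors):
    """Posterior group probabilities from per-variable density estimates.

    per_dim_densities: shape (K groups, d variables), holding f_k(x_j),
        the 1-D density of variable j under group k, evaluated at one point x.
    priors: K prior probabilities q_k.
    """
    dens = np.asarray(per_dim_densities, dtype=float)
    f = dens.prod(axis=1)                        # f_k(x) = product over variables
    joint = f * np.asarray(priors, dtype=float)  # f_k(x) * q_k
    return joint / joint.sum()

# Two groups, two variables, equal priors (hypothetical density values):
post = naive_bayes_posterior([[0.4, 0.5], [0.2, 0.1]], [0.5, 0.5])
# post[0] = 0.20 / (0.20 + 0.02), roughly 0.91
```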
Neural Networks

For classification problems, we can use essentially the same structure of neural network as we used for continuous-variable modeling. The difference is that we now have K output values, T_k, which we transform to probabilities using the softmax transfer function:

    P_k(x) = \exp(T_k) \Big/ \sum_{l=1}^{K} \exp(T_l)

This transform ensures that we get nonnegative probabilities that sum to 1. To train the network, we adjust the weights so that the estimated probabilities match the group indicator values, y_{ik}, as closely as possible for the N training data points. Although it is possible to use a least-squares objective function in this case, it is more common to use the following cross-entropy objective function:

    R = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log P_k(x_i)

The first sum is over the training data points and the second is over the groups for each data point. Because the group indicators y_{ik} are either 0 or 1, this objective function is really just the sum of the negative logarithms of the predicted probabilities associated with the actual class for each training data point. Because -log P_k goes to zero as P_k approaches 1 and goes to infinity as P_k approaches 0, minimizing the cross-entropy objective function tends to drive the predicted probabilities for the observed classes towards 1.

Here is a schematic representation of the classification network for our example, leaving the bias nodes out of the picture:

[Figure: schematic of the classification network with input, hidden, and output layers; the exponential curves in the output nodes represent the softmax transform.]

For d input variables, M hidden-layer nodes (M+1 including the bias node), and K output classes, the number of weights in this network is M(d+1) + K(M+1). Again, we can use any of a number of optimization algorithms to adjust the weights, and just as for the regression problem, we can include a weight-decay term in the objective function, forcing a smoother representation of the boundaries between classes than we would obtain otherwise.

Using a single hidden-layer node for the Rhomaa-Umaa example, the sigmoid basis simply forms a step or boundary between the paralic and marine data points.
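The softmax transform and the cross-entropy objective can be sketched directly. The raw output values and indicators below are hypothetical, and the function names are my own:

```python
import numpy as np

def softmax(t):
    """Turn K raw network outputs T_k into probabilities that sum to 1."""
    e = np.exp(t - np.max(t))    # subtracting the max improves numerical stability
    return e / e.sum()

def cross_entropy(probs, indicators):
    """Sum of -y_ik * log P_k(x_i) over data points and groups.

    probs, indicators: shape (N points, K groups); each row of indicators
    has a single 1 marking the true class for that data point.
    """
    return -np.sum(indicators * np.log(probs))

p = softmax(np.array([2.0, 1.0, 0.1]))       # nonnegative, sums to 1
# Cross-entropy for a single point whose true class is the first group:
loss = cross_entropy(p[None, :], np.array([[1, 0, 0]]))
# loss equals -log(p[0]), shrinking towards 0 as p[0] is driven towards 1
```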
I've included those points on the plot in their indicator form: 0's for paralic and 1's for marine.

[Figure: single-hidden-node network fit to the Rhomaa-Umaa example, with the data points plotted as 0/1 indicators.]

Since the classification problem is about drawing boundaries, the sigmoid basis functions are in a sense more natural to this problem than to the regression problem, since the step in the basis function forms a boundary.

Fitting five basis functions with a decay constant of 0.01 yields:

[Figure: probability of marine facies, P(Marine), from the five-node network, contoured from 0.00 to 1.00.]

Fitting a network with 20 hidden-layer nodes to the entire Jones dataset, using a decay constant of 0.1 (just guessing at reasonable values for the tuning parameters), yields the following classification table, with 93% of the facies predicted correctly:

Core        Predicted Facies
Facies       Marine  Paralic  Floodplain  Channel  Splay  Paleosol
Marine          124        1           1        0      0         2
Paralic           0      140           5        9      0         2
Floodplain        0        9         216        2      2         3
Channel           0        4           2      256      0         0
Splay             0        2           8        0     28         0
Paleosol          0        2           7        0      0        57

And the predictions versus depth look like:

[Figure: network predicted facies and core facies versus depth in the Jones well.]

The predictions in the Kenyon are still quite messy:

[Figure: network predicted facies versus depth in the Kenyon well.]

We can also look at the facies probabilities versus depth in each well, first the Jones, then the Kenyon:

[Figure: facies membership probabilities versus depth (0 to 500 feet) in the Jones and Kenyon wells.]

For the facies classification problem, we have certain expectations regarding typical facies thicknesses, and regarding which facies occur more frequently above other facies. We can encode these expectations into a transition probability matrix that describes the probability of observing each facies at a certain depth given that a particular facies occurs at the next depth below that.
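One way to build such a matrix is to count upward transitions in an observed facies sequence and normalize each row. This is a sketch of mine with a made-up sequence, not the notes' actual procedure or data:

```python
import numpy as np

def transition_matrix(sequence, states):
    """Estimate upward transition probabilities from a bottom-to-top sequence.

    Row i, column j holds the probability of seeing states[j] immediately
    above states[i]; each row is normalized to sum to 1.
    """
    index = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for below, above in zip(sequence[:-1], sequence[1:]):
        counts[index[below], index[above]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)

# Hypothetical facies picks at regular intervals, listed bottom to top:
seq = ["marine", "marine", "marine", "paralic", "paralic", "marine"]
T = transition_matrix(seq, ["marine", "paralic"])
# T[0] = [2/3, 1/3]: marine is overlain by marine twice and by paralic once
```

Because most intervals are overlain by the same facies, a matrix built this way is strongly diagonal, which is exactly the thickness expectation we want to encode.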
The transition probability matrix developed from the facies observed at half-foot intervals, with a little modification, is:

Facies       Facies Above
Below         Marine  Paralic  Floodplain  Channel  Splay  Paleosol
Marine          0.99     0.00        0.00     0.00   0.00      0.01
Paralic         0.01     0.98        0.01     0.01   0.00      0.00
Floodplain      0.00     0.01        0.96     0.01   0.00      0.01
Channel         0.00     0.00        0.01     0.97   0.01      0.01
Splay           0.00     0.00        0.03     0.00   0.97      0.00
Paleosol        0.00     0.03        0.03     0.00   0.00      0.94

We can use Bayes' theorem again to combine the probabilities predicted from the logs at depth m, P_k^{(m)}, with facies probabilities derived from the probabilities at the underlying interval, m-1, weighted by the upward transition probabilities, t_{jk}, to derive modified probabilities at each depth, combining the information from the logs with our expectations regarding spatial adjacency:

    p_k^{(m)} = P_k^{(m)} w_k^{(m)} \Big/ \sum_j P_j^{(m)} w_j^{(m)}, \quad \text{where} \quad w_k^{(m)} = \sum_j t_{jk}\, p_j^{(m-1)}

Applying this procedure in both wells cleans up the predicted probabilities considerably:

[Figure: modified facies probabilities versus depth (0 to 500 feet) in the Jones and Kenyon wells.]

[Figure: modified predicted facies and core facies versus depth in the Jones well, and modified predicted facies versus depth in the Kenyon well.]
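This upward sweep can be sketched as follows. The notation follows the combination formula above; the transition matrix and log-based probabilities fed in are hypothetical:

```python
import numpy as np

def smooth_upward(log_probs, T):
    """Combine log-based probabilities with upward transition probabilities.

    log_probs: shape (M depths, K facies), ordered bottom to top, holding
        the probabilities P_k predicted from the logs at each depth.
    T: K x K matrix, T[j, k] = probability of facies k directly above facies j.
    Returns the modified probabilities p_k at each depth.
    """
    p = np.array(log_probs, dtype=float)
    for m in range(1, len(p)):
        w = T.T @ p[m - 1]      # w_k = sum_j t_jk * p_j at the depth below
        p[m] = p[m] * w         # weight the log-based probabilities
        p[m] /= p[m].sum()      # renormalize (Bayes' theorem)
    return p

# Two facies, sticky transitions, and noisy log-based probabilities:
T = np.array([[0.9, 0.1],
              [0.1, 0.9]])
logs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4]]
p = smooth_upward(logs, T)
# The ambiguous middle interval is pulled towards the facies seen below it
```

The strong diagonal of the transition matrix is what suppresses the thin, spurious facies switches seen in the raw predictions.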