Class Note for MATH 563 at UA
Popular in Course
Popular in Department
This 18 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Arizona taught by a professor in Fall. Since its upload, it has received 15 views.
Reviews for Class Note for MATH 563 at UA
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 02/06/15
Markov Jabberwocky fesh excenture and the like John Kerl Department of Mathemaucs Umversxty of Amzona August 26 2009 Markov Jabberwocky fan manna and the We August 25 2009 118 Lewis Carroll39s Jabberwocky Ie Jaseroque der Jammerwoch 39Twas brilig and the slithy toves Did gyre and gimbe in the wabe All mimsy were the borogoves And the mome raths outgrabe ltltGarde toi du Jaseroque mon s La gueue qui mord la griffe qui prend Gardetoi de l39oiseau Jube e vite Le frumieuX Bandaprendgtgt Er griff sein vorpas Schwertchen zu Er suchte lang das manchsam39 Ding Dann stehend unterm Tumtum Baum Er an zu denken ng Many of the above words do not belong to their respective languages yet look like they could or should It seems that each language has its own periphery of almostwords Can we somehow capture a way to generate words which look Englishy Frenchish and so on It turns out Markov chains do a pretty good job of it Let s see how it works Markoszbberwocky fzh excentwe and theiike August 2520139 2 1E Probability spaces A probability spacequot is a set 9 of possible outcomes X along with a probability measure P on events sets of outcomes Example 9 1 2 3 456 the results of the toss of a fair die What would you want P1 to be What about P2 3456 And of course we Want P1r 2 P1 P2 The axioms for a probability measure encode that intuition For all A B Q Q o PA 6 01 for all A g Q o PQ 1 o PA U B PA PB if A and B are disjoint Any function P from subsets of Q to 01 satisfying these properties is a probability measure Connecting that to realworld randomness is an application of the theory Here s tne rine print tnese definitionswork it n is rinite or countabiyinrinite if n is uncountable tnen we need to restrict our attention to a orfield 77 or Pnneasurabie subsets or n For ruii information you can take Matn 553 w Here s more rine print in taking my random variables X to be tne identity runction on outcomes m Markov Jabberwocky fzh excenture and tne like August 25 2009 3 1E Independence of events Take a pair of fair coins Let Q HHHT THTT What s the probability that the first or second coin lands headsup What do you think PHH ought to be H T H 14 14 A lst is heads T 14 14 B 2nd is heads Now suppose the coins are welded together you can only get two heads or two tails now PHH 7 H 12 0 A lst is heads T 0 12 B 2nd is heads We say that events A and B are independent if PA B PAPB Markoszbberwocky fzh excentwe and thehke August 2520139 4 1E PMFs and conditional probability A list of all outcomes X and their respective probabilities is a probability mass function or PM F This is the function PX x for each possible outcome x 16 16 16 16 16 16 Now let 9 be the people in a room such as this one If 9 of 20 are female and if 3 of those 9 are also lefthanded what s the probability that a randomlyselected female is lefthanded We need to scale the fraction of lefthanded females by the fraction of females to get 13 L R F 320 620 M 220 920 w e say PLF PLl F W This is the conditional probability of being lefthanded given being female Markoszbberwocky fzh excentwe and mime August 2520139 5 1E Dietipping and stochastic processes Repeated die rolls are independent But suppose instead that you first roll the die then tip it one edge at a time Pips on opposite faces sum to 7 so if you roll a 1 then you have a 14 probability of tipping to 2 34 or 5 and zero probability of tipping to 1 or 6 A stochastic process is a sequence Xt of outcomes indexed for us by the integers t 123 For example the result of a sequence of coin flips or die rolls or die tips The probability space is Q X Q X and the probability measure is specified by PX1 x1 X2 x2 Using the conditional formula we can always split that up into a sequencing of outcomes PX1 x1X2 2XT InPX1 x1 PX2 Iz l X1 Il PX3 is l X1 1X2 2 PXT x l X1 x1XT1 M71 lntuition How likely to start in any given state Then given all the history up to then how likely to move to the next state Markov Jabberwockv f5 Excentwe and the like August 25 2009 5 1E Markov matrices A Markov process or Markov chain if the state space 9 is finite is one such that the PXn in i X1 17X2 Miqunel M71 PXn M l Xna Ina If probability of moving from one state to another depends only on the previous outcome and on nothing farther into the past then the process is Markov Now we have PltX1 MPltX1 x1 PX2 x2lX1 39PXn In l Xn71 Ina We have the initial distribution for the first state then transition probabilities for subsequent states Dietipping is a Markov chain your chances of tipping from 1 to 2 3 45 are all 14 regardless of how the die got to have a 1 on top We can make a transition matrix The rows index the fromstate the columns index the tostate lt1 lt2 lt3 lt4 lt5 lt6 0 14 14 14 14 0 14 0 14 14 0 14 14 14 0 0 14 14 14 14 0 0 14 14 14 0 14 14 0 14 0 14 14 14 14 0 Markoszbberwockv f5h2xcentwe and theiike August 252009 7 1E Markov matrices continued What s special about Markov chains 1 Mathematically we have matrices and all the powerful machinery of eigenvalues invariant subspaces etc If its reasonable to use a Markov model we would want to 2 ln applications Markov models are often reasonable Each row of a Markov matrix is a conditional PMF PX2 xi l X1 The key to making linear algebra out of this setup is the following law of total probability PX2 xi ZPX1 in M ZPX1 xiPX2 I i X1 xi PMFs are row vectors The PMF of X2 is the PMF of X1 times the Markov matrix M The PMF of X3 is the PMF of X1 times M7 and so on Markov Jabberwockv f5 Excentwe and the like August 25 2009 a 18 Back to mOtUEl Phase 1 of 2 read the dictionary file Word lists about a hundred thousand words each were found on the Internet English French Spanish German The state space is Q X Q X where Q is all the letters found in the dictionary file az perhaps 6 8 etc After experimenting with different setups I settled on a probability model which is hierarchical in word length 0 l have PWord length I 0 Letter 1 PX1 ii i I Then PXk m l Xkcl will for k 2l o I use separate Markov matrices nonhomogeneous Markov chains for each word length and each letter position for that word length This is a lot of data But it makes sure we don t end words with gr etc PMFs are easy to populate Example dictionary is appe bat bet cat cog dog Histogram 1 2 3 4 5 Then just normalize by the sum to get a PMF for word lengths 0 0 56 0 16 l 05 1l kl k2 kg k4 ks Markov Jabberwocky f5 Exc ntwa and the like August 25 2009 9 1E Example Dictionary is appe bat bet cat cog dog Wordlength PMF as above 0 0 56 0 16 z 1 k2 k3 k4 k5 Letter1 PMF for threeletter words 25 25 15 b C d Letter1 to Ietter 2 transition matrix for threeletter words a e 0 b 12 12 0 c 12 0 12 d 0 0 1 Letter2 to Ietter 3 transition matrix for threeletter words 75 9 a 1 0 e 1 O a O 1 Markov Jabberwocky fan manna and the We August 25 2009 10 1E Phase 2 of 2 generate the words using CDF sampling How can we sample from a nonuniform probability distribution Think of the PMF as a dartboard We throw a uniformly wild dart Outcomes with bigger P should take up bigger area on the dartboard Theorem This works Technically 0 We write a cumulative distribution function or CDF Whereas the PMF is fx PX the CDF is PX g Put some ordering on the outcomes 0 Let U the dart be uniformly distributed on 01 0 Then F 1U appropriately interpreted has the distribution we want See my September 2007 grad talk Is 2 a random number for full details Example PMF for letter 1 of threeletter words is 04 04 02 b C d CDF for letter 1 of threeletter words is 04 08 10 l b C d i If U comes out to be 06329 then I pick letter 1 to be c If U comes out to be 01784 then I pick letter 1 to be I Etc I also make a CDF for each row of each Markov matrix Markov Jabberwocky fzh excentwe and the like August 25 2009 11 18 Word generation continued To generate a word given the Markovchain data obtained from a specified dictionary file 0 Use CDF sampling to pick a word length I from the wordlength distribution 0 Use the letter1 CDF for word length I to pick a first letter 0 Go to that letter s row in the letter1 to letter2 transition matrix for word length I Sample that CDF to pick letter 2 Keep going until the th letter 0 Print the word out Markov Jabberwocky f5 Excentwe and the like August 25 2009 12 1E Threeletter memory The nonMarkov part of the story Using Markov chains as described here I got decent words but not always Realword correlations go more than one letter deep Example Using a German dictionary my program generated the 5 letter word War This made sense There are b I words in German eg beib There are ll words in German eg ales But my Markov model only looks at correlations between adjacent letters and thus it didn t detect that b never happens in German For revision two of the project I did all the steps described in the previous slides but now with the following data 0 l have PWord length l as before 0 For first letters PX1 x1 ll 0 For second letters PX2 I2 lX1 ml 0 For the rest PXk m l inz mikaa M711 Markov Jabberwocky f5 Excenmye and the like August 25 2009 13 1E Results with a tiny word list Dictionary is bake balm bare cake calm care cart case cave Here are all possible outputs all of Q X Q X using twoletter and threeletter memory respectively Words appearing in the output but not in the input word list are marked with x o Pw o Pw bake 00740741 bake 01111111 balm 00740741 balm 01111111 bare 00740741 bare 00740741 bart 00370370 bart 00370370 base 00370370 cake 01111111 bave 00370370 calm 01111111 cake 01481481 care 01481481 calm 01481481 cart 00740741 care 01481481 case 01111111 cart 00740741 cave 01111111 case 00740741 cave 00740741 When larger word lists are used 9 is far larger than the input word list ie far more mimsy and mome than were and the Markov Jabberwocky fzh Excenmre and the like August 25 2009 14 1E Results with real word lists For fullsize word lists ldon t try to enumerate all possible outputs just generate 100 or so at a time When I feed word lists from different languages into the same computer program get different outputs Hopefully you can tell which is which churency kinging supprotophated doconic linictoXy stewaorties murine hawkinesses teXueuX roseras placates exhum rent orieff cinquetassions laissiez regre n ses sa uceptant montrenards re sai39smez enjupilames ratTt fa usive per nimo bo n sanfija morricete esmotorrar bisfato filamberecer estempoI ml ceta zarl fero senestrosia desaificapio Bc39iservole techtaus le Nah wohassee verschiitzen Probinus tra39Bcher Postenpand einpriickt BuBrfere hb39hegendeter occamo domitor nestum inhibeo prohisus equino eribro obvola exteptor eXI39bro abduco loci equa occasco Markov Jabberwocky fan manna and the like August 25 2009 15 1E Matching Aramian Wasielak s idea run a word real or not through the Markovchain data for all tabulated languages computing the probability of the word PW0rdlengthPX1 1lPX2 2 l X1 17quot39 last four columns Then for each word normalize those numbers to get a score between zero and one first four columns Word En scare Fr score SD score De score En P Fr P Sp P De P cat 1 000 o 000 o 000 o 000 5 5 10 o o o baguette o 015 o 935 o 000 o 000 4 w 10 9 2 1 10 o o Wurst o 130 o 000 o 000 0 320 1 2 10 o o 5 5 10 pzlzpz o 014 o 056 o 920 o 000 9 o 109 2 e 10 e o 10 fesh 1 000 o 000 o 000 o 000 9 2 10 o 0 location 0 19 o 093 o 000 o 131 1 9 10 2 e 10 o 4 s 10 gtltzz o 000 o 000 o 000 o 000 o o o o brillig o 000 o 000 o 000 1 000 o o o 2 5 10 9 slithy 1 000 o 000 o 000 o 000 2 1 10 o o o toues o 000 o 000 o 000 o 000 o o o o outyzbe o 000 o 000 o 000 o 000 o o o o frumieux o 065 0 895 o 000 o 025 4 5 10 e o 10 1U o 2 5 10 gm 0 42 o 129 o 000 0 11s w4 10 1 2 10 o 1 1 10 vorpzl 1 000 o 000 o 000 o 000 1 2 10 9 o o o mugge 1 000 o 000 o 000 o 000 1 5 10 o o o expecto o 000 o 000 1 000 o 000 o o s 1 10 o patronum 1 000 o 000 o 000 o 000 2 o 101U o o 0 Markov Jabberwocky fan excentwe and the like August 25 2009 15 1a Other possibilities In this project my goal was to construct words out of letters using languagespecific empirical knowledge of transition probabilities from one letter to the next One can do something similar constructing sentences out of true words using languagespecific empirical knowledge of transition probabilities from one word to the next Google for Garkov and Rooter See also Cam McLeman s page on languagemath experiments Another idea of Aramian s Using more languages eg German Dutch Swedish French Spanish Catalan ltalian Polish Czech Russian etc can we adapt the scoring mechanism to measure relatedness of languages All the machinery here works on letters specifically on written language Better results might be obtained by using not letters but units such as e n on 91 This requires a language expert to decide what the pieces are Or does it Can we automate detection of these digraphs trigraphs and so on When we invent nonsense sayings ldon t think there are little Markov chains running in our heads What s so satisfying about Carroll s Long time the manXome foe he sought and where does it really come from Markov Jabberwocky fzh Excenmre and the like August 25 2009 17 1E Vielen Dank jr Ihre Aufmerksamkeit Je vous remercie de votre attention Gracias por su atenci n Thank you for attending Markov Jabberwocky fan manna and the We August 25 2009 1a 1a
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'