New User Special Price Expires in

Let's log you in.

Sign in with Facebook


Don't have a StudySoup account? Create one here!


Create a StudySoup account

Be part of our community, it's free to join!

Sign up with Facebook


Create your account
By creating an account you agree to StudySoup's terms and conditions and privacy policy

Already have a StudySoup account? Login here


by: Giovani Ullrich PhD


Giovani Ullrich PhD
GPA 3.5


Almost Ready


These notes were just uploaded, and will be ready to view shortly.

Purchase these notes here, or revisit this page.

Either way, we'll remind you when they're ready :)

Preview These Notes for FREE

Get a free preview of these Notes, just enter your email below.

Unlock Preview
Unlock Preview

Preview these materials now for free

Why put in your email? Get access to more of this material and other relevant free materials for your school

View Preview

About this Document

Class Notes
25 ?




Popular in Course

Popular in Statistics

This 13 page Class Notes was uploaded by Giovani Ullrich PhD on Saturday September 26, 2015. The Class Notes belongs to STAT 430 at Iowa State University taught by Staff in Fall. Since its upload, it has received 12 views. For similar materials see /class/214401/stat-430-iowa-state-university in Statistics at Iowa State University.




Report this Material


What is Karma?


Karma is the currency of StudySoup.

You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 09/26/15
A branch of systems biology deals with the design and analysis of mathematical and computer models of disease in order to estimate immunological and pathogenrelated parameters compare alternative models of the immune response and to explore treatment and vaccination strategies in silico We will be looking at output from a computer model of Leishmania major infection Leishmania are protozoan parasites that infect macrophages are transmitted through the bites of infected sand ies and are endemic in 88 countries though not the United States Disease can either be cutaneous where skin ulcers occur on exposed surfaces of the body or visceral where the infection spreads to organs such as the liver or the spleen which has nearcertain mortality if left untreated Today we will analyze computer model output in order to understand the effect that computer model parameters have on simulation output Important Files 1 httpwwwpubliciastateeduNgdancildsummer2007filesdesigytxt This file contains a 100 x 5 matrix of the computer model inputs Each column corresponds to a particular computer model parameter and each row corresponds to the parameter setting for a particular computer model run All parameter values are scaled to be integers between 0 and 49 Note that the column names are given in this file The computer model parameters are as follows a1 7 the growth rate of the parasite Id 7 the amount of intracellular parasite that an infected macrophage can hold mrecr 7 the recruitment rate of macrophages the host cell trecr 7 the recruitment rate oft cells which eliminate the pathogen tdelay 7 t cells are not recruited until a threshold level of total pathogen is reached 2 httpwwwpubliciastateeduNgdancildsummer2007filespathogentxt This is a 100 x 35 table of computer model output corresponding to pathogen load over time The columns correspond to days 1 through 35 and each row corresponds to output from a particular simulation eg pathogen3 is the pathogen load over time from simulation 3 Exercises Questions 1 2 L V Read in the above two les Construct a graph of pathogen load versus time for all simulations on a single graph hint use the plotLines function that we have created earlier the pathogen table must be in matrix form when it is passed into this function What is the relationship between each of the parameters and parasite load at day 6 note day 6 corresponds to the 6 column of the pathogentxt le a Look at this relationship visually for all parameters by plotting the parameter value on the xaxis and parasite load at day 6 on the yaxis Fit a multiple linear regression model using parasite load at day 6 as the response variable and the six computer model parameters as the input Note you may need to convert the design table to a matrix using asmatrix Is this form of a multiple linear regression model the best Fquot Hypotheses Testing and Permutation Tests You have recently developed an algorithm for predicting someone s lifespan based on information obtained during childhood You are interested in whether or not your algorithm does a better job than a competitor s Each program has been tested against 10 known but different cases and for each case a score is recorded that indicates a measure of difference between the individual s true lifespan and the predicted lifespan These results can be found in the workspace httpwwwpubliciastateeduNgdancildstat430fileslifespanRData which contains the object lifespan that has components programl and program2 You wish to test whether your algorithm is better ie has a lower mean score H01 H1 H2 H11 H1 lt Hz a Read in the workspace above using the command loadurl httpwwwpubliciastateeduNgdancildstat430fileslifespanRData b Use the function ttest to test the alternative hypothesis above type ttest to see how to test with a onetailed alternative hypothesis c A lot of people would report the results above and be happy However visually verify that it is not clear whether the assumptions of the ttest have been met namely the assumption that both populations are normally distributed since n lt 30 This actually should have been done prior to a Use the functions hist and if you are familiar with normal quantiles plots qqnorm and qqline d In this situation a permutation test is more reliable and more conservative than the ttest Construct an approximate randomization test to test H1 against H0 What is your conclusion Might it differ from a conclusion using the p value obtained in b 7 Maximum Likelihood Estimation Recall from example 022 pg 5 ofthe course notes that for X1 X Xi N independent Geometric p with joint probability PX1 k1Xn kn H1pk71 p i1 A n a maximum likelihood estimate mle of p is p n i i1 In this example the maximum likelihood estimate of p exists in closed form but this is not always the case When this happens numerical search methods are often used In this exercise we will use R to demonstrate the use of numerical methods for finding the mle of p using the likelihood function above Let k the number of trials before successfully connecting to a server k c5 ll 8151510 Note that the likelihood of k can be written in R as geolike lt functionp k k is a vector matrix of observations k asvectork in case k is passed in as a matrix ans matrix0lengthk for i in llengthk ansm 1pki1 p return prodans Use the geolike function and the function optimize to nd the maximum likelihood estimate of p for the vector k above Type optimize for information on how to use this function and note that we want to maximize the function not minimize it Using R nd the true mle of p using the formula on the top of this page How does this result compare to the result obtained using optimize It may also be useful to know that R supports multidimensional minimization or maximization using the function optim 8 Monte Carlo Estimation of pvalues Suppose that the vector k c5 ll 8151510 is actually censored That is there is a vector k that contains the true but unknown observations and k mink 15 for all 139 However we believe that the distribution of k is geometric with p 011 and we wish to test the hypothesis H0p0ll H1p 0ll In what follows we will use the test statistic s the sample mean We will use a Monte Carlo Sampling approach Cohen pg 150 to find the sampling distribution oft under the null hypothesis Fori l M a Draw a sample of size n 6 under H0 using rgeom b Calculate the test statistic from the sample in a taking into account that some of the data might be censored Use the distribution of s to calculate a pvalue and make a conclusion Chapter 1 Elementary Simulation We want to be able to perform an experiment with a given set of probabilities Starting point 11 Random Number Generators Random number generators rng produce a stream of numbers that look like realizations of independent standard uniform variab es U1 U2 U3 H Usually these numbers are not completely random but pseudo randomi This way we ensure repeatability of an experiment Note even the trick to link the systems rand function to the internal clock gives you only pseudo random numbers since the same time will give you exactly the same stream of random numbers There are hundreds of methods that have been proposed for doing this some most are pretty bad A good method and in fact current standard in most operating systems is Linear Congruent ial Method De nition 111 Linear Congruential Sequence For integers a c and m a sequence of random numbers77 In is de ned by ziEazi1c modmforil2iu Note this sequence still depends on the choice of 10 the so called seed of the sequencer Choosing different seeds yields different sequences That way we get a sequence with elements in 0 m 7 1 We de ne ui I The choice of the parameters a c and m is crucial obviously we want to get as many different numbers as possible therefore m needs to be as large as possible and preferably prime that way we get rid of small cyclesi Example 111 rag examples Statusquo in industry is the so called Minimal standard generatori lt ful lls the common requirements of a rng and at the same time is very fast lts parameters are c 2 a 16807 m 23171 CHAPTER 1 ELEMENTARY SIMULATION An example for a terrible random number generator is the RAND U With c 0 a 65539 m 231 It was Widely used before people discovered how bad it actually is Knowing two successive random numbers gives you the possibility to predict the next number pretty well i i i that7s not how rng s are supposed to work For more information about random number generators and different techniques how to produce and check them look at httpcryptomatsbgacatresultskarlserver State of the art at the moment is the Marsaglia MulticarryRNG define znew z36969 zamp65535zgtgt16ltlt16 define wnew w18000 wamp65535wgtgt16amp65535 define IUNI Znewwnew define UNI Znewwnew2328306e10 static unsigned long z362436069 w521288629 void setseedunsigned long i1unsigned long i2zi1 wi2 Whenever you need random integers or random reals in your C program just insert those six lines at near the beginning of the program In every expression where you want a random real in 01 use UNI or use IUNI for a random 32bit integer No need to mess with ranf or ranflastI etc with their requisite overheads Choices for replacing the two multipliers 36969 and 18000 are given below Thus you can tailor your own inline multiplywithcarry random number generator This section is expressed as a C comment in case you want to keep it filed with your essential six lines Use of IUNI in an expression will produce a 32bit unsigned random integer while UNI will produce a random real in 01 The static variables 2 and w can be reassigned to i1 and i2 by setseedi1i2 You may replace the two constants 36969 and 18000 by any pair of distinct constants from this list 18000 18030 18273 18513 18879 19074 19098 19164 19215 19584 19599 19950 20088 20508 20544 20664 20814 20970 21153 21243 21423 21723 21954 22125 22188 22293 22860 22938 22965 22974 23109 23124 23163 23208 23508 23520 23553 23658 23865 24114 24219 24660 24699 24864 24948 25023 25308 25443 26004 26088 29379 29889 30135 30345 30459 30714 30903 30963 31059 31083 or any other 16bit constants k for which both k2 161 11 RANDOM NUMBER GENERATORS 3 and k2 151 are prime Armed with a Uniform rng all kinds of other distributions can be generated 111 A general method for discrete data Consider a discrete pmf with r rllt Iglt lt17 pr PI1 PI2 Ava PIn The distribution function F then is F t 2 PW 115 Suppose we have a sequence of independent standard uniform random variables 1 U2 an they have realizations ul ug realizations are real values in 01 then we define the ith element in our new sequence to be zj if pm u 2pm 161 161 W W F0574 Fm Then X has probability mass function p This is less complicated than it looks Have a look at gure 11 Getting the right z value for a specific u is done by drawing a horizontal line from the y axis to the graph of F and following the graph down to the z axis This is how we get the inverse of a function graphically Figure 11 Getting the value corresponding to ui is done by drawing a straight line to the right until we hit the graph of F and following the graph down to zj Example 112 Simulate the roll of a fair die Let X be the number of spots on the upturned face The probability mass function of X is for all i l 6 the distribution function is FX t for all t E 0 6 We therefore get X from a standard uniform variable U by H rogUg 2 H ltU 3 H ltU 4 H3ltU 5 HltU 6 if ltU calm calm om mlw calm 03W A mama WARYSIMULAnoN 1 1 2 A gEnmaJ Methad m Cannnuaus Dmsmes Wamdammemmmmmmi wmwg m mm tomgtatu tammkmMaummm r mama Canal mum Wm maymmmahmmmmmmmwkmmnmmmxm Wm mmmmtmmmmkixawalmusmymmhmyumxxstmkss my ixwhumsudsmmwahw J awlv mm 5 mt mg m z ammmmmmwg c m memqmmmmmwmdmm amp 3 StrumVanith swmwewutwsmumeaxmmmxmtmmexpmmmmmmmnux mmwmmgmmmwmuww m n 7 u 9min nymph WED 5912 5 m H q n m an my 11 RANDOM NUMBER GENERATORS 5 Let u be a positive real number u 1 7 e7 ltgt 1 7 u e ltgt ln17 u 7A1 1 ltgt I 7Xln17 u F 1u then X I 7 ln1 7 U has an exponential distiibution with rate A In fact since 1 7 U is uniform if U is uniform we could also have used X I 7 an For speci c densities there are a lot of different special tricks for simulating observations For all of the next sections7 let s assume that we have a sequence of independent standard uniform variables U17 U27 U37 1121 Simulating Binomial amp Geometric distributions Let p be the rate of success for a single Bernoulli triall de ne 7 0 if ui 2 p Xi i 1 if ui ltp Then X ZXZ N BM 11 and W I of Xi until the rst is 1 W N Geometricp 1122 Simulating a Poisson distribution With given U7 we know7 X 7 an has an exponential distribution with rate A e ne 1 11 Y 2 largest index j such that ZXi S 1 and ZXi gt 1 i1 i1 then Y N Poi 1123 Simulating a Normal distribution To simulate a normal distribution7 we need two sequences of standard uniform variables Let U1 and U2 be two independent standard uniform variables De ne Z172ln Ullil2 cos27rU2 Z2 2 72 In Ullil2 sin27rU2 Then both Z1 and Z2 have a standard normal distribution and are independent7 Z17 Z2 N N01 and XuaZi NNu02 6 CHAPTER 1 ELEMENTARY SIMULATION 12 Basic Problem Simulation allows us to get approximate results to all kinds of probability problems7 that we couldn7t solve analytically The basic problem is X7Y7 7 Z k independent random variables We know how to simulate each k random variables Of these variables 917 y7 7 2 some quite complicated function g of k variables V gX7 Y7 7 Z a random variable of interest We might be interested in some aspects of the density of V eg P13ltVlt17 EW VaTV 7 unless g is simple7 k is small7 and we are very lucky7 we may not be able to solve these problems analytically Using simulation7 we can do the following steps of Simulation 1 Simulate some large number say it of values for each of the k variables X7 7 Z we then have a set of n ktuples of the form Xi7li77Zl foril77n 2 plug each Xi7327 7 Z into function g and compute Vi VigXi7Yl77Zi foril77n 3 then approximate aPesvsww Vi Vi E a7b n b EWVH by 1 n 1 n gjaw7 1e EV 71771 c VaTV by 1 n 72M7W n i1 Example 121 Simple electric circuit Consider an electric circuit with three resistors as shown in the diagram 2 Simple Physics predicts that R7 the overall resistance is R2 R3 R2R3 R1 RiR ii 1iR 7 1 R1 R3 7 1 R3 Assume7 the resistors are independent and have a normal distribution with mean 100 Q and a standard deviation of 2 12 BASIC PROBLEM What should we expect for R the overall resistance The following lines are R output from a simulation of 1000 values of R it Example Simple Electric Circuit it Goal Simulate 1000 random numbers for each of the resistances R1 R2 and R3 it Compute R the overall resistance from those values and get approximations for it expectated value variance and probabilities it rnorm n mean0 sd 1 generates 11 normal random numbers with the specified it mean and standard deviation it R1 lt rnorm 1000 mean100 sd 2 R2 lt rnorm 1000 mean100 sd 2 R3 lt rnorm 1000 mean100 sd 2 it it compute R R lt R1 R2R3R2 RS it it now get the estimates meanR gt 1 1499741 sdR gt 1 2134474 it it the probability that R is less than 146 is given by the number of values it that are less than 146 divided by 1000 sum Rlt1461000 gt 1 004 Example 122 at MacMall Assume7 you have a summer job at MacMall7 your responsibility are the Blueberry lMacs in stock At the start of the day7 you have 20 Blueberry lMacs in stock We know X of lMacs ordered per day is Poisson with mean 30 Y of lMacs received from Apple is Poisson with mean 15 a day Question What is the probability that at the end of the day you have inventory left in the stock Let I be the number of Blueberry lMacs in stock at the end of the day 1207XY Asked for is the probability that I 2 l Again7 we use R for simulating I it Example MacMall it it Goal generate 1000 Poisson values with lambda 30 it CHAPTER 1 ELEMENTARY SIMULATION Remember 1 Poisson value needs several exponential values step 1 produce exponential values u1 lt runif33000 e1 lt 130logu1 sume1 1 1099096 sum of the exponential values is gt 1000 therefore we have enough values to produce 1000 Poisson values step 2 add the exponential values cumsum is cumulative sum E1 lt cumsume1 E12535 1 07834028 07926534 07929962 07959631 08060001 08572329 08670336 8 08947401 10182220 10831698 11001983 E1 lt floorE1 E12535 1 0 0 0 0 0 0 0 0 1 1 1 Each time we step over the next integer we get another Poisson value by counting how many exponential values we needed to get there step 3 The table command counts how many values of each integer we have x lt tableE1 X110 0 1 2 3 4 5 6 7 8 9 32 26 31 32 17 27 31 33 32 31 we have 1099 values we only need 1000 K lt X11000 check whether X is a Poisson variable then eg mean and variance must be equal to lambda which is 30 in our example meanX 1 30013 varX 1 2984067 generate another 1000 Poisson values this time lambda is 15 Y lt rpois100015 looks a lot easier now compute the variable of interest I is the number of Blueberry IMacs we have in store at the end of the day I lt 20 X Y 12 BASIC PROBLEM 9 it and finally it the result we were looking for it the empirical probability that at the end of the day there are still it computers in the store sumI gt 01000 1 0753 Using simulation gives us the answer7 that with an estimated probability of 0753 there will be Blueberry lMacs in stock at the end of the day Why does simulation work On what properties do we rely when simulating PV E m 12 approximated by 5 W EhV approximated by FL I iELI VaTW approximated by EllaZ 7 V2 Suppose V1 gX1Y1111 7Z1V2 9X27Y271117Z2111Vn 9Xn7Yn71117Zn arei1i1d then Vi 6 ab N Bnp with p PV E m 12 n trialsi So7 we can compute expected value and variance of 5 A 1 7 1 Ella gZElVil 7110 21 A 1 n 1 17 1 VaTp EZVaTUi np1ip SEgt0f0rngtw i1 ie we have the picture that for large values of 7L7 5 has a density centered at the true value for PV E m 12 with small spreadi ie for large n 5 is close to p with high probability Similarly7 for i1i1d7 are also i1i1d1 Then 7L E1131 ZE1hltvLgt1 E1hltvgt1 vmm ZVa hOi VarhVn a 0 for n a 001 Once again we have that picture for FL that the density for FL is centered at EhV for large n and has small sprea 1


Buy Material

Are you sure you want to buy this material for

25 Karma

Buy Material

BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.


You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

Why people love StudySoup

Steve Martinelli UC Los Angeles

"There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

Janice Dongeun University of Washington

"I used the money I made selling my notes & study guides to pay for spring break in Olympia, Washington...which was Sweet!"

Bentley McCaw University of Florida

"I was shooting for a perfect 4.0 GPA this semester. Having StudySoup as a study aid was critical to helping me achieve my goal...and I nailed it!"

Parker Thompson 500 Startups

"It's a great way for students to improve their educational experience and it seemed like a product that everybody wants, so all the people participating are winning."

Become an Elite Notetaker and start selling your notes online!

Refund Policy


All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email


StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here:

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.