Mathematical Methods for Economists I
Mathematical Methods for Economists I AMS 11
Popular in Course
Popular in Applied Math And Statistics
This 41 page Class Notes was uploaded by Milton Sawayn DVM on Monday September 7, 2015. The Class Notes belongs to AMS 11 at University of California - Santa Cruz taught by Staff in Fall. Since its upload, it has received 89 views. For similar materials see /class/182151/ams-11-university-of-california-santa-cruz in Applied Math And Statistics at University of California - Santa Cruz.
Reviews for Mathematical Methods for Economists I
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 09/07/15
A parallel code to solve for the quasi steady dynamics of stellar interiors Pascale Garaud AMS 11 March 2008 All stars rotate to some degree Centrifugal forces and Coriolis forces induced by rotation can 0 change the properties of eddies in convective re gions o induce flows in radiative regions All stars are magnetic to some degree Lorentz forces induced by the magnetic field can 0 change the properties of eddies in convective re gions o induce flows in radiative regions HOW does this affect the intrinsic structure and CVO Iution Of a star HOW does it affect its observable properties 4 s 6 9 Metaritic Li abundance 3 r quot i39 39 Typical Errur Bar 0 x 5 a 9k 3 2 39 39 v 39t 2 VP E 1 c Sun I39 v 33937 V V V39 V 9 a 0 7 V V 6 v v V 1 I 39 7000 6000 5000 4000 T K The equations governing of the spherically symmetric structure amp evolution of the stellar interior are 15 E dr pdr39 1 d 2d 4 GT r2dr T dr W p d 1 1 Ta E 55T2Frad T2Fconv 1 OX f TX at i 7 F 19ET7X1ER ERET7X1 E 557T7X The evolution equations are quasi linear PDEs for 1 spatial variable and 1 temporal variable The structure equations ie the steady state solution for given composition require the solution of a two point boundary value problem elliptic For a given background state 5 T known the equations governing the 3D anelastic magnetohydrody namics of the stellar interior are baa L uVu Vp pVj gtlt BVH VIE 0 57 5711 V3 2 VkVT p pp7T Vgtltu x B VgtltnVgtltB VB 0 These equations are again quasi linear PDEs for 3 spatial variables and 1 temporal variable I have been interested in studying the axially symmet ric steady state solutions of these equations Finding these solutions requires solving a 2D nonlinear elliptic problem The numerical solution of all of these types of equations quasi linear PDEs or elliptic PDEs usually requires in verting a very large block tridiagonal system of linear equations In what follows I will 0 show where and how these large block tridiagonal system of linear equations arise from PDEs o how these systems are traditionally solved serially and what problems arise for very large systems 0 show how to solve them numerically using parallel algorithms Nonlinear Elliptic Problems Nonlinear elliptic problems part 1 the basic idea Consider the simple nonlinear ODE um fvu where f can be any nonlinear function with boundary conditions gAxAuu O or gBxBuu 0 where 9A or 93 can be any nonlinear functions A common method of solution is the Newton Raphson Kantorovich method NRK The idea behind NRK 1 Discretize the ODE let an 2 uxn with n 1 N then an 1 an 1 ifltxn1un1gt fltxmungti 7141 it 2 The N 1 discretized equations are second order ac curate haIf way between the meshpoints 2 Write an 2 an Sun where an is a guess for the real solution un Taylor expand the discretized equations assuming Sun is small Unl 6Unl an sun Axn 1 8 E fxnl7 n1 6U 1 f 8a mu En 3 Also write the boundary condition in the same form for example if the boundary condition is at the first boundary nly n1 8f 8a Where Am 2 xn1 sun 8 gAx1U16u1 0 8a 1751 4 This yields an N x N linear system for the errors Sun G1 0 O O O O O O 5111 gA A1 B1 0 O O O O O 5112 01 0 A2 B2 0 O O O O 5113 C2 0 0 A3 B3 0 O O O 5114 C3 0 O 0 A4 B4 0 O O 5115 C4 0 O O 0 A5 B5 0 O 5116 C5 0 O O O 0 A6 B6 0 5117 Ce 0 O O O O 0 A7 B7 SUB C7 where 8 G1 and 9A 9A17U1 8a 331 An 1 Ax 2 8a 33mm Axn 8f Bn 2 1 2 8a m a n17 n1 Axn 0n Um Un1 2 fn7Un fn17Un1 5 Once calculated the solution of this linear system Sun is added on to the initial guess UiU6u to provide a better approximation to the real solution 6 The method can be iterated upon many times until the desired accuracy is achieved a driver for the method would then read Initialize guess do iter 1nitermax o Solve for Su 0 U U M 0 Evaluate the norm of 6a to estimate the error 0 If error is small enough exit loop enddo 10 Nonlinear elliptic problems part 2 systems of ODEs The method described earlier can naturally be extended to systems of coupled nonlinear ODEs as well Given the interval xAxB consider the system of I nonlinear equations 8 fixu for z 1I 8x where u u1u2u1 and the functions f can be any nonlinear function The boundary conditions are split between the two boundaries with g vAyu 0 for k 1kA g ltBU O for k kA1I where the functions g and g g can be any nonlinear func tions As before we discretize the equations upon the mesh and consider that for each dependent variable u at each meshpoint as u uin and write them as the sum ofa guess plus and error i z39 i un un6un 11 This time the errors Sufi are the solution of the block tridiagonal linear system GA 0 O O O O O O gA A1 B1 0 O O O O 0 Cl 0 A2 B2 0 O O O 0 C2 0 0 A3 B3 0 O O O 03 O O 0 A4 B4 0 O 0 Su 2 C4 0 O O 0 A5 B5 0 O 05 O O O O 0 A6 B6 0 Ce 0 O O O O 0 A7 B7 C7 0 O O O O O 0 GB 93 where the matrices GA GB An B have coeffs a z39 GA J39 9A fOI i 17kA7jZ 1I 8U mh l a ikA GEM 93 fori1I kAj1I 8a N7uN Am 8 Anz39j 5z39j 2 8W 36mm Am 8 Bn 2quot 5zquot J 7 2 8 m a n17 n1 and the vectors 9A 93 and On are giAx7U1 for 17H7kA 93 g BkAxNUN for 1 1 I kAj 1I z39 A53 239 z39 un unl 717 71 f xn17un1 2 12 Note that the matrix is indeed block tridiagonal to see this note that the first block line only has kA lines and the last one only has k3 lines Emphasize the tridiagonal nature of the problem by looking is as r f2 44 Each of the main blocks An or Bn is I x I There are N l such block lines The total size of the matrix is IN gtlt IN gt could be huge Nonlinear elliptic problems part 3 from a small set of 2D PDEs to a large set of 1D ODEs Consider now for example the 2D quadratically nonlin ear elliptic problem uug um uyy on the unit square 10 E 01 y E 01 with zero Dirich let condition uxy O on the boundary One possible way of solving the system is to expand one of the two directions in Fourier modes say the x direction for example let uvy Z umysinm7rx m1M the sum is truncated at finite M since numerically we will only be able to solve the problem for a finite number of modes The governing nonlinear PDE is then reduced to the large coupled set of nonlinear second order ODEs d2u 2 2 km 2 m m 7 um Z UkUm k umk uk m d y k l M 2 where by definition uk 2 0 if k is greater than M or smaller than 1 14 By defining the new functions Um um dum U m M dy we end up with a set of 2M coupled nonlinear first order ODEs dUm dU m km 3 m27r2Um E 3UkUmk Umk Uk m k1M which is exactly in the form studied in part 2 with the 2M boundary conditions split between the two bound aries at y 0 and y 1 as Um0 o and Um1 o If we wanted to study the solutions to this equation with for example o 500 meshpoints in the y direction 0 100 spectral modes we would need to invert a 100000 x 100000 block tridiagonal matrix where there are 500 block lines each containing blocks of size 200 x 200 15 Nonlinear elliptic problems part 4 my own code My aim is to find 2D axially symmetric solutions of the following set of quadratically nonlinear PDEs in a spherical geometry uVu Vp pVjgtltBVII V u O TuVs VkVT ppp7T Vxu x B VxanB VB O The unknown variables are the thermodynamic quanti ties pTp the velocity vector u unamuq and the magnetic field vector B BB9B Each can de pend on the radius r and the co Iatitude 0 This set of PDEs there are actually 10 equations some algebraic some first order some second order can be expanded upon a set of M Chebishev functions in 0 and results in a system of 15M equations some first order some algebraic My high resolution simulations typically have 2000 mesh points in r and up to 100 spectral modes Then 0 each block of the main matrix is 1500 x 1500 o there are 2000 block lines o the total size of the matrix would be 3000000 x 3000000 16 Serial solution of block tridiagonal systems see Numerical Recipes for example 17 Let s consider the following system generated from 4 equations and 4 meshpoints with 3 boundary conditions at the first point and 1 at the last one XXXNNXX XXXNNNX XXXNNNX kakkkk gtltgtltgtltgtltgtltgtltgtltgtlt gtltgtltgtltgtltgtltgtltgtltgtlt gtltgtltgtltgtltgtltgtltgtltgtlt gtltgtltgtltgtltgtltgtltgtltgtlt gtltgtltgtltgtltgtltgtltgtltgtlt gtltgtltgtltgtltgtltgtltgtltgtlt gtltgtltgtltgtltgtltgtltgtltgtlt gtltgtltgtltgtltgtltgtltgtltgtlt gtltgtltgtltgtltgtlt gtltgtltgtltgtltgtlt gtltgtltgtltgtltgtlt gtltgtltgtltgtltgtlt 18 The serial algorithm consists in the following steps NNXXOOH Pe brnia MVoted Gausszkwdan eHn nann ofthe first block to transform it to the identity update the rest of the nwatnx aSNeH as the RllS vector not shcwvn here and store the entnes S arfuture backsubs tu on XXXXMMM Nkkkkkkk Nkkkkkkk NNXXNXNN XXXXXXXX XXXXNXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXNX XXNNN XXNNN XXXXX NXNXOHO NNNXHOO 19 OOOOOOH Zero the block below the first block and perform the corresponding ops on the rest of the matrix and RHS Note that this doesn t affect the stored entries ooooowo OOOOHOO XXXXMMM XNXXXXXX XNXXXXXX NNNXXXNN kakkkkk kakkkkk NNXXXKNN NXXXXXNX NXXXXXNX XXXXX XXXXX XXXXX XXXXX 2O OOOOOOH Perform a Gauss Jordan elimination on the diagonal block to transform it to the identity update the rest of the matrix note that this doesn t affect the previously stored block nor the zeroed block update the RHS vector and store the entries marked S 008 108 018 0010008 0001008 0000108 0000018 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXX 21 0 Continue iterating the algorithm zero the block be low the diagonal Gauss Jordan the diagonal block store the entries Make sure to keep track of RHS At the end the transformed matrix is OOOOOOH OOOOOHO OOOOHOO OOOHMMM OOOOHOOO OOOHMMMM OOOOOOl O OOOOOl OO OOOOOOHO OOOOOHOO OOOOHOOO OOOHMMMM OOOHO OOHOO OHOOO Hmmmm and the problem is ready for back substitution 2 Note that 0 Each of the main blocks is dense this means that they take a memory space of IgtltIgtlt8 bytes 2 18MB for I 15000 in double precision Reading all of the non zero entries of the matrix in memory in one go can be impossible cf in my high resolution case The largest stored array contains the entries S which must be stored before back substitution This ar ray is of size Ix k3 gtlt Ngtlt 8GB for double precision o The pros of the algorithm Pivoting makes the algorithm stable The matrix doesn t need to be read ahead of time Each block line can be read just before it is manipulated o The cons of the algorithm It is inherently serial cannot be parallelized as such If the largest stored array is larger than available memory huge loss of performance cf in my code for high resolution I x k3 x N x 8 1668 gt most processor memory 23 Parallel solutions of block tridiagonal systems see Golub amp Ortega for example 24 Let s first consider a simple tridiagonal system and write out the equations explicitly let s assume for ex ample there are N equations where N is odd 0113101 0123102 bl 021301 022362 0233103 52 0323102 033363 0343104 bs 043303 0443104 045305 2 b4 aN 1N 2 UN 2 aN 1N 1 UN 1 LN LNZUN bN l aNN 1 UN 1 CLNNZUN 5N The standard Gaussian Elimination consists in solving for 1 in the first equation substituting into the second one then solve for 2 then substitute that into the third one etc Instead Gaussian Elimination let s perform a cyclic re duction First let s take an odd equations and write all of the odd variables in terms of the even ones bl 012362 301 all b3 0323102 034364 x3 L33 bN aNN 1 UN 1 xN aNN Then if we substitute these into the remaining even equations we get a new system of equations for the even variables only x2x4xN1 25 In terms of matrix representation these operations can be viewed as X X H xxx jxm xxx xmxmx XXX HO xxx xxx xxx xxx xx 0 NMNC DN o O 1 with the equivalent operations on the RHS Coka In the example above the original 9 x 9 system has been reduced to a 4 x 4 system The idea is to continue reducing the size of the system in the same way in each cycle roughly a half of the variables are eliminated 26 If one starts with a system that is exactly of size 2N 1 2N 1 the cyclic reduction can be done exactly N 1 times to yield the central variable W71 after which back substitution can start Example Start with N 4 15 variables 3617362736373643657366736779687969736107361173612736137961479615 First cycle of reduction down to 7 variables 27m47678y10712714 Second cycle of reduction down to 3 variables 364736873612 Third cycle of reduction down to 1 variable 968 Start cycle of backsubstitution solve for 8 First cycle of backsubstitution solve for 43c12 Second cycle of backsubstitution solve for x2m6m10x14 Third cycle of backsubstitution solve for all remaining vari ables 27 Of course exactly the same technique can apply for block tridiagonal system c the operation for example b1 0123102 531 all is merely replaced by the matrix operation X1 Aff B1 A12X2 so that the matrix view of the first cycle of reduc tion becomes IS OXOX SIS XOXOX SIS XOXOX SIS XOXO SI o After the last reduction cycle only 1 block is left containing a linear system for the central vari ables recall if we start with 2N 1 variables 0 2N1 ACCXC BC That last linear system can be solved for the vector X0 and backsubstitution begins 28 The advantage is that within one cycle all operations are local ie they can all be done at the same time and only require information from 3 lines or block lines at a time Cyclic reduction is not a serial process and can be done with minimal inter processor communication 29 Inter processor communication is only necessary for the block lines at the boundaries between the processors XX XsX 019 XSXsX OIO XsWXSX OIWO XSXsWX SXSX IO 0 XSXS 010 SXSX IO SX XXX XXX XXX XWXX XXWX XXX XXX WXXX XWXX XXWX XXK XXX XXX A typical algorithm works in the following way o performs as many rounds of reductions as necessary until there are only 1 block line to manipulate left in each processor 0 the reduction then continues across processors At each round half of the processors stop working until the central block solution is found in the central processor c then back substitution begins 31 An example of the work flow for a 127 x 127 block tridiagonal system working on 8 processors would be E1 E2 E4 P1 5 4 8 8 E 3612 16 16 16 16 16 17 18 20 P2 5 20 24 24 5 28 32 32 32 32 32 32 33 34 36 P3 5 36 40 40 5 5 44 48 48 48 48 48 49 50 52 P4 52 56 56 5 60 64 64 64 64 64 64 64 65 66 68 P5 5 68 72 72 5 76 80 80 80 80 80 81 82 84 P6 5 84 88 88 92 96 96 96 96 96 96 97 98 100 P7 5 100 104 104 112 112 112 112 112 113 114 116 P8 5 116 120 120 5 5 124 127 126 32 Memory amp Scaling 1 Memory requirements In this algorithm all of the block lines in one pro cessor have to be read before the start of the back substitution A similar number of block lines must be stored for later back substitution This implies that the total memory requirement per processor is OH x I x NNp gtlt 8GB where Np is the number of processors 2 Scaling performance At the kth reduction cycle 2N C blocks are inverted At the kth backsubstitution cycle 2k blocks back substituted This implies that the longest stage of the algorithm are the first reduction cycle and the last backsub stitution cycle Though processors are idle in the later part of the reductionfirst part of the backsubstitution that stage lasts a very small percentage of the time it takes for the first cycle gt expect the algorithm to scale well with minimal inter communication provided the number of processors is much smaller than the total number of block lines 33 Three catches Catch 1 The total number of PP operations can be shown to be roughly twice that of serial Gaussian elimination gt Maximum speedup for cyclic reduction compared with Gaussian Elimination is p2 for p processors Catch 2 Optimal performance is only achieved for a set num ber of block lines and number of processors gt This limits the choice of the number of mesh points for two point BVP Catch 3 the biggy It is not possible to pivot the whole matrix properly using cyclic reduction only individual blocks can be pivoted gt Cyclic reduction can be unstable gt This iS a notorious problem for tWODOint bound ary value problems containing diffusion opera tors 34 The sweeping reduction To go beyond the intrinsic stability problem of cyclic reduction we try to combine the idea of the serial and the parallel algorithm gt Perform a sweeping pivoted Gaussian Elimina tion within each individual processor 35 Step 1 By performing a sweeping Gaussian Elimina tion from the top block line in each processor down to the astthe block tridiagonal matrix is first transformed into the following form example of 12 block lines split between three processors IS DIS 0 OIX SIS SOIS S OIS X OIX SIS SOIS S OIS X 0 36 Step 2 At the end of the sweeping step each process sends its very last block row to the following process Thus the blocks are redistributed as quotIS 015 015 OIX SIS 5015 S 015 X OIX SIS 5015 S 015 X 01 37 Step 3 Each process can then continue eliminating undesirable variables from its top block row by Gaussian elimination with the successive block rows below until the following form is achieved quotIS 015 015 OXOOOX SIS 5015 S 015 X OXOOOX SIS 5015 S 015 X 01 At this point the top rows in each process can be com bined into a much reduced block tri diagonal system Solution from here on proceeds with a cyclic reduction of the remaining blocks across processes followed by a back substitution step 38 What have we gained Stability through pivoting Speedup there is only a single step of inter processor communication before the cross processor reduc tion starts instead of k stages in the original algo rithm and similarly after the cross processor back substitution has ended Versatility the number of meshpoints is not con strained anymore and it is easy to load balance the number of block lines per processor Did we loose anything NO It s easy to show that the total number of PP Ops is exactly the same as cyclic reduction YAY 39 Performance analysis The following plot shows the time taken for 1 itera tion of the NRK algorithm working on 900 simultane ous coupled nonlinear ODEs for 2000 meshpoints dia monds and for 1000 meshpoints stars Minutes 3 i0 WOO iOOO Number of processors 4O
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'