Parallel Computations COSC 6374
COSC 6374 Parallel Computation
Performance Oriented Software Design
Edgar Gabriel, Spring 2009

Amdahl's Law
o Describes the performance gain obtained by enhancing one part of the overall system (code, computer):

  Speedup = Performance of entire task using the enhancement / Performance of entire task not using the enhancement

  or, equivalently,

  Speedup = Execution time of the task not using the enhancement / Execution time of the task using the enhancement

Amdahl's Law II
o Amdahl's Law depends on two factors:
  - the fraction of the execution time affected by the enhancement
  - the improvement gained by the enhancement for this fraction
o Thus:

  Execution time_new = Execution time_old * ( (1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced )

  Speedup_overall = Execution time_old / Execution time_new
                  = 1 / ( (1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced )

Amdahl's Law III / IV
[Figures: total speedup as a function of the enhanced speedup for Fraction_enhanced = 20%, 40%, 60%, 80%; and total speedup according to Amdahl's Law as a function of Fraction_enhanced from 0 to 1.]

Three big questions
o Which are the most time-consuming code sections?
o How efficient are those routines?
o What are the reasons for inefficiency?

Which are the most time-consuming code sections?
o Need to profile the application.
o Standard tool in the UNIX environment: gprof.
o valgrind: a collection of various tools to analyze an application at runtime:
  - memcheck: memory debugging
  - cachegrind: simulates the application's cache usage
  - callgrind: provides a trace of the function calls
o The tools produce an output file, e.g. cachegrind.out.<procid>; kcachegrind is a visualization tool for the valgrind output files.

How to determine the sources of overhead
o Get detailed data for different sections of the routine; get an estimate of the number of operations executed within these sections.
o Scaling issues: for
each process we might end up with
  - a large number of time stamps (e.g. k per process)
  - a large number of measurements per time stamp (e.g. m per time stamp): execution time of MPI functions, various PAPI counters, user-defined values
o For n processes this leads to n * k * m data values for the performance analysis.

Data reduction for performance analysis
o Data reduction for the number of processes analyzed: find processors exposing the same behavior and focus the performance analysis on a single processor of each group.
o Data reduction per process: eliminate the measurements exposing the same information.
o Data reduction in time: find a small, typical cycle in the application and ignore the rest.
o Automatic statistical methods are inevitable, e.g. cluster analysis.

PAPI: hardware performance counters
o Modern processors provide on-chip counters which give information about the performance:
  - limited number of counters
  - the number of simultaneous counters and the set of supported hardware counters depend on the processor
o Available on most modern operating systems:
  - Linux: requires recompiling the kernel
  - Windows: works right away, however not very accurate
o Requires modification of your source code to insert the PAPI calls.

General counters
o PAPI_FP_OPS: floating point operations
o PAPI_TOT_CYC: total cycles
o PAPI_HW_INT: hardware interrupts
o PAPI_TOT_INS: total instructions completed

FP instruction counters
o PAPI_FP_INS: floating point instructions
o PAPI_FDV_INS: floating point divide instructions
o PAPI_FNV_INS: floating point inverse instructions

Cache counters
o PAPI_L{1,2,3}_{D,I,T}C{M,H,A,R,W}: cache level 1/2/3, Data/Instruction/Total cache, Misses/Hits/Accesses/Reads/Writes
o PAPI_L{1,2,3}_{LD,ST}M: cache level 1/2/3 load/store
misses
o prefetch cache misses

PAPI: manual example

  /* query and set up the right events to monitor */
  int num_events = 2;
  int events[2] = {PAPI_TOT_CYC, PAPI_FP_OPS};
  long long values[2];

  PAPI_start_counters (events, num_events);
  /* execute the test code */
  do_flops (NUM_FLOPS);
  PAPI_read_counters (values, num_events);

o Example: a low FP rate can be due to FP exceptions.

Consequences for software design
o Two-dimensional allocation in C, the usual code sequence:

  matrix = (double **) malloc (dim1 * sizeof(double *));
  for (i = 0; i < dim1; i++) {
      matrix[i] = (double *) malloc (dim2 * sizeof(double));
  }

  - the memory allocated might not be contiguous
  - lower performance

Consequences for software design (cont.)
o Alternative allocation technique:

  double **matrix;
  double *data;

  data   = (double *)  malloc (dim1 * dim2 * sizeof(double));
  matrix = (double **) malloc (dim1 * sizeof(double *));
  for (i = 0; i < dim1; i++) {
      matrix[i] = &data[i * dim2];
  }

Consequences for software design (cont.)
o The inner loop should go over the outmost (rightmost) index of multi-dimensional arrays in C/C++:

  /* correct version */
  for (i = 0; i < dim1; i++) {
      for (j = 0; j < dim2; j++) {
          matrix[i][j] = ...;
      }
  }

  /* wrong version */
  for (j = 0; j < dim2; j++) {
      for (i = 0; i < dim1; i++) {
          matrix[i][j] = ...;
      }
  }

o What shall you do if one variable requires access along the rows and one variable along the columns?

  for (i = 0; i < dim; i++) {
      for (j = 0; j < dim; j++) {
          for (k = 0; k < dim; k++) {
              c[i][j] += a[i][k] * b[k][j];
          }
      }
  }

o Blocked code versions optimize the cache usage:

  for (i = 0; i < dim; i += iblock) {
      for (j = 0; j < dim; j += jblock) {
          for (k = 0; k < dim; k += kblock) {
              for (ii = i; ii < i + iblock; ii++) {
                  for (jj = j; jj < j + jblock; jj++) {
                      for (kk = k; kk < k + kblock; kk++) {
                          c[ii][jj] += a[ii][kk] * b[kk][jj];
                      }
                  }
              }
          }
      }
  }

Comparison operators
o Comparing integer values is orders of magnitude faster than comparing strings:
  - map options to integers and use if or switch statements
  - avoid strcmp or similar functions wherever possible
o Avoid unnecessary memory copy operations:
  - minimizing the memory footprint improves the cache behavior
  - passing pointers to a subroutine instead of making a copy of the data array might, however, have a negative impact on loops
within the subroutine, since the compiler does not know the boundaries of the array/loop.

Object structures
o Rule of thumb: it is better to have an object containing a vector of data than a vector of objects with one data point each:
  - fewer indirections
  - better cache usage

COSC 6374 Parallel Computation
Parallel I/O basics
Edgar Gabriel, Spring 2009

Concept of a cluster
o Compute nodes connected by a message passing network and an administrative network.

I/O problem
o Every node has its own local disk.
o Most applications require data and executable to be locally available, e.g. an MPI application using multiple nodes requires the executable to be available on all nodes in the same directory using the same name.
o Multiple processes need to access the same file, potentially different portions of it (efficiency).

Basic characteristics of storage devices
o Capacity: amount of data a device can store.
o Transfer rate or bandwidth: amount of data a device can read/write in a certain amount of time.
o Access time or latency: delay before the first byte is moved.

  Prefix        Abbreviation   Base ten   Base two
  kilo / kibi   K / Ki         10^3       2^10 (= 1024)
  Mega / mebi   M / Mi         10^6       2^20
  Giga / gibi   G / Gi         10^9       2^30
  Tera / tebi   T / Ti         10^12      2^40
  Peta / pebi   P / Pi         10^15      2^50

UNIX file access model
o A file is a sequence of bytes.
o When a program opens a file, the file system establishes a file pointer. The file pointer is an integer indicating the position in the file where the next byte will be written/read.
o Disk drives read and write data in fixed-sized units (disk sectors).
o File systems allocate space in blocks, which is a fixed number of contiguous disk sectors.
o In UNIX-based file systems, the blocks that hold data are listed in an inode. An inode contains the information needed to find all the blocks that
belong to a file.
o If a file is too large and an inode cannot hold the whole list of blocks, intermediate nodes (indirect blocks) are introduced.

Write operations
o Write: the file system copies bytes from the user buffer into a system buffer. If the buffer is filled up, the system sends the data to disk.
o System buffering allows the file system to collect full blocks of data before sending them to disk.
o The file system can send several blocks at once to the disk (delayed write / write behind):
  - data is not really saved in the case of a system crash
o For very large write operations, the additional copy from the user to the system buffer could/should be avoided.

Read operations
o Read: the file system determines which blocks contain the requested data, reads those blocks from disk into a system buffer, and copies the data from the system buffer into user memory.
o System buffering: the file system always reads a full block (file caching).
o If the application reads data sequentially, prefetching (read ahead) can improve performance.
o Prefetching is harmful to the performance if the application has a random access pattern.

Dealing with disk latency
o Caching and buffering:
  - avoids repeated access to the same block
  - allows a file system to smooth out I/O behavior
  - helps to hide the latency of the hard drives
  - lowers the performance of I/O operations for irregular access
o Non-blocking I/O gives users control over prefetching and delayed writing:
  - initiate read/write operations as soon as possible
  - wait for the finishing of the read/write operations just when absolutely necessary

Improving disk bandwidth: disk striping
o Utilize multiple hard drives: split a file into constant chunks and distribute them across all disks.
o Three relevant parameters:
  - stripe factor: number of disks
  - stripe depth: size of each block
  - which disk contains the first block of the file
[Figure: blocks 1 .. n of a file distributed round-robin over disks 1-4.]
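The three striping parameters above fully determine where each byte of a file lives. As a minimal sketch (not part of any real file system API; the struct and function names are made up for illustration), the round-robin placement can be computed like this:

```c
#include <assert.h>

/* Hypothetical illustration of round-robin disk striping:
   given a byte offset, the stripe depth (block size in bytes),
   the stripe factor (number of disks), and the disk holding the
   file's first block, compute which disk and which block on that
   disk hold the byte. */
typedef struct {
    int  disk;            /* disk index in 0 .. factor-1        */
    long block_on_disk;   /* block index within that disk       */
} stripe_loc;

stripe_loc locate(long offset, long depth, int factor, int first_disk)
{
    stripe_loc loc;
    long block = offset / depth;                  /* global block index */
    loc.disk          = (int)((first_disk + block) % factor);
    loc.block_on_disk = block / factor;           /* row on that disk   */
    return loc;
}
```

With a stripe depth of 64 KiB and a stripe factor of 4 starting on disk 0, byte 0 lands on disk 0, byte 65536 on disk 1, and byte 262144 (global block 4) wraps around to disk 0, block 1.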
Disk striping (cont.)
o Ideal assumption: b(N, p) = p * b(N, 1), with N the number of bytes to be written, b the bandwidth, and p the number of disks.
o Realistically, b(N, p) < p * b(N, 1), since N is often not large enough to fully utilize p hard drives, plus networking overhead.

Two levels of disk striping I
o Using a RAID controller: hardware, typically a single box with a number of disks.
o RAID: Redundant Arrays of Independent Disks.
o Goals: improve reliability and performance of an I/O system.
o Several RAID levels are defined.

RAID 0: disk striping without redundant storage ("JBOD: just a bunch of disks")
o No fault tolerance.
o Good for high transfer rates, i.e. read/write bandwidth of a single large file.
o Good for high request rates, i.e. access time to many small files.

RAID 1: mirroring
o All data is replicated on two or more disks.
o Does not improve the write performance, and just moderately the read performance.

RAID level 2
o RAID 2: Hamming codes. Each group of data bits has several check bits appended to it, forming Hamming code words. Each bit of a Hamming code word is stored on a separate disk.
o Very high additional costs, e.g. up to 50% additional capacity required.
o Hardly used today, since parity based codes are faster and easier.

RAID level 3
o Parity based protection: based on the exclusive OR (XOR), which is reversible.
o Example:

      01101010  data byte 1
  XOR 11001001  data byte 2
  =   10100011  parity byte

  Recovery:

      11001001  data byte 2
  XOR 10100011  parity byte
  =   01101010  recovered data byte 1

RAID level 3 (cont.)
o Data is divided evenly into N subblocks (N = number of disks, typically 4 or 5).
o Computing the parity bytes generates an additional subblock.
o Subblocks are written in parallel on N+1 disks.
o For best performance, data should be of size N * sector size.
o Problems with RAID level 3:
  - all disks are always participating in every operation -> contention for applications
with high access rates
  - if the data size is less than N * sector size, the system has to read old subblocks to calculate the parity bytes
o RAID level 3 is good for high transfer rates.

RAID level 4
o Parity bytes for N disks are calculated and stored on a separate parity disk.
o Files are not necessarily distributed over N disks.
o For read operations:
  - determine the disks holding the requested blocks
  - read the data from these disks
o For write operations:
  - retrieve the old data from the sector being overwritten
  - retrieve the parity block from the parity disk
  - extract the old data from the parity block using XOR operations
  - add the new data to the parity block using XOR
  - store the new data
  - store the new parity block
o Bottleneck: the parity disk is involved in every operation.

RAID level 5
o Same as RAID 4, but the parity blocks are distributed on different disks.

RAID level 6
o Tolerates the loss of more than one disk.
o A collection of several techniques:
  - e.g. P+Q parity: store parity bytes using two different algorithms and store the two parity blocks on different disks
  - e.g. two-dimensional parity with separate parity disks

RAID level 10
o Is RAID level 1 + RAID level 0 (RAID 1: mirroring, RAID 0: striping).
o Also available: RAID 53 = RAID 0 + RAID 3.

Comparing RAID levels

  RAID level  Protection     Space usage          Good at          Poor at
  0           None           N                    Performance      Data protection
  1           Mirroring      2N                   Data protection  Space efficiency
  2           Hamming codes  ~1.5N                Transfer rate    Request rate
  3           Parity         N+1                  Transfer rate    Request rate
  4           Parity         N+1                  Read req. rate   Write performance
  5           Parity         N+1                  Request rate     Transfer rate
  6           P+Q or 2D      N+2 or M*N+M+N       Data protection  Write performance
  10          Mirroring      2N                   Performance      Space efficiency
  53          Parity         N + striping factor  Performance      Space efficiency

Two levels of disk striping II
o Using a parallel file system: exposes the
individual units capable of handling data, often called storage servers or I/O nodes.
  - Each storage server might use multiple hard drives underneath the hood to increase its read/write bandwidth.
  - A metadata server keeps track of which parts of a file are on which storage server.
  - A single disk failure is less of a problem if each server uses, underneath the hood, a RAID 5 storage system.

Parallel file systems, conceptually
[Figure: compute nodes connected to a metadata server and several storage servers (0-3), each with its own disks.]

File access on a parallel file system
[Figure: a compute node first contacts the metadata server to obtain the layout of the file, then exchanges the data directly with the storage servers.]

Disk striping: requirements
o To improve the performance of I/O operations using disk striping:
  - multiple physical disks
  - a balance of network bandwidth and I/O bandwidth
o Problem of simple disk striping: for a single file, the number of disks which can be used is limited.

Prominent parallel file systems
o PVFS2
o Lustre
o GPFS
o NFSv4.1 (new standard, currently being ratified)

Distributed vs. parallel file systems
o Distributed file systems:
  - give access to a collection of files on a remote machine
  - typically a client/server based approach
  - fully transparent to the user
  - example: Network File System (NFS)

Parallel vs. distributed file systems
o Concurrent access to the same file from several processes is considered to be an unlikely event in distributed file systems.
o Distributed file systems assume different numbers of processors than parallel file systems.
o Distributed file systems have different security requirements than parallel file systems.

COSC 6374 Parallel Computation
Introduction to MPI III: Process Grouping
Edgar Gabriel, Spring 2009

Terminology I
o An MPI_Group is the object describing the list of processes forming a logical entity.
o A group has a size (MPI_Group_size); every process in the group has a unique rank between 0 and size of group - 1 (MPI_Group_rank).
o A group is a local object and cannot
be used for any communication.

Terminology II
o An MPI_Comm (communicator) contains a group of processes; only the processes in that group can communicate with each other over the communicator.
o A communicator has an error handler attached to it, and can have a name.
o The list of participating processes can be described by a single group.

Predefined communicators
o MPI_COMM_WORLD: contains all processes started together; does not need to be created.
o MPI_COMM_SELF: contains only the local process.
o Predefined communicators cannot be modified or freed.

Creating new communicators
o All communicator constructors are collective operations over the original communicator, as is freeing a communicator.
o A new communicator is based on a group of processes which have to be part of the original communicator:
  - creating subgroups of the original communicator
  - reordering of processes based on topology information
  - connecting two applications and merging their communicators

Splitting a communicator
int MPI_Comm_split (MPI_Comm comm, int color, int key, MPI_Comm *newcomm);
o Partitions comm into sub-communicators:
  - processes having the same color end up in the same new communicator
  - the order of the processes with the same color is according to the key value
  - if the key values are identical for the same color, the processes keep the same relative order as in the old communicator

Example for MPI_Comm_split I

  MPI_Comm newcomm;
  int color, rank, size;

  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  color = rank % 2;
  MPI_Comm_split (MPI_COMM_WORLD, color, rank, &newcomm);
  MPI_Comm_size (newcomm, &size);
  MPI_Comm_rank (newcomm, &rank);

o Odd/even splitting of the processes.
o A process can just be part of one of the generated communicators; it can not see the other communicators, and can not see how many communicators have been created.

Example for MPI_Comm_split II
[Figure: rank and size of the new communicators when MPI_COMM_WORLD is split into color 0 and color 1.]

Invalid color in MPI_Comm_split
o If a process shall not be part of any of the resulting communicators, set its color to MPI_UNDEFINED; newcomm will then be MPI_COMM_NULL.
o MPI_COMM_NULL is an invalid communicator: any function taking a communicator as an argument will return an error or abort if you pass MPI_COMM_NULL, i.e.
even MPI_Comm_size and MPI_Comm_rank, or MPI_Comm_free.

Modifying the group of processes
o Starting from the original communicator:
  1. extract the group of processes from the original communicator
  2. modify the group
  3. create a new communicator based on the modified group

Extracting the group of processes
int MPI_Comm_group (MPI_Comm comm, MPI_Group *group);
with
  - comm: original communicator
  - group: the group object describing the list of participating processes in comm

Modifying groups I
int MPI_Group_incl (MPI_Group group, int cnt, int ranks[], MPI_Group *newgroup);
int MPI_Group_excl (MPI_Group group, int cnt, int ranks[], MPI_Group *newgroup);
with
  - group: the original group object containing the list of participating processes
  - ranks: array of integers containing the ranks of the processes in group which shall be included in the new group (for MPI_Group_incl) / excluded from the original group (for MPI_Group_excl)
  - newgroup: resulting group

Modifying groups II
o For more group constructors, see also:
  - MPI_Group_range_incl
  - MPI_Group_range_excl
  - MPI_Group_difference
  - MPI_Group_intersection
  - MPI_Group_union

Creating a new communicator based on a group
int MPI_Comm_create (MPI_Comm comm, MPI_Group newgroup, MPI_Comm *newcomm);
with
  - comm: original communicator
  - newgroup: the group object describing the list of processes for the new communicator
  - newcomm: resulting communicator
o Note:
  - newcomm is always a subset of comm
  - you can generate one communicator at a time, in contrast to MPI_Comm_split
  - the list of arguments has to be identical on all processes of comm
  - newcomm will be MPI_COMM_NULL for processes which have been excluded / not included in newgroup

Example for MPI_Comm_create
o Generate a communicator which contains only the first four processes and the
last process of the original communicator.
[Figure: MPI_COMM_WORLD and the resulting newcomm of size 5.]

1st option: using MPI_Group_incl

  MPI_Comm newcomm;
  MPI_Group group, newgroup;
  int size, nrank, ranks[5], cnt;

  MPI_Comm_size (MPI_COMM_WORLD, &size);
  cnt = 5;
  ranks[0] = 0;
  ranks[1] = 1;
  ranks[2] = 2;
  ranks[3] = 3;
  ranks[4] = size - 1;

  MPI_Comm_group (MPI_COMM_WORLD, &group);
  MPI_Group_incl (group, cnt, ranks, &newgroup);
  MPI_Comm_create (MPI_COMM_WORLD, newgroup, &newcomm);
  if (newcomm != MPI_COMM_NULL) {
      MPI_Comm_rank (newcomm, &nrank);
      MPI_Comm_free (&newcomm);
  }
  MPI_Group_free (&newgroup);
  MPI_Group_free (&group);

2nd option: using MPI_Group_excl

  MPI_Comm newcomm;
  MPI_Group group, newgroup;
  int i, size, nrank, ranks[128], cnt;
  /* NOTE: assuming that size > 5, ranks is large enough, etc. */

  MPI_Comm_size (MPI_COMM_WORLD, &size);
  cnt = 0;
  for (i = 4; i < size - 1; i++) {
      ranks[cnt++] = i;
  }

  MPI_Comm_group (MPI_COMM_WORLD, &group);
  MPI_Group_excl (group, cnt, ranks, &newgroup);
  MPI_Comm_create (MPI_COMM_WORLD, newgroup, &newcomm);
  if (newcomm != MPI_COMM_NULL) {
      MPI_Comm_rank (newcomm, &nrank);
      MPI_Comm_free (&newcomm);
  }
  MPI_Group_free (&newgroup);
  MPI_Group_free (&group);

Freeing groups and communicators
int MPI_Comm_free (MPI_Comm *comm);
int MPI_Group_free (MPI_Group *group);
o Return MPI_COMM_NULL respectively MPI_GROUP_NULL in the argument.
o MPI_Comm_free is a collective function; MPI_Group_free is a local function.

Topology information in communicators
o Some application scenarios require not only to know who is part of a communicator, but also how the processes are organized (topology information):
  - 1D/2D/3D cartesian topology
  - what are the extents of each dimension?
  - who are my left/right (upper/lower) neighbors? etc.
o Yes, it is easy to do that yourself in the application, e.g. with nx processes in x-direction:
  - position in x-direction: coord_x = rank % nx
  - position in y-direction: coord_y = floor (rank / nx)

MPI_Cart_create
int MPI_Cart_create (MPI_Comm comm, int ndims, int *dims, int *periods, int reorder, MPI_Comm *newcomm);
o Creates a new communicator having a cartesian topology with ndims
dimensions:
  - each dimension having dims[i] processes (i = 0 ... ndims-1)
  - periods[i] indicates whether the boundaries for the i-th dimension are wrapped around or not
  - reorder: flag allowing the MPI library to rearrange the processes
o Note: if dims[0] * dims[1] * ... * dims[ndims-1] < size of comm, some processes will not be part of newcomm.

Example for using MPI_Cart_create
o Consider an application using 12 processes, arranging the processes in a 2D cartesian topology:

  int ndims = 2;
  int dims[2] = {4, 3};
  int periods[2] = {0, 0};   /* no periodic boundaries */
  int reorder = 0;           /* no reordering of processes */
  MPI_Comm newcomm;

  MPI_Cart_create (MPI_COMM_WORLD, ndims, dims, periods, reorder, &newcomm);

Who are my neighbors?
o Easy to determine by hand for low dimensions, e.g. with npx the number of processes in x-direction:
  - nleft  = rank - 1
  - nright = rank + 1
  - nup    = rank - npx
  - ndown  = rank + npx
o More complex for higher dimensional topologies; special care is needed at the boundaries.

Who are my neighbors? (cont.)
int MPI_Cart_shift (MPI_Comm comm, int direction, int distance, int *leftn, int *rightn);
with
  - direction: dimension for which you would like to determine the ranks of the neighboring processes
  - distance: distance between the current process and the neighbors that you are interested in
  - leftn: rank of the left neighbor in comm
  - rightn: rank of the right neighbor in comm
o If a process does not have a left/right neighbor (e.g. at the boundary), leftn and/or rightn will contain MPI_PROC_NULL.

Example for using MPI_Cart_shift
o Continuing the example from MPI_Cart_create:

  int ndims = 2;
  int dims[2] = {4, 3};
  int periods[2] = {0, 0};   /* no periodic boundaries */
  int reorder = 0;           /* no reordering of processes */
  MPI_Comm newcomm;
  int nleft, nright, nup, nlow;
  int distance = 1;          /* we are interested in the direct
                                neighbors of each process */

  MPI_Cart_create (MPI_COMM_WORLD, ndims, dims, periods, reorder, &newcomm);
  MPI_Cart_shift (newcomm, 0, distance, &nleft, &nright);
  MPI_Cart_shift (newcomm, 1,
                  distance, &nup, &nlow);

o Now you can use nleft, nright, etc. for communication:

  MPI_Send (buf, cnt, dt, nleft, 0, newcomm);

MPI_Topo_test
int MPI_Topo_test (MPI_Comm comm, int *topo_type);
o How do I know whether a communicator also has topology information attached to it? topo_type is one of the following constants:
  - MPI_CART: cartesian topology
  - MPI_GRAPH: general graph topology
  - MPI_UNDEFINED: no topology (the communicator has not been created with MPI_Cart_create or other similar functions)

MPI_Dims_create
int MPI_Dims_create (int np, int ndims, int *dims);
o How do I distribute np processes best in ndims dimensions?
  - np: number of processes for which to calculate the distribution
  - ndims: number of cartesian dimensions
  - dims: array containing the extent of each dimension after the call
o Dimensions are set to be as close to each other as possible.
o You can force a certain extent for a dimension by setting its value; only dimensions which are initialized to 0 will be calculated.

Final example
o Extend the previous example to work for an arbitrary number of processes:

  int ndims = 2;
  int dims[2] = {0, 0};      /* calculate both dimensions */
  int periods[2] = {0, 0};   /* no periodic boundaries */
  int reorder = 0;           /* no reordering of processes */
  int size, nleft, nright, nup, nlow;
  int distance = 1;          /* we are interested in the direct
                                neighbors of each process */
  MPI_Comm newcomm;

  MPI_Comm_size (MPI_COMM_WORLD, &size);
  MPI_Dims_create (size, ndims, dims);
  MPI_Cart_create (MPI_COMM_WORLD, ndims, dims, periods, reorder, &newcomm);
  MPI_Cart_shift (newcomm, 0, distance, &nleft, &nright);
  MPI_Cart_shift (newcomm, 1, distance, &nup, &nlow);

What else is there?
o Creating a communicator where the processes are ordered logically as described by a directed graph, using MPI_Graph_create.
o Creating a communicator consisting of two process groups, also called an inter-communicator:
  - local and remote group have, however, separate ranking schemes: you have two processes having the rank 0, one in the local group and one in the remote group
o Dynamically
adding processes: MPI_Comm_spawn.
o Connecting two independent applications: MPI_Comm_connect / MPI_Comm_accept.
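The neighbor arithmetic from the topology slides can be modeled without MPI at all. The following plain-C sketch is illustrative only: the function names and the PROC_NULL sentinel are made up, standing in for what MPI_Cart_shift would return on a non-periodic 2D grid with row-major ranks, following the coord_x = rank % nx, coord_y = rank / nx convention used in the notes.

```c
#define PROC_NULL (-1)   /* stands in for MPI_PROC_NULL */

/* Plain-C model of the neighbor computation for a non-periodic
   2D grid with nx columns and ny rows, ranks numbered row-major:
   x = rank % nx, y = rank / nx. A boundary process gets PROC_NULL. */
int left (int rank, int nx, int ny) { return (rank % nx == 0)      ? PROC_NULL : rank - 1;  }
int right(int rank, int nx, int ny) { return (rank % nx == nx - 1) ? PROC_NULL : rank + 1;  }
int up   (int rank, int nx, int ny) { return (rank / nx == 0)      ? PROC_NULL : rank - nx; }
int down (int rank, int nx, int ny) { return (rank / nx == ny - 1) ? PROC_NULL : rank + nx; }
```

For the 4x3 grid of the earlier example (nx = 4, ny = 3), rank 5 gets the neighbors 4 (left), 6 (right), 1 (up), and 9 (down), while the corner rank 0 gets PROC_NULL to its left and above, just as MPI_Cart_shift would return MPI_PROC_NULL there.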