COSC 4397 Parallel Computation: Graph Algorithms II
Edgar Gabriel, University of Houston, Spring 2006

Minimum Spanning Tree

- Spanning tree: an undirected graph G' that is a subgraph of G and contains all vertices of G.
- Minimum spanning tree: a spanning tree with minimum total weight.

Prim's Algorithm I

Given G = (V, E, w) and an arbitrary start vertex r:

    V_T = {r}
    d[r] = 0
    for all v in (V - V_T) do d[v] = w[r,v]
    while V_T != V do
        find vertex u such that d[u] = min { d[v] | v in (V - V_T) }
        V_T = V_T + {u}
        for all v in (V - V_T) do
            d[v] = min { d[v], w[u,v] }
    end while

Prim's Algorithm II

- V_T: vector containing the vertices already added to the spanning tree.
- V - V_T: set of vertices which have not yet been added to the spanning tree.
- d: distance vector, e.g. d[i] contains the minimum weight of an edge from vertex i to any vertex in the spanning tree.

Example I

Weighted adjacency matrix of the example graph (oo = no edge):

    A = [  0   1   3  oo  oo   3 ]
        [  1   0   5   1  oo  oo ]
        [  3   5   0   2   1  oo ]
        [ oo   1   2   0   4  oo ]
        [ oo  oo   1   4   0   5 ]
        [  3  oo  oo  oo   5   0 ]

Start vertex r = 1. In V_T an entry of -1 marks a slot that is not yet filled; in d an entry of -1 is undefined, because that vertex is already in the spanning tree and is skipped in the following searches.

    V_T = [  1, -1, -1, -1, -1, -1 ]
    d   = [  1, -1,  5,  1, oo, oo ]

Example II

Vertex 3 has minimal distance and is added; d is updated with row 3 of A. Then vertex 0 is added:

    V_T = [  1,  3, -1, -1, -1, -1 ]
    d   = [  1, -1,  2, -1,  4, oo ]

    V_T = [  1,  3,  0, -1, -1, -1 ]
    d   = [ -1, -1,  2, -1,  4,  3 ]

Example III

Next vertex 2 is added, then vertex 4:

    V_T = [  1,  3,  0,  2, -1, -1 ]
    d   = [ -1, -1, -1, -1,  1,  3 ]

    V_T = [  1,  3,  0,  2,  4, -1 ]
    d   = [ -1, -1, -1, -1, -1,  3 ]

Example IV

Finally vertex 5 is added; all vertices are in the tree and the algorithm terminates:

    V_T = [  1,  3,  0,  2,  4,  5 ]

Sequential implementation I

Given the adjacency matrix A[N][N] and start vertex 1:

    u = 1;
    vt[vt_count++] = u;
    for (i=0; i<N; i++)
        d[i] = A[u][i];

    for (i=1; i<N; i++) {
        u = find_u (vt, vt_count, d);
        vt[vt_count++] = u;
        update_d (d, vt, vt_count, N, u, A);
    }

Sequential implementation II

    int find_u (int vt[N], int vt_count, int d[N])
    {
        int i, j, found;
        int current_min = MY_INF, current_min_loc = -1;

        for (i=0; i<N; i++) {
            /* skip vertices which are already in the spanning tree */
            for (found=0, j=0; j<vt_count; j++)
                if (i == vt[j]) found = 1;
            if (found) continue;

            if (d[i] < current_min) {
                current_min_loc = i;
                current_min = d[i];
            }
        }
        return current_min_loc;
    }

Sequential implementation III

    void update_d (int d[N], int vt[N], int vt_count, int N, int u, int a[N][N])
    {
        int i, j, found;

        for (i=0; i<N; i++) {
            /* skip vertices which are already in the spanning tree */
            for (found=0, j=0; j<vt_count; j++)
                if (i == vt[j]) found = 1;
            if (found) continue;

            d[i] = min (d[i], a[u][i]);
        }
        return;
    }

Parallel Algorithm 1 I

- Each process owns one column of the adjacency matrix.
- Each process owns the according part of the distance vector d.
- V_T is replicated on each process.
- find_u and update_d need to be modified.

Parallel Algorithm 1 II

    int find_u (int vt[N], int vt_count, int d)
    {
        int i, rank, min[2], gmin[2];

        MPI_Comm_rank (MPI_COMM_WORLD, &rank);
        min[0] = d;
        min[1] = rank;
        /* a process whose vertex is already in the tree must not win */
        for (i=0; i<vt_count; i++)
            if (vt[i] == rank) {
                min[0] = MY_INF;
                break;
            }
        MPI_Allreduce (min, gmin, 1, MPI_2INT, MPI_MINLOC, MPI_COMM_WORLD);
        return gmin[1];
    }

MPI_MINLOC and MPI_MAXLOC I

- Operators for reduction operations returning the minimum/maximum value and the rank of the process owning the minimum/maximum value.
- Special MPI data types have to be used:
  - MPI_2INT: array of two integers; element zero contains the min/max value, element one contains the rank of the process owning the minimal/maximal value.
  - MPI_FLOAT_INT: structure consisting of a float and an int:

        struct { float val; int rank; };

MPI_MINLOC and MPI_MAXLOC II

- Similarly: MPI_DOUBLE_INT, MPI_SHORT_INT, MPI_LONG_INT.
- Note: the rank in the second element has to be set by each process for the input values.
- MPI guarantees that each process will have the same rank in the result vector for the location of the minimum/maximum, even if several processes have the same minimal/maximal value (a short usage sketch follows below).
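As a minimal sketch of the MPI_MINLOC mechanism described above (not from the slides; the value 100 - rank is just an assumed stand-in for a local distance):

    #include <stdio.h>
    #include <mpi.h>

    int main (int argc, char **argv)
    {
        int rank, in[2], out[2];

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);

        in[0] = 100 - rank;  /* element 0: the local value (a dummy distance) */
        in[1] = rank;        /* element 1: the own rank, set by each process  */

        MPI_Allreduce (in, out, 1, MPI_2INT, MPI_MINLOC, MPI_COMM_WORLD);

        /* every process now holds the same (value, rank) pair */
        printf ("rank %d: global min = %d at rank %d\n", rank, out[0], out[1]);

        MPI_Finalize ();
        return 0;
    }

Run with four processes, every rank prints "global min = 97 at rank 3".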
Parallel Algorithm 1 III

    void update_d (int *d, int vt[N], int vt_count, int u, int a[N])
    {
        int i, rank;

        MPI_Comm_rank (MPI_COMM_WORLD, &rank);
        /* nothing to do if the own vertex is already in the tree */
        for (i=0; i<vt_count; i++)
            if (vt[i] == rank) return;

        /* a[u] is the element of row u in the column owned by this process */
        *d = min (*d, a[u]);
        return;
    }

Parallel Algorithm 2 I

- Each process owns a certain number of columns of the adjacency matrix.
- Each process owns the according elements of the distance vector.
- V_T is replicated on each process.

Parallel Algorithm 2 II

The modified find_u, with nx the number of columns per process and cx the index of the calling process' column block, i.e. the process owns the global vertices cx*nx ... cx*nx+nx-1:

    for (i=0; i<nx; i++) {
        /* skip vertices which are already in the spanning tree */
        for (found=0, j=0; j<vt_count; j++)
            if ((cx*nx + i) == vt[j]) found = 1;
        if (found) continue;

        if (d[i] < current_min) {
            current_min_loc = cx*nx + i;
            current_min = d[i];
        }
    }
    min[0] = current_min;
    min[1] = rank;
    MPI_Allreduce (min, gmin, 1, MPI_2INT, MPI_MINLOC, MPI_COMM_WORLD);
    /* only the winning process knows the vertex number: broadcast it */
    MPI_Bcast (&current_min_loc, 1, MPI_INT, gmin[1], MPI_COMM_WORLD);
    return current_min_loc;

Parallel Algorithm 2 III

    void update_d (int d[nx], int vt[N], int vt_count, int u, int a[N][nx])
    {
        int i, j, found;

        for (i=0; i<nx; i++) {
            /* skip vertices which are already in the spanning tree */
            for (found=0, j=0; j<vt_count; j++)
                if ((cx*nx + i) == vt[j]) found = 1;
            if (found) continue;

            d[i] = my_min (d[i], a[u][i]);
        }
        return;
    }

Shared Memory Parallel Programming: Worksharing in OpenMP

OpenMP Directives

- We now look at OpenMP directives to create parallel code.
- Main tasks:
  - create and terminate threads / parallel regions
  - share out work in parallel regions to threads
  - synchronize threads

Some Rules for Directives

- They apply to the next statement or set of statements.
- You may need to mark the end of the set of statements (a structured block of statements).
- Examples:

    !$OMP <directive>
        ...
    !$OMP END <directive>

    #pragma omp <directive>
    {
        statement1;
        statement2;
        statement3;
    }

OpenMP Parallel Region

    #pragma omp parallel        (C)
    !$OMP PARALLEL              (Fortran)

- The master creates a team of threads at entry.
- Each thread executes the same code.
- Each thread terminates at the end.
- Very similar to a number of create/joins with the same function in Pthreads.

Parallel Regions

    sequential region        executed by master thread
    !$OMP PARALLEL
    parallel region          executed by team of threads
    !$OMP END PARALLEL
    sequential region        executed by master thread

Getting Threads to do Different Things

- Through explicit thread identification, as in Pthreads; the threads in a team are numbered consecutively.
- Through worksharing directives.

Thread Identification

    int omp_get_thread_num();    /* gets the thread id               */
    int omp_get_num_threads();   /* gets the total number of threads */

- The id of the master thread is 0.

Example

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)
            master();
        else
            slave();
    }

Example

    !$OMP PARALLEL PRIVATE(iam, np, ipoints)
        iam = omp_get_thread_num()
        np = omp_get_num_threads()
        ipoints = npoints / np
        call do_my_work (x, iam, ipoints)
    !$OMP END PARALLEL

Work Sharing Directives

- Share work within a parallel region.
- The scope of a parallel region is dynamic, so it may extend into a different procedure.
- The main ones are:
  - OpenMP for and DO
  - OpenMP sections and section
  - OpenMP workshare

OpenMP For

    #pragma omp parallel
    {
        #pragma omp for
        for (...) { ... }
    }

- Each thread executes a subset of the iterations.
- All threads wait at the end of the OpenMP for.

Multiple Work Sharing Directives

- May occur within a single parallel region:

    #pragma omp parallel
    {
        #pragma omp for
        for (...) { ... }

        #pragma omp for
        for (...) { ... }
    }

- All threads wait at the end of each for.

The Nowait Qualifier

    #pragma omp parallel
    {
        #pragma omp for nowait
        for (...) { ... }

        #pragma omp for
        for (...) { ... }
    }

- Nowait: no synchronization at the end of the loop.
- Threads proceed immediately to the second for.

OpenMP DO

    !$OMP PARALLEL
    !$OMP DO
        do loop
    !$OMP END DO
    !$OMP END PARALLEL

- The Fortran parallel DO is the same as the parallel for in C.
- The end directive is optional, but a good idea.
- All threads wait at the end of the OpenMP DO. A complete C example combining the constructs of the last slides is sketched below.
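A minimal, compilable sketch (not from the slides; the array size and variable names are illustrative; compile e.g. with gcc -fopenmp) combining a parallel region, explicit thread identification, and a work-shared loop:

    #include <stdio.h>
    #include <omp.h>

    #define N 8

    int main (void)
    {
        int i, a[N];

        #pragma omp parallel            /* master creates the team here */
        {
            if (omp_get_thread_num() == 0)
                printf ("running with %d threads\n", omp_get_num_threads());

            #pragma omp for             /* iterations are shared among the team */
            for (i=0; i<N; i++)
                a[i] = i * i;
        }                               /* implicit barrier: the team joins here */

        for (i=0; i<N; i++)
            printf ("a[%d] = %d\n", i, a[i]);
        return 0;
    }

The loop variable of an omp for loop is made private to each thread automatically; variables declared outside the region, such as a, are shared.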
The Nowait Qualifier

    !$OMP PARALLEL
    !$OMP DO
        do loop
    !$OMP END DO NOWAIT
    !$OMP DO
        do loop
    !$OMP END DO
    !$OMP END PARALLEL

- Nowait: no synchronization at the end of the first loop.
- Threads proceed immediately to the second loop.

Fortran90 OpenMP Workshare

    REAL, DIMENSION(100,100) :: A, B, C, D
    !$OMP PARALLEL WORKSHARE
        A = B
        C = D
    !$OMP END PARALLEL WORKSHARE

- Threads share the work in array statements.
- The end directive is optional when the scope is clear, but still a good idea.
- All threads wait at the end of the parallel workshare unless NOWAIT is specified.

Parallel Sections Directive

    !$OMP PARALLEL SECTIONS
    !$OMP SECTION
        structured block of code
    !$OMP SECTION
        structured block of code
    !$OMP END PARALLEL SECTIONS

- Divides different sections of code among threads.
- Each section is executed once, by one thread.

Example

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            do_x_dir();

            #pragma omp section
            do_y_dir();

            #pragma omp section
            do_z_dir();
        }
    }

Useful Shorthand

    #pragma omp parallel
    {
        #pragma omp for
        for (...) { ... }
    }

is equivalent to

    #pragma omp parallel for
    for (...) { ... }

The same shorthand exists for parallel sections.

Note the Difference between

    #pragma omp parallel
    {
        #pragma omp for
        for (...) { ... }

        #pragma omp for
        for (...) { ... }
    }

and

    #pragma omp parallel for
    for (...) { ... }

    #pragma omp parallel for
    for (...) { ... }

The first version creates one team of threads for both loops; the second creates and terminates a team for each loop.

OpenMP Matrix Multiply

    #pragma omp parallel for
    for (i=0; i<n; i++)
        for (j=0; j<n; j++) {
            c[i][j] = 0.0;
            for (k=0; k<n; k++)
                c[i][j] += a[i][k] * b[k][j];
        }

OpenMP Jacobi

    for (some number of timesteps/iterations) {
        #pragma omp parallel for
        for (i=0; i<n; i++)
            for (j=0; j<n; j++)
                temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);

        #pragma omp parallel for
        for (i=0; i<n; i++)
            for (j=0; j<n; j++)
                grid[i][j] = temp[i][j];
    }

OpenMP Jacobi

    for (some number of timesteps/iterations) {
        #pragma omp parallel
        {
            #pragma omp for
            for (i=0; i<n; i++)
                for (j=0; j<n; j++)
                    temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                         grid[i][j-1] + grid[i][j+1]);

            #pragma omp for
            for (i=0; i<n; i++)
                for (j=0; j<n; j++)
                    grid[i][j] = temp[i][j];
        }
    }

Equivalent OpenMP Jacobi

    for (some number of timesteps/iterations) {
        #pragma omp parallel
        {
            update_grid1();
            update_grid2();
        }
    }

    void update_grid1 ()
    {
        #pragma omp for
        for (i=0; i<n; i++)
            for (j=0; j<n; j++)
                temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);
    }

The orphaned for directive in update_grid1 binds to the enclosing parallel region: the scope of a parallel region is dynamic.

Gaussian Elimination (1 of 2)

    for (i=0; i<n; i++)
        for (j=i+1; j<n; j++)
            for (k=i+1; k<n; k++)
                a[j][k] = a[j][k] - a[i][k] * a[j][i] / a[i][i];

- The j loop is the outermost parallelizable loop.

Gaussian Elimination (2 of 2)

    for (i=0; i<n; i++) {
        #pragma omp parallel for
        for (j=i+1; j<n; j++)
            for (k=i+1; k<n; k++)
                a[j][k] = a[j][k] - a[i][k] * a[j][i] / a[i][i];
    }

Single Directive

    !$OMP PARALLEL
        ...
    !$OMP SINGLE
        structured block of code
    !$OMP END SINGLE
        ...
    !$OMP END PARALLEL

- Executed by only one thread.
- The other threads wait for it, unless NOWAIT is specified.

Example

    for (i=0; i<100; i++)
        a[i] = f(i);

    x = g(a);

    for (i=0; i<100; i++)
        b[i] = x * h(a[i]);

- The first loop can be run in parallel.
- The middle statement is sequential.
- The second loop can be run in parallel.

Example

    #pragma omp parallel
    {
        #pragma omp for
        for (i=0; i<100; i++)
            a[i] = f(i);

        #pragma omp single
        x = g(a);

        #pragma omp for
        for (i=0; i<100; i++)
            b[i] = x * h(a[i]);
    }

Conditional Parallelism

- OpenMP constructs have an implementation cost.
- Parallelism is only useful if there is sufficient work to amortize the overhead.
- For a small amount of work, the overhead of parallelization may exceed the benefit.

Conditional Parallelism

    #pragma omp parallel if (expression)
    #pragma omp parallel for if (expression)

- The expression must be scalar logical.
- Execute in parallel if the expression is true, otherwise execute sequentially.
- Similar in OpenMP Fortran.

Conditional Parallelism Example

    for (i=0; i<n; i++) {
        #pragma omp parallel for if (n-i > 100)
        for (j=i+1; j<n; j++)
            for (k=i+1; k<n; k++)
                a[j][k] = a[j][k] - a[i][k] * a[j][i] / a[i][i];
    }

Scheduling of Iterations: Issue

- The user may assign specific loop iterations or individual parallel sections to a thread.
- The mapping is called a schedule.
- OpenMP allows for a variety of scheduling strategies (see the sketch below):
  - block (the default)
  - chunks
  - cyclic
  - block-cyclic
  - self-scheduling
  - guided self-scheduling
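As a small sketch of how a schedule maps iterations onto threads (not from the slides; schedule(static,2), i.e. block-cyclic with chunk size 2, is just an assumed example, and the schedule syntax itself is specified on the next slide):

    #include <stdio.h>
    #include <omp.h>

    #define N 16

    int main (void)
    {
        int i, owner[N];

        /* record for every iteration which thread executed it */
        #pragma omp parallel for schedule(static, 2)
        for (i=0; i<N; i++)
            owner[i] = omp_get_thread_num ();

        for (i=0; i<N; i++)
            printf ("iteration %2d -> thread %d\n", i, owner[i]);
        return 0;
    }

With four threads, thread 0 executes iterations 0-1 and 8-9, thread 1 executes 2-3 and 10-11, and so on; changing the schedule clause makes the other strategies visible.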
Scheduling of Iterations: Specification

In C:

    #pragma omp parallel for schedule(<sched>)

In Fortran:

    !$OMP PARALLEL DO SCHEDULE(<sched>)

where <sched> can be one of:

- STATIC: block (the default)
- STATIC, chunk: block-cyclic; chunk is the block size
- DYNAMIC, chunk: self-scheduled
- GUIDED, chunk: guided self-scheduling; the size of the pieces decreases down to chunk

Example

- Multiplication of two matrices C = A x B, where the matrix A is upper-triangular (all elements below the diagonal are 0).
- The rows of C therefore require different amounts of work: row i only needs the elements a[i][k] with k >= i.

OpenMP Matrix Multiply

    /* schedule(static,1) distributes the iterations cyclically,
       which balances the triangular workload */
    #pragma omp parallel for schedule(static, 1)
    for (i=0; i<n; i++)
        for (j=0; j<n; j++) {
            c[i][j] = 0.0;
            for (k=i; k<n; k++)
                c[i][j] += a[i][k] * b[k][j];
        }

OpenMP Directives

- Parallelization directives:
  - parallel region
  - parallel for / DO
  - worksharing: sections/section, single, ordered, master
- We have seen the directives for the distribution of work.

OpenMP Directives

- More directives exist for creating local data and for synchronizing threads.
- Data environment directives: shared, private, firstprivate, threadprivate, reduction, etc.
- Synchronization directives: barrier, critical, atomic, flush, nowait (a sketch of reduction and critical follows below).
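A minimal sketch (not from the slides) of one clause from each group: a reduction clause and a critical section computing the same sum:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000

    int main (void)
    {
        int i, sum_red = 0, sum_crit = 0;

        /* data environment: each thread accumulates into a private copy
           of sum_red; the copies are combined when the loop ends */
        #pragma omp parallel for reduction(+:sum_red)
        for (i=0; i<N; i++)
            sum_red += i;

        /* synchronization: every single update is serialized instead */
        #pragma omp parallel for
        for (i=0; i<N; i++) {
            #pragma omp critical
            sum_crit += i;
        }

        printf ("reduction: %d, critical: %d\n", sum_red, sum_crit);
        return 0;
    }

Both loops compute 499500; the reduction version scales with the number of threads, while the critical version serializes every addition.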