Physics For Future Presidents
Physics For Future Presidents ECE 49500
Popular in Course
Popular in Electrical Engineering & Computer Science
This 14 page Class Notes was uploaded by Cassidy Casper on Saturday September 19, 2015. The Class Notes belongs to ECE 49500 at Purdue University taught by Rudolf Eigenmann in Fall. Since its upload, it has received 49 views. For similar materials see /class/207889/ece-49500-purdue-university in Electrical Engineering & Computer Science at Purdue University.
Reviews for Physics For Future Presidents
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 09/19/15
Section 7 Loop Parallelization Techniques DataDependence Analysis DependenceRemoving Techniques Parallelizing Transformations Performanceenchancing Techniques Loop Parallelization ECE 4958 Fall 2008 Some motivating examples Doi1 n ls itlegal to ai bi S1 Run the i loop in parallel ci ai1 S2 Put 82 first in the loop End do Is it legal to DO I 1 n Fuse the two i loops ai bi End do In general it is desirable to determine if two references access the same memory DO I 1 n location and the order they execute so that we can determine if the references 00 304 might execute in a different order after End do some transformation Dependence an example Doi1n 30 W 31 Indicates dependences ie CO 8104 S the statement at the head 2 gt of the arc is somehow End do dependent on the statement at the tail i1 i2 i3 i4 i5 i6 W W W W b5 b6 a1 a2 a3 a4 a5 a6 aoa1a2gt a3a4ia5gt 01 02 03 04 05 06 Loop Parallelization ECE 4958 Fall 2008 Can this loop be run in parallel Assume 1 iteration per processor then if for Do i 1 n some reason some iterations execute ai bi S1 out of lockstep bad things can happen Ci ai1 S2 In this case read of a2 in i3 will get an End do invalid value i 3 b3 i1 i2 a3 i4 i5 i6 W W a2b4 b5 b6 a1 a2 03 a4 a5 a6 aoa1 a3a4 a5gt 01 02 04 05 06 v time Loop Parallelization ECE 4958 Fall 2008 Can we change the order of the statements Doi1n Doi1n ai bi S1 ci ai1 S2 ci ai1 S2 ai bi S1 End do End do No problem with a serial execution Access order before statement reordering bm a1aoc1 b2a2a1 C2 H W as a2 C3 b4 a4 as C4 i1 i2 i3 i4 Access order after statement reordering ao cm b1a1 N am e 2 b2 a2 a2 C3 b3 a3 II a3 C4 b4 a4 i i2 i3 i4 5 Loop Parallelization ECE 4958 Fall 2008 Types of dependence a2 Flow or true dependence data for a read comes from C a2 a previous write writeread hazard in hardware quot terms a2 Antidependence write to a location cannot occur C before a previous read is finished a2 a2 Output dependence write a location must wait for C a previous write to finish a2 Dependences always go from earlier in a program execution to later in the execu on Anti and output dependences can be eliminated by using more storage6 Loop Parallelization ECE 4958 Fall 2008 Eliminating antidependence a2 Antidependence write to a location cannot occur C before a previous read is finished a2 Let the Program in bei The new program is a2 Create additional storage a2 a2 to eliminate the anti H dependence quot a a2 2 a2 aa2 No more antidependence Loop Parallelization ECE 4958 Fall 2008 Getting rid of output dependences a previous write to finish C a2 a2 Let the program be Again by creating new a2 storage we can a2 eliminate the output a2 dependence a2 Loop Parallelization ECE 4958 Fall 2008 Output dependence write a location must wait for The new program is a2 a2 aa2 aa2 Eliminating dependences In theory can always get rid of anti and output dependences Only flow dependences are inherent ie must exist thus the name true dependence In practice it can be complicated to figure out how to create the new storage Storage is not free cost of creating new variables may be greater than the benefit of eliminating the dependence Loop Parallelization ECE 4958 Fall 2008 An example of when it is messy to Doi1n a3i1 a2i ai End do Loop Parallelization create new storage A3i writes locations 2 5 8 11 14 17 20 23 A2i writes locations 2 4 6 8 10 12 14 16 18 20 22 Ai reads from outside the of loop when i 1 3 7 9 13 15 19 21 Ai reads from a3i1 when l 5 11 17 23 Ai reads from a2i when l 2 4 6 8 1012 14 1618 20 22 ECE 4958 Fall 2008 Can we fuse the loop Do i 1 n Do i 1 n In original execution of ai W 31 ai 30 31 the unfused loops End do Ci ai1 2 1 Ai1 gets value 0 i End do assigned in ai C0 304 32 2 Can t overwrite End do value assigned to aI or c 3 Bi value comes from outside the loop 1 Is ok after fusing because get ai1 from the value assigned in the previous iteration 2 No output dependence on ai or ci not overwritten 3 No input flow or true dependence on a bi so value comes from outside of the loop nest 11 Loop Parallelization ECE 4958 Fall 2008 Data Dependence Tests Other Motivating Examples Statement Reordering Loop Parallelization can these two statements be Can the iterations of this swapped loop be run concurrently DO i11002 DO i11002 B2i B2i B3i B2i B3i ENDDO ENDDO An array data dependence exists between two data references iff both references access the same storage location at least one of them is a write access Loop Parallelization ECE 4958 Fall 2008 Dependence sources and sinks The Sink of a for i1 i lt n i dependenceisthe 39 statement at the ai1 head of the dependence arrow am The source is the am am statement at the tail a1 of the dependence am am arrow a4 a3 Loop Parallelization ECE 4958 Fall 2008