Note for CMPSCI 691 at UMass
Note for CMPSCI 691 at UMass
Popular in Course
Popular in Department
This 84 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Massachusetts taught by a professor in Fall. Since its upload, it has received 15 views.
Reviews for Note for CMPSCI 691 at UMass
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 02/06/15
26 Chapter 2 REGIONS AND THE ZPL LANGUAGE This chapter describes the ZPL language concentrating on the role of regions in its design To make this discussion both readable and precise some language concepts are introduced informally at rst and are then reconsidered with increased formality as the chapter progresses While this format would not be appropriate for a language reference manual it is designed to provide an appropriate mixture of clarity and precision for this presentation Note that this chapter focuses on the sequential interpretation of ZPL largely ignoring the parallel implications of regions and the language itself Since parallelism is inherent in the de nition and use of regions this will leave some questions unanswered at the chapter s conclusion These questions will be addressed in the following chapter which describes the parallel implications of regions This chapter s description of ZPL is meant to provide a general overview of the lan guage For a more complete description refer to the ZPL Programmer s Guide and the ZPL web page Sny99 ZPLOl The structure ofthis chapter is as follows Sections 21 2 14 describe the ZPL language including such fundamental concepts as regions arrays and array operators Section 215 illustrates ZPL s use in a number of small sample applications that will be used in subse quent chapters Section 216 describes other sequential approaches for array programming including vector indexing and slicing Finally Sections 217 and 218 provide an evaluation of ZPL s features in the sequential context listing both bene ts and liabilities of the region as it currently exists This chapter s contents serve as an expanded discussion of work that was published previously CLLS99 CLS99 27 21 ZPL s Guiding Principles Languages are for Communicating One of the primary principles that has guided ZPL s development is the notion that pro gram ming languages are meant to be a means of communication between human and computer Programmers have algorithms in their minds that they would like to execute on a computer Computers have nite resources and an extremely limited capacity for understanding high level languages Programming languages should form a bridge between these two points spanning the gap between programmer and computer using a direct natural route that com plements the abilities of both When this principle is violated communication is broken and a heroic effort is required by the user andor compiler if the program is to have its intended effect Such broken languages can result in macho compiling in which tremendous effort is put into helping a compiler recognize idioms that are not made apparent by the language and to implement them ef ciently These efforts tend to result in brittle optimizations that are easily broken if the programmer does not stick to the speci c set of idioms that the com piler recognizes Lew00 When the optimization does not re programmers must expend great effort to achieve their desired results Frustration abounds for both programmers and compiler implementors In contrast creating a language that is natural to compile to a given architecture allows implementors more time to work on general improvements and optimizations rather than worrying about particular syntactic patterns or corner cases It should be noted that most programming languages which have enjoyed widespread use have not relied on sophisti cated compiler optimizations to achieve acceptable baseline performance ZPL strives to implement this principle for parallel programming by providing a syntax that directly re ects parallelism This allows users to express the parallelism that is inherent in their algorithms and to evaluate the parallel overheads of their programs It also allows ZPL s implementors to detect parallelism trivially and create a straightforward baseline 28 implementation By avoiding the recognition problem implementors can concentrate their efforts on optimizations that improve the baseline implementation The False Seduction ofLegacy Code Reuse Many parallel computing approaches have been designed in hopes of taking advantage of existing sequential codes with minimal programmer effort For example a perfect par allelizing compiler would transform sequential programs into parallel code automatically Similarly languages such as High Performance Fortran HPF Hig94 and CoArray For tran CAF NR98 were designed with the idea of leveraging existing code as a primary goal Ideally programmers can take their existing sequential programs make minimal modi cations to them and end up with a good parallel implementation While this is a laudable goal the assumption that incremental changes can turn a good sequential algorithm into a good parallel one is naive The seductive pitch of these ap proaches is that the compiler will do all of the hard work for you once you add a line of code here or there to help it out The reality of the situation is that the work required to transform sequential programs into an optimal parallelizable form is often nontrivial for both the programmer and the compiler FJY 98 This effect is demonstrated by the con ceptual leap between the sequential and SUMNIA matrix multiplication implementations of Chapter 1 Often a parallel code bears little resemblance to its sequential counterpart In such cases the effort required to convert a sequential program into an effective parallel one can be greater than that which would have been required to write a new program from scratch with parallelism in mind Starting from First Principles ZPL approaches this problem from the opposite direction Rather than starting with a se quential language and striving to detect the parallelism inherent in its sequential syntax ZPL s design starts with nothing and incrementally adds concepts and operations that are 29 Listing 21 Simple Type Constant and Variable Declarations in ZPL type age shortint coord record x integer y integer end constant pi double 314159265 tabsize integer 1000 maxage age 128 var done boolean length integer name string origin coord table array 1tabsize of complex implicitly parallel By starting from rst principles in this way ZPL was able to avoid supporting language constructs that disable parallelism As an example ZPL does not per mit traditional scalar indexing of its parallel arrays due to the fact that it is an inherently sequential construct This approach forces programmers to consider the opportunities for parallelism in a program from its inception rather than doing the minimal amount of work to get the compiler to accept their sequential code and then spending hours with feedback tools trying to determine why it is not achieving good parallel performance ZPL s syntax is based on ModulaZ Wir83 rather than a more popular language like C or Fortran This decision reinforces the idea of starting from scratch by forcing C and Fortran users to confront the notion that certain features of those languages are not present in ZPL due to their interference with parallelism eg pointers scalar array indexing and common blocks It also reinforces the idea that programmers should consider their se quential algorithms afresh when implementing them in parallel by making it dif cult for existing C and Fortran codes to be tweaked slightly and run through the compiler 30 Listing 22 Sample Con guration Variable Declarations in ZPL config var n integer 100 if a sample problem size verbose boolean true if use to control output logn integer ngn 77 log of the problem size nsq integer nAZ if the problem size squared npi double pin if n times the constant pi A second reason for choosing ModulaZ was to support a language whose syntax is both readable and intuitive While it would be possible to create C and Fortran dialects of ZPL no such effort has been made at this point The primary challenge would be to ensure that the features of C and Fortran which have been deliberately omitted from ZPL would interact appropriately with its parallel concepts or simply outlaw them altogether As Chapter 4 will discuss ZPL is compiled by translating it to C For this reason C s in uence is occasionally seen in the language s syntax For example the names of ZPL s data types and its formatting of IO both strongly re ect C 22 Scalar ZPL Concepts ZPL s scalar concepts are largely unoriginal and uninteresting but form an important foun dation for the rest of the language so are described here quite brie y 221 Data types Constants and Variables To start with the basics ZPL supports standard data types type declarations and declara tions for constants and variables as in most languages It supports integers of varying sizes as well as oating point and complex values of varying precision ZPL supports homoge neous array types referred to as indexed arrays and heterogeneous record types For some sample type constant and variable declarations refer to Listing 21 31 Table 21 A Summary of ZPL s Scalar Operators Arithmetic Operators Relational Operators Assignment Operators addition equality standard subtraction inequality accumulative multiplication lt less than subtractive division gt greater than multiplicative modulus lt less thanequal divisive exponentiation gt greater thanequal amp conjunctive I disjunctive Logical Operators Bitwise Operators amp and band and I or bor or not bnot complement bxor xor 222 Con guration Variables ZPL s con guration variables are a somewhat more unique scalar concept Each con g uration variable represents a loadtime constant a value that can be de ned at the outset of a program s execution but which cannot be changed thereafter This allows users to de ne values that they may not want to constrain at compile time such as problem sizes verbosity levels or tolerance values The advantage of making such values con guration variables rather than traditional variables is that it allows the compiler to treat the variable as a constant of unknown value during analysis and optimization Con guration variables are de ned similarly to normal constants except that their ini tializing values are merely defaults that can be overridden on the pro gram s commandline Con guration variable initializers may be de ned using expressions composed of constants scalar procedures and other con guration variables Currently ZPL only supports con g uration variables of scalar types including records and indexed arrays Listing 22 shows some sample con guration variable declarations 32 Listing 23 Sample Uses of ZPL s Control Structures if age gt maxage then write1nquotAge too largelquot end for i l to tabsize do tablei 0 end repeat length 2 done length lt 100 until done while originx gt originy do leftshiftorigin end 223 Scalar Operators ZPL supports a standard set of scalar arithmetic logical relational bitwise and assignment operators See Table 21 for an overview 224 Control Structures ZPL supports standard control structures such as conditionals for loops while loops and repeatuntil loops See Listing 23 for some simple examples 225 Blank Array References To encourage arraybased thinking ZPL s indexed arrays support a shorthand notation to operate over their entire index range without a loop This is done by omitting the indexing expression for an array reference For example the assignment to table in Listing 23 could be written as follows using blank array references table 0 33 Listing 24 Sample ZPL Procedures prototype mycompx double y double integer procedure leftshiftvar pt coord begin ptx 10 end procedure mycompx double y double integer begin if x lt y then return 1 elsif x y then return 0 else return 1 end end This syntactic shortcut is designed to aid With the common case of performing purely ele mentWise operations on indexed arrays In many codes blank array references can elimi nate a number of trivial and uninteresting loops over an array s indices 226 Procedures ZPL s primary functional unit is the procedure Which can accept value or reference pa rameters and return a single value of arbitrary type Procedures strongly resemble their Modula 2 counteIparts and may be recursive ZPL also supports prototypes that allow a procedure s signature to be declared for use before the procedure is de ned Listing 24 contains some sample prototype and procedure de nitions 227 Interfacing with External Code Though ZPL s choice of Modula 2 as a base syntax limits the amount of code reuse that can take place Within the parallel portion of a ZPL program existing scalar code can be 34 Listing 25 An Example of Using extern in ZPL extern constant MPI double extern var errno integer extern type timezone opaque timeval record tvsec longint tvusec longint end extern prototype gettimeofdayvar tv timeval var t2 timezone integrated into a ZPL program if it can be called by and linked into a C program This is achieved using the extern keyword which can be applied to types constants valiables and procedures Exteinal types may be partially speci ed or omitted completely using the opaque keyword which allows the programmer to store vaIiables of exte1nal types and pass them around but not to operate on them directly or modify them See Listing 25 for some sample exte1nal declarations 23 Regions and Parallel Arrays As mentioned in the introduction ZPL s fundamental concept is that of the region A region is simply an index set a set of indices in a coordinate space of arbitrary dimensions ZPL s regions are regular and rectilinear in nature In this sense they are much like traditional alrays with no associated data This similality is emphasized syntactically simple regions are de ned using syntax that resembles a traditional array s bounds For example the following shows a simple twodimensional region and the set of indices that it desc1ibes 1 m l n 1l121n21m71 Regions may contain singleton dimensions which descIibe only a single index value These are de ned by replacing the degenerate range with a single index eg l l 11 rather thanll ln 35 TopRow BigR a b Figure 21 Using Regions and Arrays a An illustration of the three regions declared in Section 23 R TopRow and BigR b Three parallel integer arrays of size BigR A B and C c An example of how a statement s enclosing region scope restricts the range of its operators Only indices within R interior to the arrays are referenced in this statement ZPL programmers can name regions For example the following declarations name the simple regions above R and TopRow They also create a third region BigR which extends both dimensions of R by a single index in each direction region R 1m ln TopRow 1 ln BigR 0ml 0nl See Figure 21a for an illustration of these regions The dimension bounds of named regions must be expressions composed of constants or con guration variables The rank or dimensionality of a region refers to the number of dimensions that it contains For example all of the regions above have rank 2 Regions have two primary purposes The rst is to declare parallel arrays This is done by specifying a region and an element type as a variable s type declaration Such declarations result in the allocation of an array with an element of the speci ed type for each index described by the region For example the following declaration creates three m 2 x 71 2 arrays of integers named A B and C Figure 21b var A B C BigR integer The rank of a parallel array is de ned to be the rank of its region For example all of 36 the parallel arrays above have a rank of 2 Parallel arrays may not be nested That is the element type of a parallel array may not contain a parallel array itself Parallel arrays are the primary data structure in ZPL and will generally be referred to as arrays within this dissertation The traditional scalar arrays described in Section 22 will always be referred to as indexed arrays to avoid confusion Note that this chapter does not explain why parallel arrays are so named but merely uses the term as a label The following chapter provides the justi cation for the name though discerning readers will possibly gure it out on their own The second purpose of regions is to provide indices for array references within a ZPL statement Unlike indexed arrays ZPL s parallel array elements cannot be referenced using traditional indexing mechanisms Instead regions are required to specify the indices for an array reference As an example consider the following statement RABC This statement says to add arrays B and C elementwise assigning their resulting sums to the corresponding values in A The statement is pre xed by the region scope R which speci es that the addition and assignment operations should be performed for all indices described by R namely the interior m x n elements Thus this statement describes the matrix addition computation from Chapter 1 Region scopes serve as a form of universal quanti cation For example the statement above is equivalent to Am Bm CiJ7Vi7j E R See Figure 21c for an illustration Using region scopes any of ZPL s standard scalar operators can be applied to arrays in an elementwise manner The chief constraint is that arrays cannot be read or written at indices that were not in their de ning region since no data is allocated for those indices Region de nitions may also be speci ed explicitly within a region scope These are called dynamic regions since their bounds are typically based on expressions whose values 37 are not known until runtime For example the following code fragment adds row i of arrays B and C where i may be computed during the program s execution i ln A B C Note that technically this region scope should contain another set of square brackets to be consistent with the region speci cation syntax described previously However ZPL allows programmers to drop the redundant square brackets for readability Subsequent sections will describe regions in more depth but for now this example based overview of the ZPL language continues 2 4 Array Opera tors If ZPL could only express elementwise computations on its arrays it would not be a very useful language More general computations are supported by using array operators to modify a region scope s indices for a given array variable or expression This section provides a brief introduction to the most important array operators the operator oods reductions and remaps 241 The Operator The operator G is ZPL s simplest array operator providing ameans for translating array references using constant offset vectors known as directions Directions are speci ed and named in ZPL as follows direction north l 0 south l 0 east 0 1 west 0 1 These declarations create four vectors one for each of the cardinal directions Figure 22a The operator is applied to an array reference in a post x manner taking a direction as its second operand Applying the operator to an array causes the indices of the en closing region scope to be translated by the direction vector as they are applied to the array 38 rioi tli so th v u Ea st est A B C A B R A Bwest ceeast BigR A Beast a b c Figure 22 The Operator a An illustration of the directions declared in Section 241 b A use of the operator to add shifted references of B and C storing the result in region R of A c A diagram illustrating the application of the wrap operator to assign a cyclicallyshifted version of B to A reference For example the expression Asouth would increment all indices in the region by 1 in the rst dimension As a slightly more interesting example consider the following statement R A Bwest Ceast This replaces each interior element of A with the element just to its left in B and just to its right in C More formally Am BM I Cij17Visj E R Refer to Figure 22b for an illustration Note that the legality of this code hinges on the fact that B and C are declared using region BigR causing the references to access declared values Had they been declared using region R the references would refer to values outside of their declared boundaries which would be illegal Expressions using the operator may be used on either side of an assignment but may not be passed by reference to a procedure This dissertation will primarily concentrate on reading references and not writing them 39 D TTRRQB A B A B biggest B R A gtgtTopRow B Topr A ltltR B R biggest maxltlt B a b c Figure 23 The Flood and Reduction Operators a An illustration of the ood operator causing the top row of B within R to be replicated across all rows of A within R b An ap plication of the sum reduction operator which totals the values of B within each column of R and assigns the sum to the corresponding value of A within TopRow c A full reduction which nds the biggest value of B within R and assigns the result to the scalar biggest The Wrap Operator One variation on the operator is the wrap operator A which causes accesses to the array that fall outside of its declared boundaries to wrap around and access the opposite side Thus a statement like BigR A BAeast would cyclically shift B one position to the left assigning it to A 242 The Flood Operator The ood operator gtgt provides a means for replicating a slice of an array s values either explicitly or implicitly Symbolically it can be viewed as taking a small piece of the array expression to its right and expanding it to make it bigger when used to the left The ood operator is a pre x operator which is followed by a region to indicate the slice of the array to be replicated This region is referred to as the source region while the enclosing region of matching rank is called the destination region As an example consider the following assignment R A gtgtTopRow B 40 This statement assigns the rst row of B restricted to columns 1 through 11 to rows 1 through In of A See Figure 23a for an illustration In this statement the ood operator s role is to replicate the values of B described by the source region TopRow or l l n such that they conform to the destination region R This action can be interpreted in either an active or a passive way Actively the ood operator is taking the row of values described by TopRow and using it to create an array of size R for assignment to A Passively the operator can be thought of as causing the rst dimension of indices in R to be ignored when accessing B replacing them by the indeX 1 Formally this statement can be interpreted as follows Am BImW iaj E R The main legality issues for the ood operator concern the conformability of the source and destination regions First they must be the same rank In addition each dimension of the source region must either be a singleton as in this example s rst dimension or it must be identical to the destination region as in the second dimension The former case results in replication of the values described by the singleton indeX The second results in a traditional array reference 243 The Reduction Operator The reduction Operator ltlt is the dual of the ood operator It compresses an array s values down to form a smaller array As with the ood operator it uses pre X notation and expects a source region to describe the values that should be reduced The resulting size of the expression is described by the enclosing region scope of matching rank Because multiple values are being collapsed into a single item some sort of reduction operation must also be speci ed to indicate how this collapsing should take place These operations are typically commutative and associative and they precede the reduction oper ator syntactically Built in reduction operations include addition multiplication min and 41 max as well as logical and bitwise operators Users may also create custom reduction operations using scalar ZPL procedures As a simple example consider the following statement which uses a plus reduction TopRow A ltltR B This statement computes the sum of each column of B for the rows and columns speci ed by R storing each result in the rst row of the corresponding column of A See Figure 23b for an illustration Again this operator has both an active and a passive interpretation Actively it compresses B from rows 1 through In down to a single row the rst Passively it expands the reference to row 1 of B so that it refers to rows 1 through In as combined using addition Formally AM 4 ZBMNQJ E TopRowVk l E R such that l j The legality rules for reductions are similar to those for the ood operator The source and destination regions must have the same rank In addition each dimension of the source and destination regions must either be the same causing the dimension to be read nor mally or the destination dimension must contain a singleton causing the values in that dimension to be reduced Full Reductions One special case for reductions collapses an entire array to a single scalar value This is known as a full or complete reduction in contrast with the partial reductions described previously Full reductions require only a single covering region since the scalar reference requires no indices A simple example is shown here var biggestzinteger R biggest maxltlt B This statement nds the maximum value of B within the indices described by R and assigns it to the scalar value bigge st See Figure 23c for an illustration Note that full reductions 42 44 A A B R A MBC Figure 24 The Remap Operator The B and C arrays serve as the map arrays for the remap of A in this assignment thus they must contain values from 0 to 5 Within region R displayed here using varying levels of grey As a speci c example consider the assignment to row 2 column 3 outlined With a dotted line The corresponding values in B and C are both 0 indicating that element 00 of A should be assigned to this location compute the same value as a partial reduction over all dimensions but they store the result in a scalar rather than an array element For example the full reduction above is similar to the following partial reduction k1 k2 A maxltltR B 244 The Remap Operator The remap operator it serves as a catchall operator supporting parallel random array accesses Unlike traditional array indexing the remap operator requires an entire array of indices per dimension rather than a single index The following is a simple example R A ABC This use of the remap operator randomly accesses the source array A as speci ed by the map arrays B and C In this statement the result is assigned back into A The B array provides the indices in the rst dimension for each access to A While C provides the indices for the second dimension This is actually easiest to see in the formal version Am H An WM 6 R 45317 This statement is illustrated in Figure 24 43 The main legality constraint for the remap operator is that the number of map arrays must be equal to the rank of the source array so that each of its dimensions has an index In addition the map arrays must not refer to indices that are outside of the source array s de n ing region since that would refer to values with no allocated storage As Section 2152 will demonstrate remap operators can be used to operate on arrays of different ranks and are in fact ZPL s only mechanism for doing so Remap operators may be applied to expressions on either side of an assignment though this dissertation focuses on uses on the right hand side 245 Other Array Operators ZPL has a few other array operators that will not be described in this thesis most notably the scan operator for performing parallel pre x operations and the wrap and re ect oper ators for supporting boundary conditions These are omitted in this discussion for brevity and because they do not pose signi cant challenges or intrigues in ZPL s design and im plementation beyond the array operators described here For more information on these operators please refer to the literature Sny99 25 Formal Region De nition Given the intuitive de nitions of array operators we now reconsider regions more for mally Each dimension of a region can be represented by a 4 tuple sequence descriptor r l h 311 The variables l and h represent the low and high bounds of the sequence The 5 value represents the sequence s stride and a encodes its alignment A sequence descriptor 39r is interpreted as de ning a set of integers 87 as follows Sr 11315 handfm a d sl 21 For example the descriptor 6 2 0 4 describes the set of even integers between one and six inclusive 110 51 A r dimensional region 7quot is de ned as a list of r sequence descriptors 1 d 1 i where 1239 represents the indices of the region s 2 11 dimension f 7 1 2r 2 211 I The index set Q de ned by a region 7quot is simply the cross product of the sets speci ed by each of its sequence descriptors i rS X1 X1gt 1 X1i For example the indeX set of the 2 dimensional region 622202 26222026 I would be de ned as follows 1 622202 26222026 I X622202 X62 226 402 3 Q g g Recall the simple region declarations described in Section 23 which take the following general form R h h0 lll hum Such declarations correspond to the following formal region de nition quot h2 2620 2Wzd2620 2 2k2 z 2620 I These sequence descriptors specify that each dimension contains all indices from h39 to 5239 due to the triVial values used for the stride and alignment Note that while ZPL could allow programmers to eXpress regions in a sequence descriptor format the syntaX used here allows the common case to be described in a clearer more intuitive manner 45 26 Region Opera tors In addition to the simple region declarations of Section 23 ZPL provides a set of region Operators that allow new regions to be created relative to existing ones These are provided to give the user a more descriptive way of creating regions than specifying them by hand They also provide the only means of changing a region s stride or alignment Region operators are de ned using a set of prepositional Operators 0f in at and by that are de ned for sequence descriptors Each of these operators modi es a sequence descriptor using an integer value The operators are de ned as follows 4 MK mm 0lt oflhahah gt Helm if gt its lt 0h hahaigtiflt lh1gtl ltlt bah gt ifJlt in Mullah Jain1 if gt 12ka hahgtif0lt lihaik1t ll hB hah 2 Ughi Imhls gt Us a gtgt limz i I zhghahby Walla ifO lt l1L1halz1lrxzmhllgt 193 j azgtgt lQImzgt if r To summarize the of and in operators modify the sequence bounds relative to the eX isting bounds leaving the stride and alignment unchanged The of operator describes a range adjacent to the original range whereas in describes a range interior to the previous range The at operator translates the sequence bounds and alignment of a sequence The by operator is used to scale the stride of the sequence and possibly shift the alignment leaving the bounds unchanged 46 Listing 26 Applications of Region Operators direction north l 0 east2 0 2 n2e3 2 3 step2 2 2 region R 1m ln NorthernBoundary north of R EasternInterior east2 in R ShiftedN2E3 R at n2e3 OddCols R by east2 ZPL de nes a region operator for each prepositional operator Each region operator takes a base region and an offset vector in the form of a direction The operator is evaluated by distributing each component of the direction to the region s corresponding sequence descriptor and applying the prepositional operator For example the at operator would be distributed as follows at 72117 Mina22lllzdlzalzil at ti721 7 th72a727atlyZZHlezalziiatli lhE 17257E 172a727 gt2 1E12SL12021gt As a more concrete example the code in Listing 26 shows some direction declarations followed by region declarations that use the region operators These regions as well as sev eral others are illustrated relative to the base region R in Figure 25 In each case the role of the direction in de ning the new region is indicated Though the formulas de ning the prepositional operators seem fairly complex at rst glance they de ne regions which intu itively match the English de nition of the preposition making the mathematical de nitions simply a formality Intuitively the of operator de nes regions that are adjacent to the base region while in de nes regions that are just within the base region The at operator shifts the base region while by strides the base region In each case the offset vector provides the notion of the direction and magnitude of the operation 47 Region Operator baseregion direction of in at by l l l l l l l t l l R north NorthernBoundary R eastz EasternInterior R n2e3 ShiftedNZES U D lj U D j Rby stepz 3 eastz 1 ngl U Ll L D39D39E D e E D D W25 DUE D Dina D D D l D D l3 Ll 1 Figure 25 The Region Operators This diagram illustrates the region operators applied using the declarations of Listing 26 Each column of pictures represents a single region operator of in at and by while each row illustrates a different base regiondirec tion pair The indices of the regions created by applying the region operator to the base region and direction are shown in grey Arrows gives a sense of the directions roles in the de nition Those regions that were given names in Listing 26 are labeled 48 l l l l FloodRow R F gtgt1 0n1 B a d Figure 26 Flood Dimensions and Flood Arrays a An illustration of a region Whose rst dimension is ooded It represents a single row of values that are conformable to any row b An array F declared using region FloodRow c An assignment from F to A Within region R d An assignment from a row of B to F using the ood operator Although there are certainly other region operators that could be useful to a prograrn mer those listed here were selected as a basis set since they support common array refer ences and are closed over our region notation This chapter s discussion section considers this topic further 2 7 Flood Dimensions 27 Introduction to Flood Dimensions In addition to traditional dimensions eg l h and singleton dimensions eg i re gions can have a third type of dimension the ood dimension Flood dimensions are syn tactically represented using an asterisk and they represent a single index that conforms to any other index in the dimension As an example consider the following code fragment region FloodRow 0 nl var F FloodRow integer R A F This code begins by declaring a region Which is oodable in the rst dimension and contains columns 1 through 11 in the second Figure 26a It then uses the region to declare 49 a row of integers named F Figure 26b The assignment to A reads from the appropriate column of F for all rows in R That is the single row of values from F is explicitly replicated in rows 1 through In of A See Figure 26c for an illustration Note that FloodRow differs from a row declared using a singleton dimension like TopRow In particular if F was declared using TopRow in the example above the as signment would attempt to read from F in rows other than the rst This constitutes an error since F did not allocate storage for those rows The use of the ood dimension in FloodRow allows it to conform to all indices making the assignment legal 272 Relationship with the Flood Operator The code above illustrates a similarity between ood dimensions and the ood operator both are used to represent replicated values In fact the ood operator can be used to assign to the values of a ood array Consider the following example FloodRow F gtgtl 0nl B In this code fragment row 1 of B is replicated by the ood operator to conform to the ood dimension of FloodRow Figure 26d Similar assignments without the ood operator would be illegal FloodRow F B 1 0nl F B The rst assignment is illegal because B is de ned for rows 1 through In making it am biguous which row of B should be stored in F Even if B was declared to be a single row eg l O n1 this assignment would remain illegal since the righthand side of the assignment needs to conform to all row indices not simply a particular one For a standard array like B this can only be achieved using the ood operator The second as signment is illegal because it attempts to assign to a single row of F rather than assigning all of its rows using a ood dimension 50 273 Formal De nition As described above an array with a ood dimension can intuitively be thought of as having a single set of values in that dimension which conform to all indices Equivalently the ood dimension can be thought of as representing an in nite number of indices all of which are constrained to contain the same values Flood dimensions are represented using a special sequence descriptor oo 0 0 This states that the dimension covers all indices 005 The stride and alignment of 0 re ects the fact that there is a single implementing set of values and therefore no way to step from one index to the next The ood sequence descriptor cannot be interpreted like those of traditional dimensions due to the nonsensical nature of working in a modulo 0 system Rather it serves as a placeholder that readily distinguishes ood dimensions from traditional ones By convention lt 00gt 0gt 00is de ned to be 0gtogt 0 0gt The index de ning the single set of values will be referred to as F For example the element in the fourth column of F would be referred to as m 43 Only the identity forms 6439 of the prepositional operators for sequence descriptors are de ned for ood dimension sequence descriptors This matches the intuitive sense that a dimension which represents an in nite number of indices cannot have adjacent or interior indices cannot be shifted and cannot be strided Thus only direction components of 0 may be applied to a ood dimension using ZPL s region operators The legality of interactions between ood dimensions traditional dimensions and array operators will be summarized in Section 212 which contains a more formal treatment of these subjects 28 Index Constants ZPL provides a set of built in array constants referred to collectively as the index con stants These are a group of built in virtual parallel arrays named Indexl IndexZ Index3 etc When read each element of I ndexi evaluates to its index in the Ith dimen 51 000000 012345 000000 012345 111111 012345 111111 012345 222222 012345 222222 012345 1n 333333 012345 333333 012345 444444 012345 444444 012345 555555 012345 555555 012345 Indexl IndexZ Indexl IndexZ R A Index1 1n1ndex2 11 b Figure 27 The Index Constants a Pictures depicting Indexl and Index2 BigR and R are indicated by the outlines b A diagram showing the unique numbering of elements in Rusing Indexl and Index2 sion Thus Indexl s elements evaluate to their row indices Indexz s elements to their column indices etc More formally d nhexj lnz 2412 i Figure 27a gives a pictorial depiction of Indexl and Index2 within regions R and BigR As a sample use the following code fragment numbers the values of A within R from 1 to m 141 in rowmajor order R A Indexl ln Indexz Using the formal de nition of index constants this assignment is interpreted as follows 74 1 Mpg 114d2 dfxliz I 46de 1 See Figure 27 for an illustration As a second example the following code uses the remap operator to assign the transpose of B to A within region R assuming that m n if it did not the map arrays might refer to values outside of B s declared size R A BIndex2 Indexl 52 012245 000000 012245 111111 012245 222222 712345 33333 F391l2345 TAMAAA 01 2345 sslssss A B Indexz Indexl R A BHIndeszndexi Figure 28 An Array Transpose This diagram illustrates how the Index constants can be used to transpose arrays When used as the map arrays in a remap operation By using col umn indices as the map array for B s rows and row indices for its columns the elements of B are transposed during their assignment to A The dotted lines indicate how element 331 A ofA 1s ass1gned element 3 1 40f B Using the formal de nition of index constants this assignment is interpreted as follows X 239 i 031 Meaghymm 1 I 7 See Figure 28 for an illustration of this assignment Each index constant can be thought of as being oodable in every dimension other than the 1 since its size is arbitrarily large and its values only differ in dimension However note that Indexi conforms to arrays of rank f eta making it more exible A than a similar userde ned ood array 29 Masks As de ned in Section 23 regions must be rectilinear This results in index sets that are very rectangular and regular though possibly strided In many applications programmers may want to refer to a more arbitrary set of indices than those permitted by regions For this reason regions may be modi ed by boolean masks to select a subset of indices from the region The mask must have the same rank as the region that it is modifying and must be allocated for all indices described by the region 53 m p u N H o H m p u N H o m m N m p u N H o m p u N H o H m p u N H o m p u N H o o o o o o o H H H H H H H N N N N N N u u u u u u p p p p p p mmmmmm oquot N H O Mask 9 n exz A B R with Mask A B a b Figure 29 An Example of Using Masks a The mask is assigned true for all locations in R Where the sum of the row and column indices is even 13 The mask is used to restrict the indices of R When assigning from B to A Here is a simple example that uses masks var Mask R boolean R Mask Indexl Index2 2 0 R with Mask A B The rst assignment initializes the values of Mask to be true Wherever the sum of the row and column indices is even Figure 29a The second assignment is pre xed by a region scope in Which R is modi ed by Mask This causes the assignment of B to A to take place only at those indices Where the sum of the row and column indices is even More formally xx M X iJ B i le Rsuchthatlpdk j true See Figure 29 for an illustration Masks can also be applied using the without keyword Which reverses the sense of the mask computing only at indices Where the mask s value is false Masks Will not be covered With any more depth or formality in this chapter as they are fairly intuitive and not a central concept in this dissertation In general any region scope can be masked and the mask has the effect of ltering the region s indices as they are applied to array expressions Within the region s scope 54 Listing 27 A Demonstration of Region Scoping 1 R begin 2 A Bwest Ceast 3 BigR A BAeast A A gtgtTopRow B 5 TopRow A ltltR B 6 biggest maxltlt B 7 k1 k2 A maxltltR B a A AB C 9 end 2 10 Region Scoping 2101 Region Scoping Overview Up to this point each statement that has referred to a parallel array has been pre xed by a region scope to provide the statement s base indices In general region scopes can be applied to an entire block of statements using compound statements like control ow or a simple beginkklend block Moreover new region scopes can be applied to individual statements within the compound statement eclipsing the enclosing scope for that statement but no others As an example all of the array statements in Section 24 could be written in a single block of statements though an admittedly nonsensical one as shown in Listing 27 The outermost region scope R serves as the enclosing region for lines 24 6 and 8 Lines 3 5 and 7 are enclosed by an oveniding region scope Floods and partial reductions as in lines 4 5 and 7 open additional region scopes that enclose their array arguments B for each statement in this example Region scopes should be thought of as being passive rather than active objects They do not cause things to occur but merely supply indices if needed for the array references that they enclose To this end statements may be enclosed by multiple region scopes of different ranks each of which can provide indices for array references of matching rank 55 Listing 28 An Example of Multiple Enclosing Region Scopes region RlD 1m R2D 1n lp var x integer Y RlD integer Z R2D integer RlD R2D begin x 1 Y 2 Z 3 end As an example consider Listing 28 In this fragment the assignment to X is a scalar and therefore does not require the enclosing region scopes at all The assignment to Y refers to a 1dimensional array and therefore makes use of the enclosing 1dimensional region scope RlD Likewise the assignment to Z is 2dimensional and uses R2D as its enclosing region scope The enclosing region scope that controls an expression s array references is referred to as its covering region 2102 Dynamic Region Scoping Region scoping occurs not only Within static blocks of code but also across procedure calls As an example consider Listing 29 The addmat procedure takes three array variables as arguments adding two of them and assigning to the third Note that since no region scope is speci ed Within the procedure each procedure call s enclosing 2D region scope Will be used during execution Thus the rst call performs the computation for all indices Within R the second performs it for the top row of R and the third performs it for the 1 column of R 56 Listing 29 A Demonstration of Dynamic Region Scoping procedure addmatvar X Y Z BigR integer begin X Y Z end R addmatABC TopRow addmatABC lmk addmatABC 2103 Region Inheritance Due to the scoped nature of regions it is often useful to inherit aspects of the enclosing region scope when opening a new region scope ZPL provides two mechanisms for inheri tance the blank dimension and the doubleiquole reference quot Each is described here Blank Dimensions When opening a dynamic region one or more dimensions may be inherited from the en closing region scope by omitting their de nitions As an example consider that line 4 of Listing 27 can be rewritten using a dynamic region as follows A gtgtl ln B However since this statement is enclosed by region R which also spans columns lkkln the second dimension can be inherited using a blank dimension as follows A gtgtl B Since oods require that all nonreplicated dimensions match this syntax can save some re dundant speci cation It is especially useful when the source region s indices are computed dynamically The same technique can be used to rewrite the partial reduction of line 5 in Listing 27 as follows 1 A ltlt 1m B 57 Listing 210 Region Inheritance Using DoubleQuote References R begin north of quot A 0 if refers to R south of quot A 0 if refers to R east of quot A 0 if refers to R west of quot A 0 if refers to R end Listing 211 Mask Inheritance Using a DoubleQuote Reference R with Mask begin A 0 k without quot A l 77 refers to Mask end Blank dimensions can inherit from a procedure s dynamically enclosing scope In addi tion they can be used to leave the size of formal array parameters unspeci ed For example the addmat procedure of Listing 29 could be written in a more general manner using blank dimensions as follows procedure addmatvar X Y Z integer This speci es that addmat takes three 2dimensional parallel arrays as its parameters but does not specify their size or indices DoubleeQuole References Doublequote references are used Within region scopes to refer to the enclosing region as a Whole This is especially useful With region operators For example the code fragment in Listing 210 zeroes out the four boundaries of variable A Figure 210a The rank of the inherited region is inferred from the direction supplied to the region operator For example in this listing since north is 2dimensional the enclosing 2dimensional region R is inherited As With blank dimensions the doublequote can be used to refer to a procedure s dynamically enclosing region scope 58 11 b Figure 210 Region Inheritance Examples In both diagrams White is used to represent 0 black to represent 1 and grey to indicate values that are untouched a The result of the assignments using doublequote references in Listing 210 b The result of the statements in Listing 211 using the same checkerboard mask as Figure 29 The doublequote can also be used to inherit a mask from the enclosing region scope For example in Listing 211 the inner region scope restricts the enclosing scope R down to its r h row It then inherits the mask from the enclosing scope determining its rank using that of the dynamic region Thus this code rst zeroes out A for all indices in R for Which Mask is true It then assigns the value 1 to all elements for Which it is false in the r h row of R See Figure 210 for an illustration 2 11 Scalar Promotion Scalar promotion is the idea of permitting a concept that is scalar in nature to interact naturally With a parallel array concept Scalar promotion is an intrinsic concept in ZPL For example most of the sample codes in this chapter have made use of scalar promotion by using the scalar assignment operator to assign one array expression to another Similarly the codes have applied scalar addition subtraction multiplication and modulus operators to array expressions With the understanding that the operator would be applied to corresponding elements of the arrays In these instances scalar promotion causes the operator to be applied to the array expressions one scalar at a time for all indices in the enclosing region The use is so intuitive that it is virtually transparent 59 Listing 212 An Example of Scalar Procedure Promotion var W V R double Res R integer R Res mycompw V R Res mycompw 0 The rest of this section explores the concept of promotion and its uses in ZPL beginning with a discussion of scalar conforrnability 2111 Conformability of Scalar Promotion When a scalar operator is promoted and applied to two array arguments ZPL requires that the expressions be of the same rank This means for example that scalar addition cannot be used to add a onedimensional array to a twodimensional array although a similar effect can be achieved by storing the onedimensional values in a twodimensional array with a ooded dimension Furthermore the result of any promoted scalar operator is an array expression with the same rank as its operands These are the requirements for array conforrnability in ZPL Just as scalar operators can be promoted so can scalar values As an example in Listing 28 the scalar constants 2 and 3 were assigned to parallel arrays Y and Z In these assignments the scalar is promoted much like a scalar operator The scalar value is treated as an array of appropriate rank that stores the scalar value in every location Scalar variables are much like arrays that are ooded in every dimension they are conformable with arbitrary indices in any dimension and they hold the same value at all locations However scalars are strictly more powerful than ood arrays in that they are conformable with arrays of arbitrary rank That is scalar values may interact with arrays of rank 1 2 etc whereas any userde ned ood array will have a xed rank 60 Listing 213 Using Shattered Control Flow to Compute an Array s Absolute Value R if A lt 0 then B A else B A end 2112 Procedure Promotion Just as scalar operators can be promoted using array operands so can scalar procedures be promoted using array actual parameters As an example the scalar procedure mycomp in Listing 24 can be applied to array arguments as shown in Listing 212 In the rst call arrays W and V are passed to mycomp an element at a time for all indices in R with the return value being assigned to the corresponding value of Res In the second call only the rst argument is promoted forcing the second argument a scalar to be promoted to act as a 2D array making the parameters conformable A promoted scalar procedure s actual parameters must have the same rank For ex ample it would be illegal to call mycomp with array arguments that were 2D and 3D respectively As expected the return value of a promoted scalar procedure will be promoted to the rank of its array parameters Note that procedure promotion only applies to scalar procedures That is procedures which refer to regions parallel arrays or ZPL s array operators may not be promoted In addition regions that use 10 modify global variables or call other parallel procedures are considered to be parallel to ensure deterministic execution 2113 Shattered Control Flow Just as scalar operators and functions can be promoted so can control structures condi tionals loops that are traditionally scalar in nature For example consider the conditional in Listing 213 which branches based on an array expression rather than a scalar value 61 Listing 214 Using Promoted Procedures Instead of Shards procedure absx integer integer begin if x lt 0 then return x else return x end end R B absA This conditional is evaluated for each element of A described by region R Array references within the body of the conditional refer to elements with the same indices at which the conditional was evaluated Thus the conditional in this example will assign each element of B the absolute value of its corresponding element in A for all indices in R This promotion of control structures is referred to as shattered control ow because the single thread of control which is implicit in traditional ZPL statements may now take different actions on an elementbyelement basis In effect it is shattered giving each index its own logical thread of control At the end of the shattered control ow statement or shard for short the conceptual threads are joined and a single thread of execution resumes It should be noted that shards are similar to inlining a promoted scalar function For example the code in Listing 213 could be rewritten as shown in Listing 214 For this reason the bodies of shattered control ow statements have many restrictions similar to those for promoted scalar procedures In particular they may not contain regions or parallel array references whose rank differs from that of the controlling expression Most array operators are also disallowed in shattered control ow expressions though limited uses of the operator are allowed corresponding to passing references to a procedure by value 62 Table 22 Formal De nition of Writing an Array Within a Region Scope A aaa Where is de nedoverI I is normal or singleton I is ooded Legal if Q Ille al since Cand Ais normal or singleton Ag i f g Wntes Ni EC 4 s values must be identical Ais ooded Illegal since lng Legal Writes a 212 Array Operators More Formally Now that regions have been formally de ned and their uses have been described more completely the array operators can be de ned more formally This section gives a more precise de nition of the array operators and also for the legality of reading and writing arrays Within an enclosing region For simplicity these de nitions are given for the single dimensional case Multidimensional cases simply extend these rules by applying them to each dimension independently in the natural manner We begin by de ning simple array writes and reads 2121 Array Writes Arrays can be modi ed by being on the lefthand side of an assignment operator or by being passed by reference to a promoted scalar procedure To test the legality of a write to an array each dimension of its de ning region r must be compared to that of the enclosing region scope of matching rank r Table 22 summarizes the different cases that are possible classifying them based on Whether the dimensions of the enclosing region and the array s de ning region are singleton ooded or a normal range of indices In the case that neither dimension is ooded the write is legal so long as the array is declared over the indices referenced by the region When both dimensions are ooded the 63 Table 23 Formal De nition of Reading an A1ray Within a Region Scope A4A A4 1AA4 Where a is de ned over I I is normal or singleton I is ooded Legal if Q 6ng Legal is nonnal or singleton E Reads a W E g Reads 1 Vi E g is ooded Illegal since gb Legal Reads 3 WIite is legal and the single replicated value Will be modi ed As descIibed in Section 27 the cases in Which one dimension is ooded but the other is not are illegal due to the fact that one index set represents an in nite index range While the other is nite 2122 Array Reads The legality of an array read is de ned in Table 23 In most cases airay reads are identical in legality to airay writes The one exception is that it is legal to read an array s ood di mension Within a non ooded region dimension In this case the programmer is specifying that a nite subset of the in nite index space be read Which makes sense All of the other cases match their airay WIite counteIparts 2123 The Operator To be legal the airay and direction supplied to an operator must match in dimensionality This rank is also used to determine the enclosing region r As With array reads and writes the dimensions of r and the array s de ning region a must be considered Table 24 de nes the legality of each case For each legal reference the transformation from the airay s data space to the region s index space is also given indicating Which array values are read for each index in the enclosing region 64 Table 24 Formal de nition of the Operator 1 A whereAisde nedovera a is normal or singleton a is ooded Legal if Ir at 6 g 1a Legal r is normal or singleton Returns A at39iVi E Ir Returns A at i Vi E 1139 1 is ooded Illegal since Ir D 11 Legal Returns A at The cases in which one or both of the regions have a ood dimension are identical to a traditional array read This implies that applying the operator to a ood dimension has no effect as one would expect When neither dimension is ooded the legality condition is similar to that of an array read if the array is declared for the region s indices shifted by the offset the reference is legal The array reference evaluates to the values located at those shifted indices 2124 The Flood Operator Evaluating a ood operator differs from previous sections in that two regions are involved the source s and destination d regions of the ood The rst legality constraint is that the array expression being ooded must match the source region in dimensionality This rank is also used to determine the enclosing region so these will match as well In addition it must be legal to read the array expression using the source region as its covering region If these conditions are met corresponding dimensions of the source and destination region are compared with the possible outcomes summarized in Table 25 Floods are typically used to replicate a single value across a range of indices making the cases where s is a singleton and d is normal or ooded the interesting ones These uses of the ood operator cause the value at the index indicated by the source region to be referenced for 65 Table 25 Formal De nition of the Flood Operator d gtgts A s is normal 5 is singleton s is ooded Legal if 5 1d Legal Returns Ak at Legal dis normal Returns A at i Where Is Returns A4 at i W EId ViEId ViEId Legal Returns Ah at 1 Legal Illegal since dis singleton Where 15 Returns A at i WdJI lt WSJI d Where Id Illegal since Legal Returns Ah at Legal dis ooded Id gt IS gt 1 Where 15 Returns All at 4 all indices in the destination region The case Where d is also a singleton is considered a degenerate case the value is replicated over that single index When 5 is ooded or s and d are both normal and equal the reference is treated as a traditional array read When 5 is normal but d is not replication is nonsensical so these cases are illegal 2125 The Reduce Operator The reduce operator also utilizes a source and destination region Once again the argument expression and the source region must match in rank and it must be legal to read the argument in the context of the source region Table 26 summarizes the different cases for reduction operators The cases are essen tially the dual of the ood operator as one would expect For simplicity the table describes a sum reduction though other operators may be substituted by replacing the summations in the de nitions The interesting cases reduce a range of values to a single value and these occur When 5 is normal and d is either a singleton or ood dimension The degenerate case 66 Table 26 Formal De nition of the plus Reduce Operator d ltlts A s is normal 5 is singleton s is ooded Legal if 15 1d Legal Illegal since 1sl 1 d is normal Returns A at Returns A at i and l1dl gt 1 Vi61d Vi61d Legal Returns Legal Returns Aj at 1 Legal dis singleton ZjE1sAj at Where 15 and Returns 14 at i Where 1d 1d Where 1d Legal RCtUIHS Legal Returns Aj at Legal 1 is ooded 21515 Aj at Where 15 Returns A at occurs When s is a singleton causing the reduction to be trivial If s is ooded or s and d are normal and equal the dimension is treated as a traditional array read The only case that is illegal is trying to reduce a single value down to a range of values Which is nonsensical Full reductions are less complicated than partial reductions Legality is determined by Whether the array can be legally read Within the covering region of matching rank If it can all values described by that region are combined by the given operation and returned as a scalar 2126 The Remap Operator The covering region for a remap expression is determined not by the rank of the source array but by that of the map arrays being applied to it all of Which must have the same rank The number of map arrays must equal the rank of the source array so that they provide an index for each of its dimensions In addition it must be legal to read each map array Within the context of the covering region 67 Table 27 Formal De nition of the Remap Operator 1 AMWhereAisde nedovera a is normal or singleton a is ooded ris normal or Legal if M E Ia Vi E Ir Legal singleton Returns AM at Vi E Ir Returns A at 139 Vi 6 11 Legal if Mr E 1a Legal r is ooded Returns A M at Returns A at The singledimensional case is de ned by Table 27 If neither the covering region 1 nor the source array s de ning region a are ooded the reference is legal as long as the array is declared for the indices described by the map array The values corresponding to the map indices Will be returned by the reference If r is ooded the map array must be ooded as well in order to be read The reference Will therefore be legal if the source array is de ned for the index stored in the map array s unique location and the value corresponding to that index Will be returned If a is ooded the value of the map array is inconsequential Regardless of its value the single de ning value of the source array Will be returned This case corresponds to a traditional array read The multidimensional case is handled using the obvious extension the legality of each dimension is tested independently and the indices for each dimension are determined by reading each map array in turn For each index in the covering region the single value in the source array de ned by the map arrays indices is referenced 213 Files and InputOutput Console and le IO have not been a primary focus of research in ZPL nor Will they serve a large role in this dissertation but they deserve the very briefest mention ZPL supports the 68 ability to open les for reading and writing and also supports the standard console input output and error streams zin zout and zerr respectively ZPL supports read write and write1n statements that can be used to read or write expressions to one of these streams or to a le Expressions can be formatted using control strings like those accepted by CS printf and scanf routines Binary 10 is supported using the bread and bwrite statements Array expressions are read or written for all indices in the enclosing region scope of the same rank in rowmajor order with some minimal formatting in the case of text output 214 ZPL Summary This chapter s description of ZPL concludes with a brief recap of its contents To summa rize ZPL contains traditional scalar language constructs using a Modula based syntax In addition ZPL supports con guration variables that serve as runtime constants and can be set on the resulting executable s command line ZPL supports arraybased programming using the concept of the region to represent a regular rectilinear set of indices Regions may be named or speci ed inline A re gion s dimensions can represent a range of indices potentially strided a single index or a replicated index using a ood dimension Region operators may also be used to create new regions from existing ones Regions are used to declare parallel arrays which are the primary unit of computation in ZPL The language also supplies builtin Indexi array constants which evaluate to their own indices in a particular dimension Regions are also used to de ne region scopes which passively provide indices forparal lel array references and expressions of matching rank Array operators are used to transform a region s indices as applied to a particular array expression Array operators support trans lation replication reduction or general remapping of an array s values Region scopes are dynamically scoped and may inherit from their enclosing scopes of matching rank Masks can be applied to region scopes to lter out a subset of their indices 69 ZPL allows the promotion of scalar operators values functions and control ow to interact with arrays in a natural manner It also contains support for binary and text 10 of scalar and array expressions to les or the console Nagging Questions At this point it is likely that there are several aspects of ZPL which seem arbitrary or strange For example Why does ZPL prevent interactions between regions and arrays of different rank if they are the same shape Since the remap operator can be used to express translations oods and reductions why does ZPLbother supporting other array operators Why can ZPL regions only be applied to statements and certain array operators rather than arbitrary expressions Why are ood dimensions nonconformable with singleton dimen sions given that they each represent a single set of de ning values Why are references not allowed to be passed by reference to parallel procedures The answers to these questions are based on the parallel interpretation of regions and arrays and therefore will have to wait until the following chapter For now let us turn our attention to some sample applications written in ZPL 2 15 Sample Codes This section contains several sample applications written in ZPL The problems consid ered are the Jacobi iteration matrixvector multiplication matrix multiplication and tridi agonal matrix multiplication These applications were chosen because they are simple wellknown and useful for demonstrating the language features described in this chap ter Most of the problems have a few different implementations to illustrate different ap proaches in ZPL For a larger variety of application domains in ZPL please consult the literature WVGSOO DLMW95 RBS96 LLST95 Sny99 70 Figure 211 The Jacobi Iteration 2151 Jacobi Iteration The Jacobi iteration is a simple relaxation method for solving Laplace s equation on a regular grid BBC94 Given an initial approximate solution it re nes the values using a veipoinl stencil until the solution converges Within some tolerance e The vepoint stencil simply replaces each value by the average of its neighbors in the four cardinal di rections See Figure 211 for an illustration The Jacobi iteration can be used for example to approximate the electric potential in a at metal sheet Whose edges have a xed electric potential Listing 215 shows an implementation of the Jacobi iteration in ZPL This code makes use of many of the concepts that this chapter introduced con guration variables regions directions and parallel arrays region inheritance using blank dimensions and doublequote references the operator and full reductions promotion of scalars operators and proce dures and 10 The code begins With the program statement Which names the program and identi es the code s entry procedure Lines 3 5 declare three con guration variables 11 Which serves as the size of the grid epsilon Which speci es the termination condition and verbose Which indicates Whether or not to print verbose output during the program s run Lines 7 8 declare two regions for the program The rst R is the region Which speci es the size of the regular grid The second region BigR is used to declare the main data array Which requires an extra row and column in each direction to store boundary values 71 Listing 215 The Jacobi Iteration 1 program jacobi 2 2 config var n integer 100 H the problem size A epsilon double 000001 H the convergence condition 5 verbose boolean false H Verbose output 6 7 region R 1n ln H the computation indices 8 BigR 0nl 0nl H the declaration indices 9 1n var A BigR double H the main data Values 11 New R double H the new iteration s Values 12 delta double H change between iterations 13 1A direction north l 0 H the four cardinal directions 15 south l 0 16 east 0 l 17 west 0 1 18 19 procedure initvar X double H array initialization routine 2n begin 21 X 0 22 north of quot X 00 22 south of quot X 10 2A east of quot X 00 25 west of quot X 00 26 end 27 28 procedure jacobi H the main entry point 29 R begin 2n initA 31 22 repeat 22 New Aenorth Aesouth H fiveepoint stencil on A 2A Aeeast A9west40 35 2s delta maxltlt fabsA New H find maximum change 37 38 A New H copy back 29 until delta lt epsilon H continue while change is big AU A1 if verbose then A2 writelnquotAnquot A H write data if desired A2 end 44 A5 writelnquotdelta lequot delta H always write delta A6 end 72 Lines 10 12 declare the variables for the problem Array A serves as the primary data array which is declared over region BigR to store the boundary values Array New stores the new values computed during each iteration and requires no boundary values so it is declared using region R The variable delta is a scalar value that is used to store the maximum change that an array value undergoes in a single iteration Lines 14 17 declare the four cardinal directions used to express the vepoint stencil Lines 19 26 declare a procedure init that is used to initialize the data array A Note that this procedure is written in a generic manner for twodimensional arrays taking a 2D array of any size as its input parameter and containing statements that rely on dynamic region inheritance The procedure zeroes out the array for all indices speci ed by the dynamically enclosing region scope as well as its north east and west boundaries The southern boundary is initialized to 10 The main procedure spans lines 28 46 It opens a region scope using R that supplies indices to all parallel expressions within the procedure It also serves as the enclosing region for the call to init on line 30 The main computation takes place in lines 32 39 Lines 3334 compute the 5point stencil on A using the operator and the four cardinal directions The result is stored in the array New Next in line 36 the scalar fabs routine is promoted across the array expression A New The f abs routine is part of the standard C library and is included in ZPL s standard context This computes the absolute value of the difference between corresponding elements of A and New The resulting array of values is then collapsed to a scalar using the max reduction operator and assigned to delta The new values are assigned back into A in preparation for the next iteration in line 38 This loop is repeated until delta falls below the convergence value epsilon Lines 41 45 output the results If the verbose ag is true line 42 prints the values of A described by R to the console in rowmajor order The nal value of delta is printed using exponential notation in line 45 and the program exits 73 2152 MatrixiVector Multiplication Matrixvector multiplication is a fundamental operation that is used in a wide variety of numerical computations This section considers two possible implementations using 2D and 1D vector representations 2D Vector Implementation Listing 216 shows an implementation of matrixvector multiplication in ZPL Though a fairly simple program it demonstrates the use of ood dimensions le IO partial reduc tions and the remap operator Typically matrices are thought of as being 2dimensional while vectors are considered 1dimensional However since ZPL makes interactions between 1D and 2D arrays non trivial this program represents all vectors using 2D arrays with either a ood or singleton dimension In particular it uses a ooded row array to store the vector argument so that its values will conform to all rows of the matrix Lines 3 5 declare the con guration variables The values In and n are used to repre sent the number of rows and columns of the matrix respectively The third con guration variable is of the string type and stores the lenarne for reading the matrix and vector Lines 7 10 declare the regions for this program Region R is the base region which describes the matrix indices Lines 8 9 declare two row regions TopRow a singleton row and RowVect a ooded row Line 10 declares a singleton column region ColVect that describes the result of the multiplication Arrays are declared for each region in lines 12 15 The matvectmult procedure itself spans lines 17 34 Lines 20 23 open the le speci ed by the filename con guration variable and read values for matrix M and input vector I from it Line 25 uses the ood operator to assign a replicated copy of the input vector to V the ooded vector Note that the source region for the ood is a dynamic region that inherits its second dimension from RowVect Equivalently the region TopRow could have served as the source region The dynamic region is used here for illustrative purposes 74 Listing 216 MatIiXVector Multiplication Using 2D Vectors 1 p rogram matvec tmult 2 2 config var m integer 100 A n integer 100 5 filename string quotMVdatquot 6 7 region R 1m ln 8 TopRow 1 ln 9 RowVect ln 1n ColVect 1m n 12 var M R double 12 I TopRow double M V RoWect double 15 S ColVect double 17 procedure matvectmult 18 var infile file 19 begin 2n infile openfilename fileiread 21 R readinfile M 22 TopRow readinfile I 22 closeinfile 25 RowVect V gtgtl I 27 ColVect begin 28 s ltltR M V 29 21 write1nS 21 end 32 22 H RoWVect V SIndex2 n M end number of matrix rows number of matrix columns input filename matrix index set top row of the matrix row Vector index set col Vector index set the matrix the input Vector the Vector flooded the solution Vector open file read matrix Values read Vector Values close file flood the input Vector matrixevector mul t transpose solution 75 The actual matrixvector multiplication takes place on line 28 Since V is ooded in its dimension all of the vector values are aligned with the appropriate matrix values in M Thus they can simply be multiplied elementvvise using scalar multiplication over region R Since the solution vector is formed by summing the products in each row a partial sum reduction is used to reduce the data from R down to the singleton column ColVect This represents the solution which is written to the console in line 30 In many matrixvector multiplications the matrix is square and the solution vector must be used in subsequent multiplications With this in mind line 33 indicates how the solution vector could be reassigned to a row vector using the remap operator In particular the column index Index2 of the row is used to access the rst dimension of S while the con guration variable 11 is promoted to access the second dimension It should be noted that region RowVect and array V could be completely omitted from this program by inlining the ood expression into the matrixvector multiplication state ment as follows 5 ltltR M gtgt1 I For this discussion this version was not used due to the fact that it is somewhat less clear and does not demonstrate the use of ood dimensions Alternatively region TopRow and array I could be removed from the program by read ing directly into array V While this would make the program even clearer it was not used for this discussion in order to demonstrate the ood operator ID Vector Implementation What if users really want to store their vectors as 1dimensional arrays is it possible in ZPL Certainly although the next chapter demonstrates that there may be compelling reasons to avoid such an implementation This section illustrates matrixvector multipli cation using a 1D vector representation For this program and all that follow IO will be omitted for brevity 76 Listing 217 MatrixVector Multiplication Using 1D Vectors p rogram matvec tmult l 2 3 config var m integer 100 H number of matrix rows A n integer 100 H number of matrix columns 5 6 region R 1 m l n H matrix index set 7 InVect 1n H 1D input Vector indices 8 OutVect 1 m H 1D output Vector indices 9 1n var M R double H the matrix 11 V InVect double H the input Vector 12 P R double H an array of products 13 S OutVect double H the solution Vector 14 15 procedure matvectmult 16 R begin 17 P M VIndex2 H compute the mults 18 H then sum the rows 19 OutVect n S ltltR PIndex1 11 21 end Listing 217 shows one way of writing such a code To make a rather complex operation somewhat more readable it has been broken into two lines 17 and 19 Line 17 computes the m 71 products storing them in array P These products are computed using the remap operator to read the 1D input vector V as though it was a 2D array Recall that the number of map arrays in a remap must match the rank of the source array 1 in this case and that the rank of the result is inferred by the rank of the map arrays This program uses Index2 as its map array which has ambiguous rank since it is conformable to arrays of rank 2 or greater However in this case it must be 2D to allow the remap expression to conform to the multiplication with 2D array M Line 19 adds up the rows of P assigning the result to S This is done using a partial reduction as in the previous version using source region R and destination region n which inherits rows 1 m from R Since storing the result in a 2D column vector seems contrary to the spirit of this approach it is immediately remapped for assignment to S using Indexl and n as its map arrays As in the previous statement Indexl and the scalar n 77 Listing 2 18 The SUMlVIA Algorithm in ZPL 1 program summa 2 3 config var m integer 100 H first dimension A n integer 100 H inner dimension 5 o integer 100 H last dimension 6 7 region RA 1m ln H indices for A 8 RB 1n lo H indices for B 9 RC 1m lo H indices for C ID 11 var A RA double H matrix A 12 B RB double H matrix B 13 C RC double H result matrix C 14 15 procedure summa 16 var i integer 17 RC begin 18 C 0 H Zero C 19 21 for i l to 11 do H loop over inner dim 21 C gtgt i A gtgti B H cross ith col of A 22 end H With ith row of B 23 end have ambiguous rank but they can be inferred to be ID by context due to the assignment to S The assignment itself is controlled by the enclosing 1D region scope OutVect That was fairly painful The next chapter will show that this is not without good reason 2153 Matrix Multiplication Matrix multiplication is yet another fundamental operation and one that was used as mo tivation throughout Chapter 1 This section presents three different algorithms for matrix multiplication The S UMMA Algorithm As described in the introduction the SUMNIA algorithm for matrix multiplication is con sidered one of the most scalable parallel approaches VdGW95 It has a fairly straight 78 forward implementation in ZPL due to the support for replication provided by the ood operator See Listing 218 for an implementation The program is fairly simple The size of the matrices is speci ed by three con guration variables m n and o A region is declared for each of the matrix sizes and an array declared for each matrix The execution is controlled by region RC since all computations are done with respect to the result matrix C First C is zeroed out in line 18 Then a loop is opened which speci es the n iterations of the algorithm On iteration 239 column 239 of A and row 7 of B are ooded across RC and multiplied elementwise accumulating into C At the end of the loop C holds the result Cannon s Algorithm Cannon s algorithm takes a systolic approach to matrix multiplication illustrated in Fig ure 212 The algorithm begins by skewing the rows of A and the columns of B In particular each row i of A is cyclically shifted 239 1 columns to the left Similarly column 239 of B is shifted 239 1 rows upward This has the effect of shifting A s main diagonal into its rst column and B s main diagonal into its rst row Matrix C is initialized to contain all zeroes The main algorithm consists of n iterations On each iteration the initial m X 0 elements of each matrix are multiplied elementwise and accumulated into C The A matrix is then cyclically shifted one row to the left and B is cyclically shifted one column upward When all n iterations have completed C contains the resulting matrix Listing 219 shows an implementation of Cannon s algorithm written in ZPL The dec larations are identical to those of the SUMMA algorithm except that additional copies of A and B are declared to hold the skewed versions of the arrays This was done in order to leave the original arrays unperturbed Note that these copies could be eliminated by skew ing the original matrices and then un skewing them at the end of the computation Here the extra copies are used for simplicity In addition to the extra arrays two directions are declared for use in the shifting 79 matrix A skewed A skewed B matrix B multiply elementwise Y ske W rotate B s columns rotate A s rows matrix C repeat n times Figure 212 Cannon s Algorithm For lVIatIix Multiplication 79 matrix A skewed A skewed B matrix B multiply elementwise Y ske W rotate B s columns rotate A s rows matrix C repeat n times Figure 212 Cannon s Algorithm For lVIatIix Multiplication 80 Listing 219 Cannon s Algon39thm in ZPL 1 p rogralll c annon 2 3 config var m integer 100 if first dimension A n integer 100 if inner dimension 5 o integer 100 77 last dimension 6 7 region RA 1m ln if indices for A 8 RB 1n lo if indices for B 9 RC 1m lo if indices for C 1U n var A RA double 7 matrix A u Askew RA double 7 skewed matrix A w B RB double 7 matrix B M Bskew RB double 7 skewed matrix B m C RC double 7 result matrix C 16 n direction nextcol 0 l 7 directions for shifting 18 nextrow 1 0 m procedure cannon m var i integer RC begin Skew A s rows and B s columns RA Askew AIndex1 Index2 Indexl 2n 1 RB Bskew BIndex1 Index2 2n l Index2 C 0 77 Zero C for i l to n do C Askew Bskew if accumulate into C RA Askew AskerAnextcol 7 shift Askew RB Bskew BskerAnextrow 7 shift BSkew end end 81 matrix A pmmote 4 1 n 390 3D mum 1 1 tw39 4 p y e emen rse u 39 v and sum in third 1 g d Imenslon mm B demote 0 2D m pmmote 4 to 3D mm c a b Figure 213 The PSP Algorithm For Matrix Multiplication The initial skewing of the arrays is implemented in lines 24 25 using the remap operator and map expressions involving the Indexl and Index2 constant arrays Matrix C is zeroed out in preparation for the main computation Within the main loop line 30 performs a single elementvvise multiplication of the skewed matrices accumulating the products into C Lines 32 33 use the wrap oper ator to shift Askew and BSkew for the next iteration At the end of the program C Will contain the result matrix as expected PSP Algorithm A third algorithm to consider is an instance of problem space promotion PSP CLS99 Problem space promotion is the idea of turning instances of iterations in an algorithm into explicit data dimensions In particular the PSP matrix multiplication algorithm converts the loop from 1 to n in the SUMNIA and Cannon algorithms into a third data dimension By doing so the m n o multiplications required for the matrix product are represented by a 3D index space Figure 213a Conceptually matrix A represents one face of the box While matrix B represents a second perpendicular face The algorithm proceeds by 82 Listing 220 PSP Matrix Multiplication in ZPL 1 p rogram matmultpsp 2 2 config var m integer 100 H first dimension A n integer 100 H inner dimension 5 o integer 100 H last dimension 6 7 region RA lm ln H 2B indices for A 8 RB ln lo H 2B indices for B 9 RC lm lo H 2B indices for C 1n R3D lm ln lo H 3B index space 11 RA3D lm ln H 3B indices for A 12 RB3D ln lo H 3B indices for B 12 RC3D lm l lo H 3B indices for C 14 15 var A RA double H matrix A 16 B RB double H matrix B 17 C RC double H result matrix C 18 A3D RA3D double H matrix A in 3D 19 RB3D double H matrix B in 3D 2n C3D RC3D double H matrix C in 3D 21 22 procedure matmultpsp 22 begin 2A RA3D A3D AIndex1 Index2 H promote A to 3D 25 RB3D B3D BIndex2 Index3 H promote B to 3D 26 27 RC3D C3D ltltR3D A3D B3D H compute C in 3D 28 29 RC C C3DIndex1 l Index2 H demote C to 2D an end replicating these faces throughout the box computing their elementWise products and then summing along the third dimension to form C This idea is illustrated in Figure 213 In ZPL the m 39rL 0 elementWise products need not be represented explicitly but can be expressed using ood dimensions and a partial reduction See Listing 220 for an imple mentation The code declares the same 2D con guration variables regions and arrays as in the previous codes However it also declares a 3D region to represent the 3dimensional computation space and three faces Within that space two ood regions for the argument arrays and a third singleton region for the result 83 The algorithm begins by using the remap operator to align the 2dimensional A and B matrices in the 3D space lines 24 25 The computation itself is expressed in line 27 which multiplies values of A and B within R3D and then reduces the products to the third plane of the space Finally in line 29 the result array is mapped from 3D back to 2D 2154 Tridiagonal Matrix Multiplication As a nal application area consider the multiplication of two tridiagonal matrices Though any of the algorithms from the previous section can be used for this problem the presence of so many zeroes allows more specialized techniques to be used In particular the result ing product will be a pentadiagonal matrix whose values are formed from the products of neighboring values in the tridiagonal argument matrices See Figure 214 for an illustration Since this code only needs to reference nearby neighbors our implementations will use the operator rather than the oods and reductions of the previous matrix multiplication al gorithms Maskibased Solution One approach for implementing tridiagonal matrix multiplication in ZPL is to use a mask to restrict computation to one of the ve resulting diagonals at a time An implementation of this approach is given in Listing 221 The implementation begins by declaring the problem size in line 3 and a region to describe the matrix indices in line 5 A larger region BigR is also declared to store the argument matrices such that references can spill outside of the main problem area The mask arrays and directions are declared in lines 8 16 The implementation begins by zeroing out C so that all values not lying on the pentadi agonal will be correct This could be done using a mask over all nonpentadiagonal indices but the approach shown is asymptotically equivalent and used for simplicity Next each di agonal is computed one at a time by setting the mask using an expression that compares the 84 matrix C matrix A matrix B main diagonal 39 39 39 E diagonal 71 diagonal 72 diagonal 1 diagonal 2 a matrix C matrix A matrix B g1 AeastBnorth lt i ABnorth AeastB eastBnorth AB AwestBsouth AwestB ABsouthlt AwestBsouth nothing 4 b Figure 214 Tn39diagonal Matrix Multiplication 85 Listing 221 TIidiagonal MatIix Multiplication in ZPL Using Masks 1 program trimask 2 3 config var n integer 100 H assume n x n arguments A 5 region R 1n ln H the base matrix size 6 BigR 0nl 0nl H matrix with boundaries 7 8 var A BigR double H matrix A 9 B BigR double H matrix 13 11 C R double H the product matrix C 11 Mask R boolean H mask for selecting diagonals 12 13 direction north l 0 H the four cardinal directions 1A south l 0 15 east 0 l 16 west 0 l 18 procedure trimask 19 R begin 2U 21 22 23 2A 25 26 27 28 29 an 31 32 33 3A 35 36 37 38 39 An 41 A2 43 AA Assume We ve zeroed A and 13 s boundaries C 0 H Zero out C Mask lowest diagonal 72 and compute Mask Indexl Index2 2 II with Mask C Aeeast Benorth compute diagonal 71 Mask Indexl Index2 1 II with Mask C A Benorth Aeeast B compute main diagonal Mask Indexl Index2 II with Mask C Aewest Benorth A B Aeeast Besouth compute diagonal l Mask Indexl Index2 1 II with Mask C Aewest B A Besouth compute diagonal 2 Mask Indexl Index2 2 II with Mask C Aewest Besouth end 86 Indexl and Index2 arrays The computation of each diagonal s values is expressed in a straightforward manner using the mask to restrict it to the appropriate values At the end of the program C contains the matrix product Shattered Control Flow Solution A second implementation is very similar to the rst but uses shattered control ow rather than a mask The obvious advantage is that no time or space are required to compute and store the mask The declarations are identical to the mask based version The main computation con sists of a shattered conditional that branches based on the relative values of Indexl and Index2 Since the comparison of these arrays implies that they can be of any rank greater than 1 the body of the conditional is examined to determine that this is a 2 dimensional conditional due to its references to A B and C Each branch of the conditional simply assigns C using that diagonal s de nition The else clause at the end causes all non pentadiagonal values to be zeroed out Compact Solution The nal implementation uses a more compact representation for the banded matrices In particular it uses an n x 3 region Tri to represent the tridiagonal matrices and an n x 5 re gion Pent to represent the resulting pentadiagonal The regions second dimensions refer to the diagonal numbers rather than matrix columns and are therefore numbered between 2 2 as appropriate Note that directions north and south have been transformed to ne and sw to re ect this index space transformation The tridiagonal region is also ex tended by an additional row in each direction to handle references that spill outside of the array s bounds The computation proceeds by opening a single dynamic region per diagonal which in herits its row dimension from the enclosing region Pent The expression to compute each 87 Listing 222 TIidiagonal MatIix Multiplication in ZPL Using Shattered Control Flow 1program trishard 2 3 config var n integer 100 A 5 region R 1n ln 6 7 BigR 0nl 0nl 8 var A BigR double 9 1U 11 a direction north B BigR double C R double south east west n procedure trishard m R begin Assume A and B s boundaries are zeroed out 19 2D 21 22 23 2A 25 26 27 28 29 an 31 32 33 3A 35 assume n x n arguments the base matrix size matrix With boundaries matrix A matrix B the product matrix C the four cardinal directions shatter control flow based on the row and column indices if Indexl Indexz 2 then C Aeeast Benorth elsif Indexl Index2 1 then if compute diagonal 72 if compute diagonal 71 if compute main diagonal C Aewest Benorth A B Aeeast Besouth if compute diagonal 1 if compute diagonal 2 77 Zero all other indices C A Benorth Aeeast B elsif Indexl Index2 then elsif Indexl Index2 1 then C Aewest B A Besouth elsif Indexl Index2 2 then C Aewest Besouth else C 0 88 Listing 223 Tridiagonal Matrix Multiplication in ZPL Using Compact Arrays 1 program tridense 2 2 config var n integer 100 H assume n x n arguments A 5 region Tri 0nl l 1 H dense tridiagonal storage 6 Pent 1n 2 2 H dense pentadiagonal storage 7 8 var A Tri double H matrix A 9 B Tri double H matrix 13 11 C Pent double H the product matrix C 11 12 direction me l l H northeast acts as north 12 SW 1 1 H southwest acts as south 1A east 0 l H east 15 west 0 1 H West 17 procedure tridense 18 Pent begin 19 Assume A and 13 s boundaries are zeroed out 2U 21 one statement per diagonal in the product 22 2 C Aeeast Bene 22 l C A Bene Aeeast B 2A 0 C Aewest Bene A B Aeeast Besw 25 l C Aewest B A Besw 2s 2 C Aewest Besw d 27 en diagonal is the same as in the previous codes but substitutes me for north and sw for south This implementation is attractive because it uses an amount of memory proportional to the number of interesting values in the problem rather than the conceptual problem space However this has the disadvantage of making it more awkward to operate on tridiagonal matrices in conjunction with traditional H X n matrices For example adding a traditional matrix to a tridiagonal matrix in this format would require the remap operator to transform one index space to the other The next chapter will reexamine all of the sample codes in this section and further evaluate their strengths and weaknesses in the context of a parallel implementation 89 216 Related Work This section describes alternatives to regionbased programming that are used to express array computations in other languages Its focus is restricted to the indexing mechanisms of sequential languages Parallel languages will be covered in the related work section of the following chapter 2161 Scalar Indexing The oldest and most prevalent form of expressing array computation is scalar indexing or array subscripling as found in languages such as FORTRAN its later incarnation as FORTRAN 77 F77 and more recent languages such as C Mac87 Bac98 Sec78 KR88 In each of these languages basic operations such as assignment and addition are de ned only for scalar values and promotion is not supported As a result operations on arrays must be written to explicitly loop over the index space and reference the array s values one at a time Scalar indexing allows the programmer to specify a single value per dimension in order to specify a single array value For example adding two arrays might appear as follows in F77 The primary disadvantage of scalar indexing is that it burdens the programmer with the task of performing the explicit looping and subscripting required to express array compu tations This can quickly become a tedious task that requires more keystrokes than it does intelligence Moreover the loops required by scalar indexing describe a sequential order ing on the operations that runs counter to parallelism This is not a problem in a sequential context but can complicate the parallelization of languages using scalar indexing in the parallel domain 90 Scalar indexing does have the advantage of being a very simple and general mechanism for expressing array computations For instance there is no need for ZPL s array operators nor its restrictions governing what types of expressions can and cannot interact Moreover conformability rules in such a language are simple since all arguments to an operator must be scalars the only check required is that all array dimensions are being indexed 2162 Vector Indexing In the late 1950 s Kenneth Iverson developed a mathematical notation that was designed to clarify some of the ambiguities that he felt standard mathematical notation contained Shortly thereafter this notation evolved into APL A Programming Language one of the earliest higher level programming languages Ive62 Mac87 APL was the rst pure array based programming language since all data items in the language are arrays scalars are simply arrays of rank 0 APL s arrays support vector indexing in which the programmer supplies a vector of indices per dimension The outer product of these vectors speci es the indices of the array In this way array references no longer refer to a single value of the array but a subarray of values or possibly the entire array For example matrix addition would appear as follows in APL C1LM1LNlt A1LM1LNB1LM1LN In this notation bk represents a k element vector containing values from 0 to k 1 sim ilar to ZPL s Indexi arrays Each element is incremented to use 1 based indexing to be consistent with the Fortran implementation Thus the outer product of these vectors causes each array reference to refer to elements 1 1 m Addition 9 and assignment 4 are promoted across the elements as in ZPL and the sum is computed Although APL contained many elegant and revolutionary ideas it has not remained in widespread use over the years Detractors nd it too terse and unreadable due primarily to its large set of unique operators most of which require non standard characters like the lt used for assignment above Though enthusiasts are quick to rush to its defense it 91 remains largely unused and unknown today Even so its use of arrays as operators and vector subscripting have had an in uence on more modern languages including ZPL 2163 Array Slicing Modern array based languages have adopted syntax to support APL s array operands with out so much generality and built in support for mathematical operators Many of these languages have provided a syntax for accessing a regularly strided set of indices within an array This is known as array slicing or array sectioning and represents an excellent example of optimizing for the common case Consider for example how few of the ZPL programs from the previous section would require an APL style index vector that was not a simple expression One language with support for array slicing is Fortran 90 F90 ABM 92 a successor to F77 F90 allows the user to specify indices using a 3 tuple per dimension l h s These values are identical to the low high and stride values in ZPL s sequence descriptors and they can be used to express a wide variety of simple APL style Lexpressions In F90 serves as the alignment value which was the fourth value in ZPL s sequence descriptors Another language that supports slicing is Matlab an interactive interpreted matrix ma nipulation language Mat93 Matlab supports a slice notation similar to F90 s but without the stride value In both languages the low and high bounds may be omitted which cause the array s declared bounds to be used Omitting the stride in F90 results in a stride of 1 Omitting the slice notation altogether causes the entire array to be referenced Matrix addi tion would appear as follows in F90 and Matlab Clzn lzn Aln lzn Bln lzn As in APL both F90 and Matlab allow the programmer to use vector indexing though in practice this rarely seems to be used For example a ve element vector could be per muted in F90 using the following assignment Xl5 Y 25143 92 Naturally traditional scalar indexing may be used as well should slicing or vector indexing fail to express a desired array reference The primary advantage of array slicing over traditional indexing is that it is a more concise means of specifying operations over arrays or subarrays eliminating the need for explicit loops for many array operations Like scalar indexing slices allow for more general array interactions than ZPL s regions yet use a notation that is more concise and optimized for the common case than that of APL s vector indexing The chief disadvantage of array slicing is the syntactic overhead of specifying a region like set of indices for each array reference Though small examples like matrix addition are not so bad slice notation can become rather cumbersome and errorprone in larger array codes In particular the conforrnability requirements of array slices require the size and shape of each array operand to match causing redundant information to be supplied with each array reference Moreover these conforrnability rules require more analysis than scalar indexing or regionbased indexing both of which can be satis ed by simple checks of the arguments note that the related problem of bounds checking is common to all approaches and is not made simpler by any scheme 2164 Forall loops A nal array access mechanism that is often supported by languages to simplify scalar in dexing is the forall loop This structure iterates over a multidimensional index range or an arbitrary index set typically in an unspeci ed order In this sense forall loops repre sent universal quanti cation much like regions in that they generate a set of indices with which to operate Unlike regions forall loops tend to use iterator variables like traditional for loops These iterators are typically used to access an array s values using scalar index ing Thus forall loops can be considered a compact representation of a nested loop whose iteration order is unconstrained FDIL Finite DIfference Language SH89 HC93 is an example of an array language that uses forall loops FIDIL was designed for use in scienti c computation and supports 93 general index sets called domains Domains need neither be rectangular nor dense and FIDIL supports computation over them using settheoretic union intersection and differ ence operations Domains are used both to specify the structure of arrays maps and to provide index sets for FIDIL s forall loops Fortran 95 Geh96 also supports a forall loop structure that takes an array slice as its bounds and iterates over the indices that it describes In this context the forall loop is much more of a syntactic sugar as there is no languagelevel support for index sets as in FIDIL and ZPL 2 1 7 Evaluation To evaluate the impact of regionbased programming on a program s clarity the sample codes of this chapter are compared to sequential handwritten implementations in C Ap pendix A contains the source code for these C implementations for reference Each line of code in the ZPL and C implementations of the benchmarks is classi ed as serving one of three purposes 1 declaring a variable procedure or other identi er 2 computing the benchmark s result or 3 performing other nonessential work such as 10 initialization timing etc This study only considers lines in the rst two categories The graphs in Figures 215 and 216 indicate the number of useful lines and characters of code used by each implementation Characters were counted by removing all extraneous spaces and whitespace from the codes other than that which is required to represent the algorithms in a simple readable form The general trend shown in these graphs is that ZPL codes express computation with a conciseness similar to C both in terms of line and character counts These benchmarks also indicate that the coding effort in the ZPL versions of these benchmarks tends to be weighted more heavily towards declarations than computation This is a result of ZPL s use of highlevel named concepts like regions and directions to replace traditional loops and indexing Presumably the overhead of these declarations will be lessened in longer codes 94 Jacobi Line Counts Jacobi Character Counts 50 900 loania anon eciarations 40 m a n Dcornutation 395 8 600 o 30 5 5 20 j E 300 i 10 0 i 0 i C ZPL C ZPL Language Language a b MatVect Multiplication Line Counts MatVect Multiplication Character Counts 20 400 I edarations D corn ulation 15 g 3007 8 8 5 E 10 200 I 5 5 1007 0 0 c ZPL2D ZPL1D c ZPL2D ZPL1D Language Language c d Figure 215 Conciseness of Sample Codes These graphs display the number of useful lines and characters required for the sample Jacobi and matrixvector multiplication codes in ZPL and C 95 Matrix Multiplication Line Counts Matrix Multiplication Character Counts I dedarations I U Corn utation 393 3 o a u o a u m 2 2 E u 3 ii o C ZPL ZPL ZPL PSP C ZPL ZPL ZPL PSP SUMMA Cannon SUMMA Cannon Language Language 41 b Tridiagonal Multiplication Line Counts Tridiagonal Multiplication Character Counts 40 800 declarations D computation m 30 7 600 t o a u o a u w g 20 E 400 a u u 5 2 10 7 o 200 l 0 i i i 0 i i i C ZPL Mask ZPL ZPL C ZPL ZPL ZPL Shard Compact Mask Shard Compact Language Language C d Figure 216 Conciseness of Sample Codes continued These graphs display the num ber of useful lines and characters required for the sample normal and tridiagonal matrix multiplication codes in ZPL and C 96 where the same declarations can be used again and again amortizing the cost of declaring them over a larger code base The experiments in Chapters 5 and 6 support this hypothesis The following paragraphs give a few notes for each benchmark The Jacobi Iteration The Jacobi Iteration best demonstrates the bene ts of regionbased programming While most of the other benchmarks consist of a small number of array statements surrounded by a single region the Jacobi benchmark uses a number of diverse regions to establish its boundary conditions In the C code each of these regions requires its own loop demon strating the region s concise support for array computation Since most realworld codes will tend to require many regionsloop nests to express a computation this benchmark best demonstrates the concise power that a few appropriate ZPL declarations can have MatrixiVector Multiplication C and ZPL represent matrixvector multiplication using reasonably equivalent code sizes Once again the ZPL representations tend to be slightly more concise in terms of computa tion due to the use of regions rather than loops Matrix Multiplication Matrix multiplication represents a worstcase for ZPL simply due to the difference in com plexity between sequential and parallel matrix multiplication algorithms In particular se quential C algorithms can simply iterate over the matrices and compute on them inplace In contrast all of the parallel algorithms described by this chapter require some amount of data movement and copying The SUMNIA algorithm is ZPL s most concise implementation and the only one that is more compact than C s triplynested loop Cannon s algorithm requires signi cantly more computation due to its skewing operation and cyclic shifts It also requires additional decla 97 rations to create the directions used to implement the shifting The PSP algorithm requires the greatest amount of declarations due to its use of 2D and 3D regions to describe its two computational domains The PSP computation itself is fairly concise in terms of lines but these lines are long due to the Indexi expressions used to implement the alignment of arrays from 2D to 3D Tridiagonal Matrix Multiplication The handcoded C implementation of tridiagonal matrix multiplication uses a compact rep resentation of the matrices similar to the third ZPL implementation As a result these two codes are the most comparable in size due to their similar approach ZPL s mask and shardbased approaches tend to require more computation due to the overhead of restricting computation to the diagonals within the logical 2D index space In contrast the compact ZPL and C implementations can trivially isolate a single diagonal SM mmary In summary ZPL can represent these simple algorithms as concisely as C ZPL tends to re quire slightly less syntax to specify computation due to its use of regions to replace looping and indexing This effect is offset somewhat by ZPL s more verbose syntax eg Indexi constants the use of begin end rather than curly braces For the benchmarks stud ied here ZPL tends to require more declarations than C though it is expected that these declarations will be amortized in larger benchmarks by the savings in computation It is crucial to keep in mind that the ZPL implementations of this section differ from the C codes in one crucial way they represent fullyfunctional parallel implementations of the benchmarks whereas the C codes can only be run on a single processor For this reason regions must be considered a bene t to clarity since they support the expression of a more complex program using syntax that tends to be as concise as sequential C As the next chapter will show this cannot be said for most parallel languages 98 2 18 Discussion 2181 Bene ts ofRegions Though Section 216 argued that regions are somewhat less exible and adaptable than array indexing and slicing they are not without their bene ts This section describes the advantages that regions give programmers syntactically and semantically Cleaner Elementwise Operations Performing strict elementwise operations on arrays remains an extremely common case in arraybased programming Though interesting programs will require more complex in teractions between their arrays most large programs will still require many elementwise operations in addition to the more complex ones In these cases regions represent a posi tive evolution in array reference syntax Array slices can be seen as a factoring of F77 loop bounds into the array references in order to optimize the common case of iterating over an array in a regular manner In the same spirit regions can be thought of as factoring a set of indices that describe the size and shape of slicebased array references into a single pre xing slice the region scope Table 28 shows a number of simple array statements written in F77 F90 and ZPL In the rst row a simple array addition is demonstrated The F77 version requires explicit loops and repetitive array indexing The F90 version eliminates the loops but requires identical slices to be applied to each individual array reference The ZPL version factors this common slice into the region scope leaving the array references unadorned and eliminating a lot of redundant typing Since elementwise operations constitute a common case the result is that many array references will be unadorned in ZPL Table 28 Language Syntax Compan39son F77 F90 Mi 339 Bi j Clzm lzn Alm Blm lzn lzn Bi Ci jl 339 Alm lzn Blm Clzm Ozn l 2nl ln A B0 l C90 1 R A B west C east 1311339 do i l m Aii lzm enddo Bll lzn lm ln A gtgtl B 44 or 44 R A gtgtTopRow B 0E M1 339 Bi j All lzn do i l m All lzn enddo lzn lzn 1 ln A ltltlm B 44 or 44 TopRow A ltltR B A13i j Ci j do j l n do i l m Mi 339 enddo enddo A13i j Ci j lm ln A AB C R A AB C 99 100 Clearer Array Reference Patterns When array operations are not strictly elementwise regions still serve a purpose by describ ing the size and shape of the subarray accesses Array operators express any modi cations to these base indices for a particular array expression This has the effect of syntactically factoring the redundant part of each array reference out of the main computation leaving only indications of how each reference differs As an example consider the second row of Table 28 in which shifted references to and are summed In F77 and F90 the array indices and slices encode redundant Enforngltion In particular F77 speci es that each access is based on indexma j while F90 speci es three slices each of which arem x n in size In ZPL the common aspects of these array references the m X n base indices are factored into the region leaving only the differences in how the arrays are accessed using the operator By removing much of the redundant clutter the meaning of the statement is clearer As further evidence consider each column of the table one at a time to see how long it takes you to identify the operation that is being performed by each statement Note that very different array operations end up looking rather similar in F77 and F90 whereas ZPL does a better job of distinguishing them Fewer Loops Required In F77 programmers expect to use loops and indices In F90 many array operations no longer require loops due to the availability of array slicing and vector indexing However as the last two entries in Table 28 show these mechanisms are not strong enough to express oods partial reductions or remap operations without using loops The oods and partial reductions fail due to the requirement that the two sides of an operation must have the same size and shape This makes it illegal to add an arbitrary number of rows to a single row without a loop The remap operator cannot be written succinctly due to the fact that F90 has no mechanism for taking the dot product of vector indices rather than the cross product 101 It should be noted that F90 supports intIinsic functions such as SUM RESHAPE and TRANSPOSE that may be used to w1ite such statements in a single line However these functions should be considered part of a standard libraiy context rather than a syntaxbased means for expressing array computation Similarly F95 s forall loops allow such state ments to be w1itten on a single line but still rely on a loopstyle concept albeit one that syntactically begins to resemble the region Naming Improves Readability The fact that regions can be named greatly improves the readability of ZPL code since it allows index sets to be given identi ers that are meaningful to the programmer Column 3 in Table 28 shows each ZPL statement written both with and without identi ers These examples demonstrate that names can improve the claiity of each statement Note that the ability to name airay slices or index ranges could improve the readability of F90 codes somewhat but would not produce a ZPLequivalent syntax Regions Promote Code Reuse The fact that regions are dynamically scoped allows procedures to be written in a more gene1ic way As a simple example note that the ZPL implementation of the SUMlVIA alg01ithm Figure 12 could be moved into a procedure that takes only the size of the inner dimension as an aigument and inheIits the matIix size from the callsite In contrast a gene1ic F90 implementation would require the bounds for each dimension to be passed in as arguments Furthermore even when an alg01ithm does require more than a single inheIited region as in the Cannon and PSP alg01ithms regions represent a concise means of bundling index inf01mation for passing to another procedure 102 2182 Region De ciencies Though regions have many bene ts there are also some de ciencies that should be ad dressed in future region based languages including Advanced ZPL This section brie y describes some of them Regions are Rectangular One obVious limitation of regions is their regular and rectilinear nature Though masks and shattered control ow can be used to restrict a region to an arbitrary subset of indices these concepts tend to require time and space proportional to the region s size in order to compute the irregularity For example the mask and shattered control ow based implementations of tridiagonal matrix multiplication require nQ time and space rather than the n time and space that the computation requires One partial solution would be to expand the region speci cation to allow Indexi ex pressions in a region s index ranges For example a lower triangular matrix could be repre sented using the region l n Indexl n Similarly a tridiagonal matrix could be represented using l n Indexl l Indexl l The main challenge to such an approach is the fact that such regions cannot be speci ed using a cross product of sequence descriptors Therefore they would require a different formal speci cation and implementation Legality issues would also be a concern in such a scheme Chapter 6 presents a different solution to this problem in the form of sparse regions Inability to Capture Dynamic Regions As currently de ned ZPL allows users to open dynamic region scopes but not to capture them in a way that allows them to be reused again later Rather they must be explicitly rede ned for each use As a motivating example imagine that a program dynamically calculates three subregions of an array in which it wants to perform further computation It would be nice to have some way of snapshotting these three index sets using named 103 regions so that they could be reused later rather than explicitly maintaining their bounds using scalar variables This lack also makes it dif cult to declare persistent arrays using a dynamic region Arrays that are local to a procedure may be declared using the dynamically covering region or an explicit dynamic region but such arrays will not persist once the procedure returns This restricts the user s ability to declare an array over the three subregions of the previous paragraph for example such that they could be used throughout the program naturally Inability to Rede ne Regions A related problem is that named regions cannot be rede ned In a sense ZPL s regions can be viewed as constant sets of indices that cannot be altered during the program s execution For some applications in which a problem size cannot be known at con guration time it would be nice to incrementally grow regions to meet a program s speci c requirements dynamically For example while a 1D region can be used to implement an arraybased list in ZPL the list cannot be grown using standard array resizing techniques because the bounds of its de ning region cannot be modi ed Regions are In exible In ZPL regions are technically neither a type nor a rstclass object This limits their use in a number of ways programmers may not declare collections of regions using indexed ar rays or records they may not assign regions they may not pass regions to a procedure etc This design choice was made in order to provide the compiler with as much information about the regions as possible The idea was to start with a restricted region de nition and then broaden it as much as the compiler could tolerate without sacri cing performance or the ability to effectively parallelize a ZPL program During the past decade virtually no relaxation of these restrictions has taken place though it has seemed increasingly feasible to do so 104 2183 Proposed Supporlfor Regions as Values One solution to many of the previous section s problems would be to promote the region concept to that of a rstclass value In doing so traditional region declarations would be interpreted as declarations of constant or con guration regions That is the following two declarations would be considered equivalent to one another region R 1 m l n config var R region 1 m l n Any regions for which ZPL s current rules are overly strict could be declared as variables of type region allowing them to be assigned dynamically modi ed or grown Parallel airays declared using region variables would be reallocated after the region was assigned preserving any values in the intersection of the old and new index sets Procedures could be written with formal parameters of type region Moreover types could be created that have region components such as indexed arrays of regions and records with region elds The primary liability of this scheme is that current ZPL optimizations may be compro mised due to an increased amount of confusion over a region s de nition For example in the presence of region assignments and aliasing will the compiler be able to determine whether a region s dimension is oodable singleton or normal What about its rank It seems reasonable to be optimistic about these issues since the absence of pointers should make most of a region s salient features statically detectable using interprocedural analysis Even in the worstcase such an approach is worthy of more study in order to attempt to support more general regionbased programming 2184 Proposed Supporlfor UseriDe ned Region Operators One feature that I believe is missing from the ZPL language is the ability for users to de ne their own region operators While the region operators supported by ZPL form a useful basis set it is not dif cult to conceive of other operators that might also bene t a programmer Rather than hoping to supply all region operators that a user could want it 105 Listing 224 Proposed Syntax for UserDe ned Region Operators 1postfixregionop growdelta integer var l h s a integer 2 begin 3 if delta lt 0 then A l delta 5 else a h delta 7 end s end in direction nw 1l 11 se l l 13 region R 1m ln 14 BigR R grow nw grow se makes more sense to give the user the ability to de ne custom operators by describing the effect of a 6 value on a sequence descriptor For example I might de ne a grow region operator that pulls a region s corner in the speci ed direction without changing its stride or alignment Listing 224 shows proposed syntax for such an operator Lines 1 8 de ne the grow operator by indicating the delta value s effect on the fourtuple sequence descriptor which could be expressed using a record type rather than four scalar variables The region operator could then be used to de ne BigR as shown in line 14 rather than by explicitly specifying its bounds Such support seems like a useful and simple extension to ZPL as it currently stands One important sideeffect that this might have is to require parenthesization to indicate operator precedence in a region expression ZPL s builtin region operators are associative so parenthesization is neither required nor allowed 2185 Implicit Storage Considered Frustrating One feature of ZPL that has not been described in this chapter is its support for implicit storage The implicit storage concept causes certain speci cations of an array s boundary 106 conditions to implicitly extend the amount of memory allocated for it For example in the Jacobi iteration of Listing 215 variable A can be declared over region R and the initializa tion of its borders using of regions would cause its storage to automatically be extended by a row and column in each direction This feature was motivated by the observation that many ZPL programs use two re gions for each problem size one for the computation space and a second that extends the computation space by a few extra rows or columns to describe the array s data space with boundaries For example most of the sample programs of Section 215 exhibit this charac teristic Therefore it was believed that implicit storage would reduce the number of regions that programmers would have to declare saving them some trouble In the longrun it has turned out quite the opposite Implicit storage allocation contin ues to be a confusing issue to most programmers due to the fact that 1 the rules are not as clear to them as they should be 2 the rules are not always as general or intuitive as they ought to be andor 3 the fact that implicit storage is invisible in the program makes it hard to debug or even feel reassured that the expected behavior is going to take place More questions and bugs have probably been addressed due to misunderstandings related to implicit storage than any other concept in ZPL Most programmers eventually give up on the idea and simply explicitly declare the complete memory that their airays require as I have done in this chapter s sample codes As a result of these experiences I believe that implicit storage allocation is a bad idea Users are accustomed to explicitly declaring the type of their vaiiables which includes the sizes of their arrays While giving them a mechanism to handle the common case with one less identi er is an admirable idea it has caused more confusion than it is worth In this sense implicit storage seems much like optional variable declarations Mac87 and should be similarly avoided 107 2186 Scalar Issues For the most part ZPL s scalar concepts are not particularly interesting or inventive They provide basic functionality without many surprises This section touches on a few notewor thy characteristics Con guration Variables The con guration variable is ZPL s most interesting scalar concept It has proven extremely useful as a means of specifying a value that a programmer will want to change from run to run but which the compiler can use as a basis for optimization This makes programmers lives easier by not requiring them to create and maintain a separate executable per program con guration Yet it provides more semantic exibility than the const keyword in C which requires its initializer to be statically speci able Having worked with ZPL for a number of years I often nd myself wishing that other languages had a concept that was equivalent to the con guration variable The Lack of Pointers Up to this point the lack of pointers in ZPL has kept the language clean and easy to analyze Furthermore the ZPL applications that have been studied to this date have not suffered due to the lack of pointers As the language strives to support more irregular graphbased data structures some sort of pointer mechanism will be required It will be an interest ing challenge to see whether such a mechanism can be supported in a clean highlevel way analogous to regions or whether such data structures necessarily require pointers and the dif culties that they entail An interesting starting point for anyone approaching this problem would be Vassily Litvinov s exploratory work in Graph ZPL Lit95 108 Parameter Speci cations One key place where I nd ZPL s scalar syntax lacking is in its lack of richness for de scribing how a procedure s parameter will be used In particular the byreference var speci cation might mean any of the following 1 I will be sending this value into the pro cedure modifying it there and want the resulting modi cation to be re ected in the actual parameter 2 The current value of this parameter is unimportant butI will be using this parameter as a means of returning a new value calculated within the procedure or 3 This is a very large data structure and the compiler should not make a copy of it when passing it into the procedure though I will not be modifying it There are many instances during ZPL compilation in which the compiler would like to differentiate between these meanings for optimization purposes While many cases can be differentiated by analyzing procedure de nitions aliasing can complicate matters and foil the analysis My sense is that a better designed set of parameter tags perhaps similar to those provided by Ada TD97 would not represent a hardship to the user and would support better communication between the programmer and compiler 2 19 Summary This chapter has de ned the concept of the region and explained its use in de ning ZPL Regions represent a succinct means of describing a set of indices for use in declaring arrays and expressing array computation This chapter argues that regions have many syntactic bene ts including the elimination of redundant indexing expressions and an emphasis on different array access styles However the most important bene ts of regions are related to their use in parallel computing The next chapter describes these bene ts
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'