Class Note for COSC 4397 at UH


These 44 pages of class notes were uploaded by an elite notetaker on Friday, February 6, 2015. They belong to a course at the University of Houston taught in the fall. Since upload, they have received 20 views.

Date Created: 02/06/15
Shared Memory Parallel Programming
OpenMP Environment and Synchronization

So Far

- Parallel region: #pragma omp parallel (C/C++), !$OMP PARALLEL (Fortran)
- Work-sharing constructs: !$OMP DO, !$OMP WORKSHARE, #pragma omp for, #pragma omp sections, #pragma omp single
- Reminder of other features: conditional parallelism, loop scheduling

Data Environment Directives

- All variables are shared by default: all threads access the same memory location.
- Exception: the loop variable of a parallel for / parallel do is private.
- By using data directives, some variables can be made private to a thread, so that each thread has its own copy (its own memory location).

Matrix Multiply

    #pragma omp parallel for private(j, k)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            c[i][j] = 0.0;
            for (k = 0; k < n; k++)
                c[i][j] += a[i][k] * b[k][j];
        }

- a, b, c are shared.
- i, j, k are private. In C only the parallel loop variable i is private automatically; the inner loop variables j and k must be listed in a private clause (as above) or declared inside the loop.

Private Variables

    #pragma omp parallel for private(list)

- The compiler sets up a private copy of each variable in the list for each thread.
- Our examples use OpenMP for and DO, but the clause applies to other region and work-sharing directives as well.
- For compiler buffs: each thread has its own stack.

Private Variables: Example (1 of 2)

    for (i = 0; i < n; i++) {
        tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }

- Swaps the values of a and b.
- Loop-carried dependence on tmp.
- Easily fixed by privatizing tmp.

Private Variables: Example (2 of 2)

    #pragma omp parallel for private(tmp)
    for (i = 0; i < n; i++) {
        tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }

- Removes the dependence on tmp.
- Would be more difficult to do in Pthreads.

Private Variables: Alternative 1

    for (i = 0; i < n; i++) {
        tmp[i] = a[i];
        a[i] = b[i];
        b[i] = tmp[i];
    }

- Requires a change to the sequential program.
- Wasteful in space: O(n) instead of O(p), where p is the number of threads.

Private Variables: Alternative 2

    #pragma omp parallel private(iam)
    {
        iam = omp_get_thread_num();
        ...               /* call f on this thread's block; illegible in the source */
    }

    f(from)
    {
        int tmp;          /* local allocation on stack */
        for (i = from; i < from + const; i++) {
            tmp = a[i];
            a[i] = b[i];
            b[i] = tmp;
        }
    }

Remember This Example

    !$OMP PARALLEL PRIVATE(iam, np, ipoints)
          iam = omp_get_thread_num()
          np = omp_get_num_threads()
          ipoints = npoints / np
          call do_my_work(x, iam, ipoints)
    !$OMP END PARALLEL

Default Behavior

    !$OMP PARALLEL DEFAULT(PRIVATE) SHARED(x, npoints)
          iam = omp_get_thread_num()
          np = omp_get_num_threads()
          ipoints = npoints / np
          call do_my_work(x, iam, ipoints)
    !$OMP END PARALLEL

- We can set the default to be either private or shared.

Firstprivate and Lastprivate

- The initial and final values of private variables are unspecified.
- A firstprivate variable is private, and the private copies are initialized using its value before the loop.
- A lastprivate variable is private, and the thread executing the sequentially last iteration (or the lexically last section) updates the version of the object outside the parallel region.

Firstprivate/Lastprivate: Example (1 of 2)

    for (i = 0; i < n && b[i] != 0; i++)
        a[i] = b[i];
    for (j = i; j < n; j++)
        a[j] = 1.0;

- Sets each element of a to the value of the corresponding element of b, up to the first zero value in b.
- Sets all further elements of a to 1.0.

Firstprivate/Lastprivate: Example (2 of 2)

    #pragma omp parallel for lastprivate(i)
    for (i = 0; i < n && b[i] != 0; i++)
        a[i] = b[i];
    #pragma omp parallel for firstprivate(i)
    for (j = i; j < n; j++)
        a[j] = 1.0;

This listing is schematic: a loop test such as i < n && b[i] != 0 is not in OpenMP canonical loop form, so a conforming compiler will not accept the first loop as written. It shows only how i is carried out of one loop with lastprivate and into the next with firstprivate.

Firstprivate/Lastprivate: Fortran Example

    !$OMP PARALLEL
    !$OMP DO LASTPRIVATE(i)
          DO i = 1, n
             a(i) = b(i) + c(i)
          END DO
    !$OMP END PARALLEL
          CALL NEXT(i)

Threadprivate

- Private variables are private within a single parallel region.
- Threadprivate variables are global variables that stay private to each thread throughout the execution of the program.

    #pragma omp threadprivate(list)

Example:

    #pragma omp threadprivate(x)

- Requires a program change in Pthreads: an array of size p, accessed as x[pthread_self()].
- Costly if accessed frequently.
- Not cheap in OpenMP either.

Reduction Variables: C Example

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += a[i];

- Each thread's private copy of sum is automatically initialized to zero.

Reduction Variables: Fortran Example

    !$OMP PARALLEL DO REDUCTION(max:m)
          do i = 1, 100
             m = max(m, sin(real(i)))
          end do

- Each private copy of m is automatically initialized to the smallest representable value of its type.

Reduction Variables (C/C++)

    #pragma omp parallel for reduction(op:list)

- op is one of +, *, -, &, ^, |, &&, or ||.
- The variables in list must be used with this operator in the loop.
- A private copy is created for each thread.
- The copies are automatically initialized to sensible values (the identity of the operator).

Reduction Variables (Fortran)

    !$OMP PARALLEL DO REDUCTION(op:list)

- op is one of +, *, -, .AND., .OR., .EQV., .NEQV.
- The new OpenMP Fortran 2.0 standard extends this to MAX, MIN, IAND, IOR, IEOR.
- It also permits reductions on array elements, which are very common in scientific applications.

OpenMP Jacobi Code with Convergence

    for (; diff > delta; ) {
        #pragma omp parallel for
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++) { ... }
        diff = 0;
        #pragma omp parallel for reduction(max:diff)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++) {
                diff = max(diff, fabs(grid[i][j] - temp[i][j]));
                grid[i][j] = temp[i][j];
            }
    }

- There is no reduction operator for max or min in C in the OpenMP version these notes describe, so this code is not accepted as written. (OpenMP 3.1 later added min and max reductions for C/C++.)

Data Environment Directives: Summary

- For good performance, OpenMP code should use private variables wherever possible, because this reduces cache problems.
- However, this could waste a lot of memory.
- Use of reductions is also extremely important.
- The new version of the standard fixes major practical problems with reductions for Fortran 90.

Summary of data environment directives: Private, Firstprivate, Lastprivate, Reduction, Threadprivate, Copyin.

Synchronization Primitives

- Critical: #pragma omp critical (name)
  Implements critical sections by name. Similar to Pthreads mutex locks, with the name playing the role of the lock.
- Barrier: #pragma omp barrier
  Implements a global barrier.

OpenMP Jacobi with Convergence (critical and barrier)

    #pragma omp parallel private(mydiff)
    for (; diff > delta; ) {
        #pragma omp for nowait
        for (i = from; i < to; i++)
            for (j = 0; j < n; j++) { ... }
        diff = 0.0;
        mydiff = 0.0;
        #pragma omp barrier
        #pragma omp for nowait
        for (i = from; i < to; i++)
            for (j = 0; j < n; j++) {
                mydiff = max(mydiff, fabs(grid[i][j] - temp[i][j]));
                grid[i][j] = temp[i][j];
            }
        #pragma omp critical
        diff = max(diff, mydiff);
        #pragma omp barrier
    }

Synchronization Primitives: Limitations

- There are no condition variables.
- As a result, a thread must busy-wait for condition synchronization.
- This is clumsy, and very inefficient on some architectures.

PIPE: Sequential Program

    for (i = 0; i < num_pic, read(in_pic); i++) {
        int_pic_1 = trans1(in_pic);
        int_pic_2 = trans2(int_pic_1);
        int_pic_3 = trans3(int_pic_2);
        out_pic = trans4(int_pic_3);
    }

PIPE: Parallel Program

    P0: for (i = 0; i < num_pics, read(in_pic); i++) {
            int_pic_1[i] = trans1(in_pic);
            signal(event_1_2[i]);
        }

    P1: for (i = 0; i < num_pics; i++) {
            wait(event_1_2[i]);
            int_pic_2[i] = trans2(int_pic_1[i]);
            signal(event_2_3[i]);
        }

PIPE: Main Program

    #pragma omp parallel sections
    {
        #pragma omp section
        stage1();
        #pragma omp section
        stage2();
        #pragma omp section
        stage3();
        #pragma omp section
        stage4();
    }

PIPE: Stage 1

    void stage1()
    {
        num1 = 0;
        for (i = 0; i < num_pics, read(in_pic); i++) {
            int_pic_1[i] = trans1(in_pic);
            #pragma omp critical (c1)
            num1++;
        }
    }

PIPE: Stage 2

    void stage2()
    {
        for (i = 0; i < num_pic; i++) {
            do {
                #pragma omp critical (c1)
                cond = (num1 <= i);
            } while (cond);
            int_pic_2[i] = trans2(int_pic_1[i]);
            #pragma omp critical (c2)
            num2++;
        }
    }

OpenMP PIPE

- Note the need to exit the critical section during the wait.
- Otherwise no other thread can access the counter.
- Never busy-wait inside a critical section.

Sequential TSP: Code Outline

    init_q();
    init_best();
    while ((p = dequeue()) != NULL) {
        for each expansion by one city {
            q = addcity(p);
            if (complete(q)) { update_best(q); }
            else { enqueue(q); }
        }
    }

OpenMP TSP: An Exercise

- Be careful: you cannot simply wrap dequeue and enqueue in critical.
- If a thread busy-waits inside a critical section, no thread makes progress.
- Use critical carefully, on parts of enqueue and dequeue only.

OpenMP Synchronization

Barrier, ordered section, atomic update, locks.

Difficult Stuff

- Flush: how do you know that it is needed?
- Dynamic vs. static threads.
- Role of the environment, especially other user codes, on ccNUMA.
- Nesting: not implemented yet in general.

OpenMP Summary

- Directives for shared memory parallel programming, for Fortran and C/C++.
- The compiler translates the directives, but the user can have a big impact on performance through careful use of the features.
- Much easier to use than MPI.
- Initially targeted engineering applications, but more widely applicable than that.
- Many other features have been proposed, e.g. support for task queues. The ARB is working on defining this and other extensions.
- OpenMP may need to be suitable for a big variety of programs on multicore and multithreading platforms.

OpenMP Performance

- Performance of the different constructs depends on the compiler and the target machine.
- Measured by the EPCC low-level benchmarks and the LLNL benchmarks.

OpenMP Applicability

- Can express fine-grain and coarse-grain parallelism.
- Can be compiled for SMPs and DSM systems.
- Can be combined with MPI for clusters of SMPs.
- Software DSM can compile it for SMP clusters.

OpenMP Limits

- OpenMP shares a program's work among threads.
- It says nothing about how to store data that is not private, because the assumption is that memory is shared and the cost of access is the same for all processors and threads.
- This is not true for non-uniform memory access (NUMA) and distributed memory systems.
- NUMA machines are popular: vendor extensions enable OpenMP on ccNUMA and DSM platforms.
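
As a compact illustration of the private and reduction clauses from the notes, the following sketch puts the array swap and the sum into two small C functions. The function names swap_arrays and sum_array are hypothetical and chosen only for this example. The code is written so that it gives the same result when the pragmas are ignored, which is what happens if the compiler is run without OpenMP support.

```c
#include <assert.h>
#include <stddef.h>

/* Swap a[i] and b[i] for every i in [0, n).
 * tmp is listed in private(), so each thread works on its own copy
 * and the loop-carried dependence on tmp disappears. */
void swap_arrays(double *a, double *b, int n)
{
    double tmp;
    int i;

    assert(a != NULL && b != NULL);

    #pragma omp parallel for private(tmp)
    for (i = 0; i < n; i++) {
        tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}

/* Sum the n elements of a.
 * reduction(+:sum) gives each thread a private copy of sum that
 * starts at zero; the copies are added into sum after the loop. */
double sum_array(const double *a, int n)
{
    double sum = 0.0;
    int i;

    assert(a != NULL);

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += a[i];

    return sum;
}
```

With GCC, OpenMP is enabled by the -fopenmp flag; without it the two functions simply run sequentially.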
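
The firstprivate and lastprivate clauses described in the notes can be tried on a loop that is in canonical form. The sketch below is an assumption-laden illustration, not the slide's example: the function name fill_with_offset and its parameters are hypothetical. The parameter offset enters the loop through firstprivate, and x leaves it through lastprivate.

```c
#include <assert.h>
#include <stddef.h>

/* Store offset + i in a[i] for i in [0, n) and return the value that
 * x held in the sequentially last iteration (i == n - 1).
 * firstprivate(offset): every thread's copy starts with the caller's value.
 * lastprivate(x): after the loop, x holds the value from iteration n - 1. */
int fill_with_offset(int *a, int n, int offset)
{
    int i;
    int x = -1;

    assert(a != NULL && n > 0);   /* with n == 0 no iteration would set x */

    #pragma omp parallel for firstprivate(offset) lastprivate(x)
    for (i = 0; i < n; i++) {
        x = offset + i;
        a[i] = x;
    }

    return x;
}
```

Without the lastprivate clause, x would have to be made private to avoid a race, and its value after the loop would then be unspecified.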
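
The Jacobi convergence test in the notes combines a per-thread maximum under a critical section. A minimal sketch of just that step is shown here, assuming a small fixed grid size N and a hypothetical function name copy_and_max_diff; the stencil update itself is left out. The absolute value is computed by hand so that the example does not depend on the math library.

```c
#include <assert.h>

#define N 4   /* hypothetical grid size, for illustration only */

/* Copy temp into grid and return the largest absolute change.
 * Each thread keeps its own maximum in mydiff and merges it into the
 * shared diff inside a critical section, one thread at a time. */
double copy_and_max_diff(double grid[N][N], double temp[N][N])
{
    double diff = 0.0;

    assert(grid && temp);

    #pragma omp parallel
    {
        double mydiff = 0.0;   /* declared in the region, so private */
        int i, j;

        #pragma omp for nowait
        for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                double d = grid[i][j] - temp[i][j];
                if (d < 0.0)
                    d = -d;
                if (d > mydiff)
                    mydiff = d;
                grid[i][j] = temp[i][j];
            }
        }

        /* Only the short merge is serialized, not the loop. */
        #pragma omp critical
        {
            if (mydiff > diff)
                diff = mydiff;
        }
    }   /* implicit barrier: diff is complete here */

    return diff;
}
```

On compilers that implement OpenMP 3.1 or later, reduction(max:diff) on the loop is a shorter alternative to the explicit critical section.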
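
The PIPE slides describe busy-waiting on a counter, entering a critical section only to read it. The following two-stage sketch illustrates that pattern under stated assumptions: the names NUM_PICS, run_pipeline, and the integer "pictures" are hypothetical stand-ins for the slide's trans1 and trans2, and the example assumes that the runtime either supplies two threads or hands out the sections in program order. OpenMP does not guarantee that sections run concurrently, so this is a teaching sketch rather than a robust pipeline.

```c
#include <assert.h>

#define NUM_PICS 8   /* hypothetical number of items in the pipeline */

static int int_pic_1[NUM_PICS];   /* output of stage 1 */
static int int_pic_2[NUM_PICS];   /* output of stage 2 */
static int num1;                  /* items finished by stage 1 */

/* Stage 1: produce item i, then publish it by bumping num1. */
static void stage1(void)
{
    int i;
    for (i = 0; i < NUM_PICS; i++) {
        int_pic_1[i] = 2 * i;          /* stand-in for trans1 */
        #pragma omp critical (c1)
        num1++;
    }
}

/* Stage 2: wait until item i exists, then consume it.
 * The critical section covers only the read of num1; the thread
 * leaves it on every pass, so stage 1 can still get in. */
static void stage2(void)
{
    int i;
    for (i = 0; i < NUM_PICS; i++) {
        int ready;
        do {
            #pragma omp critical (c1)
            ready = (num1 > i);
        } while (!ready);
        int_pic_2[i] = int_pic_1[i] + 1;   /* stand-in for trans2 */
    }
}

void run_pipeline(void)
{
    num1 = 0;
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        stage1();
        #pragma omp section
        stage2();
    }
    assert(num1 == NUM_PICS);   /* safe: after the implicit barrier */
}
```

Moving the do/while loop inside the critical section would block stage 1 from ever incrementing num1, which is the deadlock the notes warn about.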

