# Seminar Applied Statistics 22S 295

This 38 page Class Notes was uploaded by Cullen Conn on Friday October 23, 2015. The Class Notes belongs to 22S 295 at University of Iowa taught by Staff in Fall. Since its upload, it has received 24 views. For similar materials see /class/228084/22s-295-university-of-iowa in Natural Sciences and Mathematics at University of Iowa.

Date Created: 10/23/15

## Grid Computing and the TeraGrid GIScience Gateway

Kate Cowles, 22S:295 High Performance Computing Seminar, Nov 29, 2007

Outline:

- Example data
- Grid computing
- The TeraGrid
- TeraGrid Science Gateways
- GISolve

### Example data

- n = 4711 observations from 870 sites in the years 1998 to 2005

### Grid computing

- several different definitions, all involving distributed computing
- networking together computing clusters at different geographic locations to harness computing and storage resources

### The TeraGrid

- the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research
- began in 2001 when NSF awarded \$45 million to establish a Distributed Terascale Facility (DTF)
  - to NCSA, SDSC, Argonne National Laboratory, and the Center for Advanced Computing Research (CACR) at California Institute of Technology
- coordinated through the Grid Infrastructure Group (GIG) at the University of Chicago
- more than 250 teraflops of computing capability (May 2007)
  - teraflop = trillion ($10^{12}$) floating point operations per second
- more than 30 petabytes of online and archival data storage (May 2007)
  - petabyte = quadrillion ($10^{15}$) bytes
- rapid access and retrieval over high-performance networks

TeraGrid partner sites (resource providers):

- Indiana University
- Oak Ridge National Laboratory
- National Center for Supercomputing Applications
- Pittsburgh Supercomputing Center
- Purdue University
- San Diego Supercomputer Center
- Texas Advanced Computing Center
- University of Chicago/Argonne National Laboratory
- the Joint Institute for Computational Sciences
- the Louisiana Optical Network Initiative
- the National Center for Atmospheric Research

Network:

- Each resource provider maintains a 10 Gbps connection to one of three TeraGrid hubs: Chicago, Denver, or Los Angeles
- The hubs are interconnected via 10 Gbps lambdas (fiber-optic communications lines)

### TeraGrid Science Gateways

- enable users with a common scientific goal to use national resources through a common interface
- account management, accounting, certificate management, and user support are delegated to the gateway developers
- three common forms
  - A gateway that is packaged as a web portal with users in front and TeraGrid services in back
  - Grid-bridging gateway: the science gateway is a mechanism to extend the reach of the community's existing Grid so it may use the resources of the TeraGrid
  - A gateway that involves application programs running on users' machines (i.e., workstations and desktops) and accesses services in TeraGrid and elsewhere

Source: http://www.teragrid.org/programs/sci_gateways

### GISolve

- Geographic Information Science gateway
- web portal
- "The purpose of this project is to develop a TeraGrid Science Gateway toolkit for GIScience. Our gateway toolkit provides user-friendly capabilities for performing geographic information analysis using computational Grids, and help non-technical users directly benefit from accessing cyberinfrastructure capabilities."
- current modules
  - random spatial point generator
  - distance-weighted interpolation of surfaces
  - cluster detection algorithm G
  - Bayesian geostatistical spatial model fitting using MCMC

#### Job scheduling and middleware

- local job scheduler manages individual TeraGrid resource
  - Condor
  - Portable Batch System (PBS)
- Globus Resource Access and Management (GRAM)
  - interacts with local job schedulers to allocate computational resources for applications
  - monitors and controls computing processes
- user interactions with Science Gateways go through TeraGrid software supporting Web Services

[Figure: architecture diagram with layers Applications; Domain Decomposition & Task Scheduling; Globus Toolkit; PBS/Condor; TeraGrid Computing Resource]

#### Geostatistical models

- natural and interpretable way to model spatial correlation for data measured at irregularly spaced point sites
- correlation is a function of the distance and possibly orientation between sites

Simple geostatistical model with spatial correlation and additive measurement error:

$$Y \sim N\left(X\beta,\ \sigma_s^2\,\Sigma(\phi) + \sigma_e^2 I\right)$$

- $X$ is a matrix of location-specific covariates
- $\beta$ is a vector of coefficients to be estimated
- $\Sigma(\phi)$ is the spatial correlation matrix
  - entries are calculated from the correlation function
- $\sigma_s^2$ is the spatial variance
- $\sigma_e^2$ is the random variance (measurement error variance)
- $I$ is the identity matrix

The Bayesian model is completed by specification of prior distributions on $\phi$, $\sigma_s^2$, $\sigma_e^2$, and $\beta$.

Reparameterized covariance matrix:

$$\sigma_s^2\,\Sigma(\phi) + \sigma_e^2 I = \sigma_{tot}^2\left[(1 - S)\,\Sigma(\phi) + S\,I\right]$$

where

$$\sigma_{tot}^2 = \sigma_s^2 + \sigma_e^2, \qquad S = \frac{\sigma_e^2}{\sigma_s^2 + \sigma_e^2}$$

- facilitates prior specification and computing algorithm
- continuous uniform prior on $\phi$
  - endpoints chosen to reflect belief as to the largest and smallest possible distances at which spatial correlation could decay to 0
- a joint prior on $S$ and $\sigma_{tot}^2$ obtained by change of variable from inverse gamma priors on $\sigma_s^2$ and $\sigma_e^2$
- multivariate normal or flat prior on $\beta$

Spatiotemporal model with separable correlation structure:

$$Y \sim N\left(X\beta,\ \sigma_{tot}^2\left[(1 - S)\,K\left(\Sigma(\phi) \otimes \Sigma(\rho)\right)K^T + S\,I\right]\right)$$

- where $\Sigma(\rho)$ is an AR(1) matrix representing temporal correlation
- $K$ is a matrix of 1's and 0's that matches each observation $Y$ with the correct row and column of $\Sigma(\phi) \otimes \Sigma(\rho)$
- $K$ is not needed if data are rectangular
- prior on $\rho$: uniform on $(-1, 1)$ or $(0, 1)$, slightly bounded away from the endpoints

#### MCMC computing algorithm

- very efficient MCMC computing algorithm that produces low autocorrelation in MCMC output
- computational bottleneck is linear algebra operations on big matrices, especially Cholesky decomposition
- single-chain and multi-chain parallelization
  - linear algebra operations for each chain are parallelized using PLAPACK
  - all CPUs for an individual chain must be on the same TeraGrid resource
  - multiple chains may be run simultaneously
  - "embarrassingly parallel"
  - different chains may be run on different TeraGrid resources
  - SPRNG is used to make sure random number streams for different chains are independent

Implementation issues:

- different TeraGrid sites have different software libraries and batch scheduling programs installed
- had to get PLAPACK installed and working on all sites where GISolve could be used

Details and extensions:

- details of the model and algorithm currently implemented in GISolve are in Yan, Cowles, Wang, and Armstrong, Statistics and Computing, 2007
- extension of the sequential version of the algorithm to handle prediction, areal data, fusion of areal and point-source data, and complicated spatial and nonspatial covariance structure
  - implemented in the ramps package for R
  - explained in Cowles, Yan, and Smith (2007) and Smith, Yan, and Cowles (2007), available as tech reports on the stats dept web page
  - not yet incorporated into GISolve

#### Running a job on GISolve

- in advance of your session
  - get an account on GISolve
  - request to reserve TeraGrid resources
- go to www.gisolve.org to log in
- upload file of spatiotemporal data you want to analyze
- upload configuration file
- select which TeraGrid partner sites you want to use
  - how many CPUs at each
  - how many parallel MCMC chains to run
- number of iterations
- specify maximum wall clock time
  - must be long enough for the number of requested iterations to finish
  - must not run past the end of the reserved time on the resource
- submit job
- click "Visualize output" to view plots of accumulating samples
- download zip files of plots and numeric output

#### Data file format

- Data files must be plain text files
- First line is two integers
  - number of rows of data
  - number of regression coefficients in model, including intercept
- data itself in rectangular format with the following columns, in order
  - response variable
  - values of predictor variables, including a column of ones if an intercept is required
  - x coordinate of spatial location (longitude)
  - y coordinate of spatial location (latitude)
  - integer representing measurement time

[Example data file: first line `2000 4`, followed by the data rows]

#### Configuration file

- specifies model to be fit
  - choice among three spatial correlation functions
  - specification of parameters of prior distributions on $\sigma_s^2$, $\sigma_e^2$, $\phi$, $S$, $\rho$
- provides initial values for each MCMC chain
- content of individual lines
  - Correlation type: 1 spherical, 2 exponential, 3 Gaussian
  - Distance metric: 1 great circle distance, 2 Euclidean distance
  - The unit of distance; for example, distunit 10 means that the distances are in 10s
  - $\alpha_e$, $\beta_e$, $\alpha_z$, $\beta_z$: parameters of IG priors for sigma2_e and sigma2_z
  - Left and right endpoints of uniform prior distribution for phi
  - Left and right endpoints of support of distribution for S
  - Left and right endpoints of uniform prior distribution for rho
  - block_size, block_size_alg: PLAPACK configuration; leave as in example
  - Chain index (from 0 to number of chains − 1) and initial values for phi, S, and rho
    - as many rows of this kind as there are chains

[Example configuration file]

#### Specifying number of CPUs and number of chains at each site

- number of CPUs is the total number to be divided among all the chains at the site
  - make the number of CPUs per chain a perfect square to use PLAPACK efficiently
  - how big a perfect square is determined by the size of the dataset (see graph of speedups)
- running parallel chains
  - helps in assessing convergence
  - generates more samples per unit time if CPUs are available
  - samples from different chains are independent
  - efficient MCMC algorithm results in short burn-in, so a relatively small number of iterations are "wasted"

[Figure: speedup versus number of CPUs (axes: ncpu, speedup)]

- choosing numbers of CPUs and chains
  - for a dataset of 10000 observations, if you have 32 CPUs available at each of 3 sites, perhaps run 1 chain using 25 CPUs at each site
  - for a dataset of 2000 observations, perhaps 3 chains at each site, each chain using 9 CPUs
- ability to extend chains from where they left off is being added
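The reparameterized covariance in the notes can be checked numerically. The sketch below is not the GISolve implementation: it assumes the reparameterization has the form total variance times [(1 − S) × correlation + S × identity], with S taken to be the share of total variance due to measurement error, and it assumes an exponential correlation of the form exp(−d/phi). All function names are hypothetical.

```python
import math

def exponential_correlation(coords, phi):
    """Correlation matrix with entries exp(-d_ij / phi) (assumed functional form)."""
    return [[math.exp(-math.dist(a, b) / phi) for b in coords] for a in coords]

def covariance_direct(corr, sigma2_s, sigma2_e):
    """Spatial variance times correlation, plus measurement error on the diagonal."""
    n = len(corr)
    return [[sigma2_s * corr[i][j] + (sigma2_e if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def covariance_reparameterized(corr, sigma2_tot, s):
    """Total variance times [(1 - s) * correlation + s * identity] (assumed form)."""
    n = len(corr)
    return [[sigma2_tot * ((1.0 - s) * corr[i][j] + (s if i == j else 0.0))
             for j in range(n)] for i in range(n)]

def max_abs_difference(a, b):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Three illustrative sites on a line; distances are 5, 10, and 5.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
corr = exponential_correlation(coords, phi=5.0)

sigma2_s, sigma2_e = 2.0, 0.5
sigma2_tot = sigma2_s + sigma2_e   # total variance
s = sigma2_e / sigma2_tot          # assumed definition of S

direct = covariance_direct(corr, sigma2_s, sigma2_e)
reparam = covariance_reparameterized(corr, sigma2_tot, s)
print(max_abs_difference(direct, reparam))
```

Under these assumptions the two matrices agree up to floating-point rounding, which is why priors can be placed on the total variance and S instead of on the two variances separately.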
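A separable spatiotemporal correlation structure can be illustrated by combining a spatial correlation matrix with an AR(1) temporal matrix through a Kronecker product, then keeping only the rows and columns that correspond to observed site/time pairs (the role the notes give to the 0/1 matrix K). This is a small pure-Python sketch with hypothetical names and made-up numbers; it assumes rows are ordered site by site, with times varying fastest.

```python
def ar1_matrix(n_times, rho):
    """AR(1) correlation matrix with entries rho ** |t - u|."""
    return [[rho ** abs(t - u) for u in range(n_times)] for t in range(n_times)]

def kronecker(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [[a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
             for j in range(len(a[0]) * cols_b)]
            for i in range(len(a) * rows_b)]

def select_observed(full, observed):
    """Keep rows/columns in `observed`; same result as K full K^T for a
    selection matrix K whose row k has a single 1 in column observed[k]."""
    return [[full[r][c] for c in observed] for r in observed]

# Two sites with spatial correlation 0.5, three time points with rho = 0.8.
spatial = [[1.0, 0.5], [0.5, 1.0]]
temporal = ar1_matrix(3, 0.8)
full = kronecker(spatial, temporal)      # 6 x 6, index = site * 3 + time

# Suppose site 1 was observed only at time 1 (index 4): non-rectangular data.
observed = [0, 1, 2, 4]
sub = select_observed(full, observed)    # 4 x 4 correlation of observed data
print(len(full), len(sub))
```

When every site is observed at every time the selection step is the identity, which matches the remark that K is not needed for rectangular data.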
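Before uploading, it can help to check a data file against the layout the notes describe: a first line with the number of rows and the number of regression coefficients, then one row per observation holding the response, the predictors, the x and y coordinates, and an integer time. The checker below is a hypothetical helper, not part of GISolve, and it assumes columns are separated by whitespace.

```python
from typing import List, NamedTuple

class Observation(NamedTuple):
    response: float
    predictors: List[float]
    x: float          # longitude
    y: float          # latitude
    time: int

def parse_data_file(text: str) -> List[Observation]:
    """Parse and validate a data file; raise ValueError on layout problems."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty file")
    header = lines[0].split()
    if len(header) != 2:
        raise ValueError("first line must contain exactly two integers")
    n_rows, n_coef = int(header[0]), int(header[1])

    body = lines[1:]
    if len(body) != n_rows:
        raise ValueError(f"expected {n_rows} data rows, found {len(body)}")

    expected_cols = 1 + n_coef + 3   # response + predictors + x, y, time
    observations = []
    for number, line in enumerate(body, start=2):
        tokens = line.split()
        if len(tokens) != expected_cols:
            raise ValueError(f"line {number}: expected {expected_cols} columns")
        observations.append(Observation(
            response=float(tokens[0]),
            predictors=[float(t) for t in tokens[1:1 + n_coef]],
            x=float(tokens[-3]),
            y=float(tokens[-2]),
            time=int(tokens[-1]),
        ))
    return observations

# Made-up two-row file with an intercept column and one covariate.
sample = "2 2\n1.5 1 0.3 -91.5 41.6 1\n2.5 1 0.7 -93.6 42.0 2\n"
obs = parse_data_file(sample)
print(len(obs), obs[0].predictors, obs[1].time)
```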
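The perfect-square guideline for CPUs per chain can be turned into a small calculation. The helper below is a hypothetical sketch: it divides a site's CPUs evenly among the chains and returns the largest perfect square that fits in each chain's share. It only gives an upper bound, since the notes say the useful size of the square also depends on the size of the dataset.

```python
import math

def cpus_per_chain(total_cpus: int, n_chains: int) -> int:
    """Largest perfect square not exceeding each chain's even share of CPUs."""
    if total_cpus < 1 or n_chains < 1:
        raise ValueError("total_cpus and n_chains must be positive")
    share = total_cpus // n_chains
    if share < 1:
        raise ValueError("more chains than CPUs at this site")
    side = math.isqrt(share)   # integer square root, rounded down
    return side * side

# The two scenarios from the notes, with 32 CPUs available at a site.
print(cpus_per_chain(32, 1))   # one chain per site
print(cpus_per_chain(32, 3))   # three chains per site
```

With 32 CPUs this gives 25 CPUs for a single chain and 9 CPUs per chain for three chains, matching the two suggestions in the notes.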
