Date Created: 10/20/15
CS 696 Introduction to Grid Computing
Lecture 17: Data Management on the TeraGrid
Mary Thomas, San Diego State
Tuesday, 17 Apr 07

Comment
- These slides are mostly adapted and updated from an 3004 tutorial

GSI Authenticated Data Transfer
- TeraGrid File Management
- Data Transfer Performance
- Tools
  - GridFTP
  - uberFTP (new)
  - tgcp (new)
  - gsissh (new)
- Hands-on Exercises
  - Use of GridFTP clients & servers to transfer files

TeraGrid File Placement
- No common cross-site filesystems (currently)
- User controls where their data resides
  - Appropriate site(s)
  - Appropriate storage
- Online filesystems
  - Speed, visibility, quotas, backup policy
  - Each filesystem directly accessible from a single site
- Mass storage systems
  - Long-term storage, slower access
  - Accessible from all sites

TeraGrid File Movement
- File movement is the responsibility of the user
- Between online filesystems
  - Intra-site
  - Cross-site
- Between mass storage and online filesystems
  - Intra-site
  - Cross-site
- Session focuses on these types of transfers

TeraGrid Transfer Environment
- Many sites have nodes dedicated to transferring files
- TeraGrid backbone bandwidth (40 Gb/sec) means the Wide Area Network is rarely a bottleneck
- GSI authentication and proxy certificates provide security for transfers
- Transfer requests can be integrated into job execution scripts
  - Moving input data to sites of job execution
  - Moving results to another filesystem, site, or archive

Data Transfer Performance
What impacts transfer rates?
- Disk speed
- Connectivity of disk to node
- Node characteristics & load
- Connectivity of node to WAN
- For all networks: bandwidth, latency, buffer size, protocol, load, encryption
- Don't expect 40 Gb/sec
[Figure: network path diagram. Recoverable labels: 1 Gb/s; WAN; TG Backbone 40 Gb/s; 30 Gb/s]

Performance: Choices Matter
- Transfer large files for best performance
- Use fast filesystems, dedicated transfer nodes, optimized transfer parameters

Transfer of a 1 GByte file from NCSA to SDSC (10/6/2004):

  Choices                                                      Transfer Time    Transfer Rate
  Home filesystems, login nodes, default parameters            20 min 18 sec    0.845 MBytes/sec (0.0066 Gbits/sec)
  Parallel filesystems, transfer nodes, optimized parameters   11 sec           93.091 MBytes/sec (0.727 Gbits/sec)

Transfer times
- See http://gridinfo.psc.edu/gridftp/speedpage.php

GridFTP Tools

GridFTP Terminology: Protocol
"GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks. GridFTP is based on FTP, the highly popular Internet file transfer protocol."
(Quoted from the Globus Alliance website)

Terminology: Server
- A GridFTP server process understands requests that adhere to the GridFTP protocol and performs authentication and data transfer operations based on those requests
- A system that is configured to automatically start GridFTP server processes is sometimes referred to as a GridFTP server
- Not all systems (nodes) on TeraGrid machines are GridFTP servers
- Some mass storage front-ends are GridFTP servers

Terminology: Client
- GridFTP client programs issue requests that adhere to the GridFTP protocol
- Users run GridFTP client programs to transfer files
- globus-url-copy and uberftp are two GridFTP client programs that are part of the Common TeraGrid Software Stack (CTSS)
- There is no client program named "gridFTP", which can be confusing because users are told "use gridFTP to transfer your files"

Terminology: 3rd Party Transfer
- A GridFTP transfer between two GridFTP servers, rather than between a server and a client, is called a third-party transfer
- A third-party transfer occurs when the GridFTP client initiating the transfer is run on a system that is neither the source nor the destination of the transfer operation
- Allows use of dedicated transfer nodes

TG GridFTP Server Deployment
- tg-login.<site>.teragrid.org is a GridFTP server
  - Shared resource
  - Many tasks
- See http://www.teragrid.org/userinfo/data/transfer_location.php#deployment for list of servers
- tg-gridftp.<site>.teragrid.org resolves to one or more machines that are GridFTP servers
  - Dedicated file transfer resources at many sites
  - Fewer tasks, possibly better connectivity
[Figure label: GridFTP Server]

TG GridFTP Client Deployment
- globus-url-copy <sourceurl> <destinationurl>
  - command line interface
  - -tcp-bs <size> | -tcp-buffer-size <size>: specify the size (in bytes) of the buffer to be used by the underlying ftp data channels
  - -p <parallelism> | -parallel <parallelism>: specify the number of streams to be used in the ftp transfer
- uberftp
  - interactive GridFTP transfer client
  - configurable tcp buffer size and number of parallel streams

Using globus-url-copy

  tg-login3 thomasm/datatests> globus-url-copy file://`pwd`/python.tar.gz gsiftp://gridftp-co.ncsa.teragrid.org:2811/~/pythontrans.tar.gz
  tg-login3 thomasm/datatests> ls -la
  total 10860
  drwxr-x---  2 thomasm mpk       26 2007-04-17 19:45 .
  drwx--x--x 10 thomasm ac      4096 2007-04-17 19:45 ..
  -rw-r--r--  1 thomasm mpk 11108613 2007-04-17 19:14 python.tar.gz
  tg-login3 thomasm/datatests> globus-url-copy file://`pwd`/python.tar.gz gsiftp://tg-gridftp.sdsc.teragrid.org:2811/~/pythontrans.tar.gz
  tg-login3 thomasm/datatests> gsissh tg-login.sdsc.teragrid.org "/bin/ls" pythontrans.tar.gz

Hands-on Exercise 2
- Copy a 1 MByte file from the current directory at NCSA to your home directory at SDSC
- Use a third-party transfer and the GridFTP server nodes at both NCSA and SDSC
- Use optimized transfer parameters
- Look at the transfer script:

  tg-login1> cat ex2
  /usr/bin/time -f %E globus-url-copy -tcp-bs 8388608 gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/OneMBfile gsiftp://tg-gridftp.sdsc.teragrid.org/~/OneMBfile_Ex2

- Run the transfer script:

  tg-login1> ex2
  0:02.72

Hands-on Exercise 3
- Copy a 1 MByte file from your home directory at SDSC to your home directory at ANL/UC
- Use a third-party transfer
- Use optimized transfer parameters
- Look at the transfer script:

  tg-login1> cat ex3
  /usr/bin/time -f %E globus-url-copy -tcp-bs 8388608 gsiftp://tg-gridftp.sdsc.teragrid.org/~/OneMBfile_Ex2 gsiftp://tg-gridftp.uc.teragrid.org/~/OneMBfile_Ex3

- Run