INTRODUCTION TO DATABASE SYSTEMS
INTRODUCTION TO DATABASE SYSTEMS CS 3423
Popular in Course
Popular in ComputerScienence
This 0 page Class Notes was uploaded by Ms. Bridie Kohler on Sunday November 1, 2015. The Class Notes belongs to CS 3423 at Oklahoma State University taught by Douglas Heisterkamp in Fall. Since its upload, it has received 20 views. For similar materials see /class/232836/cs-3423-oklahoma-state-university in ComputerScienence at Oklahoma State University.
Reviews for INTRODUCTION TO DATABASE SYSTEMS
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 11/01/15
Notes on Programming Assignment 4 0 To get an 8byte unsigned integer variable using CC 0n acs use unsigned long long 0 Need a fd option on your radixhash program 0 Note that 500 is a bad table size It should generate more collision than using a prime number for the table size Try a prime number and compare the results On acs factor n will generate the prime factors for a number n Evaluation of Progressive Over ow 0 Cost to retrieve a record is based on the search length to the record Every time a collision occurs the search length for a record increase 0 average search length is an indicator of over ow prob lems total search length A h 1 h verage seam engt total number of records 0 Average search lengths greater than 20 are generally con sider unacceptable Load Factor 9 o The number of collisions and hence the average search length is tried to the load factor of the hashed le 0 The load factor 7 or just load also called packing den sity is 7 Number of records 7 Number of record locations 0 The average search length grows rapidly after 7 passes 60 when not using buckets See Figure 117 from book Buckets Reading a block of records from a le system cost about the same as reading a single record Reading a block at a time is used in Btrees When used in hashing the block of records is called a bucket The bucket can hold a set of records The hash address goes to a bucket not an individual record Multiple records can be placed at the same hash address Collisions are not a problem until the bucket over ows Using buckets cause a large improvement in the average search length See Table 115 from book Predicting the Distribution of Records 0 Assume a random distribution of records what is the likely behavior of hashing 0 Would like to answer the following questions What is the likelihood that No keys will hash to a given address Exactly one key will hash to address Exactly k keys will hash to address 0 This answers to the above questions can be approximated by aPoz39sson function where N number of available addresses r number of records x number of records assigned to a given address EX ample 1000 address and 1000 records 7 1 p0 105f1 0368 p1 0368 p2 0184 p3 13394 0061 Record Keys 0 primary key 7 unique dataless unchanging o secondary key 7 other elds to search generally not unique Sequential Search 0 baseline for measurement 0 count lowlevel Read calls 7 assume each Read is as costly as any other 0 On 7 average Reads Time doubles when le size doubles Record Blocking 0 Reading in a block of records during each Read 0 If 100 Records are read for each Read call then average is Reads Still On 7 time doubles when le size doubles When Sequential Search is Appropriate o searching text les for some pattern 0 les with few records 7 for small n anything works well 0 Files that hardly ever needs to be search 7 small payback for optimizing the rare case 0 searching on secondary keys and expecting a large num ber matches Unix Tools in Text Files 7 View each line as a record and elds are separated by white spaces wc word count 7 counts lines records words elds and characters grep 7 searches sequentially through a le for a pattern then returns all lines that have that pattern others diff tr sed sort Direct Access 0 Seeking directly to the beginning of a record and reading it 0 Need to know where the record is o RNN 7 Relative Record Number can be used directly for xed sized records logical View 7 like indexing an array need an index for variable length records Header Records 0 Information about the le 7 within the le 0 Helps make the le selfdescribing Mixing object types in one le 0 keywords or tags with selfdescribing headers 0 index table 7 keywords offsets and lengths RepresentationIndependent le access application delegates the responsibility of translating to and from the physical format of the object Example an image is stored in memory as an array of bytes but is stored in le with jpeg format Extensibility 7 easy to add new le formats for an object Such as storing the image in le with png format Portability Problems di erences among operating systems lan guages machine architectures Agree on standard 7 IEEE ASCII etc Selfdescribing 7 speci es structure Conversions Unix dd command converts from one block size to another converts ASCII tofrom EBCDIC swap every pair of bytes converts xed length records tofrom variablelength records Fold and Add 0 Folding 7 the characters of the key are split into short sequences 0 Adding 7 the character sequences from the folding pro cess are converted to numerical values and added to gether Example using a fold of 2 the key JOHN is split into JO and HN On a Intel PC converting to unsigned integer yields values 20298 and 20040 Adding together yields 40338 If the le size was 511 then dividing by 511 yields record address of 480 Printing out binary value static void pBinLongString 3 long l Systemoutprintln s quot long quot l quot binary quot Systemoutprintquot quot forint i 63 i gtO iii iflL ltlt 1 amp l I m Systemoutprintquotlquot else Systemoutprintquot0quot Systemoutprintln Examples 1L long 1 binary 71L long 1 binary llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 32L long 32 binary 100000 ilLgtgtgtl long 9223372036854775807 binary 0lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Converting String into numerical value String a quot aquot pBinIntquothash aquot ahashCode byte d agetBytes long WOL for int j0 jltdlengthj W lOHgdj ltlt3j pBinLongquotwquotw WO for int j0 jltdlengthj W dj Wltlt8 pBinLongquotwquotw hash a int 1281 binary 00000000000000000000010100000001 w long 24864 binary 110000100100000 w long 8289 binary 10000001100001 static long Hash String s long maXAddress long sum 0 long a 0 long mask ilLgtgtgtl byte b sgetBytes forint j0 jlt blength2j a b2jl l b2jll ltlt 8 sum a if blength2 I m sum bblength fl sum ampmask return summaXAddress Hash JOHN 51 1 long 480 binary 0000000000000000000000000000000000000000000000000000000111100000 Fold and radix Still using folding to split key into manageable chunks Convert the numeric representation of the fold into the new base After the conversion use the result as if it was a normal binary numeric value That is if after conversion you have 54 in base 11 then place it in a byte as 0X54 7 making a conversion from base 11 to base 16 Each base 11 value will take 4 bits Converting bytes to radix 11 static long radixllfromZbytes byte a byte b long temp a bltlt8 long result 0 int counter 0 While temp I m result templl ltlt 4counter temp ll counter return result radixl 1fromZbytesbyte J byte O long 82563 binary 0000000000000000000000000000000000000000000000010100001010000011 Collisions o A collision between two keys occurs when the hash func tion generates the same address for both keys 0 Can not place both records at the same location so need an approach to resolve the collision 7 collision resolu tions 0 Avoiding collisions Spread out the records 7 the hash function dis tributes the records evenly among the addresses Use extra disk space 7 a larger le size hence table size decreases the load Place more than one record at an address 7 bucket Collision Resolution Progressive over ow 7 linear search for an open location Double hashing 7 2nd hash generates a step value to use in searching for an open location Chain progressive over ow 7 maintain a thread chain from an over owed record back to its home address Chaining with separate over ow area 7 over ow is maintained outside of the hash table Progressing over ow Searching for a key with addr hkey and table is not full then while addr is not empty and key not found do addr addr 1 mod tablesize Note with the table not full we will reach an open location before completing a cycle Progressing over ow Insertion with addr hkey and table is not full then while addr is not empty and key not found do addr addr 1 mod tablesize Place record at addr Note if key was found overwriting previous value Otherwise placing in a new location Difference is important when updat ing number of records in table value Progressing over ow Deletion with addr hkey and table is not full then while addr is not empty and key not found do addr addr 1 mod tablesize if addr1 mod tablesize is empty then mark empty else mark with a tombstone Logging on to the Computer Science Computer wwwcsokstateedufacilitieslogon98html o Obtaining user id and password 0 Logging on to the SUN Enterprise 3000 from lab and home 0 First time login 0 Changing password New User Information wwwcsokstateedufacilitieslabmenuhtml 0 Basic uniX commands 0 email system 0 basic use of emacs and Vi Chapter 2 Fundamental File Processing Operations What is a le Operations on les 7 open close read write IO Redirection Unix le system commands Logical Files 0 a source and or sink of bytes 0 An application generally works at this level by using a le descriptor or handle 0 The operating system makes the connection between the logical le and some physical le or device Physical Files and Devices 0 Physical Files 7 a particular collection of bytes on sec ondary storage disk tape etc o Devices 7 parts of the system that can behave like a le that is a source or sink of bytes with the le operations open close readwrite Examples keyboard modem and network socket o In uniX devices are represented by le names in the direc tory dev For example devmouse and devcdrom Opening and Closing Files 0 Opening Files 7 Make the connection between the log ical le and the physical le or deVice Can create new les or use existing ones try man open to see the man page for the operating system open command and its options try man fopen for opening a le as a C stream 0 Closing Files 7 like hanging up a telephone 7 it allows the logical le descriptor to be reused It ushes the bu ers ensuring that all of the data has been written to disk Reading and Writing 0 ReadSourcefile Destaddr Size 0 WriteDestfile Sourceaddr Size C iostreams o header les iostreamh and fstreamh located on the Sun in directory usrlocalincludeg o constructors fstream and fstreamchar filename int mode 0 int openchar filename int mode int close int readunsigned char dest int size int writeunsigned char source int size 0 Overloaded in X operators gtgt extraction and ltlt inser tion Homework 1 Due August 27 0 Exercise 2 page 40 0 Exercise 3 page 41 0 Programming Exercise 10 page 41 7 implement tail n Comments if no le name is given use stdin ifno n is given use a default of 10 lines
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'