New User Special Price Expires in

Let's log you in.

Sign in with Facebook


Don't have a StudySoup account? Create one here!


Create a StudySoup account

Be part of our community, it's free to join!

Sign up with Facebook


Create your account
By creating an account you agree to StudySoup's terms and conditions and privacy policy

Already have a StudySoup account? Login here

Data And File Structures

by: Lisette Hodkiewicz

Data And File Structures CS 3310

Lisette Hodkiewicz
GPA 3.83

Ajay Gupta

Almost Ready


These notes were just uploaded, and will be ready to view shortly.

Purchase these notes here, or revisit this page.

Either way, we'll remind you when they're ready :)

Preview These Notes for FREE

Get a free preview of these Notes, just enter your email below.

Unlock Preview
Unlock Preview

Preview these materials now for free

Why put in your email? Get access to more of this material and other relevant free materials for your school

View Preview

About this Document

Ajay Gupta
Class Notes
25 ?




Popular in Course

Popular in ComputerScienence

This 13 page Class Notes was uploaded by Lisette Hodkiewicz on Wednesday September 30, 2015. The Class Notes belongs to CS 3310 at Western Michigan University taught by Ajay Gupta in Fall. Since its upload, it has received 51 views. For similar materials see /class/216875/cs-3310-western-michigan-university in ComputerScienence at Western Michigan University.

Similar to CS 3310 at WMU

Popular in ComputerScienence


Reviews for Data And File Structures


Report this Material


What is Karma?


Karma is the currency of StudySoup.

You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 09/30/15
A Hash Function for Hash Table Lookup lofl3 http burtleburtle netbobhashdoobs html Abstract I offer you a new hash function for hash table lookup that is faster and more thorough than the one you are using now I also give you a way to verify that it is more thorough All the text in this color wasn39t in the 1997 Dr Dobbs article The code given here are all public domain The Hash Over the past two years I ve built a general hash function for hash table lookup Most of the two dozen old hashes I ve replaced have had owners who wouldn39t accept a new hash unless it was a plugin replacement for their old hash and was demonstrably better than the old hash These old hashes de ned my requirements The keys are unaligned variablelength byte arrays Sometimes keys are several such arrays Sometimes a set of independent hash functions were required Average key lengths ranged from 8 bytes to 200 bytes Keys might be character strings numbers bitarrays or weirder things Table sizes could be anything including powers of 2 The hash must be faster than the old one 0 The hash must do a goodjob Without further ado here39s the fastest hash I39ve been able to design that meets all the requirements The comments describe how to use it Update I m leaving the old hash in the text below but I ve got a new hash at httpburtleburtlenetbobc lookup3c that is roughly twice as fast and more thorough than the one below It s designed along the same lines as the hash below 12 byte blocks switch statements etc The biggest theoretical distinction is it has different mixing functions for the last block than for all but the last block at 21 and 24 instructions instead of the 36 instruction mix below that serves for both It also has separate code paths for aligned and unaligned strings to take advantage of 2byte and 4byte reads when it can typedef unsigned long int ub4 unsigned 4 byte quantities typedef unsigned char ubl unsigned l byte quantities define hashsizen ub4lltltn define hashmaskn hashsizen D k mix mix 3 32 bit values reversibly For every delta with one or two bits set and the deltas of all three high bits or all three low bits whether the original value of abc is almost all zero or is uniformly distributed If mix is run forward or backward at least 32 bits in abc have at least l4 probability of changing x If mix is run forward every bit of c will change between l3 and 23 of the time Well 22100 and 78100 for some 2 bit deltasJ mix was built out of 36 single cycle latency instructions in a structure that could supported 2x parallelism like so a b a c x cgtgtl3 222009 438 PM A Hash Function for Hash Table Lookup 20f13 hItp burt1 eburtle netbobhashdoobs html b c a A x b a x altlt8 c a b A x c b x bgtgtl3 Unfortunately superscalar Pentiums and Sparcs can39t take advantage of that parallelism They39ve also turned some of those single cycle latency instructions into multi cycle latency instructions Still this is the fastest good hash I could find There were about 2AA68 to choose from I only looked at a billion or so define mixabc a a c a A cgtgtl3 b b a b A altlt8 c c b c A bgtgtl3 a a c a A cgtgt12 b b a b A altltl6 c c b c A bgtgt5 a a c a A cgtgt3 b b a b A altltlO c c b c A bgtgt15 k hash hash a variable length key into a 32 bit value k the key the unaligned variable length array of bytes len the length of the key counting by bytes initval can be any 4 byte value Returns a 32 bit value Every bit of the key affects every bit of the return value Every l bit and 2 bit delta achieves avalanche About 6len35 instructions The best hash table sizes are powers of 2 mod a prime mod is sooo slowl use a bitmask For example h h amp hashmasklO In which case There is no need to do If you need less than 32 bits if you need only 10 bits do the hash table should have hashsizelO elements If you are hashing n strings ubl k do it like this for iO hO iltn i h hash ki leni h By Bob Jenkins 1996 bobijenkinsburtleburtlenet You may use this code any way you wish private educational or commercial It39s free See httpburtleburtlenetbobhashevahashhtml Use for hash table lookup or anything where one collision in 2AA32 is acceptable Do NOT use for cryptographic purposes ub4 hash k length register ubl k register ub4 length register ub4 initval initval the key the length of the key the previous hash or an arbitrary value register ub4 abclen Set up the internal state len length a b Ox9e3779b9 c initval the golden ratio an arbitrary value the previous hash value 222009 438 PM A Hash Function for Hash Table Lookup 3ofl3 httpburtleburtlenetbobhashdoobs html handle most of the key while len gt 12 i a kO ub4klltlt8 ub4k2ltltl6 ub4k3ltlt24 b k4 ub4k5ltlt8 ub4k6ltltl6 ub4k7ltlt24 c k8 ub4k9ltlt8 ub4k10ltltl6ub4kllltlt24 mixabc k 12 len 12 l handle the last 11 bytes C length switchlen all the case statements fall through case 11 0ub4klOltlt24 case 10 0ub4k9ltlt16 case 9 0ub4k8ltlt8 the first byte of C is reserved for the length case 8 bub4k7ltlt24 case 7 bub4k6ltltl6 case 6 bub4k5ltlt8 case 5 bk4 case 4 aub4k3ltlt24 case 3 aub4k2ltlt16 case 2 aub4klltlt8 case 1 akOL case 0 nothing left to add mixabc report the result return C Most hashes can be modeled like this initializeinternal state for each text block combineinternal state mixinternal state text block return postprocessinternal state In the new hash miX takes 3n of the 6n35 instructions needed to hash n bytes Blocks of text are combined with the internal state abc by addition This combining step is the rest of the hash function consuming the remaining 3n instructions The only postprocessing is to choose c out of abc to be the result Three tricks promote speed 1 Mixing is done on three 4byte registers rather than on a lbyte quantity 2 Combining is done on 12byte blocks reducing the loop overhead 3 The nal switch statement combines a variablelength block with the registers abc without a loop The golden ratio really is an arbitrary value Its purpose is to avoid mapping all zeros to all zeros The Hash Must Do a Good Job The most interesting requirement was that the hash must be better than its competition What does it mean for a hash to be good for hash table lookup 222009 438 PM A Hash Function for Hash Table Lookup 4ofl3 httpbu1tlebu1tlenetbobhashdoobs html A good hash function distributes hash values uniformly If you don t know the keys before choosing the function the best you can do is map an equal number of possible keys to each hash value If keys were distributed uniformly an excellent hash would be to choose the first few bytes of the key and use that as the hash value Unfortunately real keys aren t unifome distributed Choosing the first few bytes works quite poorly in practice The real requirement then is that a good hash function should distribute hash values unifome for the keys that users actually use How do we test that Let39s look at some typical user data Since I work at Oracle I39ll use Oracle39s standard the EMP table The EMP table Is this data distributed JOB 7DEC80 APR81 81 lMAY81 JUN81 APR87 7NOV81 81 87 81 81 JAN82 Consider each horizontal row to be a key Some patterns appear 1 Keys often differ in only a few bits For example all the keys are ASCII so the high bit of every byte is zero 2 Keys often consist of substrings arranged in different orders For example the MGR of some keys is the EMPNO of others 3 Length matters The only difference between zero and no value at all may be the length of the value Also quotaa aaaquot and quotaaa aaquot should hash to different values 4 Some keys are mostly zero with only a few bits set That pattern doesn t appear in this example but it s a common pattern Some patterns are easy to handle If the length is included in the data being hashed then lengths are not a problem If the hash does not treat text blocks commutatively then substrings are not a problem Strings that are mostly zeros can be tested by listing all strings with only one bit set and checking if that set of strings produces too many collisions The remaining pattern is that keys often differ in only a few bits If a hash allows small sets of input bits to cancel each other out and the user keys differ in only those bits then all keys will map to the same handful of hash values 222009 438 PM A Hash Function for Hash Table hookup 50f13 httpburtleburtlenetbobl1asl1doobs him A common weakness Usually when a small set of inth bits cancel each other out it is because those input bits affect only a smaller set of bits in the intemal state Consider this hash function for hashO 10 1lthash 1 hash hashltlt5 hashgtgt27 keyl return hash prime This function maps the strings quotEXXXXXBquot and quotAXXXXXCquot to the same value These keys differ in bit 3 of the rst byte and bit 1 of the seventh byte A er the seventh bit is combined any further postprocessing will do no good because the intemal states are already the same Any time n input bits can only affect m output bits and n gt m then the 2H keys that differ in those inth bits can only produce 2m distinct hash values The same is true if n inth bits can only affect m bits of the intemal state later mixing may make the 2m results look uniformly distributed but there will still be only 2m results The function above has many sets of 2 bits that affect only 1 bit of the internal state If there are n inth bits there are n choose 2nn 2 n2 pairs of inth bits only a few of which match weaknesses in the function above It is a common pattern for keys to differ in only a few bits If those bits match one of a hash s weaknesses which is a rare but not negligible event the hash will do extremely bad In most cases though it will do just ne This allows a function to slip through sanity checks like hashing an English dictionary uniformly while still frequently bombing on user data inlemal stale quotwtquot blacks In hashes built of repeated combinemix steps this is what usually causes whining quotmmquot this weakness 5 l A small number of bits y of one inth block are combined affecting only y bits of the internal state So far so good 2 The mixing step causes those y bits of the intemal state to affect only z bits of the internal state 3 The next combining step overwrites those bits with z more inth bits cancelling out the rst y inth bits iqu blank I J mixing lunctiun El Elliquot A funnel M 3 inpul hils inln z nulpm hils 2 inputhlack When z is smaller than the number of bits in the output then yz inth bits have affected only z bits of the intemal state causing 2erZ possible keys to produce at most 2Z hash values The same thing can happen in reverse 1 Uncombine this block causing y block bits to unaffect y bits of the internal state 2 U 39 the internal state leaving x bits unaffected by the y bits from this block 3 Unmixing the previous block unaffects those x bits cancelling out this block39s y bits If x is less than the number of bits in the output then the 2ery keys differing in only those xy input bits can produce at most 2X hash values If the mixing function is not a permutation of the internal state it is not reversible Instead it loses information about the earlier blocks every time it is applied so keys diffenng only in the rst few inth blocks are more likely to collide The mixing function ought to be a permutation 222009 438 PM A Hash Function for Hash Table Lookup 60fl3 http burtleburtle netbobhashdoobs html It is easy to test whether this weakness exists if the mixing step causes any bit of the internal state to affect fewer bits of the internal state than there are output bits the weakness exists This test should be run on the reverse of the mixing function as well It can also be run with all sets of 2 internal state bits or all sets of 3 Another way this weakness can happen is if any bit in the final input block does not affect every bit of the output The user might choose to use only the unaffected output bit then that s 1 input bit that affects 0 output bits A Survey of Hash Functions We now have a new hash function and some theory for evaluating hash functions Let39s see how various hash functions stack up Additive Hash ub4 additivechar key ub4 len ub4 prime ub4 hash i for hashlen iO hash keyi return hash prime iltlen i This takes 5n3 instructions There is no mixing step The combining step handles one byte at a time Input bytes commute The table length must be prime and can t be much bigger than one byte because the value of variable hash is never much bigger than one byte Rotating Hash ub4 rotatingchar key ub4 len ub4 prime ub4 hash i for hashlen iO iltlen i hash hashltlt4Ahashgtgt28Akeyi return hash prime This takes 8n3 instructions This is the same as the additive hash except it has a mixing step a circular shift by 4 and the combining step is exclusiveor instead of addition The table size is a prime but the prime can be any size On machines with a rotate such as the Intel x86 line this is 6n2 instructions I have seen the hash prime replaced with hash hash A hashgtgtlO A hashgtgt20 amp mask eliminating the and allowing the table size to be a power of 2 making this 6n6 instructions can be very slow for example it is 230 times slower than addition on a Sparc 20 One at a Time Hash ub4 oneiatiaitimechar key ub4 len ub4 for hash keyi hash hash ltlt 10 hash i hashO iO iltlen i 222009 438 PM A Hash Function for Hash Table Lookup 7ofl3 http burtleburtle netbobhashdoobs html hash A hash gtgt 6 hash hash ltlt 3 hash A hash gtgt 11 hash hash ltlt 15 return hash amp maskL I This is similar to the rotating hash but it actually mixes the internal state It takes 9n9 instructions and produces a full 4byte result Preliminary analysis suggests there are no funnels This hash was not in the original Dr Dobb39s article I implemented it to ll a set of requirements posed by Colin Plumb Colin ended up using an even simpler and weaker hash that was sufficient for his purpose Bernstein39s hash ub4 bernsteinubl key ub4 len ub4 level I ub4 hash u 4 i for iO iltlen return hash I level i hash 33hash keyi If your keys are lowercase English words this will t 6 characters into a 32bit hash with no collisions you d have to compare all 32 bits If your keys are mixed case English words 65hashkeyi ts 5 characters into a 32bit hash with no collisions That means this type of hash can produce for the right type of keys fewer collisions than a hash that gives a more truly random distribution If your platform doesn39t have fast multiplies no sweat 33hash hashhashltlt5 and most compilers will gure that out for you On the down side if you don t have short text keys this hash has a easily detectable aws For example there39s a 3into2 funnel that 0x002l and 0x0100 both have the same hash hex 0x21 decimal 33 you saw that one coming yes FNV Hash I need to ll this in Search the web for FNV hash It s faster than my hash on Intel because Intel has fast multiplication but slower on most other platforms Preliminary tests suggested it has decent distributions I have three complaints against it First it s speci c about how to reduce the size if you don t use all 32 bits it s not just a mask Increasing the result size by one bit gives you a completely different hash If you use a hash table that grows by increasing the result size by one bit one old bucket maps across the entire new table not to just two new buckets Ifyour algorithm has a sliding pointer for which buckets have been split that just won t work with FNV Second it s linear That means that widely separated things can cancel each other out at least as easily as nearby things Third since multiplication only affects higher bits the lowest 7 bits of the state are never affected by the 8th bit of all the input bytes On the plus side very nearby things never cancel each other out at all This makes FNV a good choice for hashing very short keys like single English words FNV is more robust than Bernstein s hash It can handle any byte values not just ASCII characters ub4 fnv I need to fill this in and Check if FNV is public domain I 222009 438 PM A Hash Function for Hash Table Lookup 80fl3 httpburtleburtlenetbobhashdoobs html MurmurHash I need to ll this in too This is faster than any of my hash and is more nonlinear than a rotating hash or FNV I can see it s weaker than my lookup3 but I don t by how much I haven t tested it httpmurmurhashgoog epagescom Cessu The Cessu hash search for msse2 in his blog uses SSE2 allowing it to be faster and more thorough at first glance than what I can do 32 bits at a time I haven39t done more than a first glance at it Pearson39s Hash char pearsonchar key ub4 len char tab256D char hash ub4 i for hashlen iO iltlen hashtabhashAkeyi return hash i This preinitializes tab 1 to an arbitrary permutation of 0 255 It takes 6n2 instructions but produces only a lbyte result Larger results can be made by running it several times with different initial hash values CRC Hashing ub4 crcchar key ub4 len ub4 mask ub4 tab256 I ub4 hash i for hashlen iO iltlen i hash hash gtgt 8 A tabhash amp Oxff A keyi return hash amp maskL This takes 9n3 instructions tab is initialized to simulate a maximallength Linear Feedback Shift Register LFSR which shifts out the loworder bit and XORs with a polynomial if that bit was 1 I used a 32bit state with a polynomial of Oxede 8320 for the tests Keys that differ in only four consecutive bytes will not collide A sample implementation complete with table is here You could also implement it like ub4 crcchar key ub4 len ub4 mask ub4 tab256 i ub4 hash i for hashlen iO iltlen i hash hash ltlt 8 A tabhash gtgt 24 A keyi return hash amp maskL but since shifts are sometimes slow the other way might be faster If you did it that way you39d have to reverse the bits of the generating polynomial because bits shift out the top instead of the bottom Generalized CRC Hashing 222009 438 PM A Hash Function for Hash Table Lookup 9ofl3 httpburtleburtlenetbobhashdoobs html This is exactly the same code as CRC hashing except it lls tab with each of the 4 bytes forming a random permutation of 0255 Unlike a true CRC hash its mixing is nonlinear Keys that differ in only one byte will not collide The top byte has to be a permutation of 0255 so no information is lost when the low byte is shifted out The other bytes are permutations of 0255 only to make hold the guarantee that keys differing in one byte will not collide A sample implementation complete with table is here Universal Hashing ub4 universalchar key ub4 Ten ub4 mask ub4 tabMAXBITS ub4 hash i for hashlen iO iltlenltlt3 i8 register char k keyigtgt3 if kampOxOl hash A tabi0 if kamp0x02 hash A tabil if kampOxO4 hash A tabi2 if kamp0x08 hash A tabi3 if kampOxlO hash A tabi4 if kamp0x20 hash A tabi5 if kampOx40 hash A tabi6 if kamp0x80 hash A tabi7 return hash amp mask This takes 52n3 instructions The size of tab is the maximum number of input bits Values in tab are chosen at random Universal hashing can be implemented faster by a Zobrist hash with carefully chosen table values Zobrist Hashing ub4 zobrist char key ub4 Ten ub4 mask ub4 tabMAXBYTES 256 ub4 hash i for hashlen iO iltlen i hash A tabi keyi return hash amp mask This takes 10n3 instructions The size of tab 256 is the maximum number of input bytes Values of tab 256 are chosen at random This can implement universal hashing but is more general than universal hashing Zobrist hashes are especially favored for chess checkers othello and other situations where you have the hash for one state and you want to compute the hash for a closely related state You xor to the old hash the table values that you re removing from the state then xor the table values that you re adding For chess for example that s 2 xors to get the hash for the next position given the hash of the current position Paul Hsieh39s hash This is kind of a cross between that big hash at the start of this article and my oneatatime hash Paul39s timed it and it was than that big hash It has a 4byte internal state that it does light nonlinear mixing after every combine That s good It combines 2byte blocks with its 4byte state which is something I39d never tried 222009 438 PM A Hash Function for Hash Table Lookup lOofl3 http burtleburtle netbobhashdoobs html FNV and CRC and oneatatime combine lbyte blocks with the 4byte state Their input blocks are all smaller than their state and they mix their state after each input block which makes it impossible for consecutive input blocks to cancel On the down side it has funnels of3 bits into 2 for example hex 01 00 00 00 00 00 00 00 and 00 00 20 00 01 00 00 00 both hash to 0xc754ae23 include quotpstdinthquot Replace with undef gethbits if defined GNUC defined iiMSCiV define getl6bitsd endif if appropriate ampamp defined47i38647 ll defined47WATCOMC47 ll defined AiBORLANchi ll defined AiTURBocgi Const uinthit d if ldefined gethbits define getl6bitsd Const uint87t dl ltlt UINT327C8 const uint87t dOD endif uint324t SuperFastHash const char data int len uint327t hash len tmp int rem if len lt 0 ll data NULL return 0 rem len amp 3 len gtgt 2 Main loop for len gt O len hash gethbits data tmp getl6bits data2 ltlt 11 A hash hash hash ltlt 16 A tmp data 2sizeof uinthit hash hash gtgt 11 l Handle end cases switch rem case 3 hash getl6bits data hash A hash ltlt 16 hash A datasizeof hash hash gtgt 11 break case 2 hash getl6bits data hash A hash ltlt 11 hash hash gtgt 17 break case 1 hash data hash A hash ltlt 10 hash hash gtgt 1 uinthit ltlt 18 l Force quotavalanchingquot of final 127 bits hash A hash ltlt 3 hash hash gtgt 5 hash A hash ltlt 4 hash hash gtgt 17 hash A hash ltlt 25 hash hash gtgt 6 return hash 222009 438 PM A Hash Function for Hash Table Lookup ll ofl3 httpburtleburtlenetbobhashdoobs html My Hash This takes 6n35 instructions It s the big one at the beginning of the article It s implemented along with a selftest at httpburtleburtlenetbobclookup2c 100kup3 c A hash I wrote nine years later designed along the same lines as quotMy Hashquot see httpburtleburtlenetbobc lookup3c It takes 211 instructions per byte for mixing instead of 3n When tting bytes into registers the other 311 instructions it takes advantage of alignment when it can a trick learned from Paul Hsieh s hash It doesn39t bother to reserve a byte for the length That allows zerolength strings to require no mixing More generally the length that requires additional mixes is now 132537 instead of 122436 One theoretical insight was that the last mix doesn39t need to do well in reverse though it has to affect all output bits And the middle mixing steps don t have to affect all output bits affecting some 32 bits is enough though it does have to do well in reverse So it uses different mixes for those two cases quotMy Hashquot lookup2c had a single mixing operation that had to satisfy both sets of requirements which is why it was slower On a Pentium 4 with gcc 34 Paul39s hash was usually faster than lookup3c On a Pentium 4 with gcc 32 they were about the same speed On a Pentium 4 with ice O2 lookup3c was a little faster than Paul s hash I don t know how it would play out on other chips and other compilers lookup3c is slower than the additive hash pretty much forever but it s faster than the rotating hash for keys longer than 5 bytes lookup3c does a much more thorough job of mixing than any of my previous hashes lookup2c lookupc Oneatatime All my previous hashes did a more thorough job of mixing than Paul Hsieh s hash Paul s hash does a good enough job ofmixing for most practical purposes The most evil set of keys I know of are sets of keys that are all the same length with all bytes zero except with a few bits set This is tested by frogc To be even more evil I had my hashes return b and 0 instead of just c yielding a 64bit hash value Both lookupc and lookup2c start seeing collisions after 253 frogc keypairs Paul Hsieh s hash sees collisions after 217 keypairs even if we take two hashes with different seeds lookup3c is the only one of the batch that passes this test It gets its first collision somewhere beyond 263 keypairs which is exactly what you d expect from a completely random mapping to 64bit values MD4 This takes 95n230 instructions MD4 is a hash designed for cryptography by Ron Rivest It takes 420 instructions to hash a block of 64 aligned bytes I combined that with my hash s method of putting unaligned bytes into registers adding 3n instructions MD4 is overkill for hash table lookup The table below compares all these hash functions NAME is the name of the hash SIZE1000 is the smallest reasonable hash table size greater than 1000 SPEED is the speed of the hash measured in instructions required to produce a hash value for a table with SIZE1000 buckets It is assumed the machine has a rotate instruction These aren t very accurate 222009 438 PM A Hash Function for Hash Table Lookup 12 of13 httpburtleburtlenetbobhashdoobs html measures I should really just do timings on a Pentium 4 or such INLINE This is the speed assuming the hash is inlined in a loop that has to walk through all the characters anyways such as a tokenizer Such a loop doesn39t always exist and even when it does inlining isn t always possible Some hashes my new hash and MD4 work on blocks larger than a character Inlining a hash removes 3n1 instructions of loop overhead It also removes the n instructions needed to get the characters out of the key array and into a register It also means the length isn39t known Inlining offers other advantages It allows the string to be converted to uppercase andor to unicode before the hash is performed without the expense of an extra loop or a temporary buffer FUNNEL15 is the largest set of input bits affecting the smallest set of internal state bits when mapping 15byte keys into a 1byte result FUNNEL100 is the largest set of input bits affecting the smallest set of internal state bits when mapping 100byte keys into a 32bit result COLLIDE32 is the number of collisions found when a dictionary of 38470 English words was hashed into a 32bit result The expected number of collisions is 02 COLLIDE1000 is a chi2 measure of how well the hash did at mapping the 38470word dictionary into the SIZE1000 table A chi2 measure greater than 3 is significantly worse than a random mapping less than 3 is signi cantly better than a random mapping in between is just random uctuations Comparison of several hash functions NAME SIZE 1000 SPEED INLINE FUNNEL 15 FUNNEL 100 COLLIDE 32 COLLIDE 1000 Additive 1009 5n3 n2 15 into 2 100 into 2 37006 80602 Rotating 1009 6n3 2n2 4 into 1 25 into 1 24 124 1024 9n9 5n8 none none 0 005 Bernstein 1024 7n3 3n2 3 into 2 3 into 2 4 169 FNV 1024 Pearson 1024 12n5 4n3 none none 0 165 CRC 1024 9n3 5n2 2 into 1 11 into 10 0 007 Generalized 1024 9n3 5n2 none none 0 183 Universal 1024 52n3 48n2 4 into 3 50 into 28 0 020 Zobrist 1024 10n3 6n2 none none 1 003 Paul Hsieh s 1024 5n17 NA 3 into 2 3 into 2 1 112 My Hash 1024 6n35 NA none none 0 033 lookup3c 1024 5n20 NA none none 0 008 MD4 1024 95n230 NA none none 1 073 From the measurements we can conclude that the Additive and Rotating hash and maybe Bernstein were noticably bad for 32bit results and only the Additive hash was noticably bad for 10bit results If inlining is possible the Rotating hash was the fastest acceptable hash followed by Bernstein Pearson or the Generalized CRC if table lookup is OK or Bernstein or Oneata Time if table lookup is not OK Ifinlining is not possible it s a draw between lookup3 and Paul Hsieh s hash Note that for many applications the Rotating hash is noticably bad and should not be used and the Bernstein hash is marginal Table lengths 222009 438 PM A Hash Function for Hash Table Lookup httpburtleburtlenetbobhashdoobs html should always be a power of 2 because that39s faster than prime lengths and all acceptable hashes allow it The COLLIDE 1000 numbers should be ignored unless the numbers are bigger than 3 or less than 3 For example generalized CRC produced 8 8 or l8 for three different tables Itried It39s just noise A different set of keys would give unrelated random uctuations Conclusion A common weakness in hash function is for a small set of input bits to cancel each other out There is an efficient test to detect most such weaknesses and many functions pass this test I gave code for the fastest such function I could find Hash functions without this weakness work equally well on all classes of keys Testimonials 0 A fractal mountain and image of same 13 ofl3 222009 438 PM


Buy Material

Are you sure you want to buy this material for

25 Karma

Buy Material

BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.


You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

Why people love StudySoup

Bentley McCaw University of Florida

"I was shooting for a perfect 4.0 GPA this semester. Having StudySoup as a study aid was critical to helping me achieve my goal...and I nailed it!"

Janice Dongeun University of Washington

"I used the money I made selling my notes & study guides to pay for spring break in Olympia, Washington...which was Sweet!"

Steve Martinelli UC Los Angeles

"There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

Parker Thompson 500 Startups

"It's a great way for students to improve their educational experience and it seemed like a product that everybody wants, so all the people participating are winning."

Become an Elite Notetaker and start selling your notes online!

Refund Policy


All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email


StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here:

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.