Monday, May 7, 2012

Blogging my way through CLRS Section 11.1 (edition 2)

I've taken a brief break from blogging about my Cormen readings but I decided to write up the answers to chapter 11. Note that the chapters and question numbers may not match up because I'm using an older edition of the book.
Question 11.1-1:

Suppose that a dynamic set $S$ is represented by a direct address table $T$ of length $m$. Describe a procedure that finds the maximum element of $S$. What is the worst case performance of your procedure?

Assuming the addresses are sorted by key: Start at the end of the direct address table and scan downward until a non-empty slot is found. This is the maximum and if not:

  1. Initialize $max$ to $-\infty$
  2. Start at the first address in the table and scan downward until a used slot is found. If you reach the end goto #5
  3. Compare key to $max$. If it is greater assign it to $max$
  4. Goto #2
  5. Return $max$

The performance of this algorithm is $\Theta(m)$. A slightly smaller bound can be found in the first case of $\Theta(m - max)$

Question 11.1-2:

Describe how to use a bit vector to represent a dynamic set of distinct elements with no satellite data. Dictionary operations should run in $O(1)$ time.

Initialize a bit vector of length $|U|$ to all $0$s. When storing key $k$ set the $k$th bit and when deleting the $k$th bit set it to zero. This is $O(1)$ even in a non-transdichotomous model though it may be slower.

Question 11.1-3:

Suggest how to implement a direct address table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations must take $O(1)$ time.

Each element in the table should be a pointer to the head of a linked list containing the satellite data. $nul$ can be used for non-existent items.

Question 11.1-4:

We wish to implement a dictionary by using direct addressing on a large array. At the start the array entries may contain garbage, and initializing the entire array is impractical because of its size. Describe a scheme for implementing a direct address dictionary on the array. Dictionary operations should take $O(1)$ time. Using an additional stack with size proportional to the number of stored keys is permitted.

On insert the array address is inserted into a stack. The array element is then initialized to the value of the location in the stack.

On search the array element value is to see if it is pointing into the stack. If it is the value of the stack is checked to see if it is pointing back to the array.[1]

On delete, the array element can be set to a value not pointing the stack but this isn't required. If the element points to the value of the stack, it is simply popped off. If it is pointing to the middle of the stack, the top element and the key element are swapped and then the pop is performed. In addition the value which the top element was pointing to must be modified to point to the new location

Question 11.2-1:

Suppose we have use a hash function $h$ to hash $n$ distinct keys into an array $T$ of length $m$. Assuming simple uniform hashing what is the expected number of collisions?

Since each new value is equally likely to hash to any slot we would expect $n/m$ collisions.

Question 11.2-2:

Demonstrate the insertion of the keys: $5, 28, 19, 15, 20, 33, 12, 17, 10$ into a hash table with 9 slots and $h(k) = k \mod{9}$[2]

hash values
1 28 -> 19 -> 1
2 20
3 12
5 5
6 15 -> 33
17 8

Question 11.2-3:

If the keys were stored in sorted order how is the running time for successful searches, unsuccessful searches, insertions, and deletions affected under the assumption of simple uniform hashing?

Successful and unsuccessful searches are largely unaffected although small gains can be achieved if if the search bails out early once the search finds a key later in the sort order than the one being searched for.

Insertions are the most affected operation. The time is changed from $\Theta(1)$ to $O(n/m)$

Deletions are unaffected. If the list was doubly linked the time remains $O(1)$. If it was singly linked the time remains $O(1 + \alpha)$

Question 11.2-4:

Suggest how storage for elements can be allocated and deallocated within the ash table by linking all unused slots into a free list. Assume one slot can store a flag and either one element or two pointers. All dictionary operations should run in $O(1)$ expected time.

Initialize all the values to a singly linked free list (flag set to false) with a head and tail pointer. On insert, use the memory pointed to by the head pointer and set the flag to true for the new element and increment the head pointer by one. On delete, set the flag to false and insert the newly freed memory at the tail of the linked list.

Question 11.2-5:

Show that if $|U| > nm$ with $m$ the number of slots, there is a subset of $U$ of size $n$ consisting of keys that all hash to the same slot, so that the worst case searching time for hashing with chaining is $\Theta(n)$

Assuming the worst case of $|U|$ keys in the hash tabe assuming the optimial case of simple uniform hashing all m slots will have $|U|/m = n$ items. Removing the assumption of uniform hashing will allow some chains to become shorter at the expense of other chains becoming longer. There are more items then the number of slots so at least one slot must have at least $n$ items by the pigeon hole principle.

Question 11.3-1:

Suppose we wish to search a linked list of length $n$, where every element contains a key $k$ along with a hash value $h(k)$. Each key is a long character string. How might we take advantage of the hash values when searching the list for an element of a given key?

You can use $h(k)$ to create a bloom filter of strings in the linked list. This is an $\Theta(1)$ check to determine if it is possible that a string appears in the linked list.

Additionally, you can create a hash table of pointers to elements in the linked list with that hash value. this way you only check a subset of the linked list. Alternatively, one can keep the hash of the value stored in the linked list as well and compare the hash of the search value to the hash of each item and only do the long comparison if the hash matches.

Question 11.3-2:

Suppose that a string of length $r$ is hashed into $m$ slots by treating it as a radix-128 number and then using the division method. The number $m$ is easily represented as a 32 bit word but the string of $r$ character treated as a radix-128 number takes many words. How can we apply the division method to compute the hash of the character string without using more than a constant number of words outside of the string itself?

Instead of treating the word as a radix-128 number some form of combination could be used. For example you may add the values of each character together modulus 128.

Question 11.3-4:

Consider a hash table of size $m = 1000$ and a corresponding hash function $h(k) = \lfloor m (k A \mod{1})\rfloor$ for $ A = \frac{\sqrt{5} - 1}{2}$ Compute the locations to which the keys 61, 62, 63, 64, 65 are mapped.

key hash
61 700
62 318
63 936
64 554
65 172
  1. This is required because it is possible that the random garbage in the array points to the stack by random chance
  2. unused slots not shown