## Tuesday, June 21, 2011

### Cormen on Algorithms: Blogging my way through [1/?]

Two of my good friends recently started reading Introduction to Algorithms by Thomas H. Cormen, et. al. Being unable to resist peer pressure I decided to follow and read along.

I plan on blogging my way through the chapters writing my answers to the questions as I go through the book. Like most of my plans they don't always work out, but one could try.

Here it goes!

1.1-1: Give a real-world example in which each of the following computational problems appears: (a)Sorting, (b) Determining the best order for multiplying matrices, (c) finding the convex hull of a set of points.
Sorting - Sorting comes up in virtually every algorithm one could think of. Everything from optimizing monetary investments to efficient compression algorithms has to sort data at some point or another. A harder question might be: Name one non-trivial algorithm that doesn't require sorting.
Multiplying Matrices - graphics and scientific problems frequently require matrix operations.
Convex Hull - Collision detection for use in games, modeling biological systems, or other related work could make use of this
1.1-2: Other than speed what other measures of efficiency might one use in a real-world setting?
It is possible to optimize for (and against) every limited resource. For example minimizing the amount of memory usage is important for embedded applications (and desktop ones too). Reducing total disk I/O is important to increase the longevity of hard drives. On a less technical note optimizing for monetary cost or man hours expended is important too.
1.1-3: Select a data structure you have seen previously and discuss its strengths and limitations
One of the most interesting data structures I know is the Bloom Filter. It is a probabilistic data structure that can determine if an element is NOT in a set but can't determine definitively if an element is in a set. It works by hashing each element in a set to a fixed size bit array. It then ORs the hash with itself (which starts at all zeros). One can test to see if an element is in a set by generating the hash and testing to see if every bit set to 1 in the queried element is set to 1 in the filter. If it is then you have some degree of confidence that the element is in the set. Any negative means that what you are querying for has not been added.
While most probabilistic structures have certain properties in common, bloom filters have a number of interesting pros and cons.
1. A negative result is definitive - if a query returns that an element has not been added then one knows this to be 100% true.
2. Since hashes are fixed size the amount of memory a Bloom Filter uses is known and bounded.
3. Bloom filters can quickly become useless with large amounts of data. It is possible that every bit will be set to 1 which effectively makes the query a NOP.
4. It is impossible to remove data from a bloom filter. One can't just set all the bits of the hash to a zero because that might be removing other elements as well.
5. Without a second set of data there is no way to deterministically list all elements (unlike other probabilistic data structures such as Skip Lists).
1.1-4: How are the shortest path and traveling salesmen problems similar? How are they different?
The shortest path problem is
Given a weighted (undirected) graph G:, a start vertex $V_0$ and an end vertex $V_e$, find a path between $V_0$ and $V_e$ such that the sum of the weights is minimized. This could be expanded to $Given a weighted graph G:, find a path between every pair such that the sum of the weights for each path is minimized. Traveling salesman is defined as: Given a weighted, undirected, graph G: and a start vertex$V_0$find a path starting and ending at$V_0\$ such that it passes through every other vertex exactly once and the sum of the weights is minimized.
The traveling salesman problem might make use of the shortest path problem repeatedly in order to come up with the correct solution.
1.1-5: Come up with a real-world problem in which only the best solution will do. Then come up with a problem in which a solution that is "approximately" the best will do?
There are very few problems where one needs the objectively optimal solution. Mathematical questions are the only problems I could think of that need that level of accuracy. Virtually every problem needs a good enough solution. Some examples include finding a fast route for packets on the internet or locating a piece of data in a database.
update 2011-06-30: modified text of answers 1.1-3 and 1.1-5 to be more clear.