Friday, December 21, 2012

Correctly Verifying an Email Address

Some services that accept email addresses want to ensure that these email addresses are valid.

There are multiple aspects to an email being valid:
  1. The address is syntactically valid.
  2. An SMTP server accepts mail for the address.
  3. A human being reads mail at the address.
  4. The address belongs to the person submitting it.

How does one verify an email address? I'll start with the wrong solutions and build up the correct one.

Possibility #0 - The Regular Expression

Discussions on a correct regular expression to parse email addresses are endless. They are almost always wrong. Even really basic pattern matching such as *@*.* is wrong: it will reject the valid email address n@ai.[5]

Even a fully correct regular expression does not tell you if the mailbox is valid or reachable.

This scores 0/4 on the validity checking scale.

Possibility #1 - The VRFY Command

The oldest mechanism for verifying an email address is the VRFY mechanism in RFC821 section 4.1.1:

VERIFY (VRFY) This command asks the receiver to confirm that the argument identifies a user. If it is a user name, the full name of the user (if known) and the fully specified mailbox are returned.

However this isn't sufficient. Most SMTP servers disable this feature for security and anti-spam reasons. This feature could be used to enumerate every username on the server to perform more targeted password guessing attacks:

Both SMTP VRFY and EXPN provide means for a potential spammer to test whether the addresses on his list are valid (VRFY)... Therefore, the MTA SHOULD control who is is allowed to issue these commands. This may be "on/off" or it may use access lists similar to those mentioned previously.

This feature wasn't guaranteed to be useful at the time the RFC was written:[1]

The VRFY and EXPN commands are not included in the minimum implementation (Section 4.5.1), and are not required to work across relays when they are implemented.

Finally, even if VRFY was fully implemented there is no guarantee that a human being reads the mail sent to that particular mailbox.

All of this makes VRFY useless as a validity checking mechanism so it scores 1/4 on the validity checking scale.

Possibility #2 - Sending a Probe Message

With this method you try to connect with a mail server and pretends to send a real mail message but cut off before sending the message content. This is wrong for a for the following reasons:

A system administrator that disabled VRFY has a policy of not allowing for the testing for email addresses. Therefore the ability to test the email address by sending a probe should be considered a bug and must not be used.

The system might be set up to detect signs up of a probe such as cutting off early may rate limit or block the sender.

In addition, the SMTP may be temporarily down or the mailbox temporarily unavailable but this method provides no resilience against failure. This is especially true if this mechanism is attempting to provide real-time feedback to the user after submitting a form.

This scores 1/4 on the validity checking scale.

Possibility #3 - Sending a Confirmation Mail

If one cares about if a human is reading the mailbox the simplest way to do so is send a confirmation mail. In the email include a link to a website (or set a special reply address) with some indication of what is being confirmed. For example, to confirm "" is valid the link might be or[2].

This method is resilient against temporary failures and forwarders. Temporary failures could be retried like a normal SMTP conversation.

This way it is unlikely that a non-human will trigger the verification email[3]. This approach solves some of the concerns, it suffers from a fatal flaw:

It isn't secure. It is usually trivial to guess the ID number, email account, other identifier. An attacker could sign up with someone else's email account and then go to the verification page for that user's account. It might be tempting to use a random ID but randomness implementations are usually not secure.

This scores 3/4 on the validity checking scale

Possibility #4 - Sending a Confirmation Mail + HMAC

The correct solution is to send a confirmation, but include a MAC of the identifier in the verification mechanism (reply, or url) as well. A MAC is a construction used to authenticate a message by combining a secret key and the message contents. One family of constructions, HMAC, is a particularly good choice. This way the url might become[4]

Remember that the HMAC is a specific construction, not a naive hash. It would be wise to use a framework native function such as PHP's hash_hmac. Failing to include a secret into the construction would make the MAC trivially defeated by brute force.

This scores 4/4 on the validity checking scale

Closing Notes

Getting email validation right is doable, but not as trivial as many of the existing solutions make it seem.

  1. Note that RFC1123 more specifically spells out that VRFY MUST be implemented but MAY be disabled.
  2. This is not my luggage password.
  3. It is still possible for a auto-reply bot to trigger reply based verification schemes. Bots that click every link in received email are uncommon.
  4. This is HMAC-MD5. It isn't insecure as collisions aren't important for HMAC. I chose it because it is short.
  5. n@ai is a in-use email address by a person named Ian:
    %dig +short ai MX
Thank you to bd for proofreading and reviewing this blog post.

Wednesday, November 21, 2012

Don't Use Timing Functions for Profiling

One common technique for profiling programs is to use the gettimeofday system call (with code that looks something like this):

Example (incorrect) code that uses gettimeofday - click to view
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
void function(void)
  struct timeval before;
  struct timeval after;
  gettimeofday(&before, NULL);
  gettimeofday(&after, NULL); 
  time_t delta = after.tv_sec - before.tv_sec;

However, using gettimeofday(2) or time(3) or any function designed to get a time of day to obtain profiling information is wrong for many reasons:

  1. Time can go backwards. In a virtualized environment this can happen quite often. In non-virtualized environments this can happen due to time zones. Even passing CLOCK_MONOTONIC to clock(3) doesn't help as it can go backwards during a leap second expansion.
  2. Time can change drastically for no reason. Systems with NTP enabled periodically sync their time with a time source. This can cause the system time to change by minutes, hours, or even days!
  3. These functions measure Wall Clock time. Time spent on entirely unrelated processes is going to be included in the profiling data!
  4. Even if you have disabled everything else on the system[1] the delta computed above includes both of User time and System Time. If your algorithm is very fast but the kernel has a slow implementation of some system call you won't learn much.
  5. gettimeofday relies on the cpu clock which may differ across cores resulting in time skew.

So what should be used instead?

There isn't a good, portable, function to obtain profiling information. However there are options for those not tied to a particular system (or those willing to maintain multiple implementations for different systems.

The getrusage(2) system call is one option for profiling data. This provides different fields for user time (ru_utime) and system time (ru_stime) at a relatively high level of precision and accuracy.

Using DTraces profiling provider also seems to be a decent choice although I limited experience with it.

Finally, using APIs meant to access hardware specific features such as FreeBSD's hwpmc is likely to provide the best results at the cost of being the least portable. Linux has similar features such as oprofile and perf. Using dedicated profilers such as Intel's vtunes[2] may also be worthwhile.

  1. Including networking, background process swapping, cron, etc.
  2. A FreeBSD version is available.
update 2012-11-26: Include note about clock skew across cores.
Update 2013-02-13: Update and fix a massive error I had w.r.t. clock(3)

Wednesday, October 31, 2012

Finding the majority element in a stream of numbers

Some time ago I came across the following question.
As input a finite stream stream of numbers is provided. Define an algorithm to find the majority element of the input. The algorithm need not provide a sensible result if no majority element exists. You may assume a transdichotomous memory model.
There are a few definitions which may not be immediately clear:
A possibly infinite set of data which may not be reused in either the forward or backward direction without explicitly storing it.
Majority element
An element in a set which occurs more than half the time.
The integer size is equal to the word size of memory. One does not need to worry about storing partial pieces of integers in separate memory units.
Unfortunately this answer isn't of my own invention, but it is interesting and succinct.

The algorithm (click to view) Using 3 registers the accumulator, the guess and the current element (next):
  1. Initialize accumulator to 0
  2. Accept the next element of the stream and place it into next. If there are no more elements go to step #7.
  3. If accumulator is 0 place next into guess and increment accumulator.
  4. Else if guess matches next increment accumulator
  5. Else decrement accumulator
  6. Go to step 2
  7. Return the value in guess as the result

An interesting property of this algorithm is that it can be implemented in $O(n)$ time even on a single tape Turing Machine.

Tuesday, October 30, 2012

Cneonction: closed HTTP header

When you make a request to certain websites you may find an unusual header that looks a little strange:

[8000 eitan@radar ~ ]%curl -I 2>/dev/null|grep close
Cneonction: close
[8001 eitan@radar ~ ]%curl -I 2>/dev/null|grep close
Cneonction: close

This isn't a typo though. Some load balancers that sit between the web server and end user want to implement HTTP keep-alive without modifying the back end web server. The load balancer therefore has to add "Connection: Keep-Alive" to the HTTP header and also has to elide the "Connection: close" from the real webserver. However, if it completely removes the line the load balancer (acting as a TCP proxy) would have to stall before forwarding the complete text in order to recompute the TCP checksum. This increases latency on packet delivery.

Instead, the proxy uses a hack to keep the checksum unchanged. The TCP checksum of a packet is the 1s complement summation of all the 16 bit words (the final word might be right padded with zeros).[1] By manipulating the ordering, but not the content of the header the proxy can avoid changing the TCP checksum except by the fixed amount that the "Connection: Keep-Alive" adds (2061).

In particular:

>>>sum(ord(i) for i in "Connection") - sum(ord(i) for i in "Cneonction")


This reordering also keeps the packet size the same.

  1. RFC793
Edit 2012-10-31: Make the RFC a link and remove pointless "2>&1"
Thanks abbe for the inspiration! Thanks wxs for the proofreading.

Tuesday, October 9, 2012

Reduced Entropy in rand() and random()

TL;DR: Don't rely on undefined behavior, even when you think it should work.

I recently reported a minor issue to the FreeBSD security team.

The libc random functions had code1,2 designed to run when /dev/random is not available. This can easily occur in a chroot or jail environment.

if (!done) {
        struct timeval tv;
        unsigned long junk;

        gettimeofday(&tv, NULL);
        srandom((getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk);

This code is designed provide a minimal amount of entropy in the "failure" case. Unfortunately, it doesn't even provide the entropy it claims to. This is a minor issue because getpid, getimeday, and a single long variable don't provide a lot of entropy in the first place: (only $log_2{sizeof(long)}$ bits).

The point of the junk value is to add entropy by using uninitialized memory. This relies on the compiler being "stupid" enough not optimize it away.

Unfortunately clang and newer versions of gcc are smart enough to use the undefined behavior in undesired ways.

clang 3 removes any computation which relies on the undefined behavior and so produces the following object code:

 af0:   e8 5f fc ff ff          callq  754 <gettimeofday@plt>
 af5:   e8 7a fc ff ff          callq  774 <getpid@plt>
 afa:   e8 65 fc ff ff          callq  764 <srandom@plt>

Note that the junk variable is entirely unused and that the xor operation between gettimeofday and getpid is non-existent.

gcc 4.6 4 outputs:

 ce8:   e8 03 fa ff ff          callq  6f0 <gettimeofday@plt>
 ced:   e8 4e fa ff ff          callq  740 <getpid@plt>
 cf2:   48 8b 7c 24 08          mov    0x8(%rsp),%rdi
 cf7:   48 33 3c 24             xor    (%rsp),%rdi
 cfb:   c1 e0 10                shl    $0x10,%eax
 cfe:   48 98                   cltq
 d00:   48 31 c7                xor    %rax,%rdi
 d03:   e8 28 fa ff ff          callq  730 <srandom@plt>
Note that in this case the junk value appears to be (%rsp) which isn't all that random.

gcc 4.25 produces the following code

with the junk variable
 d9f:   e8 18 fa ff ff          callq  7bc <gettimeofday@plt>
 da4:   e8 43 fa ff ff          callq  7ec <getpid@plt>
 da9:   48 8b 3c 24             mov    (%rsp),%rdi
 dad:   48 33 7c 24 08          xor    0x8(%rsp),%rdi
 db2:   c1 e0 10                shl    $0x10,%eax
 db5:   48 98                   cltq
 db7:   48 31 c7                xor    %rax,%rdi
 dba:   48 31 df                xor    %rbx,%rdi
 dbd:   e8 1a fa ff ff          callq  <srandom@plt>
and without:
 d9f:   e8 18 fa ff ff          callq  7bc <gettimeofday@plt>
 da4:   e8 43 fa ff ff          callq  7ec <getpid@plt>
 da9:   48 8b 3c 24             mov    (%rsp),%rdi
 dad:   48 33 7c 24 08          xor    0x8(%rsp),%rdi
 db2:   c1 e0 10                shl    $0x10,%eax
 db5:   48 98                   cltq
 db7:   48 31 c7                xor    %rax,%rdi
 dba:   e8 1d fa ff ff          callq  7dc <srandom@plt>

The base version of gcc isn't vulnerable. However, with the upcoming switch to the clang compiler the FreeBSD does become vulnerable.

The first proposed fix was to add the volatile type qualifier to the junk variable. While this seemed to fix the code generation issue I didn't believe this to be a valid fix as the behavior is still undefined6 (I misread text of the standard). Additionally, the value is likely have be very predictable. A preference was expressed to remove the junk variable as using it may leak a small amount of stack data.

I proposed the simple and obvious fix of removing the use of the junk variable7

In a brief survey of other libraries I noticed similar issues. I will attempt to notify the vendors

It should be obvious, but undefined behavior is undefined and can't be relied on to ever to give a sensible result.

  1. random.c r165903 (line 316)
  2. rand.c r241046 (line 131)
  3. FreeBSD clang version 3.1 (branches/release_31 156863) 20120523 compiled with -O2 or -O3
  4. gcc46 (FreeBSD Ports Collection) 4.6.4 20120608 (prerelease) compiled with -O2
  5. gcc (GCC) 4.2.1 20070831 patched [FreeBSD] compiled with -O3
  6. sections and of ISO9899
  7. svn commit r241373
Edit 2010-10-10: update the paragraph referring to undefined behavior of volatile.

Monday, October 8, 2012

NFS Mount Network Booting VirtualBox

In order to test modifications to the FreeBSD kernel, I would like to boot a diskless virtual machine. The goal is to be able to quickly test changed I make without needing to recreate the VM each time, or run the installation procedure inside of the VM.

You can elect to put the root anywhere you want. For this example, I will be using /home/vm as the root of the virtual machine.

There are a few aspects to this:

  • The filesystem that will be booted.
  • VirtualBox setup
  • Virtual machine setup
  • DHCP server
  • Host setup

Filesystem Installation

  1. Create the distribution:
    cd /usr/src 
    make installworld installkernel distribution DESTDIR=/home/vm
  2. Create the /conf directory used for diskless boot. See /etc/rc.initdiskless for details.
    # cd /home/vm/
    # mkdir -p conf/base/etc
    # cd conf/base/etc
    # cat diskless_remount 
    # cat fstab 
    md    /tmp    mfs    -s=30m,rw    0 0
    md    /var    mfs    -s=30m,rw    0 0
    # cat md_size 

VirtualBox Setup

Create a "host-only" interface with DHCP disabled. To do this:

  1. Open up the preferences screen and go the "network" tab.
    screenshot of the preferences screen
  2. Create a new host-only network (the green plus on the right).
    screenshot of the network preferences screen
  3. Disable the DHCP Server.
    screenshot of the host only network DHCP screen
  4. Select an IP Address in the range your VM will have.
    screenshot of the host only network preferences screen
  5. Note that the DHCP server will continue to run until killed or the machine rebooted:
    ## Note that you may kill too much here. Be careful.
    # pgrep -fl DHCP
    # pkill -15 DHCP

Create the Virtual Machine

  1. Create a new virtual machine. Make sure to select "FreeBSD - 64 bit". Note that you should create this machine without any disk (and ignore the warning at the end).
    filled in machine name and type screenshot
  2. Open up the virtual machine's settings
    default settings screenshot
  3. Select only the "Network" option under boot order (System->Motherboard->Boot Order). Also check "Hardware clock in UTC time."
    changed setting screen screenshot
  4. Under network select the Host Only Adapter created before. Under advanced options, change the adapter type to "PCNet-PCI II." Also make a note of the mac address of the virtual machine. It will be needed later.
    changed network setting screen screenshot

DHCP Server

For simplicity I opted to use a really simple DHCP server dnsmasq:
  1. Install dnsmasq
    $ make -C /usr/ports/dns/dnsmasq install clean
  2. Configure dnsmasq to server as a tftp and DHCP server for the virtual machine interface. Modify /usr/local/etc/dnsmasq.conf (I've bolded the parts that are configuration specific):
    #Note that this mac address is the mac address of the vm noted earlier.
  3. Add this to /etc/dhclient.conf so that you can reference your virtual machine by name:
    precede domain-name-servers;

Host Setup

  1. Add the following to /etc/rc.conf:
    # Required for VirtualBox
    # Required for diskless boot
    # NFS Exports
  2. Add the directory to /etc/exports:
    /home/vm -ro -alldirs -network -maproot=root

Now you are all set. Just boot the virtual machine from VirtualBox and watch it go!

  1. Note that the network is RFC 1918 private address space.

Wednesday, October 3, 2012

#!/bin/bash considered harmful

When one writes a shell script there are a variety of shebang lines that could be used:

  • #!/bin/sh
  • #!/usr/bin/env bash
  • #!/bin/bash

or one of many other options.

Of these only the first two are possibly correct.

Using #!/bin/bash is wrong because:

  • Sometimes bash isn't installed.
  • If it is installed, it may not be in /bin
  • If it is in /bin, the user may have decided to set PATH to use a different installation of bash. Using an absolute path like this overrides the user's choices.
  • bash shouldn't be used for scripts intended for portability

If you have bash specific code use #!/usr/bin/env bash. If you want more portable code try using Debian's checkbashism to find instances of non-POSIX compliant shell scripting.

Monday, October 1, 2012

Some git-svn notes

When working with a subversion repository I often miss the use git features. However, it is possible for git to speak the subversion protocol:

Cloning the initial repository

$git svn clone svn://

Resuming an interrupted git svn clone

It is really annoying when you start a git svn clone process overnight and come back to find that it stopped in the middle. Luckily, there is a really simple way to recover - without spending hours to redownload what you already have.

$git svn fetch
$git svn rebase

Committing the final patch

There are a lot of workflows for this, but I prefer the cherry-pick approach:

$git checkout master
$git svn rebase
$git cherry-pick commit-id
$git svn dcommit

Monday, September 3, 2012

Finding the min and max in 1.5n comparisons

A friend of mine recently gave me the following problem:

Given an unsorted set of numbers find the minimum and maximum of set in a maximum of $1.5n$ comparisons.

My answer involves splitting the list up pairwise and finding the result on the only half of the set.

  1. Go through list and compare every even index to its immediate right (odd) index. Sort each pair numerically within itself. This step takes $\dfrac{1}{2}n$ comparisons.
  2. Find the minimum of every odd index and find the maximum of every even element using the typical algorithm. This step takes $n$ comparisons.

Note that this could be done in one pass by doing the pair comparison and the min/max comparison in one pass.

Is there a better way?

Monday, May 7, 2012

Blogging my way through CLRS Section 11.1 (edition 2)

I've taken a brief break from blogging about my Cormen readings but I decided to write up the answers to chapter 11. Note that the chapters and question numbers may not match up because I'm using an older edition of the book.
Question 11.1-1:

Suppose that a dynamic set $S$ is represented by a direct address table $T$ of length $m$. Describe a procedure that finds the maximum element of $S$. What is the worst case performance of your procedure?

Assuming the addresses are sorted by key: Start at the end of the direct address table and scan downward until a non-empty slot is found. This is the maximum and if not:

  1. Initialize $max$ to $-\infty$
  2. Start at the first address in the table and scan downward until a used slot is found. If you reach the end goto #5
  3. Compare key to $max$. If it is greater assign it to $max$
  4. Goto #2
  5. Return $max$

The performance of this algorithm is $\Theta(m)$. A slightly smaller bound can be found in the first case of $\Theta(m - max)$

Question 11.1-2:

Describe how to use a bit vector to represent a dynamic set of distinct elements with no satellite data. Dictionary operations should run in $O(1)$ time.

Initialize a bit vector of length $|U|$ to all $0$s. When storing key $k$ set the $k$th bit and when deleting the $k$th bit set it to zero. This is $O(1)$ even in a non-transdichotomous model though it may be slower.

Question 11.1-3:

Suggest how to implement a direct address table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations must take $O(1)$ time.

Each element in the table should be a pointer to the head of a linked list containing the satellite data. $nul$ can be used for non-existent items.

Question 11.1-4:

We wish to implement a dictionary by using direct addressing on a large array. At the start the array entries may contain garbage, and initializing the entire array is impractical because of its size. Describe a scheme for implementing a direct address dictionary on the array. Dictionary operations should take $O(1)$ time. Using an additional stack with size proportional to the number of stored keys is permitted.

On insert the array address is inserted into a stack. The array element is then initialized to the value of the location in the stack.

On search the array element value is to see if it is pointing into the stack. If it is the value of the stack is checked to see if it is pointing back to the array.[1]

On delete, the array element can be set to a value not pointing the stack but this isn't required. If the element points to the value of the stack, it is simply popped off. If it is pointing to the middle of the stack, the top element and the key element are swapped and then the pop is performed. In addition the value which the top element was pointing to must be modified to point to the new location

Question 11.2-1:

Suppose we have use a hash function $h$ to hash $n$ distinct keys into an array $T$ of length $m$. Assuming simple uniform hashing what is the expected number of collisions?

Since each new value is equally likely to hash to any slot we would expect $n/m$ collisions.

Question 11.2-2:

Demonstrate the insertion of the keys: $5, 28, 19, 15, 20, 33, 12, 17, 10$ into a hash table with 9 slots and $h(k) = k \mod{9}$[2]

hash values
1 28 -> 19 -> 1
2 20
3 12
5 5
6 15 -> 33
17 8

Question 11.2-3:

If the keys were stored in sorted order how is the running time for successful searches, unsuccessful searches, insertions, and deletions affected under the assumption of simple uniform hashing?

Successful and unsuccessful searches are largely unaffected although small gains can be achieved if if the search bails out early once the search finds a key later in the sort order than the one being searched for.

Insertions are the most affected operation. The time is changed from $\Theta(1)$ to $O(n/m)$

Deletions are unaffected. If the list was doubly linked the time remains $O(1)$. If it was singly linked the time remains $O(1 + \alpha)$

Question 11.2-4:

Suggest how storage for elements can be allocated and deallocated within the ash table by linking all unused slots into a free list. Assume one slot can store a flag and either one element or two pointers. All dictionary operations should run in $O(1)$ expected time.

Initialize all the values to a singly linked free list (flag set to false) with a head and tail pointer. On insert, use the memory pointed to by the head pointer and set the flag to true for the new element and increment the head pointer by one. On delete, set the flag to false and insert the newly freed memory at the tail of the linked list.

Question 11.2-5:

Show that if $|U| > nm$ with $m$ the number of slots, there is a subset of $U$ of size $n$ consisting of keys that all hash to the same slot, so that the worst case searching time for hashing with chaining is $\Theta(n)$

Assuming the worst case of $|U|$ keys in the hash tabe assuming the optimial case of simple uniform hashing all m slots will have $|U|/m = n$ items. Removing the assumption of uniform hashing will allow some chains to become shorter at the expense of other chains becoming longer. There are more items then the number of slots so at least one slot must have at least $n$ items by the pigeon hole principle.

Question 11.3-1:

Suppose we wish to search a linked list of length $n$, where every element contains a key $k$ along with a hash value $h(k)$. Each key is a long character string. How might we take advantage of the hash values when searching the list for an element of a given key?

You can use $h(k)$ to create a bloom filter of strings in the linked list. This is an $\Theta(1)$ check to determine if it is possible that a string appears in the linked list.

Additionally, you can create a hash table of pointers to elements in the linked list with that hash value. this way you only check a subset of the linked list. Alternatively, one can keep the hash of the value stored in the linked list as well and compare the hash of the search value to the hash of each item and only do the long comparison if the hash matches.

Question 11.3-2:

Suppose that a string of length $r$ is hashed into $m$ slots by treating it as a radix-128 number and then using the division method. The number $m$ is easily represented as a 32 bit word but the string of $r$ character treated as a radix-128 number takes many words. How can we apply the division method to compute the hash of the character string without using more than a constant number of words outside of the string itself?

Instead of treating the word as a radix-128 number some form of combination could be used. For example you may add the values of each character together modulus 128.

Question 11.3-4:

Consider a hash table of size $m = 1000$ and a corresponding hash function $h(k) = \lfloor m (k A \mod{1})\rfloor$ for $ A = \frac{\sqrt{5} - 1}{2}$ Compute the locations to which the keys 61, 62, 63, 64, 65 are mapped.

key hash
61 700
62 318
63 936
64 554
65 172
  1. This is required because it is possible that the random garbage in the array points to the stack by random chance
  2. unused slots not shown