Saturday, February 17, 2018
Some rules for designing libc style APIs
- Identifiers should not have vowels; they are expensive and difficult to type.
- An identifier must not be longer than 8 characters. The only exceptions are functions intended for standardization, like sched_ss_init_budget.
- Functions must not be reentrant. Relying on internal state means you can avoid allocating memory.
- Functions should take at least two parameters. The second parameter should be a "flags" parameter which causes the function to do entirely different things.
- Flags should be passed as macros with unspecified values. These macros must not have reasonable values.
- Error handling must be done in one of two ways. The choice must not be consistent with other functions in the library:
- The real return value should be stored in an "out" parameter. The return value must only determine if an error has occurred or not.
- If an error occurs, the return value must be undefined. The return value can't be safely used without checking for errors using a separate function (e.g., fgets).
- The error code should be in errno, requiring the caller to set `errno = 0` beforehand and check it afterward. However, the return value should be a value legally allowed to be in errno, so that initial attempts to use the function appear to work.
- If the function returns a string, it must do so by modifying a memory location given as a parameter. Whether or not the string is terminated with a null must be determined solely based on the length of the output, a user supplied parameter, and choice of compiler.
Sunday, November 27, 2016
Papers We Read
This is the list of papers we read:
- Lamport Time Clocks
- Spanner: Google’s Globally-Distributed Database
- The Chubby Lock Service for Loosely-Coupled
- A note on distributed computing
- The Byzantine Generals Problem
- Your computer is already a distributed system. Why isn't your OS?
- How Complex Systems Fail
- Fast and Message-Efficient Global Snapshot Algorithms for Large-Scale Distributed Systems
- Automatic Management of Partitioned, Replicated Search Services
- Simple Testing Can Prevent Most Critical Failures
- Dynamo: Amazon’s Highly Available Key-value Store
- Wait-free coordination for Internet-scale systems
- Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
- SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
- Kafka: a Distributed Messaging System for Log Processing
- DistributedLog: A high performance replicated log service
- The Log: What every software engineer should know about real-time data's unifying abstraction
- Social Hash: an Assignment Framework for Optimizing Distributed Systems Operations on Social Networks
- Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
- MapReduce: Simplified Data Processing on Large Clusters
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
- MillWheel: Fault-Tolerant Stream Processing at Internet Scale
- Snowflake - Unique ID Generation. “No two snowflakes are alike.”
- The Hadoop Distributed File System
- Gorilla: A Fast, Scalable, In-Memory Time Series Database
- Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial
- Meltdown
- Spectre Attacks: Exploiting Speculative Execution
- Communicating Sequential Processes
- The Tail at Scale
- Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
- Dapper, A Large Scale Distributed Systems Tracing Infrastructure
- The many faces of consistency
- SDPaxos: Building efficient semi-decentralized geo-replicated state machines
- Dataflow Model
- How to read a paper
- Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network
- Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook
- Harvest, Yield, and Scalable Tolerant Systems
- Lineage stash: fault tolerance off the critical path.
- F1 Query: Declarative Querying at Scale
- The Architectural Implications of Facebook’s DNN-based Personalized Recommendation
- EFLOPS: Algorithm and System Co-design for a High Performance Distributed Training Platform
Updated 2018-06-20: added additional papers. Linkified a few more
Updated 2018-10-06: added additional papers. Linkified a few more
Updated 2019-07-10: added additional papers. Linkified a few more
Updated 2022-04-05: added additional papers. Linkified a few more
Monday, June 15, 2015
Blogging My Way Through CLRS Section 4.1
- Question 4.1-1:
What does $\textit{Find-Maximum-Subarray}$ return when all elements of $A$ are negative?
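The procedure would return the single element of maximum value. This is expected since the maximum subarray must contain at least one element. This follows from noting that, when every element is negative, $\textit{Find-Max-Crossing-Subarray}$ always returns the two-element subarray $A[\textit{mid}..\textit{mid}+1]$, whose sum is smaller than either of those elements alone. Since $\textit{Find-Maximum-Subarray}$ takes the maximum of $\{\textit{left-sum}, \textit{right-sum}, \textit{cross-sum}\}$, it never selects the crossing subarray, and the largest single element found by the base cases propagates up.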
- Question 4.1-2:
Write pseudocode for the brute-force method of solving the max-subarray problem. Your solution should run in $\Theta(n^2)$ time.
max_i = nil
max_j = nil
max_sum = -∞
for i in 0..len(A):
    # Extend A[i..j] one element at a time so each new sum costs O(1)
    cur_sum = 0
    for j in i..len(A):
        cur_sum += A[j]
        if cur_sum > max_sum:
            max_sum = cur_sum
            max_i = i
            max_j = j
return (max_i, max_j, max_sum)
- Question 4.1-3:
Implement both the brute-force and recursive algorithms for the maximum-subarray problem on your own computer. What problem size $n_0$ gives the crossover point at which the recursive algorithm beats the brute-force algorithm? Then, change the base case of the recursive algorithm to use the brute-force algorithm whenever the problem size is less than $n_0$. Does that change the crossover point?
The answer is specific to the implementation and to the computer on which it runs, so I will skip it in this writeup. It is worth noting that changing the implementation to use the brute-force method for problem sizes below $n_0$ is very likely to change $n_0$.
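The timings depend on the machine, but a minimal Python sketch of both algorithms may help anyone who wants to measure $n_0$ themselves. It follows the CLRS procedures with 0-based inclusive indices. The `cutoff` parameter is my own hypothetical addition for the hybrid variant: subproblems of at most `cutoff` elements are solved by brute force, and `cutoff=1` gives the plain recursion. Timing each variant (for example with Python's `timeit` module) over random arrays of growing size would locate the crossover.

```python
import math


def brute_force_max_subarray(A, low, high):
    """Theta(n^2) scan of A[low..high] (inclusive), as in Question 4.1-2."""
    best = (low, low, -math.inf)
    for i in range(low, high + 1):
        cur_sum = 0
        for j in range(i, high + 1):
            cur_sum += A[j]  # sum of A[i..j], extended by one element
            if cur_sum > best[2]:
                best = (i, j, cur_sum)
    return best


def find_max_crossing_subarray(A, low, mid, high):
    """Best subarray of A[low..high] that contains both A[mid] and A[mid + 1]."""
    left_sum, total, max_left = -math.inf, 0, mid
    for i in range(mid, low - 1, -1):  # grow leftwards from mid
        total += A[i]
        if total > left_sum:
            left_sum, max_left = total, i
    right_sum, total, max_right = -math.inf, 0, mid + 1
    for j in range(mid + 1, high + 1):  # grow rightwards from mid + 1
        total += A[j]
        if total > right_sum:
            right_sum, max_right = total, j
    return (max_left, max_right, left_sum + right_sum)


def find_maximum_subarray(A, low, high, cutoff=1):
    """Divide and conquer; subproblems with at most `cutoff` elements use brute force.

    cutoff=1 is the plain CLRS recursion; a larger cutoff is the hybrid
    asked about in Question 4.1-3 (cutoff is a hypothetical parameter).
    """
    if high - low + 1 <= max(cutoff, 1):
        return brute_force_max_subarray(A, low, high)
    mid = (low + high) // 2
    left = find_maximum_subarray(A, low, mid, cutoff)
    right = find_maximum_subarray(A, mid + 1, high, cutoff)
    cross = find_max_crossing_subarray(A, low, mid, high)
    # max() keeps the first maximal item, so ties prefer left, then right, as in CLRS
    return max(left, right, cross, key=lambda t: t[2])
```

On the example array from CLRS Section 4.1, `[13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]`, every variant should return `(7, 10, 43)`.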
- Question 4.1-4:
Suppose we change the definition of the maximum-subarray problem to allow the result to be an empty subarray, where the sum of the values of an empty subarray is 0. How would you change any of the algorithms that do not allow empty subarrays to permit an empty subarray to be the result?
For the brute force algorithm it would be rather trivial to add a check: if the final max_sum is negative, return the empty subarray (whose sum is 0) instead.
For the recursive divide and conquer algorithm it is sufficient to change $\textit{Find-Max-Crossing-Subarray}$ in a manner similar to the brute force method, so that it returns the empty subarray when its best crossing sum is negative, and to make the one-element base case return the empty subarray when that element is negative. If these return the correct values, then $\textit{Find-Maximum-Subarray}$ will do the correct thing.
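A minimal Python sketch of the brute-force change, with 0-based indices. Representing the empty subarray as `(None, None, 0)` is my own convention. Starting from the empty subarray as the best candidate is equivalent to checking for a negative result at the end.

```python
def max_subarray_allow_empty(A):
    """Brute force (Question 4.1-2) where the empty subarray, with sum 0, is allowed.

    Returns (i, j, sum) for A[i..j] inclusive, or (None, None, 0) for the empty subarray.
    """
    best = (None, None, 0)  # the empty subarray is the initial candidate
    for i in range(len(A)):
        cur_sum = 0
        for j in range(i, len(A)):
            cur_sum += A[j]
            if cur_sum > best[2]:  # only strictly positive sums beat the empty subarray
                best = (i, j, cur_sum)
    return best
```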
- Question 4.1-5:
Develop a nonrecursive linear-time algorithm for the maximum-subarray problem.[1]
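If one knows the answer to the max-subarray problem for a given prefix of the array, then each new element $A[j]$ leads to only two cases: it is either part of a maximum subarray of $A[0..j]$ or it is not. If it is, that subarray ends at $j$, and the best subarray ending at $j$ either consists of $A[j]$ alone or extends the best subarray ending at $j-1$. It is easier to explain with pseudocode: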
max_start = 0
max_end = 0
max_sum = A[0]
cur_start = 0      # start of the best subarray ending at j
max_with_j = A[0]  # sum of the best subarray ending at j
for j in 1..len(A):
    # The best subarray ending at j is either A[j] alone or extends the best one ending at j-1
    if A[j] > max_with_j + A[j]:
        max_with_j = A[j]
        cur_start = j
    else:
        max_with_j = max_with_j + A[j]
    # Determine if the best subarray ending at j is a new maximum-subarray
    if max_with_j > max_sum:
        max_sum = max_with_j
        max_start = cur_start
        max_end = j
return (max_start, max_end, max_sum)
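A runnable Python version of the same one-pass idea (commonly known as Kadane's algorithm), using 0-based indices and assuming a non-empty input. Tracking `cur_start` separately from `max_start` matters: for `[5, -10, 3, 4]` the answer is `A[2..3]`, whose start was recorded before it became the best subarray.

```python
def max_subarray_linear(A):
    """One pass over a non-empty A; returns (start, end, sum) for A[start..end] inclusive."""
    max_start = max_end = cur_start = 0
    max_sum = max_with_j = A[0]
    for j in range(1, len(A)):
        # Best subarray ending at j: restart at j if the running sum is negative, else extend it
        if max_with_j < 0:
            max_with_j = A[j]
            cur_start = j
        else:
            max_with_j += A[j]
        # Record it if it beats the best subarray seen so far
        if max_with_j > max_sum:
            max_sum = max_with_j
            max_start, max_end = cur_start, j
    return (max_start, max_end, max_sum)
```

For example, `max_subarray_linear([5, -10, 3, 4])` returns `(2, 3, 7)`.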
Sunday, March 29, 2015
FreeBSD SMB Client under OSX Host
I recently purchased a new MacBook Pro and wanted to set up a FreeBSD virtual machine on it in order to continue doing development work. Unfortunately, FreeBSD as a guest does not support native folder sharing, so I decided to try mounting a Samba share instead.
I decided to set up my VM to have two network interfaces: a NATed interface for internet access and a host-only interface for access to SMB and ssh.
The NAT networking configuration looks like:
NetworkName: FreeBSDNatNetwork
IP: 10.0.2.1
Network: 10.0.2.0/24
IPv6 Enabled: Yes
IPv6 Prefix:
DHCP Enabled: Yes
Enabled: Yes
Port-forwarding (ipv4)
    SSH IPv4:tcp:[]:5022:[10.0.2.4]:22
Port-forwarding (ipv6)
    FreeBSD ssh:tcp:[]:6022:[fd17:625c:f037:2:a00:27ff:fefc:9dab]:22
loopback mappings (ipv4)
The Host-Only networking configuration looks like:
Name: vboxnet0
GUID: 786f6276-656e-4074-8000-0a0027000000
DHCP: Disabled
IPAddress: 192.168.56.1
NetworkMask: 255.255.255.0
IPV6Address:
IPV6NetworkMaskPrefixLength: 0
HardwareAddress: 0a:00:27:00:00:00
MediumType: Ethernet
Status: Up
VBoxNetworkName: HostInterfaceNetworking-vboxnet0
The FreeBSD configuration looks like this:


Unfortunately, when attempting to actually mount the SMB filesystem with:
mount_smbfs -I 192.168.56.1 //eax@192.168.56.1/shared_vbox
I get the error mount_smbfs: can't get server address: syserr = Operation timed out
I tried installing the package net/samba36 and found that I needed the --signing=off flag to make it work:
Based on this setup and my research, it seems that FreeBSD cannot natively mount an OS X Samba share. It might be possible to use sysutils/fusefs-smbnetfs. Other people have recommended NFS or sshfs.
Sunday, November 3, 2013
Two Factor Authentication for SSH (with Google Authenticator)
Two-factor authentication is a method of ensuring that a user has a physical device in addition to their password when logging in to some service. This works by using a time (or counter) based code which is generated by the device and checked by the host machine. Google provides a service which allows one to use their phone as the physical device using a simple app.
This service can be easily configured and greatly increases the security of your host.
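The codes follow the HOTP (RFC 4226) and TOTP (RFC 6238) standards, which Google Authenticator implements. Below is a minimal Python sketch, using only the standard library, of how a device derives a code and how a host might check it. The 30-second step and 6 digits are the common defaults; the one-step `window` in `verify` is my own assumption for tolerating clock skew, not necessarily how pam_google_authenticator is configured.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based code (RFC 4226): HMAC-SHA1 of the 8-byte counter, then dynamic truncation."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low 4 bits of the last byte pick a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, now: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based code (RFC 6238): the counter is the number of `step`-second intervals since the epoch."""
    if now is None:
        now = time.time()
    return hotp(key, int(now // step), digits)


def verify(key: bytes, code: str, now: Optional[float] = None, step: int = 30, window: int = 1) -> bool:
    """Host-side check; `window` (an assumed tolerance) also accepts codes from neighbouring steps."""
    if now is None:
        now = time.time()
    counter = int(now // step)
    return any(
        hmac.compare_digest(hotp(key, counter + delta, len(code)), code)
        for delta in range(-window, window + 1)
        if counter + delta >= 0
    )


def key_from_base32(secret: str) -> bytes:
    """Decode a base32 secret such as the one google-authenticator prints (spaces and case ignored)."""
    cleaned = secret.replace(" ", "").upper()
    return base64.b32decode(cleaned + "=" * (-len(cleaned) % 8))
```

Because both sides only share the secret key and a clock, the host never needs to contact the phone; it simply recomputes the expected code.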
Installing Dependencies
- There is only one: the Google-Authenticator software itself:
# pkg install pam_google_authenticator
On older FreeBSD installations you may use:
# pkg_add -r pam_google_authenticator
On Debian-derived systems use:
# apt-get install libpam-google-authenticator
User configuration
Each user must run "google-authenticator" once before being able to log in with ssh. This will be followed by a series of yes/no prompts which are fairly self-explanatory. Note that the alternative to time-based tokens is to use a counter. It is easy to lose track of which number you are at, so most people prefer time-based tokens.
$ google-authenticator
Do you want authentication tokens to be time-based (y/n) ...
Make sure to save the URL or secret key generated here as it will be required later.
Host Configuration
To enable use of Authenticator the host must be set up to use PAM, which must be configured to prompt for an Authenticator code.
- Edit the file /etc/pam.d/sshd and add the following in the "auth" section prior to pam_unix:
auth requisite pam_google_authenticator.so
- Edit /etc/ssh/sshd_config and uncomment the line:
ChallengeResponseAuthentication yes
Reload ssh config
- Finally, the ssh server needs to reload its configuration:
# service sshd reload
Configure the device
- Follow the instructions provided by Google to install the authentication app and set up the phone.
That's it. Try logging in to your machine from a remote machine now.
Thanks bcallah for proofreading this post.
Sunday, April 28, 2013
Pre-Interview NDAs Are Bad
I get quite a few emails from business folk asking me to interview with them or forward their request to other coders I know. Given the volume it isn't feasible to respond affirmatively to all these requests.
If you want to get a coder's attention there are a lot of things you could do, but there is one thing you shouldn't do: require them to sign an NDA before you interview them.
From the candidate's point of view:
- There are a lot more ideas than qualified candidates.
- It's unlikely your idea is original. It doesn't mean anyone else is working on it, just that someone else probably thought of it.
- Let's say the candidate was working on a similar, if not identical, project. If the candidate doesn't continue with you, they now have to consult a lawyer to make sure you can't sue them over a project they were working on before.
- NDAs are binding legal documents and shouldn't be signed without consulting a lawyer. Does the candidate really want to find a lawyer before interviewing with you?
- An NDA puts the entire obligation on the candidate. What does the candidate get from you?
- Everyone talks to someone about the companies they interview with. Do you want to be that strange company which made them sign an NDA? It can easily harm your reputation.
- NDAs do not stop leaks. They serve to create liability when a leak occurs. Do you want to be the company that sues people that interview with them?
There are some exceptions; for example government and security jobs may require security clearance and an NDA. For those jobs it is possible to determine if a coder is qualified and a good fit without disclosing confidential company secrets.
Friday, December 21, 2012
Correctly Verifying an Email Address
Some services that accept email addresses want to ensure that these email addresses are valid.
There are multiple aspects to an email address being valid:
- The address is syntactically valid.
- An SMTP server accepts mail for the address.
- A human being reads mail at the address.
- The address belongs to the person submitting it.
How does one verify an email address? I'll start with the wrong solutions and build up the correct one.
Possibility #0 - The Regular Expression
Discussions on a correct regular expression to parse email addresses are endless. They are almost always wrong. Even really basic pattern matching such as *@*.* is wrong: it will reject the valid email address n@ai.[5]
Even a fully correct regular expression does not tell you if the mailbox is valid or reachable.
This scores 0/4 on the validity checking scale.
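To make the regular expression point concrete, here is the `*@*.*` pattern roughly translated into a Python regular expression (the translation is my own). It accepts an ordinary address but rejects `n@ai`, even though the `ai` domain has an MX record.

```python
import re

# *@*.* as a regex: one or more characters, "@", one or more characters, ".", one or more characters
NAIVE_EMAIL = re.compile(r".+@.+\..+")


def naive_looks_valid(address: str) -> bool:
    """True if the whole address matches the naive pattern."""
    return NAIVE_EMAIL.fullmatch(address) is not None
```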
Possibility #1 - The VRFY Command
The oldest mechanism for verifying an email address is the VRFY mechanism in RFC821 section 4.1.1:
VERIFY (VRFY) This command asks the receiver to confirm that the argument identifies a user. If it is a user name, the full name of the user (if known) and the fully specified mailbox are returned.
However this isn't sufficient. Most SMTP servers disable this feature for security and anti-spam reasons. This feature could be used to enumerate every username on the server to perform more targeted password guessing attacks:
Both SMTP VRFY and EXPN provide means for a potential spammer to test whether the addresses on his list are valid (VRFY)... Therefore, the MTA SHOULD control who is is allowed to issue these commands. This may be "on/off" or it may use access lists similar to those mentioned previously.
This feature wasn't guaranteed to be useful at the time the RFC was written:[1]
The VRFY and EXPN commands are not included in the minimum implementation (Section 4.5.1), and are not required to work across relays when they are implemented.
Finally, even if VRFY was fully implemented there is no guarantee that a human being reads the mail sent to that particular mailbox.
All of this makes VRFY useless as a validity checking mechanism so it scores 1/4 on the validity checking scale.
Possibility #2 - Sending a Probe Message
With this method you connect to a mail server and pretend to send a real mail message, but cut off before sending the message content. This is wrong for the following reasons:
A system administrator who disabled VRFY has a policy of not allowing email addresses to be tested. Therefore the ability to test the email address by sending a probe should be considered a bug and must not be used.
The system might be set up to detect signs of a probe, such as cutting off early, and may rate limit or block the sender.
In addition, the SMTP server may be temporarily down or the mailbox temporarily unavailable, but this method provides no resilience against failure. This is especially true if this mechanism is attempting to provide real-time feedback to the user after submitting a form.
This scores 1/4 on the validity checking scale.
Possibility #3 - Sending a Confirmation Mail
If one cares about whether a human is reading the mailbox, the simplest way to find out is to send a confirmation mail. In the email include a link to a website (or set a special reply address) with some indication of what is being confirmed. For example, to confirm "user@example.com" is valid the link might be http://example.com/verify?email=user@example.com or http://example.com/verify?account=12345[2].
This method is resilient against temporary failures and forwarders. Temporary failures are retried like any normal mail delivery.
This way it is unlikely that a non-human will trigger the verification email[3]. While this approach solves some of the concerns, it suffers from a fatal flaw:
It isn't secure. It is usually trivial to guess the ID number, email address, or other identifier. An attacker could sign up with someone else's email address and then go to the verification page for that user's account. It might be tempting to use a random ID, but randomness implementations are usually not secure.
This scores 3/4 on the validity checking scale.
Possibility #4 - Sending a Confirmation Mail + HMAC
The correct solution is to send a confirmation, but include a MAC of the identifier in the verification mechanism (reply, or url) as well. A MAC is a construction used to authenticate a message by combining a secret key and the message contents. One family of constructions, HMAC, is a particularly good choice. This way the url might become http://example.com/verify?email=user@example.com&mac=74e6f7298a9c2d168935f58c001bad88[4]
Remember that HMAC is a specific construction, not a naive hash. It would be wise to use a framework native function such as PHP's hash_hmac. Failing to include a secret key in the construction would let an attacker compute a valid MAC for any address themselves.
This scores 4/4 on the validity checking scale.
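A minimal Python sketch of the HMAC approach using the standard library's `hmac` module. The secret key, the `example.com` URL, and the choice of SHA-256 are assumptions for illustration; SHA-256 yields a 64-character hex MAC rather than the 32-character one in the example URL above.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical secret: generate it once (e.g. with secrets.token_bytes) and keep it server-side only
SECRET_KEY = b"replace-with-a-long-random-server-secret"


def email_mac(email: str, key: bytes = SECRET_KEY) -> str:
    """HMAC-SHA256 of the address, hex encoded; without `key` nobody can compute it."""
    return hmac.new(key, email.encode("utf-8"), hashlib.sha256).hexdigest()


def verification_url(email: str, base: str = "http://example.com/verify", key: bytes = SECRET_KEY) -> str:
    """Link to put in the confirmation mail."""
    return base + "?" + urlencode({"email": email, "mac": email_mac(email, key)})


def check_verification(email: str, mac: str, key: bytes = SECRET_KEY) -> bool:
    """Run when the link is visited: recompute the MAC and compare in constant time."""
    expected = email_mac(email, key).encode("utf-8")
    return hmac.compare_digest(expected, mac.encode("utf-8"))
```

An attacker who signs up with someone else's address cannot forge the `mac` parameter without the server's key, which is exactly the flaw of the previous method.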
Closing Notes
Getting email validation right is doable, but not as trivial as many of the existing solutions make it seem.
Thank you to bd for proofreading and reviewing this blog post.
%dig +short ai MX
10 mail.offshore.ai.