Specification Document Outline

While we do not yet have a clear set of specification documents for Tahoe (explaining the file formats, so that others can write interoperable implementations), this document is intended to lay out an outline for what these specs ought to contain. Think of this as the ISO 7-Layer Model for Tahoe.

We currently imagine 4 documents.

#1: Share Format, Encoding Algorithm
#2: Share Exchange Protocol
#3: Server Selection Algorithm, filecap format
#4: Directory Format

#3: Server Selection Algorithm, filecap format

This document has two interrelated purposes. With a deeper understanding of the issues, we may be able to separate these more cleanly in the future.

The first purpose is to explain the server selection algorithm. Given a set of N shares, where should those shares be uploaded? Given some information stored about a previously-uploaded file, how should a downloader locate and recover at least k shares? Given a previously-uploaded mutable file, how should a modifier locate all (or most of) the shares with a reasonable amount of work?

This question implies many things, all of which should be explained in this document:

the notion of a “grid”, nominally a set of servers who could potentially hold shares, which might change over time
a way to configure which grid should be used
a way to discover which servers are a part of that grid
a way to decide which servers are reliable enough to be worth sending shares
an algorithm to handle servers which refuse shares
a way for a downloader to locate which servers have shares
a way to choose which shares should be used for download

The server-selection algorithm has several obviously competing goals:

minimize the amount of work that must be done during upload
minimize the total storage resources used
avoid “hot spots”, balance load among multiple servers
maximize the chance that enough shares will be downloadable later, by uploading lots of shares, and by placing them on reliable servers
minimize the work that the future downloader must do
tolerate temporary server failures, permanent server departure, and new server insertions
minimize the amount of information that must be added to the filecap

The server-selection algorithm is defined in some context: some set of expectations about the servers or grid with which it is expected to operate. Different algorithms are appropriate for different situtations, so there will be multiple alternatives of this document.

The first version of this document will describe the algorithm that the current (1.3.0) release uses, which is heavily weighted towards the two main use case scenarios for which Tahoe has been designed: the small, stable friendnet, and the allmydata.com managed grid. In both cases, we assume that the storage servers are online most of the time, they are uniformly highly reliable, and that the set of servers does not change very rapidly. The server-selection algorithm for this environment uses a permuted server list to achieve load-balancing, uses all servers identically, and derives the permutation key from the storage index to avoid adding a new field to the filecap.

An alternative algorithm could give clients more precise control over share placement, for example by a user who wished to make sure that k+1 shares are located in each datacenter (to allow downloads to take place using only local bandwidth). This algorithm could skip the permuted list and use other mechanisms to accomplish load-balancing (or ignore the issue altogether). It could add additional information to the filecap (like a list of which servers received the shares) in lieu of performing a search at download time, perhaps at the expense of allowing a repairer to move shares to a new server after the initial upload. It might make up for this by storing “location hints” next to each share, to indicate where other shares are likely to be found, and obligating the repairer to update these hints.

The second purpose of this document is to explain the format of the file capability string (or “filecap” for short). There are multiple kinds of capabilties (read-write, read-only, verify-only, repaircap, lease-renewal cap, traverse-only, etc). There are multiple ways to represent the filecap (compressed binary, human-readable, clickable-HTTP-URL, “tahoe:” URL, etc), but they must all contain enough information to reliably retrieve a file (given some context, of course). It must at least contain the confidentiality and integrity information from document #1 (i.e. the encryption key and the UEB hash). It must also contain whatever additional information the upload-time server-selection algorithm generated that will be required by the downloader.

For some server-selection algorithms, the additional information will be minimal. For example, the 1.3.0 release uses the hash of the encryption key as a storage index, and uses the storage index to permute the server list, and uses an Introducer to learn the current list of servers. This allows a “close-enough” list of servers to be compressed into a filecap field that is already required anyways (the encryption key). It also adds k and N to the filecap, to speed up the downloader’s search (the downloader knows how many shares it needs, so it can send out multiple queries in parallel).

But other server-selection algorithms might require more information. Each variant of this document will explain how to encode that additional information into the filecap, and how to extract and use that information at download time.

These two purposes are interrelated. A filecap that is interpreted in the context of the allmydata.com commercial grid, which uses tahoe-1.3.0, implies a specific peer-selection algorithm, a specific Introducer, and therefore a fairly-specific set of servers to query for shares. A filecap which is meant to be interpreted on a different sort of grid would need different information.

Some filecap formats can be designed to contain more information (and depend less upon context), such as the way an HTTP URL implies the existence of a single global DNS system. Ideally a tahoe filecap should be able to specify which “grid” it lives in, with enough information to allow a compatible implementation of Tahoe to locate that grid and retrieve the file (regardless of which server-selection algorithm was used for upload).

This more-universal format might come at the expense of reliability, however. Tahoe-1.3.0 filecaps do not contain hostnames, because the failure of DNS or an individual host might then impact file availability (however the Introducer contains DNS names or IP addresses).

#4: Directory Format

Tahoe directories are a special way of interpreting and managing the contents of a file (either mutable or immutable). These “dirnode” files are basically serialized tables that map child name to filecap/dircap. This document describes the format of these files.

Tahoe-1.3.0 directories are “transitively readonly”, which is accomplished by applying an additional layer of encryption to the list of child writecaps. The key for this encryption is derived from the containing file’s writecap. This document must explain how to derive this key and apply it to the appropriate portion of the table.

Future versions of the directory format are expected to contain “deep-traversal caps”, which allow verification/repair of files without exposing their plaintext to the repair agent. This document wil be responsible for explaining traversal caps too.

Future versions of the directory format will probably contain an index and more advanced data structures (for efficiency and fast lookups), instead of a simple flat list of (childname, childcap). This document will also need to describe metadata formats, including what access-control policies are defined for the metadata.

Specification Document Outline

#1: Share Format, Encoding Algorithm

#2: Share Exchange Protocol

#3: Server Selection Algorithm, filecap format

#4: Directory Format