Notes from IPFS whitepaper and other resources for student reference.
The InterPlanetary File System (IPFS) is a p2p distributed filesystem. A filesystem is an abstraction that controls how data is stored and retrieved.
IPFS can be compared to the Web, but can more accurately be seen as a single BitTorrent swarm, exchanging objects within one Git repository. It provides a high-throughput, content-addressed block storage model, with content-addressed hyperlinks. The structure forms a generalized Merkle DAG, upon which one can build versioned file systems, blockchains, and even a Permanent Web.
Underlying all this, we have a distributed hashtable, an incentivized block exchange, and a self-certifying namespace. IPFS has no single point of failure, and nodes do not need to trust each other.
Historically, no general-purpose distributed filesystem has offered global, low-latency, decentralized distribution.
HTTP has been in use for the past two decades, but significant new file distribution techniques have been developed since its invention. In the face of new challenges, an upgrade is increasingly necessary:
In general: lots of data accessible everywhere.
Some projects and ideas inspired the IPFS design philosophy. Each solved a very specific problem, and IPFS aims to generalize them.
DHTs are widely used to coordinate and maintain metadata about p2p systems. For example, the BitTorrent Mainline DHT tracks the sets of peers that are part of a torrent swarm. Other examples:
BitTorrent coordinates networks of untrusting peers (swarms) to cooperate in distributing pieces of files to each other. Key features:
Share ratio: amount uploaded / amount downloaded. Interesting related article
Equally important to efficient data distribution is version control. Git, which is probably the most well known version control system today, was a pioneer in this. Git’s underlying Merkle DAG data model enables powerful file distribution strategies.
The central IPFS design principle is to model all data as part of the same Merkle DAG.
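A minimal sketch of the Merkle DAG idea, assuming SHA-256 as the hash and a made-up node encoding (length-prefixed data followed by the child IDs). The function name node_id is hypothetical and is not Git's or IPFS's actual object format. It only illustrates that a node's ID covers its children, so identical content deduplicates and any change propagates up to the root.

```python
import hashlib


def node_id(data: bytes, child_ids: list[str]) -> str:
    """Hash a node's data together with the IDs of its children."""
    h = hashlib.sha256()
    h.update(len(data).to_bytes(8, "big"))  # length prefix keeps the encoding unambiguous
    h.update(data)
    for cid in child_ids:
        h.update(bytes.fromhex(cid))
    return h.hexdigest()


# Two leaves and a root that links to them.
leaf_a = node_id(b"hello", [])
leaf_b = node_id(b"world", [])
root = node_id(b"", [leaf_a, leaf_b])

# Identical content always gets the same ID (deduplication).
same_as_a = node_id(b"hello", [])

# Changing a leaf changes its ID, and therefore the ID of the root above it.
leaf_b_changed = node_id(b"world!", [])
root_changed = node_id(b"", [leaf_a, leaf_b_changed])
```

Because the root ID depends on every node below it, knowing a trusted root ID is enough to verify a whole tree fetched from untrusted peers.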
Self-certifying File Systems (SFS) use a scheme in which the name of an SFS file system certifies its server.
/sfs/<Location>:<HostID>
HostID = hash(public_key || Location)
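The naming scheme above can be sketched as follows. This is an illustration only: SHA-256 and the hex encoding are assumptions (the notes only say "hash"), the key is a placeholder byte string rather than a real public key, and the function names are hypothetical.

```python
import hashlib


def sfs_host_id(public_key: bytes, location: str) -> str:
    """HostID = hash(public_key || Location), with || as byte concatenation."""
    return hashlib.sha256(public_key + location.encode("utf-8")).hexdigest()


def sfs_path(public_key: bytes, location: str) -> str:
    """Build the self-certifying name /sfs/<Location>:<HostID>."""
    return f"/sfs/{location}:{sfs_host_id(public_key, location)}"


def server_matches_name(path: str, presented_key: bytes) -> bool:
    """Check that the key a server presents matches the HostID in the path."""
    location, host_id = path[len("/sfs/"):].rsplit(":", 1)
    return sfs_host_id(presented_key, location) == host_id


path = sfs_path(b"placeholder-public-key", "example.org")
```

A client that knows only the path can check any server claiming to serve it, without a certificate authority: a server presenting a different key produces a different HostID.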
IPFS uses ideas from DHTs, BitTorrent, Git, and SFS. It is p2p: no nodes are privileged. IPFS nodes store IPFS objects (files, data structures, etc.) in local storage. Stack of sub-protocols:
Nodes are identified by a NodeId, which is the cryptographic hash of a public key. Key pairs are chosen using the S/Kademlia static crypto puzzle:
NodeId = hash(pubkey)
The number of leading zero bits in hash(NodeId) must be at least a specified difficulty.
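A rough sketch of the identity puzzle, under stated assumptions: SHA-256 stands in for the hash, and 32 random bytes stand in for a freshly generated public key (a real node would generate an actual key pair). The function names are hypothetical. The loop keeps generating candidates until hash(NodeId) has enough leading zero bits.

```python
import hashlib
import os


def count_leading_zero_bits(digest: bytes) -> int:
    """Count zero bits at the start of a digest."""
    count = 0
    for byte in digest:
        if byte == 0:
            count += 8
            continue
        # For a nonzero byte, 8 - bit_length() is its number of leading zero bits.
        count += 8 - byte.bit_length()
        break
    return count


def generate_node_id(difficulty: int) -> tuple[bytes, bytes]:
    """Return (pubkey, node_id) such that hash(node_id) meets the difficulty."""
    while True:
        pubkey = os.urandom(32)  # ASSUMPTION: placeholder for a real public key
        node_id = hashlib.sha256(pubkey).digest()  # NodeId = hash(pubkey)
        if count_leading_zero_bits(hashlib.sha256(node_id).digest()) >= difficulty:
            return pubkey, node_id


pubkey, node_id = generate_node_id(difficulty=8)
```

Each extra bit of difficulty doubles the expected number of attempts, which makes generating many identities expensive while verification stays a single hash check.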
Also, cryptographic hash digests are self-describing (the multihash format), so that the system can choose the best function for a given use case and evolve as function choices change.
<function code><digest length><digest bytes>
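A small sketch of the format above, assuming SHA-256, whose multihash function code is 0x12. It handles only the case where the function code and the digest length each fit in a single byte; the function names are hypothetical and this is not a full multihash implementation.

```python
import hashlib

SHA2_256_CODE = 0x12  # multihash function code for sha2-256


def encode_multihash(function_code: int, digest: bytes) -> bytes:
    """Prefix a digest with <function code><digest length>."""
    if function_code > 127 or len(digest) > 127:
        raise ValueError("this sketch only handles single-byte code and length")
    return bytes([function_code, len(digest)]) + digest


def decode_multihash(mh: bytes) -> tuple[int, bytes]:
    """Split a multihash into (function code, digest bytes), checking the length."""
    if len(mh) < 2:
        raise ValueError("multihash too short")
    function_code, length = mh[0], mh[1]
    digest = mh[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length byte")
    return function_code, digest


mh = encode_multihash(SHA2_256_CODE, hashlib.sha256(b"hello").digest())
```

A reader of the value can tell which hash function produced it from the first byte alone, so old and new functions can coexist in one system.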
A routing system is needed to find other peers’ network addresses and the peers who can serve particular objects. IPFS uses a DSHT (distributed sloppy hash table) based on S/Kademlia and Coral. Values are handled by size: small values (1 KB or less) are stored directly in the DHT, while for larger values the DHT stores references, namely the NodeIds of peers who can serve the block.
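The size-based distinction can be illustrated with a toy, single-process stand-in for the DSHT. Everything here is an assumption made for illustration: the class name ToyDSHT, the in-memory dicts, SHA-256 keys, and the 1 KB threshold (taken from the whitepaper). A real DHT spreads these records across peers.

```python
import hashlib

SMALL_VALUE_LIMIT = 1024  # bytes; values at or below this are stored inline


class ToyDSHT:
    def __init__(self) -> None:
        self.values: dict[str, bytes] = {}         # key -> small value
        self.providers: dict[str, set[str]] = {}   # key -> NodeIds that can serve it

    def put(self, node_id: str, value: bytes) -> str:
        """Store a small value directly, or record node_id as a provider."""
        key = hashlib.sha256(value).hexdigest()
        if len(value) <= SMALL_VALUE_LIMIT:
            self.values[key] = value
        else:
            self.providers.setdefault(key, set()).add(node_id)
        return key

    def get(self, key: str):
        """Return ("value", bytes) or ("providers", sorted list of NodeIds)."""
        if key in self.values:
            return ("value", self.values[key])
        return ("providers", sorted(self.providers.get(key, set())))


dht = ToyDSHT()
small_key = dht.put("peerA", b"small")
big_block = b"x" * 2048
big_key = dht.put("peerA", big_block)
dht.put("peerB", big_block)
```

For the large block, a lookup returns who can serve it rather than the data itself; the block is then fetched from those peers.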
BitSwap is a BitTorrent-inspired protocol for exchanging blocks with peers. Peers have a want_list (blocks they are looking for) and a have_list (blocks they can offer), and “barter” on the BitSwap “persistent marketplace.” (More advanced functionality needs DLT and probably an underlying digital currency.)
Base case: Peers have blocks that provide direct value to one another, then swap.
Working: if a node has nothing its peers want, it seeks the pieces its peers want so that it has something to trade.
BitSwap Credit: incentivizes nodes to seed even when they don’t need anything in particular.
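One way to picture the credit idea is the example strategy given in the whitepaper: each node keeps a ledger per peer, computes a debt ratio r = bytes_sent / (bytes_recv + 1), and sends to that peer with probability 1 - 1 / (1 + exp(6 - 3r)). The sketch below encodes just those two formulas; the Ledger class and its method names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Ledger:
    """Bytes exchanged with one peer, from this node's point of view."""
    bytes_sent: int = 0
    bytes_recv: int = 0

    def debt_ratio(self) -> float:
        # r = bytes_sent / (bytes_recv + 1); the +1 avoids division by zero
        return self.bytes_sent / (self.bytes_recv + 1)

    def send_probability(self) -> float:
        # P(send | r) = 1 - 1 / (1 + exp(6 - 3r))
        r = self.debt_ratio()
        return 1 - 1 / (1 + math.exp(6 - 3 * r))


new_peer = Ledger()                                # nothing exchanged yet
even_peer = Ledger(bytes_sent=2, bytes_recv=0)     # r = 2
freeloader = Ledger(bytes_sent=400, bytes_recv=99) # r = 4
```

The probability stays close to 1 for new or balanced peers and drops quickly as r approaches 2 and beyond, so a node is generous at first but stops serving peers that only take.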
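On top of the DHT and BitSwap (which allow p2p storage and distribution), IPFS needs a way to link objects together. The Merkle DAG does this as a generalization of the Git data structure.
In the IPFS object format, an object holds data and a list of links; each link contains a name/alias, the multihash of the target, and the target’s size.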
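A sketch of objects and links, and of walking a path by link name. The serialization (JSON), the use of a plain SHA-256 hex string in place of a real multihash, the in-memory store, and the helper names put, link_to, and resolve are all assumptions made for illustration, not the IPFS wire format.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class IPFSLink:
    name: str       # name or alias of the link
    multihash: str  # here: SHA-256 hex of the target (stand-in for a multihash)
    size: int       # here: size in bytes of the serialized target


@dataclass
class IPFSObject:
    links: list[IPFSLink] = field(default_factory=list)
    data: bytes = b""

    def serialize(self) -> bytes:
        body = {
            "links": [[l.name, l.multihash, l.size] for l in self.links],
            "data": self.data.hex(),
        }
        return json.dumps(body, sort_keys=True).encode("utf-8")

    def key(self) -> str:
        return hashlib.sha256(self.serialize()).hexdigest()


store: dict[str, IPFSObject] = {}  # toy local object store


def put(obj: IPFSObject) -> str:
    store[obj.key()] = obj
    return obj.key()


def link_to(name: str, obj: IPFSObject) -> IPFSLink:
    return IPFSLink(name=name, multihash=put(obj), size=len(obj.serialize()))


def resolve(root_key: str, path: str) -> IPFSObject:
    """Follow link names from the root object, one path segment at a time."""
    obj = store[root_key]
    for part in filter(None, path.split("/")):
        matches = [l for l in obj.links if l.name == part]
        if not matches:
            raise KeyError(part)
        obj = store[matches[0].multihash]
    return obj


readme = IPFSObject(data=b"hello")
docs = IPFSObject(links=[link_to("readme.txt", readme)])
root_key = put(IPFSObject(links=[link_to("docs", docs)]))
```

Resolving "docs/readme.txt" from root_key follows two links by name, and every hop is verified implicitly because each object is looked up by the hash of its own content.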
IPFS defines a set of objects for modeling a versioned filesystem on top of the Merkle DAG: block, list, tree, and commit.
“Unless your file gets popular and a lot of people pin it from their computer, your file will die. So better be prevented and store it yourself with this tutorial.”
Tutorial uses AWS, but doesn’t that mean survival of your file is tied to survival of your AWS instance?
Filecoin builds on IPFS, adding blockchain elements as an incentive layer on top of it. Other blockchain systems allow devs to write smart contracts, but have very little storage capacity, and at a high cost. IPFS is a way to reference and distribute content, but Filecoin support is needed to guarantee storage of IPFS content (in exchange for Filecoin tokens).
Proof of replication lets a server (the prover) convince a user that some data has been replicated. Proof of spacetime proves that data was stored throughout a period of time.