# Distributed File Systems

A distributed file system is like a distributed storage facility. It should:

* Provide access via a well-defined interface
* Maintain consistent state across nodes
* Offer a mixture of distribution models if needed

![Distributed Filesystem](https://github.com/calvinfeng/omscs/tree/c279027c79f8b544ceaad527bc85b13e817011bb/CS8803/P4/diagrams/P4L2_distributed_fs.png)

Underneath the virtual file system (VFS) interface, the operating system can hide the fact that there is no local physical storage on a particular machine for the files being accessed.

Instead, everything is maintained on a remote machine, or on a remote file system, that is accessed over the network.

## DFS Models

* Client/server on different machines
* File server distributed across multiple machines to support replication and high availability; the files can be
  * Replicated, or
  * Partitioned
* Files stored on and served from all machines (peer-to-peer network like BitTorrent)

### Remote File Service: Extremes

Let's talk about a few extreme designs for a small distributed system with multiple clients and one server.

> Extreme 1: Upload/download for every access

The file is moved to the client in its entirety. The client accesses and modifies the file locally; when it is done, the file is returned to the server (sketched after the list below).

* (+) Local reads/writes happen on the client machine, which is fast for making updates.
* (-) Extremely slow synchronization: the entire file is transferred even for a small access.
* (-) Server gives up control entirely.
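
As a minimal sketch of this model, using local file copies to stand in for the network transfer (the `download`/`upload` helpers and the cache path are hypothetical):

```python
import shutil

# Hypothetical stand-ins for the network transfer layer.
def download(server_path, local_path):
    shutil.copy(server_path, local_path)  # whole file moves to the client

def upload(local_path, server_path):
    shutil.copy(local_path, server_path)  # whole file moves back

def edit_file(server_path, mutate):
    """Extreme 1: fetch the entire file, work on it locally, push it back."""
    local_copy = "/tmp/cached_copy"
    download(server_path, local_copy)     # slow even for a tiny access
    with open(local_copy, "r+") as f:
        mutate(f)                         # reads/writes are local and fast
    upload(local_copy, server_path)       # server only sees the final result
```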

> Extreme 2: True remote file access

The file stays on the server, and every access is performed remotely: the client makes a request to the server for each read and write (sketched below).

* (+) File access is centralized, easy to reason about consistency.
* (-) Every file operation pays the network cost.
* (-) Limits server scalability and availability, since the server must handle every operation.
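
By contrast, a sketch of true remote access, where every operation is a network round trip. The XML-RPC transport, server URL, and method names are assumptions for illustration, not a real DFS API:

```python
import xmlrpc.client

# Assumed file server exposing read/write over XML-RPC (illustrative only).
server = xmlrpc.client.ServerProxy("http://fileserver:8000")

def remote_read(path, offset, nbytes):
    return server.read(path, offset, nbytes)    # one round trip per read

def remote_write(path, offset, data):
    return server.write(path, offset, data)     # one round trip per write
```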

### Remote File Service: Practical with Compromise

1. Allow clients to store parts of files locally (blocks), as in the sketch after this list.
   * (+) Low latency on file operations, reduced server load
2. Force clients to interact with the server frequently.
   * (+) Server has insight into what clients are doing.
   * (+) Server has control over which accesses can be permitted, so it is easier to maintain consistency.
   * (-) Server becomes more complex and requires different file sharing semantics.
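
A sketch of the compromise, assuming a hypothetical server object that exposes `is_valid` and `fetch_block` methods:

```python
class CachingClient:
    """Cache file blocks locally, but check with the server before reusing
    them, so the server stays aware of what clients are doing."""
    BLOCK_SIZE = 4096

    def __init__(self, server):
        self.server = server
        self.cache = {}  # (path, block_no) -> bytes

    def read_block(self, path, block_no):
        key = (path, block_no)
        # Frequent interaction: validate the cached copy with the server.
        if key in self.cache and self.server.is_valid(path, block_no):
            return self.cache[key]       # low latency, no data transfer
        data = self.server.fetch_block(path, block_no)
        self.cache[key] = data
        return data
```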

A file server can be either stateful or stateless.

A stateless file server cannot support the practical model.

* (-) Cannot support caching and consistency management
* (-) Every request must be self-contained, so more bits are transferred (sketched after this list)
* (+) No resources are used on server side
* (+) On failure, just restart
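
To make "every request self-contained" concrete, here is a sketch of what a stateless read request must carry; the field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class StatelessReadRequest:
    # The server remembers nothing between requests, so every request
    # must fully identify the file and range (hence more bits transferred).
    path: str     # full path, every time
    offset: int   # absolute position, since the server keeps no file cursor
    nbytes: int
```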

A stateful file server is needed for the practical model, in order to track what is cached and accessed (a sketch follows the list below).

* (+) Support locking, caching, incremental operations
* (-) On failure, checkpointing and recovery mechanisms are needed
* (-) Overheads to maintain state and consistency
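
A sketch of the extra bookkeeping a stateful server does, with an in-memory block store; the invalidation callback is stubbed out:

```python
from collections import defaultdict

class StatefulServer:
    def __init__(self):
        self.blocks = {}                    # (path, block_no) -> bytes
        self.cached_by = defaultdict(set)   # path -> client ids caching it

    def fetch_block(self, client_id, path, block_no):
        self.cached_by[path].add(client_id)  # remember who caches what
        return self.blocks.get((path, block_no), b"")

    def write_block(self, client_id, path, block_no, data):
        self.blocks[(path, block_no)] = data
        # Consistency: tell every other caching client its copy is stale.
        for other in self.cached_by[path] - {client_id}:
            self.notify_invalidate(other, path, block_no)

    def notify_invalidate(self, client_id, path, block_no):
        print(f"invalidate {path}:{block_no} at client {client_id}")  # stub
```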

## State Caching in DFS

Clients locally maintain a portion of the state (e.g., file blocks) and perform operations on that cached state (e.g., open/read/write). However, this requires coherence mechanisms.

The question is when to trigger the coherence mechanisms. A DFS tends to trigger them on demand or periodically, e.g., when clients submit changes to a file.

## File Sharing Semantics in DFS

On a Unix-based single machine (Unix semantics), every write is visible immediately, because all processes go through the same buffer cache before data is flushed to disk. However, this may not be true for a distributed file system when there are multiple machines involved.

### Session Semantics

> Write back on `close()`

Whenever a file is closed, the client writes back to the server all of the changes that it has applied to that file in its cache.

> Update on `open()`

Whenever a client needs to open a file, it doesn't use the cached contents; instead, it checks with the file server whether there is a more recent version of that file.

The time between `open()` and `close()` is called a session. Although session semantics are easy to reason about, they are not good for situations when clients want to concurrently share a file, write to it, and see each other's updates.
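
A sketch of session semantics, assuming a server object with `fetch` and `store` methods:

```python
class SessionFile:
    """One session: fetch on open, buffer writes locally, flush on close."""

    def __init__(self, server, path):
        self.server, self.path = server, path
        # Update on open(): always start from the server's current version.
        self.data = bytearray(server.fetch(path))

    def write(self, offset, chunk):
        # Writes stay local; other clients cannot see them yet.
        self.data[offset:offset + len(chunk)] = chunk

    def close(self):
        # Write back on close(): the server now sees the whole session.
        self.server.store(self.path, bytes(self.data))
```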

### Periodic Updates

If a session is held for too long, it may cause a large inconsistency in the system. It's reasonable to introduce periodic updates to keep the system in sync.

* Clients write back to the server periodically for any changes they have applied to the cache.

  Clients hold a *lease* on cached data, which specifies how long they can use the cached data before it needs to be *returned* to the server (sketched below).
* Server invalidates caches periodically, which sets a time bound on inconsistency.
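
A sketch of the lease idea; the server's `fetch`/`store` methods are assumed:

```python
import time

class Lease:
    """The client may use cached data only until the lease expires."""

    def __init__(self, duration_s):
        self.expires_at = time.monotonic() + duration_s

    def valid(self):
        return time.monotonic() < self.expires_at

def read_with_lease(cache, lease, server, path):
    if lease.valid():
        return cache[path]        # still within the time bound
    # Lease expired: return changes to the server and refresh the copy.
    server.store(path, cache[path])
    cache[path] = server.fetch(path)
    return cache[path]
```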

## Replication vs Partitioning

### Replication

Each machine holds all files.

* (+) Easier to load balance; offers high availability for reads and fault tolerance
* (-) Writes become more complicated (see the sketch below)
  * Either write synchronously to all replicas (slow)
  * Or write to one and then propagate to the others (inconsistent window)
* (-) Replicas must be reconciled periodically, e.g., using voting
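
Sketches of synchronous write and of voting-based reconciliation; the replicas' `read`/`write` methods are assumptions:

```python
from collections import Counter

def write_synchronous(replicas, path, data):
    # Slow but consistent: the write completes only after every replica
    # has applied it.
    for replica in replicas:
        replica.write(path, data)

def read_with_voting(replicas, path):
    # Reconciliation by voting: return the value the majority agrees on,
    # outvoting any stale replica.
    votes = Counter(replica.read(path) for replica in replicas)
    value, _count = votes.most_common(1)[0]
    return value
```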

### Partition

Each machine has a subset of files.

* (+) Greater scalability, in both file system size and availability
* (+) Writing a single file is simpler; less synchronization is needed
* (-) On failure, some portion of the data is lost
* (-) Load balancing is more difficult; naive round robin does not work (see the hash-placement sketch below)
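
A sketch of deterministic placement by hashing the path, one simple alternative to naive round robin (the function name is illustrative):

```python
import hashlib

def server_for(path, n_servers):
    """Each file lives on exactly one server, chosen from its path, so
    every client can locate it without consulting a directory service."""
    digest = hashlib.sha256(path.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_servers

# The same path always maps to the same server.
assert server_for("/home/user/notes.txt", 4) == server_for("/home/user/notes.txt", 4)
```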

However, the two techniques can be combined depending on usage; for instance, each partition can itself be replicated.
