Distributed File Systems
Distributed filesystems are like distributed storage facilities. They should provide features such as:
Access via well-defined interface
Focus on consistent state across nodes
Offer a mixture of distribution models if needed
Underneath a virtual file system (VFS) interface, the operating system can hide the fact that there is no local physical storage for the files on a particular machine. Instead, everything is maintained on a remote machine, on a remote file system that is accessed over the network.
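A minimal sketch of that idea, assuming a toy mount table that maps path prefixes to backends; LocalFS, RemoteFS, and the server name are hypothetical stand-ins, not a real OS API.

```python
# VFS-style dispatch: callers see one namespace, while the mount table
# decides which backend actually serves a given path.

class LocalFS:
    def read(self, path):
        return f"<bytes of {path} from local disk>"

class RemoteFS:
    def __init__(self, server):
        self.server = server
    def read(self, path):
        # In a real DFS this would be an RPC to self.server.
        return f"<bytes of {path} fetched from {self.server}>"

MOUNTS = {"/home": RemoteFS("fileserver.example.com"), "/": LocalFS()}

def vfs_read(path):
    # Pick the longest matching mount prefix, like a real mount table.
    prefix = max((m for m in MOUNTS if path.startswith(m)), key=len)
    return MOUNTS[prefix].read(path)

print(vfs_read("/home/alice/notes.txt"))  # transparently remote
print(vfs_read("/etc/hosts"))             # transparently local
```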
Client/server on different machines
File server distributed across multiple machines, which supports replication and high availability:
Replicated (each machine holds all files), or
Partitioned (each machine holds a subset of files)
Files stored on and served from all machines (peer-to-peer network like BitTorrent)
Let's talk about a few extreme cases for a small distributed system with multiple clients and one server.
Extreme 1: Upload/download for every access
The file is moved to the client in its entirety. The client accesses and modifies the file locally; when it is done, the file is returned to the server (see the sketch after this list).
(+) Local reads/writes happen on the client machine, fast for making updates.
(-) Extremely slow to synchronize: the whole file moves even for a small access.
(-) Server gives up control entirely.
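A minimal sketch of the upload/download model, with plain dicts standing in for client and server storage (all names here are hypothetical):

```python
# Extreme 1: move the whole file to the client, edit locally,
# return the whole file on close. The server is a plain dict here.

server_files = {"report.txt": "v1 contents"}

class WholeFileClient:
    def open(self, name):
        self.name = name
        self.data = server_files[name]       # download entire file

    def write(self, new_data):
        self.data = new_data                 # fast: purely local

    def close(self):
        server_files[self.name] = self.data  # upload entire file back

c = WholeFileClient()
c.open("report.txt")
c.write("v2 contents")             # server has no idea this happened
c.close()                          # only now does the server see v2
print(server_files["report.txt"])  # -> v2 contents
```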
Extreme 2: True remote file access
The file stays on the server, and every access is performed remotely: the client sends a request to the server for each read and write (see the sketch after this list).
(+) File access is centralized, easy to reason about consistency.
(-) Every file operation is costly via network.
(-) Limits server availability.
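A minimal sketch of true remote access, where every operation pays a (simulated) network round trip; the rpc_calls counter and the function names are hypothetical:

```python
# Extreme 2: the file never leaves the server; every read and write is
# a round trip. rpc_calls counts the network cost.

server_files = {"report.txt": bytearray(b"hello world")}
rpc_calls = 0

def remote_read(name, offset, length):
    global rpc_calls
    rpc_calls += 1                           # one network round trip
    return bytes(server_files[name][offset:offset + length])

def remote_write(name, offset, data):
    global rpc_calls
    rpc_calls += 1                           # another round trip
    server_files[name][offset:offset + len(data)] = data

remote_write("report.txt", 0, b"HELLO")
print(remote_read("report.txt", 0, 11))  # b'HELLO world'
print(rpc_calls)                         # 2: every operation paid the network cost
```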
A practical compromise sits between these extremes. Allow clients to store parts of the file locally (blocks), as sketched after this list.
(+) Low latency on file operations, reduced server load
Also force clients to interact with the server frequently.
(+) Server has insights into what clients are doing.
(+) Server has control over which accesses are permitted, making it easier to maintain consistency
(-) Server becomes more complex, requires different file sharing semantics.
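A minimal sketch of block-level caching under these assumptions: a tiny 4-byte block size, and a server_knows list standing in for the server's insight into client caches (all names hypothetical):

```python
# Practical model: the client caches fixed-size blocks and contacts
# the server only on a miss, so the server sees every miss.

BLOCK = 4  # tiny block size, for illustration only

server_blocks = {("f", 0): b"abcd", ("f", 1): b"efgh"}
server_knows = []   # which (client, file, block) triples were handed out

class CachingClient:
    def __init__(self, cid):
        self.cid, self.cache = cid, {}

    def read(self, name, offset):
        blk = offset // BLOCK
        if (name, blk) not in self.cache:                # cache miss
            self.cache[(name, blk)] = server_blocks[(name, blk)]
            server_knows.append((self.cid, name, blk))   # server gains insight
        i = offset % BLOCK
        return self.cache[(name, blk)][i:i + 1]

c = CachingClient("c1")
print(c.read("f", 5))  # b'f' -- miss, block 1 fetched from the server
print(c.read("f", 6))  # b'g' -- hit, served locally, no network
print(server_knows)    # [('c1', 'f', 1)] -- server knows what c1 caches
```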
The file server can be either stateful or stateless. A stateless file server cannot support the practical model:
(-) Cannot support caching and consistency management
(-) Every request self-contained, more bits transferred
(+) No resources are used on server side
(+) On failure, just restart
A stateful file server is needed for the practical model, to track what is cached and accessed (see the sketch after this list).
(+) Support locking, caching, incremental operations
(-) On failure, needs checkpointing and recovery mechanisms
(-) Overheads to maintain state and consistency
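A minimal sketch contrasting the two, assuming a toy server that keeps a per-client read cursor; the names are hypothetical, not a real protocol:

```python
# Stateless vs. stateful in miniature. The stateless read needs the
# full request every time; the stateful server remembers a cursor per
# client, enabling incremental reads but creating state to recover.

FILES = {"log": b"line1\nline2\nline3\n"}

def stateless_read(name, offset, length):
    # Self-contained request: the client must ship the offset each time.
    return FILES[name][offset:offset + length]

class StatefulServer:
    def __init__(self):
        self.cursors = {}                     # per-client state to checkpoint

    def open(self, client, name):
        self.cursors[(client, name)] = 0

    def read_next(self, client, name, length):
        off = self.cursors[(client, name)]    # server remembers the position
        self.cursors[(client, name)] = off + length
        return FILES[name][off:off + length]

s = StatefulServer()
s.open("c1", "log")
print(s.read_next("c1", "log", 6))  # b'line1\n'
print(s.read_next("c1", "log", 6))  # b'line2\n' -- client sent no offset
# If s crashes, s.cursors is gone: checkpointing/recovery is needed.
```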
Locally, clients maintain a portion of the state (e.g. file blocks) and perform operations on cached state (e.g. open/read/write). However, this requires coherence mechanisms.
The question is when to trigger the coherence mechanisms. A DFS tends to trigger them on demand or periodically, when clients submit changes to a file.
On a single Unix machine (Unix semantics), every write is visible immediately, because every process goes through the same buffer cache before data is flushed to disk. However, this may not hold for a distributed file system, where multiple machines are involved.
Write back on close(): whenever a file is closed, the client writes back to the server all of the changes it has applied to that file in its cache.
Update on open(): whenever a client opens a file, it does not use the cached contents; instead, it checks with the file server whether there is a more recent version of that file.
The time between open() and close() is called a session. Although session semantics are easy to reason about, they are not good for situations when clients want to concurrently share a file, write to it, and see each other's updates (see the sketch below).
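A minimal sketch of session semantics under these assumptions (a plain dict as the server, hypothetical names), showing how the last close() silently wins:

```python
# Session semantics: update on open(), write back on close().
# Two overlapping sessions never see each other's writes; the last
# close() silently wins.

server = {"f": "original"}

class Session:
    def __init__(self, name):
        self.name = name
        self.copy = server[name]       # update on open: fetch fresh copy

    def write(self, data):
        self.copy = data               # local only until close

    def close(self):
        server[self.name] = self.copy  # write back on close

a, b = Session("f"), Session("f")      # two concurrent sessions
a.write("A's version")
b.write("B's version")
a.close()
b.close()                              # last close wins
print(server["f"])                     # -> B's version; A's update is lost
```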
If a session is held for too long, it may cause huge inconsistency in the system, so it is reasonable to introduce periodic updates to keep the system in sync.
The client should periodically write back to the server any changes it has applied to its cache.
Clients hold a lease on cached data, which specifies how long they can use the cached data before it must be returned to the server.
The server invalidates cached copies periodically, setting a time bound on inconsistency (sketched below).
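A minimal lease sketch, assuming a 2-second lease and a dict standing in for the server; the client serves reads from its cache only while the lease is valid, which bounds staleness:

```python
import time

# Lease sketch: cached data is usable only for LEASE_SECONDS, which
# bounds how stale a client's view can ever be.

LEASE_SECONDS = 2.0
server = {"f": "v1"}

class LeasedCache:
    def __init__(self):
        self.data, self.expires = None, 0.0

    def read(self, name):
        if time.monotonic() >= self.expires:  # lease expired
            self.data = server[name]          # revalidate with the server
            self.expires = time.monotonic() + LEASE_SECONDS
        return self.data                      # may be up to 2 s stale

c = LeasedCache()
print(c.read("f"))        # v1: fetched, lease starts
server["f"] = "v2"
print(c.read("f"))        # still v1: stale, but within the lease bound
time.sleep(LEASE_SECONDS)
print(c.read("f"))        # v2: lease expired, client revalidated
```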
Replication: each machine holds all files (see the sketch after this list).
(+) Easier to load balance; offers high read availability and fault tolerance
(-) Writes become more complicated
Either synchronously write to all (slow)
Or write to one and then propagate to all others (inconsistent)
(-) Replicas must be reconciled periodically, e.g. using voting
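A minimal sketch of reconciliation after a lost propagation; instead of full quorum voting, this toy version tags each copy with a version number and lets the highest version win (names and layout are hypothetical):

```python
# Replication sketch: a write reaches only one replica, so the copies
# diverge; periodic reconciliation picks the newest version and
# re-propagates it. Real systems use proper quorum/voting protocols;
# this shows the idea only.

replicas = [
    {"f": (2, "new")},   # the write landed here (version 2)
    {"f": (1, "old")},   # propagation was lost
    {"f": (1, "old")},
]

def reconcile(name):
    winner = max(r[name] for r in replicas)  # highest version wins
    for r in replicas:
        r[name] = winner

reconcile("f")
print(all(r["f"] == (2, "new") for r in replicas))  # True: consistent again
```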
Partitioning: each machine holds only a subset of the files (see the sketch after this list).
(+) Greater scalability in filesystem size and availability
(+) Single file write is simpler, less synchronization needed
(-) On failure, some portion of the data will be lost
(-) Load balancing is more difficult, cannot perform naive round robin
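A minimal sketch of hash partitioning, assuming three hypothetical servers; hashing gives every client the same stable file-to-server mapping, which naive round robin would not:

```python
import hashlib

# Partitioning sketch: each file lives on exactly one server, chosen by
# hashing its name, so every client deterministically finds the owner.
# (Round robin would send the same file to a different machine each time.)

SERVERS = ["srv0", "srv1", "srv2"]

def owner(name):
    h = int(hashlib.sha256(name.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

for f in ["a.txt", "b.txt", "c.txt", "d.txt"]:
    print(f, "->", owner(f))   # stable mapping; load may still skew
```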
However, the two techniques can be combined, depending on usage (sketched below).
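A minimal sketch of the combination, assuming hypothetical replica groups: a file is hashed to a partition, and that partition is itself replicated.

```python
import hashlib

# Combining both: hash a file to a partition, then store it on that
# partition's replica group, getting scalability and fault tolerance.
# The group layout here is invented purely for the illustration.

GROUPS = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]  # replicas per partition

def replica_group(name):
    h = int(hashlib.sha256(name.encode()).hexdigest(), 16)
    return GROUPS[h % len(GROUPS)]

print(replica_group("report.txt"))  # e.g. ['b1', 'b2']: a write goes to both
```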