Georgia Tech OMSCS
  • Introduction
  • CS8803 Introduction to Operating System
    • Processes and Process Management
    • Threads and Concurrency
    • Threads Case Study: PThreads
    • Thread Design Considerations
    • Thread Performance Considerations
    • Scheduling
    • Memory Management
    • Inter-Process Communication
    • Synchronization Cosntructs
    • I/O Management
    • Virtualization
    • Remote Procedure Calls
    • Distributed File Systems
    • Distributed Shared Memory
    • Datacenter Technologies
Powered by GitBook
On this page
  • DFS Models
  • Remote File Service: Extremes
  • Remote File Service: Practical with Compromise
  • State Caching in DFS
  • File Sharing Semantics in DFS
  • Session Semanitcs
  • Periodic Updates
  • Replication vs Partitioning
  • Replication
  • Partition

Was this helpful?

  1. CS8803 Introduction to Operating System

Distributed File Systems

PreviousRemote Procedure CallsNextDistributed Shared Memory

Last updated 4 years ago

Was this helpful?

Distributed filesystems are like distributed storage facilities. It should provide features like

  • Access via well-defined interface

  • Focus on consistent state across nodes

  • Offer a mixture of distribution models if needed

Distributed Filesystem

Underneath a file virtual file system interface, the operating system can hide the fact that there isn't even a local physical storage on a particular machine where the files are stored.

But instead, everything is maintained on a remote machine, or on a remote file system that is being accessed over the network.

DFS Models

  • Client/server on different machines

  • File server distributed on multiple machines which support replication and high availability

    • Replicated or

    • Partitioned

  • Files stored on and served from all machines (peer-to-peer network like BitTorrent)

Remote File Service: Extremes

Let's talk about few extreme cases, for small distributed system with clients and one server.

Extreme 1: Upload/download for every access

File is moved to client entirely. Client will access and modify the file. When client is done, file is returned to server.

  • (+) Local write/read are on client machine, fast for making updates.

  • (-) Extremely slow for synchronization even for small access.

  • (-) Server gives up control entirely.

Extreme 2: True remote file access

File stays on server but every access to file is done remotely, client makes request to server for read and write.

  • (+) File access is centralized, easy to reason about consistency.

  • (-) Every file operation is costly via network.

  • (-) Limits server availability.

Remote File Service: Practical with Compromise

  1. Allow clients to store parts of file locally (blocks.)

    • (+) Low latency on file operations, reduce server load

  2. Force clients to interact with server frequently.

    • (+) Server has insights into what clients are doing.

    • (+) Server has control into which accesses can be permitted, easier to maintain consistency

    • (-) Server becomes more complex, requires different file sharing semantics.

File server can be either stateful or stateless.

Stateless file server cannot support practical model.

  • (-) Cannot support caching and consistency management

  • (-) Every request self-contained, more bits transferred

  • (+) No resources are used on server side

  • (+) On failure, just restart

Stateful file server is needed for practical model to track what is cached/accessed.

  • (+) Support locking, caching, incremental operations

  • (-) On failure, need to perform checkpointing and recovery mechanisms

  • (-) Overheads to maintain state and consistency

State Caching in DFS

Locally, clients maintain portion of state, e.g. file blocks. Locally, clients perform operations on cached state (e.g. open/read/write.) However, it requires coherence mechanisms.

The question is when to trigger the coherence mechanisms? DFS tends to trigger on demand or periodically when clients submit changes to a file.

File Sharing Semantics in DFS

For Unix-based single machine (Unix Semanitcs), every write is visible immediately because each process writes to a cache buffer before the buffer is flushed to disk. However, this may not be true for distributed file system when there are multiple machines involved.

Session Semanitcs

Write back on close()

Whenever a file is closed, the client writes back the server all of the changes that it has applied to that file in its cache.

Update on open()

Whenever a client needs to open a file, it doesn't use the cache contents, instead, goes and checks with the file server whether or not there is more cent version of that file.

The time between open() and close() is called a session. Although the session semantics is really easy to reason about. They are not really good for situations wehn clients want to concurrently share a file, write to it, and see each other's updates.

Periodic Updates

If a session is held too long, it may cause a huge inconsistency in the system. It's reasonable to introduce some periodic updates to keep the system in sync.

  • Client should write back periodically to the server for any changes it has applied to cache.

    Clients would have a lease on a cache data, which specifies how long they can use the cached

    data before the data need to be returned back to the server.

  • Server invalidates periodically and sets a time bound on inconsistency.

Replication vs Partitioning

Replication

Each machine holds all files.

  • (+) Easier to load balance, offer high availability for read and fault tolerance

  • (-) Writes become more complicated

    • Either synchronously write to all (slow)

    • Or write to one and then propagate to all others (inconsistent)

  • (-) Replicas must be reconcilied periodically using voting

Partition

Each machine has a subset of files.

  • (+) Greater scalability for size of filesystem and availability

  • (+) Single file write is simpler, less synchronization needed

  • (-) On failure, some portion of data will be lost

  • (-) Load balancing is more difficult, cannot perform naive round robin

However, two techniques can be combined depending on usage.