Re: VFS functions

Jan Hudec <bulb@ucw.cz> · Tue, 29 Jun 2004 22:48:23 +0200

On Tue, Jun 29, 2004 at 21:31:18 +0530, Siddhartha Jain wrote:
> I guess at this point I should discuss what I intend to do and try to weed
> out any design flaws.
> 
> Basically, I intend to create a module that, when loaded, allows for
> file/directory replication. So if /home is meant to be replicated to /opt
> then /etc can contain a config file that has a line like:
> /home	   /opt
> 
> I chose to place my hooks on the VFS layer so that the implementation is FS
> independent.
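
Whichever layer ends up doing the work, something has to turn a line
like "/home /opt" into a path mapping. A minimal userland sketch of
that, assuming one whitespace-separated pair per line and no trailing
slash on the source. The names repl_rule, parse_rule and map_path are
made up for illustration, they are not any existing API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One config line "SRC<whitespace>DST", e.g. "/home\t   /opt". */
struct repl_rule {
    char src[256];
    char dst[256];
};

/* Parse one config line; returns 0 on success, -1 if malformed. */
int parse_rule(const char *line, struct repl_rule *r)
{
    assert(line && r);
    return sscanf(line, "%255s %255s", r->src, r->dst) == 2 ? 0 : -1;
}

/*
 * Map path to its replica under rule r. Returns 0 and fills out on a
 * match, 1 if path is not under r->src, -1 if out is too small.
 * Matching is on whole components, so "/homework" is not under "/home".
 * Assumes r->src has no trailing slash.
 */
int map_path(const struct repl_rule *r, const char *path,
             char *out, size_t outlen)
{
    size_t n = strlen(r->src);

    if (strncmp(path, r->src, n) != 0)
        return 1;
    if (path[n] != '\0' && path[n] != '/')
        return 1;
    if ((size_t)snprintf(out, outlen, "%s%s", r->dst, path + n) >= outlen)
        return -1;
    return 0;
}
```

The component check is the part that is easy to get wrong: a plain
prefix comparison would also replicate /homework.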

I believe that Linux 2.6 has a concept of a stackable filesystem. It
should be possible to mount one filesystem over another, and the
filesystem mounted on top should be able to access the filesystem
below, provided it's designed to do that.

In other words, I think you should NOT do it in VFS, but do it as
a special filesystem. This filesystem would implement all the methods of
dentry, inode and file and, in these methods, redispatch to another FS
driver (by doing a manual path_walk and calling the methods of the
respective inode/dentry/file objects).
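
To show what I mean by redispatching, here is a rough userland analogy,
not kernel code. The method table is only loosely modelled on the
kernel's per-file operations, and every name in it (file_ops, mem_file,
repl_file, ...) is invented for the sketch. The upper driver keeps no
data of its own; it just forwards each write to two lower files:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct file;

/* Method table, loosely modelled on per-file operations. */
struct file_ops {
    long (*write)(struct file *f, const char *buf, size_t len);
};

struct file {
    const struct file_ops *ops;
    void *priv;               /* driver-private data */
};

/* "Lower" driver: stands in for a real filesystem, stores in memory. */
struct mem_file {
    char data[128];
    size_t len;
};

long mem_write(struct file *f, const char *buf, size_t len)
{
    struct mem_file *m = f->priv;

    if (len > sizeof(m->data) - m->len)
        return -1;            /* no space left */
    memcpy(m->data + m->len, buf, len);
    m->len += len;
    return (long)len;
}

const struct file_ops mem_ops = { .write = mem_write };

/* "Upper" driver: owns no data, only references two lower files. */
struct repl_file {
    struct file *primary;
    struct file *replica;
};

long repl_write(struct file *f, const char *buf, size_t len)
{
    struct repl_file *r = f->priv;
    long n;

    assert(r->primary && r->replica);
    n = r->primary->ops->write(r->primary, buf, len);
    if (n < 0)
        return n;             /* primary failed, replica left untouched */
    /* Replicate only what the primary accepted. */
    if (r->replica->ops->write(r->replica, buf, (size_t)n) != n)
        return -1;            /* a real driver must handle divergence */
    return n;
}

const struct file_ops repl_ops = { .write = repl_write };
```

The caller only ever sees the upper file and its ops, so no global
table of descriptors is needed: the link to the replica lives in the
upper file's private data and goes away with it.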

It's also possible to use the coda interface to the kernel and implement
your filesystem in userland. There are libraries -- fuse and lufs --
that will help you with that.

Come to think of it, coda itself is quite probably a good
solution for you, without writing a single line of code.

> The flow would be - If a sys_open is issued with any sort of write mode then
> the sys_open function should check if the file is meant to be replicated. If
> yes, then it should also open the replica. The file descriptor of the source
> and destination files should be stored in a global data structure.
> 
> Now, when the file is written to by sys_write (or sys_writev etc), the
> function needs to check if the file descriptor passed to it is listed in the
> data structure created by sys_open above. If yes, it writes the data written
> to source to replica also.
> 
> sys_close would check if the descriptor passed to it is listed in the
> replication data structure and close the replica fd as well.
> 
> A global flag should be set by the module to indicate whether sys_open,
> sys_write and sys_close should deviate and check for replica file/fd.
> 
> Unloading the module should unset the global flag for replication and clear
> out of the file descriptor data structure.
> 
> All functions doing the various jobs of file path comparison, data structure
> updation etc would go in the module so that minimal changes are done to the
> original functions. I guess there will still be some concurrency issues
> to deal with?
> 
> Does this sound like a proper and feasible design?

No. That's error-prone and dirty.

Actually, first have a look at the four replicating network filesystems
available for linux. It's quite likely that what you need can work
sufficiently well with one of them. These are:

* coda: This one is distributed with the kernel (well, since the server
  as well as most of the client is in userland, you will need those, of
  course). It is a replicating filesystem, where the local replica
  behaves like a cache. It should also let you explicitly say that
  a file must be cached, so you can disconnect the computer, continue to
  work disconnected and then sync the changes on reconnect. It needs to
  have the whole file in the local replica before opening it, but that
  is not necessarily a problem. The server needs a dedicated partition
  that can only be accessed through it, though. It should be quite well
  tested and stable.
* intermezzo: This is similar to coda in features, but lighter in
  implementation (it actually uses a normal http server for serving file
  content). It is younger and thus a bit less tested. It is also
  supported on fewer other OSes.
* afs: This filesystem predates, and inspired, coda. It has the client
  fully in kernel. It does not support disconnected operation. On the
  other hand, since it caches blocks and not whole files, it works better
  real-time. Its drawback is that the driver is not that stable. It's
  not that bad, however. We have it in a computer lab and it works just
  fine (NFS used to be much worse), even across different systems
  (linux, solaris, irix and windows). It is not shipped with the kernel.
* lustre: This is a brand new thing. It's a distributed filesystem
  designed for large clusters, meant to scale to tens of thousands of
  nodes. Quite probably overkill, but who knows.

-------------------------------------------------------------------------------
						 Jan 'Bulb' Hudec <bulb@ucw.cz>