Amir Goldstein <amir73il@xxxxxxxxx> wrote:

> My thinking is: Can't we implement a stackable cachefs which interfaces
> with fscache and whose API to the netfs is pure vfs APIs, just like
> overlayfs interfaces with lower fs?

In short, no - doing it purely with the VFS APIs that we have is not that
simple (yes, Solaris does it with a stacking filesystem, and I don't know
anything about the API details, but there must be an auxiliary API).

You need to handle:

 (1) Remote invalidation.  The netfs needs to tell the cache layer
     asynchronously about remote modifications - where the modification can
     affect not just file content but also directory structure, and even
     file data invalidation may be partial.

 (2) Unique file group matching.  The info required to match a group of
     files (e.g. an NFS server, an AFS volume, a CIFS share) is not
     necessarily available through the VFS API - I'm not sure even the
     export API makes this available since it's built on the assumption
     that it's exporting local files.

 (3) File matching.  The info required to match a file to the cache is not
     necessarily available through the VFS API.  NFS has file handles, for
     example; the YFS variant of AFS has 96-bit 'inode numbers'.  (This
     might be doable with the export API, if that counts.)  Further, the
     file identifier may not be unique outside the file group.

 (4) Coherency management.  The netfs must tell the cache whether or not
     the data contained in the cache is valid.  This information is not
     necessarily available through the VFS APIs (NFS change IDs, AFS data
     version, AFS volume sync info).  It's also highly filesystem specific.

It might also have security implications for netfs's that handle their own
security (as AFS does), but that might fall out naturally.

> As long as netfs supports direct_IO() (all except afs do) then the active page
> cache could be that of the stackable cachefs and network IO is always
> direct from/to cachefs pages.

What about objects that don't support DIO?  Directories, symbolic links and
automount points?  All of these are cacheable objects with AFS.

And speaking of automount points - how would you deal with those beyond
simply caching their contents?  Create a new stacked instance over each
one?  How do you see the automount point itself?  I see that the NFS FH
encoder doesn't handle automount points.

> If netfs supports export_operations (all except afs do), then indexing
> the cache objects could be done in a generic manner using fsid and
> file handle, just like overlayfs index feature works today.

FSID isn't unique and doesn't exist for all filesystems.  Two NFS servers,
for example, can give you the same FSID, but referring to different things.
AFS has a textual cell name and a volume ID that you need to combine; it
doesn't have an FSID.

This may work for overlayfs as the FSID can be confined to a particular
overlay.  However, that's not what we're dealing with here.  We would be
talking about an index that potentially covers *all* the mounted netfs's.

Also, from your description that sounds like a bug in overlayfs.  If the
overlain NFS tree does a referral to a different server, you no longer have
a unique FSID or a unique FH within that FSID, so your index is broken.

> Would it not be a maintenance win if all (or most of) the fscache logic
> was yanked out of all the specific netfs's?

Actually, it may not help enormously with disconnected operation.  A
certain amount of the logic probably has to be implemented in the netfs, as
each netfs provides different facilities for managing this.
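To give a rough idea of the kind of per-netfs coherency data I mean - and
this is purely illustrative, not the real fscache structures or API, with
made-up names - the auxiliary data each netfs would need to stash alongside
a cached object looks something like:

	/*
	 * Illustrative sketch only.  The cache can only treat this as an
	 * opaque blob; it takes netfs-specific knowledge both to generate
	 * it and to decide later whether the cached object is still valid.
	 */
	#include <stdbool.h>
	#include <stdint.h>

	/* NFS: validity hangs off the change attribute (plus size). */
	struct nfs_cache_aux {
		uint64_t change_attr;
		uint64_t size;
	};

	/* AFS: validity hangs off the per-file data version number
	 * (volume sync info would also come into it). */
	struct afs_cache_aux {
		uint64_t data_version;
	};

	/* Hypothetical validation hooks: compare what was stored in the
	 * cache against what the server reports now. */
	static bool nfs_cache_still_valid(const struct nfs_cache_aux *cached,
					  const struct nfs_cache_aux *now)
	{
		return cached->change_attr == now->change_attr &&
		       cached->size == now->size;
	}

	static bool afs_cache_still_valid(const struct afs_cache_aux *cached,
					  const struct afs_cache_aux *now)
	{
		return cached->data_version == now->data_version;
	}

The VFS has no generic way to surface any of that, which is why I don't
think a pure-VFS stacking layer can own the coherency decision.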
Yes, it gets some of the I/O stuff out - but I want to move some of that
down into the VM if I can, and librarifying the rest should take care of
that.

> Can you think of reasons why the stackable cachefs model cannot work
> or why it is inferior to the current fscache integration model with netfs's?

Yes.  It's a lot more operationally expensive and it's harder to use.  The
cache driver would also have to get a lot bigger, but that would be
reasonable.

Firstly, the expense: you have to double up all the inodes and dentries
that are in use - and that's not counting the resources used inside the
cache itself.

Secondly, the administration: I'm assuming you're suggesting the way I
think Solaris does it, where you have to make two mounts: first you mount
the netfs and then you mount the cache over it.  It's much simpler if you
only need to make the netfs mount and then that goes and uses the cache if
it's available - it's also simple to bring the cache online after the fact,
meaning you can even have the cache applied retroactively to a root
filesystem.

You also have the issue of what happens if someone bind-mounts the netfs
mount and mounts the cache over only one of the views.  Now you have a
coherency management problem that the cache cannot see.  It's only visible
to the netfs, but the netfs doesn't know about the cache.

There's also file locking.  Overlayfs doesn't support file locking that I
can see, but NFS, AFS and CIFS all do.

Anyway, you might be able to guess that I'm really against using stackable
filesystems for things like this and like UID shifting.  I think it adds
more expense and complexity than it's necessarily worth.  I was more
inclined to go with unionfs than overlayfs and do the filesystem union in
the VFS, as that ought to be cheaper if you're using it (whereas overlayfs
is cheaper if you're not).

One final thing - even if we did want to switch to a stacked approach, we
might still have to maintain the current way as people use it.

David