On Mon, Dec 22, 2014 at 02:04:37PM -0500, J. Bruce Fields wrote:
> It'd also be nice to see any proposals for a completely correct
> solution, even if it's something that will take a while.  All I can
> think of is protocol extensions, but that's just what I know.

I tried to think a little about this over the holidays: say we could
scrap NFS and start from scratch, what would we do?

	- larger NFS readdir cookies: if NFS cookies were 128 bits, then
	  gluster could stick the filesystem's offset in the lower 64
	  bits and its own data in the upper 64 bits.  This doesn't
	  work if anyone else does this, though: if we change to 128
	  bits here then people may eventually want to do the same
	  thing to filesystem and system call interfaces too, and then
	  we're back at square one.  If people want to be able to stack
	  arbitrary readdir implementations then we can't really choose
	  a fixed size limit any more.

	- stateful readdir: make clients open the directory, read
	  through it from start to finish, then close it.  That's all
	  clients really want to do anyway--they don't need to seek
	  back to offsets returned arbitrarily long ago.  However, they
	  do need to be able to resend the last readdir request in case
	  the reply was lost, and they do need to be able to resume
	  reading a directory after a server reboot.  So I think that
	  would still leave gluster needing to keep a (persistent,
	  on-disk) cache mapping the NFS cookies it hands out to the
	  offsets in the backend directories.  The difference is just
	  that it would only have to cache the small number of entries
	  that are in use by current readdirs in progress instead of
	  potentially having to keep them all forever.

I don't know, does that help much?

--b.
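
To make the 128-bit cookie idea a little more concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: the names MASK64, pack_cookie and unpack_cookie are made up for this example, and nothing here is real gluster code or part of any NFS protocol version. It shows a stacking layer putting the backend filesystem's 64-bit offset in the lower half and its own data in the upper half, and why a second layer that tries the same trick on top has no room left.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1  # largest value that fits in 64 bits


def pack_cookie(layer_data: int, fs_offset: int) -> int:
    """Build a hypothetical 128-bit cookie.

    Upper 64 bits: data owned by the stacking layer (for example, which
    backend directory the entry came from).  Lower 64 bits: the backend
    filesystem's own readdir offset.
    """
    if not 0 <= layer_data <= MASK64:
        raise ValueError("layer data does not fit in 64 bits")
    if not 0 <= fs_offset <= MASK64:
        raise ValueError("filesystem offset does not fit in 64 bits")
    return (layer_data << 64) | fs_offset


def unpack_cookie(cookie: int) -> Tuple[int, int]:
    """Split a 128-bit cookie back into (layer_data, fs_offset)."""
    if not 0 <= cookie < (1 << 128):
        raise ValueError("cookie does not fit in 128 bits")
    return cookie >> 64, cookie & MASK64


if __name__ == "__main__":
    # One layer on top of a filesystem that uses a full 64-bit offset.
    cookie = pack_cookie(layer_data=3, fs_offset=0x7FFFFFFFFFFFFFFF)
    print(unpack_cookie(cookie) == (3, 0x7FFFFFFFFFFFFFFF))

    # A second layer stacked on top now sees a 128-bit "offset" and
    # cannot fit it into its own lower 64 bits.
    try:
        pack_cookie(layer_data=1, fs_offset=cookie)
    except ValueError as err:
        print("stacking fails:", err)
```

The second call failing is the "back at square one" problem: any fixed cookie size is used up as soon as one layer claims the spare bits.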