On Fri, Jul 20, 2007 at 06:38:07AM +0100, Al Viro wrote:
> On Fri, Jul 20, 2007 at 12:10:00AM -0400, Jan Harkes wrote:
> > I will try to find a clean way to block the close syscall until fput
> > drops the last reference. However I realize that such an implementation
> > would not be acceptable for other file systems, and there are some
> > interesting unresolved details such as 'which close is going to be the
> > last close'.
>
> Simply impossible.

As usual you are correct.

Originally Coda only used the CODA_CLOSE upcall, which was called from
fops->release (fput). The problem was that we had various errors that
never managed to make it back to the user. So I split up the operation
into the part that does the write back and is the source of most errors
(CODA_STORE, called from fops->flush) and the part that drops the last
reference (CODA_RELEASE, called from fops->release).

I've actually only used the _STORE/_RELEASE upcalls in a development
version, so the old code is not only still around, it is the typically
used variant. I'll submit a patch that removes those upcalls; they will
never work the way I hoped.

> Why does CODA need special warranties in that area, anyway?

I don't think it really needs special warranties. The intent is to write
back data only when the last reference has been released. Ideally we
would like to return errors that occur during this write back to the
application. Clearly the latter isn't possible.

> Related question: does fsync() force the writeback?

I think it should, but at the moment it does not.

My guess on why the implementation isn't there is that it doesn't mesh
well with the last-writer close model. The cache manager actually
doesn't trigger any write back operations until the owrite counter drops
to 0, and this happens only after all open-for-write descriptors for a
file have been released.
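
To illustrate the last-writer close model described above, here is a
minimal user-space sketch in C. It is a toy model and not the cache
manager's code: the struct and function names (cached_file,
cf_open_write, cf_write, cf_release_write) are hypothetical, and only
the idea of an owrite counter that gates write back is taken from the
description.

```c
#include <assert.h>

/* Toy model of one cached file. All names here are hypothetical. */
struct cached_file {
    int owrite;      /* descriptors currently open for writing */
    int dirty;       /* nonzero if the cached copy has unwritten changes */
    int writebacks;  /* number of times write back was triggered */
};

/* A writer opens the file: one more open-for-write descriptor. */
void cf_open_write(struct cached_file *f)
{
    f->owrite++;
}

/* A writer modifies the cached copy. */
void cf_write(struct cached_file *f)
{
    assert(f->owrite > 0);  /* writing requires an open-for-write descriptor */
    f->dirty = 1;
}

/*
 * A writer's descriptor is released. Write back is triggered only when
 * the counter drops to 0 and there is something to write.
 * Returns 1 if this release triggered write back, 0 otherwise.
 */
int cf_release_write(struct cached_file *f)
{
    assert(f->owrite > 0);
    f->owrite--;
    if (f->owrite == 0 && f->dirty) {
        f->dirty = 0;
        f->writebacks++;
        return 1;
    }
    return 0;
}
```

With two writers, the first release leaves the counter at 1 and nothing
is written back; only the second release triggers it. In this model any
write back error surfaces at the last release, which matches the problem
above of errors arriving too late to hand back to a caller.
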
As a result it was a bit of a hack to get the _STORE/_RELEASE variant to
work, because it couldn't rely on the counter dropping to 0. That is one
of the reasons why I never released a Coda client that actually uses
those new upcalls.

We do have an upcall that is sent when fsync is called and could use
that. For fsync we could ignore the owrite counter.

I am not sure if the in-kernel implementation is correct. I would expect
that it should be an atomic operation wrt. other writers, to be able to
guarantee that the resulting file actually contains what we expect. Right
now it looks like it does grab the inode mutex when it syncs the file to
disk, but releases the lock before it sends an upcall to the cache
manager.

Jan