On Fri, Jul 20, 2007 at 01:53:16AM +0100, Al Viro wrote: > On Fri, Jul 20, 2007 at 10:45:34AM +1000, David Chinner wrote: > > On Thu, Jul 19, 2007 at 06:16:00PM -0400, Jan Harkes wrote: > > > On Thu, Jul 19, 2007 at 11:45:08PM +0200, Christoph Hellwig wrote: > > > > ->release is the proper way to detect the last close of a file, > > > > file_count should never be used in filesystems. > > > > > > Has been tried, the problem with that once ->release is called it is too > > > late to pass the the error back to close(2). > > > > I think you'll find the problem is that fput() throws away the error > > from ->release, not that it's too late.... > > Just where would that return value go? Right, there are actually quite a few places where different drivers and file systems are returning non-zero errors from ->release. I just built a kernel with the prototype changed to void to see how many places are affected, 118 files changed, 211 insertions(+), 391 deletions(-) That's with my default config on i386, so there are probably quite a few more. The change also break the ABI quite badly (several EXPORT_SYMBOL functions are affected) There were a few places where the error handling path seems to lead to a possible struct file or private_data memory leak. > BTW, the reason why checks for struct file refcount blow is not far > from that: > > task A: write() > task B (sharing descriptor table with A): close() > task C (with another reference to struct file in question): close() > task A: return from write() > > Now, the final fput() here happens in write(). In particular, no > call of close(2) sees refcount equal to 1. I see. So if I want last-writer semantics, to let the last close trigger writeback (upcall to userspace) combined with the ability to return an error from the writeback back to close(2). To return an error I have to use fops->flush, but as you've shown, I cannot reliably test if this close dropped the last reference. In fact in your example there was write activity even after the last close. I guess I should find a way to delay the close (or at least the userspace notification/return from syscall) until there are no active writes. Jan ===== Some of the possibly suspect error handling cases I found while building a kernel with void release() in fops, I haven't really looked at these too deeply (i.e. only looked at the deallocation, not the allocations so these may very well be harmless), drivers/char/ipmi/ipmi_devintf.c: ipmi_release returns error when ipmi_destroy_user fails, does not call ipmi_fasync and seems to leak memory referenced by file->private_data drivers/scsi/sg.c: sg_release may return ENXIO. leaks file->private_data when sfp->parentdp is NULL. fs/autofs4/root.c: autofs4_dir_close returns EBUSY or ENOENT. In the case it returns busy it isn't releasing a struct file. fs/bad_inode.c: bad_file_release returns EIO. Since the return code is ignored either way, there is no need to even implement this function. fs/dlm/user.c: device_close may return ENOENT, doesn't free memory referenced by file->private_data if it fails this way. fs/ecryptfs/file.c: ecryptfs_release returns error when it fails to close the lower file, it also seems to leak memory in this case. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html