We have an unpleasant HPFS (at least) race with read(2) vs. unlink(2).

The thing is, HPFS and several other filesystems keep track of every opened struct file for a directory and update the position in it upon directory modifications. For HPFS it's particularly painful, since it encodes the location of the directory block (times 64 + number of the entry in the block) as ->f_pos. lseek() validates the offset under i_mutex, directory modifications update ->f_pos of all files (again, under i_mutex) and readdir does all accesses under i_mutex.

The trouble with that scheme is sys_read(). There we save the current position, pass the address of the local copy to vfs_read() and, once vfs_read() has returned an error, store the value in that local copy back into ->f_pos. The value is unmodified, of course, and everything would be fine if not for the following problem: a directory modification done in parallel with that has no idea of that local copy and does *not* update it. It does update ->f_pos, but that update gets reverted as soon as vfs_read() returns and sys_read() does file_pos_write().

It's not just HPFS - HFS, HFS+ and sysfs have a similar (but probably milder) issue. For HPFS it's really nasty - we might end up with a subsequent readdir() reading from a completely unrelated directory ;-/

We obviously don't want to grab i_mutex in sys_read(), but I really wonder if we should do file_pos_write() there in case of vfs_read() returning an error. Most of the ->read() instances leave *pos unchanged if they decide to return an error - after all, if we'd consumed some data, the right thing is a short read, with whatever errors happening on subsequent reads. I'm not sure if it's mandated by POSIX, but it seems to be what userland would reasonably expect. And that matches the usual logic with e.g. interruptible waits catching signals in ->read() instances...

Comments? I'd obviously prefer to solve it that way (i.e.
leave ->f_pos untouched if vfs_read() returns an error), but I might be missing some case where we want position updated even though read() returns an error. I can't come up with one, but then I hadn't RTFS through every ->read() instance in drivers in search of weird cases like that - we've too many instances ;-/
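
To make the window concrete, here is a minimal userspace sketch of the race and of the proposed change. It is not kernel code: struct file is cut down to the position alone, the parallel directory modification is simulated inline rather than on another CPU, and every name in it (mock_vfs_read, sys_read_current, sys_read_proposed, pos_after, OLD_POS, NEW_POS) is made up for illustration. The cookie values just follow the "block times 64 plus entry" encoding described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct file; only the position matters. */
struct file {
    long long f_pos;
};

/* Hypothetical HPFS-style cookies: directory block * 64 + entry in block. */
#define OLD_POS (5 * 64 + 3)   /* position before the directory changed */
#define NEW_POS (9 * 64 + 0)   /* position after the parallel fixup */

/*
 * Mock of vfs_read() on a directory: it fails with -EISDIR and leaves *pos
 * alone. Before returning it simulates a directory modification running in
 * parallel, which fixes up file->f_pos but knows nothing about the caller's
 * local copy.
 */
static long mock_vfs_read(struct file *file, char *buf, size_t count,
                          long long *pos)
{
    (void)buf;
    (void)count;
    (void)pos;
    file->f_pos = NEW_POS;     /* fixup done "under i_mutex" elsewhere */
    return -EISDIR;
}

/* Behaviour described in the mail: the local copy is always written back. */
static long sys_read_current(struct file *file, char *buf, size_t count)
{
    long long pos = file->f_pos;                     /* save position */
    long ret = mock_vfs_read(file, buf, count, &pos);

    file->f_pos = pos;         /* unconditional writeback reverts the fixup */
    return ret;
}

/* Proposed behaviour: leave ->f_pos untouched when the read failed. */
static long sys_read_proposed(struct file *file, char *buf, size_t count)
{
    long long pos = file->f_pos;
    long ret = mock_vfs_read(file, buf, count, &pos);

    if (ret >= 0)
        file->f_pos = pos;     /* write back only on success */
    return ret;
}

/* Run one reader against a fresh file and report where ->f_pos ends up. */
static long long pos_after(long (*reader)(struct file *, char *, size_t))
{
    struct file f = { .f_pos = OLD_POS };
    char buf[16];
    long ret = reader(&f, buf, sizeof(buf));

    assert(ret == -EISDIR);    /* both variants must still report the error */
    return f.f_pos;
}
```

With this sketch, pos_after(sys_read_current) yields OLD_POS (the stale cookie wins and the fixup is lost), while pos_after(sys_read_proposed) yields NEW_POS. Whether skipping the writeback is safe for every real ->read() instance is exactly the open question above; the sketch only shows the mechanism.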