On Mon, Nov 26, 2012 at 12:05:57PM -0800, Hugh Dickins wrote:
> Gosh, that's a very sudden new consensus.  The consensus over the past
> ten or twenty years has been that the Linux kernel enforce locking for
> consistent atomic writes, but skip that overhead on reads - hasn't it?

I'm not sure there was ever much of a consensus.  We XFS people always
tried to push everyone toward the strict rule, but there was enough
pushback that it didn't actually happen.

> Thanks, that's helpful; but I think linux-mm people would want to defer
> to linux-fsdevel maintainers on this: mm/filemap.c happens to be in mm/,
> but a fundamental change to VFS locking philosophy is not mm's call.
>
> I don't see that page locking would have anything to do with it: if we
> are going to start guaranteeing reads atomic against concurrent writes,
> then surely it's the size requested by the user to be guaranteed,
> spanning however many pages and fs-blocks: i_mutex, or a more
> efficiently crafted alternative.

What XFS does is simply replace (or rather, currently, augment) i_mutex
with a rw_semaphore (i_iolock in XFS), which is used the following way:

 exclusive:
  - buffered writes
  - pagecache flushing before direct I/O (then downgraded)
  - appending direct I/O writes
  - less than blocksize granularity direct I/O

 shared:
  - everything else (buffered reads, "normal" direct I/O)

Doing this in the highest levels of the generic_file_ code would be
trivial, and would allow us to get rid of a fair chunk of wrappers in
XFS.

Note that we've been thinking about replacing this lock with a range
lock, but this will require more research.