Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page range

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Mon, 26 Nov 2012 15:13:08 -0500

On Mon, Nov 26, 2012 at 12:05:57PM -0800, Hugh Dickins wrote:
> Gosh, that's a very sudden new consensus.  The consensus over the past
> ten or twenty years has been that the Linux kernel enforce locking for
> consistent atomic writes, but skip that overhead on reads - hasn't it?

I'm not sure there was much of a consensus ever.  We XFS people always
ttried to push everyone down the strict rule, but there was enough
pushback that it didn't actually happen.

> Thanks, that's helpful; but I think linux-mm people would want to defer
> to linux-fsdevel maintainers on this: mm/filemap.c happens to be in mm/,
> but a fundamental change to VFS locking philosophy is not mm's call.
> 
> I don't see that page locking would have anything to do with it: if we
> are going to start guaranteeing reads atomic against concurrent writes,
> then surely it's the size requested by the user to be guaranteed,
> spanning however many pages and fs-blocks: i_mutex, or a more
> efficiently crafted alternative.

What XFS does is simply replace (or rather augment currently) i_mutex
with a rw_semaphore (i_iolock in XFS) which is used the following way:

exclusive:
 - buffer writes
 - pagecache flushing before direct I/O (then downgraded)
 - appending direct I/O writes
 - less than blocksize granularity direct I/O

shared:
 - everything else (buffered reads, "normal" direct I/O)

Doing this in the highest levels of the generic_file_ code would be
trivial, and would allow us to get rid of a fair chunk of wrappers in
XFS.

Note that we've been thinking about replacing this lock with a range
lock, but this will require more research.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html