Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page range

Zach Brown <zab@xxxxxxxxx> · Mon, 26 Nov 2012 12:15:06 -0800

On Mon, Nov 26, 2012 at 12:05:57PM -0800, Hugh Dickins wrote:
> On Mon, 26 Nov 2012, Theodore Ts'o wrote:
> > On Mon, Nov 26, 2012 at 04:33:28PM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=50981
> > >
> > > as this is working properly with XFS, so in ext4/ext3...etc also we shouldn't
> > > require synchronization at the application level; the FS should take care of
> > > locking... can we expect a fix for this?
> > 
> > Meetmehiro,
> > 
> > At this point, there seems to be consensus that the kernel should take
> > care of the locking, and that this is not something that needs to be a
> > worry for the application.
> 
> Gosh, that's a very sudden new consensus.  The consensus over the past
> ten or twenty years has been that the Linux kernel enforces locking for
> consistent atomic writes, but skip that overhead on reads - hasn't it?

I was wondering exactly the same thing.

> > So the question is whether every file system which supports AIO should
> > add its own locking, or whether it should be done at the mm layer, and
> > at which point the lock in the XFS layer could be removed as no longer
> > necessary.

(This has nothing to do with AIO.  Buffered reads have been copied from
unlocked pages... basically forever, right?)

> Thanks, that's helpful; but I think linux-mm people would want to defer
> to linux-fsdevel maintainers on this: mm/filemap.c happens to be in mm/,
> but a fundamental change to VFS locking philosophy is not mm's call.
> 
> I don't see that page locking would have anything to do with it: if we
> are going to start guaranteeing reads atomic against concurrent writes,
> then surely it's the size requested by the user to be guaranteed,
> spanning however many pages and fs-blocks: i_mutex, or a more
> efficiently crafted alternative.

Agreed.  While this little racing test might be fixed, those baked-in
page_size == 4k == atomic granularity assumptions are pretty sketchy.
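
For reference, a minimal sketch of the kind of racing test in question.
This is not the reporter's program from the bug; the function names, the
scratch file path and the 8192-byte span are all made up for
illustration.  The span is two 4k pages on purpose, so that a fix which
only serializes within a single page would still let it fail.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RANGE_LEN 8192	/* hypothetical: two 4k pages, not one */

/* Return 1 if buf holds bytes from more than one write, else 0. */
int is_torn(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	for (size_t i = 1; i < len; i++)
		if (buf[i] != buf[0])
			return 1;
	return 0;
}

/*
 * A forked child rewrites [0, RANGE_LEN) with a single repeated byte per
 * write while the parent reads the same range with buffered pread().
 * Returns the number of torn reads seen, or -1 if setup failed.
 */
long count_torn_reads(const char *path, int rounds)
{
	unsigned char buf[RANGE_LEN];
	long torn = 0;
	pid_t pid;
	int fd;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Populate the range so every read returns RANGE_LEN bytes. */
	memset(buf, 'A', sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != RANGE_LEN) {
		close(fd);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(fd);
		return -1;
	}
	if (pid == 0) {
		/* Writer: each write is uniform, so any mix is a torn read. */
		for (int i = 0; i < rounds; i++) {
			memset(buf, 'A' + (i % 26), sizeof(buf));
			if (pwrite(fd, buf, sizeof(buf), 0) != RANGE_LEN)
				_exit(1);
		}
		_exit(0);
	}

	/* Reader: keep reading until the writer has exited. */
	for (;;) {
		if (pread(fd, buf, sizeof(buf), 0) == RANGE_LEN)
			torn += is_torn(buf, sizeof(buf));
		if (waitpid(pid, NULL, WNOHANG) != 0)
			break;
	}

	close(fd);
	unlink(path);
	return torn;
}
```

A nonzero count means a read saw a mix of two writes.  A zero count
proves nothing either way, since the race may simply not have been hit
on that run.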

So we're talking about holding multiple page locks?  Or i_mutex?  Or
some fancy range locking?
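
As a userspace analogy only (this is not a proposal for the in-kernel
mechanism), locking the user-requested byte range rather than individual
pages can be sketched with POSIX advisory record locks via fcntl().  The
helper names below are hypothetical.  Note that these locks are owned
per process, so they serialize cooperating processes but not threads
within one process.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Block until [off, off + len) is held with the given lock type
 * (F_RDLCK, F_WRLCK) or released (F_UNLCK).  Returns 0 or -1.
 */
int range_lock(int fd, short type, off_t off, off_t len)
{
	struct flock fl;

	assert(len > 0);	/* l_len == 0 would mean "to end of file" */
	memset(&fl, 0, sizeof(fl));
	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = off;
	fl.l_len = len;
	return fcntl(fd, F_SETLKW, &fl);
}

/* Write under an exclusive lock covering exactly the requested span. */
ssize_t locked_pwrite(int fd, const void *buf, size_t len, off_t off)
{
	ssize_t n;

	if (range_lock(fd, F_WRLCK, off, (off_t)len) < 0)
		return -1;
	n = pwrite(fd, buf, len, off);
	range_lock(fd, F_UNLCK, off, (off_t)len);
	return n;
}

/* Read under a shared lock: readers overlap, writers are excluded. */
ssize_t locked_pread(int fd, void *buf, size_t len, off_t off)
{
	ssize_t n;

	if (range_lock(fd, F_RDLCK, off, (off_t)len) < 0)
		return -1;
	n = pread(fd, buf, len, off);
	range_lock(fd, F_UNLCK, off, (off_t)len);
	return n;
}
```

The lock covers the whole span of the request, however many pages or fs
blocks that is, which is the granularity Hugh describes above.  It only
helps if every reader and writer of the file goes through the helpers.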

There's consensus on serializing overlapping buffered reads and writes? 

- z
*readying the read(, mmap(), ) fault deadlock toy*
