Re: Async direct IO write vs buffered read race

On Thu 22-06-17 12:55:50, Jeff Moyer wrote:
> Lukas Czerner <lczerner@xxxxxxxxxx> writes:
> 
> > Hello,
> >
> > I am dealing with a problem where, if a buffered read happens to land
> > between direct IO submission and completion, the page cache will contain
> > stale data while the new data will be on disk.
> >
> > We try to avoid such problems by calling
> > invalidate_inode_pages2_range() before and after direct_IO() in
> > generic_file_direct_write(); however, that does not seem to be enough,
> > because nothing prevents buffered reads from coming in afterwards and
> > repopulating the page cache.
> 
> Ugh, right.  With aio, we're doing the invalidate after the submission,
> not the completion.
> 
> > Aside from the fact that mixing direct and buffered IO is not such a
> > good idea, we end up with the page cache showing different content than
> > what's on disk even after the aio dio completes, which seems very
> > strange to me.
> >
> > I can reproduce this on ext4 as well as xfs, on kernel versions going
> > back at least to v3.10, which leads me to believe that this might
> > actually be known behaviour?
> 
> At least I didn't know about it.  ;-)

I'm actually aware of it :)
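
The scenario described above boils down to the access pattern below. This is
a minimal userspace sketch, not taken from the original report: the file
name, sizes and single-shot timing are illustrative assumptions, libaio is
assumed to be available, and a real reproducer would loop in order to
actually hit the window between DIO submission and completion.

/*
 * Sketch: an AIO O_DIRECT write racing a buffered read of the same range.
 * Build (assumption): gcc -o dio-race dio-race.c -laio
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SZ 4096

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbp = &cb;
	struct io_event ev;
	void *dbuf;
	char cached[SZ];
	int dfd, bfd;

	/* One O_DIRECT fd for the AIO write, one plain fd for buffered I/O. */
	dfd = open("testfile", O_RDWR | O_CREAT | O_DIRECT, 0644);
	bfd = open("testfile", O_RDWR);
	if (dfd < 0 || bfd < 0 || posix_memalign(&dbuf, SZ, SZ))
		return 1;

	/* Seed the file (and the page cache) with the "old" data. */
	memset(cached, 'a', SZ);
	pwrite(bfd, cached, SZ, 0);
	fsync(bfd);

	/* Submit an AIO O_DIRECT write of the "new" data... */
	memset(dbuf, 'b', SZ);
	io_setup(1, &ctx);
	io_prep_pwrite(&cb, dfd, dbuf, SZ, 0);
	io_submit(ctx, 1, &cbp);

	/*
	 * ...and race it with a buffered read.  If this lands between DIO
	 * submission and completion, it repopulates the page cache with the
	 * old data and nothing invalidates it afterwards.
	 */
	pread(bfd, cached, SZ, 0);

	io_getevents(ctx, 1, 1, &ev, NULL);	/* wait for the DIO write */

	/* Compare the page cache view with what is actually on disk. */
	pread(bfd, cached, SZ, 0);		/* buffered: from page cache */
	pread(dfd, dbuf, SZ, 0);		/* O_DIRECT: from disk */
	printf("cache: %c  disk: %c\n", cached[0], ((char *)dbuf)[0]);

	io_destroy(ctx);
	return 0;
}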

> > I was trying to avoid that by moving invalidate_inode_pages2_range() to
> > after the aio dio completion, into dio_complete() (or the file system
> > ->end_io callback), but it has its own problems - sometimes this appears
> > to be called from atomic context and I do not really see why...
> 
> Well, I/O completion processing of course happens in atomic context.  We
> do defer some things (like O_(D)SYNC processing) to process context.  I
> guess we could add another qualifier inside of dio_bio_end_aio:
> 
> 	bool defer_completion = false;
> 	if (dio->result)
> 		defer_completion = dio->defer_completion || 
> 			(dio->op == REQ_OP_WRITE && dio->inode->i_mapping->nrpages);
> 
> 	if (remaining == 0) {
> 		if (defer_completion) {
> 			INIT_WORK(&dio->complete_work, dio_aio_complete_work);
> 			queue_work(dio->inode->i_sb->s_dio_done_wq,
> 				   &dio->complete_work);
> ...
> 
> (I'm not sure whether we also have to take into account exceptional entries.)
> 
> And then call invalidate_inode_pages2_range from dio_aio_complete_work.
> That at least wouldn't defer /all/ completion processing to a workqueue.
> However, it will slow things down when there is mixed buffered and
> direct I/O.
> 
> Christoph or Jan, any thoughts on this?
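
For concreteness, the invalidation that would then run from
dio_aio_complete_work() might look roughly like the helper sketched below.
This is an untested sketch rather than a patch from this thread; the helper
name, its arguments and its exact placement in fs/direct-io.c are
assumptions.

static void dio_invalidate_written_range(struct address_space *mapping,
					 loff_t offset, ssize_t ret)
{
	/*
	 * Kick out any pages a racing buffered read may have instantiated
	 * over the range the DIO write just covered, so the page cache
	 * stops serving stale data.  invalidate_inode_pages2_range() can
	 * sleep, which is why this must run from the completion workqueue
	 * (process context) rather than from the bio end_io callback.
	 */
	if (ret > 0 && mapping->nrpages)
		invalidate_inode_pages2_range(mapping,
				offset >> PAGE_SHIFT,
				(offset + ret - 1) >> PAGE_SHIFT);
}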

So our stance has been: Do not ever mix buffered and direct IO! Definitely
not on the same file range, most definitely not at the same time.

What we do is a best-effort measure that more or less guarantees that if
you do, say, buffered IO and then direct IO after it, things will work
reasonably. However, if direct and buffered IO can race, bad luck for your
data. I don't think we want to sacrifice any AIO DIO performance (and
offloading direct IO completion to a workqueue so that we can do the
invalidation costs a noticeable amount of performance) to support such a
use case.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR


