Re: [PATCH] ceph: add buffered/direct exclusionary locking for reads and writes


 



On Tue, 2019-08-06 at 13:25 +0000, Sage Weil wrote:
> On Tue, 6 Aug 2019, Jeff Layton wrote:
> > On Tue, 2019-08-06 at 03:27 +0000, Sage Weil wrote:
> > > On Mon, 5 Aug 2019, Jeff Layton wrote:
> > > > xfstest generic/451 intermittently fails. The test does O_DIRECT writes
> > > > to a file, and then reads back the result using buffered I/O, while
> > > > running a separate set of tasks that are also doing buffered reads.
> > > > 
> > > > The client will invalidate the cache prior to a direct write, but it's
> > > > easy for one of the other readers' replies to race in and reinstantiate
> > > > the invalidated range with stale data.
> > > 
> > > Maybe a silly question, but: what if the write path did the invalidation 
> > > after the write instead of before?  Then any racing read will see the new 
> > > data on disk.
> > > 
> > 
> > I tried that originally. It reduces the race window somewhat, but it's
> > still present since a reply to a concurrent read can get in just after
> > the invalidation occurs. You really do have to serialize them to fix
> > this, AFAICT.
> 
> I've always assumed that viewing the ordering for concurrent operations as 
> non-deterministic is the only sane approach.  If the read initiates before 
> the write completes you have no obligation to reflect the result of the 
> write.
> 
> Is that what you're trying to do?
> 

Not exactly.

In this testcase, we have one thread that is alternating between DIO
writes and buffered reads. Logically, the buffered read should always
reflect the result of the DIO write...and indeed if we run that program
in isolation it always does, since the cache is invalidated just prior
to the DIO write.

The issue occurs when other tasks are doing buffered reads at the same
time. Sometimes the reply to one of those comes in after the
invalidation but before the writing task's subsequent buffered read. If
the OSD serviced that read before the DIO write, the reply populates the
pagecache with stale data, and the writing task's subsequent read then
returns that stale data out of the cache.

Doing the invalidation after the DIO write reduces this race window to
some degree, but doesn't fully close it. You have to serialize things
such that buffered read requests aren't dispatched until all of the DIO
write replies have come in.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>




