On Tue, 2019-08-06 at 13:25 +0000, Sage Weil wrote:
> On Tue, 6 Aug 2019, Jeff Layton wrote:
> > On Tue, 2019-08-06 at 03:27 +0000, Sage Weil wrote:
> > > On Mon, 5 Aug 2019, Jeff Layton wrote:
> > > > xfstest generic/451 intermittently fails. The test does O_DIRECT writes
> > > > to a file, and then reads back the result using buffered I/O, while
> > > > running a separate set of tasks that are also doing buffered reads.
> > > >
> > > > The client will invalidate the cache prior to a direct write, but it's
> > > > easy for one of the other readers' replies to race in and reinstantiate
> > > > the invalidated range with stale data.
> > >
> > > Maybe a silly question, but: what if the write path did the invalidation
> > > after the write instead of before? Then any racing read will see the new
> > > data on disk.
> > >
> >
> > I tried that originally. It reduces the race window somewhat, but it's
> > still present since a reply to a concurrent read can get in just after
> > the invalidation occurs. You really do have to serialize them to fix
> > this, AFAICT.
>
> I've always assumed that viewing the ordering for concurrent operations as
> non-deterministic is the only sane approach. If the read initiates before
> the write completes you have no obligation to reflect the result of the
> write.
>
> Is that what you're trying to do?
>

Not exactly. In this testcase, we have one thread that is alternating
between DIO writes and buffered reads. Logically, the buffered read
should always reflect the result of the DIO write...and indeed if we run
that program in isolation it always does, since the cache is invalidated
just prior to the DIO write.

The issue occurs when other tasks are doing buffered reads at the same
time. Sometimes the reply to one of those will come in after the
invalidation but before the subsequent buffered read by the writing
task. If that OSD read occurs before the OSD write then it'll populate
the pagecache with stale data. The subsequent read by the writing thread
then ends up reading that stale data out of the cache.

Doing the invalidation after the DIO write reduces this race window to
some degree, but doesn't fully close it. You have to serialize things
such that buffered read requests aren't dispatched until all of the DIO
write replies have come in.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
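
For illustration only, the interleaving described above can be replayed
in a toy model. This is a sketch, not the kernel client code: the
function name writer_sees(), its two flags, and the "old"/"new" strings
are all made up for the example, and the OSD and pagecache are reduced
to single variables. It assumes the racing buffered read is issued after
the writer has entered its write path.

```python
def writer_sees(invalidate_after=False, serialize=False):
    """Replay one fixed interleaving; return what the writer's buffered read sees.

    Hypothetical toy model: 'osd' is the data on the OSD, 'cache' is the
    pagecache for the range (None means "not cached").
    """
    osd = "old"
    cache = "old"        # populated by an earlier buffered read

    if not invalidate_after:
        cache = None     # invalidate just before the DIO write

    # Another task issues a buffered read around the time of the write.
    # Unserialized: it is dispatched now and the OSD services it before the
    # write, so its reply will carry "old". Serialized: it is held back.
    held = serialize
    reply = None if held else osd

    osd = "new"          # the OSD applies the DIO write and the reply comes in

    if invalidate_after:
        cache = None     # invalidate just after the DIO write instead

    if held:
        reply = osd      # only dispatched now, so it reads "new"

    cache = reply        # the read reply (re)instantiates the cached range

    # The writer's buffered read is served from the pagecache when populated.
    return cache if cache is not None else osd


if __name__ == "__main__":
    for after in (False, True):
        for ser in (False, True):
            print(f"invalidate_after={after} serialize={ser}: "
                  f"{writer_sees(after, ser)}")
```

In this model, moving the invalidation makes no difference on its own:
the late read reply repopulates the cache with "old" either way. Only
holding the read back until the write reply is in gives "new".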