Re: Race Condition Leads to Corruption

Marc Smith <msmith626@xxxxxxxxx> · Mon, 26 Apr 2021 09:23:04 -0700

On Fri, Apr 23, 2021 at 8:53 AM Coly Li <colyli@xxxxxxx> wrote:
>
> On 4/23/21 6:19 AM, Kai Krakow wrote:
> > Hello Coly!
> >
> > Am Do., 22. Apr. 2021 um 18:05 Uhr schrieb Coly Li <colyli@xxxxxxx>:
> >
> >> With direct I/O, to read just-written data, the reader must wait until
> >> the previous write completes; the data read back should then be the
> >> previously written content. If not, that's a bcache bug.
> >
> > Isn't this report exactly about that? DIO data has been written, then
> > written again with different content by a concurrent process, and when
> > you read it back, either version may come back (call it state A). But the
> > problem here is that this is not persistent, and that should actually
> > not happen: bcache now has stale content in its cache, and after write
> > back finished, the contents of the previous read (from state A)
> > changed to a new state B. And this is not what you should expect from
> > direct IO: The contents have literally changed under your feet with a
> > much too high latency: If some read already confirmed that data has
> > some state A after concurrent writes, it should not change to a state
> > B after bcache finished write-back.
>
> Hi Kai,
>
> Your comments give me a better understanding. Yes, the stale key
> continues to exist even after a reboot, which is problematic.
>
>
> >
> >> You may try the above steps on non-bcache block devices with/without
> >> file systems; it is probably possible to reproduce a similar "race"
> >> with parallel direct reads and writes.
> >
> > I'm guessing the bcache results would suggest there's a much higher
> > latency of inconsistency between write and read races, in the range of
> > minutes or even hours. So there'd be no chance to properly verify your
> > DIO writes by the following read and be sure that this state persists
> > - just because there might be outstanding bcache dirty data.
> >
> > I wonder if this is why I'm seeing btrfs corruptions with bcache when
> > I enabled auto-defrag in btrfs. OTOH, I didn't check the code on how
> > auto-defrag is actually implemented and if it uses some direct-io path
> > under the hood.
>
> Hi Marc,
>
> It seems that if a read miss hits an in-flight writethrough I/O on the
> backing device, the read request should be served without caching the
> data onto the cache set.
>
> Do you have a patch for the fix?

Yes, we do have a patch that we are testing and would like to be
advised if it's the correct/acceptable approach. I'll post what we
have shortly.
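
In the meantime, the race and the behavior Coly describes above can be
sketched as a toy model. This is not the patch and not bcache code: the
class, its methods, and the bypass_on_inflight_write flag are all
hypothetical names made up for illustration, and real bcache tracks keys
and bios, not Python dicts. The interleaving is forced by hand so the
outcome is deterministic.

```python
class ToyCache:
    """Toy writethrough cache in front of a backing device (hypothetical)."""

    def __init__(self, bypass_on_inflight_write):
        self.backing = {}      # sector -> data on the backing device
        self.cache = {}        # sector -> data held on the cache set
        self.inflight = set()  # sectors with a write not yet completed
        self.bypass = bypass_on_inflight_write

    def write_begin(self, sector):
        # Drop any cached copy and mark the sector as being written.
        self.cache.pop(sector, None)
        self.inflight.add(sector)

    def write_end(self, sector, data):
        # The write lands on the backing device and is no longer in flight.
        self.backing[sector] = data
        self.inflight.discard(sector)

    def read_miss_begin(self, sector):
        # Fetch from the backing device and note whether a write was racing.
        return self.backing.get(sector), sector in self.inflight

    def read_miss_end(self, sector, data, raced):
        # Without the bypass, the (possibly stale) data is always cached.
        if not (self.bypass and raced):
            self.cache[sector] = data
        return data

    def read(self, sector):
        if sector in self.cache:
            return self.cache[sector]
        data, raced = self.read_miss_begin(sector)
        return self.read_miss_end(sector, data, raced)


def run(bypass):
    c = ToyCache(bypass)
    c.backing[0] = "old"
    c.write_begin(0)                    # writer starts writing "new"
    data, raced = c.read_miss_begin(0)  # reader misses, fetches "old"
    c.write_end(0, "new")               # write completes on backing device
    c.read_miss_end(0, data, raced)     # reader finishes, may cache "old"
    return c.read(0), c.backing[0]      # (what a later read sees, backing)


if __name__ == "__main__":
    print(run(bypass=False))  # cache keeps the stale value
    print(run(bypass=True))   # racing read miss was not cached
```

With bypass=False the later read returns "old" while the backing device
holds "new", which is the stale-key situation discussed above. With
bypass=True the racing read miss is served but not inserted, so the next
read goes to the backing device and sees "new".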

--Marc

>
> Thanks.
>
> Coly Li