Re: generic/418 regression seen on 5.12-rc3

On Thu, Mar 18, 2021 at 05:38:08PM -0400, Eric Whitney wrote:
> * Matthew Wilcox <willy@xxxxxxxxxxxxx>:
> > On Thu, Mar 18, 2021 at 02:16:13PM -0400, Eric Whitney wrote:
> > > As mentioned in today's ext4 concall, I've seen generic/418 fail from time to
> > > time when run on 5.12-rc3 and 5.12-rc1 kernels.  This first occurred when
> > > running the 1k test case using kvm-xfstests.  I was then able to bisect the
> > > failure to a patch that landed in the -rc1 merge window:
> > > 
> > > (bd8a1f3655a7) mm/filemap: support readpage splitting a page
> > 
> > Thanks for letting me know.  This failure is new to me.
> 
> Sure - it's useful to know that it's new to you.  Ted said he's also going
> to test XFS with a large number of generic/418 trials which would be a
> useful comparison.  However, he's had no luck as yet reproducing what I've
> seen on his Google compute engine test setup running ext4.
> 
> > 
> > I don't understand it; this patch changes the behaviour of buffered reads
> > from waiting on a page with a refcount held to waiting on a page without
> > the refcount held, then starting the lookup from scratch once the page
> > is unlocked.  I find it hard to believe this introduces a /new/ failure.
> > Either it makes an existing failure easier to hit, or there's a subtle
> > bug in the retry logic that I'm not seeing.
> > 
> 
> To keep Murphy at bay, I'm rerunning the bisection from scratch just
> to make sure I come out at the same patch.  The initial bisection looked
> clean, but when dealing with a failure that occurs probabilistically it's
> easy enough to get it wrong.  Is this patch revertable in -rc1 or -rc3?
> Ordinarily I like to do that for confirmation.

Alas, not easily.  I've built a lot on top of it since then.  I could
probably come up with a moral reversion (and will have to if we can't
figure out why it's causing a problem!)

> And there's always the chance that a latent ext4 bug is being hit.

That would also be valuable information to find out.  If this
patch is exposing a latent bug, I can't think what it might be.

> I'd be very happy to run whatever debugging patches you might want, though
> you might want to wait until I've reproduced the bisection result.  The
> offsets vary, unfortunately - I've seen 1024, 2048, and 3072 reported when
> running a file system with 4k blocks.

As I expected, but thank you for being willing to run debug patches.
I'll wait for you to confirm the bisection and then work up something
that'll help figure out what's going on.
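
As a rough aside on the probabilistic bisection discussed above: the
sketch below estimates how many consecutive clean runs of a test are
needed before marking a bisection step "good" with some confidence.  It
assumes runs are independent and that a bad kernel fails each run with a
fixed probability.  The function name and the failure rates in the usage
note are hypothetical; no failure rate was reported in this thread.

```python
import math


def runs_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Return the smallest n such that a bad kernel would fail at least
    once in n independent runs with probability >= confidence.

    p_fail is an assumed per-run failure probability on a bad kernel.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # A bad kernel passes n runs in a row with probability (1 - p_fail)**n.
    # Require that to be at most (1 - confidence) and solve for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))
```

For example, with an assumed 10% per-run failure rate,
runs_needed(0.10) asks for 29 clean runs before a step is called good
at 95% confidence; with a 50% rate, 5 runs are enough.  A lower true
failure rate makes a too-short trial count at any single step more
likely to send the bisection to the wrong commit.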


