Re: silent data corruption in fuse in rc1

On Sun, Dec 08, 2024 at 11:32:16PM +0100, Malte Schröder wrote:
> On 08/12/2024 21:02, Malte Schröder wrote:
> > On 08/12/2024 02:23, Matthew Wilcox wrote:
> >> On Sun, Dec 08, 2024 at 12:01:11AM +0100, Malte Schröder wrote:
> >>> Reverting fb527fc1f36e252cd1f62a26be4906949e7708ff fixes the issue for
> >>> me.     
> >> That's a merge commit ... does the problem reproduce if you run
> >> d1dfb5f52ffc?  And if it does, can you bisect the problem any further
> >> back?  I'd recommend also testing v6.12-rc1; if that's good, bisect
> >> between those two.
> >>
> >> If the problem doesn't show up with d1dfb5f52ffc? then we have a dilly
> >> of an interaction to debug ;-(
> > I spent half a day compiling kernels, but bisect was non-conclusive.
> > There are some steps where the failure mode changes slightly, so this is
> > hard. It ended up at 445d9f05fa149556422f7fdd52dacf487cc8e7be which is
> > the nfsd-6.13 merge ...
> >
> > d1dfb5f52ffc also shows the issue. I will try to narrow down from there.
> >
> > /Malte
> >
> Ha! This time I bisected from f03b296e8b51 to d1dfb5f52ffc. I ended up
> with 3b97c3652d91 as the culprit.
> 

Willy, I've looked at this code and it does indeed look like a 1:1 conversion,
EXCEPT I'm fuzzy about how this works with large folios.  Previously, if we
got a hugepage in, we'd get each individual struct page back for the whole range
of the hugepage, so if for example we had a 2M hugepage, we'd fill in the
->offset for each "middle" struct page as 0, since obviously we're consuming
PAGE_SIZE chunks at a time.

But now we're doing this

	for (i = 0; i < nfolios; i++)
		ap->folios[i + ap->num_folios] = page_folio(pages[i]);

So if userspace handed us a 2M hugepage, page_folio() on each of the
intermediary struct pages would return the same folio, correct?  So we'd end up
with the wrong offsets for our fuse request, because they should be relative to
the start of the folio, correct?
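
A minimal userspace sketch of the concern above, not kernel code: the
model_* types and helpers are hypothetical stand-ins for struct page,
struct folio and page_folio(), and 4K base pages with a 2M hugepage are
assumed.  It only illustrates that every page of a large folio maps to the
same folio, so a per-page offset of 0 no longer identifies which part of the
folio is meant.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE      4096UL  /* assumed 4K base pages */
#define MODEL_HUGEPAGE_PAGES 512UL   /* 2M / 4K */

/* Hypothetical stand-ins for the kernel's struct folio and struct page. */
struct model_folio {
	size_t nr_pages;
};

struct model_page {
	struct model_folio *folio;  /* folio containing this page */
	size_t idx;                 /* index of this page within the folio */
};

/* Models page_folio(): every page of a large folio maps to the same folio. */
static struct model_folio *model_page_folio(const struct model_page *page)
{
	return page->folio;
}

/* Byte offset, from the start of the folio, of a byte within this page. */
static size_t model_offset_in_folio(const struct model_page *page,
				    size_t offset_in_page)
{
	return page->idx * MODEL_PAGE_SIZE + offset_in_page;
}

/* Make pages[0..n-1] the consecutive pages of a single folio. */
static void model_fill(struct model_page *pages, struct model_folio *folio,
		       size_t n)
{
	size_t i;

	folio->nr_pages = n;
	for (i = 0; i < n; i++) {
		pages[i].folio = folio;
		pages[i].idx = i;
	}
}

/* Count runs of consecutive pages that share a folio. */
static size_t model_count_folios(const struct model_page *pages, size_t n)
{
	size_t i, count = 0;
	const struct model_folio *prev = NULL;

	for (i = 0; i < n; i++) {
		const struct model_folio *f = model_page_folio(&pages[i]);

		if (f != prev)
			count++;
		prev = f;
	}
	return count;
}
```

In this model, 512 pages filled from one folio give a
model_count_folios() result of 1, while a per-page loop would still store
512 array entries that all point at that one folio.  The data for page 3
starts 3 * 4096 = 12288 bytes into the folio, so recording an offset of 0
for it would point at the start of the folio instead.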

I'm coming off of vacation so my brain isn't fully engaged yet, feel free to
point and laugh at me if I'm wrong.  Thanks,

Josef



