Re: silent data corruption in fuse in rc1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 9, 2024 at 9:07 AM Malte Schröder <malte.schroeder@xxxxxxxx> wrote:
>
> On 09/12/2024 16:48, Josef Bacik wrote:
> > On Mon, Dec 09, 2024 at 03:28:14PM +0000, Matthew Wilcox wrote:
> >> On Mon, Dec 09, 2024 at 09:49:48AM -0500, Josef Bacik wrote:
> >>>> Ha! This time I bisected from f03b296e8b51 to d1dfb5f52ffc. I ended up
> >>>> with 3b97c3652d91 as the culprit.
> >>> Willy, I've looked at this code and it does indeed look like a 1:1 conversion,
> >>> EXCEPT I'm fuzzy about how how this works with large folios.  Previously, if we
> >>> got a hugepage in, we'd get each individual struct page back for the whole range
> >>> of the hugepage, so if for example we had a 2M hugepage, we'd fill in the
> >>> ->offset for each "middle" struct page as 0, since obviously we're consuming
> >>> PAGE_SIZE chunks at a time.
> >>>
> >>> But now we're doing this
> >>>
> >>>     for (i = 0; i < nfolios; i++)
> >>>             ap->folios[i + ap->num_folios] = page_folio(pages[i]);
> >>>
> >>> So if userspace handed us a 2M hugepage, page_folio() on each of the
> >>> intermediary struct page's would return the same folio, correct?  So we'd end up
> >>> with the wrong offsets for our fuse request, because they should be based from
> >>> the start of the folio, correct?
> >> I think you're 100% right.  We could put in some nice asserts to check
> >> this is what's happening, but it does seem like a rather incautious
> >> conversion.  Yes, all folios _in the page cache_ for fuse are small, but
> >> that's not guaranteed to be the case for folios found in userspace for
> >> directio.  At least the comment is wrong, and I'd suggest the code is too.
> > Ok cool, Malte can you try the attached only compile tested patch and see if the
> > problem goes away?  Thanks,
> >
> > Josef
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 88d0946b5bc9..c4b93ead99a5 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -1562,9 +1562,19 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
> >               nfolios = DIV_ROUND_UP(ret, PAGE_SIZE);
> >
> >               ap->descs[ap->num_folios].offset = start;
> > -             fuse_folio_descs_length_init(ap->descs, ap->num_folios, nfolios);
> > -             for (i = 0; i < nfolios; i++)
> > -                     ap->folios[i + ap->num_folios] = page_folio(pages[i]);
> > +             for (i = 0; i < nfolios; i++) {
> > +                     struct folio *folio = page_folio(pages[i]);
> > +                     unsigned int offset = start +
> > +                             (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
> > +                     unsigned int len = min_t(unsigned int, ret, folio_size(folio) - offset);
> > +
> > +                     len = min_t(unsigned int, len, PAGE_SIZE);
> > +
> > +                     ap->descs[ap->num_folios + i].offset = offset;
> > +                     ap->descs[ap->num_folios + i].length = len;
> > +                     ap->folios[i + ap->num_folios] = folio;
> > +                     start = 0;
> > +             }
> >
> >               ap->num_folios += nfolios;
> >               ap->descs[ap->num_folios - 1].length -=
>
> The problem persists with this patch.
>

Catching up on this thread now. I'll investigate this today.

>
> /Malte
>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux