Re: [REGRESSION][BISECTED] Crash with Bad page state for FUSE/Flatpak related applications since v6.13

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/7/25 18:29, Josef Bacik wrote:
> On Fri, Feb 07, 2025 at 05:49:34PM +0100, Vlastimil Babka wrote:
>> On 2/7/25 10:34, Miklos Szeredi wrote:
>> > [Adding Joanne, Willy and linux-mm].
>> > 
>> > 
>> > On Thu, 6 Feb 2025 at 11:54, Christian Heusel <christian@xxxxxxxxx> wrote:
>> >>
>> >> Hello everyone,
>> >>
>> >> we have recently received [a report][0] on the Arch Linux Gitlab about
>> >> multiple users having system crashes when using Flatpak programs and
>> >> related FUSE errors in their dmesg logs.
>> >>
>> >> We have subsequently bisected the issue within the mainline kernel tree
>> >> to the following commit:
>> >>
>> >>     3eab9d7bc2f4 ("fuse: convert readahead to use folios")
>> 
>> I see that commit removes folio_put() from fuse_readpages_end(). Also it now
>> uses readahead_folio() in fuse_readahead() which does folio_put(). So that's
>> suspicious to me. It might be storing pointers to pages to ap->pages without
>> pinning them with a refcount.
>> 
>> But I don't understand the code enough to know what's the proper fix. A
>> probably stupid fix would be to use __readahead_folio() instead and keep the
>> folio_put() in fuse_readpages_end().
> 
> Agreed, I'm also confused as to what the right thing is here.  It appears the
> rules are "if the folio is locked, nobody messes with it", so it's not "correct"
> to hold a reference on the folio while it's being read.  I don't love this way
> of dealing with folios, but that seems to be the way it's always worked.
> 
> I went and looked at a few of the other file systems and we have NFS which holds
> it's own reference to the folio while the IO is outstanding, which FUSE is most
> similar to NFS so this would make sense to do.
> 
> Btrfs however doesn't do this, BUT we do set_folio_private (or whatever it's
> called) so that keeps us from being reclaimed since we'll try to lock the folio
> before we do the reclaim.
> 
> So perhaps that's the issue here?  We need to have a private on the folio + the
> folio locked to make sure it doesn't get reclaimed while it's out being read?
> 
> I'm knee deep in other things, if we want a quick fix then I think your
> suggestion is correct Vlastimil.  But I definitely want to know what Willy
> expects to be the proper order of operations here, and if this is exactly what
> we're supposed to be doing then something else is going wrong and we should try
> to reproduce locally and figure out what's happening.  Thanks,

Thanks, Josef. I guess we can at least try to confirm we're on the right track.
Can anyone affected see if this (only compile tested) patch fixes the issue?
Created on top of 6.13.1.

----8<----

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux