On Fri, Feb 07, 2025 at 05:49:34PM +0100, Vlastimil Babka wrote:
> On 2/7/25 10:34, Miklos Szeredi wrote:
> > [Adding Joanne, Willy and linux-mm].
> >
> >
> > On Thu, 6 Feb 2025 at 11:54, Christian Heusel <christian@xxxxxxxxx> wrote:
> >>
> >> Hello everyone,
> >>
> >> we have recently received [a report][0] on the Arch Linux Gitlab about
> >> multiple users having system crashes when using Flatpak programs and
> >> related FUSE errors in their dmesg logs.
> >>
> >> We have subsequently bisected the issue within the mainline kernel tree
> >> to the following commit:
> >>
> >> 3eab9d7bc2f4 ("fuse: convert readahead to use folios")
>
> I see that commit removes folio_put() from fuse_readpages_end(). Also it now
> uses readahead_folio() in fuse_readahead() which does folio_put(). So that's
> suspicious to me. It might be storing pointers to pages to ap->pages without
> pinning them with a refcount.
>
> But I don't understand the code enough to know what's the proper fix. A
> probably stupid fix would be to use __readahead_folio() instead and keep the
> folio_put() in fuse_readpages_end().

Agreed, I'm also confused as to what the right thing is here. It appears the
rules are "if the folio is locked, nobody messes with it", so it's not "correct"
to hold a reference on the folio while it's being read. I don't love this way
of dealing with folios, but that seems to be the way it's always worked.

I went and looked at a few of the other file systems. NFS holds its own
reference to the folio while the IO is outstanding, and FUSE is most similar to
NFS, so this would make sense to do. Btrfs however doesn't do this, BUT we do
set_folio_private (or whatever it's called) so that keeps us from being
reclaimed since we'll try to lock the folio before we do the reclaim. So
perhaps that's the issue here? We need to have a private on the folio + the
folio locked to make sure it doesn't get reclaimed while it's out being read?
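
To make the two options concrete, here is a rough userspace toy model of the
refcount patterns being discussed. It is not kernel code: struct toy_folio and
every toy_* function are hypothetical stand-ins, and the starting refcount of 2
(one for the page cache, one for readahead) is an assumption of the model. It
only counts references and says nothing about locking or reclaim.

```c
#include <assert.h>

/* Toy stand-in for struct folio: only the fields this sketch needs. */
struct toy_folio {
	int refcount;	/* models the folio reference count */
	int locked;	/* models the folio lock held while the read is in flight */
};

/* Models __readahead_folio(): hands the folio over, reference still held. */
static struct toy_folio *toy_get_folio_keep_ref(struct toy_folio *f)
{
	return f;
}

/* Models readahead_folio(): same folio, but the reference is dropped first. */
static struct toy_folio *toy_get_folio_drop_ref(struct toy_folio *f)
{
	f->refcount--;
	return f;
}

/* Models the read-completion path: unlock, and put the ref if we kept one. */
static void toy_read_end(struct toy_folio *f, int put_ref)
{
	f->locked = 0;
	if (put_ref)
		f->refcount--;
}

/*
 * Runs one simulated read and returns how many references exist while the
 * request is outstanding. keep_ref != 0 models the suggested quick fix.
 */
static int toy_refs_in_flight(int keep_ref)
{
	/* Assumed starting state: page cache ref + readahead ref, locked. */
	struct toy_folio f = { .refcount = 2, .locked = 1 };
	struct toy_folio *p = keep_ref ? toy_get_folio_keep_ref(&f)
				       : toy_get_folio_drop_ref(&f);
	int in_flight = p->refcount;

	toy_read_end(p, keep_ref);

	/* Either way, only the page cache reference remains at the end. */
	assert(f.refcount == 1 && !f.locked);
	return in_flight;
}
```

In this model toy_refs_in_flight(1) leaves the request holding its own
reference until completion, while toy_refs_in_flight(0) leaves only the page
cache reference during the read. Both variants end in the same state; the
only difference is what is pinned while the IO is outstanding.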

I'm knee deep in other things; if we want a quick fix then I think your
suggestion is correct, Vlastimil. But I definitely want to know what Willy
expects to be the proper order of operations here, and if this is exactly what
we're supposed to be doing then something else is going wrong and we should try
to reproduce locally and figure out what's happening. Thanks,

Josef