On Sun, Jul 28, 2024 at 1:45 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> That is really weird. AFAICT, 2e9d7e4b984a61 is just removing some
> wrapper functions and changing the names of some others. There should
> be no functional changes there.

Exactly what I thought; I could not imagine how this commit could cause
such a bug. The only possibility I could see was that
netfs_rreq_assess() now always calls netfs_rreq_completed() directly,
and not netfs_rreq_write_to_cache(). I don't know what that implies,
but this different code path could be a candidate for doing something
differently. Maybe it's an old bug that only got revealed by this
change.

Anyway, I spent hours testing this commit and the preceding one, and
the picture was consistent: this commit reproduces the RCU stall
within minutes (though only in 50% or so of all boots), and the
previous commit never did. There is still a tiny chance that I just
wasn't trying hard enough.

I'm out of ideas, and all I can do now is start digging really deeply
into this code, but I thought it would be more productive to reach out
to the people who wrote it.

Max
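
Note on the "tiny chance" above: a rough way to put a number on it, as a
sketch only. It assumes the stall shows up independently on each boot of
a bad kernel at the "50% or so" rate mentioned in the mail; both the
independence and the exact rate are assumptions, and the function names
are made up for illustration.

```python
import math


def false_good_probability(repro_rate: float, clean_boots: int) -> float:
    """Chance that a buggy kernel survives `clean_boots` boots without a stall.

    Assumes each boot reproduces the stall independently with
    probability `repro_rate` (an assumption, not a measured fact).
    """
    return (1.0 - repro_rate) ** clean_boots


def boots_needed(repro_rate: float, max_false_good: float) -> int:
    """Smallest number of clean boots that brings the chance of wrongly
    calling a commit "good" to at or below `max_false_good`."""
    return math.ceil(math.log(max_false_good) / math.log(1.0 - repro_rate))


if __name__ == "__main__":
    # How quickly the risk of a false "good" verdict shrinks at a 50% rate.
    for n in (5, 10, 20):
        print(n, false_good_probability(0.5, n))
    # Clean boots required to get that risk under 0.1%.
    print(boots_needed(0.5, 0.001))
```

Under these assumptions, ten stall-free boots of the preceding commit
would leave roughly a 0.1% chance that it is actually bad; a lower real
reproduction rate would require proportionally more boots.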