On Fri, May 12, 2023 at 09:16:36AM -0600, Tycho Andersen wrote:
> On Fri, May 12, 2023 at 11:45:47AM +1000, Dave Chinner wrote:
> >
> > Yeah, this is papering over the observed symptom, not addressing the
> > root cause of the inodegc flush delay. What do you see when you run
> > sysrq-w and sysrq-l? Are there inodegc worker threads blocked
> > performing inodegc?
>
> I will try this next time we encounter this.
>
> > e.g. inodegc flushes could simply be delayed by an unlinked inode
> > being processed that has millions of extents that need to be freed.
> >
> > In reality, inode reclaim can block for long periods of time
> > on any filesystem, so the concept of "inode reclaim should
> > not block when PF_EXITING" is not a behaviour that we guarantee
> > anywhere or could guarantee across the board.
> >
> > Let's get to the bottom of why inodegc has apparently stalled before
> > trying to work out how to fix it...
>
> I'm happy to try, but I think it is also worth applying this patch.
> Like I said in the other thread, having to evac a box to get rid of an
> unkillable userspace process is annoying.

If inodegc is stuck, then it's only a matter of time before the
filesystem will completely lock up and you'll have to cycle the machine
anyway. This patch merely kicks the can down the road a few minutes, it
doesn't change anything material.

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx