On Mon, Sep 30, 2024 at 07:34:39PM +0200, Christian Theune wrote:
> Hi,
>
> we’ve been running a number of VMs since last week on 6.11. We’ve
> encountered one hung task situation multiple times now that seems
> to be resolving itself after a bit of time, though. I do not see
> spinning CPU during this time.
>
> The situation seems to be related to cgroups-based IO throttling /
> weighting so far:

.....

> Sep 28 03:39:19 <redactedhostname>10 kernel: INFO: task nix-build:94696 blocked for more than 122 seconds.
> Sep 28 03:39:19 <redactedhostname>10 kernel: Not tainted 6.11.0 #1-NixOS
> Sep 28 03:39:19 <redactedhostname>10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 28 03:39:19 <redactedhostname>10 kernel: task:nix-build state:D stack:0 pid:94696 tgid:94696 ppid:94695 flags:0x00000002
> Sep 28 03:39:19 <redactedhostname>10 kernel: Call Trace:
> Sep 28 03:39:19 <redactedhostname>10 kernel: <TASK>
> Sep 28 03:39:19 <redactedhostname>10 kernel: __schedule+0x3a3/0x1300
> Sep 28 03:39:19 <redactedhostname>10 kernel: schedule+0x27/0xf0
> Sep 28 03:39:19 <redactedhostname>10 kernel: io_schedule+0x46/0x70
> Sep 28 03:39:19 <redactedhostname>10 kernel: folio_wait_bit_common+0x13f/0x340
> Sep 28 03:39:19 <redactedhostname>10 kernel: folio_wait_writeback+0x2b/0x80
> Sep 28 03:39:19 <redactedhostname>10 kernel: truncate_inode_partial_folio+0x5e/0x1b0
> Sep 28 03:39:19 <redactedhostname>10 kernel: truncate_inode_pages_range+0x1de/0x400
> Sep 28 03:39:19 <redactedhostname>10 kernel: evict+0x29f/0x2c0
> Sep 28 03:39:19 <redactedhostname>10 kernel: do_unlinkat+0x2de/0x330

That's not what I'd call expected behaviour.

By the time we are that far through eviction of a newly unlinked
inode, we've already removed the inode from the writeback lists and
we've supposedly waited for all writeback to complete.

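To make that ordering concrete, here is a toy userspace model of it in
C. This is only a sketch and not kernel code: struct toy_inode and
every toy_* function are hypothetical names invented for illustration,
and the "wait" is modelled as simply completing all in-flight
writeback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for inode state; these fields are hypothetical. */
struct toy_inode {
	bool on_writeback_list;       /* models being on the writeback lists */
	int  folios_under_writeback;  /* models folios still under writeback */
};

/* Step 1: take the inode off the writeback lists so no new writeback starts. */
static void toy_io_list_del(struct toy_inode *inode)
{
	inode->on_writeback_list = false;
}

/* Step 2: wait for in-flight writeback; the model just completes it all. */
static void toy_wait_for_writeback(struct toy_inode *inode)
{
	inode->folios_under_writeback = 0;
}

/*
 * Step 3: final truncate. Returns how many folios it would have to
 * block on; the expected answer after steps 1 and 2 is zero.
 */
static int toy_truncate_final(const struct toy_inode *inode)
{
	return inode->folios_under_writeback;
}

/* Eviction in the order described above. */
static int toy_evict(struct toy_inode *inode)
{
	toy_io_list_del(inode);
	toy_wait_for_writeback(inode);
	return toy_truncate_final(inode);
}
```

In this model toy_evict() can never return non-zero, which is the
invariant the reported trace appears to violate: the truncate step
found a folio it still had to wait on.
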
IOWs, there shouldn't be a cached folio in writeback state at this
point in time - we're supposed to have guaranteed all writeback has
already completed before we call truncate_inode_pages_final()....

So how are we getting a partial folio that is still under writeback
at this point in time?

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx