> On Jul 18, 2023, at 5:11 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jul 18, 2023 at 10:57:38PM +0000, Wengang Wang wrote:
>> Hi,
>>
>> I have an XFS metadump (it was running 4.14.35 plus some backported patches);
>> mounting it (log recovery) hangs at log space reservation. There are 181760 bytes of on-disk
>> free journal space, while the transaction needs to reserve 360416 bytes to start the recovery.
>> Thus the mount hangs forever.
>
> Most likely something went wrong at runtime on the 4.14.35 kernel
> prior to the crash, leaving the on-disk state in an impossible to
> recover state. Likely an accounting leak in a transaction
> reservation somewhere, likely in passing the space used from the
> transaction to the CIL. We've had bugs in this area before, they
> eventually manifest in log hangs like this either at runtime or
> during recovery...
>
>> That happens with the 4.14.35 kernel and also the upstream
>> kernel (6.4.0).
>
> Upgrading the kernel won't fix recovery - it is likely that the
> journal state on disk is invalid and so the mount cannot complete.
>
>> This is the related stack dump (6.4.0 kernel):
>>
>> [<0>] xlog_grant_head_wait+0xbd/0x200 [xfs]
>> [<0>] xlog_grant_head_check+0xd9/0x100 [xfs]
>> [<0>] xfs_log_reserve+0xbc/0x1e0 [xfs]
>> [<0>] xfs_trans_reserve+0x138/0x170 [xfs]
>> [<0>] xfs_trans_alloc+0xe8/0x220 [xfs]
>> [<0>] xfs_efi_item_recover+0x110/0x250 [xfs]
>> [<0>] xlog_recover_process_intents.isra.28+0xba/0x2d0 [xfs]
>> [<0>] xlog_recover_finish+0x33/0x310 [xfs]
>> [<0>] xfs_log_mount_finish+0xdb/0x160 [xfs]
>> [<0>] xfs_mountfs+0x51c/0x900 [xfs]
>> [<0>] xfs_fs_fill_super+0x4b8/0x940 [xfs]
>> [<0>] get_tree_bdev+0x193/0x280
>> [<0>] vfs_get_tree+0x26/0xd0
>> [<0>] path_mount+0x69d/0x9b0
>> [<0>] do_mount+0x7d/0xa0
>> [<0>] __x64_sys_mount+0xdc/0x100
>> [<0>] do_syscall_64+0x3b/0x90
>> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>
>> Thus we can say the 4.14.35 kernel didn’t reserve log space at IO time to make log recovery
>> safe. The upstream kernel doesn’t do that either, if I read the source code right (I might be wrong).
>
> Sure they do.
>
> Log space usage is what the grant heads track; transactions are not
> allowed to start if there isn't both reserve and write grant head
> space available for them, and transaction rolls get held until there
> is write grant space available for them (i.e. they can block in
> xfs_trans_roll() -> xfs_trans_reserve() waiting for write grant head
> space).
>
> There have been bugs in the grant head accounting mechanisms in the
> past, there may well still be bugs in it. But it is the grant head
> mechanism that is supposed to guarantee there is always space in
> the journal for a transaction to commit, and by extension, ensure
> that we always have space in the journal for a transaction to be
> fully recovered.
>
>> So shall we reserve a proper amount of log space at IO time, call it Unflush-Reserve, to
>> ensure log recovery is safe? The amount of UR is determined by the current unflushed log items.
>> It gets increased just after a transaction is committed and gets decreased when log items are
>> flushed. With the UR, we are sure to have enough log space for the transactions used by log
>> recovery.
>
> The grant heads already track log space usage and reservations like
> this. If you want to learn more about the nitty gritty details, look
> at this patch set that is aimed at changing how the grant heads
> track the used/reserved log space to improve performance:
>
> https://lore.kernel.org/linux-xfs/20221220232308.3482960-1-david@xxxxxxxxxxxxx/

Thanks a lot, Dave!
I will look more into the write head and the above patch set.

Have a good day,
Wengang
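
For readers following the thread, the grant-head check described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the XFS implementation: the `GrantHead` class, its fields, and the 10 MiB journal size are hypothetical, and it models a single head where XFS tracks separate reserve and write grant heads. The two byte counts are the ones reported in the original message.

```python
from dataclasses import dataclass


@dataclass
class GrantHead:
    """Toy model of one grant head (hypothetical, not the kernel structure)."""
    log_size: int      # total journal size in bytes (assumed value below)
    reserved: int = 0  # bytes currently granted to transactions or in use

    def free_space(self) -> int:
        return self.log_size - self.reserved

    def try_reserve(self, need: int) -> bool:
        """Grant `need` bytes if they fit; the kernel would sleep rather than return False."""
        if need > self.free_space():
            return False
        self.reserved += need
        return True

    def release(self, nbytes: int) -> None:
        """Return space to the journal, e.g. once logged items have been written back."""
        self.reserved -= nbytes


FREE_ON_DISK = 181760        # free journal space reported in the thread
EFI_RECOVERY_NEED = 360416   # reservation the recovery transaction asks for
LOG_SIZE = 10 * 1024 * 1024  # hypothetical journal size, not from the thread

head = GrantHead(log_size=LOG_SIZE, reserved=LOG_SIZE - FREE_ON_DISK)
can_start = head.try_reserve(EFI_RECOVERY_NEED)
shortfall = EFI_RECOVERY_NEED - head.free_space()
print(can_start, shortfall)
```

In this model the reservation fails because the request is larger than the free space. During recovery nothing is running that could release the missing bytes, which matches the indefinite wait in xlog_grant_head_wait() shown in the stack dump.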