> On Jun 15, 2023, at 9:27 PM, Wengang Wang <wen.gang.wang@xxxxxxxxxx> wrote: > > > >> On Jun 15, 2023, at 5:42 PM, Wengang Wang <wen.gang.wang@xxxxxxxxxx> wrote: >> >> >> >>> On Jun 15, 2023, at 5:17 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: >>> >>> On Thu, Jun 15, 2023 at 11:51:09PM +0000, Wengang Wang wrote: >>>> >>>> >>>>> On Jun 15, 2023, at 4:33 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: >>>>> >>>>> On Thu, Jun 15, 2023 at 11:09:41PM +0000, Wengang Wang wrote: >>>>>> When mounting the problematic metadump with the patches, I see the following reported. >>>>>> >>>>>> For more information about troubleshooting your instance using a console connection, see the documentation: https://docs.cloud.oracle.com/en-us/iaas/Content/Compute/References/serialconsole.htm#four >>>>>> ================================================= >>>>>> [ 67.212496] loop: module loaded >>>>>> [ 67.214732] loop0: detected capacity change from 0 to 629137408 >>>>>> [ 67.247542] XFS (loop0): Deprecated V4 format (crc=0) will not be supported after September 2030. >>>>>> [ 67.249257] XFS (loop0): Mounting V4 Filesystem af755a98-5f62-421d-aa81-2db7bffd2c40 >>>>>> [ 72.241546] XFS (loop0): Starting recovery (logdev: internal) >>>>>> [ 92.218256] XFS (loop0): Internal error ltbno + ltlen > bno at line 1957 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0x3f6/0x870 [xfs] >>>>>> [ 92.249802] CPU: 1 PID: 4201 Comm: mount Not tainted 6.4.0-rc6 #8 >>>>> >>>>> What is the test you are running? Please describe how you reproduced >>>>> this failure - a reproducer script would be the best thing here. >>>> >>>> I was mounting a (copy of) V4 metadump from customer. >>> >>> Is the metadump obfuscated? Can I get a copy of it via a private, >>> secure channel? >> >> I am OK to give you a copy after I get approvement for that. >> >>> >>>>> Does the test fail on a v5 filesytsem? >>>> >>>> N/A. >>>> >>>>> >>>>>> I think that’s because that the same EFI record was going to be freed again >>>>>> by xfs_extent_free_finish_item() after it already got freed by xfs_efi_item_recover(). >>> >>> How is this happening? Where (and why) are we defering an extent we >>> have successfully freed into a new xefi that we create a new intent >>> for and then defer? >>> >>> Can you post the debug output and analysis that lead you to this >>> observation? I certainly can't see how this can happen from looking >>> at the code >>> >>>>>> I was trying to fix above issue in my previous patch by checking the intent >>>>>> log item’s lsn and avoid running iop_recover() in xlog_recover_process_intents(). >>>>>> >>>>>> Now I am thinking if we can pass a flag, say XFS_EFI_PROCESSED, from >>>>>> xfs_efi_item_recover() after it processed that record to the xfs_efi_log_item >>>>>> memory structure somehow. In xfs_extent_free_finish_item(), we skip to process >>>>>> that xfs_efi_log_item on seeing XFS_EFI_PROCESSED and return OK. By that >>>>>> we can avoid the double free. >>>>> >>>>> I'm not really interested in speculation of the cause or the fix at >>>>> this point. I want to know how the problem is triggered so I can >>>>> work out exactly what caused it, along with why we don't have >>>>> coverage of this specific failure case in fstests already. >>>>> >>>> >>>> I get to know the cause by adding additional debug log along with >>>> my previous patch. >>> >>> Can you please post that debug and analysis, rather than just a >>> stack trace that is completely lacking in context? Nothing can be >>> inferred from a stack trace, and what you are saying is occurring >>> does not match what the code should actually be doing. So I need to >>> actually look at what is happening in detail to work out where this >>> mismatch is coming from.... >> >> The debug patch was based on my previous patch, I will rework the debug patch >> basing on yours. I will share you the debug patch, output and my analysis later. >> > > My analysis: > It’s problem of double free. The first free is from xfs_efi_item_recover(), the > second free is from xfs_extent_free_finish_item(). > Dave’s patch makes it possible that xfs_trans_free_extent() returns -EAGAIN. > The double free problem begins when the -EAGAIN is returned. > > 1. -EAGAIN returned by xfs_trans_free_extent(), see line 5 in the debug output. > 2. according to -EAGAIN, xfs_free_extent_later() is called to create a deferred > operation of type XFS_DEFER_OPS_TYPE_AGFL_FREE to current transaction. > see line 6. > 3. New EFI (xfs_efi_log_item) is created and attached to current transaction. > And the deferred options is moved to capture_list. see line 9. The call path is: > xfs_efi_item_recover() > xfs_defer_ops_capture_and_commit(tp, capture_list) > xfs_defer_ops_capture() > xfs_defer_create_intents() > xfs_defer_create_intent() > ops->create_intent() —> xfs_extent_free_create_intent() > 4. The new EFI is committed with current transaction. see line 10. > 5. The mount process (log recover) continue to work with other intents. line 11 and line 12. > 6. The committed new EFI was added to AIL after log flush by a parallel thread. line 20. > 7. The mount process (log recover) continue work with other intents. line 21 to line 35. > 8. The mount process (log recover) come to process the new EFI that was added to AIL > at step 6. I guess the previous log items, which were added to AIL together with the > EFI, were removed by xfsaild (I didn’t log that) so they didn’t appear in the AIL intents > iteration in xlog_recover_process_intents(). see line 36. Checking the code again in xfs_trans_ail_update_bulk(), Every log items are added to the temporary list as head. The new EFI was the last one of those which were added to the temporary list. As a result, the new EFI is finally the first one among those log items in the temp list to be added to AIL. That’s why we see the new EFI before other newly added. thanks, wengang > 9. The new EFI record (0x25441ca, 0x30) is freed by xfs_efi_item_recover(). \ > That’s the first free. see line 37. > 10. The AIL intents iteration is done. It begins to process the capture_list. > 11. it comes to the XFS_DEFER_OPS_TYPE_AGFL_FREE deferred operation which is > added in step 3. xfs_extent_free_finish_item() is called to free (2nd free) (0x25441ca, 0x30) > and it failed because (0x25441ca, 0x30) was already freed at step 9. see line 43. > So, it’s a double free issue. > > ftrace log output: > 1 mount-4557 [003] ...1. 160.746048: xlog_recover_process_intents: last_lsn: 1b0000515f > 2 mount-4557 [003] ...1. 160.746049: xlog_recover_process_intents: processing intent 00000000e190b7a0 lsn=1a000067b4 > 3 mount-4557 [003] ..... 160.746059: xfs_efi_item_recover: recover EFI 00000000e190b7a0 (0x2543cc4, 0x30) > 4 mount-4557 [003] ..... 160.746089: xfs_efi_item_recover: recover EFI 00000000e190b7a0 (0x25441ca, 0x30) > 5 mount-4557 [003] ..... 160.746095: xfs_efi_item_recover: -EAGAIN for EFI record (0x25441ca, 0x30) > 6 mount-4557 [003] ..... 160.746096: __xfs_free_extent_later: adding deferred op XFS_DEFER_OPS_TYPE_AGFL_FREE (0x25441ca, 0x30) to trans 00000000e503eaf8 > 7 mount-4557 [003] ..... 160.746097: xfs_efi_item_recover: recover EFI 00000000e190b7a0 (0x25441b0, 0x1) > 8 mount-4557 [003] ..... 160.746098: __xfs_free_extent_later: adding deferred op XFS_DEFER_OPS_TYPE_AGFL_FREE (0x25441b0, 0x1) to trans 00000000e503eaf8 > 9 mount-4557 [003] ..... 160.746099: xfs_extent_free_create_intent: new EFI 000000002027a359 to trans 00000000e503eaf8 > 10 mount-4557 [003] ..... 160.746100: xfs_defer_ops_capture_and_commit: committing trans: 00000000e503eaf8 > 11 mount-4557 [003] ...1. 160.746109: xlog_recover_process_intents: processing intent 00000000def04d9f lsn=1a000069c4 > 12 mount-4557 [003] ..... 160.746110: xfs_efi_item_recover: recover EFI 00000000def04d9f (0x22ad0b9, 0x1) > 13 kworker/42:1H-502 [042] ...1. 161.025882: xfs_trans_ail_update_bulk: adding lip 00000000cc1ccecb to AIL > 14 kworker/42:1H-502 [042] ...1. 161.025883: xfs_trans_ail_update_bulk: adding lip 00000000b8a3774a to AIL > 15 kworker/42:1H-502 [042] ...1. 161.025884: xfs_trans_ail_update_bulk: adding lip 000000002190f497 to AIL > 16 kworker/42:1H-502 [042] ...1. 161.025884: xfs_trans_ail_update_bulk: adding lip 0000000043992ff9 to AIL > 17 kworker/42:1H-502 [042] ...1. 161.025885: xfs_trans_ail_update_bulk: adding lip 000000003554d2a3 to AIL > 18 kworker/42:1H-502 [042] ...1. 161.025885: xfs_trans_ail_update_bulk: adding lip 00000000823e9198 to AIL > 19 kworker/42:1H-502 [042] ...1. 161.025885: xfs_trans_ail_update_bulk: adding lip 00000000ee3155a9 to AIL > 20 kworker/42:1H-502 [042] ...1. 161.025886: xfs_trans_ail_update_bulk: adding lip 000000002027a359 to AIL > 21 mount-4557 [003] ...1. 161.025937: xlog_recover_process_intents: processing intent 000000005baced87 lsn=1a00006e20 > 22 mount-4557 [003] ..... 161.025945: xfs_efi_item_recover: recover EFI 000000005baced87 (0x254425f, 0x2c) > 23 mount-4557 [003] ..... 161.025952: xfs_efi_item_recover: recover EFI 000000005baced87 (0x2544350, 0x14) > 24 mount-4557 [003] ...1. 161.025961: xlog_recover_process_intents: processing intent 0000000032c6e417 lsn=1a000072aa > 25 mount-4557 [003] ..... 161.025962: xfs_efi_item_recover: recover EFI 0000000032c6e417 (0x20e30a4, 0x1) > 26 mount-4557 [003] ...1. 161.025972: xlog_recover_process_intents: processing intent 000000001ce29d3a lsn=1a00007c5c > 27 mount-4557 [003] ..... 161.025973: xfs_efi_item_recover: recover EFI 000000001ce29d3a (0x20d2ec8, 0x1) > 28 mount-4557 [003] ...1. 161.025980: xlog_recover_process_intents: processing intent 0000000021ef376d lsn=1a0000abfb > 29 mount-4557 [003] ..... 161.025981: xfs_efi_item_recover: recover EFI 0000000021ef376d (0x23552bd, 0x1) > 30 mount-4557 [003] ...1. 161.025988: xlog_recover_process_intents: processing intent 00000000e1d79fc3 lsn=1a00011100 > 31 mount-4557 [003] ..... 161.025989: xfs_efi_item_recover: recover EFI 00000000e1d79fc3 (0x258bf61, 0x1) > 32 mount-4557 [003] ...1. 161.025996: xlog_recover_process_intents: processing intent 00000000a1474aa7 lsn=1a00017075 > 33 mount-4557 [003] ..... 161.025998: xfs_efi_item_recover: recover EFI 00000000a1474aa7 (0x2028b96, 0x1) > 34 mount-4557 [003] ...1. 161.026005: xlog_recover_process_intents: processing intent 000000005ae385b9 lsn=1a0001810e > 35 mount-4557 [003] ..... 161.026006: xfs_efi_item_recover: recover EFI 000000005ae385b9 (0x231579e, 0x1) > 36 mount-4557 [003] ...1. 161.026013: xlog_recover_process_intents: processing intent 000000002027a359 lsn=1b0000515f > 37 mount-4557 [003] ..... 161.026014: xfs_efi_item_recover: recover EFI 000000002027a359 (0x25441ca, 0x30) > 38 mount-4557 [003] ..... 161.026019: xfs_efi_item_recover: recover EFI 000000002027a359 (0x25441b0, 0x1) > 39 kworker/42:1H-502 [042] ...1. 161.027123: xfs_trans_ail_update_bulk: adding lip 00000000cc1ccecb to AIL > 40 kworker/42:1H-502 [042] ...1. 161.027123: xfs_trans_ail_update_bulk: adding lip 000000006c56fcb9 to AIL > 41 kworker/42:1H-502 [042] ...1. 161.027124: xfs_trans_ail_update_bulk: adding lip 00000000823e9198 to AIL > 42 kworker/42:1H-502 [042] ...1. 161.027124: xfs_trans_ail_update_bulk: adding lip 00000000b8a3774a to AIL > 43 mount-4557 [003] .N... 161.076277: xfs_extent_free_finish_item: failed, efi: 000000002027a359, (0x25441ca, 0x30) > > > The debug patch: > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > 2 index 7c675aae0a0f..0986af74012f 100644 > 3 --- a/fs/xfs/libxfs/xfs_alloc.c > 4 +++ b/fs/xfs/libxfs/xfs_alloc.c > 5 @@ -2495,6 +2495,10 @@ xfs_defer_agfl_block( > 6 return 0; > 7 } > 8 > 9 +bool should_debug(struct xfs_mount *mp) > 10 +{ > 11 + return strcmp(mp->m_super->s_id, "loop0") == 0; > 12 +} > 13 /* > 14 * Add the extent to the list of extents to be free at transaction end. > 15 * The list is maintained sorted (by block number). > 16 @@ -2551,6 +2555,9 @@ __xfs_free_extent_later( > 17 XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len); > 18 > 19 xfs_extent_free_get_group(mp, xefi); > 20 + if (should_debug(mp)) > 21 + trace_printk("adding deferred op XFS_DEFER_OPS_TYPE_AGFL_FREE (0x%llx, 0x%x) to trans %p\n", > 22 + xefi->xefi_startblock, xefi->xefi_blockcount, tp); > 23 xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list); > 24 return 0; > 25 } > 26 diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c > 27 index bcfb6a4203cd..e4a4f39bdd7f 100644 > 28 --- a/fs/xfs/libxfs/xfs_defer.c > 29 +++ b/fs/xfs/libxfs/xfs_defer.c > 30 @@ -342,6 +342,7 @@ xfs_defer_restore_resources( > 31 } > 32 } > 33 > 34 +extern bool should_debug(struct xfs_mount *mp); > 35 /* Roll a transaction so we can do some deferred op processing. */ > 36 STATIC int > 37 xfs_defer_trans_roll( > 38 @@ -799,6 +800,8 @@ xfs_defer_ops_capture_and_commit( > 39 if (!dfc) > 40 return xfs_trans_commit(tp); > 41 > 42 + if (should_debug(mp)) > 43 + trace_printk("committing trans: %p\n", tp); > 44 /* Commit the transaction and add the capture structure to the list. */ > 45 error = xfs_trans_commit(tp); > 46 if (error) { > 47 diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c > 48 index 3b33d27efdce..2c19010d8e5b 100644 > 49 --- a/fs/xfs/xfs_extfree_item.c > 50 +++ b/fs/xfs/xfs_extfree_item.c > 51 @@ -463,6 +463,7 @@ xfs_extent_free_log_item( > 52 extp->ext_len = xefi->xefi_blockcount; > 53 } > 54 > 55 +extern bool should_debug(struct xfs_mount *mp); > 56 static struct xfs_log_item * > 57 xfs_extent_free_create_intent( > 58 struct xfs_trans *tp, > 59 @@ -476,6 +477,8 @@ xfs_extent_free_create_intent( > 60 > 61 ASSERT(count > 0); > 62 > 63 + if (should_debug(mp)) > 64 + trace_printk("new EFI %p to trans %p\n", efip, tp); > 65 xfs_trans_add_item(tp, &efip->efi_item); > 66 if (sort) > 67 list_sort(mp, items, xfs_extent_free_diff_items); > 68 @@ -524,6 +527,7 @@ xfs_extent_free_finish_item( > 69 { > 70 struct xfs_extent_free_item *xefi; > 71 int error; > 72 + struct xfs_efi_log_item *efip = EFD_ITEM(done)->efd_efip; > 73 > 74 xefi = container_of(item, struct xfs_extent_free_item, xefi_list); > 75 > 76 @@ -536,6 +540,9 @@ xfs_extent_free_finish_item( > 77 if (error == -EAGAIN) > 78 return error; > 79 > 80 + if (error && should_debug(tp->t_mountp)) > 81 + trace_printk("failed, efi: %p, (0x%llx, 0x%x)\n", efip, xefi->xefi_startblock, xefi->xefi_blockcount); > 82 + > 83 xfs_extent_free_put_group(xefi); > 84 kmem_cache_free(xfs_extfree_item_cache, xefi); > 85 return error; > 86 @@ -694,6 +701,9 @@ xfs_efi_item_recover( > 87 fake.xefi_startblock = extp->ext_start; > 88 fake.xefi_blockcount = extp->ext_len; > 89 > 90 + if (should_debug(mp)) > 91 + trace_printk("recover EFI %p (0x%llx, 0x%x)\n", efip, fake.xefi_startblock, fake.xefi_blockcount); > 92 + > 93 if (!requeue_only) { > 94 xfs_extent_free_get_group(mp, &fake); > 95 error = xfs_trans_free_extent(tp, efdp, &fake); > 96 @@ -706,6 +716,8 @@ xfs_efi_item_recover( > 97 * run again later with a new transaction context. > 98 */ > 99 if (error == -EAGAIN || requeue_only) { > 100 + if (error == -EAGAIN && should_debug(mp)) > 101 + trace_printk("-EAGAIN for EFI record (0x%llx, 0x%x)\n", fake.xefi_startblock, fake.xefi_blockcount); > 102 xfs_free_extent_later(tp, fake.xefi_startblock, > 103 fake.xefi_blockcount, &XFS_RMAP_OINFO_ANY_OWNER); > 104 requeue_only = true; > 105 diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > 106 index 82c81d20459d..a40ed9a2b70a 100644 > 107 --- a/fs/xfs/xfs_log_recover.c > 108 +++ b/fs/xfs/xfs_log_recover.c > 109 @@ -2515,6 +2515,7 @@ xlog_abort_defer_ops( > 110 } > 111 } > 112 > 113 +extern bool should_debug(struct xfs_mount *mp); > 114 /* > 115 * When this is called, all of the log intent items which did not have > 116 * corresponding log done items should be in the AIL. What we do now is update > 117 @@ -2540,15 +2541,16 @@ xlog_recover_process_intents( > 118 struct xfs_log_item *lip; > 119 struct xfs_ail *ailp; > 120 int error = 0; > 121 -#if defined(DEBUG) || defined(XFS_WARN) > 122 xfs_lsn_t last_lsn; > 123 -#endif > 124 + int debug = 0; > 125 > 126 ailp = log->l_ailp; > 127 spin_lock(&ailp->ail_lock); > 128 -#if defined(DEBUG) || defined(XFS_WARN) > 129 last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block); > 130 -#endif > 131 + if (should_debug(log->l_mp)) { > 132 + debug = 1; > 133 + trace_printk("last_lsn: %llx\n", last_lsn); > 134 + } > 135 for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); > 136 lip != NULL; > 137 lip = xfs_trans_ail_cursor_next(ailp, &cur)) { > 138 @@ -2563,7 +2565,8 @@ xlog_recover_process_intents( > 139 * of recovery. > 140 */ > 141 ASSERT(XFS_LSN_CMP(last_lsn, lip->li_lsn) >= 0); > 142 - > 143 + if (debug) > 144 + trace_printk("processing intent %p lsn=%llx\n", lip, lip->li_lsn); > 145 /* > 146 * NOTE: If your intent processing routine can create more > 147 * deferred ops, you /must/ attach them to the capture list in > 148 diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > 149 index 8c0bfc9a33b1..42b50b4e1c47 100644 > 150 --- a/fs/xfs/xfs_trans.c > 151 +++ b/fs/xfs/xfs_trans.c > 152 @@ -1125,6 +1125,7 @@ xfs_trans_cancel( > 153 xfs_trans_free(tp); > 154 } > 155 > 156 +extern bool should_debug(struct xfs_mount *mp); > 157 /* > 158 * Roll from one trans in the sequence of PERMANENT transactions to > 159 * the next: permanent transactions are only flushed out when > 160 diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > 161 index 7d4109af193e..013889b18846 100644 > 162 --- a/fs/xfs/xfs_trans_ail.c > 163 +++ b/fs/xfs/xfs_trans_ail.c > 164 @@ -769,6 +769,7 @@ xfs_ail_update_finish( > 165 xfs_log_space_wake(log->l_mp); > 166 } > 167 > 168 +extern bool should_debug(struct xfs_mount *mp); > 169 /* > 170 * xfs_trans_ail_update - bulk AIL insertion operation. > 171 * > 172 @@ -801,9 +802,10 @@ xfs_trans_ail_update_bulk( > 173 { > 174 struct xfs_log_item *mlip; > 175 xfs_lsn_t tail_lsn = 0; > 176 - int i; > 177 + int i, debug; > 178 LIST_HEAD(tmp); > 179 > 180 + debug = should_debug(ailp->ail_log->l_mp); > 181 ASSERT(nr_items > 0); /* Not required, but true. */ > 182 mlip = xfs_ail_min(ailp); > 183 > 184 @@ -823,6 +825,8 @@ xfs_trans_ail_update_bulk( > 185 trace_xfs_ail_insert(lip, 0, lsn); > 186 } > 187 lip->li_lsn = lsn; > 188 + if (debug) > 189 + trace_printk("adding lip %p to AIL\n", lip); > 190 list_add(&lip->li_ail, &tmp); > 191 } > 192 > > > thanks, > wengang