On Tue, Jan 11, 2022 at 09:58:29AM -0800, Darrick J. Wong wrote:
> On Tue, Jan 11, 2022 at 04:04:37PM +1100, Dave Chinner wrote:
> > On Thu, Jan 06, 2022 at 01:40:33PM -0800, Darrick J. Wong wrote:
> > > On Fri, Dec 10, 2021 at 11:09:56AM +1100, Dave Chinner wrote:
> > > > Hi Darrick,
> > > >
> > > > Can you please pull the following changes from the tag listed below
> > > > for the XFS dev tree?
> > >
> > > Hi Dave,
> > >
> > > I tried, but the regressions with generic/017 persist.  It trips the
> > > ticket reservation pretty consistently within 45-60 seconds of
> > > starting, at least on the OCI VM that I created.  /dev/sd[ab] are
> > > (software defined) disks that can sustain reads of ~50MB/s and
> > > ~5000iops, and writes of about half those numbers.
> > >
> > > run fstests generic/017 at 2022-01-06 13:18:59
> > > XFS (sda4): Mounting V5 Filesystem
> > > XFS (sda4): Ending clean mount
> > > XFS (sda4): Quotacheck needed: Please wait.
> > > XFS (sda4): Quotacheck: Done.
> > > XFS (sda4): ctx ticket reservation ran out. Need to up reservation
> > > XFS (sda4): ticket reservation summary:
> > > XFS (sda4):   unit res    = 548636 bytes
> > > XFS (sda4):   current res = -76116 bytes
> > > XFS (sda4):   original count  = 1
> > > XFS (sda4):   remaining count = 1
> > > XFS (sda4): Log I/O Error (0x2) detected at xlog_write+0x5ee/0x660 [xfs] (fs/xfs/xfs_log.c:2499).  Shutting down filesystem.
> > > XFS (sda4): Please unmount the filesystem and rectify the problem(s)
> > > XFS (sda3): Unmounting Filesystem
> > > XFS (sda4): Unmounting Filesystem
> >
> > Ok, I *think* I've worked out what was going on here.  The patch
> > below has run several hundred iterations of g/017 with an external
> > log on two different fs/log size configurations that typically
> > reproduced the problem within 10 cycles.
> >
> > Essentially, the problem is largely caused by using
> > XLOG_CIL_BLOCKING_SPACE_LIMIT() instead of XLOG_CIL_SPACE_LIMIT()
> > when determining how much used space we can allow the percpu
> > counters to accumulate before aggregating them back into the global
> > counter.  Using the hard limit meant that we could accumulate almost
> > the entire hard limit before we aggregated even a single percpu value
> > back into the global counter, so we failed to trigger either
> > condition for aggregation until we'd effectively blown through the
> > hard limit.
> >
> > This then meant that the extra reservations that need to be taken
> > for space used beyond the hard limit didn't get stolen for the ctx
> > ticket, and the ticket then overran.
> >
> > It also means that we could overrun the hard limit substantially
> > before throttling kicked in.  With the percpu aggregation threshold
> > brought back down to (soft limit / num online cpus), we are
> > guaranteed to always start aggregation back into the global counter
> > at or before the point in time the soft limit should be hit, meaning
> > that we start updating the global counter much sooner and it tracks
> > the actual space used much more closely once we are over the soft
> > limit.
> >
> > Darrick, can you rerun the branch with the patch below also
> > included, and see if it reproduces on your setup?  If it does, can
> > you grab a trace of the trace_printk() calls I left in the patch?
> 
> Ok, I'll do that and report back.
...everything passes now, except for generic/650 on the same machine that
has a 128M external log:

[21310.267037] run fstests generic/650 at 2022-01-11 22:41:45
[21311.121539] XFS (sda3): Mounting V5 Filesystem
[21312.295609] XFS (sda3): Ending clean mount
[21314.160622] smpboot: CPU 2 is now offline
[21314.737842] smpboot: Booting Node 0 Processor 2 APIC 0x2
[21314.740726] kvm-clock: cpu 2, msr 43f608081, secondary cpu clock
[21314.787354] kvm-guest: stealtime: cpu 2, msr 43fd1b040
[21315.917154] smpboot: CPU 1 is now offline
[21317.993809] x86: Booting SMP configuration:
[21317.996484] smpboot: Booting Node 0 Processor 1 APIC 0x1
[21318.001940] kvm-clock: cpu 1, msr 43f608041, secondary cpu clock
[21318.020207] kvm-guest: stealtime: cpu 1, msr 43fc9b040
[21320.126617] smpboot: CPU 3 is now offline
[21320.127744] XFS (sda3): ctx ticket reservation ran out. Need to up reservation
[21320.153944] XFS (sda3): ticket reservation summary:
[21320.158868] XFS (sda3):   unit res    = 2100 bytes
[21320.163064] XFS (sda3):   current res = -40 bytes
[21320.167323] XFS (sda3):   original count  = 1
[21320.170436] XFS (sda3):   remaining count = 1
[21320.171742] XFS (sda3): Log I/O Error (0x2) detected at xlog_write+0x5f3/0x670 [xfs] (fs/xfs/xfs_log.c:2512).  Shutting down filesystem.
[21320.176445] XFS (sda3): Please unmount the filesystem and rectify the problem(s)
[21320.179719] potentially unexpected fatal signal 6.
[21320.182490] potentially unexpected fatal signal 6.
[21320.182632] potentially unexpected fatal signal 6.
[21320.183842] CPU: 0 PID: 3460987 Comm: fsstress Tainted: G        W         5.16.0-rc5-djwx #rc5 79050ab45c4cbd1b9fbe98125ec0eea3a2cdfa1d

--D

> > Note that this change does not make the algorithm fully correct - we
> > can still have accumulation on other CPUs that isn't folded back
> > into the global value.
> > What I want is feedback on whether it makes
> > the problem largely go away on configs other than my own before
> > spending more time coming up with a better lockless aggregation
> > algorithm...
> 
> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@xxxxxxxxxxxxx
> > 
> > ---
> >  fs/xfs/xfs_log_cil.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 366c0aaad640..47d46d6e15b3 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -96,6 +96,9 @@ xlog_cil_pcp_aggregate(
> >  	ctx->ticket->t_curr_res += cilpcp->space_reserved;
> >  	ctx->ticket->t_unit_res += cilpcp->space_reserved;
> >  	cilpcp->space_reserved = 0;
> > +	trace_printk("cilpcp space used %d, reserved %d unit-res %d cur-res %d",
> > +			cilpcp->space_used, cilpcp->space_reserved,
> > +			ctx->ticket->t_unit_res, ctx->ticket->t_curr_res);
> >  
> >  	if (!list_empty(&cilpcp->busy_extents)) {
> >  		list_splice_init(&cilpcp->busy_extents,
> > @@ -515,11 +518,16 @@ xlog_cil_insert_items(
> >  	 *
> >  	 * This can steal more than we need, but that's OK.
> >  	 */
> > -	space_used = atomic_read(&ctx->space_used);
> > +	space_used = atomic_read(&ctx->space_used) + len;
> >  	if (atomic_read(&cil->xc_iclog_hdrs) > 0 ||
> > -	    space_used + len >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
> > +	    space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
> >  		int	split_res = log->l_iclog_hsize +
> >  				sizeof(struct xlog_op_header);
> > +
> > +		trace_printk("space used %d, len %d iclog hdrs %d, slim %d, hlim %d",
> > +			space_used, len, atomic_read(&cil->xc_iclog_hdrs),
> > +			XLOG_CIL_SPACE_LIMIT(log),
> > +			XLOG_CIL_BLOCKING_SPACE_LIMIT(log));
> >  		if (ctx_res)
> >  			ctx_res += split_res * (tp->t_ticket->t_iclog_hdrs - 1);
> >  		else
> > @@ -540,8 +548,9 @@ xlog_cil_insert_items(
> >  	cilpcp->space_used += len;
> >  	if (space_used >= XLOG_CIL_SPACE_LIMIT(log) ||
> >  	    cilpcp->space_used >
> > -			((XLOG_CIL_BLOCKING_SPACE_LIMIT(log) - space_used) /
> > -					num_online_cpus())) {
> > +			(XLOG_CIL_SPACE_LIMIT(log) / num_online_cpus())) {
> > +		trace_printk("cilpcp space used %d, reserved %d ctxres %d",
> > +			cilpcp->space_used, cilpcp->space_reserved, ctx_res);
> >  		atomic_add(cilpcp->space_used, &ctx->space_used);
> >  		cilpcp->space_used = 0;
> >  	}
> > @@ -1331,6 +1340,7 @@ xlog_cil_push_background(
> >  
> >  	spin_lock(&cil->xc_push_lock);
> >  	if (cil->xc_push_seq < cil->xc_current_sequence) {
> > +		trace_printk("push space used %d", space_used);
> >  		cil->xc_push_seq = cil->xc_current_sequence;
> >  		queue_work(cil->xc_push_wq, &cil->xc_ctx->push_work);
> >  	}