Re: [PATCH 3/5] [RFC] xfs: use percpu counters for CIL context counters

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 14 May 2020 11:50:55 +1000

On Thu, May 14, 2020 at 07:52:41AM +1000, Dave Chinner wrote:
> On Wed, May 13, 2020 at 08:09:59AM -0400, Brian Foster wrote:
> > On Wed, May 13, 2020 at 09:36:27AM +1000, Dave Chinner wrote:
> > > On Tue, May 12, 2020 at 10:05:44AM -0400, Brian Foster wrote:
> > > > Particularly as it relates to percpu functionality. Does
> > > > the window scale with cpu count, for example? It might not matter either
> > > 
> > > Not really. We need a thundering herd to cause issues, and this
> > > occurs after formatting an item so we won't get a huge thundering
> > > herd even when lots of threads block on the xc_ctx_lock waiting for
> > > a push to complete.
> > > 
> > 
> > It would be nice to have some debug code somewhere that somehow or
> > another asserts or warns if the CIL reservation exceeds some
> > insane/unexpected heuristic based on the current size of the context. I
> > don't know what that code or heuristic looks like (i.e. multiple factors
> > of the ctx size?) so I'm obviously handwaving. Just something to think
> > about if we can come up with a way to accomplish that opportunistically.
> 
> I don't think there is a reliable mechanism that can be used here.
> At one end of the scale we have the valid case of a synchronous
> inode modification on a log with a 256k stripe unit. So it's valid
> to have a CIL reservation of ~550kB for a single item that consumes
> ~700 bytes of log space.
> 
> OTOH, we might be freeing extents on a massively fragmented file and
> filesystem, so we're pushing 200kB+ transactions into the CIL for
> every rolling transaction. On a filesystem with a 512 byte log
> sector size and no LSU, the CIL reservations are dwarfed by the
> actual metadata being logged...
> 
> I'd suggest that looking at the ungrant trace for the CIL ticket
> once it has committed will tell us exactly how much the reservation
> was over-estimated, as the unused portion of the reservation will be
> returned to the reserve grant head at this point in time.

Typical for this workload is a CIl ticket that looks like this at
ungrant time:

t_curr_res 13408 t_unit_res 231100
t_curr_res 9240 t_unit_res 140724
t_curr_res 46284 t_unit_res 263964
t_curr_res 29780 t_unit_res 190020
t_curr_res 38044 t_unit_res 342016
t_curr_res 21636 t_unit_res 321476
t_curr_res 21576 t_unit_res 263964
t_curr_res 42200 t_unit_res 411852
t_curr_res 21636 t_unit_res 292720
t_curr_res 62740 t_unit_res 514552
t_curr_res 17456 t_unit_res 284504
t_curr_res 29852 t_unit_res 411852
t_curr_res 13384 t_unit_res 206452
t_curr_res 70956 t_unit_res 518660
t_curr_res 70908 t_unit_res 333800
t_curr_res 50404 t_unit_res 518660
t_curr_res 17480 t_unit_res 321476
t_curr_res 33948 t_unit_res 436500
t_curr_res 17492 t_unit_res 317368
t_curr_res 50392 t_unit_res 489904
t_curr_res 13360 t_unit_res 325584
t_curr_res 66812 t_unit_res 506336
t_curr_res 33924 t_unit_res 366664
t_curr_res 70932 t_unit_res 551524
t_curr_res 29852 t_unit_res 374880
t_curr_res 25720 t_unit_res 494012
t_curr_res 42152 t_unit_res 506336
t_curr_res 21684 t_unit_res 543308
t_curr_res 29840 t_unit_res 440608
t_curr_res 46320 t_unit_res 551524
t_curr_res 21624 t_unit_res 387204
t_curr_res 29840 t_unit_res 522768

So we are looking at a reservation of up to 500KB, and typically
using all but a few 10s of KB of it.

I'll use this as the ballpark for the lockless code.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx