Re: [patch 0/5] device mapper percpu patches

Mike Snitzer <snitzer@xxxxxxxxxx> · Wed, 7 Nov 2018 17:29:05 -0500

On Tue, Nov 06 2018 at  4:34pm -0500,
Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:

> Hi
> 
> These are the device mapper percpu patches.
> 
> Note that I didn't test request-based device mapper because I don't have
> hardware for it (the patches don't convert request-based targets to percpu
> values, but there are a few inevitable changes in dm-rq.c).

Patches 1 - 3 make sense.  But the use of percpu inflight counters isn't
something I can get upstream.  Any more scalable counter still needs to
be wired up to the block stats interfaces (the one you did in patch 5 is
only for the "inflight" sysfs file; there is also the generic diskstats
call out to part_in_flight(), etc).  Wiring up both part_in_flight() and
part_in_flight_rw() to optionally call out to a new callback isn't going
to fly, especially if that callback is looping over percpu counters to
sum them.

I checked with Jens and now that in 4.21 all of the old request-based IO
path is gone (and given that blk-mq bypasses use of ->in_flight[]): the
only consumer of the existing ->in_flight[] is the bio-based IO path.

Given that now only bio-based is consuming it, and your work was focused
on making bio-based DM's "pending" IO accounting more scalable, it is
best to just change block core's ->in_flight[] directly.

But Jens is against switching to using percpu counters because they are
really slow when summing the counts.  And diskstats does that
frequently.  Jens said at least 2 earlier attempts to switch over to
percpu counters were made and rejected.
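
To make the trade-off concrete, here is a minimal userspace sketch (not
kernel code, and not what any of the patches do).  The names
sharded_counter, sc_inc(), sc_dec(), sc_sum() and NR_SHARDS are all
hypothetical, and the caller is assumed to pass in its own shard index.
Updates touch a single cache line, but every read has to walk all slots,
which is the cost diskstats would pay on each query:

```c
#include <stdatomic.h>

/* Hypothetical shard count: think "one slot per CPU". */
#define NR_SHARDS 64

struct sharded_counter {
	/* Each slot is aligned to 64 bytes so that slots do not share a
	 * cache line (64 is an assumed cache line size). */
	struct {
		_Alignas(64) atomic_long v;
	} slot[NR_SHARDS];
};

/* Fast path: one relaxed atomic add on the caller's own slot. */
static void sc_inc(struct sharded_counter *c, unsigned int shard)
{
	atomic_fetch_add_explicit(&c->slot[shard % NR_SHARDS].v, 1,
				  memory_order_relaxed);
}

/* May run on a different shard than the matching sc_inc(), so a single
 * slot can go negative; only the sum over all slots is meaningful. */
static void sc_dec(struct sharded_counter *c, unsigned int shard)
{
	atomic_fetch_sub_explicit(&c->slot[shard % NR_SHARDS].v, 1,
				  memory_order_relaxed);
}

/* Slow path: O(NR_SHARDS) loads, one per cache line. */
static long sc_sum(struct sharded_counter *c)
{
	long sum = 0;

	for (unsigned int i = 0; i < NR_SHARDS; i++)
		sum += atomic_load_explicit(&c->slot[i].v,
					    memory_order_relaxed);
	return sum;
}
```

With fewer slots (for example one per NUMA node rather than one per CPU)
the sum loop gets shorter, at the price of more contention on each slot.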

Jens' suggestion is to implement a new generic rolling per-node
counter.  Would you be open to trying that?

Mike

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel