Re: [vdo-devel] [PATCH v2 00/39] Add the dm-vdo deduplication and compression device mapper target.

Sweet Tea Dorminy <sweettea-kernel@xxxxxxxxxx> · Thu, 27 Jul 2023 11:29:51 -0400

If kernel workqueues have higher overhead per item for the lightweight
work VDO currently does in each step, perhaps the dual of the current
scheme would let more work get done per fixed queuing overhead, and
thus perform better? VIOs could take locks on sections of structures,
and operate on multiple structures before requeueing.

Can you suggest a little more specifically what the "dual" is you're
picturing?

It sounds like your experiment consisted of one kernel workqueue per 
existing thread, with VIOs queueing on each thread in turn precisely as 
they do at present, so that when the VIO work item is running it's 
guaranteed to be the unique actor on a particular set of structures 
(e.g. for a physical thread the physical zone and slabs).

I am thinking of an alternate scheme where e.g. each slab, each block
map zone, and each packer would be protected by a lock instead of being
owned by a thread. There would be a single workqueue, with concurrency
allowed, on which all VIOs would operate.

Each VIO would do an initial queuing on a kernel workqueue, and then
when its work item ran, it would take and hold the appropriate locks
while it operated on each structure. So it would take and release slab
locks until it found a free block; send off to UDS and get requeued
when the result came back or the timer expired; try to compress, taking
and releasing a lock on the packer while adding itself to a bin, and get
requeued if appropriate when the packer released it; write, and requeue
when the write finishes if relevant. Then I think the 'make whatever
modification to structures is relevant' part can be done without any
requeue: take and release the recovery journal lock; ditto on the
relevant slab; again the journal; again the other slab; then the
relevant part of the block map; etc.
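
To make the shape of that concrete, here is a rough userspace model
using pthreads rather than real kernel workqueues. It is only a sketch
under assumptions: every name in it (struct slab, process_vio(),
run_model(), the sizes) is invented for illustration and is not taken
from the dm-vdo source. The UDS, packer, and write steps that would need
a requeue are left out, and ordering requirements are ignored. Worker
threads stand in for a concurrent workqueue, and each VIO runs its
allocation and metadata updates to completion, taking one lock at a
time.

```c
/* Userspace model only; all names and sizes here are hypothetical. */
#include <assert.h>
#include <pthread.h>

#define SLAB_COUNT 4
#define BLOCKS_PER_SLAB 8
#define WORKER_COUNT 4
#define VIOS_PER_WORKER 6

struct slab {
	pthread_mutex_t lock;	/* protects free_blocks and ref_updates */
	int free_blocks;
	int ref_updates;
};

struct journal {
	pthread_mutex_t lock;	/* protects entries */
	int entries;
};

static struct slab slabs[SLAB_COUNT];
static struct journal recovery_journal;

/* Take and release slab locks in turn until one has a free block. */
static int allocate_block(int start)
{
	for (int i = 0; i < SLAB_COUNT; i++) {
		int n = (start + i) % SLAB_COUNT;
		int found = 0;

		pthread_mutex_lock(&slabs[n].lock);
		if (slabs[n].free_blocks > 0) {
			slabs[n].free_blocks--;
			found = 1;
		}
		pthread_mutex_unlock(&slabs[n].lock);
		if (found)
			return n;
	}
	return -1;	/* no slab had space */
}

/* One VIO, run to completion in a single "work item" with no requeue. */
static int process_vio(int vio_id)
{
	int n = allocate_block(vio_id);

	if (n < 0)
		return -1;
	assert(n < SLAB_COUNT);

	/* Take and release the journal lock... */
	pthread_mutex_lock(&recovery_journal.lock);
	recovery_journal.entries++;
	pthread_mutex_unlock(&recovery_journal.lock);

	/* ...then ditto on the relevant slab. */
	pthread_mutex_lock(&slabs[n].lock);
	slabs[n].ref_updates++;
	pthread_mutex_unlock(&slabs[n].lock);
	return n;
}

/* Stand-in for one execution context of a concurrent workqueue. */
static void *worker(void *arg)
{
	int first_vio = *(int *)arg;

	for (int i = 0; i < VIOS_PER_WORKER; i++)
		process_vio(first_vio + i);
	return NULL;
}

/* Run all VIOs concurrently; return the number of journal entries. */
static int run_model(void)
{
	pthread_t threads[WORKER_COUNT];
	int first_vio[WORKER_COUNT];

	pthread_mutex_init(&recovery_journal.lock, NULL);
	recovery_journal.entries = 0;
	for (int i = 0; i < SLAB_COUNT; i++) {
		pthread_mutex_init(&slabs[i].lock, NULL);
		slabs[i].free_blocks = BLOCKS_PER_SLAB;
		slabs[i].ref_updates = 0;
	}
	for (int i = 0; i < WORKER_COUNT; i++) {
		first_vio[i] = i * VIOS_PER_WORKER;
		pthread_create(&threads[i], NULL, worker, &first_vio[i]);
	}
	for (int i = 0; i < WORKER_COUNT; i++)
		pthread_join(threads[i], NULL);
	return recovery_journal.entries;
}
```

Calling run_model() from a main() and building with -pthread is enough
to try it. No lock is ever held while taking another, so the model has
no lock ordering to get wrong; a real version would have to decide
whether that property can survive the ordering requirements.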

Yes, there are intriguing ordering requirements to work through, but
maybe as an initial performance experiment the ordering can be ignored
to get an idea of whether this scheme could provide acceptable
performance.

There are also occasionally non-VIO objects which get queued to invoke
actions on various threads, which I expect might further complicate the
experiment.

I think that's the easy part -- queueing a work item to grab a lock and
Do Something seems to me a pretty common thing in kernel code, unless
there are ordering requirements among the non-VIOs that I'm not calling
to mind.
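
For what it's worth, the pattern I mean looks roughly like the
following, again as a userspace pthreads model rather than kernel code.
All the names (struct work_item, struct shared_structure,
run_action_model(), and so on) are made up for the example, and it
assumes the queued actions have no ordering requirements among
themselves.

```c
/* Userspace model only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ACTION_COUNT 5
#define QUEUE_WORKERS 3

struct work_item {
	void (*action)(void *context);
	void *context;
	struct work_item *next;
};

struct work_queue {
	pthread_mutex_t lock;	/* protects head, tail, and closed */
	pthread_cond_t wake;
	struct work_item *head;
	struct work_item *tail;
	int closed;
};

struct shared_structure {
	pthread_mutex_t lock;	/* protects actions_applied */
	int actions_applied;
};

static void enqueue(struct work_queue *queue, struct work_item *item)
{
	assert(item->action != NULL);
	item->next = NULL;
	pthread_mutex_lock(&queue->lock);
	if (queue->tail)
		queue->tail->next = item;
	else
		queue->head = item;
	queue->tail = item;
	pthread_cond_signal(&queue->wake);
	pthread_mutex_unlock(&queue->lock);
}

/* Workers drain any remaining items and then exit once this is called. */
static void close_queue(struct work_queue *queue)
{
	pthread_mutex_lock(&queue->lock);
	queue->closed = 1;
	pthread_cond_broadcast(&queue->wake);
	pthread_mutex_unlock(&queue->lock);
}

static void *queue_worker(void *arg)
{
	struct work_queue *queue = arg;

	for (;;) {
		struct work_item *item;

		pthread_mutex_lock(&queue->lock);
		while (!queue->head && !queue->closed)
			pthread_cond_wait(&queue->wake, &queue->lock);
		item = queue->head;
		if (item) {
			queue->head = item->next;
			if (!queue->head)
				queue->tail = NULL;
		}
		pthread_mutex_unlock(&queue->lock);
		if (!item)
			return NULL;	/* closed and empty */
		item->action(item->context);
	}
}

/* The non-VIO action: grab the structure's lock and Do Something. */
static void do_something(void *context)
{
	struct shared_structure *structure = context;

	pthread_mutex_lock(&structure->lock);
	structure->actions_applied++;
	pthread_mutex_unlock(&structure->lock);
}

/* Queue ACTION_COUNT actions and return how many were applied. */
static int run_action_model(void)
{
	struct work_queue queue = { .head = NULL, .tail = NULL, .closed = 0 };
	struct shared_structure structure = { .actions_applied = 0 };
	struct work_item items[ACTION_COUNT];
	pthread_t workers[QUEUE_WORKERS];

	pthread_mutex_init(&queue.lock, NULL);
	pthread_cond_init(&queue.wake, NULL);
	pthread_mutex_init(&structure.lock, NULL);
	for (int i = 0; i < QUEUE_WORKERS; i++)
		pthread_create(&workers[i], NULL, queue_worker, &queue);
	for (int i = 0; i < ACTION_COUNT; i++) {
		items[i].action = do_something;
		items[i].context = &structure;
		enqueue(&queue, &items[i]);
	}
	close_queue(&queue);
	for (int i = 0; i < QUEUE_WORKERS; i++)
		pthread_join(workers[i], NULL);
	return structure.actions_applied;
}
```

Because the action takes the structure's own lock, it does not matter
which worker picks it up or what else is running concurrently. That
stops being true as soon as two actions must run in a particular order.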