Re: [vdo-devel] [PATCH v2 00/39] Add the dm-vdo deduplication and compression device mapper target.

Sweet Tea Dorminy <sweettea-kernel@xxxxxxxxxx> · Sun, 23 Jul 2023 02:24:32 -0400

We use a sort of message-passing arrangement where a worker thread is 
responsible for updating certain data structures as needed for the I/Os 
in progress, rather than having the processing of each I/O contend for 
locks on the data structures.  It gives us some good throughput under
load but it does mean upwards of a dozen handoffs per 4kB write,
depending on compressibility, whether the block is a duplicate, and
various other factors. So processing 1 GB/s means handling over 3M
messages per second, though each step of processing is generally
lightweight.
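
As a back-of-envelope check on that figure, here is a small sketch. It
assumes exactly 12 handoffs per write (the text only says "upwards of a
dozen") and reads 1 GB/s as 2**30 bytes/s; the function and parameter
names are made up for illustration.

```python
def messages_per_second(bytes_per_second, block_size=4096, handoffs_per_block=12):
    """Estimate handoff messages per second for a given write throughput.

    handoffs_per_block=12 is an assumed value; the real count varies with
    compressibility, deduplication, and other factors.
    """
    blocks_per_second = bytes_per_second // block_size
    return blocks_per_second * handoffs_per_block


# 2**30 bytes/s is 262144 4kB blocks/s, each handed off 12 times.
rate = messages_per_second(2**30)
print(rate)
```

With those assumptions the result is a little over 3M messages per
second, which matches the estimate above.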

There seems to be a natural duality between
work items passing between threads, each exclusively owning a structure, 
vs structures passing between threads, each exclusively owning a work 
item. In the first, the threads are grabbing a notional 'lock' on each 
item in turn to deal with their structure, as VDO does now; in the 
second, the threads are grabbing locks on each structure in turn to deal 
with their item.
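
A minimal userspace sketch of the two arrangements, in Python rather
than kernel C for brevity. Nothing here is VDO code: Structure,
run_message_passing and run_structure_locking are hypothetical names,
and appending to a list stands in for the real per-structure update.

```python
import queue
import threading


class Structure:
    """Stand-in for a data structure that work items must update."""

    def __init__(self, name):
        self.name = name
        self.updates = []
        self.lock = threading.Lock()  # used only by the locking variant


def run_message_passing(structures, items):
    """One thread owns each structure; items are handed from queue to queue."""
    queues = [queue.Queue() for _ in structures]

    def worker(i):
        while True:
            item = queues[i].get()
            last = i + 1 == len(queues)
            if item is None:  # shutdown marker: pass it on, then exit
                if not last:
                    queues[i + 1].put(None)
                return
            # No lock needed: only this thread ever touches structures[i].
            structures[i].updates.append(item)
            if not last:
                queues[i + 1].put(item)  # hand the item to the next owner

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(structures))]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(None)
    for t in threads:
        t.join()


def run_structure_locking(structures, items):
    """One thread owns each item; it locks each structure in turn."""

    def process(item):
        for s in structures:
            with s.lock:
                s.updates.append(item)

    threads = [threading.Thread(target=process, args=(item,)) for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In this sketch the first variant applies items to every structure in
the same order, because each queue is FIFO and has a single consumer.
The second variant only guarantees that every structure sees every
item; the order can differ from one structure to the next.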

If kernel workqueues have higher overhead per item for the lightweight 
work VDO currently does in each step, perhaps the dual of the current 
scheme would let more work get done per fixed queuing overhead, and thus 
perform better? VIOs could take locks on sections of structures, and 
operate on multiple structures before requeueing.
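
To make the amortization argument concrete, here is a toy cost model.
All of the numbers and names are assumptions chosen for illustration,
not measurements of VDO or of kernel workqueues, and the model ignores
lock contention entirely.

```python
def per_item_cost(steps, steps_per_requeue, queue_overhead, step_cost):
    """Toy model: total cost of one item given how often it is requeued.

    steps             -- structure updates the item needs in total
    steps_per_requeue -- updates performed between two queue handoffs
    queue_overhead    -- fixed cost of one handoff (arbitrary units)
    step_cost         -- cost of one lightweight update (same units)
    """
    requeues = -(-steps // steps_per_requeue)  # ceiling division
    return requeues * queue_overhead + steps * step_cost


# One handoff per step, as in the current message-passing scheme.
print(per_item_cost(steps=12, steps_per_requeue=1, queue_overhead=5, step_cost=1))
# Four steps per handoff, as the dual scheme might allow.
print(per_item_cost(steps=12, steps_per_requeue=4, queue_overhead=5, step_cost=1))
```

Whether the dual scheme wins in practice would depend on how much lock
contention it adds, which this model does not capture.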

This might also enable more fine-grained locking of structures than the
chunks uniquely owned by threads at the moment. It would also be
attractive to let the kernel work queues deal with concurrency
management instead of configuring the number of threads for each of a 
bunch of different structures at start time.

On the other hand, I played around with switching message passing to
structure locking in VDO a number of years ago for fun on the side, just
extremely naively replacing each message passing with releasing a mutex 
on the current set of structures and (trying to) take a mutex on the 
next set of structures, and ran into some complexity around certain 
ordering requirements. I think they were around recovery journal entries 
going into the slab journal and the block map in the same order; and 
also around the use of different priorities for some different items. I 
don't have that code anymore, unfortunately, so I don't know how hard it 
would be to try that experiment again.

Sweet Tea