Re: [PATCH] sbitmap: Use single per-bitmap counting to wake up queued tags

Chaitanya Kulkarni <chaitanyak@xxxxxxxxxx> · Tue, 8 Nov 2022 23:28:11 +0000

> For more interesting cases, where there is queueing, we need to take
> into account the cross-communication of the atomic operations.  I've
> been benchmarking by running parallel fio jobs against a single hctx
> nullb in different hardware queue depth scenarios, and verifying both
> IOPS and queueing.
> 
> Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
> jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
> varying only the hardware queue length per test.
> 
> queue size 2                 4                 8                 16                 32                 64
> 6.1-rc2    1681.1K (1.6K)    2633.0K (12.7K)   6940.8K (16.3K)   8172.3K (617.5K)   8391.7K (367.1K)   8606.1K (351.2K)
> patched    1721.8K (15.1K)   3016.7K (3.8K)    7543.0K (89.4K)   8132.5K (303.4K)   8324.2K (230.6K)   8401.8K (284.7K)

> 

So if I understand correctly
QD 2,4,8 shows clear performance benefit from this patch whereas
QD 16, 32, 64 shows drop in performance it that correct ?

If my observation is correct then applications with high QD will
observe drop in the performance ?

Also, please share a table with block size/IOPS/BW/CPU (system/user)
/LAT/SLAT with % increase/decrease and document the raw numbers at the
end of the cover-letter for completeness along with fio job to others
can repeat the experiment...

> The following is a similar experiment, ran against a nullb with a single
> bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40
> parallel fio jobs operating on the same device
> 
> queue size 2 	             4                 8              	16             	    32		       64
> 6.1-rc2	   1081.0K (2.3K)    957.2K (1.5K)     1699.1K (5.7K) 	6178.2K (124.6K)    12227.9K (37.7K)   13286.6K (92.9K)
> patched	   1081.8K (2.8K)    1316.5K (5.4K)    2364.4K (1.8K) 	6151.4K  (20.0K)    11893.6K (17.5K)   12385.6K (18.4K)
> 

same here ...

> It has also survived blktests and a 12h-stress run against nullb. I also
> ran the code against nvme and a scsi SSD, and I didn't observe
> performance regression in those. If there are other tests you think I
> should run, please let me know and I will follow up with results.
> 
-ck