On Wed, Mar 16, 2022 at 12:06:27PM +1100, Dave Chinner wrote:
> On Tue, Mar 15, 2022 at 01:49:43PM +0100, Jan Kara wrote:
> > Hello,
> >
> > I was tracking down a regression in dbench workload on XFS we have
> > identified during our performance testing. These are results from one of
> > our test machines (server with 64GB of RAM, 48 CPUs, SATA SSD for the test
> > disk):
> >
> >                        good                       bad
> > Amean     1        64.29 (   0.00%)       73.11 * -13.70%*
> > Amean     2        84.71 (   0.00%)       98.05 * -15.75%*
> > Amean     4       146.97 (   0.00%)      148.29 *  -0.90%*
> > Amean     8       252.94 (   0.00%)      254.91 *  -0.78%*
> > Amean     16      454.79 (   0.00%)      456.70 *  -0.42%*
> > Amean     32      858.84 (   0.00%)      857.74 (   0.13%)
> > Amean     64     1828.72 (   0.00%)     1865.99 *  -2.04%*
> >
> > Note that the numbers are actually times to complete workload, not
> > traditional dbench throughput numbers so lower is better.
....
> > This should still
> > submit it rather early to provide the latency advantage. Otherwise postpone
> > the flush to the moment we know we are going to flush the iclog to save
> > pointless flushes. But we would have to record whether the flush happened
> > or not in the iclog and it would all get a bit hairy...
>
> I think we can just set the NEED_FLUSH flag appropriately.
>
> However, given all this, I'm wondering if the async cache flush was
> really a case of premature optimisation. That is, we don't really
> gain anything by reducing the flush latency of the first iclog write
> when we are writing 100-1000 iclogs before the commit record, and it
> can be harmful to some workloads by issuing more flushes than we
> need to.
>
> So perhaps the right thing to do is just get rid of it and always
> mark the first iclog in a checkpoint as NEED_FLUSH....

So I've run some tests on code that does this, and the storage I've
tested it on shows largely no difference in stream CIL commit and
fsync-heavy workloads when comparing sync vs async cache flushes.
One set of tests was against high-speed NVMe SSDs, the other against
old, slower SATA SSDs.

Jan, can you run the patch below (against 5.17-rc8) and see what
results you get on your modified dbench test?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


xfs: drop async cache flushes from CIL commits.

From: Dave Chinner <dchinner@xxxxxxxxxx>

As discussed here:

https://lore.kernel.org/linux-xfs/20220316010627.GO3927073@xxxxxxxxxxxxxxxxxxx/T/#t

This is a prototype for removing async cache flushes from the CIL
checkpoint path.

Fast NVMe storage.