Re: [lvm-devel] dm thin: optimize away writing all zeroes to unprovisioned blocks

On 12/09/2014 01:02 AM, Eric Wheeler wrote:
On Fri, 5 Dec 2014, Mike Snitzer wrote:
I do wonder what the performance impact is on this for dm. Have you
tried a (worst case) test of writing blocks that are zero filled,

Jens, thank you for your help w/ fio for generating zeroed writes!
Clearly fio is superior to dd as a sequential benchmarking tool; I was
actually able to push up against the system's memory bandwidth.

Results:

I hacked block/loop.c and md/dm-thin.c to always call bio_is_zero_filled()
and then complete the bio without writing to disk, regardless of the
function's return value.  In loop.c this was done in do_bio_filebacked(),
and in dm-thin.c within provision_block().
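
For reference, here is a minimal sketch of what such a
bio_is_zero_filled() helper might look like (this is my illustration,
not necessarily the helper from the actual patch): walk each segment of
the bio, map its page, and use memchr_inv() to look for a non-zero byte.

  #include <linux/bio.h>
  #include <linux/highmem.h>
  #include <linux/string.h>

  static bool bio_is_zero_filled(struct bio *bio)
  {
          struct bio_vec bvec;
          struct bvec_iter iter;

          bio_for_each_segment(bvec, bio, iter) {
                  /* Map the segment's page and scan it for a non-zero
                   * byte; memchr_inv() returns NULL iff every byte
                   * matches the given value (0 here). */
                  void *data = kmap_atomic(bvec.bv_page);
                  bool nonzero = memchr_inv(data + bvec.bv_offset, 0,
                                            bvec.bv_len) != NULL;

                  kunmap_atomic(data);
                  if (nonzero)
                          return false;
          }

          return true;
  }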

This allows us to compare the performance of the simple loopback block
device driver against the more complex dm-thinp implementation, just
prior to block allocation.  These benchmarks give us a sense of how the
cost of bio_is_zero_filled() relates to block device implementation
complexity, in addition to the raw performance of bio_is_zero_filled()
in best- and worst-case scenarios.

Since we always complete without writing after the call to
bio_is_zero_filled(), regardless of the bio's content (all zeros or not),
we can benchmark both the common case of random data and the edge case of
skipping writes for bios that contain all zeros when writing to
unallocated space of thin-provisioned volumes.

These benchmarks were performed under KVM, so treat them as lower bounds
due to virtualization overhead.  The hardware is an Intel(R) Xeon(R) CPU
E3-1230 V2 @ 3.30GHz.  The VM was allocated 4GB of memory and 4 CPU cores.

Benchmarks were performed using fio-2.1.14-33-gf8b8f with the following
options:
  --name=writebw
  --rw=write
  --time_based
  --runtime=7 --ramp_time=3
  --norandommap
  --ioengine=libaio
  --group_reporting
  --direct=1
  --bs=1m
  --filename=/dev/X
  --numjobs=Y

Random data was tested using:
   --zero_buffers=0 --scramble_buffers=1

Zeroed data was tested using:
   --zero_buffers=1 --scramble_buffers=0

Values below are the aggregate bandwidth (aggrb) reported by fio.

               dm-thinp (MB/s)   loopback (MB/s)   loopback faster by
==============+======================================================
random jobs=4 |    18496.0           33522.0            1.81x
zeros  jobs=4 |     8119.2            9767.2            1.20x
==============+======================================================
random jobs=1 |     7330.5           12330.0            1.68x
zeros  jobs=1 |     4965.2            6799.9            1.37x

This looks more reasonable in terms of throughput.

One major worry here is that checking every write is blowing your cache, so you could have a major impact on performance in general. Even for O_DIRECT writes, you are now accessing the memory. Have you looked into doing non-temporal memory compares instead? I think that would be the way to go.
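
For illustration, a hypothetical userspace sketch of such a non-temporal
zero check using SSE4.1's MOVNTDQA (this is my example, not code from the
thread; the function name and alignment assumptions are mine, and an
in-kernel version would also need kernel_fpu_begin()/kernel_fpu_end() and
a scalar fallback):

  #include <smmintrin.h>  /* SSE4.1 intrinsics */
  #include <stdbool.h>
  #include <stddef.h>

  /* Returns true iff buf is entirely zero.  Assumes buf is 16-byte
   * aligned and len is a multiple of 16. */
  static bool buf_is_zero_filled_nt(const void *buf, size_t len)
  {
          const __m128i *p = buf;
          __m128i acc = _mm_setzero_si128();
          size_t i;

          for (i = 0; i < len / 16; i++)
                  /* MOVNTDQA is a non-temporal load hint, asking the
                   * CPU not to displace useful cache lines with the
                   * data being scanned. */
                  acc = _mm_or_si128(acc,
                          _mm_stream_load_si128((__m128i *)(p + i)));

          /* PTEST: true iff the accumulated OR of all chunks is 0. */
          return _mm_testz_si128(acc, acc);
  }

Note that on ordinary write-back memory MOVNTDQA acts largely as a hint,
so the actual cache behavior would need measuring.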

--
Jens Axboe

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel



