Re: max_sectors_kb limitations with VDO and dm-thin

I have added vdo-devel to the conversation: https://www.redhat.com/archives/vdo-devel/2019-April/msg00017.html

Here is some more info to describe the specific issue:

A dm-thin volume is configured with a chunk/block size, between 64KiB and 1GiB, that determines the minimum allocation unit it can track. If an application writes to a dm-thin block device and the IO completely overlaps a thin block, dm-thin will skip zeroing the newly allocated block before performing the write. This is a significant performance optimization, as it effectively halves the IO for large sequential writes.

When a thin device has a snapshot, data is referenced by both the origin and the snapshot. If a write is issued to a shared block, dm-thin will normally allocate a new block, copy the old data into it, then perform the write. If the new write completely overlaps the block, it will skip the copy.
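For reference, here is roughly how the pool in the example below could be set up with LVM (the volume group and LV names are made up, and exact option/field names may vary slightly between LVM versions):

  # Create a thin pool with a 512k chunk size (the allocation unit described above).
  lvcreate --type thin-pool -L 100G --chunksize 512k --thinpool tpool vg

  # Create a thin volume in that pool.
  lvcreate -V 1T --thinpool vg/tpool -n thinvol

  # Show the chunk size and whether zeroing of newly provisioned chunks is enabled.
  lvs -o lv_name,chunk_size,zero vg/tpool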

So, for example: a dm-thin block device is created in a thin pool with a 512k block size, and an application performs a 4k sequential write at the beginning of the (not yet allocated) volume. dm-thin will do the following:

1) allocate a 512k block
2) write zeros to the entire block
3) perform the 4k write

That is 516k of writes to service a 4k write (ouch). If the write had been at least 512k, covering the whole block, dm-thin would have skipped the zeroing and just performed the write.
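As an aside, the zeroing behaviour can be inspected and toggled per pool (disabling it avoids the zero pass, at the cost of potentially exposing stale data); a rough sketch, again assuming an LVM-managed pool named vg/tpool:

  # Check whether newly provisioned chunks are zeroed.
  lvs -o lv_name,zero vg/tpool

  # Turn zeroing off for the pool; new chunks are no longer zeroed before first use.
  lvchange --zero n vg/tpool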

Similarly, assume there is a dm-thin block device with a snapshot, with data shared between the two, and again the application performs a 4k write to a shared block:

1) allocate a new 512k block
2) copy 512k from the old block to the new one
3) perform the 4k write

That is 512k of reads and 516k of writes (big ouch). If the write had been at least 512k, dm-thin would have skipped all of that overhead.
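This amplification is easy to observe (device and LV names below are placeholders): create a thin snapshot, issue a small direct write to the origin, and watch the pool's data device:

  # The snapshot initially shares all blocks with the origin.
  lvcreate -s vg/thinvol -n thinsnap

  # A single 4k write to a shared block triggers the 512k copy...
  dd if=/dev/zero of=/dev/vg/thinvol bs=4k count=1 oflag=direct

  # ...which shows up as read/write amplification on the pool data LV
  # (typically exposed as vg-tpool_tdata in /dev/mapper).
  iostat -xk 1 /dev/mapper/vg-tpool_tdata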

Now fast forward to VDO. Normally the maximum IO size is determined by the max_sectors_kb setting in /sys/block/DEVICE/queue. This value is inherited by stacked DM devices and can be raised by the user up to the hardware limit max_hw_sectors_kb, which also appears to be inherited by stacked DM devices. VDO sets these limits to 4k, which in turn forces every layer stacked above it to a 4k maximum as well. If you take my previous example but place VDO beneath the dm-thin volume, all IO, sequential or otherwise, will be split down to 4k, which completely eliminates the performance optimizations that dm-thin provides.
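For reference, this is how the limits propagate up the stack (device names below are placeholders, not from a real system):

  # Show how the thin device stacks on top of VDO and the physical disk.
  lsblk -s /dev/mapper/vg-thinvol

  # Inspect the limits at each layer (values are in KiB).
  cat /sys/block/dm-5/queue/max_sectors_kb       # thin device: 4
  cat /sys/block/dm-5/queue/max_hw_sectors_kb    # inherited from VDO: 4
  cat /sys/block/sda/queue/max_hw_sectors_kb     # physical disk: much larger

  # Trying to raise the soft limit above the inherited hardware limit is rejected.
  echo 512 > /sys/block/dm-5/queue/max_sectors_kb   # fails: Invalid argument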

1) Is this known behavior? 
2) Is there a possible workaround?


On Tue, Apr 23, 2019 at 6:11 AM Zdenek Kabelac <zkabelac@xxxxxxxxxx> wrote:
On 19. 04. 19 at 16:40, Ryan Norwood wrote:
> We have been using dm-thin layered above VDO and have noticed that our
> performance is not optimal for large sequential writes as max_sectors_kb
> and max_hw_sectors_kb for all thin devices are set to 4k due to the VDO layer
> beneath.
>
> This effectively eliminates the performance optimizations for sequential
> writes to skip both zeroing and COW overhead when a write fully overlaps a
> thin chunk, as all bios are split into 4k, which will always be less than the
> 64k
> thin chunk minimum.
>
> Is this known behavior? Is there any way around this issue?

Hi

If you require the highest performance - I'd suggest avoiding VDO.
VDO trades performance for better space utilization.
It works on 4KiB blocks - so by design it's going to be slow.

I'd also probably not mix 2 provisioning technologies together - there
is a nontrivial number of problematic states when the whole device stack
runs out of real physical space.

Regards

Zdenek
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
