Re: [Scst-devel] Thin Provisioning and Ceph RBD's

Hi Ilya,

On Mon, Aug 1, 2016 at 3:07 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Mon, Aug 1, 2016 at 7:55 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>> RBD illustration showing RBD ignoring discard until a certain
>> threshold - why is that?  This behavior is unfortunately incompatible
>> with ESXi discard (UNMAP) behavior.
>>
>> Is there a way to lower the discard sensitivity on RBD devices?
>>
<snip>
>>
>> root@e1:/var/log# blkdiscard -o 0 -l 4096000 /dev/rbd28
>> root@e1:/var/log# rbd diff spin1/testdis|awk '{ SUM += $2 } END {
>> print SUM/1024 " KB" }'
>> 819200 KB
>>
>> root@e1:/var/log# blkdiscard -o 0 -l 40960000 /dev/rbd28
>> root@e1:/var/log# rbd diff spin1/testdis|awk '{ SUM += $2 } END {
>> print SUM/1024 " KB" }'
>> 782336 KB
>
> Think about it in terms of underlying RADOS objects (4M by default).
> There are three cases:
>
>     discard range       | command
>     -----------------------------------------
>     whole object        | delete
>     object's tail       | truncate
>     object's head       | zero
>
> Obviously, only delete and truncate free up space.  In all of your
> examples, except the last one, you are attempting to discard the head
> of the (first) object.
>
> You can free up as little as a sector, as long as it's the tail:
>
> Offset    Length  Type
> 0         4194304 data
>
> # blkdiscard -o $(((4 << 20) - 512)) -l 512 /dev/rbd28
>
> Offset    Length  Type
> 0         4193792 data

It looks like ESXi sends each discard/unmap with a fixed granularity of
8192 sectors, which SCST passes through verbatim.  There is a slight
reduction in size as measured via the rbd diff method, but I now
understand that an actual truncate only takes effect when the discard
happens to clip the tail of an object.
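For my own sanity, here is a small sketch of Ilya's three cases.  This is
a hypothetical illustration, not Ceph code; it assumes the default 4 MiB
object size and the delete/truncate/zero mapping from the table above:

```python
OBJ = 4 << 20  # default RADOS object size: 4 MiB

def classify_discard(offset, length, obj_size=OBJ):
    """Map a byte-range discard onto per-object RADOS ops."""
    ops = []
    end = offset + length
    first_obj = offset // obj_size
    last_obj = (end - 1) // obj_size
    for i in range(first_obj, last_obj + 1):
        o_start, o_end = i * obj_size, (i + 1) * obj_size
        s, e = max(offset, o_start), min(end, o_end)
        if s == o_start and e == o_end:
            ops.append((i, "delete"))    # whole object: space freed
        elif e == o_end:
            ops.append((i, "truncate"))  # object's tail: space freed
        else:
            ops.append((i, "zero"))      # head (or middle): nothing freed
    return ops

# The 4096000-byte discard at offset 0 only hits the head of object 0:
print(classify_discard(0, 4096000))            # [(0, 'zero')]
# A 512-byte discard at the very end of object 0 is a tail truncate:
print(classify_discard((4 << 20) - 512, 512))  # [(0, 'truncate')]
```

That reproduces what we saw: the blkdiscard runs in my earlier examples
land on object heads and free nothing, while the 512-byte tail discard
does shrink the object.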

So far, looking at
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2057513

...the only variable we can control is the number of 8192-sector chunks
per UNMAP command, not their size, which means that most of the ESXi
discard commands will be disregarded by Ceph.

Vlad, is the 8192-sector granularity coming from ESXi, as in this debug line:

Aug  1 19:01:36 e1 kernel: [168220.570332] Discarding (start_sector
1342099456, nr_sects 8192)
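Note that 8192 sectors is exactly one 4 MiB object, so whether anything
is freed depends entirely on alignment.  A quick back-of-the-envelope
check on the start sector from that debug line (my own arithmetic, not
output from SCST or Ceph):

```python
SECTOR = 512
OBJ_SECTORS = (4 << 20) // SECTOR   # 8192 sectors per 4 MiB object

start, nr_sects = 1342099456, 8192  # from the kernel debug line above
head = start % OBJ_SECTORS          # sectors into the first object hit
print(head, OBJ_SECTORS - head)     # 4096 4096
# This discard starts 4096 sectors (2 MiB) into an object, so it spans
# the tail of that object (truncate, space freed) and the head of the
# next one (zero, nothing freed): only half the range is reclaimed.
```

That would explain why rbd diff shows only a partial reduction rather
than none at all.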

Thank you,
Alex

>
> Thanks,
>
>                 Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


