On Tue, Aug 2, 2016 at 3:49 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Aug 1, 2016 at 11:03 PM, Vladislav Bolkhovitin <vst@xxxxxxxx> wrote:
>> Alex Gorbachev wrote on 08/01/2016 04:05 PM:
>>> Hi Ilya,
>>>
>>> On Mon, Aug 1, 2016 at 3:07 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>> On Mon, Aug 1, 2016 at 7:55 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>> An RBD illustration showing RBD ignoring discards below a certain
>>>>> threshold - why is that?  This behavior is unfortunately
>>>>> incompatible with ESXi discard (UNMAP) behavior.
>>>>>
>>>>> Is there a way to lower the discard sensitivity on RBD devices?
>>>>>
>>> <snip>
>>>>>
>>>>> root@e1:/var/log# blkdiscard -o 0 -l 4096000 /dev/rbd28
>>>>> root@e1:/var/log# rbd diff spin1/testdis | awk '{ SUM += $2 } END {
>>>>> print SUM/1024 " KB" }'
>>>>> 819200 KB
>>>>>
>>>>> root@e1:/var/log# blkdiscard -o 0 -l 40960000 /dev/rbd28
>>>>> root@e1:/var/log# rbd diff spin1/testdis | awk '{ SUM += $2 } END {
>>>>> print SUM/1024 " KB" }'
>>>>> 782336 KB
>>>>
>>>> Think about it in terms of the underlying RADOS objects (4M by
>>>> default).  There are three cases:
>>>>
>>>> discard range  | command
>>>> -----------------------------------------
>>>> whole object   | delete
>>>> object's tail  | truncate
>>>> object's head  | zero
>>>>
>>>> Obviously, only delete and truncate free up space.  In all of your
>>>> examples except the last one, you are attempting to discard the head
>>>> of the (first) object.
>>>>
>>>> You can free up as little as a sector, as long as it's the tail:
>>>>
>>>> Offset  Length   Type
>>>> 0       4194304  data
>>>>
>>>> # blkdiscard -o $(((4 << 20) - 512)) -l 512 /dev/rbd28
>>>>
>>>> Offset  Length   Type
>>>> 0       4193792  data
>>>
>>> It looks like ESXi is sending each discard/unmap with a fixed
>>> granularity of 8192 sectors, which is passed verbatim by SCST.  There
>>> is a slight reduction in size via the rbd diff method, but now I
>>> understand that an actual truncate only takes effect when the discard
>>> happens to clip the tail of an object.
>>>
>>> So far, looking at
>>> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2057513
>>>
>>> ...the only variable we can control is the count of 8192-sector
>>> chunks, not their size - which means that most ESXi discard commands
>>> will be disregarded by Ceph.
>>>
>>> Vlad, is the 8192-sector size coming from ESXi, as in the debug:
>>>
>>> Aug 1 19:01:36 e1 kernel: [168220.570332] Discarding (start_sector
>>> 1342099456, nr_sects 8192)
>>
>> Yes, correct.  However, to make sure that VMware is not (erroneously)
>> being forced to do this, you need to perform one more check.
>>
>> 1. Run cat /sys/block/rbd28/queue/discard*.  Ceph should report the
>> correct granularity and alignment here (4M, I guess?)
>
> This seems to reflect the granularity (4194304), which matches the
> 8192 sectors (8192 x 512 = 4194304).  However, there is no alignment
> value.
>
> Can discard_alignment be specified with RBD?

It's exported as a read-only sysfs attribute, just like
discard_granularity:

# cat /sys/block/rbd0/discard_alignment
4194304

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
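
To make the delete/truncate/zero distinction above concrete, here is a
minimal shell sketch - not the actual krbd implementation - that
classifies what each backing RADOS object would receive for a given
discard range, assuming the default 4 MiB object size and no custom
striping:

#!/bin/bash
# Sketch only, not krbd code: classify the RADOS operation each 4 MiB
# object would receive for a discard, per the table in the thread.
# Usage: ./discard-map.sh <offset-bytes> <length-bytes>

obj=$((4 << 20))        # assumed default object size, 4194304 bytes
off=$1
len=$2
end=$((off + len))

for ((o = off / obj; o <= (end - 1) / obj; o++)); do
    start=$((o * obj))
    stop=$((start + obj))
    if ((off <= start && end >= stop)); then
        echo "object $o: delete   (frees space)"    # whole object
    elif ((off > start && end >= stop)); then
        echo "object $o: truncate (frees space)"    # object's tail
    else
        echo "object $o: zero     (no space freed)" # head or middle
    fi
done

Running ./discard-map.sh 0 40960000 reports objects 0-8 as delete and
object 9 as zero, which matches the 36864 KB (819200 - 782336) drop in
the rbd diff output quoted above.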
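
As a companion example, one can read the kernel's advertised
granularity from sysfs and issue an object-aligned discard, so that RBD
deletes a whole object instead of zeroing part of one.  This is a
hedged sketch reusing the device and image names from the thread;
adjust them for your setup:

# Discard exactly one whole object (the second one), then re-check
# actual usage; assumes discard_granularity equals the object size.
gran=$(cat /sys/block/rbd28/queue/discard_granularity)
blkdiscard -o "$gran" -l "$gran" /dev/rbd28
rbd diff spin1/testdis | awk '{ SUM += $2 } END { print SUM/1024 " KB" }'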