On Tue, Aug 2, 2016 at 1:05 AM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
> Hi Ilya,
>
> On Mon, Aug 1, 2016 at 3:07 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> On Mon, Aug 1, 2016 at 7:55 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>> RBD illustration showing RBD ignoring discard until a certain
>>> threshold - why is that?  This behavior is unfortunately incompatible
>>> with ESXi discard (UNMAP) behavior.
>>>
>>> Is there a way to lower the discard sensitivity on RBD devices?
>>>
> <snip>
>>>
>>> root@e1:/var/log# blkdiscard -o 0 -l 4096000 /dev/rbd28
>>> root@e1:/var/log# rbd diff spin1/testdis|awk '{ SUM += $2 } END { print SUM/1024 " KB" }'
>>> 819200 KB
>>>
>>> root@e1:/var/log# blkdiscard -o 0 -l 40960000 /dev/rbd28
>>> root@e1:/var/log# rbd diff spin1/testdis|awk '{ SUM += $2 } END { print SUM/1024 " KB" }'
>>> 782336 KB
>>
>> Think about it in terms of underlying RADOS objects (4M by default).
>> There are three cases:
>>
>>     discard range    | command
>>     -----------------------------------------
>>     whole object     | delete
>>     object's tail    | truncate
>>     object's head    | zero
>>
>> Obviously, only delete and truncate free up space.  In all of your
>> examples, except the last one, you are attempting to discard the head
>> of the (first) object.
>>
>> You can free up as little as a sector, as long as it's the tail:
>>
>> Offset    Length  Type
>> 0         4194304 data
>>
>> # blkdiscard -o $(((4 << 20) - 512)) -l 512 /dev/rbd28
>>
>> Offset    Length  Type
>> 0         4193792 data
>
> Looks like ESXi is sending in each discard/unmap with the fixed
> granularity of 8192 sectors, which is passed verbatim by SCST.  There
> is a slight reduction in size via rbd diff method, but now I
> understand that actual truncate only takes effect when the discard
> happens to clip the tail of an image.

... the tail of the *object*.  And again, with "filestore punch hole =
true", page-sized discards anywhere within the image would free up
space, but "rbd diff" won't reflect that.
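
To make the delete/truncate/zero table concrete, here is a rough Python
sketch that splits a discard range on object boundaries and labels each
piece.  It is only an illustration, not how the kernel client is written:
the function name classify_discard is made up, it assumes the default 4M
object size with no custom striping, and it lumps a range in the middle of
an object in with "zero", which the table above does not cover.

```python
OBJECT_SIZE = 4 << 20  # assumed: default 4M RBD object size, no custom striping


def classify_discard(offset, length, object_size=OBJECT_SIZE):
    """Label each object touched by a discard of [offset, offset + length).

    Hypothetical helper: returns a list of (object_index, op) pairs, where
    op is "delete", "truncate" or "zero" as in the table in this thread.
    """
    if length <= 0:
        return []
    end = offset + length
    ops = []
    obj = offset // object_size
    while obj * object_size < end:
        obj_start = obj * object_size
        obj_end = obj_start + object_size
        lo = max(offset, obj_start)  # part of the discard inside this object
        hi = min(end, obj_end)
        if lo == obj_start and hi == obj_end:
            op = "delete"      # whole object: frees space
        elif hi == obj_end:
            op = "truncate"    # object's tail: frees space
        else:
            op = "zero"        # object's head (or, by assumption, its middle)
        ops.append((obj, op))
        obj += 1
    return ops


if __name__ == "__main__":
    # The three blkdiscard invocations quoted above.
    print(classify_discard(0, 4096000))           # head of object 0 only
    big = classify_discard(0, 40960000)
    print(sum(1 for _, op in big if op == "delete"), "objects deleted")
    print(classify_discard((4 << 20) - 512, 512)) # last sector of object 0
```

Under these assumptions the second blkdiscard deletes 9 whole objects,
i.e. 9 * 4096 KB = 36864 KB, which matches the drop from 819200 KB to
782336 KB in the quoted "rbd diff" output.
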
>
> So far looking at
> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2057513
>
> ...the only variable we can control is the count of 8192-sector chunks
> and not their size.  Which means that most of the ESXi discard
> commands will be disregarded by Ceph.
>
> Vlad, is 8192 sectors coming from ESXi, as in the debug:
>
> Aug 1 19:01:36 e1 kernel: [168220.570332] Discarding (start_sector
> 1342099456, nr_sects 8192)

They won't be disregarded, but it would definitely work better if they
were aligned.  1342099456 isn't 4M-aligned.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
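
For reference, the alignment of the discard in the debug line can be
checked with a few lines of Python.  This is a sketch under two
assumptions not stated in the log itself: that start_sector and nr_sects
are in 512-byte sectors, and that the image uses the default 4M object
size.  The helper name split_by_object is made up for the example.

```python
SECTOR_SIZE = 512                                 # assumed: 512-byte sectors
OBJECT_SIZE = 4 << 20                             # assumed: default 4M objects
SECTORS_PER_OBJECT = OBJECT_SIZE // SECTOR_SIZE   # 8192


def split_by_object(start_sector, nr_sects):
    """Split a sector range into per-object pieces.

    Hypothetical helper: returns a list of
    (object_index, first_sector_within_object, sector_count).
    """
    pieces = []
    sector = start_sector
    remaining = nr_sects
    while remaining > 0:
        obj, within = divmod(sector, SECTORS_PER_OBJECT)
        count = min(remaining, SECTORS_PER_OBJECT - within)
        pieces.append((obj, within, count))
        sector += count
        remaining -= count
    return pieces


if __name__ == "__main__":
    start, nr = 1342099456, 8192                  # values from the debug line
    print(start % SECTORS_PER_OBJECT)             # 4096, so not 4M-aligned
    for piece in split_by_object(start, nr):
        print(piece)
```

With these assumptions the request starts 4096 sectors (2M) into an
object, so it covers the last 2M of one object and the first 2M of the
next, rather than exactly one whole object.
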