On Mon, Aug 8, 2016 at 7:56 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Sun, Aug 7, 2016 at 7:57 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>> I'm confused. How can a 4M discard not free anything? It's either
>>> going to hit an entire object or two adjacent objects, truncating the
>>> tail of one and zeroing the head of another. Using rbd diff:
>>>
>>> $ rbd diff test | grep -A 1 25165824
>>> 25165824 4194304 data
>>> 29360128 4194304 data
>>>
>>> # a 4M discard at 1M into a RADOS object
>>> $ blkdiscard -o $((25165824 + (1 << 20))) -l $((4 << 20)) /dev/rbd0
>>>
>>> $ rbd diff test | grep -A 1 25165824
>>> 25165824 1048576 data
>>> 29360128 4194304 data
>>
>> I have tested this on a small RBD device with such offsets and indeed,
>> the discard works as you describe, Ilya.
>>
>> Looking more into why ESXi's discard is not working. I found this
>> message in kern.log on Ubuntu on creation of the SCST LUN, which shows
>> unmap_alignment 0:
>>
>> Aug 6 22:02:33 e1 kernel: [300378.136765] virt_id 33 (p_iSCSILun_sclun945)
>> Aug 6 22:02:33 e1 kernel: [300378.136782] dev_vdisk: Auto enable thin
>> provisioning for device /dev/rbd/spin1/unmap1t
>> Aug 6 22:02:33 e1 kernel: [300378.136784] unmap_gran 8192,
>> unmap_alignment 0, max_unmap_lba 8192, discard_zeroes_data 1
>> Aug 6 22:02:33 e1 kernel: [300378.136786] dev_vdisk: Attached SCSI
>> target virtual disk p_iSCSILun_sclun945
>> (file="/dev/rbd/spin1/unmap1t", fs=409600MB, bs=512,
>> nblocks=838860800, cyln=409600)
>> Aug 6 22:02:33 e1 kernel: [300378.136847] [4682]:
>> scst_alloc_add_tgt_dev:5287:Device p_iSCSILun_sclun945 on SCST lun=32
>> Aug 6 22:02:33 e1 kernel: [300378.136853] [4682]: scst:
>> scst_alloc_set_UA:12711:Queuing new UA ffff8810251f3a90 (6:29:0,
>> d_sense 0) to tgt_dev ffff88102583ad00 (dev p_iSCSILun_sclun945,
>> initiator copy_manager_sess)
>>
>> even though:
>>
>> root@e1:/sys/block/rbd29# cat discard_alignment
>> 4194304
>>
>> So somehow the discard_alignment is not making it into the LUN. Could
>> this be the issue?
>
> No, if you are not seeing *any* effect, the alignment is pretty much
> irrelevant. Can you do the following on a small test image?
>
> - capture "rbd diff" output
> - blktrace -d /dev/rbd0 -o - | blkparse -i - -o rbd0.trace
> - issue a few discards with blkdiscard
> - issue a few unmaps with ESXi, preferably with SCST debugging enabled
> - capture "rbd diff" output again
>
> and attach all of the above? (You might need to install a blktrace
> package.)
>

Latest results from VMware validation tests. Each test creates and
deletes a virtual disk, then calls ESXi unmap for what ESXi maps to
that volume:

Test 1: 10GB reclaim, rbd diff size: 3GB, discards: 4829
Test 2: 100GB reclaim, rbd diff size: 50GB, discards: 197837
Test 3: 175GB reclaim, rbd diff size: 47GB, discards: 197824
Test 4: 250GB reclaim, rbd diff size: 125GB, discards: 197837
Test 5: 250GB reclaim, rbd diff size: 80GB, discards: 197837

At the end, the compounded used size via rbd diff is 608GB out of 775GB
of data, so in the end we release only about 20% via discards.

Thank you,
Alex
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
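The object arithmetic behind Ilya's blkdiscard example (a 4M discard starting 1M into a RADOS object) can be sketched as follows. This is a minimal illustration, not code from the thread: it assumes the default 4 MiB object size with no custom striping, and the function name `discard_ops` and its remove/truncate/zero labels are hypothetical. The labels follow Ilya's description (whole object, tail of one object, head of the next); treating a mid-object range as a zero operation is an assumption.

```python
# Hypothetical sketch: split a byte-range discard across fixed-size RADOS objects.
# Assumes the default 4 MiB rbd object size and no custom striping.

OBJECT_SIZE = 4 << 20  # 4 MiB (assumed default object size)


def discard_ops(offset, length, object_size=OBJECT_SIZE):
    """Return (object_no, offset_in_object, length_in_object, action) per object hit."""
    ops = []
    end = offset + length
    pos = offset
    while pos < end:
        obj_no = pos // object_size
        obj_start = obj_no * object_size
        obj_off = pos - obj_start
        # Clip the discard to the end of this object or the end of the range.
        obj_len = min(end, obj_start + object_size) - pos
        if obj_off == 0 and obj_len == object_size:
            action = "remove"    # the whole object is discarded
        elif obj_off + obj_len == object_size:
            action = "truncate"  # only the tail of the object is discarded
        else:
            action = "zero"      # head or middle of the object (assumed: zeroed, not freed)
        ops.append((obj_no, obj_off, obj_len, action))
        pos += obj_len
    return ops


if __name__ == "__main__":
    # Ilya's example: 4M discard at 25165824 + 1M
    for op in discard_ops(25165824 + (1 << 20), 4 << 20):
        print(op)
    # (6, 1048576, 3145728, 'truncate')
    # (7, 0, 1048576, 'zero')
```

Under these assumptions, object 6 is truncated to 1M while only the first 1M of object 7 is zeroed. This is consistent with the rbd diff output quoted above, where the extent at 25165824 shrinks to 1048576 and the extent at 29360128 still reports 4194304. A discard that neither covers a whole object nor reaches an object's end produces only zero operations in this model.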