Re: [Scst-devel] Thin Provisioning and Ceph RBD's

Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> · Sat, 13 Aug 2016 12:36:28 -0400

On Mon, Aug 8, 2016 at 7:56 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Sun, Aug 7, 2016 at 7:57 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>> I'm confused.  How can a 4M discard not free anything?  It's either
>>> going to hit an entire object or two adjacent objects, truncating the
>>> tail of one and zeroing the head of another.  Using rbd diff:
>>>
>>> $ rbd diff test | grep -A 1 25165824
>>> 25165824  4194304 data
>>> 29360128  4194304 data
>>>
>>> # a 4M discard at 1M into a RADOS object
>>> $ blkdiscard -o $((25165824 + (1 << 20))) -l $((4 << 20)) /dev/rbd0
>>>
>>> $ rbd diff test | grep -A 1 25165824
>>> 25165824  1048576 data
>>> 29360128  4194304 data
>>
>> I have tested this on a small RBD device with such offsets and indeed,
>> the discard works as you describe, Ilya.
>>
>> Looking more into why ESXi's discard is not working.  I found this
>> message in kern.log on Ubuntu on creation of the SCST LUN, which shows
>> unmap_alignment 0:
>>
>> Aug  6 22:02:33 e1 kernel: [300378.136765] virt_id 33 (p_iSCSILun_sclun945)
>> Aug  6 22:02:33 e1 kernel: [300378.136782] dev_vdisk: Auto enable thin
>> provisioning for device /dev/rbd/spin1/unmap1t
>> Aug  6 22:02:33 e1 kernel: [300378.136784] unmap_gran 8192,
>> unmap_alignment 0, max_unmap_lba 8192, discard_zeroes_data 1
>> Aug  6 22:02:33 e1 kernel: [300378.136786] dev_vdisk: Attached SCSI
>> target virtual disk p_iSCSILun_sclun945
>> (file="/dev/rbd/spin1/unmap1t", fs=409600MB, bs=512,
>> nblocks=838860800, cyln=409600)
>> Aug  6 22:02:33 e1 kernel: [300378.136847] [4682]:
>> scst_alloc_add_tgt_dev:5287:Device p_iSCSILun_sclun945 on SCST lun=32
>> Aug  6 22:02:33 e1 kernel: [300378.136853] [4682]: scst:
>> scst_alloc_set_UA:12711:Queuing new UA ffff8810251f3a90 (6:29:0,
>> d_sense 0) to tgt_dev ffff88102583ad00 (dev p_iSCSILun_sclun945,
>> initiator copy_manager_sess)
>>
>> even though:
>>
>> root@e1:/sys/block/rbd29# cat discard_alignment
>> 4194304
>>
>> So somehow the discard_alignment is not making it into the LUN.  Could
>> this be the issue?
>
> No, if you are not seeing *any* effect, the alignment is pretty much
> irrelevant.  Can you do the following on a small test image?
>
> - capture "rbd diff" output
> - blktrace -d /dev/rbd0 -o - | blkparse -i - -o rbd0.trace
> - issue a few discards with blkdiscard
> - issue a few unmaps with ESXi, preferably with SCST debugging enabled
> - capture "rbd diff" output again
>
> and attach all of the above?  (You might need to install a blktrace
> package.)
>
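For comparing the two "rbd diff" captures requested above, here is a minimal sketch in Python. It assumes the plain-text output format shown in Ilya's example (offset, length, type per line, with an optional "Offset Length Type" header) and counts only extents of type "data". The function names are hypothetical, and the sample data is copied from the blkdiscard example quoted earlier in this thread.

```python
def parse_rbd_diff(text):
    """Parse plain `rbd diff` output into {offset: length} for 'data' extents."""
    extents = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row ("Offset Length Type") and malformed lines.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        offset, length, kind = int(fields[0]), int(fields[1]), fields[2]
        if kind == "data":
            extents[offset] = length
    return extents


def used_bytes(extents):
    """Total bytes reported as allocated data."""
    return sum(extents.values())


def freed_bytes(before, after):
    """Bytes released between two captures (negative if usage grew)."""
    return used_bytes(before) - used_bytes(after)


# Sample captures taken from the quoted blkdiscard example.
BEFORE = """\
25165824  4194304 data
29360128  4194304 data
"""
AFTER = """\
25165824  1048576 data
29360128  4194304 data
"""

if __name__ == "__main__":
    before, after = parse_rbd_diff(BEFORE), parse_rbd_diff(AFTER)
    print("used before:", used_bytes(before))
    print("used after: ", used_bytes(after))
    print("freed:      ", freed_bytes(before, after))
```

On the quoted example this reports 3145728 bytes freed: the 4M discard truncates 3M from the tail of the first object, while the zeroed 1M head of the second object still counts as data.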

Latest results from VMware validation tests:

Each test creates and deletes a virtual disk, then calls ESXi unmap
for what ESXi maps to that volume:

Test 1: 10GB reclaim, rbd diff size: 3GB, discards: 4829

Test 2: 100GB reclaim, rbd diff size: 50GB, discards: 197837

Test 3: 175GB reclaim, rbd diff size: 47GB, discards: 197824

Test 4: 250GB reclaim, rbd diff size: 125GB, discards: 197837

Test 5: 250GB reclaim, rbd diff size: 80GB, discards: 197837

At the end, the cumulative used size reported by rbd diff is 608GB out
of 775GB of data, so the discards released only about 20% (167GB) of
the space.
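The percentage can be checked with a few lines of Python. This is only a sketch of the arithmetic on the two totals stated above (775GB of data, 608GB still reported as used). The function name is hypothetical, and nothing here is measured independently.

```python
def released_fraction(total_gb, still_used_gb):
    """Fraction of the data that is no longer reported as used."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return (total_gb - still_used_gb) / total_gb


# Totals as stated in this message.
TOTAL_GB = 775
STILL_USED_GB = 608

if __name__ == "__main__":
    frac = released_fraction(TOTAL_GB, STILL_USED_GB)
    print(f"released: {TOTAL_GB - STILL_USED_GB} GB ({frac:.1%})")
```

With these totals the released share comes to 167GB, or roughly 21.5%.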

Thank you,
Alex
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com