Re: [Scst-devel] Thin Provisioning and Ceph RBD's

On Sat, Aug 13, 2016 at 12:36 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Aug 8, 2016 at 7:56 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> On Sun, Aug 7, 2016 at 7:57 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>>> I'm confused.  How can a 4M discard not free anything?  It's either
>>>> going to hit an entire object or two adjacent objects, truncating the
>>>> tail of one and zeroing the head of another.  Using rbd diff:
>>>>
>>>> $ rbd diff test | grep -A 1 25165824
>>>> 25165824  4194304 data
>>>> 29360128  4194304 data
>>>>
>>>> # a 4M discard at 1M into a RADOS object
>>>> $ blkdiscard -o $((25165824 + (1 << 20))) -l $((4 << 20)) /dev/rbd0
>>>>
>>>> $ rbd diff test | grep -A 1 25165824
>>>> 25165824  1048576 data
>>>> 29360128  4194304 data
>>>
>>> I have tested this on a small RBD device with such offsets and indeed,
>>> the discard works as you describe, Ilya.
>>>
>>> Looking more into why ESXi's discard is not working.  I found this
>>> message in kern.log on Ubuntu on creation of the SCST LUN, which shows
>>> unmap_alignment 0:
>>>
>>> Aug  6 22:02:33 e1 kernel: [300378.136765] virt_id 33 (p_iSCSILun_sclun945)
>>> Aug  6 22:02:33 e1 kernel: [300378.136782] dev_vdisk: Auto enable thin
>>> provisioning for device /dev/rbd/spin1/unmap1t
>>> Aug  6 22:02:33 e1 kernel: [300378.136784] unmap_gran 8192,
>>> unmap_alignment 0, max_unmap_lba 8192, discard_zeroes_data 1
>>> Aug  6 22:02:33 e1 kernel: [300378.136786] dev_vdisk: Attached SCSI
>>> target virtual disk p_iSCSILun_sclun945
>>> (file="/dev/rbd/spin1/unmap1t", fs=409600MB, bs=512,
>>> nblocks=838860800, cyln=409600)
>>> Aug  6 22:02:33 e1 kernel: [300378.136847] [4682]:
>>> scst_alloc_add_tgt_dev:5287:Device p_iSCSILun_sclun945 on SCST lun=32
>>> Aug  6 22:02:33 e1 kernel: [300378.136853] [4682]: scst:
>>> scst_alloc_set_UA:12711:Queuing new UA ffff8810251f3a90 (6:29:0,
>>> d_sense 0) to tgt_dev ffff88102583ad00 (dev p_iSCSILun_sclun945,
>>> initiator copy_manager_sess)
>>>
>>> even though:
>>>
>>> root@e1:/sys/block/rbd29# cat discard_alignment
>>> 4194304
>>>
>>> So somehow the discard_alignment is not making it into the LUN.  Could
>>> this be the issue?
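
(As a quick sanity check on this point -- a sketch assuming the device
is rbd29 as above -- all of the discard limits the kernel exports can
be read from sysfs; discard_alignment sits at the device level, while
the granularity and max size live under queue/:

$ grep . /sys/block/rbd29/discard_alignment \
         /sys/block/rbd29/queue/discard_granularity \
         /sys/block/rbd29/queue/discard_max_bytes
)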
>>
>> No, if you are not seeing *any* effect, the alignment is pretty much
>> irrelevant.  Can you do the following on a small test image?
>>
>> - capture "rbd diff" output
>> - blktrace -d /dev/rbd0 -o - | blkparse -i - -o rbd0.trace
>> - issue a few discards with blkdiscard
>> - issue a few unmaps with ESXi, preferably with SCST debugging enabled
>> - capture "rbd diff" output again
>>
>> and attach all of the above?  (You might need to install a blktrace
>> package.)
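
For anyone reproducing this, the capture sequence could look like the
following (a sketch; the pool/image name spin1/unmap1t is taken from
the kernel log above, and the trace is stopped once the unmaps have
been issued):

$ rbd diff spin1/unmap1t > diff.before
$ blktrace -d /dev/rbd0 -o - | blkparse -i - -o rbd0.trace &
# a few discards by hand, e.g.:
$ blkdiscard -o 0 -l $((4 << 20)) /dev/rbd0
# ... trigger the ESXi unmaps here ...
$ kill %1    # stop the trace
$ rbd diff spin1/unmap1t > diff.after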
>>
>
> Latest results from VMWare validation tests:
>
> Each test creates and then deletes a virtual disk, then issues an
> ESXi unmap for the space ESXi had mapped to that volume:
>
> Test 1: 10GB reclaim, rbd diff size: 3GB, discards: 4829
>
> Test 2: 100GB reclaim, rbd diff size: 50GB, discards: 197837
>
> Test 3: 175GB reclaim, rbd diff size: 47 GB, discards: 197824
>
> Test 4: 250GB reclaim, rbd diff size: 125GB, discards: 197837
>
> Test 5: 250GB reclaim, rbd diff size: 80GB, discards: 197837
>
> At the end, the cumulative used size via rbd diff is 608 GB out of
> 775 GB of data, so the discards released only about 20% in the end.
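
(One way to arrive at such "rbd diff size" numbers is to sum the
lengths of the data extents in the diff output; a minimal sketch,
assuming the image from the log above:

$ rbd diff spin1/unmap1t | awk '$3 == "data" { sum += $2 }
      END { printf "%.1f GB used\n", sum / 2^30 }'
)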

Ilya has analyzed the discard pattern, and the problem is indeed that
ESXi appears to disregard the discard alignment attribute.  The
discards therefore arrive shifted by 1M relative to the 4M RADOS
object boundaries and are not hitting the tails of objects.
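
To make the failure mode concrete (a sketch reusing the offsets from
Ilya's example earlier in the thread): RBD only returns space when a
discard truncates an object's tail or covers a whole object; a shifted
discard that starts and ends inside a single 4M object is merely a
zeroing operation and frees nothing:

# object boundary at 25165824 (6 * 4M); this 2M discard lands
# entirely inside the object, touching neither the tail nor the
# boundary, so "rbd diff" reports the same used size before and after
$ blkdiscard -o $((25165824 + (1 << 20))) -l $((2 << 20)) /dev/rbd0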

Discards work much better on EagerZeroedThick volumes, likely because
the data there is laid out contiguously.

I will proceed with the rest of the testing, and will post any tips or
best-practice results as they become available.

Thank you for everyone's help and advice!

Alex
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


