Re: ONE pg deep-scrub blocks cluster

Hi Mehmet,

OK so it does come from a rados put. 

As you were able to check, the VM device object size is 4 MB. 

So we'll see after you have removed the object with rados -p rbd rm. 
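
Roughly something like this (just a sketch - please double-check the object name with an ls first):

# rados -p rbd ls | grep '^vm-101-disk-2$'
# rados -p rbd rm vm-101-disk-2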

I'll wait for an update. 

JC

While moving. Excuse unintended typos.

> On Aug 29, 2016, at 14:34, Mehmet <ceph@xxxxxxxxxx> wrote:
> 
> Hey JC,
> 
> after setting up the ceph cluster I tried to migrate an image from one of our production VMs into ceph via
> 
> # rados -p rbd put ...
> 
> but I always got "file too large". I guess this file
> 
> # -rw-r--r-- 1 ceph ceph 100G Jul 31 01:04 vm-101-disk-2__head_383C3223__0
> 
> is the result of this :) - I did not think that anything would stay in ceph after the mentioned error above.
> Seems I was wrong...
> 
> This could match the time when the issue happened for the first time:
> 
> 1. I tried a put via "rados -p rbd put ..." - this did not work (I tried to put a ~400G file...)
> 2. after ~1 week I saw the blocked requests when the first "deep-scrub" ran (at the default interval where ceph starts deep-scrubbing)
> 
> I guess deleting this file should solve the issue.
> Did you see my mail where I wrote down the test results for this?
> 
> # osd_scrub_chunk_max = 5
> # osd_deep_scrub_stride = 1048576
> 
> Just a side note.
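> In case it helps: I had applied them at runtime, roughly like this (a sketch, assuming injection via injectargs rather than a ceph.conf change):
> 
> # ceph tell osd.* injectargs '--osd_scrub_chunk_max 5 --osd_deep_scrub_stride 1048576'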
> 
>> This seems more to me like a pure rados object of 100GB that was
>> uploaded to the cluster. From the name it could be a VM disk image
>> that was uploaded as an object. If it was an RBD object, its size
>> would be within the boundaries of RBD objects (order 12 = 4 KB,
>> order 25 = 32 MB).
> 
>> Verify that when you do a "rados -p rbd ls | grep vm-101-disk-2"
>> command, you can see an object named vm-101-disk-2.
> 
> root@:~# rados -p rbd ls | grep vm-101-disk-2
> rbd_id.vm-101-disk-2
> vm-101-disk-2
> 
>> Verify if you have an RBD named this way: "rbd -p rbd ls | grep vm-101-disk-2"
> 
> root@:~# rbd -p rbd ls | grep vm-101-disk-2
> vm-101-disk-2
> 
>> As I'm not familiar with proxmox, I'd suggest the following:
>> If yes to 1, for safety, copy this file somewhere else and then do a
>> rados -p rbd rm vm-101-disk-2.
> 
> root@:~# rbd -p rbd info vm-101-disk-2
> rbd image 'vm-101-disk-2':
>        size 400 GB in 102400 objects
>        order 22 (4096 kB objects)
>        block_name_prefix: rbd_data.5e7d1238e1f29
>        format: 2
>        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>        flags:
> 
> The VM with the id "101" is up and running. It is using "vm-101-disk-2" as its disk - I moved the disk successfully in another way :) (same name :/) after "rados put" did not work. And as we can see here, the objects for this image also exist within ceph:
> 
> root@:~# rados -p rbd ls | grep "rbd_data.5e7d1238e1f29" | wc -l
> 53011
> 
> I expected to see 102400 objects here, but as ceph does thin provisioning this should be OK.
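> Just as a sanity check, the 102400 maximum follows from the image size and the 4 MB object size (a quick calculation, not cluster output):
> 
> # echo $(( 400 * 1024 / 4 ))
> 102400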
> 
>> If no to 1, for safety, copy this file somewhere else and then do a
>> rm -rf vm-101-disk-2__head_383C3223__0
> 
> I should be able to delete the mentioned "100G file".
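> To find all OSDs that hold a copy of this object (and thus the file on disk), something like this should do (a sketch - the acting set in the output is what I am after):
> 
> # ceph osd map rbd vm-101-disk-2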
> 
>> Make sure all your PG copies show the same content and wait for the
>> next scrub to see what is happening.
> 
> Tomorrow I will make a backup of this file on all involved OSDs, plus a backup of the VM within proxmox, then start a deep-scrub and of course keep you informed.
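> Instead of waiting for the schedule, I plan to kick the deep-scrub off manually, roughly like this (the PG id is just a placeholder here - I will take the real one from ceph osd map / ceph pg dump):
> 
> # ceph pg deep-scrub <pgid>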
> 
>> If anything goes wrong you will be able to upload an object with the
>> exact same content from the file you copied.
>> Is proxmox using such huge objects for something to your knowledge (VM
>> boot image or something else)? Can you search the proxmox mailing list
>> and open tickets to verify?
> 
> As I already wrote in this e-mail, I guess that I am the cause of this :*( through the wrong usage of "rados put".
> Proxmox is using librbd to talk to ceph, so it should not be able to create such a large single object.
> 
>> And is this the cause of the long deep scrub? I do think so but I’m
>> not in front of the cluster.
> 
> Let's see :) - I hope that my next e-mail will close this issue.
> 
> Thank you very much for your help!
> 
> Best regards,
> - Mehmet
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



