On 06/02/17 12:06, koukou73gr wrote:
> Thanks for the reply.
>
> Easy?
> Sure, it happens reliably every time I boot the guest with
> exclusive-lock on :)

If it's that easy, also try with only exclusive-lock, and not object-map
nor fast-diff. And also with one or the other of those.

> I'll need some walkthrough on the gcore part though!

gcore is pretty easy... just do something like:

    gcore -o "$outfile" "$pid"

And then upload it to the devs in a *sorta private way:

    ceph-post-file -d "gcore dump of hung qemu process with exclusive-lock" "$outfile"

* sorta private warning from ceph-post-file:
> WARNING:
>   Basic measures are taken to make posted data be visible only to
>   developers with access to ceph.com infrastructure. However, users
>   should think twice and/or take appropriate precautions before
>   posting potentially sensitive data (for example, logs or data
>   directories that contain Ceph secrets).

> -K.
>
>
> On 2017-06-02 12:59, Peter Maloney wrote:
>> On 06/01/17 17:12, koukou73gr wrote:
>>> Hello list,
>>>
>>> Today I had to create a new image for a VM. This was the first time
>>> since our cluster was updated from Hammer to Jewel. Until now I had
>>> just been copying an existing golden image and resizing it as
>>> appropriate, but this time I used rbd create.
>>>
>>> So I "rbd create"d a 2T image and attached it to an existing VM guest
>>> with librbd using:
>>>
>>>     <disk type='network' device='disk'>
>>>       <driver name='qemu'/>
>>>       <auth username='lalala'>
>>>         <secret type='ceph' uuid='uiduiduid'/>
>>>       </auth>
>>>       <source protocol='rbd' name='libvirt-pool/srv-10-206-123-87.mails'/>
>>>       <target dev='sdc' bus='scsi'/>
>>>       <address type='drive' controller='0' bus='0' target='1' unit='0'/>
>>>     </disk>
>>>
>>> I booted the guest and tried to partition the new drive from inside
>>> it. From that point, parted (and anything else that tried to access
>>> the new disk, for that matter) would freeze. After 2 minutes the
>>> kernel would start complaining:
>>>
>>> [  360.212391] INFO: task parted:1836 blocked for more than 120 seconds.
>>> [  360.216001]       Not tainted 4.4.0-78-generic #99-Ubuntu
>>> [  360.218663] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>
>> Is it easy for you to reproduce it? I had the same problem, and the
>> same solution. But it isn't easy to reproduce... Jason Dillaman asked
>> me for a gcore dump of a hung process but I wasn't able to get one.
>> Can you do that, and when you reply, CC Jason Dillaman
>> <jdillama@xxxxxxxxxx>?

-- 
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Internet: http://www.brockmann-consult.de
--------------------------------------------

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
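
A minimal sketch of the feature test suggested above, using the stock
rbd CLI. The image name is copied from the <source> element quoted in
this thread; substitute your own. Note that fast-diff depends on
object-map, so the two are disabled together and re-enabled in order:

    # show which features are currently enabled on the image
    rbd info libvirt-pool/srv-10-206-123-87.mails

    # keep exclusive-lock on, turn off object-map and fast-diff
    rbd feature disable libvirt-pool/srv-10-206-123-87.mails fast-diff object-map

    # to restore afterwards: object-map first, then fast-diff
    rbd feature enable libvirt-pool/srv-10-206-123-87.mails object-map
    rbd feature enable libvirt-pool/srv-10-206-123-87.mails fast-diff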
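
And a sketch of the gcore/upload steps end to end, assuming the guest's
libvirt domain name is GUESTNAME (hypothetical) and that libvirt has
written its usual per-domain pidfile:

    # find the qemu process backing the hung guest
    pid=$(cat /var/run/libvirt/qemu/GUESTNAME.pid)
    # alternatively, match on the process command line
    pid=$(pgrep -f "qemu.*GUESTNAME")

    # dump a core of the still-running process while the guest is hung
    outfile=/tmp/qemu-hung
    gcore -o "$outfile" "$pid"

    # gcore appends the pid to the prefix, so upload ${outfile}.${pid}
    ceph-post-file -d "gcore dump of hung qemu process with exclusive-lock" "${outfile}.${pid}"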