On 06/02/17 11:59, Peter Maloney wrote:
> On 06/01/17 17:12, koukou73gr wrote:
>> Hello list,
>>
>> Today I had to create a new image for a VM. This was the first time
>> since our cluster was updated from Hammer to Jewel. So far I had just
>> been copying an existing golden image and resizing it as appropriate,
>> but this time I used rbd create.
>>
>> So I "rbd create"d a 2T image and attached it to an existing VM guest
>> with librbd using:
>>
>> <disk type='network' device='disk'>
>>   <driver name='qemu'/>
>>   <auth username='lalala'>
>>     <secret type='ceph' uuid='uiduiduid'/>
>>   </auth>
>>   <source protocol='rbd' name='libvirt-pool/srv-10-206-123-87.mails'/>
>>   <target dev='sdc' bus='scsi'/>
>>   <address type='drive' controller='0' bus='0' target='1' unit='0'/>
>> </disk>
>>
>> I booted the guest and tried to partition the new drive from inside the
>> guest. That was it: parted (and anything else, for that matter) that
>> tried to access the new disk would freeze. After 2 minutes the kernel
>> would start complaining:
>>
>> [ 360.212391] INFO: task parted:1836 blocked for more than 120 seconds.
>> [ 360.216001] Not tainted 4.4.0-78-generic #99-Ubuntu
>> [ 360.218663] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>
> Is it easy for you to reproduce it? I had the same problem, and the same
> solution. But it isn't easy to reproduce... Jason Dillaman asked me for
> a gcore dump of a hung process, but I wasn't able to get one. Can you do
> that, and when you reply, CC Jason Dillaman <jdillama@xxxxxxxxxx>?

I mean a hung qemu process on the VM host (the one that uses librbd).
And I guess that should be TO rather than CC.

>> After much headbanging and trial and error, I finally thought of
>> checking the enabled rbd features of an existing image versus the new
>> one:
>>
>> pre-existing: layering, striping
>> new: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>>
>> Disabling exclusive-lock (and fast-diff and object-map before that)
>> finally allowed the new image to become usable in the guest.
>>
>> This is with:
>>
>> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>> qemu-img version 2.6.0 (qemu-kvm-ev-2.6.0-28.el7_3.3.1), Copyright (c)
>> 2004-2008 Fabrice Bellard
>>
>> on a host running:
>>
>> CentOS Linux release 7.3.1611 (Core)
>> Linux host-10-206-123-184.physics.auth.gr 3.10.0-327.36.2.el7.x86_64 #1
>> SMP Mon Oct 10 23:08:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> and a guest running:
>>
>> DISTRIB_ID=Ubuntu
>> DISTRIB_RELEASE=16.04
>> DISTRIB_CODENAME=xenial
>> DISTRIB_DESCRIPTION="Ubuntu 16.04.2 LTS"
>> Linux srv-10-206-123-87.physics.auth.gr 4.4.0-78-generic #99-Ubuntu SMP
>> Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>
>> I vaguely remember references to problems when exclusive-lock was
>> enabled on rbd images, but Google didn't reveal much to me.
>>
>> So what is it with exclusive-lock? Why does it fail like this? Could
>> you please point me to some documentation on this behaviour?
>>
>> Thanks for any feedback.
>>
>> -K.
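For anyone who lands on the same problem: the feature set is easy to
inspect and trim with the rbd CLI. A minimal sketch of the workflow
described above, reusing the image name from the XML (the dependencies
force the order, since fast-diff needs object-map and object-map needs
exclusive-lock):

  # show which features the image currently has enabled
  rbd info libvirt-pool/srv-10-206-123-87.mails

  # disable the dependent features first, then exclusive-lock itself
  rbd feature disable libvirt-pool/srv-10-206-123-87.mails fast-diff
  rbd feature disable libvirt-pool/srv-10-206-123-87.mails object-map
  rbd feature disable libvirt-pool/srv-10-206-123-87.mails exclusive-lock

  # or create new images with only the pre-Jewel feature set to begin with
  # ("new-image" is just a placeholder name)
  rbd create --size 2T --image-feature layering libvirt-pool/new-image

Setting "rbd default features = 3" (layering + striping) in the client
section of ceph.conf should also make that the default for all newly
created images.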
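As for the gcore dump: something along these lines, run on the VM host
against the hung qemu process, should capture it without killing the
guest (the pid and output path are only placeholders):

  # find the qemu process backing the guest (the binary name varies by distro)
  pgrep -a qemu

  # dump a core of the live process with gdb's gcore; the process keeps running
  gcore -o /tmp/qemu-hung <pid>

gcore ships with gdb, and the dump can be roughly as large as the
guest's RAM, so check the free space under the output path first.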
--
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Internet: http://www.brockmann-consult.de
--------------------------------------------

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com