On 06/02/17 12:25, koukou73gr wrote:
> On 2017-06-02 13:01, Peter Maloney wrote:
>>> Is it easy for you to reproduce it? I had the same problem, and the same
>>> solution. But it isn't easy to reproduce... Jason Dillaman asked me for
>>> a gcore dump of a hung process but I wasn't able to get one. Can you do
>>> that, and when you reply, CC Jason Dillaman <jdillama@xxxxxxxxxx> ?
>> I mean a hung qemu process on the vm host (the one that uses librbd).
>> And I guess that should be TO rather than CC.
>>
> Peter,
>
> Can it be that my situation is different?
>
> In my case the guest/qemu process itself does not hang. The guest root
> filesystem resides in an rbd image w/o exclusive-lock enabled (the
> pre-existing kind I described).

Of course it could be different, but it seems the same so far: same
solution, and same warnings in the guest. It just takes some time before
the guest totally hangs. Sometimes the OS seems ok but has those
warnings. Worse, the disk can look busy in iostat, like 100%, while
showing low activity, like 1 w/s. Worst of all, you can't get anything to
run, there is no screen output or keyboard input at all, and kill on the
qemu process won't work at that point, except with -9.

And sometimes you can get the exact same symptoms with a curable
problem. For example, if you stop too many osds and min_size is not
reached for just one pg that the image uses, then it looks like it works
until it hits that bad pg, and then the above symptoms happen. Most of
the time the VM recovers when the osds are up again, but sometimes not.

But since you mentioned exclusive lock, I still think it seems the same
or highly related.

>
> The problem surfaced when additional storage was attached to the guest,
> through a new rbd image created with exclusive-lock as it is the default
> on Jewel.
>
> Problem being when parted/fdisk is run on that device, they hang as
> reported.
> On the other hand,
>
>     dd if=/dev/sdb of=/tmp/lala count=512
>
> has no problem completing, while the reverse,
>
>     dd if=/tmp/lala of=/dev/sdb count=512
>
> hangs indefinitely. While in this state, I can still ssh to the guest
> and work as long as I don't touch the new device. It appears that when a
> write to the device backed by the exclusive-lock featured image hangs, a
> read to it will hang as well.
>
> -K.
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Internet: http://www.brockmann-consult.de
--------------------------------------------