Re: RBD exclusive-lock and qemu/librbd

On 06/02/17 12:25, koukou73gr wrote:
> On 2017-06-02 13:01, Peter Maloney wrote:
>>> Is it easy for you to reproduce it? I had the same problem, and the same
>>> solution. But it isn't easy to reproduce... Jason Dillaman asked me for
>>> a gcore dump of a hung process but I wasn't able to get one. Can you do
>>> that, and when you reply, CC  Jason Dillaman <jdillama@xxxxxxxxxx> ?
>> I mean a hung qemu process on the vm host (the one that uses librbd).
>> And I guess that should be TO rather than CC.
>>
> Peter,
>
> Can it be that my situation is different?
>
> In my case the guest/qemu process itself does not hang. The guest root
> filesystem resides in an rbd image w/o exclusive-lock enabled (the
> pre-existing kind I described).
Of course it could be different, but so far it looks the same: same
solution and same warnings in the guest. It just takes some time before
the guest hangs completely.

Sometimes the OS seems OK apart from those warnings.
Worse is when iostat shows the disk as 100% busy while it is doing
almost nothing, around 1 w/s.
Worst is when nothing runs in the guest at all, with no screen output
or keyboard input, and at that point kill on the qemu process does not
work either, except with -9.

You can also get exactly the same symptoms from a curable problem. If
you stop too many osds and min_size is not reached for just one pg that
the image uses, everything appears to work until I/O hits that pg, and
then the symptoms above appear. Most of the time the VM recovers once
the osds are up again, but sometimes it does not. Still, since you
mentioned exclusive lock, I think your case is the same or closely
related.
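
To rule out the min_size case first, something along these lines could be
run on a node with an admin keyring. It is only a sketch: the helper names
pool_min_size and inactive_pgs are made up for illustration, the pool name
is whatever your image lives in, and the column layout of the
"ceph pg dump pgs_brief" output (pg id first, state second) is an
assumption that may not hold on every release.

```shell
#!/bin/sh
# Sketch: check for the "curable" case where a PG cannot serve I/O.
# Helper names are hypothetical; they only wrap standard ceph CLI calls.

pool_min_size() {
    # "ceph osd pool get <pool> min_size" prints e.g. "min_size: 2"
    ceph osd pool get "$1" min_size | awk '{print $2}'
}

inactive_pgs() {
    # Assumes one row per PG with the pg id in column 1 and the state in
    # column 2. Prints every PG whose state does not contain "active".
    ceph pg dump pgs_brief 2>/dev/null |
        awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && $2 !~ /active/ {print $1, $2}'
}
```

For example, "pool_min_size rbd" followed by "inactive_pgs" (with rbd
standing in for your pool). If inactive_pgs prints nothing while the guest
is hanging, the min_size explanation becomes less likely.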

>
> The problem surfaced when additional storage was attached to the guest,
> through a new rbd image created with exclusive-lock as it is the default
> on Jewel.
>
> Problem being when parted/fdisk is run on that device, they hang as
> reported. On the other hand,
>
> dd if=/dev/sdb of=/tmp/lala count=512
>
> has no problem completing, while the reverse,
>
> dd if=/tmp/lala of=/dev/sdb count=512
>
> hangs indefinitely. While in this state, I can still ssh to the guest
> and work as long as I don't touch the new device. It appears that when a
> write to the device backed by the exclusive-lock featured image hangs, a
> read to it will hang as well.
>
> -K.
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
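
To reproduce the write hang quoted above without tying up a terminal, a
small wrapper like the following might help. It is a sketch, not a tested
procedure: probe_write and its arguments are made-up names, it writes
zeros from /dev/zero rather than the /tmp/lala file used above, and like
the original dd command it overwrites the start of the target.

```shell
#!/bin/sh
# Sketch: run the write test in the background and report a hang instead
# of blocking the shell. WARNING: overwrites the first 256 KiB of target.

probe_write() {
    target=$1          # e.g. /dev/sdb inside the guest
    limit=${2:-30}     # seconds to wait before calling it hung

    dd if=/dev/zero of="$target" bs=512 count=512 conv=notrunc \
        >/dev/null 2>&1 &
    pid=$!

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # dd is left running; a stuck process may ignore signals.
            echo "hung: dd (pid $pid) still running after ${limit}s"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # dd has exited, so wait only collects its exit status.
    if wait "$pid"; then
        echo "ok: write completed"
    else
        echo "error: dd failed"
        return 2
    fi
}
```

Usage would be something like "probe_write /dev/sdb 30". The function
polls instead of waiting on dd directly because a write stuck in the
kernel may not react to a kill, so the shell should never block on it.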


-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Internet: http://www.brockmann-consult.de
--------------------------------------------



