Re: How to avoid kernel conflicts

The systems on which the `rbd map` hang occurred are
definitely not under memory stress. I don't believe they
are doing a lot of disk I/O either. Here's the basic set-up:

* all nodes in the "data-plane" are identical
* they each host an OSD instance, sharing one of the drives
* I'm running Docker containers using an RBD volume plugin and
  Docker Compose
* when the hang happens, the most visible behavior is that
  `docker ps` hangs
* then I run `systemctl status` and see an `rbd map` process
  spawned by the RBD volume plugin
* I then tried an `strace -f -p <pid of rbd map>` and that process
  promptly exits (with RC 0) and the hang resolves itself

I'll try to capture the strace output the next time I run into
it and share it with the mailing list.
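
A minimal sketch of how that capture might be scripted, assuming a Linux
host with `pgrep` and `strace` installed and root access. Since attaching
strace seemed to unblock the process, the sketch first saves what /proc
reports for the PID and only then attaches. The function names and the
output directory are hypothetical, not part of Ceph or the RBD volume
plugin.

```shell
#!/bin/sh
# Hypothetical helpers for capturing a hung `rbd map`; not part of Ceph.

# Print the PIDs of processes whose command line contains "rbd map".
find_rbd_map_pids() {
    pgrep -f 'rbd map' || true
}

# Save /proc state for PID into OUTDIR without attaching a tracer.
# /proc/<pid>/stack usually needs root; unreadable files are left empty.
snapshot_pid() {
    pid=$1
    out=$2
    mkdir -p "$out"
    for f in status wchan stack; do
        cat "/proc/$pid/$f" > "$out/$pid.$f" 2>/dev/null || true
    done
}

# Snapshot each matching process first, then attach strace and log to a file.
capture_hang() {
    out=${1:-/tmp/rbd-map-hang}
    for pid in $(find_rbd_map_pids); do
        snapshot_pid "$pid" "$out"
        strace -f -tt -o "$out/$pid.strace" -p "$pid"
    done
}
```

Running `capture_hang /tmp/rbd-map-hang` as root while `docker ps` is
hung should leave the /proc snapshot and the strace log side by side.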

Thanks, Ilya.

-kc

> On May 9, 2016, at 2:21 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> 
> On Mon, May 9, 2016 at 12:19 AM, K.C. Wong <kcwong@xxxxxxxxxxx> wrote:
>> 
>>> As the tip said, you should not use rbd via kernel module on an OSD host
>>> 
>>> However, using it with userspace code (librbd etc, as in kvm) is fine
>>> 
>>> Generally, you should not have both:
>>> - "server" in userspace
>>> - "client" in kernelspace
>> 
>> If `librbd` would help avoid this problem, then switching to `rbd-fuse`
>> should do the trick, right?
>> 
>> The reason for my line of questioning is that I've seen occasional
>> freeze-ups of `rbd map` that are resolved by a 'slight tap' by way of
>> an strace. There is definitely great appeal in not having specialized
>> nodes and making every one the same as the next one on the rack.
> 
> The problem with placing the kernel client on the OSD node is the
> potential deadlock under heavy I/O when memory becomes scarce.  It's
> not recommended, but people are doing it - if you don't stress your
> system too much, it'll never happen.
> 
> "rbd map" freeze is definitely not related to the above.  Did the actual
> command hang?  Could you describe what you saw in more detail and how
> did strace help?  It could be that you ran into
> 
>    http://tracker.ceph.com/issues/14737
> 
> Thanks,
> 
>                Ilya

K.C. Wong
kcwong@xxxxxxxxxxx
4096R/B8995EDE  E527 CBE8 023E 79EA 8BBB  5C77 23A6 92E9 B899 5EDE
hkps://hkps.pool.sks-keyservers.net

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
