Re: 100% IO Wait with CEPH RBD and RSYNC

Hi Dan,

nope, we have no iptables rules on those hosts and the gateway is on the
same subnet as the ceph cluster.

I will see if I can find some information on how to debug the rbd
kernel module (any suggestions are appreciated :))
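
One avenue that may be worth trying is the kernel's generic dynamic debug facility, which can switch on the debug messages compiled into a module. The following is only a minimal sketch, not a tested recipe for this setup: it assumes the gateway kernel was built with CONFIG_DYNAMIC_DEBUG, that debugfs is mounted at /sys/kernel/debug, and that the relevant modules are named libceph and rbd. The helper names debug_commands and apply_commands are made up for illustration. If those assumptions hold, the extra messages should show up in the kernel log (dmesg).

```python
from pathlib import Path

# Assumed path of the dynamic debug control file. It only exists when debugfs
# is mounted and the kernel was built with CONFIG_DYNAMIC_DEBUG.
CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def debug_commands(modules=("libceph", "rbd"), enable=True):
    """Build one dynamic debug query per module ('+p' enables, '-p' disables)."""
    flag = "+p" if enable else "-p"
    return [f"module {name} {flag}" for name in modules]


def apply_commands(commands, control=CONTROL):
    """Write each query to the control file (needs root); return how many were written."""
    written = 0
    for command in commands:
        # Each query is submitted with its own write to the control file.
        with open(control, "w") as handle:
            handle.write(command + "\n")
        written += 1
    return written


if __name__ == "__main__":
    # Print the queries instead of applying them, so they can be reviewed first.
    for command in debug_commands():
        print(command)
```

Calling apply_commands(debug_commands()) as root would enable the messages, and apply_commands(debug_commands(enable=False)) would switch them off again. The output can be very verbose, so it is probably best enabled only shortly before a hang is expected.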

Regards,
Christian

Am 21.04.2015 um 10:20 schrieb Dan van der Ster:
> Hi Christian,
> 
> I've never debugged the kernel client either, so I don't know how to
> increase debugging. (I don't see any useful parameters on the kernel
> modules.)
> 
> Your log looks like the client just stops communicating with the ceph
> cluster. Is iptables getting in the way?
> 
> Cheers, Dan
> 
> On Tue, Apr 21, 2015 at 9:13 AM, Christian Eichelmann
> <christian.eichelmann@xxxxxxxx> wrote:
>> Hi Dan,
>>
>> we are already back on the kernel module since the same problems were
>> happening with fuse. I had no special ulimit settings for the
>> fuse process, so that could have been an issue there.
>>
>> I have pasted the kernel messages from such incidents here:
>> http://pastebin.com/X5JRe1v3
>>
>> I have never debugged the kernel client. Can you give me a short hint on
>> how to increase the debug level and where the logs will be written to?
>>
>> Regards,
>> Christian
>>
>> Am 20.04.2015 um 15:50 schrieb Dan van der Ster:
>>> Hi,
>>> This is similar to what you would observe if you hit the ulimit on
>>> open files/sockets in a Ceph client. Though that normally only affects
>>> clients in user mode, not the kernel. What are the ulimits of your
>>> rbd-fuse client? Also, you could increase the client logging debug
>>> levels to see why the client is hanging. When the kernel rbd client
>>> was hanging, was there anything printed to dmesg?
>>> Cheers, Dan
>>>
>>> On Mon, Apr 20, 2015 at 9:29 AM, Christian Eichelmann
>>> <christian.eichelmann@xxxxxxxx> wrote:
>>>> Hi Ceph-Users!
>>>>
>>>> We currently have a problem, and I am not sure whether its cause lies
>>>> in Ceph or somewhere else. First, some information about our Ceph setup:
>>>>
>>>> * ceph version 0.87.1
>>>> * 5 MON
>>>> * 12 OSD with 60x2TB each
>>>> * 2 RSYNC Gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian
>>>> Wheezy)
>>>>
>>>> Our cluster is mainly used to store log files from numerous servers via
>>>> RSync and to make them available via RSync as well. For about two weeks
>>>> we have seen very strange behaviour on our RSync gateways (they just map
>>>> several rbd devices and "export" them via rsyncd): the IO wait on the
>>>> systems increases until some of the cores get stuck at an IO wait of
>>>> 100%. RSync processes become zombies (defunct) and/or cannot be
>>>> killed even with SIGKILL. After the system has reached a load of about
>>>> 1400, it becomes totally unresponsive and the only way to "fix" the
>>>> problem is to reboot the system.
>>>>
>>>> I tried to reproduce the problem manually by simultaneously reading
>>>> and writing from several machines, but the problem didn't appear.
>>>>
>>>> I have no idea where the error could be. I ran ceph tell osd.*
>>>> bench during the problem and all OSDs showed normal benchmark
>>>> results. Does anyone have an idea how this can happen? If you need any
>>>> more information, please let me know.
>>>>
>>>> Regards,
>>>> Christian
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> --
>> Christian Eichelmann
>> Systemadministrator
>>
>> 1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
>> Brauerstraße 48 · DE-76135 Karlsruhe
>> Telefon: +49 721 91374-8026
>> christian.eichelmann@xxxxxxxx
>>
>> Amtsgericht Montabaur / HRB 6484
>> Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
>> Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
>> Aufsichtsratsvorsitzender: Michael Scheeren






