Hi Dan,

nope, we have no iptables rules on those hosts, and the gateway is on the
same subnet as the ceph cluster.

I will see if I can find some information on how to debug the rbd kernel
module (any suggestions are appreciated :))

Regards,
Christian

On 21.04.2015 at 10:20, Dan van der Ster wrote:
> Hi Christian,
>
> I've never debugged the kernel client either, so I don't know how to
> increase debugging. (I don't see any useful params on the kernel
> modules).
>
> Your log looks like the client just stops communicating with the ceph
> cluster. Is iptables getting in the way?
>
> Cheers, Dan
>
> On Tue, Apr 21, 2015 at 9:13 AM, Christian Eichelmann
> <christian.eichelmann@xxxxxxxx> wrote:
>> Hi Dan,
>>
>> we are already back on the kernel module, since the same problems were
>> happening with fuse. I had no special ulimit settings for the
>> fuse process, so that could have been an issue there.
>>
>> I have pasted the kernel messages from such incidents here:
>> http://pastebin.com/X5JRe1v3
>>
>> I have never debugged the kernel client. Can you give me a short hint
>> how to increase the debug level and where the logs will be written to?
>>
>> Regards,
>> Christian
>>
>> On 20.04.2015 at 15:50, Dan van der Ster wrote:
>>> Hi,
>>> This is similar to what you would observe if you hit the ulimit on
>>> open files/sockets in a Ceph client. Though that normally only affects
>>> clients in user mode, not the kernel. What are the ulimits of your
>>> rbd-fuse client? Also, you could increase the client logging debug
>>> levels to see why the client is hanging. When the kernel rbd client
>>> was hanging, was there anything printed to dmesg?
>>> Cheers, Dan
>>>
>>> On Mon, Apr 20, 2015 at 9:29 AM, Christian Eichelmann
>>> <christian.eichelmann@xxxxxxxx> wrote:
>>>> Hi Ceph-Users!
>>>>
>>>> We currently have a problem where I am not sure whether its cause lies
>>>> in Ceph or something else.
>>>> First, some information about our ceph setup:
>>>>
>>>> * ceph version 0.87.1
>>>> * 5 MON
>>>> * 12 OSD with 60x2TB each
>>>> * 2 RSYNC Gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian
>>>>   Wheezy)
>>>>
>>>> Our cluster is mainly used to store log files from numerous servers via
>>>> RSync and make them available via RSync as well. For about two weeks
>>>> we have seen very strange behaviour on our RSync gateways (they just map
>>>> several rbd devices and "export" them via rsyncd): the IO wait on the
>>>> systems increases until some of the cores get stuck at an IO
>>>> wait of 100%. RSync processes become zombies (defunct) and/or cannot be
>>>> killed even with SIGKILL. After the system has reached a load of about
>>>> 1400, it becomes totally unresponsive, and the only way to "fix" the
>>>> problem is to reboot the system.
>>>>
>>>> I tried to reproduce the problem manually by simultaneously reading
>>>> and writing from several machines, but the problem didn't appear.
>>>>
>>>> I have no idea where the error could be. I ran ceph tell osd.*
>>>> bench during the problem, and all OSDs had normal benchmark
>>>> results. Does anyone have an idea how this can happen? If you need any
>>>> more information, please let me know.
>>>>
>>>> Regards,
>>>> Christian
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> --
>> Christian Eichelmann
>> System Administrator
>>
>> 1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
>> Brauerstraße 48 · DE-76135 Karlsruhe
>> Phone: +49 721 91374-8026
>> christian.eichelmann@xxxxxxxx
>>
>> Amtsgericht Montabaur / HRB 6484
>> Management Board: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
>> Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr.
>> Oliver Mauss, Jan Oetjen
>> Chairman of the Supervisory Board: Michael Scheeren

--
Christian Eichelmann
System Administrator

1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Phone: +49 721 91374-8026
christian.eichelmann@xxxxxxxx

Amtsgericht Montabaur / HRB 6484
Management Board: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Chairman of the Supervisory Board: Michael Scheeren
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com