If possible, it might be worth trying an EXT4-formatted RBD. I've had problems with XFS hanging in the past on simple LVM volumes and never really got to the bottom of it, whereas the same volumes formatted with EXT4 have been running for years without a problem.
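Something along these lines would be enough to test with (a completely untested sketch -- the pool name, image name and mountpoint are just placeholders, substitute your own, and make sure the device isn't already in use):

    # map the image via the kernel rbd client; it shows up as e.g. /dev/rbd0
    rbd map rbd/logvol01

    # format with ext4 instead of xfs, then mount it for the rsyncd export
    mkfs.ext4 /dev/rbd0
    mount /dev/rbd0 /srv/rsync/logvol01

If EXT4 runs clean for a couple of weeks under the same rsync load, that would at least narrow it down to something XFS-specific.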
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Christian Eichelmann
> Sent: 20 April 2015 14:41
> To: Nick Fisk; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: 100% IO Wait with CEPH RBD and RSYNC
>
> I'm using xfs on the rbd disks.
> They are between 1 and 10TB in size.
>
> On 20.04.2015 at 14:32, Nick Fisk wrote:
> > Ah ok, good point.
> >
> > What FS are you using on the RBD?
> >
> >> -----Original Message-----
> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> >> Of Christian Eichelmann
> >> Sent: 20 April 2015 13:16
> >> To: Nick Fisk; ceph-users@xxxxxxxxxxxxxx
> >> Subject: Re: 100% IO Wait with CEPH RBD and RSYNC
> >>
> >> Hi Nick,
> >>
> >> I forgot to mention that I also tried a workaround using the userland
> >> client (rbd-fuse). The behaviour was exactly the same (it worked fine
> >> for several hours of testing parallel reading and writing, then IO
> >> wait and system load increased).
> >>
> >> This is why I don't think it is an issue with the rbd kernel module.
> >>
> >> Regards,
> >> Christian
> >>
> >> On 20.04.2015 at 11:37, Nick Fisk wrote:
> >>> Hi Christian,
> >>>
> >>> A very non-technical answer, but as the problem seems related to the
> >>> RBD client it might be worth trying the latest kernel if possible.
> >>> The RBD client is kernel based, so there may be a fix that stops
> >>> this from happening.
> >>>
> >>> Nick
> >>>
> >>>> -----Original Message-----
> >>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> >>>> Behalf Of Christian Eichelmann
> >>>> Sent: 20 April 2015 08:29
> >>>> To: ceph-users@xxxxxxxxxxxxxx
> >>>> Subject: 100% IO Wait with CEPH RBD and RSYNC
> >>>>
> >>>> Hi Ceph-Users!
> >>>>
> >>>> We currently have a problem where I am not sure whether its cause
> >>>> lies in Ceph or somewhere else. First, some information about our
> >>>> Ceph setup:
> >>>>
> >>>> * ceph version 0.87.1
> >>>> * 5 MON
> >>>> * 12 OSD hosts with 60x2TB each
> >>>> * 2 RSYNC gateways with 2x10G Ethernet (kernel: 3.16.3-2~bpo70+1,
> >>>> Debian Wheezy)
> >>>>
> >>>> Our cluster is mainly used to store log files from numerous servers
> >>>> via rsync and to make them available via rsync as well. For about
> >>>> two weeks we have been seeing very strange behaviour on our rsync
> >>>> gateways (they just map several rbd devices and "export" them via
> >>>> rsyncd): the IO wait on the systems increases until some of the
> >>>> cores get stuck at an IO wait of 100%. Rsync processes become
> >>>> zombies (defunct) and/or cannot be killed even with SIGKILL. After
> >>>> the system has reached a load of about 1400, it becomes totally
> >>>> unresponsive and the only way to "fix" the problem is to reboot the
> >>>> system.
> >>>>
> >>>> I tried to reproduce the problem manually by simultaneously reading
> >>>> and writing from several machines, but the problem didn't appear.
> >>>>
> >>>> I have no idea where the error could be. I ran "ceph tell osd.*
> >>>> bench" while the problem was occurring and all OSDs showed normal
> >>>> benchmark results. Has anyone an idea how this can happen? If you
> >>>> need any more information, please let me know.
> >>>>
> >>>> Regards,
> >>>> Christian
> >>
> >> --
> >> Christian Eichelmann
> >> Systemadministrator
>
> --
> Christian Eichelmann
> Systemadministrator
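P.S. One more thing that might help narrow it down: rsync processes that survive SIGKILL are usually not real zombies but tasks stuck in uninterruptible sleep (D state) waiting on IO. Next time a gateway locks up, something like this should show where they are blocked (<PID> is a placeholder for one of the stuck processes):

    # list tasks stuck in uninterruptible sleep (D state)
    ps -eo pid,stat,wchan:32,args | awk '$2 ~ /^D/'

    # dump the kernel stack of one stuck task (as root)
    cat /proc/<PID>/stack

    # or have the kernel log all blocked tasks; output appears in dmesg
    echo w > /proc/sysrq-trigger

If those stacks end in rbd/libceph functions it's the kernel client after all, despite the rbd-fuse result; if they are in XFS code, the EXT4 test above becomes even more interesting.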