Re: 100% IO Wait with CEPH RBD and RSYNC


 



Hi,

Check the XFS fragmentation factor for the RBD disks, e.g.:

xfs_db -c frag -r /dev/sdX

If it is high, try defragmenting:

xfs_fsr /dev/sdX
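A minimal sketch of how this check might be scripted. The device path is a placeholder and the 30% threshold is an arbitrary assumption, not an XFS recommendation:

```shell
#!/bin/sh
# Report the XFS fragmentation factor for a device and run the
# defragmenter when the factor exceeds a chosen threshold.
DEV=/dev/rbd0   # placeholder: substitute your mapped RBD device
THRESHOLD=30    # percent; arbitrary cut-off for this sketch

# xfs_db prints a line like:
#   actual 1468, ideal 1392, fragmentation factor 5.18%
FRAG=$(xfs_db -c frag -r "$DEV" | sed -n 's/.*fragmentation factor \([0-9.]*\)%.*/\1/p')

# Compare as a float via awk, since plain sh has no float arithmetic.
if awk -v f="$FRAG" -v t="$THRESHOLD" 'BEGIN { exit !(f > t) }'; then
    echo "fragmentation ${FRAG}% is high, running defragmenter"
    xfs_fsr -v "$DEV"
else
    echo "fragmentation ${FRAG}% is acceptable"
fi
```

Note that xfs_fsr operates on a mounted filesystem, so run it while the RBD is mounted and expect extra IO on the cluster while it works.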


Regards,

Onur.


On 4/20/2015 4:41 PM, Nick Fisk wrote:
If possible, it might be worth trying an EXT4-formatted RBD. I've had
problems with XFS hanging in the past on simple LVM volumes and never really
got to the bottom of it, whereas the same volumes formatted with EXT4 have
been running for years without a problem.

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
Christian Eichelmann
Sent: 20 April 2015 14:41
To: Nick Fisk; ceph-users@xxxxxxxxxxxxxx
Subject: Re:  100% IO Wait with CEPH RBD and RSYNC

I'm using xfs on the rbd disks.
They are between 1 and 10TB in size.

Am 20.04.2015 um 14:32 schrieb Nick Fisk:
Ah ok, good point

What FS are you using on the RBD?

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
Of Christian Eichelmann
Sent: 20 April 2015 13:16
To: Nick Fisk; ceph-users@xxxxxxxxxxxxxx
Subject: Re:  100% IO Wait with CEPH RBD and RSYNC

Hi Nick,

I forgot to mention that I also tried a workaround using the userland
client (rbd-fuse). The behaviour was exactly the same: it worked fine
for several hours of parallel read and write testing, then IO wait
and system load increased.

This is why I don't think it is an issue with the rbd kernel module.

Regards,
Christian

Am 20.04.2015 um 11:37 schrieb Nick Fisk:
Hi Christian,

A very non-technical answer, but since the problem seems related to the
RBD client, it might be worth trying the latest kernel if possible.
The RBD client is kernel-based, so there may already be a fix that
stops this from happening.
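As a first step, it may help to record exactly which kernel and rbd module the gateways are running before and after an upgrade. A minimal sketch; output varies by distribution:

```shell
# Show the running kernel version -- the gateways here reported the
# Debian Wheezy backports kernel 3.16.3-2~bpo70+1.
uname -r

# Show details of the in-tree rbd module where it is present.
# modinfo fails on hosts without the module, hence the fallback.
modinfo rbd 2>/dev/null || echo "rbd module not available on this host"
```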

Nick

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
Behalf Of Christian Eichelmann
Sent: 20 April 2015 08:29
To: ceph-users@xxxxxxxxxxxxxx
Subject:  100% IO Wait with CEPH RBD and RSYNC

Hi Ceph-Users!

We currently have a problem where I am not sure whether its cause lies
in Ceph or somewhere else. First, some information about our Ceph setup:

* ceph version 0.87.1
* 5 MON
* 12 OSD with 60x2TB each
* 2 RSYNC gateways with 2x10G Ethernet (kernel 3.16.3-2~bpo70+1, Debian Wheezy)

Our cluster is mainly used to store log files from numerous servers via
rsync and to make them available via rsync as well. For about two weeks
we have been seeing very strange behaviour on our rsync gateways (they
just map several RBD devices and "export" them via rsyncd): the IO wait
on the systems keeps increasing until some of the cores get stuck at
100% IO wait. Rsync processes become zombies (defunct) and/or cannot be
killed even with SIGKILL. After the system reaches a load of about 1400,
it becomes totally unresponsive, and the only way to "fix" the problem
is to reboot the system.
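When the gateways hang like this, it can be worth capturing which tasks are blocked before rebooting. A minimal sketch (the SysRq write needs root, so it is guarded):

```shell
# List tasks in uninterruptible sleep (state D); these are the
# processes that inflate the load average and ignore SIGKILL.
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'

# Dump kernel stack traces of all blocked tasks into the kernel log
# (equivalent to SysRq-w); skipped when not running as root.
[ -w /proc/sysrq-trigger ] && echo w > /proc/sysrq-trigger

# Inspect the resulting traces.
dmesg | tail -n 50
```

The wchan column and the SysRq-w traces usually show whether the tasks are stuck in rbd, XFS, or somewhere else in the block layer, which would help narrow down where the hang originates.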

I tried to reproduce the problem manually by simultaneously reading
and writing from several machines, but the problem didn't appear.
I have no idea where the error could be. I ran a "ceph tell osd.*
bench" while the problem was occurring, and all OSDs showed normal
benchmark results. Does anyone have an idea how this can happen? If
you need any more information, please let me know.

Regards,
Christian

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Christian Eichelmann
Systemadministrator

1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelmann@xxxxxxxx

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan
Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren











