Ceph writes stall for long periods with no disk/network activity

On Mon, 04 Aug 2014 15:32:50 -0500, Mark Nelson
<mark.nelson at inktank.com> wrote:

> On 08/04/2014 03:28 PM, Chris Kitzmiller wrote:
> > On Aug 1, 2014, at 1:31 PM, Mariusz Gronczewski wrote:
> >> I'm getting weird stalling during writes; sometimes I get the same
> >> write speed for a few minutes, and after some time it starts stalling
> >> at 0 MB/s for minutes.
> >
> > I'm getting very similar behavior on my cluster. My writes start well but then just kinda stop for a while and then bump along slowly until the bench finishes. I've got a thread about it going here called "Ceph runs great then falters".
>
> This kind of behaviour often results when the journal can write much
> faster than the OSD data disks.  Initially the journals will be able to
> absorb the data and things will run along well, but eventually ceph will
> need to stall writes if things get too out of sync.  You may want to
> take a look at what's happening on the data disks during your tests to
> see if there's anything that looks suspect.  Checking the admin socket
> for dump_historic_ops might provide some clues as well.
>
> Mark

I did check the journals already; they are on the same disk as the data
(a separate partition), and during stalls there is no traffic to either
of them (around 8 IOPS on average with 0% I/O wait).
I'm 99.99% sure the disks are OK, as they ran for a few months in a
Cassandra cluster before, and when I was doing some Ceph rebalancing and
adding/removing OSDs they ran constantly with no stalls.
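
To follow Mark's dump_historic_ops suggestion, this is roughly how I'm pulling the slowest ops out of the admin socket output. The parsing below is only a sketch against the JSON shape my OSDs emit (the "Ops"/"duration"/"description" field names are what I see on my version and may differ on others); a canned sample stands in for the live socket dump:

```python
import json

# Live usage would be something like:
#   ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_historic_ops > ops.json
# The sample below mimics the JSON shape my OSDs emit; the field names
# ("Ops", "duration", "description") are assumptions from my version.
sample = '''
{ "num_ops": 2,
  "Ops": [
    { "description": "osd_op(client.190830.0:176 ... [write 0~4194304])",
      "received_at": "2014-08-05 15:31:03.313858",
      "duration": "120.167272" },
    { "description": "osd_op(client.190830.0:42 ... [write 0~4194304])",
      "received_at": "2014-08-05 15:29:55.000000",
      "duration": "0.084210" } ] }
'''

def slowest_ops(raw, top=5):
    """Return (duration_secs, description) tuples, slowest first."""
    ops = json.loads(raw).get("Ops", [])
    ranked = sorted(((float(o["duration"]), o["description"]) for o in ops),
                    reverse=True)
    return ranked[:top]

for dur, desc in slowest_ops(sample):
    print("%10.3fs  %s" % (dur, desc))
```

Running that against each OSD during a stall makes it easy to spot whether one op has been stuck for minutes while everything else completes quickly.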

If I look into the logs of the OSDs that have slow ops, I get something
like this:

    2014-08-05 15:31:33.461481 7fbff4fd3700  0 log [WRN] : slow request 30.147566 seconds old, received at 2014-08-05 15:31:03.313858: osd_op(client.190830.0:176 benchmark_data_blade103.non.3dart.com_31565_object175 [write 0~4194304] 7.16da2754 ack+ondisk+write e864) v4 currently waiting for subops from 2,6
    2014-08-05 15:32:03.467775 7fbff4fd3700  0 log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 60.153871 secs
    2014-08-05 15:32:03.467794 7fbff4fd3700  0 log [WRN] : slow request 60.153871 seconds old, received at 2014-08-05 15:31:03.313858: osd_op(client.190830.0:176 benchmark_data_blade103.non.3dart.com_31565_object175 [write 0~4194304] 7.16da2754 ack+ondisk+write e864) v4 currently waiting for subops from 2,6
    2014-08-05 15:33:03.481163 7fbff4fd3700  0 log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 120.167272 secs
    2014-08-05 15:33:03.481170 7fbff4fd3700  0 log [WRN] : slow request 120.167272 seconds old, received at 2014-08-05 15:31:03.313858: osd_op(client.190830.0:176 benchmark_data_blade103.non.3dart.com_31565_object175 [write 0~4194304] 7.16da2754 ack+ondisk+write e864) v4 currently waiting for subops from 2,6
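
To see whether the same replicas keep blocking, I tally which OSDs show up in the "waiting for subops from" warnings. A quick sketch over log lines like the ones above (the regex is just an assumption about the warning format):

```python
import re
from collections import Counter

# Count which replica OSDs the slow requests are waiting on, from
# "currently waiting for subops from 2,6"-style warnings.
SUBOP_RE = re.compile(r"waiting for subops from ([\d,]+)")

def blocked_osds(log_lines):
    counts = Counter()
    for line in log_lines:
        m = SUBOP_RE.search(line)
        if m:
            counts.update(int(x) for x in m.group(1).split(","))
    return counts

log = [
    "... slow request 30.147566 seconds old ... currently waiting for subops from 2,6",
    "... slow request 60.153871 seconds old ... currently waiting for subops from 2,6",
    "... slow request 120.167272 seconds old ... currently waiting for subops from 2,6",
]
print(blocked_osds(log))
```

In my case the counts are spread across different OSDs on each benchmark run, which is what makes me doubt a single bad disk.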

but when I look at osd.2 I only get cryptic messages like these:

    2014-08-05 14:39:34.708788 7ff5d36f4700  0 -- 10.100.245.22:6800/3540 >> 10.100.245.24:6800/3540 pipe(0x54ada00 sd=157 :6800 s=2 pgs=26 cs=5 l=0 c=0x5409340).fault with nothing to send, going to standby
    2014-08-05 14:39:35.594447 7ff5d0ecc700  0 -- 10.100.245.22:6800/3540 >> 10.100.245.25:6800/3551 pipe(0x41fd500 sd=141 :60790 s=2 pgs=21 cs=5 l=0 c=0x51bd960).fault with nothing to send, going to standby
    2014-08-05 14:39:37.594901 7ff5d40fe700  0 -- 10.100.245.22:6800/3540 >> 10.100.245.25:6802/3709 pipe(0x54adc80 sd=149 :35693 s=2 pgs=24 cs=5 l=0 c=0x51bee00).fault with nothing to send, going to standby
    2014-08-05 14:39:53.891172 7ff5d15d3700  0 -- 10.100.245.22:6800/3540 >> 10.100.245.24:6802/3694 pipe(0x54a8500 sd=137 :60823 s=2 pgs=24 cs=5 l=0 c=0x5409080).fault with nothing to send, going to standby
    2014-08-05 14:40:01.410307 7ff5d4905700  0 -- 10.100.245.22:6800/3540 >> 10.100.245.23:6802/3689 pipe(0x41f9180 sd=133 :39288 s=2 pgs=28 cs=5 l=0 c=0x4956f60).fault with nothing to send, going to standby
    2014-08-05 14:40:02.428402 7ff5d1ddb700  0 -- 10.100.245.22:6800/3540 >> 10.100.245.23:6800/3535 pipe(0x41fee00 sd=145 :34278 s=2 pgs=23 cs=5 l=0 c=0x51bad60).fault with nothing to send, going to standby
    2014-08-05 15:30:50.752795 7ff5d29e7700  0 -- 10.100.245.22:6800/3540 >> 10.100.245.25:6800/3551 pipe(0x54ad780 sd=153 :6800 s=0 pgs=0 cs=0 l=0 c=0x51ba260).accept connect_seq 6 vs existing 5 state standby
    2014-08-05 15:30:51.475488 7ff5d2cea700  0 -- 10.100.245.22:6800/3540 >> 10.100.245.25:6802/3709 pipe(0x6002080 sd=158 :6800 s=0 pgs=0 cs=0 l=0 c=0x51b94a0).accept connect_seq 6 vs existing 5 state standby
    2014-08-05 15:30:51.600661 7ff5d32f0700  0 -- 10.100.245.22:6800/3540 >> 10.100.245.24:6802/3694 pipe(0x6001e00 sd=149 :6800 s=0 pgs=0 cs=0 l=0 c=0x4955280).accept connect_seq 6 vs existing 5 state standby
    2014-08-05 15:30:51.901849 7ff5d18d6700  0 -- 10.100.245.22:6800/3540 >> 10.100.245.23:6800/3535 pipe(0x6001180 sd=22 :6800 s=0 pgs=0 cs=0 l=0 c=0x4956720).accept connect_seq 6 vs existing 5 state standby
    2014-08-05 15:30:54.529900 7ff5d0cca700  0 -- 10.100.245.22:6800/3540 >> 10.100.245.23:6802/3689 pipe(0x54ae400 sd=42 :6800 s=0 pgs=0 cs=0 l=0 c=0x51b91e0).accept connect_seq 6 vs existing 5 state standby


and similar on the other blocked OSD:

    2014-08-05 14:39:13.606907 7f34d2140700  0 -- 10.100.245.24:6800/3540 >> 10.100.245.21:6802/3686 pipe(0x4c4a800 sd=183 :6800 s=2 pgs=32 cs=5 l=0 c=0x3848580).fault with nothing to send, going to standby
    2014-08-05 14:39:34.709150 7f34d3958700  0 -- 10.100.245.24:6800/3540 >> 10.100.245.22:6800/3540 pipe(0x39f9b80 sd=166 :43213 s=2 pgs=27 cs=5 l=0 c=0x38494a0).fault with nothing to send, going to standby
    2014-08-05 14:39:38.693139 7f34cee0e700  0 -- 10.100.245.24:6800/3540 >> 10.100.245.25:6800/3551 pipe(0x3291b80 sd=167 :51687 s=2 pgs=23 cs=5 l=0 c=0x384a680).fault with nothing to send, going to standby
    2014-08-05 14:39:38.712965 7f34d2b4a700  0 -- 10.100.245.24:6800/3540 >> 10.100.245.21:6800/3532 pipe(0x4c4ad00 sd=195 :6800 s=2 pgs=31 cs=5 l=0 c=0x3848160).fault with nothing to send, going to standby
    2014-08-05 14:39:42.694641 7f34d4961700  0 -- 10.100.245.24:6800/3540 >> 10.100.245.25:6802/3709 pipe(0x4c4aa80 sd=179 :6800 s=2 pgs=21 cs=5 l=0 c=0x3848b00).fault with nothing to send, going to standby
    2014-08-05 14:39:42.712340 7f34d3b5a700  0 -- 10.100.245.24:6800/3540 >> 10.100.245.22:6802/3694 pipe(0x39f9680 sd=33 :47495 s=2 pgs=31 cs=5 l=0 c=0x384ee00).fault with nothing to send, going to standby
    2014-08-05 14:39:44.493155 7f34d3453700  0 -- 10.100.245.24:6800/3540 >> 10.100.245.23:6800/3535 pipe(0x4c48c80 sd=202 :6800 s=2 pgs=29 cs=5 l=0 c=0x3848f20).fault with nothing to send, going to standby
    2014-08-05 14:39:59.387037 7f34d4c64700  0 -- 10.100.245.24:6800/3540 >> 10.100.245.23:6802/3689 pipe(0x4c49b80 sd=34 :6800 s=2 pgs=25 cs=5 l=0 c=0x3849600).fault with nothing to send, going to standby
    2014-08-05 15:30:51.911827 7f34cec0c700  0 -- 10.100.245.24:6800/3540 >> 10.100.245.22:6802/3694 pipe(0x39fc880 sd=116 :6800 s=0 pgs=0 cs=0 l=0 c=0x3b714a0).accept connect_seq 6 vs existing 5 state standby
    2014-08-05 15:30:52.276484 7f34d2241700  0 -- 10.100.245.24:6800/3540 >> 10.100.245.22:6800/3540 pipe(0x39fc100 sd=118 :6800 s=0 pgs=0 cs=0 l=0 c=0x3b73b20).accept connect_seq 6 vs existing 5 state standby

and the 'bad' OSDs change when I rerun the benchmark, so it's not as if
there are one or two bad nodes.
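
To rule out one flaky peer, I also tally the messenger "fault ... going to standby" events per remote address. Another rough sketch over log lines like the ones quoted above (again, the regex just assumes the pipe-fault message format shown):

```python
import re
from collections import Counter

# Tally msgr pipe faults per remote peer address, from OSD log lines like
# "... >> 10.100.245.24:6800/3540 pipe(...).fault with nothing to send ..."
PEER_RE = re.compile(r">> (\d+\.\d+\.\d+\.\d+):\d+/\d+ .*fault with nothing to send")

def faults_per_peer(lines):
    return Counter(m.group(1) for m in (PEER_RE.search(l) for l in lines) if m)

lines = [
    "... 10.100.245.22:6800/3540 >> 10.100.245.24:6800/3540 pipe(...).fault with nothing to send, going to standby",
    "... 10.100.245.22:6800/3540 >> 10.100.245.25:6800/3551 pipe(...).fault with nothing to send, going to standby",
    "... 10.100.245.22:6800/3540 >> 10.100.245.25:6802/3709 pipe(...).fault with nothing to send, going to standby",
]
print(faults_per_peer(lines))
```

The faults are distributed fairly evenly across all peers here, which again points away from a single bad node or NIC.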


I've checked for network and I/O load on every node and they are simply
not doing anything: no kernel errors, and those nodes worked fine under
load when they were in use before.



--
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczewski at efigence.com

