Re: deep scrubbing causes osd down

Christian Balzer <chibi@xxxxxxx> · Mon, 13 Apr 2015 10:02:06 +0900

Hello,

On Sun, 12 Apr 2015 22:01:06 +0100 (BST) Andrei Mikhailovsky wrote:

> JC, 
> 
> Thanks 
> 
> I think the max scrub option that you refer to is a value per osd and
> not per cluster. So, the default is not to run more than 1 scrub per
> osd. So, if you have 100 osds by default it will not run more than 100
> scurb processes at the same time. However, I want to limit the number on
> a cluster basis rather than on an osd basis. 
>
Yeah, that's a request/question that comes up frequently. 
And so far there's no option in Ceph to do that (AFAIK), it would be
really nice along with scheduling options (don't scrub during peak hours),
which have also been talked about.

So for you right now, the manual scheduling is the best solution as
pointed out by Jean-Charles. 

If your cluster is fast enough to finish a full deep-scrub in a night, you
can even forgo the cron jobs and setting the cluster to nodeep-scrub.

I have a cluster where I kicked off a manual deep-scrub of all OSDs on a
Saturday morning at 0:05 and it finished at 3:25. 
So now, with the default scrub intervals, all subsequent deep scrubs
happen in the same time frame (weekly) and all normal scrubs in the same
time as well (daily, unless it'd deep scrub day). 

Of course ultimately you really want to get your cluster into a shape where
scrubbing doesn't make it unstable by improving your hardware and scale.

Christian

> Andrei 
> 
> ----- Original Message -----
> 
> > From: "Jean-Charles Lopez" <jelopez@xxxxxxxxxx>
> > To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
> > Cc: ceph-users@xxxxxxxxxxxxxx
> > Sent: Sunday, 12 April, 2015 5:17:10 PM
> > Subject: Re:  deep scrubbing causes osd down
> 
> > Hi andrei
> 
> > There is one parameter, osd_max_scrub I think, that controls the
> > number of scrubs per OSD. But the default is 1 if I'm correct.
> 
> > Can you check on one of your OSDs with the admin socket?
> 
> > Then it remains the option of scheduling the deep scrubs via a cron
> > job after setting nodeep-scrub to prevent automatic deep scrubbing.
> 
> > Dan Van Der Ster had a post on this ML regarding this.
> > JC
> 
> > While moving. Excuse unintended typos.
> 
> > On Apr 12, 2015, at 05:21, Andrei Mikhailovsky < andrei@xxxxxxxxxx >
> > wrote:
> 
> > > JC,
> > 
> 
> > > the restart of the osd servers seems to have stabilised the
> > > cluster.
> > > It has been a few hours since the restart and I haven't not seen a
> > > single osd disconnect.
> > 
> 
> > > Is there a way to limit the total number of scrub and/or deep-scrub
> > > processes running at the same time? For instance, I do not want to
> > > have more than 1 or 2 scrub/deep-scrubs running at the same time on
> > > my cluster. How do I implement this?
> > 
> 
> > > Thanks
> > 
> 
> > > Andrei
> > 
> 
> > > ----- Original Message -----
> > 
> 
> > > > From: "Andrei Mikhailovsky" < andrei@xxxxxxxxxx >
> > > 
> > 
> > > > To: "LOPEZ Jean-Charles" < jelopez@xxxxxxxxxx >
> > > 
> > 
> > > > Cc: ceph-users@xxxxxxxxxxxxxx
> > > 
> > 
> > > > Sent: Sunday, 12 April, 2015 9:02:05 AM
> > > 
> > 
> > > > Subject: Re:  deep scrubbing causes osd down
> > > 
> > 
> 
> > > > JC,
> > > 
> > 
> 
> > > > I've implemented the following changes to the ceph.conf and
> > > > restarted
> > > > mons and osds.
> > > 
> > 
> 
> > > > osd_scrub_chunk_min = 1
> > > 
> > 
> > > > osd_scrub_chunk_max =5
> > > 
> > 
> 
> > > > Things have become considerably worse after the changes. Shortly
> > > > after doing that, majority of osd processes started taking up
> > > > over
> > > > 100% cpu and the cluster has considerably slowed down. All my vms
> > > > are reporting high IO wait (between 30-80%), even vms which are
> > > > pretty idle and don't do much.
> > > 
> > 
> 
> > > > i have tried restarting all osds, but shortly after the restart
> > > > the
> > > > cpu usage goes up. The osds are showing the following logs:
> > > 
> > 
> 
> > > > 2015-04-12 08:39:28.853860 7f96f81dd700 0 log_channel(default)
> > > > log
> > > > [WRN] : slow request 60.277590 seconds old, received at
> > > > 2015-04-12
> > > > 08:38:28.576168: osd_op(client.69637439.0:290325926
> > > > rbd_data.265f967a5f7514.0000000000004a00 [set-alloc-hint
> > > > object_size
> > > > 4194304 write_size 4194304,write 1249280~4096] 5.cb2620e0 snapc
> > > > ac=[ac] ack+ondisk+write+known_if_redirected e74834) currently
> > > > waiting for missing object
> > > 
> > 
> > > > 2015-04-12 08:39:28.853863 7f96f81dd700 0 log_channel(default)
> > > > log
> > > > [WRN] : slow request 60.246943 seconds old, received at
> > > > 2015-04-12
> > > > 08:38:28.606815: osd_op(client.69637439.0:290325927
> > > > rbd_data.265f967a5f7514.0000000000004a00 [set-alloc-hint
> > > > object_size
> > > > 4194304 write_size 4194304,write 1310720~4096] 5.cb2620e0 snapc
> > > > ac=[ac] ack+ondisk+write+known_if_redirected e74834) currently
> > > > waiting for missing object
> > > 
> > 
> > > > 2015-04-12 08:39:36.855180 7f96f81dd700 0 log_channel(default)
> > > > log
> > > > [WRN] : 7 slow requests, 1 included below; oldest blocked for >
> > > > 68.278951 secs
> > > 
> > 
> > > > 2015-04-12 08:39:36.855191 7f96f81dd700 0 log_channel(default)
> > > > log
> > > > [WRN] : slow request 30.268450 seconds old, received at
> > > > 2015-04-12
> > > > 08:39:06.586669: osd_op(client.64965167.0:1607510
> > > > rbd_data.1f264b2ae8944a.0000000000000228 [set-alloc-hint
> > > > object_size
> > > > 4194304 write_size 4194304,write 3584000~69632] 5.30418007
> > > > ack+ondisk+write+known_if_redirected e74834) currently waiting
> > > > for
> > > > subops from 9
> > > 
> > 
> > > > 2015-04-12 08:40:43.570004 7f96dd693700 0 <cls>
> > > > cls/rgw/cls_rgw.cc:1458: gc_iterate_entries
> > > > end_key=1_01428824443.569998000
> > > 
> > 
> 
> > > > [In total i've got around 40,000 slow request entries accumulated
> > > > overnight ((( ]
> > > 
> > 
> 
> > > > On top of that, I have reports of osds going down and back up as
> > > > frequently as every 10-20 minutes. This effects all osds and not
> > > > a
> > > > particular set of osds.
> > > 
> > 
> 
> > > > I will restart the osd servers to see if it makes a difference,
> > > > otherwise, I will need to revert back to the default settings as
> > > > the
> > > > cluster as it currently is is not functional.
> > > 
> > 
> 
> > > > Andrei
> > > 
> > 
> 
> > > > ----- Original Message -----
> > > 
> > 
> 
> > > > > From: "LOPEZ Jean-Charles" < jelopez@xxxxxxxxxx >
> > > > 
> > > 
> > 
> > > > > To: "Andrei Mikhailovsky" < andrei@xxxxxxxxxx >
> > > > 
> > > 
> > 
> > > > > Cc: "LOPEZ Jean-Charles" < jelopez@xxxxxxxxxx >,
> > > > > ceph-users@xxxxxxxxxxxxxx
> > > > 
> > > 
> > 
> > > > > Sent: Saturday, 11 April, 2015 7:54:18 PM
> > > > 
> > > 
> > 
> > > > > Subject: Re:  deep scrubbing causes osd down
> > > > 
> > > 
> > 
> 
> > > > > Hi Andrei,
> > > > 
> > > 
> > 
> 
> > > > > 1) what ceph version are you running?
> > > > 
> > > 
> > 
> > > > > 2) what distro and version are you running?
> > > > 
> > > 
> > 
> > > > > 3) have you checked the disk elevator for the OSD devices to be
> > > > > set
> > > > > to cfq?
> > > > 
> > > 
> > 
> > > > > 4) Have have you considered exploring the following parameters
> > > > > to
> > > > > further tune
> > > > 
> > > 
> > 
> > > > > - osd_scrub_chunk_min lower the default value of 5. e.g. = 1
> > > > 
> > > 
> > 
> > > > > - osd_scrub_chunk_max lower the default value of 25. e.g. = 5
> > > > 
> > > 
> > 
> 
> > > > > - osd_deep_scrub_stride If you have lowered parameters above,
> > > > > you
> > > > > can
> > > > > play with this one to fit best your physical disk behaviour. -
> > > > > osd_scrub_sleep introduce a half second sleep between 2 scrubs;
> > > > > e.g.
> > > > > = 0.5 to start with a half second delay
> > > > 
> > > 
> > 
> 
> > > > > Cheers
> > > > 
> > > 
> > 
> > > > > JC
> > > > 
> > > 
> > 
> 
> > > > > > On 10 Apr 2015, at 12:01, Andrei Mikhailovsky <
> > > > > > andrei@xxxxxxxxxx
> > > > > > >
> > > > > > wrote:
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Hi guys,
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > I was wondering if anyone noticed that the deep scrubbing
> > > > > > process
> > > > > > causes some osd to go down?
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > I have been keeping an eye on a few remaining stability
> > > > > > issues
> > > > > > in
> > > > > > my
> > > > > > test cluster. One of the unsolved issues is the occasional
> > > > > > reporting
> > > > > > of osd(s) going down and coming back up after about 20-30
> > > > > > seconds.
> > > > > > This happens to various osds throughout the cluster. I have a
> > > > > > small
> > > > > > cluster of just 2 osd servers with 9 osds each.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > The common trend that i see week after week is that whenever
> > > > > > there
> > > > > > is
> > > > > > a long deep scrubbing activity on the cluster it triggers one
> > > > > > or
> > > > > > more osds to go down for a short period of time. After the
> > > > > > osd
> > > > > > is
> > > > > > marked down, it goes back up after about 20 seconds.
> > > > > > Obviously
> > > > > > there
> > > > > > is a repair process that kicks in which causes more load on
> > > > > > the
> > > > > > cluster. While looking at the logs, i've not seen the osds
> > > > > > being
> > > > > > marked down when the cluster is not deep scrubbing. It
> > > > > > _always_
> > > > > > happens when there is a deep scrub activity. I am seeing the
> > > > > > reports
> > > > > > of osds going down about 3-4 times a week.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > The latest happened just recently with the following log
> > > > > > entries:
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 2015-04-10 19:32:48.330430 mon.0 192.168.168.13:6789/0
> > > > > > 3441533
> > > > > > :
> > > > > > cluster [INF] pgmap v50849466: 8508 pgs: 8506 active+clean, 2
> > > > > > active+clean+scrubbing+deep; 13213 GB data, 26896 GB used,
> > > > > > 23310
> > > > > > GB
> > > > > > / 50206 GB avail; 1005 B/s rd, 1005
> > > > > 
> > > > 
> > > 
> > 
> > > > > > B/s wr, 0 op/s
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:52.950633 mon.0 192.168.168.13:6789/0
> > > > > > 3441542
> > > > > > :
> > > > > > cluster [INF] osd.6 192.168.168.200:6816/3738 failed (5
> > > > > > reports
> > > > > > from
> > > > > > 5 peers after 60.747890 >= grace 46.701350)
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.121904 mon.0 192.168.168.13:6789/0
> > > > > > 3441544
> > > > > > :
> > > > > > cluster [INF] osdmap e74309: 18 osds: 17 up, 18 in
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.231730 mon.0 192.168.168.13:6789/0
> > > > > > 3441545
> > > > > > :
> > > > > > cluster [INF] pgmap v50849467: 8508 pgs: 599
> > > > > > stale+active+clean,
> > > > > > 7907 active+clean, 1 stale+active+clean+scrubbing+deep, 1
> > > > > > active+clean+scrubbing+deep; 13213 GB data, 26896 GB used,
> > > > > > 23310
> > > > > > GB
> > > > > > / 50206 GB avail; 375 B/s rd, 0 op/s
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > osd.6 logs around the same time are:
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 2015-04-10 19:16:29.110617 7fad6d5ec700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [INF] : 5.3d7 deep-scrub ok
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:27:47.561389 7fad6bde9700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [INF] : 5.276 deep-scrub ok
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:31:11.611321 7fad6d5ec700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [INF] : 5.287 deep-scrub ok
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:31:53.339881 7fad7ce0b700 1 heartbeat_map
> > > > > > is_healthy
> > > > > > 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:31:53.339887 7fad7ce0b700 1 heartbeat_map
> > > > > > is_healthy
> > > > > > 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:31:53.339890 7fad7ce0b700 1 heartbeat_map
> > > > > > is_healthy
> > > > > > 'OSD::osd_op_tp thread 0x7fad705f2700' had timed out after 15
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:31:53.340050 7fad7e60e700 1 heartbeat_map
> > > > > > is_healthy
> > > > > > 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:31:53.340053 7fad7e60e700 1 heartbeat_map
> > > > > > is_healthy
> > > > > > 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > [.....]
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 2015-04-10 19:32:53.010609 7fad7e60e700 1 heartbeat_map
> > > > > > is_healthy
> > > > > > 'FileStore::op_tp thread 0x7fad86132700' had timed out after
> > > > > > 60
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.010611 7fad7e60e700 1 heartbeat_map
> > > > > > is_healthy
> > > > > > 'FileStore::op_tp thread 0x7fad88937700' had timed out after
> > > > > > 60
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.111470 7fad66ed2700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6837/4409
> > > > > > pipe(0x2b793b80 sd=179 :6817 s=2 pgs=5 cs=1 l=0
> > > > > > c=0x21e8b420).fault
> > > > > > with nothing to send, going to standby
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.111496 7fad6329d700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6827/4208
> > > > > > pipe(0x2b793600 sd=172 :6817 s=2 pgs=7 cs=1 l=0
> > > > > > c=0x1791ab00).fault
> > > > > > with nothing to send, going to standby
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.111463 7fad55bd0700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6822/3910
> > > > > > pipe(0x2cb55dc0 sd=262 :6817 s=2 pgs=8 cs=1 l=0
> > > > > > c=0xe7802c0).fault
> > > > > > with nothing to send, going to standby
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.121815 7fad6218c700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6807/3575
> > > > > > pipe(0x2cf8e080 sd=294 :6817 s=2 pgs=4 cs=1 l=0
> > > > > > c=0x138669a0).fault
> > > > > > with nothing to send, going to standby
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.121856 7fad67bdf700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442
> > > > > > pipe(0x2b792580 sd=190 :6817 s=2 pgs=9 cs=1 l=0
> > > > > > c=0x138922c0).fault
> > > > > > with nothing to send, going to standby
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.123545 7fad651bc700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6801/3053
> > > > > > pipe(0x15e538c0 sd=260 :6817 s=2 pgs=1 cs=1 l=0
> > > > > > c=0x16bf09a0).fault
> > > > > > with nothing to send, going to standby
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.128729 7fad53eb3700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6832/4257
> > > > > > pipe(0x37dcb80 sd=311 :6817 s=2 pgs=3 cs=1 l=0
> > > > > > c=0x1131f420).fault
> > > > > > with nothing to send, going to standby
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.132691 7fad53fb4700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6812/3674
> > > > > > pipe(0x37dbdc0 sd=417 :6817 s=2 pgs=3 cs=1 l=0
> > > > > > c=0x1c0badc0).fault
> > > > > > with nothing to send, going to standby
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:32:53.139856 7fad65cc7700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864
> > > > > > pipe(0x2cf8eb80 sd=365 :6817 s=2 pgs=5 cs=1 l=0
> > > > > > c=0xbd1f2c0).fault
> > > > > > with nothing to send, going to standby
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > [ followed by about 500 or so lines which look similar to:]
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.021252 7fad65cc7700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442
> > > > > > pipe(0x2b792580 sd=24 :45523 s=2 pgs=1721 cs=3371 l=0
> > > > > > c=0x138922c0).fault, initiating reconnect
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.021280 7fad53eb3700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6807/3575
> > > > > > pipe(0x2cf8e080 sd=37 :51859 s=2 pgs=1755 cs=3449 l=0
> > > > > > c=0x138669a0).fault, initiating reconnect
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.021443 7fad66ed2700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6812/3674
> > > > > > pipe(0x37dbdc0 sd=18 :46453 s=2 pgs=1782 cs=3489 l=0
> > > > > > c=0x1c0badc0).fault, initiating reconnect
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.021702 7fad6218c700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6837/4409
> > > > > > pipe(0x2b793b80 sd=38 :53724 s=2 pgs=1757 cs=3449 l=0
> > > > > > c=0x21e8b420).fault, initiating reconnect
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > [followed by:]
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 2015-04-10 19:33:09.021830 7fad8e142700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [WRN] : 6 slow requests, 6 included below; oldest blocked for
> > > > > > >
> > > > > > 94.284115 secs
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.021834 7fad67bdf700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864
> > > > > > pipe(0x2cf8eb80 sd=29 :50115 s=2 pgs=1717 cs=3395 l=0
> > > > > > c=0xbd1f2c0).fault, initiating reconnect
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.021840 7fad8e142700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [WRN] : slow request 62.902554 seconds old, received at
> > > > > > 2015-04-10
> > > > > > 19:32:06.119164: osd_op(client.64965167.0:1562644
> > > > > > rbd_data.1f264b2ae8944a.0000000000000400 [set-alloc-hint
> > > > > > object_size
> > > > > > 4194304 write_size 4194304,write 1155072~4096] 5.f4cd34bb
> > > > > > ack+ondisk+write+known_if_redirected e74308) currently
> > > > > > waiting
> > > > > > for
> > > > > > subops from 13
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.021846 7fad8e142700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [WRN] : slow request 63.004012 seconds old, received at
> > > > > > 2015-04-10
> > > > > > 19:32:06.017706: osd_op(client.69637439.0:278581863
> > > > > > rbd_data.265f967a5f7514.0000000000064e2e [set-alloc-hint
> > > > > > object_size
> > > > > > 4194304 write_size 4194304,write 1040384~4096] 5.bf3d2652
> > > > > > snapc
> > > > > > ac=[ac] ack+ondisk+write+known_if_redirected e74308)
> > > > > > currently
> > > > > > waiting for subops from 11
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.021851 7fad8e142700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [WRN] : slow request 94.284115 seconds old, received at
> > > > > > 2015-04-10
> > > > > > 19:31:34.737603: osd_sub_op(unknown.0.0:0 5.299 0//0//-1
> > > > > > [scrub-map]
> > > > > > v 0'0 snapset=0=[]:[] snapc=0=[]) currently started
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.021856 7fad8e142700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [WRN] : slow request 92.669022 seconds old, received at
> > > > > > 2015-04-10
> > > > > > 19:31:36.352696: osd_op(client.69645050.0:2750479
> > > > > > rbd_data.1b704a8f51a837.000000000000023d [stat,set-alloc-hint
> > > > > > object_size 4194304 write_size 4194304,write 1449984~4096]
> > > > > > 5.44a257d7 snapc
> > > > > > 9cc=[9cc,9c1,9b5,9a9,99d,991,984,972,953,2f8]
> > > > > > ack+ondisk+write+known_if_redirected e74308) currently
> > > > > > waiting
> > > > > > for
> > > > > > subops from 17
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.021861 7fad8e142700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [WRN] : slow request 76.733358 seconds old, received at
> > > > > > 2015-04-10
> > > > > > 19:31:52.288360: osd_op(client.64987185.0:26635606
> > > > > > rbd_data.1b652a2ae8944a.0000000000000623 [set-alloc-hint
> > > > > > object_size
> > > > > > 4194304 write_size 4194304,write 1282048~4096] 5.64d42a36
> > > > > > ack+ondisk+write+known_if_redirected e74308) currently
> > > > > > waiting
> > > > > > for
> > > > > > subops from 17
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > [and finally]
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 2015-04-10 19:33:09.024361 7fad5c83b700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6832/4257
> > > > > > pipe(0x37dcb80 sd=35 :55060 s=2 pgs=1719 cs=3359 l=0
> > > > > > c=0x1131f420).fault, initiating reconnect
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.024418 7fad82616700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [WRN] : map e74309 wrongly marked me down
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.024466 7fad53fb4700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6827/4208
> > > > > > pipe(0x2b793600 sd=33 :36954 s=2 pgs=1731 cs=3403 l=0
> > > > > > c=0x1791ab00).fault, initiating reconnect
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.024476 7fad67bdf700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864
> > > > > > pipe(0x2cf8eb80 sd=24 :50133 s=2 pgs=1720 cs=3401 l=0
> > > > > > c=0xbd1f2c0).fault, initiating reconnect
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.024553 7fad65cc7700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442
> > > > > > pipe(0x2b792580 sd=29 :45548 s=2 pgs=1725 cs=3379 l=0
> > > > > > c=0x138922c0).fault, initiating reconnect
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.024629 7fad53eb3700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6807/3575
> > > > > > pipe(0x2cf8e080 sd=37 :51884 s=2 pgs=1759 cs=3457 l=0
> > > > > > c=0x138669a0).fault, initiating reconnect
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.024783 7fad650bb700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442
> > > > > > pipe(0x2b792580 sd=29 :45548 s=1 pgs=1725 cs=3380 l=0
> > > > > > c=0x138922c0).fault
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.025056 7fad6218c700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6837/4409
> > > > > > pipe(0x2b793b80 sd=38 :53747 s=2 pgs=1761 cs=3457 l=0
> > > > > > c=0x21e8b420).fault, initiating reconnect
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:09.025062 7fad61580700 0 --
> > > > > > 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864
> > > > > > pipe(0x2cf8eb80 sd=24 :50140 s=4 pgs=1720 cs=3402 l=0
> > > > > > c=0xbd1f2c0).connect got RESETSESSION but no longer
> > > > > > connecting
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:16.999741 7fad5f964700 0 --
> > > > > > 192.168.168.200:6846/1003738 >> 192.168.168.200:6837/4266
> > > > > > pipe(0xa711580 sd=257 :6846 s=0 pgs=0 cs=0 l=0
> > > > > > c=0x1afbf840).accept
> > > > > > connect_seq 0 vs existing 0 state connecting
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:33:16.999883 7fad5f863700 0 --
> > > > > > 192.168.168.200:6846/1003738 >> 192.168.168.200:6827/3938
> > > > > > pipe(0x2c48a8c0 sd=260 :6846 s=0 pgs=0 cs=0 l=0
> > > > > > c=0x18a85580).accept
> > > > > > connect_seq 0 vs existing 0 state connecting
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:36:26.247402 7fad6bde9700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [INF] : 5.299 deep-scrub ok
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:40:18.577423 7fad725f6700 0 <cls>
> > > > > > cls/rgw/cls_rgw.cc:1458: gc_iterate_entries
> > > > > > end_key=1_01428691218.577419000
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 2015-04-10 19:42:22.709291 7fad6edef700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [INF] : 5.2ae deep-scrub ok
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:45:51.790224 7fad6bde9700 0
> > > > > > log_channel(default)
> > > > > > log
> > > > > > [INF] : 5.2bb deep-scrub ok
> > > > > 
> > > > 
> > > 
> > 
> > > > > > 2015-04-10 19:48:17.022413 7fad5f863700 0 --
> > > > > > 192.168.168.200:6846/1003738 >> 192.168.168.200:6827/3938
> > > > > > pipe(0x2c48a8c0 sd=260 :6846 s=2 pgs=63 cs=1 l=0
> > > > > > c=0x3f81160).fault
> > > > > > with nothing to send, going to standby
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > In the past I have been asked to verify that the networking
> > > > > > is
> > > > > > working okay. I've ran days and days of various tests and
> > > > > > verified
> > > > > > that the network is not the cause of the problems. The
> > > > > > network
> > > > > > is
> > > > > > running perfectly well with no errors reported on the
> > > > > > interfaces
> > > > > > nor
> > > > > > any delays. I ran multitude of tcp and udp tests including
> > > > > > firing
> > > > > > hundreds to thousands connections per second for weeks with
> > > > > > 0%
> > > > > > packet loss or any other network related errors.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Does anyone have any idea what could be causing the osds to
> > > > > > be
> > > > > > marked
> > > > > > down during the deep scrub activity?
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > I have enabled the following tuning options in the ceph.conf:
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > #change to decrease rebuild/recovery priority (should make
> > > > > > guest
> > > > > > vms
> > > > > > more responsive) and decrease scrubbing priority
> > > > > 
> > > > 
> > > 
> > 
> > > > > > osd recovery max chunk = 8388608
> > > > > 
> > > > 
> > > 
> > 
> > > > > > osd recovery op priority = 1
> > > > > 
> > > > 
> > > 
> > 
> > > > > > osd max backfills = 1
> > > > > 
> > > > 
> > > 
> > 
> > > > > > osd recovery max active = 1
> > > > > 
> > > > 
> > > 
> > 
> > > > > > osd recovery threads = 1
> > > > > 
> > > > 
> > > 
> > 
> > > > > > osd_disk_thread_ioprio_priority = 7
> > > > > 
> > > > 
> > > 
> > 
> > > > > > osd_disk_thread_ioprio_class = idle
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > #reporting of down osds
> > > > > 
> > > > 
> > > 
> > 
> > > > > > mon osd min down reporters = 6
> > > > > 
> > > > 
> > > 
> > 
> > > > > > mon osd report timeout = 1800
> > > > > 
> > > > 
> > > 
> > 
> > > > > > mon osd min down reports = 7
> > > > > 
> > > > 
> > > 
> > 
> > > > > > osd heartbeat grace = 60
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Many thanks for your help
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Andrei
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > _______________________________________________
> > > > > 
> > > > 
> > > 
> > 
> > > > > > ceph-users mailing list
> > > > > 
> > > > 
> > > 
> > 
> > > > > > ceph-users@xxxxxxxxxxxxxx
> > > > > 
> > > > 
> > > 
> > 
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > _______________________________________________
> > > 
> > 
> > > > ceph-users mailing list
> > > 
> > 
> > > > ceph-users@xxxxxxxxxxxxxx
> > > 
> > 
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > 

-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com