Re: ONE pg deep-scrub blocks cluster

Hello Guys,

The issue still exists :(

If we run "ceph pg deep-scrub 0.223", nearly all VMs stall for a while (blocked requests).

- We already replaced the OSDs (SAS disks, journal on NVMe)
- Removed OSDs so that the acting set for PG 0.223 changed
- Checked the filesystem on the acting OSDs
- Changed the tunables back from jewel to default
- Changed the tunables from default back to jewel
- Ran a deep-scrub on all OSDs (ceph osd deep-scrub osd.<id>); only a deep-scrub on PG 0.223 produces blocked requests

The deep-scrub on PG 0.223 always takes 13-15 minutes to finish, no matter which OSDs are in the acting set for this PG.

So I have no idea what could be causing this.

As long as "ceph osd set nodeep-scrub" is in effect, so that no deep-scrub runs on 0.223, the cluster is fine!
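For reference, this is roughly how I work around it and how I trigger the test runs by hand; the grep pattern in the last command is only an example of what I watch for:

# ceph osd set nodeep-scrub          (workaround: no deep-scrubs at all)
# ceph osd unset nodeep-scrub        (re-enable for a test run)
# ceph pg deep-scrub 0.223           (trigger only the problematic PG)
# ceph health detail                 (shows the blocked requests while it runs)
# ceph -w | grep -i -E 'blocked|slow request'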

Could this be a bug?

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
Kernel: 4.4.0-31-generic #50-Ubuntu

Any ideas?
- Mehmet



On 2016-08-02 17:57, c wrote:
On 2016-08-02 13:30, c wrote:
Hello Guys,

This time without the original acting set osd.4, 16 and 28. The issue
still exists...

[...]
For the record, this ONLY happens with this PG and no others that
share the same OSDs, right?

Yes, right.
[...]
When doing the deep-scrub, monitor (atop, etc.) all 3 nodes and see
if a particular OSD (HDD) stands out, as I would expect it to.

Now I logged all disks via atop every 2 seconds while the deep-scrub
was running ( atop -w osdXX_atop 2 ).
As you expected, all disks were 100% busy, with a constant 150MB
(osd.4), 130MB (osd.28) and 170MB (osd.16)...

- osd.4 (/dev/sdf) http://slexy.org/view/s21emd2u6j [1]
- osd.16 (/dev/sdm): http://slexy.org/view/s20vukWz5E [2]
- osd.28 (/dev/sdh): http://slexy.org/view/s20YX0lzZY [3]
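(The raw atop files above can be replayed later in atop's read mode, e.g. for osd.4, assuming the file name from the -w run above:)

# atop -r osd4_atop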
[...]
But what is causing this? A deep-scrub on all other disks - same
model and ordered at the same time - does not seem to have this issue.
[...]
Next week, I will do the following:

1.1 Remove osd.4 completely from Ceph - again (the current primary
for PG 0.223)

osd.4 is now removed completely.
The primary OSD for this PG is now osd.9.

# ceph pg map 0.223
osdmap e8671 pg 0.223 (0.223) -> up [9,16,28] acting [9,16,28]

1.2 xfs_repair -n /dev/sdf1 (osd.4): to check for possible errors

xfs_repair did not find/show any errors

1.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"

Because osd.9 is now the primary for this PG, I set debug_osd on it too:
ceph tell osd.9 injectargs "--debug_osd 5/5"
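(For reference: afterwards I reset the level the same way, the reset value being whatever the previous default was, assumed 0/5 here:)

# ceph tell osd.9 injectargs "--debug_osd 0/5"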

Then I ran the deep-scrub on 0.223 (and again nearly all of my VMs
stopped working for a while).
Start @ 15:33:27
End @ 15:48:31

The "ceph.log"
- http://slexy.org/view/s2WbdApDLz
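(The scrub start/end and the slow-request warnings for this run can be pulled out of the cluster log roughly like this; the path is the default location, and the exact wording of the warnings may differ slightly:)

# grep -E '0\.223 deep-scrub|slow request' /var/log/ceph/ceph.log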

The related log files (OSDs 9, 16 and 28) and the atop logs for the OSDs:

LogFile - osd.9 (/dev/sdk)
- ceph-osd.9.log: http://slexy.org/view/s2kXeLMQyw
- atop Log: http://slexy.org/view/s21wJG2qr8

LogFile - osd.16 (/dev/sdh)
- ceph-osd.16.log: http://slexy.org/view/s20D6WhD4d
- atop Log: http://slexy.org/view/s2iMjer8rC

LogFile - osd.28 (/dev/sdm)
- ceph-osd.28.log: http://slexy.org/view/s21dmXoEo7
- atop log: http://slexy.org/view/s2gJqzu3uG

2.1 Remove osd.16 completely from Ceph

osd.16 is now removed completely and is replaced with osd.17 within
the acting set.

# ceph pg map 0.223
osdmap e9017 pg 0.223 (0.223) -> up [9,17,28] acting [9,17,28]

2.2 xfs_repair -n /dev/sdh1

xfs_repair did not find/show any errors

2.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.9,17,28 injectargs "--debug_osd 5/5"

Then I ran the deep-scrub on 0.223 (and again nearly all of my VMs
stopped working for a while).

Start @ 2016-08-02 10:02:44
End @ 2016-08-02 10:17:22

The "Ceph.log": http://slexy.org/view/s2ED5LvuV2

LogFile - osd.9 (/dev/sdk)
- ceph-osd.9.log: http://slexy.org/view/s21z9JmwSu
- atop Log: http://slexy.org/view/s20XjFZFEL

LogFile - osd.17 (/dev/sdi)
- ceph-osd.17.log: http://slexy.org/view/s202fpcZS9
- atop Log: http://slexy.org/view/s2TxeR1JSz

LogFile - osd.28 (/dev/sdm)
- ceph-osd.28.log: http://slexy.org/view/s2eCUyC7xV
- atop log: http://slexy.org/view/s21AfebBqK

3.1 Remove osd.28 completely from Ceph

Now osd.28 is also removed completely from Ceph and is replaced with osd.23.

# ceph pg map 0.223
osdmap e9363 pg 0.223 (0.223) -> up [9,17,23] acting [9,17,23]

3.2 xfs_repair -n /dev/sdm1

As expected, xfs_repair did not find/show any errors.

3.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.9,17,23 injectargs "--debug_osd 5/5"

... again nearly all of my VMs stopped working for a while...

Now all "original" OSDs (4, 16, 28) that were in the acting set when I
wrote my first e-mail to this mailing list have been removed. But the
issue still exists with different OSDs (9, 17, 23) as the acting set,
while the questionable PG 0.223 is still the same!

Suspecting that the "tunables" could be the cause, I have now changed
them back to "default" via "ceph osd crush tunables default".
This will take a while... then I will do "ceph pg deep-scrub 0.223"
again (without OSDs 4, 16, 28)...
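(To verify which profile is actually active after the change, I check:)

# ceph osd crush show-tunables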

Really, I do not know what is going on here.

Ceph has finished its recovery to the "default" tunables, but the issue
still exists! :*(

The acting set has changed again:

# ceph pg map 0.223
osdmap e11230 pg 0.223 (0.223) -> up [9,11,20] acting [9,11,20]

But when I start "ceph pg deep-scrub 0.223", again nearly all of my
VMs stop working for a while!

Does anyone have an idea where I should look to find the cause of this?

It seems that every time, the primary OSD of the acting set of PG
0.223 (*4*,16,28; *9*,17,23 or *9*,11,20) ends up with requests
"currently waiting for subops from 9,X", and the deep-scrub always
takes nearly 15 minutes to finish.
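(Roughly how I see those messages, assuming the default log locations:)

# ceph health detail
# grep 'waiting for subops' /var/log/ceph/ceph-osd.9.log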

My output from "ceph pg 0.223 query":

- http://slexy.org/view/s21d6qUqnV
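(In case it helps, the object/byte counters and the last deep-scrub stamp can be grepped out of that query output; I am assuming the usual stat field names here:)

# ceph pg 0.223 query | grep -E '"num_objects"|"num_bytes"|last_deep_scrub_stamp'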

Mehmet


For the record: although nearly all disks are busy, I have no
slow/blocked requests, and I have been watching the log files for
nearly 20 minutes now...

Your help is really appreciated!
- Mehmet

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


