Re: ONE pg deep-scrub blocks cluster

Hey JC,

Thank you very much for your mail!

I will provide the information tomorrow when I am at work again.

I hope we will find a solution :)

- Mehmet

On 24 August 2016 16:58:58 CEST, LOPEZ Jean-Charles <jelopez@xxxxxxxxxx> wrote:
Hi Mehmet,

I've just seen your message and read the thread that goes with it.

Can you please provide me with a copy of the ceph.conf file used on the MON and OSD side (assuming it is identical on both)? If the ceph.conf file on the client side (the VM side) is different, please provide a copy of that as well.
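(A quick way to check whether they are identical is to compare checksums on one MON node, one OSD node and one client - the path below is just the default location, adjust if yours differs:)

  md5sum /etc/ceph/ceph.conf   # run on each node type and compare the hashes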

Can you also provide me, as attached txt files, with the following (a sketch for collecting them follows this list):
- the output of ceph pg 0.223 query
- the output of ceph -s
- the output of ceph df
- the output of ceph osd df
- the output of ceph osd dump | grep pool
- the output of ceph osd crush rule dump
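Something along these lines would capture everything as text files (the file names are only suggestions):

  # file names below are only examples
  ceph pg 0.223 query       > pg-0.223-query.txt
  ceph -s                   > ceph-s.txt
  ceph df                   > ceph-df.txt
  ceph osd df               > ceph-osd-df.txt
  ceph osd dump | grep pool > ceph-osd-dump-pool.txt
  ceph osd crush rule dump  > crush-rule-dump.txt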

Thank you and I’ll see if I can get something to ease your pain.

As a remark, assuming the size parameter of the rbd pool is set to 3, the number of PGs in your cluster should be higher.
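(Just to illustrate the usual sizing guideline - the OSD count below is an assumption for the example, not your actual number.)

  # rule of thumb: total PGs ~= (number of OSDs * 100) / replica size,
  # rounded up to the next power of two
  # example with an assumed 30 OSDs and size 3:
  #   (30 * 100) / 3 = 1000  ->  next power of two = 1024 PGs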

If we manage to move forward and get it fixed, we will repost to the mailing list the changes we made to your configuration.

Regards
JC


On Aug 24, 2016, at 06:41, Mehmet <ceph@xxxxxxxxxx> wrote:

Hello Guys,

the issue still exists :(

If we run "ceph pg deep-scrub 0.223", nearly all VMs stop for a while (blocked requests).
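(For completeness, this is how we watch it while the scrub runs - just the standard commands, nothing cluster-specific:)

  ceph -w                               # slow request warnings show up in the cluster log
  ceph health detail | grep -i blocked  # which OSDs currently have blocked requests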

- we already replaced the OSDs (SAS disks - journal on NVMe)
- removed OSDs so that the acting set for pg 0.223 changed
- checked the filesystem on the acting OSDs
- changed the tunables back from jewel to default
- changed the tunables from default back to jewel
- ran a deep-scrub on all OSDs (ceph osd deep-scrub osd.<id>) - only when a deep-scrub on pg 0.223 runs do we get blocked requests

The deep-scrub on pg 0.223 always takes 13-15 minutes to finish. It does not matter which OSDs are in the acting set for this pg.

So I don't have any idea what could be causing this.

As long as "ceph osd set nodeep-scrub" is set - so that no deep-scrub on 0.223 is running - the cluster is fine!
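(To spell out the workaround, plus a jewel-era knob that might soften the impact - the value is an untested example, not a recommendation:)

  ceph osd set nodeep-scrub     # current workaround: no deep-scrubs cluster-wide
  ceph osd unset nodeep-scrub   # re-enable once the cause is found
  # inject a small pause between scrub chunks on all OSDs (example value):
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'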

Could this be a bug?

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
Kernel: 4.4.0-31-generic #50-Ubuntu

Any ideas?
- Mehmet



On 2016-08-02 17:57, c wrote:
On 2016-08-02 13:30, c wrote:
Hello Guys,
this time without the original acting-set osd.4, 16 and 28. The issue
still exists...
[...]
For the record, this ONLY happens with this PG and no others that
share
the same OSDs, right?
Yes, right.
[...]
When doing the deep-scrub, monitor (atop, etc) all 3 nodes and
see if a
particular OSD (HDD) stands out, as I would expect it to.
Now I logged all disks via atop every 2 seconds while the deep-scrub
was running (atop -w osdXX_atop 2).
As you expected, all disks were 100% busy - with a constant 150MB/s
(osd.4), 130MB/s (osd.28) and 170MB/s (osd.16)...
- osd.4 (/dev/sdf) http://slexy.org/view/s21emd2u6j [1]
- osd.16 (/dev/sdm): http://slexy.org/view/s20vukWz5E [2]
- osd.28 (/dev/sdh): http://slexy.org/view/s20YX0lzZY [3]
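(In case someone prefers the raw files over the pastes: the logs were written with "atop -w <file> 2" and can be replayed with, e.g.:)

  atop -r osdXX_atop   # 't' steps one sample forward, 'T' steps back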
[...]
But what is causing this? A deep-scrub on all the other disks - same
model, ordered at the same time - does not seem to have this issue.
[...]
Next week, I will do this
1.1 Remove osd.4 completely from Ceph - again (the actual primary
for PG 0.223)
osd.4 is now removed completely.
The primary for this PG is now osd.9:
# ceph pg map 0.223
osdmap e8671 pg 0.223 (0.223) -> up [9,16,28] acting [9,16,28]
1.2 xfs_repair -n /dev/sdf1 (osd.4): to see possible error
xfs_repair did not find/show any error
1.3 ceph pg deep-scrub 0.223
- Log with: ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"
Because osd.9 is now the primary for this PG, I set debug_osd on it too
(resetting it afterwards is sketched after the log links below):
ceph tell osd.9 injectargs "--debug_osd 5/5"
and ran the deep-scrub on 0.223 (and again nearly all of my VMs stop
working for a while).
Start @ 15:33:27
End @ 15:48:31
The "ceph.log"
- http://slexy.org/view/s2WbdApDLz
The related LogFiles (OSDs 9,16 and 28) and the LogFile via atop for the osds
LogFile - osd.9 (/dev/sdk)
- ceph-osd.9.log: http://slexy.org/view/s2kXeLMQyw
- atop Log: http://slexy.org/view/s21wJG2qr8
LogFile - osd.16 (/dev/sdh)
- ceph-osd.16.log: http://slexy.org/view/s20D6WhD4d
- atop Log: http://slexy.org/view/s2iMjer8rC
LogFile - osd.28 (/dev/sdm)
- ceph-osd.28.log: http://slexy.org/view/s21dmXoEo7
- atop log: http://slexy.org/view/s2gJqzu3uG
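(Side note, just as a sketch: after collecting the logs, the debug level can be dropped back to the jewel default of 0/5 the same way:)

  # reset debug_osd on the involved OSDs
  for id in 9 16 28; do ceph tell osd.$id injectargs '--debug_osd 0/5'; done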
2.1 Remove osd.16 completely from Ceph
osd.16 is now removed completely - now replaced by osd.17 within
the acting set.
# ceph pg map 0.223
osdmap e9017 pg 0.223 (0.223) -> up [9,17,28] acting [9,17,28]
2.2 xfs_repair -n /dev/sdh1
xfs_repair did not find/show any error
2.3 ceph pg deep-scrub 0.223
- Log with: ceph tell osd.9,17,28 injectargs "--debug_osd 5/5"
and ran the deep-scrub on 0.223 (and again nearly all of my VMs stop
working for a while).
Start @ 2016-08-02 10:02:44
End @ 2016-08-02 10:17:22
The "Ceph.log": http://slexy.org/view/s2ED5LvuV2
LogFile - osd.9 (/dev/sdk)
- ceph-osd.9.log: http://slexy.org/view/s21z9JmwSu
- atop Log: http://slexy.org/view/s20XjFZFEL
LogFile - osd.17 (/dev/sdi)
- ceph-osd.17.log: http://slexy.org/view/s202fpcZS9
- atop Log: http://slexy.org/view/s2TxeR1JSz
LogFile - osd.28 (/dev/sdm)
- ceph-osd.28.log: http://slexy.org/view/s2eCUyC7xV
- atop log: http://slexy.org/view/s21AfebBqK
3.1 Remove osd.28 completely from Ceph
Now osd.28 is also removed completely from Ceph - now replaced with osd.23
# ceph pg map 0.223
osdmap e9363 pg 0.223 (0.223) -> up [9,17,23] acting [9,17,23]
3.2 xfs_repair -n /dev/sdm1
As expected: xfs_repair did not find/show any error
3.3 ceph pg deep-scrub 0.223
- Log with: ceph tell osd.9,17,23 injectargs "--debug_osd 5/5"
... again nearly all of my VMs stop working for a while...
Now all "original" OSDs (4,16,28) that were in the acting set when I
wrote my first e-mail to this mailing list have been removed. But the
issue still exists with different OSDs (9,17,23) as the acting set,
while the questionable PG 0.223 is still the same!
Suspecting that the tunables could be the cause, I have now changed
them back to "default" via "ceph osd crush tunables default".
This will take a while... then I will run "ceph pg deep-scrub 0.223"
again (without OSDs 4,16,28)...
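(For reference, the currently active profile can be double-checked with:)

  ceph osd crush show-tunables   # shows the active CRUSH tunables/profile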
Really, I do not know what's going on here.
Ceph finished recovering after the switch to "default" tunables, but
the issue still exists! :*(
The acting set has changed again
# ceph pg map 0.223
osdmap e11230 pg 0.223 (0.223) -> up [9,11,20] acting [9,11,20]
But when I start "ceph pg deep-scrub 0.223", again nearly all of my
VMs stop working for a while!
Does anyone have an idea where I should look to find the cause of this?
It seems that every time the primary OSD of the acting set of PG
0.223 (*4*,16,28; *9*,17,23 or *9*,11,20) ends up "currently waiting
for subops from 9,X", and the deep-scrub always takes nearly 15
minutes to finish.
My output from " ceph pg 0.223 query "
- http://slexy.org/view/s21d6qUqnV
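(And just as a sketch of how the timing and the blocked requests can be pulled out of the posted logs - the paths are the default ones:)

  # deep-scrub start/finish for this PG, reported by the primary in the cluster log
  grep "0.223 deep-scrub" /var/log/ceph/ceph.log
  # the slow requests waiting for subops, in the primary OSD's own log
  grep "waiting for subops" /var/log/ceph/ceph-osd.9.log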
Mehmet
For the record: although nearly all disks are busy, I have no
slow/blocked requests, and I have been watching the log files for
nearly 20 minutes now...
Your help is really appreciated!
- Mehmet





JC Lopez
S. Technical Instructor, Global Storage Consulting Practice
Red Hat, Inc.
jelopez@xxxxxxxxxx
+1 408-680-6959

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
