Re: blocked ops

Roeland Mertens <roeland.mertens@xxxxxxxxxxxxxxx> · Fri, 12 Aug 2016 16:15:08 +0100

Hi Brad,

thanks for that. We were able to track the blocked ops down to the cache 
layer in front of our EC rgw pool, which contains the utterly broken pg, 
so it may be related. We removed the cache layer and now the blocked ops 
have moved to the primary OSD for the broken pg with the debug logging 
pointing at the broken pg being the culprit.

The pg is in a horrendous state due to multiple disk losses (exceeding 
the EC m value) and attempts at "fixing" it :(

I've put the output of pg query here : http://pastebin.com/EXjup33D and 
the osd log specifically for the pg here: http://pastebin.com/0UpRJnUZ

and I've put some logs here: http://pastebin.com/Br1u8wTj

If anyone can give us a hand getting this resolved we'd greatly 
appreciate as we'd rather not have to zap a pool containing 100TB of 
data just because of a pg containing only about 0.1% of that data

kind regards,

Roeland

On 12/08/16 08:10, Brad Hubbard wrote:
On Fri, Aug 12, 2016 at 07:47:54AM +0100, roeland mertens wrote:
Hi Brad,

thank you for that. Unfortunately our immediate concern is the blocked ops
rather than the broken pg (we know why its broken).
OK, if you look at the following file it shows not only the declaration of
wait_for_blocked_object (highlighted) but also all of it's callers.

https://github.com/ceph/ceph/blob/master/src/osd/ReplicatedPG.cc#L500

Multiple calls relate to snapshots but I'd suggest turning debug logging for
the OSDs right up may give us more information.

# ceph tell osd.* injectargs '--debug_osd 20 --debug_ms 5'

Note: The above will turn up debugging for all OSDs, you may want to only
focus on some so adjust accordingly.

I don't think that's specifically crushmap related nor related to the
broken pg as the osds involved in the blocked ops aren't the ones that were
hosting the broken pg.

On 12 August 2016 at 04:12, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:

On Thu, Aug 11, 2016 at 11:33:29PM +0100, Roeland Mertens wrote:
Hi,

I was hoping someone on this list may be able to help?

We're running a 35 node 10.2.1 cluster with 595 OSDs. For the last 12
hours
we've been plagued with blocked requests which completely kills the
performance of the cluster

# ceph health detail
HEALTH_ERR 1 pgs are stuck inactive for more than 300 seconds; 1 pgs
down; 1
pgs peering; 1 pgs stuck inactive; 100 requests are blocked > 32 sec; 1
osds
have slow requests; noout,nodeep-scrub,sortbitwise flag(s) set
pg 63.1a18 is stuck inactive for 135133.509820, current state
down+remapped+peering, last acting [2147483647,2147483647,
2147483647,2147483647,2147483647,2147483647,235,148,290,300,147,157,370]

That value (2147483647) is defined in src/crush/crush.h like so;

#define CRUSH_ITEM_NONE   0x7fffffff  /* no result */

So this could be due to a bad crush rule or maybe choose_total_tries needs
to
be higher?

$ ceph osd crush rule ls

For each rule listed by the above command.

$ ceph osd crush rule dump [rule_name]

I'd then dump out the crushmap and test it showing any bad mappings with
the
commands listed here;

http://docs.ceph.com/docs/master/rados/troubleshooting/
troubleshooting-pg/#crush-gives-up-too-soon

I'd also check the pg numbers for your pool(s) are appropriate as not
enough
pgs could also be a contributing factor IIRC.

That should hopefully give some insight.

--
HTH,
Brad

pg 63.1a18 is down+remapped+peering, acting [2147483647,2147483647,
2147483647,2147483647,2147483647,2147483647,235,148,290,300,147,157,370]
100 ops are blocked > 2097.15 sec on osd.4
1 osds have slow requests
noout,nodeep-scrub,sortbitwise flag(s) set

the one pg down is due to us running into an odd EC issue which I mailed
the
list about earlier, it's the 100 blocked ops that are puzzling us. If we
out
the osd in question, they just shift to another osd (on a different
host!).
We even tried rebooting the node it's on but to little avail.

We get a ton of log messages like this:

2016-08-11 23:32:10.041174 7fc668d9f700  0 log_channel(cluster) log
[WRN] :
100 slow requests, 5 included below; oldest blocked for > 139.313915 secs
2016-08-11 23:32:10.041184 7fc668d9f700  0 log_channel(cluster) log
[WRN] :
slow request 139.267004 seconds old, received at 2016-08-11
23:29:50.774091:
osd_op(client.9192464.0:485640 66.b96c3a18
default.4282484.42_442fac8195c63a2e19c3c4bb91e8800e [getxattrs,stat,read
0~524288] snapc 0=[] RETRY=36 ack+retry+read+known_if_redirected e50109)
currently waiting for blocked object
2016-08-11 23:32:10.041189 7fc668d9f700  0 log_channel(cluster) log
[WRN] :
slow request 139.244839 seconds old, received at 2016-08-11
23:29:50.796256:
osd_op(client.9192464.0:596033 66.942a5a18
default.4282484.30__shadow_.sLkZ_rUX6cvi0ifFasw1UipEIuFPzYB_6 [write
1048576~524288] snapc 0=[] RETRY=36
ack+ondisk+retry+write+known_if_redirected e50109) currently waiting for
blocked object

A dump of the blocked ops tells us very little , is there anyone who can
shed some light on this? Or at least give us a hint on how we can fix
this?
# ceph daemon osd.4 dump_blocked_ops
....

        {
             "description": "osd_op(client.9192464.0:596030 66.942a5a18
default.4282484.30__shadow_.sLkZ_rUX6cvi0ifFasw1UipEIuFPzYB_6 [writefull
0~0] snapc 0=[] RETRY=32 ack+ondisk+retry+write+known_if_redirected
e50092)",
             "initiated_at": "2016-08-11 22:58:09.721027",
             "age": 1515.105186,
             "duration": 1515.113255,
             "type_data": [
                 "reached pg",
                 {
                     "client": "client.9192464",
                     "tid": 596030
                 },
                 [
                     {
                         "time": "2016-08-11 22:58:09.721027",
                         "event": "initiated"
                     },
                     {
                         "time": "2016-08-11 22:58:09.721066",
                         "event": "waiting_for_map not empty"
                     },
                     {
                         "time": "2016-08-11 22:58:09.813574",
                         "event": "reached_pg"
                     },
                     {
                         "time": "2016-08-11 22:58:09.813581",
                         "event": "waiting for peered"
                     },
                     {
                         "time": "2016-08-11 22:58:09.852796",
                         "event": "reached_pg"
                     },
                     {
                         "time": "2016-08-11 22:58:09.852804",
                         "event": "waiting for peered"
                     },
                     {
                         "time": "2016-08-11 22:58:10.876636",
                         "event": "reached_pg"
                     },
                     {
                         "time": "2016-08-11 22:58:10.876640",
                         "event": "waiting for peered"
                     },
                     {
                         "time": "2016-08-11 22:58:10.902760",
                         "event": "reached_pg"
                     }
                 ]
             ]
         }
...

Kind regards,

Roeland

--
This email is sent on behalf of Genomics plc, a public limited company
registered in England and Wales with registered number 8839972, VAT
registered number 189 2635 65 and registered office at King Charles
House,
Park End Street, Oxford, OX1 1JD, United Kingdom.
The contents of this e-mail and any attachments are confidential to the
intended recipient. If you are not the intended recipient please do not
use
or publish its contents, contact Genomics plc immediately at
info@xxxxxxxxxxxxxxx <info@xxxxxxxxxxxxxxx> then delete. You may not
copy,
forward, use or disclose the contents of this email to anybody else if
you
are not the intended recipient. Emails are not secure and may contain
viruses.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
--
Roeland Mertens
Systems Engineer - Genomics PLC

--
This email is sent on behalf of Genomics plc, a public limited company
registered in England and Wales with registered number 8839972, VAT
registered number 189 2635 65 and registered office at King Charles House,
Park End Street, Oxford, OX1 1JD, United Kingdom.
The contents of this e-mail and any attachments are confidential to the
intended recipient. If you are not the intended recipient please do not use
or publish its contents, contact Genomics plc immediately at
info@xxxxxxxxxxxxxxx <info@xxxxxxxxxxxxxxx> then delete. You may not copy,
forward, use or disclose the contents of this email to anybody else if you
are not the intended recipient. Emails are not secure and may contain
viruses.

--
This email is sent on behalf of Genomics plc, a public limited company 
registered in England and Wales with registered number 8839972, VAT 
registered number 189 2635 65 and registered office at King Charles House, 
Park End Street, Oxford, OX1 1JD, United Kingdom.
The contents of this e-mail and any attachments are confidential to the 
intended recipient. If you are not the intended recipient please do not use 
or publish its contents, contact Genomics plc immediately at 
info@xxxxxxxxxxxxxxx <info@xxxxxxxxxxxxxxx> then delete. You may not copy, 
forward, use or disclose the contents of this email to anybody else if you 
are not the intended recipient. Emails are not secure and may contain 
viruses.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com