Re: Odd object blocking IO on PG

On Tue, Dec 12, 2017 at 12:33 PM Nick Fisk <nick@xxxxxxxxxx> wrote:


> That doesn't look like an RBD object -- any idea who is
> "client.34720596.1:212637720"?

So I think these might be proxy ops from the cache tier, as there are also
blocked ops on one of the cache-tier OSDs, but this time it actually lists
the object name. Here is the blocked op on the cache tier:

           "description": "osd_op(client.34720596.1:212637720 17.ae78c1cf
17:f3831e75:::rbd_data.15a5e20238e1f29.00000000000388ad:head [set-alloc-hint
object_size 4194304 write_size 4194304,write 2584576~16384] snapc 0=[]
RETRY=2 ondisk+retry+write+known_if_redirected e104841)",
            "initiated_at": "2017-12-12 16:25:32.435718",
            "age": 13996.681147,
            "duration": 13996.681203,
            "type_data": {
                "flag_point": "reached pg",
                "client_info": {
                    "client": "client.34720596",
                    "client_addr": "10.3.31.41:0/2600619462",
                    "tid": 212637720

I'm a bit baffled at the moment about what's going on. The pg query (attached) is
not showing in the main status that the PG has been blocked from peering or that
there are any missing objects. I've tried restarting all the OSDs I can see
relating to the PG in case they needed a bit of a nudge.
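
For reference, this kind of information can be pulled with something along these lines (the osd id is just an example; substitute an OSD from the PG's acting set):

    # Which OSDs the stuck PG currently maps to (up and acting sets)
    ceph pg map 0.1cf

    # In-flight/blocked ops on one of those OSDs, via its admin socket on that host
    ceph daemon osd.68 dump_ops_in_flight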

 

Did that fix anything? I don't see anything immediately obvious but I'm not practiced in quickly reading that pg state output.

 

What's the output of "ceph -s"?

 

Hi Greg,

 

No, restarting OSDs didn't seem to help. But I did make some progress late last night. By stopping OSD.68 the cluster unlocks itself and IO can progress. However, as soon as it starts back up, 0.1cf and a couple of other PGs again get stuck in an activating state. If I out the OSD, either with it up or down, then some other PGs get hit by the same problem as CRUSH moves PG mappings around to other OSDs.

 

So there definitely seems to be some sort of weird peering issue somewhere. I have seen a very similar issue before on this cluster, where after running the crush reweight script to balance OSD utilization, the weight got set too low and PGs were unable to peer. I'm not convinced this is what's happening here, as none of the weights have changed, but I'm intending to explore this further just in case.
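
If it does turn out to be a weight problem again, a quick way to sanity-check the CRUSH map is something like the following (the rule id, replica count and x range are assumptions and need adjusting for the pool in question):

    # Current CRUSH weights and utilisation per OSD
    ceph osd df tree

    # Export the compiled CRUSH map and report any inputs CRUSH fails to map fully
    ceph osd getcrushmap -o crush.bin
    crushtool -i crush.bin --test --show-bad-mappings --rule 0 --num-rep 3 --min-x 0 --max-x 1023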

 

With 68 down

    pgs:     1071783/48650631 objects degraded (2.203%)

             5923 active+clean

             399  active+undersized+degraded

             7    active+clean+scrubbing+deep

             7    active+clean+remapped

 

With it up

    pgs:     0.047% pgs not active

             67271/48651279 objects degraded (0.138%)

             15602/48651279 objects misplaced (0.032%)

             6051 active+clean

             273  active+recovery_wait+degraded

             4    active+clean+scrubbing+deep

             4    active+remapped+backfill_wait

             3    activating+remapped

             1    active+recovering+degraded
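
A quicker way to list just the PGs that are failing to go active, rather than grepping the full dump below, is something like:

    ceph pg dump_stuck inactive
    ceph health detail | grep -i activating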

 

PG Dump

ceph pg dump | grep activatin

dumped all

2.389         0                  0        0         0       0           0 1500     1500           activating+remapped 2017-12-13 11:08:50.990526      76271'34230    106239:160310 [68,60,58,59,29,23]         68 [62,60,58,59,29,23]             62      76271'34230 2017-12-13 09:00:08.359690      76271'34230 2017-12-10 10:05:10.931366

0.1cf      3947                  0        0         0       0 16472186880 1577     1577           activating+remapped 2017-12-13 11:08:50.641034   106236'7512915   106239:6176548           [34,68,8]         34           [34,8,53]             34   106138'7512682 2017-12-13 10:27:37.400613   106138'7512682 2017-12-13 10:27:37.400613

2.210         0                  0        0         0       0           0 1500     1500           activating+remapped 2017-12-13 11:08:50.686193      76271'33304     106239:96797 [68,67,34,36,16,15]         68 [62,67,34,36,16,15]             62      76271'33304 2017-12-12 00:49:21.038437      76271'33304 2017-12-10 16:05:12.751425
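
The reason these are stuck should also show up in the recovery_state section of the pg query output, for example (assuming jq is available):

    # recovery_state lists the current peering stage and anything it is waiting on
    ceph pg 2.389 query | jq '.recovery_state'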

 

 


>
> On Tue, Dec 12, 2017 at 12:36 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> > Does anyone know what this object (0.ae78c1cf) might be? It's not your
> > normal run-of-the-mill RBD object and I can't seem to find it in the
> > pool using rados --all ls. It seems to be leaving the 0.1cf PG stuck in
> > an activating+remapped state and blocking IO. Pool 0 is just a pure RBD
> > pool with a cache tier above it. There is no current mention of unfound
> > objects or any other obvious issues.
> >
> > There is some backfilling going on on another OSD, which was upgraded
> > to bluestore, which is when the issue started. But I can't see any
> > link in the PG dump with the upgraded OSD. My only thought so far is to
> > wait for this backfilling to finish, then deep-scrub this PG and
> > see if that reveals anything?
> >
> > Thanks,
> > Nick
> >
> >  "description": "osd_op(client.34720596.1:212637720 0.1cf 0.ae78c1cf
> > (undecoded)
> > ondisk+retry+write+ignore_cache+ignore_overlay+known_if_redirected
> > e105014)",
> >             "initiated_at": "2017-12-12 17:10:50.030660",
> >             "age": 335.948290,
> >             "duration": 335.948383,
> >             "type_data": {
> >                 "flag_point": "delayed",
> >                 "events": [
> >                     {
> >                         "time": "2017-12-12 17:10:50.030660",
> >                         "event": "initiated"
> >                     },
> >                     {
> >                         "time": "2017-12-12 17:10:50.030692",
> >                         "event": "queued_for_pg"
> >                     },
> >                     {
> >                         "time": "2017-12-12 17:10:50.030719",
> >                         "event": "reached_pg"
> >                     },
> >                     {
> >                         "time": "2017-12-12 17:10:50.030727",
> >                         "event": "waiting for peered"
> >                     },
> >                     {
> >                         "time": "2017-12-12 17:10:50.197353",
> >                         "event": "reached_pg"
> >                     },
> >                     {
> >                         "time": "2017-12-12 17:10:50.197355",
> >                         "event": "waiting for peered"
> >
>
>
>
> --
> Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
