Deep scrub doesn't help. After some steps (I'm not sure of the exact sequence) Ceph does remap this PG to
another OSD, but the PG doesn't move:

# ceph pg map 11.206
osdmap e176314 pg 11.206 (11.206) -> up [955,198,801] acting [787,697]

It hangs in this state forever, and 'ceph pg 11.206 query' hangs as well.
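A minimal sketch of the checks involved here (PG id 11.206 and the acting primary osd.787 come from the
map output above; adjust the ids for your own cluster):

# Where the cluster wants the PG vs. where it actually is (up set vs. acting set)
ceph pg map 11.206

# Ask for a deep scrub; the scrub may never start if the PG is wedged
ceph pg deep-scrub 11.206

# Per-PG details; in our case this hangs, so wrap it in a timeout
timeout 60 ceph pg 11.206 query

# List PGs stuck in unclean states
ceph pg dump_stuck unclean

# On the node hosting the acting primary, look at recent/in-flight ops
ceph daemon osd.787 dump_historic_ops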
On Sat, Apr 7, 2018 at 12:42 AM, Konstantin Danilov <kdanilov@xxxxxxxxxxxx> wrote:
> David,
>
>> What happens when you deep-scrub this PG?
> We haven't tried to deep-scrub it yet; we will try.
>
>> What do the OSD logs show for any lines involving the problem PGs?
> Nothing special was logged about this particular OSD, except that it is
> degraded. However, the OSD spends quite a large portion of its CPU time in
> the snappy/leveldb/jemalloc libraries, and the logs contain a lot of
> messages from leveldb about moving data between levels.
> Worth mentioning: this PG is from the RGW index bucket, so it is metadata
> only and gets a relatively high load. Yet now we have 3 more PGs with the
> same behaviour from the RGW data pool (the cluster holds almost all of its
> data in RGW).
>
>> Was anything happening on your cluster just before this started happening
>> at first?
> The cluster got many updates in the week before the issue, but nothing
> particularly noticeable: SSD OSDs were split in two, about 10% of the OSDs
> were removed, and some networking issues appeared.
>
> Thanks
>
> On Fri, Apr 6, 2018 at 10:07 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
>>
>> What happens when you deep-scrub this PG? What do the OSD logs show for
>> any lines involving the problem PGs? Was anything happening on your cluster
>> just before this started happening at first?
>>
>> On Fri, Apr 6, 2018 at 2:29 PM Konstantin Danilov <kdanilov@xxxxxxxxxxxx>
>> wrote:
>>>
>>> Hi all, we have a strange issue on one cluster.
>>>
>>> One PG stays mapped to a particular set of OSDs, say X, Y and Z, no
>>> matter how we change the crush map.
>>> The whole picture is as follows:
>>>
>>> * This is Ceph 10.2.7; all monitors and OSDs run the same version.
>>> * One PG eventually got into the 'active+degraded+incomplete' state. It
>>> had been active+clean for a long time and already holds some data. We
>>> couldn't pinpoint the event that led it into this state; it probably
>>> happened after some OSD was removed from the cluster.
>>> * This PG has all 3 required OSDs up and running, and all of them are
>>> online (pool size = 3, min_size = 2).
>>> * All requests to the PG get stuck forever; historic_ops shows them
>>> waiting on "waiting_for_degraded_pg".
>>> * 'ceph pg query' hangs forever.
>>> * We can't copy the data from another pool either - the copying process
>>> hangs and then fails with (34) Numerical result out of range.
>>> * We tried restarting OSDs, nodes and mons with no effect.
>>> * Eventually we found that shutting down OSD Z (not the primary) does
>>> resolve the issue, but only until Ceph marks this OSD out. If we try to
>>> change the weight of this OSD or remove it from the cluster, the problem
>>> appears again. The cluster works only while OSD Z is down but not out and
>>> keeps its default weight.
>>> * Then we found that no matter what we do with the crush map,
>>> osdmaptool --test-map-pgs-dump always puts this PG on the same set of
>>> OSDs - [X, Y] (in this osdmap Z is already down). We updated the crush map
>>> to remove the nodes holding OSDs X, Y and Z from it completely, compiled
>>> it, imported it back into the osdmap, ran osdmaptool, and always got the
>>> same result (see the command sketch appended at the end of this post).
>>> * After several node restarts and setting OSD Z down but not out, we now
>>> have 3 more PGs with the same behaviour, but 'pinned' to other OSDs.
>>> * We ran osdmaptool from a Luminous build of Ceph to check whether the
>>> upmap extension has somehow got into this osdmap - it has not.
>>>
>>> So this is where we are now. Has anyone seen something like this? Any
>>> ideas are welcome. Thanks
>>>
>>>
>>> --
>>> Kostiantyn Danilov

--
Kostiantyn Danilov aka koder.ua
Principal software engineer, Mirantis

skype:koder.ua
http://koder-ua.blogspot.com/
http://mirantis.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
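For reference, the offline mapping test described in the original message can be reproduced roughly as
follows. This is only a sketch: the file names are arbitrary, and pool id 11 is inferred from the PG id
11.206 above.

# Grab the current osdmap from the cluster
ceph osd getmap -o /tmp/osdmap

# Extract the crush map from it and decompile it to text for editing
osdmaptool /tmp/osdmap --export-crush /tmp/crushmap
crushtool -d /tmp/crushmap -o /tmp/crushmap.txt

# ... edit /tmp/crushmap.txt here, e.g. remove the hosts holding OSDs X, Y and Z ...

# Recompile the edited map and import it into the local copy of the osdmap
crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
osdmaptool /tmp/osdmap --import-crush /tmp/crushmap.new

# Show where every PG in the pool would be mapped with the modified crush map
osdmaptool /tmp/osdmap --test-map-pgs-dump --pool 11

In the thread above this always produced the same [X, Y] mapping regardless of the crush map changes,
which is what makes the case so unusual.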