Re: jewel ceph has PG always mapped to the same OSDs

Deep scrub doesn't help.
After some steps (we are not sure of the exact sequence)
ceph does remap this PG to other OSDs, but the PG doesn't actually move:
# ceph pg map 11.206
osdmap e176314 pg 11.206 (11.206) -> up [955,198,801] acting [787,697]

It hangs in this state forever, and 'ceph pg 11.206 query' hangs as well.
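
For reference, this is roughly how we are inspecting it (osd.787 is the
acting primary from the map above; the 'ceph daemon' commands have to be
run on the node that hosts that OSD):

# ceph pg dump_stuck unclean
# timeout 30 ceph pg 11.206 query
# ceph daemon osd.787 dump_ops_in_flight
# ceph daemon osd.787 dump_historic_ops

dump_historic_ops is where the "waiting_for_degraded_pg" events mentioned
below show up.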

On Sat, Apr 7, 2018 at 12:42 AM, Konstantin Danilov
<kdanilov@xxxxxxxxxxxx> wrote:
> David,
>
>> What happens when you deep-scrub this PG?
> We haven't tried to deep-scrub it yet; we will try.
>
>> What do the OSD logs show for any lines involving the problem PGs?
> Nothing special was logged about this particular OSD, except that it is
> degraded.
> However, the OSD spends quite a large portion of its CPU time in the
> snappy/leveldb/jemalloc libraries, and the logs contain a lot of messages
> from leveldb about moving data between levels.
> It is worth mentioning that this PG is from the RGW bucket index pool, so
> it is metadata only and gets a relatively high load. Also, we now have 3
> more PGs with the same behaviour from the RGW data pool (the cluster holds
> almost all of its data in RGW).
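>
> For what it's worth, we see the CPU-time split simply with perf attached
> to the OSD process (a rough illustration; the pid placeholder is ours):
>
> # perf top -p <pid of the affected ceph-osd>
>
> and the top symbols come from libleveldb/libsnappy/libjemalloc.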
>
>> Was anything happening on your cluster just before this started happening
>> at first?
> The cluster got many updates in the week before the issue, but nothing
> particularly noticeable: the SSD OSDs were split in two, about 10% of the
> OSDs were removed, and some networking issues appeared.
>
> Thanks
>
> On Fri, Apr 6, 2018 at 10:07 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
>>
>> What happens when you deep-scrub this PG?  What do the OSD logs show for
>> any lines involving the problem PGs?  Was anything happening on your cluster
>> just before this started happening at first?
>>
>> On Fri, Apr 6, 2018 at 2:29 PM Konstantin Danilov <kdanilov@xxxxxxxxxxxx>
>> wrote:
>>>
>>> Hi all, we have a strange issue on one cluster.
>>>
>>> One PG is always mapped to the same particular set of OSDs, say X, Y
>>> and Z, no matter how we change the crush map.
>>> The whole picture is as follows:
>>>
>>> * This is ceph version 10.2.7; all monitors and OSDs run the same
>>> version
>>> * One PG eventually got into the 'active+degraded+incomplete' state. It
>>> was active+clean for a long time and already holds some data. We could
>>> not identify the event that led it into this state; it probably happened
>>> after some OSD was removed from the cluster
>>> * This PG has all 3 required OSDs up and running, and all of them are
>>> online (pool size=3, min_size=2)
>>> * All requests to the PG are stuck forever; historic_ops shows that they
>>> are waiting on "waiting_for_degraded_pg"
>>> * ceph pg query hangs forever
>>> * We also can't copy the data from this pool to another one - the copy
>>> process hangs and then fails with
>>> (34) Numerical result out of range
>>> * We tried restarting OSDs, nodes and mons with no effect
>>> * Eventually we found that shutting down OSD Z (not the primary) does
>>> solve the issue, but only until ceph marks this OSD out. If we try to
>>> change the weight of this OSD or remove it from the cluster, the problem
>>> appears again. The cluster works only while OSD Z is down but not out
>>> and keeps its default weight
>>> * Then we found that no matter what we do with the crushmap,
>>> osdmaptool --test-map-pgs-dump always puts this PG on the same set of
>>> OSDs - [X, Y] (in this osdmap Z is already down). We edited the crush
>>> map to remove the nodes holding OSDs X, Y and Z completely, compiled it,
>>> imported it back into the osdmap, ran osdmaptool again, and we always
>>> get the same result (the command sequence is sketched below, after this
>>> list)
>>> * After several node restarts and setting OSD Z down but not out, we
>>> now have 3 more PGs with the same behaviour, but 'pinned' to other OSDs
>>> * We ran the osdmaptool from luminous to check whether the upmap
>>> extension had somehow gotten into this osdmap - it has not.
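>>>
>>> For completeness, the sequence we use for the crushmap experiments is
>>> roughly the following (file names are only examples; pool 11 is the
>>> affected pool, Z stands for the OSD we keep down but not out).
>>>
>>> Keep OSDs from being marked out, then stop osd Z on its node:
>>>
>>> # ceph osd set noout
>>> # systemctl stop ceph-osd@Z
>>>
>>> Grab the current osdmap and extract its crushmap:
>>>
>>> # ceph osd getmap -o /tmp/osdmap
>>> # osdmaptool /tmp/osdmap --export-crush /tmp/crush.bin
>>> # crushtool -d /tmp/crush.bin -o /tmp/crush.txt
>>>
>>> Edit /tmp/crush.txt (e.g. remove the hosts holding X, Y and Z), then
>>> recompile it, import it back into the osdmap file and re-test the
>>> mapping:
>>>
>>> # crushtool -c /tmp/crush.txt -o /tmp/crush.new
>>> # osdmaptool /tmp/osdmap --import-crush /tmp/crush.new
>>> # osdmaptool /tmp/osdmap --test-map-pgs-dump --pool 11 | grep 11.206
>>>
>>> No matter what we remove from the crushmap, 11.206 keeps coming back on
>>> the same OSDs.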
>>>
>>> So this is where we are now. Has anyone seen anything like this? Any
>>> ideas are welcome. Thanks
>>>
>>>
>>> --
>>> Kostiantyn Danilov
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Kostiantyn Danilov aka koder.ua
> Principal software engineer, Mirantis
>
> skype:koder.ua
> http://koder-ua.blogspot.com/
> http://mirantis.com



-- 
Kostiantyn Danilov aka koder.ua
Principal software engineer, Mirantis

skype:koder.ua
http://koder-ua.blogspot.com/
http://mirantis.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


