Re: Incomplete pgs and no data movement ( cluster appears readonly )


 



Ugh, that’s what I was hoping to avoid.  OSD 13 is still in the server; I wonder if I could somehow bring it back in as OSD 13 to see if it still has the missing data.

I was looking into using ceph-objectstore-tool, but the only instructions I can find online are sparse and mostly in this list’s archive.  I am still trying to get clarification on the data itself, as I was hoping I could just delete it, but even the deletion process doesn’t seem to exist.  All I can find is a force-create feature that seems to forcibly recreate the pg, but the documentation for that is weak as well.
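
From what I can piece together from the archive, the first check would be whether the old disk still holds the pg at all, something along the lines of the following (untested on my end, and assuming its data directory is still mounted at the usual /var/lib/ceph/osd/ceph-13, possibly plus --journal-path on filestore):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-13 --op list-pgs

If 11.720 shows up there, exporting it might be an option.  The force-create feature I mentioned appears to be “ceph osd force-create-pg <pgid>”, but as far as I can tell that just gives you back an empty pg rather than recovering the data.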

-Brent

  

-----Original Message-----
From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx] 
Sent: Wednesday, January 10, 2018 3:15 PM
To: Brent Kennedy <bkennedy@xxxxxxxxxx>
Cc: Janne Johansson <icepic.dz@xxxxxxxxx>; Ceph Users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re:  Incomplete pgs and no data movement ( cluster appears readonly )

On Wed, Jan 10, 2018 at 11:14 AM, Brent Kennedy <bkennedy@xxxxxxxxxx> wrote:
> I adjusted “osd max pg per osd hard ratio” to 50.0 and left “mon max 
> pg per osd” at 5000 just to see if things would allow data movement.  
> This worked: the new pool I created finished its creation and spread 
> out.  I was able to then copy the data from the existing pool into the 
> new pool and delete the old one.
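>
> For reference, the ceph.conf form of those two settings would be roughly 
> the following (assuming [global] is fine for both; I restarted the 
> daemons after changing them):
>
> [global]
> mon_max_pg_per_osd = 5000
> osd_max_pg_per_osd_hard_ratio = 50.0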
>
>
>
> Used this process for copying the default pools:
>
> ceph osd pool create .users.email.new 16
>
> rados cppool .users.email .users.email.new
>
> ceph osd pool delete .users.email .users.email --yes-i-really-really-mean-it
>
> ceph osd pool rename .users.email.new .users.email
>
> ceph osd pool application enable .users.email rgw
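>
> (A quick sanity check before the delete step is to compare object counts 
> between the old and new pool, e.g.:
>
> rados df | grep .users.email
>
> which should show matching object counts for .users.email and 
> .users.email.new before the old one is removed.)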
>
>
>
>
>
> So at this point, I have recreated all the .rgw and .user pools except 
> .rgw.buckets with a pg_num of 16, which significantly reduced the pg count; 
> unfortunately, the incompletes are still there:
>
>
>
>   cluster:
>
>    health: HEALTH_WARN
>
>             Reduced data availability: 4 pgs inactive, 4 pgs 
> incomplete
>
>             Degraded data redundancy: 4 pgs unclean

There seems to have been some confusion here. From your prior thread:

On Thu, Jan 4, 2018 at 9:56 PM, Brent Kennedy <bkennedy@xxxxxxxxxx> wrote:
> We have upgraded from Hammer to Jewel and then Luminous 12.2.2 as of today.
> During the hammer upgrade to Jewel we lost two host servers

So, if you have size two, and you lose two servers before the data has finished recovering... you've lost data. And that is indeed what "incomplete" means: the PG thinks writes may have happened, but the OSDs which held the data at that time aren't available. You'll need to dive into doing PG recovery with ceph-objectstore-tool and friends, or find one of the groups that does consulting around recovery.
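
For what it's worth, the rough shape of that recovery is: stop the OSD (or mount the old disk) that still has the pg data, export the pg with ceph-objectstore-tool, then import it into a stopped OSD that is currently in the cluster and start that OSD back up.  Something like the following, where the paths, pgid, and OSD numbers are only placeholders:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-13 --pgid 11.720 --op export --file /tmp/pg.11.720.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN --op import --file /tmp/pg.11.720.export

Treat that as a sketch rather than a procedure; whether it's safe depends a lot on the state of that disk and the cluster.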
-Greg

>
>
>
>   services:
>
>     mon: 3 daemons, quorum mon1,mon2,mon3
>
>     mgr: mon3(active), standbys: mon1, mon2
>
>     osd: 43 osds: 43 up, 43 in
>
>
>
>   data:
>
>     pools:   10 pools, 4240 pgs
>
>     objects: 8148k objects, 10486 GB
>
>     usage:   21536 GB used, 135 TB / 156 TB avail
>
>     pgs:     0.094% pgs not active
>
>              4236 active+clean
>
>              4    incomplete
>
>
>
> The health page is showing blue instead of red on the donut chart; at 
> one point it jumped to green but it's back to blue currently.  There 
> are no more ops blocked/delayed either.
>
>
>
> Thanks for the assistance; it seems the cluster will play nice now.  Any 
> thoughts on the stuck pgs?  I ran a query on 11.720 and it shows:
>
> "blocked_by": [
>
>                 13,
>
>                 27,
>
>                 28
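>
> (That was from “ceph pg 11.720 query”, looking for the “blocked_by” 
> entries in the output.)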
>
>
>
> OSD 13 was acting strange, so I wiped it and removed it from the cluster.
> This was during the rebuild, so I wasn't aware of it blocking anything.  Now I 
> am trying to figure out how a removed OSD can still be blocking.  I went 
> through the process to remove it:
>
> ceph osd crush remove
>
> ceph auth del
>
> ceph osd rm
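>
> (Each of those against OSD 13, so roughly: “ceph osd crush remove osd.13”, 
> “ceph auth del osd.13”, then “ceph osd rm 13”.)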
>
>
>
> I guess since the cluster was a hot mess at that point, it's possible 
> it was borked and therefore the pg is borked.  I am trying to avoid 
> deleting the data, as there is data in the OSDs that are online.
>
>
>
> -Brent
>
>
>
>
>
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf 
> Of Brent Kennedy
> Sent: Wednesday, January 10, 2018 12:20 PM
> To: 'Janne Johansson' <icepic.dz@xxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  Incomplete pgs and no data movement ( 
> cluster appears readonly )
>
>
>
> I changed “mon max pg per osd” to 5000 because when I changed it to 
> zero, which was supposed to disable it, it caused an issue where I 
> couldn't create any pools.  It would say 0 was larger than the 
> minimum.  I imagine that's a bug; if I wanted it disabled, then it 
> shouldn't use the calculation.  I then set “osd max pg per osd hard 
> ratio” to 5 after changing “mon max pg per osd” to 5000, figuring 
> 5*5000 would cover it.  Perhaps not.  I will adjust it to 30 and restart the OSDs.
>
>
>
> -Brent
>
>
>
>
>
>
>
> From: Janne Johansson [mailto:icepic.dz@xxxxxxxxx]
> Sent: Wednesday, January 10, 2018 3:00 AM
> To: Brent Kennedy <bkennedy@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  Incomplete pgs and no data movement ( 
> cluster appears readonly )
>
>
>
>
>
>
>
> 2018-01-10 8:51 GMT+01:00 Brent Kennedy <bkennedy@xxxxxxxxxx>:
>
> As per a previous thread, my pg counts are set too high.  I tried adjusting 
> “mon max pg per osd” up higher and higher, which did clear the 
> error (restarted monitors and managers each time), but it seems that 
> data simply won't move around the cluster.  If I stop the primary OSD 
> of an incomplete pg, the cluster just shows the affected pgs as
> active+undersized+degraded:
>
>
>
> I also adjusted “osd max pg per osd hard ratio” to 5, but that didn't 
> seem to trigger any data movement.  I did restart the OSDs each time I changed it.
> The data just won't finish moving.  “ceph -w” shows this:
>
> 2018-01-10 07:49:27.715163 osd.20 [WRN] slow request 960.675164 
> seconds old, received at 2018-01-10 07:33:27.039907: 
> osd_op(client.3542508.0:4097 14.0
> 14.50e8d0b0 (undecoded) ondisk+write+known_if_redirected e125984) 
> currently queued_for_pg
>
>
>
>
>
> Did you bump the ratio so that the PGs-per-OSD max * hard ratio 
> actually became more than the number of PGs per OSD you have?
>
> Last time you mailed, the PG count per OSD was 25xx and the max was 200, 
> which meant the ratio would have needed to be far more than 5.0.
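>
> (Roughly 2500 / 200 = 12.5, so with the max left at 200 the hard ratio 
> would have had to be above ~12.5 before the OSDs would accept that many 
> PGs.)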
>
>
>
>
>
> --
>
> May the most significant bit of your life be positive.
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



