Re: All PGs are active+clean, still remapped PGs

Sage Weil <sage@xxxxxxxxxxxx> · Wed, 26 Oct 2016 08:44:03 +0000 (UTC)

On Wed, 26 Oct 2016, Dan van der Ster wrote:
> On Tue, Oct 25, 2016 at 7:06 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> >
> >> On 24 October 2016 at 22:29, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >>
> >>
> >> Hi Wido,
> >>
> >> This seems similar to what our dumpling tunables cluster does when a few
> >> particular osds go down... Though in our case the remapped pgs are
> >> correctly shown as remapped, not clean.
> >>
> >> The fix in our case will be to enable the vary_r tunable (which will move
> >> some data).
> >>
> >
> > Ah, as I figured. I will probably apply the Firefly tunables here. This cluster was recently upgraded from Dumpling to Firefly and then to Hammer, and we haven't changed the tunables yet.
> >
> > The MON stores are 35GB each right now and I think they are not trimming because of the pg_temp entries that still exist.
> >
> > I'll report back later, but this rebalance will take a lot of time.
> 
> I forgot to mention, a workaround for the vary_r issue is to simply
> remove the down/out osd from the crush map. We just hit this issue
> again last night on a failed osd and after removing it from the crush
> map the last degraded PG started backfilling.

Also note that if you do enable vary_r, you can set it to a higher value 
(like 5) to get the benefit without moving as much existing data.  See the 
CRUSH tunable docs for more details!
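
To judge whether a tunables change actually clears things up, it can help to
tally which OSDs the leftover pg_temp entries have in common, before and
after. The following is only a rough sketch: it assumes the
`pg_temp <pgid> [osd,...]` line format shown in the `ceph osd dump` output
quoted below, and `count_pg_temp_osds` is a made-up helper name, not part of
Ceph.

```python
import re
from collections import Counter

# Matches lines such as "pg_temp 4.39 [160,17,10,8]" from `ceph osd dump`.
# Assumption: this is the only line format we need to handle.
PG_TEMP_RE = re.compile(r"^pg_temp\s+(\S+)\s+\[([\d,]*)\]")


def count_pg_temp_osds(dump_text):
    """Return a Counter mapping OSD id -> number of pg_temp entries listing it."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = PG_TEMP_RE.match(line.strip())
        if match is None:
            continue  # not a pg_temp line
        osds = [int(osd) for osd in match.group(2).split(",") if osd]
        counts.update(osds)
    return counts


# Three of the entries from the dump quoted in this thread.
sample = """\
pg_temp 4.39 [160,17,10,8]
pg_temp 4.52 [161,16,10,11]
pg_temp 4.b1 [5,162,148,2]
"""
counts = count_pg_temp_osds(sample)
print(counts.most_common(3))
```

Feeding it the full `ceph osd dump` text (for example, captured to a file)
should make an OSD that shows up in many pg_temp sets stand out at the top of
the list.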

sage

> 
> Cheers, Dan
> 
> 
> >
> > Wido
> >
> >> Cheers, Dan
> >>
> >> On 24 Oct 2016 22:19, "Wido den Hollander" <wido@xxxxxxxx> wrote:
> >> >
> >> > Hi,
> >> >
> >> > On a cluster running Hammer 0.94.9 (upgraded from Firefly) I have 29
> >> > remapped PGs according to the OSDMap, but all PGs are active+clean.
> >> >
> >> > osdmap e111208: 171 osds: 166 up, 166 in; 29 remapped pgs
> >> >
> >> > pgmap v101069070: 6144 pgs, 2 pools, 90122 GB data, 22787 kobjects
> >> >     264 TB used, 184 TB / 448 TB avail
> >> >         6144 active+clean
> >> >
> >> > The OSDMap shows:
> >> >
> >> > root@mon1:~# ceph osd dump|grep pg_temp
> >> > pg_temp 4.39 [160,17,10,8]
> >> > pg_temp 4.52 [161,16,10,11]
> >> > pg_temp 4.8b [166,29,10,7]
> >> > pg_temp 4.b1 [5,162,148,2]
> >> > pg_temp 4.168 [95,59,6,2]
> >> > pg_temp 4.1ef [22,162,10,5]
> >> > pg_temp 4.2c9 [164,95,10,7]
> >> > pg_temp 4.330 [165,154,10,8]
> >> > pg_temp 4.353 [2,33,18,54]
> >> > pg_temp 4.3f8 [88,67,10,7]
> >> > pg_temp 4.41a [30,59,10,5]
> >> > pg_temp 4.45f [47,156,21,2]
> >> > pg_temp 4.486 [138,43,10,7]
> >> > pg_temp 4.674 [59,18,7,2]
> >> > pg_temp 4.7b8 [164,68,10,11]
> >> > pg_temp 4.816 [167,147,57,2]
> >> > pg_temp 4.829 [82,45,10,11]
> >> > pg_temp 4.843 [141,34,10,6]
> >> > pg_temp 4.862 [31,160,138,2]
> >> > pg_temp 4.868 [78,67,10,5]
> >> > pg_temp 4.9ca [150,68,10,8]
> >> > pg_temp 4.a83 [156,83,10,7]
> >> > pg_temp 4.a98 [161,94,10,7]
> >> > pg_temp 4.b80 [162,88,10,8]
> >> > pg_temp 4.d41 [163,52,10,6]
> >> > pg_temp 4.d54 [149,140,10,7]
> >> > pg_temp 4.e8e [164,78,10,8]
> >> > pg_temp 4.f2a [150,68,10,6]
> >> > pg_temp 4.ff3 [30,157,10,7]
> >> > root@mon1:~#
> >> >
> >> > So I tried restarting osd.160 and osd.161, but that didn't change the
> >> > state.
> >> >
> >> > root@mon1:~# ceph pg 4.39 query
> >> > {
> >> >     "state": "active+clean",
> >> >     "snap_trimq": "[]",
> >> >     "epoch": 111212,
> >> >     "up": [
> >> >         160,
> >> >         17,
> >> >         8
> >> >     ],
> >> >     "acting": [
> >> >         160,
> >> >         17,
> >> >         8
> >> >     ],
> >> >     "actingbackfill": [
> >> >         "8",
> >> >         "17",
> >> >         "160"
> >> >     ],
> >> >
> >> > In all these PGs osd.10 is involved, but that OSD is down and out. I
> >> > tried marking it as down again, but that didn't work.
> >> >
> >> > I haven't tried removing osd.10 from the CRUSH map yet, since that will
> >> > trigger a rather large rebalance.
> >> >
> >> > This cluster is still running with the Dumpling tunables though, so that
> >> > might be the issue. But before I trigger a very large rebalance I wanted to
> >> > check if there are any insights on this one.
> >> >
> >> > Thanks,
> >> >
> >> > Wido
> >> > _______________________________________________
> >> > ceph-users mailing list
> >> > ceph-users@xxxxxxxxxxxxxx
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com