Re: backfill_toofull after adding new OSDs

Hi Jan,

You might be hitting the same issue as Wido here:

https://www.spinics.net/lists/ceph-users/msg50603.html

Kind regards,
Caspar

On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <kas@xxxxxxxxxx> wrote:
Hello, ceph users,

I see the following HEALTH_ERR during cluster rebalance:

        Degraded data redundancy (low space): 8 pgs backfill_toofull

Detailed description:
I have upgraded my cluster to mimic and added 16 new bluestore OSDs
on 4 hosts. The hosts are in a separate region in my crush map, and the crush
rules prevented data from being moved onto the new OSDs. Now I want to move
all data to the new OSDs (and possibly decommission the old filestore OSDs).
I have created the following rule:

# ceph osd crush rule create-replicated on-newhosts newhostsroot host
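
(For reference, assuming the standard mimic CLI, the resulting rule can be
inspected with

# ceph osd crush rule dump on-newhosts

which should show a "take newhostsroot" step as its first rule step.)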

After this, I am slowly moving the pools one by one to the new rule:

# ceph osd pool set test-hdd-pool crush_rule on-newhosts
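
(To confirm the change took effect, assuming the usual mimic CLI, the rule a
pool is using can be checked with

# ceph osd pool get test-hdd-pool crush_rule

before moving on to the next pool.)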

When I do this, I get the above error. This is misleading, because
ceph osd df does not suggest that any OSD is getting full (the fullest
OSD is about 41% full). After the rebalance finishes, the HEALTH_ERR
disappears. Why am I getting this error?
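
(In case it helps with the diagnosis, the affected PGs and the configured
full thresholds can be listed with

# ceph health detail
# ceph osd dump | grep ratio

where, assuming mimic defaults, backfillfull_ratio (0.90) is the threshold
that marks a backfill target OSD as backfill_toofull.)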

# ceph -s
  cluster:
    id:     ...my UUID...
    health: HEALTH_ERR
            1271/3803223 objects misplaced (0.033%)
            Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs degraded, 67 pgs undersized
            Degraded data redundancy (low space): 8 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3
    mgr: mon2(active), standbys: mon1, mon3
    osd: 80 osds: 80 up, 80 in; 90 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   13 pools, 5056 pgs
    objects: 1.27 M objects, 4.8 TiB
    usage:   15 TiB used, 208 TiB / 224 TiB avail
    pgs:     40124/3803223 objects degraded (1.055%)
             1271/3803223 objects misplaced (0.033%)
             4963 active+clean
             41   active+recovery_wait+undersized+degraded+remapped
             21   active+recovery_wait+undersized+degraded
             17   active+remapped+backfill_wait
             5    active+remapped+backfill_wait+backfill_toofull
             3    active+remapped+backfill_toofull
             2    active+recovering+undersized+remapped
             2    active+recovering+undersized+degraded+remapped
             1    active+clean+remapped
             1    active+recovering+undersized+degraded

  io:
    client:   6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr
    recovery: 2.0 MiB/s, 92 objects/s

Thanks for any hint,

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
 This is the world we live in: the way to deal with computers is to google
 the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com