Working ceph cluster reports large number of pgs in state unknown/undersized and objects degraded

We operate a tiny ceph cluster (v16.2.7) across three machines, each running two OSDs and one mds, mgr, and mon. The cluster serves one main erasure-coded (2+1) storage pool plus a few management-related pools, and it had been running smoothly for several months. A few weeks ago we noticed a health warning reporting backfillfull/nearfull osds and pools. Here is the output of `ceph -s` at that point (extracted from logs):

--------------------------------------------------------------------------------
  cluster:
    health: HEALTH_WARN
            1 backfillfull osd(s)
            2 nearfull osd(s)
            Reduced data availability: 163 pgs inactive, 1 pg peering
            Low space hindering backfill (add storage if this doesn't resolve itself): 2 pgs backfill_toofull
            Degraded data redundancy: 1486709/10911157 objects degraded (13.626%), 68 pgs degraded, 68 pgs undersized
            162 pgs not scrubbed in time
            6 pool(s) backfillfull

  services:
    mon: 3 daemons, quorum mon.101,mon.102,mon.100 (age 5m)
    mgr: mgr-102(active, since 54m), standbys: mgr-101, mgr-100
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 6 osds: 6 up (since 4m), 6 in (since 2w); 7 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   6 pools, 338 pgs
    objects: 3.64M objects, 14 TiB
    usage:   13 TiB used, 1.7 TiB / 15 TiB avail
    pgs:     47.929% pgs unknown
             0.296% pgs not active
             1486709/10911157 objects degraded (13.626%)
             52771/10911157 objects misplaced (0.484%)
             162 unknown
             106 active+clean
             67  active+undersized+degraded
             1   active+undersized+degraded+remapped+backfill_toofull
             1   remapped+peering
             1   active+remapped+backfill_toofull
--------------------------------------------------------------------------------
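For context on the pool layout: the erasure-coded data pool was originally created roughly along the lines below. This is a reconstruction from memory, with placeholder profile/pool names and pg counts (`ec-21-profile`, `cephfs-data-ec`), not a copy of our exact commands.

--------------------------------------------------------------------------------
# 2+1 EC profile, one shard per host across the three machines
# (names and pg counts here are placeholders, not our actual values)
ceph osd erasure-code-profile set ec-21-profile k=2 m=1 crush-failure-domain=host
ceph osd pool create cephfs-data-ec 128 128 erasure ec-21-profile
ceph osd pool set cephfs-data-ec allow_ec_overwrites true
--------------------------------------------------------------------------------

With k=2, m=1 and a host failure domain, each pg places one shard on each of the three machines.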

In hindsight, the large number of pgs in state unknown and the fact that a significant fraction of objects was degraded despite all osds being up should have alarmed us, but because the cluster continued to behave fine from the perspective of the mounted filesystem, we didn't intervene at the time. From then on, things have mostly gone downhill. Now, `ceph -s` reports the following:

--------------------------------------------------------------------------------
  cluster:
    health: HEALTH_WARN
            noout flag(s) set
            Reduced data availability: 117 pgs inactive
            Degraded data redundancy: 2095625/12121767 objects degraded (17.288%), 114 pgs degraded, 114 pgs undersized
            117 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum mon.101,mon.102,mon.100 (age 15h)
    mgr: mgr-102(active, since 7d), standbys: mgr-100, mgr-101
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 6 osds: 6 up (since 55m), 6 in (since 3w)
         flags noout

  data:
    volumes: 1/1 healthy
    pools:   6 pools, 338 pgs
    objects: 4.04M objects, 15 TiB
    usage:   12 TiB used, 2.8 TiB / 15 TiB avail
    pgs:     34.615% pgs unknown
             2095625/12121767 objects degraded (17.288%)
             117 unknown
             114 active+undersized+degraded
             107 active+clean
--------------------------------------------------------------------------------

Note in particular the still very large number of pgs in state unknown, which hasn't changed in days; the same goes for the degraded pgs. Also, the cluster should have around 37 TiB of storage available, but it now reports only 15 TiB. We did a bit of digging but couldn't really get to the bottom of the unknown pgs or how to recover from them. One other data point: `ceph osd df tree` gets stuck on two of the three machines, and on the one machine where it does return something, the output looks like this:

--------------------------------------------------------------------------------
ID   CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP    META    AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
 -1         47.67506         -      0 B      0 B      0 B     0 B     0 B      0 B      0     0    -          root default
-13         18.26408         -      0 B      0 B      0 B     0 B     0 B      0 B      0     0    -              datacenter dc.100
 -5         18.26408         -      0 B      0 B      0 B     0 B     0 B      0 B      0     0    -                  host osd-100
  3    hdd  10.91409   1.00000      0 B      0 B      0 B     0 B     0 B      0 B      0     0   91      up              osd.3
  5    hdd   7.34999   1.00000      0 B      0 B      0 B     0 B     0 B      0 B      0     0   48      up              osd.5
 -9         14.69998         -      0 B      0 B      0 B     0 B     0 B      0 B      0     0    -              datacenter dc.101
 -7         14.69998         -      0 B      0 B      0 B     0 B     0 B      0 B      0     0    -                  host osd-101
  0    hdd   7.34999   1.00000      0 B      0 B      0 B     0 B     0 B      0 B      0     0   83      up              osd.0
  1    hdd   7.34999   1.00000      0 B      0 B      0 B     0 B     0 B      0 B      0     0   86      up              osd.1
-11         14.71100         -   15 TiB   12 TiB   12 TiB  77 MiB  21 GiB  2.6 TiB  82.00  1.00    -              datacenter dc.102
-17          7.35550         -  7.4 TiB  6.3 TiB  6.2 TiB  16 MiB  11 GiB  1.1 TiB  85.16  1.04    -                  host osdroid-102-1
  4    hdd   7.35550   1.00000  7.4 TiB  6.3 TiB  6.2 TiB  16 MiB  11 GiB  1.1 TiB  85.16  1.04  114      up              osd.4
-15          7.35550         -  7.4 TiB  5.8 TiB  5.7 TiB  61 MiB  10 GiB  1.6 TiB  78.83  0.96    -                  host osdroid-102-2
  2    hdd   7.35550   1.00000  7.4 TiB  5.8 TiB  5.7 TiB  61 MiB  10 GiB  1.6 TiB  78.83  0.96  107      up              osd.2
                                  TOTAL   15 TiB   12 TiB   12 TiB  77 MiB  21 GiB  2.6 TiB  82.00
MIN/MAX VAR: 0/1.04  STDDEV: 66.97
--------------------------------------------------------------------------------

The odd part here is that only osd.2 and osd.4 seem to contribute any size to the cluster. Interestingly, accessing content from the storage pool works mostly without issues, which shouldn't be possible if 4 of the 6 OSDs were actually not properly up.
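If it helps with debugging, one cross-check we can run is to ask the OSD daemons directly (via their admin sockets on each host) whether they consider themselves up and how many pgs they hold, independent of what the mgr aggregates. A sketch, using the osd ids from the tree above:

--------------------------------------------------------------------------------
# run on the host that owns the respective OSD; reports the daemon's own view
# of its state, oldest/newest osdmap epoch, and num_pgs
ceph daemon osd.3 status
ceph daemon osd.5 status
--------------------------------------------------------------------------------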

Even more odd: while `ceph health detail` reports a lot of pgs in state unknown, undersized, and degraded, querying those same pgs with `ceph pg <pgid> query` returns active+clean for *all* of them... I'm not sure which of the two pieces of information I am supposed to trust.
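For reference, the spot check was roughly the following loop (a sketch: we actually took the pg ids from `ceph health detail`, but listing stuck pgs via `ceph pg dump_stuck` amounts to the same thing, and `jq` is only used to pull out the state field):

--------------------------------------------------------------------------------
# compare the state each pg reports about itself with what health/pg dump claim
for pg in $(ceph pg dump_stuck unclean 2>/dev/null | awk 'NR>1 {print $1}'); do
    echo "$pg: $(ceph pg "$pg" query | jq -r '.state')"
done
--------------------------------------------------------------------------------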

Any ideas what we can do to get our cluster back into a sane state? I'm happy to provide more logs or command output, please let me know.

Thanks!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



