Thanks Mehmet; I took a closer look at what I sent you and the problem
appears to be in the CRUSH map. At some point since anything was last
rebooted, I created rack buckets and moved the OSD nodes in under them:
# ceph osd crush add-bucket rack-0 rack
# ceph osd crush add-bucket rack-1 rack
# ceph osd crush move bcgonen-r0h0 rack=rack-0
# ceph osd crush move bcgonen-r0h1 rack=rack-0
# ceph osd crush move bcgonen-r1h0 rack=rack-1
All seemed fine at the time; it was not until bcgonen-r1h0 was rebooted
that things got weird. But as the "ceph osd tree" output shows, those
rack buckets were sitting next to the default root rather than under
it. That is now fixed, and the cluster is backfilling the remapped PGs.
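For the record, the fix amounts to moving the rack buckets under the default root. A sketch of the commands (reconstructed from the description above, not copied from a terminal):

```shell
# Re-parent the stray rack buckets under the default root so the
# CRUSH hierarchy is root -> rack -> host -> osd again:
ceph osd crush move rack-0 root=default
ceph osd crush move rack-1 root=default
```

In hindsight, the racks could have been created in the right place in one step, since add-bucket accepts a CRUSH location: `ceph osd crush add-bucket rack-0 rack root=default`.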
// J
On 2023-03-31 16:01, Johan Hattne wrote:
Here goes:
# ceph -s
  cluster:
    id:     e1327a10-8b8c-11ed-88b9-3cecef0e3946
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum bcgonen-a,bcgonen-b,bcgonen-c,bcgonen-r0h0,bcgonen-r0h1 (age 16h)
    mgr: bcgonen-b.furndm(active, since 8d), standbys: bcgonen-a.qmmqxj
    mds: 1/1 daemons up, 2 standby
    osd: 36 osds: 36 up (since 16h), 36 in (since 3d); 1041 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 1041 pgs
    objects: 5.42M objects, 6.5 TiB
    usage:   19 TiB used, 428 TiB / 447 TiB avail
    pgs:     27087125/16252275 objects misplaced (166.667%)
             1039 active+clean+remapped
             2    active+clean+remapped+scrubbing+deep
# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-14         149.02008  rack rack-1
 -7         149.02008      host bcgonen-r1h0
 20    hdd   14.55269          osd.20           up   1.00000  1.00000
 21    hdd   14.55269          osd.21           up   1.00000  1.00000
 22    hdd   14.55269          osd.22           up   1.00000  1.00000
 23    hdd   14.55269          osd.23           up   1.00000  1.00000
 24    hdd   14.55269          osd.24           up   1.00000  1.00000
 25    hdd   14.55269          osd.25           up   1.00000  1.00000
 26    hdd   14.55269          osd.26           up   1.00000  1.00000
 27    hdd   14.55269          osd.27           up   1.00000  1.00000
 28    hdd   14.55269          osd.28           up   1.00000  1.00000
 29    hdd   14.55269          osd.29           up   1.00000  1.00000
 34    ssd    1.74660          osd.34           up   1.00000  1.00000
 35    ssd    1.74660          osd.35           up   1.00000  1.00000
-13         298.04016  rack rack-0
 -3         149.02008      host bcgonen-r0h0
  0    hdd   14.55269          osd.0            up   1.00000  1.00000
  1    hdd   14.55269          osd.1            up   1.00000  1.00000
  2    hdd   14.55269          osd.2            up   1.00000  1.00000
  3    hdd   14.55269          osd.3            up   1.00000  1.00000
  4    hdd   14.55269          osd.4            up   1.00000  1.00000
  5    hdd   14.55269          osd.5            up   1.00000  1.00000
  6    hdd   14.55269          osd.6            up   1.00000  1.00000
  7    hdd   14.55269          osd.7            up   1.00000  1.00000
  8    hdd   14.55269          osd.8            up   1.00000  1.00000
  9    hdd   14.55269          osd.9            up   1.00000  1.00000
 30    ssd    1.74660          osd.30           up   1.00000  1.00000
 31    ssd    1.74660          osd.31           up   1.00000  1.00000
 -5         149.02008      host bcgonen-r0h1
 10    hdd   14.55269          osd.10           up   1.00000  1.00000
 11    hdd   14.55269          osd.11           up   1.00000  1.00000
 12    hdd   14.55269          osd.12           up   1.00000  1.00000
 13    hdd   14.55269          osd.13           up   1.00000  1.00000
 14    hdd   14.55269          osd.14           up   1.00000  1.00000
 15    hdd   14.55269          osd.15           up   1.00000  1.00000
 16    hdd   14.55269          osd.16           up   1.00000  1.00000
 17    hdd   14.55269          osd.17           up   1.00000  1.00000
 18    hdd   14.55269          osd.18           up   1.00000  1.00000
 19    hdd   14.55269          osd.19           up   1.00000  1.00000
 32    ssd    1.74660          osd.32           up   1.00000  1.00000
 33    ssd    1.74660          osd.33           up   1.00000  1.00000
 -1                 0  root default
# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 31 flags
hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'cephfs.cephfs.meta' replicated size 3 min_size 2 crush_rule 2
object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change
9833 lfor 0/0/584 flags hashpspool stripe_width 0 pg_autoscale_bias 4
pg_num_min 16 recovery_priority 5 application cephfs
pool 3 'cephfs.cephfs.data' replicated size 3 min_size 2 crush_rule 1
object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on
last_change 7630 lfor 0/1831/6544 flags hashpspool,bulk stripe_width 0
application cephfs
CRUSH rules 1 and 2 are just used to assign the data and metadata pools
to HDD and SSD, respectively (failure domain: host).
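For context, class-based replicated rules like these are typically created along the following lines. This is a sketch; the rule names here are illustrative, not necessarily the ones used in this cluster:

```shell
# Hypothetical recreation of the two device-class rules: replicated,
# rooted at "default", failure domain "host", restricted to one class.
ceph osd crush rule create-replicated replicated-hdd default host hdd
ceph osd crush rule create-replicated replicated-ssd default host ssd

# A pool is then pointed at a rule with, e.g.:
#   ceph osd pool set cephfs.cephfs.data crush_rule replicated-hdd
```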
// J
On 2023-03-31 15:37, ceph@xxxxxxxxxx wrote:
Need to know some more about your cluster...
Ceph -s
Ceph osd df tree
Replica or ec?
...
Perhaps this can give us some insight
Mehmet
Am 31. März 2023 18:08:38 MESZ schrieb Johan Hattne <johan@xxxxxxxxx>:
Dear all;
Up until a few hours ago, I had a seemingly normally behaving
cluster (Quincy, 17.2.5) with 36 OSDs, evenly distributed across 3 of
its 6 nodes. The cluster is only used for CephFS, and the only
non-standard configuration I can think of is that I had 2 active MDSs
but only 1 standby. I had also doubled mds_cache_memory_limit to 8 GB
(all OSD hosts have 256 GB of RAM) at some point in the past.
Then I rebooted one of the OSD nodes. The rebooted node held one
of the active MDSs. Now the node is back up: ceph -s says the cluster
is healthy, but all PGs are in an active+clean+remapped state and
166.67% of the objects are misplaced (dashboard: -66.66% healthy).
The data pool is a threefold replica with 5.4M objects; the number
of misplaced objects is reported as 27087410/16252446. The
denominator in the ratio makes sense to me (16.2M / 3 = 5.4M), but the
numerator does not. I also note that the ratio is *exactly* 5 / 3.
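The exact 5/3 ratio is easy to verify from the reported counts (a quick sanity check, not Ceph tooling):

```shell
# 16252446 object instances (5.4M objects x 3 replicas), of which
# 27087410 are counted as misplaced -- exactly 5/3 of the total:
awk 'BEGIN { printf "%.6f\n", 27087410 / 16252446 }'   # 1.666667
awk 'BEGIN { print 16252446 / 3 * 5 }'                 # 27087410
```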
The filesystem is still mounted and appears to be usable, but df
reports it as 100% full; I suspect it would say 167% but that is
capped somewhere.
Any ideas about what is going on? Any suggestions for recovery?
// Best wishes; Johan
------------------------------------------------------------------------
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx