Re: Misplaced objects greater than 100%

Perhaps this option triggered the CRUSH map change:

osd_crush_update_on_start

Each time the OSD starts, it verifies it is in the correct location in
the CRUSH map and, if it is not, it moves itself.

 https://docs.ceph.com/en/quincy/rados/operations/crush-map/
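
For what it's worth, the option can be checked and, if desired, disabled
cluster-wide. A minimal sketch (note the underscores in the config name;
adjust to your deployment):

# ceph config get osd osd_crush_update_on_start
# ceph config set osd osd_crush_update_on_start false

Setting a per-OSD crush_location (or a crush location hook) is another way
to make that automatic move land where you expect it.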

Joachim


Johan Hattne <johan@xxxxxxxxx> wrote on Wed, 5 Apr 2023 at 22:21:

> I think this is resolved—and you're right about the 0-weight of the root
> bucket being strange. I had created the rack buckets with
>
> # ceph osd crush add-bucket rack-0 rack
>
> whereas I should have used something like
>
> # ceph osd crush add-bucket rack-0 rack root=default
>
> There's a bit in the documentation
> (https://docs.ceph.com/en/quincy/rados/operations/crush-map) that says
> "Not all keys need to be specified" (in a different context, I admit).
>
> I might have saved a second or two by omitting "root=default" and maybe
> half a minute by not checking the CRUSH map carefully afterwards.  It
> was not worth it.
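>
> (For anyone hitting the same thing: it should also be possible to
> re-parent already-created buckets under the default root instead of
> recreating them, with something like
>
> # ceph osd crush move rack-0 root=default
> # ceph osd crush move rack-1 root=default
>
> after which the racks and their hosts should show up under "root default"
> in ceph osd tree.)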
>
> // J
>
> On 2023-04-05 12:01, ceph@xxxxxxxxxx wrote:
> > I guess this is related to your CRUSH rules...
> > Unfortunately I don't know much about creating the rules...
> >
> > But someone could give more insights if you also provide
> >
> > ceph osd crush rule dump
> >
> > ... your "-1 0 root default" is a bit strange.
> >
> >
> > On 1 April 2023 at 01:01:39 MESZ, Johan Hattne <johan@xxxxxxxxx> wrote:
> >
> >     Here goes:
> >
> >     # ceph -s
> >        cluster:
> >          id:     e1327a10-8b8c-11ed-88b9-3cecef0e3946
> >          health: HEALTH_OK
> >
> >        services:
> >          mon: 5 daemons, quorum bcgonen-a,bcgonen-b,bcgonen-c,bcgonen-r0h0,bcgonen-r0h1 (age 16h)
> >          mgr: bcgonen-b.furndm(active, since 8d), standbys: bcgonen-a.qmmqxj
> >          mds: 1/1 daemons up, 2 standby
> >          osd: 36 osds: 36 up (since 16h), 36 in (since 3d); 1041 remapped pgs
> >
> >        data:
> >          volumes: 1/1 healthy
> >          pools:   3 pools, 1041 pgs
> >          objects: 5.42M objects, 6.5 TiB
> >          usage:   19 TiB used, 428 TiB / 447 TiB avail
> >          pgs:     27087125/16252275 objects misplaced (166.667%)
> >                   1039 active+clean+remapped
> >                   2    active+clean+remapped+scrubbing+deep
> >
> >     # ceph osd tree
> >     ID   CLASS  WEIGHT     TYPE NAME              STATUS  REWEIGHT  PRI-AFF
> >     -14         149.02008  rack rack-1
> >       -7         149.02008      host bcgonen-r1h0
> >       20    hdd   14.55269          osd.20             up   1.00000  1.00000
> >       21    hdd   14.55269          osd.21             up   1.00000  1.00000
> >       22    hdd   14.55269          osd.22             up   1.00000  1.00000
> >       23    hdd   14.55269          osd.23             up   1.00000  1.00000
> >       24    hdd   14.55269          osd.24             up   1.00000  1.00000
> >       25    hdd   14.55269          osd.25             up   1.00000  1.00000
> >       26    hdd   14.55269          osd.26             up   1.00000  1.00000
> >       27    hdd   14.55269          osd.27             up   1.00000  1.00000
> >       28    hdd   14.55269          osd.28             up   1.00000  1.00000
> >       29    hdd   14.55269          osd.29             up   1.00000  1.00000
> >       34    ssd    1.74660          osd.34             up   1.00000  1.00000
> >       35    ssd    1.74660          osd.35             up   1.00000  1.00000
> >     -13         298.04016  rack rack-0
> >       -3         149.02008      host bcgonen-r0h0
> >        0    hdd   14.55269          osd.0              up   1.00000  1.00000
> >        1    hdd   14.55269          osd.1              up   1.00000  1.00000
> >        2    hdd   14.55269          osd.2              up   1.00000  1.00000
> >        3    hdd   14.55269          osd.3              up   1.00000  1.00000
> >        4    hdd   14.55269          osd.4              up   1.00000  1.00000
> >        5    hdd   14.55269          osd.5              up   1.00000  1.00000
> >        6    hdd   14.55269          osd.6              up   1.00000  1.00000
> >        7    hdd   14.55269          osd.7              up   1.00000  1.00000
> >        8    hdd   14.55269          osd.8              up   1.00000  1.00000
> >        9    hdd   14.55269          osd.9              up   1.00000  1.00000
> >       30    ssd    1.74660          osd.30             up   1.00000  1.00000
> >       31    ssd    1.74660          osd.31             up   1.00000  1.00000
> >       -5         149.02008      host bcgonen-r0h1
> >       10    hdd   14.55269          osd.10             up   1.00000  1.00000
> >       11    hdd   14.55269          osd.11             up   1.00000  1.00000
> >       12    hdd   14.55269          osd.12             up   1.00000  1.00000
> >       13    hdd   14.55269          osd.13             up   1.00000  1.00000
> >       14    hdd   14.55269          osd.14             up   1.00000  1.00000
> >       15    hdd   14.55269          osd.15             up   1.00000  1.00000
> >       16    hdd   14.55269          osd.16             up   1.00000  1.00000
> >       17    hdd   14.55269          osd.17             up   1.00000  1.00000
> >       18    hdd   14.55269          osd.18             up   1.00000  1.00000
> >       19    hdd   14.55269          osd.19             up   1.00000  1.00000
> >       32    ssd    1.74660          osd.32             up   1.00000  1.00000
> >       33    ssd    1.74660          osd.33             up   1.00000  1.00000
> >       -1                 0  root default
> >
> >     # ceph osd pool ls detail
> >     pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 31 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
> >     pool 2 'cephfs.cephfs.meta' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 9833 lfor 0/0/584 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
> >     pool 3 'cephfs.cephfs.data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 7630 lfor 0/1831/6544 flags hashpspool,bulk stripe_width 0 application cephfs
> >
> >     crush_rules 1 and 2 are just used to assign the data and meta pools to HDD and SSD, respectively (failure domain: host).
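> >
> >     (Aside: rules like these are typically created per device class; a
> >     sketch, with illustrative rule names, would be
> >
> >     # ceph osd crush rule create-replicated replicated_hdd default host hdd
> >     # ceph osd crush rule create-replicated replicated_ssd default host ssd
> >
> >     assuming the "default" root and "host" failure domain seen above.)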
> >
> >     // J
> >
> >     On 2023-03-31 15:37, ceph@xxxxxxxxxx wrote:
> >
> >         We need to know some more about your cluster...
> >
> >         ceph -s
> >         ceph osd df tree
> >         Replica or EC?
> >         ...
> >
> >         Perhaps this can give us some insight.
> >         Mehmet
> >
> >         On 31 March 2023 at 18:08:38 MESZ, Johan Hattne <johan@xxxxxxxxx> wrote:
> >
> >         Dear all;
> >
> >         Up until a few hours ago, I had a seemingly normally-behaving
> >         cluster (Quincy, 17.2.5) with 36 OSDs, evenly distributed across
> >         3 of its 6 nodes. The cluster is only used for CephFS and the
> >         only non-standard configuration I can think of is that I had 2
> >         active MDSs, but only 1 standby. I had also doubled
> >         mds_cache_memory_limit to 8 GB (all OSD hosts have 256 GB of RAM)
> >         at some point in the past.
> >
> >         Then I rebooted one of the OSD nodes. The rebooted node held one
> >         of the active MDSs. Now the node is back up: ceph -s says the
> >         cluster is healthy, but all PGs are in a active+clean+remapped
> >         cluster is healthy, but all PGs are in an active+clean+remapped
> >         -66.66% healthy).
> >
> >         The data pool is a threefold replica with 5.4M objects; the
> >         number of misplaced objects is reported as 27087410/16252446.
> >         The denominator in the ratio makes sense to me (16.2M / 3 =
> >         5.4M), but the numerator does not. I also note that the ratio is
> >         *exactly* 5 / 3. The filesystem is still mounted and appears to
> >         be usable, but df reports it as 100% full; I suspect it would
> >         say 167% but that is capped somewhere.
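> >
> >         (The arithmetic behind the 5/3: 16252446 / 3 = 5417482, matching
> >         the ~5.4M objects times the replica count of 3, and 5417482 * 5 =
> >         27087410, the reported numerator.)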
> >
> >         Any ideas about what is going on? Any suggestions for recovery?
> >
> >         // Best wishes; Johan
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



