Hi Frank,
thanks for looking up those trackers. I haven't looked at them yet
and will read your response in detail later, but I wanted to add a
new observation:
I added another root bucket (custom) to the osd tree:
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-12 0 root custom
-1 0.27698 root default
-8 0.09399 room room1
-3 0.04700 host host1
7 hdd 0.02299 osd.7 up 1.00000 1.00000
10 hdd 0.02299 osd.10 up 1.00000 1.00000
...
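For reference, such a root bucket can be created with a single crush
command; a sketch with the bucket name I used:

# ceph osd crush add-bucket custom root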
Then I tried this approach to add a new host directly to the non-default root:
# cat host5.yaml
service_type: host
hostname: host5
addr: 192.168.168.54
location:
  root: custom
labels:
- osd
# ceph orch apply -i host5.yaml
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-12 0.04678 root custom
-23 0.04678 host host5
1 hdd 0.02339 osd.1 up 1.00000 1.00000
13 hdd 0.02339 osd.13 up 1.00000 1.00000
-1 0.27698 root default
-8 0.09399 room room1
-3 0.04700 host host1
7 hdd 0.02299 osd.7 up 1.00000 1.00000
10 hdd 0.02299 osd.10 up 1.00000 1.00000
...
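Just to double-check, something along these lines (a sketch, not
necessarily the exact commands) confirms that nothing is being moved
after applying the spec:

# ceph pg ls remapped
# ceph -s | grep remapped

Both come back empty here.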
host5 is placed correctly, directly underneath the new custom root,
and not a single PG is marked "remapped"! So this is actually what I
(or we) expected. I'm not sure yet what to make of it, but I'm
leaning towards using this approach in the future: add new hosts
underneath a different root first, and then move them to their
designated location.
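In practice, the later move would be something like this (room1 just
as an example target), and that's the point at which the peering
would then happen, separately from adding the host:

# ceph osd crush move host5 root=default room=room1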
Just to validate again, I added host6 without a location spec, so it's
placed underneath the default root again:
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-12 0.04678 root custom
-23 0.04678 host host5
1 hdd 0.02339 osd.1 up 1.00000 1.00000
13 hdd 0.02339 osd.13 up 1.00000 1.00000
-1 0.32376 root default
-25 0.04678 host host6
14 hdd 0.02339 osd.14 up 1.00000 1.00000
15 hdd 0.02339 osd.15 up 1.00000 1.00000
-8 0.09399 room room1
-3 0.04700 host host1
...
And this leads to remapped PGs again. I assume this must be related
to the default root; I'm going to investigate further.
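My rough plan is to compare the computed mappings offline with
crushtool, something like this (rule id and replica count are just
placeholders for the pool in question):

# ceph osd getcrushmap -o crush.before
  (add the new host under the default root, then:)
# ceph osd getcrushmap -o crush.after
# crushtool --test -i crush.before --rule 1 --num-rep 4 \
    --show-mappings > before.txt
# crushtool --test -i crush.after --rule 1 --num-rep 4 \
    --show-mappings > after.txt
# diff before.txt after.txt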
Thanks!
Eugen
Quoting Frank Schilder <frans@xxxxxx>:
Hi Eugen,
just to add another strange observation from long ago:
https://www.spinics.net/lists/ceph-users/msg74655.html. I didn't see
any reweights in your trees, so it's something else. However, there
seem to be multiple issues with EC pools and peering.
I also want to clarify:
If this is the case, it is possible that this is partly intentional
and partly buggy.
"Partly intentional" here means the code behaviour changes when you
add OSDs to the root outside the rooms and this change is not
considered a bug. It is clearly *not* expected as it means you
cannot do maintenance on a pool living on a tree A without affecting
pools on the same device class living on an unmodified subtree of A.
From a ceph user's point of view, everything you observe looks buggy.
I would really like to see a good explanation of why the mappings in
the subtree *should* change when adding OSDs above that subtree, as
in your case, when the expectation, for good reasons, is that they
don't. This would help in devising clean procedures for adding hosts
when you (and I) want to add OSDs first, without any peering, and
then move the OSDs into place, so that the data movement happens
separately from the adding instead of being a total mess with
everything in parallel.
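One possible building block for such a procedure might be to let new
OSDs enter the tree with zero crush weight and only weight them up
once they sit in their final location (a sketch, the weight below is
just an example; whether a zero-weight OSD still influences the
choice order is exactly what would need testing):

# ceph config set osd osd_crush_initial_weight 0
# ceph osd crush reweight osd.14 0.02339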
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Thursday, May 23, 2024 6:32 PM
To: Eugen Block
Cc: ceph-users@xxxxxxx
Subject: Re: unknown PGs after adding hosts in different subtree
Hi Eugen,
I'm at home now. Could you please check that all the remapped PGs
have no shards on the new OSDs, i.e. that it's just shuffling
mappings around within the same set of OSDs under the rooms?
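Something like this would show it (pgs_brief prints the PG id, state,
up set and acting set; just check that none of the new OSD ids appear
in the up/acting columns):

# ceph pg dump pgs_brief 2>/dev/null | awk '/remapped/ {print $1, $3, $5}'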
If this is the case, it is possible that this is partly intentional
and partly buggy. The remapping is then probably intentional and the
method I use with a disjoint tree for new hosts prevents such
remappings initially (the crush code sees the new OSDs in the root
and doesn't use them, but their presence does change choice orders,
resulting in remapped PGs). However, the unknown PGs should clearly
not occur.
I'm afraid that the peering code has quite a few bugs; I reported
something at least similarly weird a long time ago:
https://tracker.ceph.com/issues/56995 and
https://tracker.ceph.com/issues/46847. They might even be related. It
looks like peering can lose track of PG members in certain
situations (specifically, after adding OSDs and until rebalancing has
completed). In my cases, I get degraded objects even though
everything is obviously still around. Flipping between the
crush-maps from before/after the change re-discovers everything
again. Issue 46847 is long-standing and still unresolved. In case you
need to file a tracker, please consider referring to the two above as
"might be related" if you deem that they are.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx