Stefan Kooman <stefan@xxxxxx> writes:

> On 3/23/21 11:00 AM, Nico Schottelius wrote:
>> Stefan Kooman <stefan@xxxxxx> writes:
>>>> OSDs from the wrong class (hdd). Does anyone have a hint on how to fix
>>>> this?
>>>
>>> Do you have: osd_class_update_on_start enabled?
>>
>> So this one is a bit funky. It seems to be off, but the behaviour would
>> indicate it isn't. Checking the typical configurations:
>>
>> [10:38:53] black2.place6:~# ceph config-key get config/global/osd_class_update_on_start; echo ""
>> obtained 'config/global/osd_class_update_on_start'
>> false
>> [10:39:59] black2.place6:~# ceph-conf -D | grep osd_class_update_on_start
>> osd_class_update_on_start = true
>> [10:47:24] black2.place6:~# grep osd_class_update_on_start /etc/ceph/ceph.conf
>> [10:52:59] black2.place6:~# ceph config dump | grep osd_class_update_on_start
>> global advanced osd_class_update_on_start false
>> [10:53:38] black2.place6:~#
>>
>> So it looks like it's already disabled.
>
> What does a "ceph daemon osd.$id config get osd_class_update_on_start"
> give on that host for an OSD that is running there?

That returns

[12:52:24] server6.place6:~# ceph daemon osd.4 config get osd_class_update_on_start
{
    "osd_class_update_on_start": "false"
}

for all involved OSDs.

> It depends on settings on logging of the OSD daemons, but in our case
> it was logged to the daemon log I believe (or syslog, dunno anymore).

It's so strange, because none of the configurations indicate to use a
"hdd" class. Which, btw, we also don't use in other cases (i.e. none of
our used classes is hdd), so I suspect some builtin to try to setup the
class.

>> I am not sure where ceph-conf reads the value true from, but I assume
>> it's a builtin.
>>
>> I was also searching for osd_class_update_on_start in the Internet and
>> it seems there is no reference to it in the ceph documentation. Do you
>> have any pointers to it?
>
> Not anymore with new Ceph documentation.

Out of curiosity, do you have any clue why it's not in there anymore?

> But the parameter is self
> explaining, it will try to put itself into the proper class at
> startup. Source code: src/common/options.cc
>
>     Option("osd_class_update_on_start", Option::TYPE_BOOL, Option::LEVEL_ADVANCED)
>     .set_default(true)
>     .set_description("set OSD device class on startup"),

What I am somewhat missing from the description is "set based on which
criteria?"

In any case, it seems that the running OSD has the correct class
assigned. However, I can see that this OSD has connections open to
unrelated OSDs:

tcp6 0 0 2a0a:e5c0:2:1:21b:21ff:febc:bf30:6805 2a0a:e5c0:2:1:21b:21ff:febb:68dc:57280 ESTABLISHED 17034/ceph-osd

So something is "not good" or "not correct" with this OSD. This
particular one is in a special class that serves 1 pool with only 3 OSDs
in it. However, this OSD has around 200 connections established to, as
far as I can see, most (all?) other OSDs in the cluster.

To my understanding, it seems wrong that Ceph OSDs form a complete mesh,
especially if they will never exchange data with the OSDs they are
connected to.

Can somebody confirm that OSDs should only connect to OSDs they share
data with?

And if my assumption is correct: is there any way to tell this OSD to
behave correctly and only establish connections to OSDs of the same
class? (i.e. correctly assigning the class)

Best regards,

Nico

-- 
Sustainable and modern Infrastructures by ungleich.ch
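
To put a number on the "around 200 connections" observation, the netstat output could be summarised per remote address. The following is only a minimal sketch: it assumes the column layout of the single line shown above (as printed by netstat with TCP, numeric and program-name output enabled), and the helper names parse_endpoint and established_peers are hypothetical, not part of Ceph or any other tool.

```python
from collections import Counter


def parse_endpoint(endpoint):
    """Split 'address:port' at the last colon.

    Assumption: IPv6 addresses are printed without brackets, as in the
    line quoted in this mail, so only the final colon separates the port.
    """
    address, _, port = endpoint.rpartition(":")
    return address, int(port)


def established_peers(netstat_output, program="ceph-osd"):
    """Count ESTABLISHED TCP connections per remote address for one program."""
    peers = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local remote state pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        _pid, _, name = fields[6].partition("/")
        if name != program:
            continue
        remote_address, _remote_port = parse_endpoint(fields[4])
        peers[remote_address] += 1
    return peers


# The one connection line quoted in this mail, used as sample input.
SAMPLE = (
    "tcp6 0 0 2a0a:e5c0:2:1:21b:21ff:febc:bf30:6805 "
    "2a0a:e5c0:2:1:21b:21ff:febb:68dc:57280 ESTABLISHED 17034/ceph-osd\n"
)

if __name__ == "__main__":
    for address, count in established_peers(SAMPLE).items():
        print(address, count)
```

Feeding it the full netstat output of the host would give the number of distinct remote addresses and the connections per address. Mapping those addresses back to OSD ids is not covered here; the addresses listed by "ceph osd dump" could serve as a starting point for that.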