Re: Device class not deleted/set correctly

Stefan Kooman <stefan@xxxxxx> writes:

> On 3/23/21 11:00 AM, Nico Schottelius wrote:
>> Stefan Kooman <stefan@xxxxxx> writes:
>>>> OSDs from the wrong class (hdd). Does anyone have a hint on how to fix
>>>> this?
>>>
>>> Do you have: osd_class_update_on_start enabled?
>> So this one is a bit funky. It seems to be off, but the behaviour would
>> indicate it isn't. Checking the typical configurations:
>> [10:38:53] black2.place6:~# ceph config-key get config/global/osd_class_update_on_start; echo ""
>> obtained 'config/global/osd_class_update_on_start'
>> false
>> [10:39:59] black2.place6:~# ceph-conf -D | grep osd_class_update_on_start
>> osd_class_update_on_start = true
>> [10:47:24] black2.place6:~# grep osd_class_update_on_start /etc/ceph/ceph.conf
>> [10:52:59] black2.place6:~# ceph config dump | grep osd_class_update_on_start
>> global      advanced osd_class_update_on_start      false
>> [10:53:38] black2.place6:~#
>> So it looks like it's already disabled.
>
> What does a "ceph daemon osd.$id config get osd_class_update_on_start"
> give on that host for an OSD that is running there?

That returns

[12:52:24] server6.place6:~# ceph daemon osd.4 config get osd_class_update_on_start
{
    "osd_class_update_on_start": "false"
}

for all involved OSDs.
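To spell out why the three answers disagree, here is a minimal model of option resolution. It assumes the ordering from the Ceph configuration docs as I read them (compiled-in default, then the monitors' config database, then the local ceph.conf, environment, command line and runtime overrides, later sources winning), and it assumes that ceph-conf never talks to the monitors, so it can only report the compiled-in default. All names are hypothetical; this is not Ceph's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sources in increasing priority (assumed ordering; later entries win).
enum class Source {
  CompiledDefault,  // e.g. .set_default(true) in src/common/options.cc
  MonConfigDb,      // what `ceph config dump` shows
  LocalConfFile,    // /etc/ceph/ceph.conf
  Environment,
  CommandLine,
  RuntimeOverride,  // e.g. `ceph daemon osd.N config set ...`
};

struct ConfigValue {
  Source source;
  std::string value;
};

// Every option has a compiled-in default, so the list is never empty.
// Returns the value from the highest-priority source present.
ConfigValue effective_value(const std::vector<ConfigValue>& candidates) {
  assert(!candidates.empty());
  ConfigValue best = candidates.front();
  for (const auto& c : candidates) {
    // Scoped enums compare by underlying value, i.e. declaration order here.
    if (c.source > best.source) best = c;
  }
  return best;
}

// Drops the sources a tool cannot see. Assumption: ceph-conf reads only the
// compiled defaults and the local file; a running daemon also gets the mon DB.
std::vector<ConfigValue> visible_to(const std::vector<ConfigValue>& all,
                                    bool sees_mon_db) {
  std::vector<ConfigValue> out;
  for (const auto& c : all) {
    if (c.source == Source::MonConfigDb && !sees_mon_db) continue;
    out.push_back(c);
  }
  return out;
}
```

Fed the values from this cluster (compiled default true, config database false), the daemon's view resolves to false and the ceph-conf view falls back to true, which matches the outputs above.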

> It depends on the logging settings of the OSD daemons, but in our case
> it was logged to the daemon log I believe (or syslog, dunno anymore).

It's very strange, because none of the configurations indicate that a
"hdd" class should be used. We don't use hdd anywhere else either (none
of our classes is hdd), so I suspect some built-in logic is trying to
set up the class.

>> I am not sure where ceph-conf reads the value true from, but I assume
>> it's a builtin.
>> I was also searching for osd_class_update_on_start on the Internet and
>> it seems there is no reference to it in the Ceph documentation. Do you
>> have any pointers to it?
>
> Not anymore with new Ceph documentation.

Out of curiosity, do you have any clue why it's not in there anymore?

> But the parameter is self-explanatory: it will try to put itself into
> the proper class at startup. Source code: src/common/options.cc
>
>     Option("osd_class_update_on_start", Option::TYPE_BOOL, Option::LEVEL_ADVANCED)
>     .set_default(true)
>     .set_description("set OSD device class on startup"),

What I am missing from the description is: set based on which criteria?
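My current guess at the criteria, written down as a sketch so it can be checked: I believe BlueStore picks "hdd" when the data device reports itself as rotational (the 0/1 flag in /sys/block/<dev>/queue/rotational) and "ssd" otherwise, and I believe the monitor refuses to replace a class that is already set, so the option can only fill in a missing class. Both points are assumptions, and the function names are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, NOT Ceph's code: default class derived from the sysfs
// rotational flag ("1\n" for spinning disks, "0\n" otherwise).
std::string default_class_from_rotational(const std::string& sysfs_flag) {
  assert(!sysfs_flag.empty());
  // sysfs values end with a newline, so look only at the first character.
  return sysfs_flag[0] == '1' ? "hdd" : "ssd";
}

// Class an OSD ends up with after startup. Assumption: an already-bound class
// is never overwritten, so only an unset class gets the detected one.
std::optional<std::string> class_after_start(
    bool update_on_start,
    const std::optional<std::string>& current,
    const std::string& detected) {
  if (!update_on_start) return current;  // option off: CRUSH left untouched
  if (current) return current;           // already bound: kept as-is
  return detected;                       // unset: detected class applied
}
```

If this model is right, removing the class of an OSD on a spinning disk and restarting it while the option is in effect would put it straight back into hdd.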

In any case, the running OSD seems to have the correct class assigned.
However, I can see that this OSD has connections open to unrelated
OSDs:

tcp6       0      0 2a0a:e5c0:2:1:21b:21ff:febc:bf30:6805 2a0a:e5c0:2:1:21b:21ff:febb:68dc:57280 ESTABLISHED 17034/ceph-osd

So something is not right with this OSD. This particular one is in a
special class that serves a single pool containing only 3 OSDs. However,
it has around 200 established connections to what appears to be most
(all?) of the other OSDs in the cluster.
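To put a number on "around 200", here is a small sketch that counts the distinct remote hosts behind one ceph-osd PID in `netstat -tnp` output. It assumes the usual Linux column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name) and splits addresses at the last colon, because IPv6 addresses contain colons themselves. It counts hosts rather than OSDs: for inbound connections like the one above, the remote port is ephemeral and does not identify the remote OSD. The helper names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Strip the trailing ":port"; split at the LAST colon so IPv6 works.
std::string host_part(const std::string& addr_port) {
  auto pos = addr_port.rfind(':');
  return pos == std::string::npos ? addr_port : addr_port.substr(0, pos);
}

// Distinct remote hosts with ESTABLISHED connections owned by `pid_prog`
// (e.g. "17034/ceph-osd"), given netstat -tnp style lines.
std::set<std::string> remote_hosts(const std::vector<std::string>& lines,
                                   const std::string& pid_prog) {
  assert(!pid_prog.empty());
  std::set<std::string> hosts;
  for (const auto& line : lines) {
    std::istringstream in(line);
    std::vector<std::string> f;
    for (std::string tok; in >> tok;) f.push_back(tok);
    // f[4] = foreign address, f[5] = state, f[6] = PID/program
    if (f.size() < 7 || f[5] != "ESTABLISHED" || f[6] != pid_prog) continue;
    hosts.insert(host_part(f[4]));
  }
  return hosts;
}
```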

As far as I understand, it seems wrong for Ceph OSDs to form a complete
mesh, especially if they never exchange data with the OSDs they are
connected to.

Can somebody confirm that OSDs should only connect to OSDs they share
data with?
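One way to test that assumption is to compute the set of OSDs that share at least one PG with this OSD and compare it with the OSDs it is actually talking to. The sketch assumes the acting sets have already been collected (for example from `ceph pg dump`) and the observed connections mapped to OSD ids (for example via the addresses in `ceph osd dump`); both helpers are hypothetical. As far as I know, heartbeats add some peers beyond the data peers (see osd_heartbeat_min_peers), so a few extra entries would not be alarming, but close to 200 would be.

```cpp
#include <cassert>
#include <set>
#include <vector>

// OSDs that share at least one PG (acting set) with `osd`.
std::set<int> data_peers(const std::vector<std::vector<int>>& acting_sets,
                         int osd) {
  assert(osd >= 0);
  std::set<int> peers;
  for (const auto& acting : acting_sets) {
    bool member = false;
    for (int o : acting) {
      if (o == osd) member = true;
    }
    if (!member) continue;
    for (int o : acting) {
      if (o != osd) peers.insert(o);
    }
  }
  return peers;
}

// Observed OSDs that are not data peers.
std::set<int> unexpected(const std::set<int>& observed,
                         const std::set<int>& peers) {
  std::set<int> out;
  for (int o : observed) {
    if (peers.count(o) == 0) out.insert(o);
  }
  return out;
}
```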

And if my assumption is correct: is there any way to tell this OSD to
behave correctly and only establish connections to OSDs of the same
class (i.e. to assign the class correctly)?

Best regards,

Nico


--
Sustainable and modern Infrastructures by ungleich.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx