Re: ceph orch osd spec questions

In case anyone was wondering, I figured out the problem...

It turned out to be this nasty bug in Pacific 16.2.10: https://tracker.ceph.com/issues/56031 - I believe it is fixed in the upcoming 16.2.11 release and in Quincy.

This bug causes the computed maximum size of the bluestore DB partition to be much smaller than it should be, so if you request a reasonable size that is larger than the incorrectly computed maximum, the DB creation will fail.

Our problem was that we added 3 new SSDs that were considered "unused" by the system, giving us a total of 8 (5 used, 3 unused). When the orchestrator issues a "ceph-volume lvm batch" command, it passes 40 data devices and 8 db devices. Normally, you would expect it to divide them into 5 slots per DB device (40/8). The problem occurs when it computes the size of those slots.

ceph-volume first sees the 3 unused devices as one group and incorrectly decides that 3 * 5 = 15 slots are needed, then divides the size of a single DB device by 15. If the code had also used the combined size of all of the devices in the group when computing the maximum, it would have been fine, but it only accounts for the size of the 1st DB device in the group, resulting in a maximum DB size 3x smaller than it should be.
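
To make the arithmetic concrete (illustrative numbers only, assuming the 1.7TB DB SSDs from my original mail, not the actual ceph-volume code):

    # Expected: 40 data devices / 8 DB devices = 5 slots per DB device
    echo $(( 40 / 8 ))      # 5 slots
    echo $(( 1700 / 5 ))    # ~340 GB per DB slot on a 1.7TB SSD
    # Buggy path: 3 "unused" devices in one group -> 3 * 5 = 15 slots
    echo $(( 3 * 5 ))       # 15 slots
    echo $(( 1700 / 15 ))   # ~113 GB, roughly 3x too small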

The workaround is to trick Ceph into treating each DB device as its own group of 1 by putting a minimal VG with a unique name on each of the unused SSDs; when ceph-volume then computes the sizing, it sees groups of 1 and doesn't multiply the number of slots incorrectly. I used "vgcreate bug1 -s 1M /dev/xyz" to create a bogus VG on each of the unused SSDs, and now I have properly sized DB devices on the new SSDs (the "bugX" VGs can be removed once there are legitimate DB VGs on the device).
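
For reference, the workaround looked roughly like this (device names are examples only, substitute your own unused SSDs):

    # put a tiny placeholder VG on each unused SSD so that ceph-volume
    # treats every DB device as its own group of 1
    vgcreate bug1 -s 1M /dev/sdx
    vgcreate bug2 -s 1M /dev/sdy
    vgcreate bug3 -s 1M /dev/sdz

    # later, once legitimate DB VGs exist on a device, the placeholder
    # can be dropped
    vgremove bug1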

Question - Because our cluster was initially laid out using the buggy ceph-volume (16.2.10), we now have hundreds of DB devices that are far smaller than they should be (far less than the recommended 1-4% of the data device size). Is it possible to resize the DB devices without destroying and recreating the OSD itself?

What are the implications of having bluestore DB devices that are far smaller than they should be?


thanks,
  Wyllys Ingersoll


________________________________
From: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
Sent: Friday, January 13, 2023 4:35 PM
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject:  ceph orch osd spec questions


Ceph Pacific 16.2.9

We have a storage server with multiple 1.7TB SSDs dedicated to bluestore DB usage. The OSD spec was originally slightly misconfigured: it set the "limit" parameter on db_devices to 5 (there are 8 SSDs available) and did not specify a block_db_size. Ceph laid out the original 40 OSDs and put 8 DBs on each of 5 SSDs (because of the limit parameter). Ceph seems to have auto-sized the bluestore DB partitions to about 45GB, which is far less than the recommended 1-4% of the data device size (we are using 10TB drives). How does ceph-volume determine the size of the bluestore DB/WAL partitions when it is not specified in the spec?
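
(For what it's worth, this is roughly how I'm checking the sizes that ceph-volume actually created - LV names will differ, and the bluefs_db_size metadata field is my assumption about where to look:)

    # list the DB LVs and their sizes on the DB SSDs
    lvs -o vg_name,lv_name,lv_size,devices | grep -i db

    # or ask an OSD directly; it should report the DB size in bytes
    ceph osd metadata 0 | grep bluefs_db_size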

We updated the spec and specified a block_db_size of 300G and removed the "limit" value.  Now we can see in the cephadm.log that the ceph-volume command being issued is using the correct list of SSD devices (all 8) as options to the lvm batch (--db-devices ...), but it keeps failing to create the new OSD because we are asking for 300G and it thinks there is only 44G available even though the last 3 SSDs in the list are empty (1.7T).  So, it appears that somehow the orchestrator is ignoring the last 3 SSDs.  I have verified that these SSDs are wiped clean, have no partitions or LVM, and no label (sgdisk -Z, wipefs -a). They appear as available in the inventory and not locked or otherwise in use.
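
(The cleanup/verification I did was along these lines - again, device and host names are just examples:)

    # wipe partition tables and filesystem signatures on the new SSDs
    sgdisk -Z /dev/sdx
    wipefs -a /dev/sdx

    # confirm the orchestrator reports them as available
    ceph orch device ls | grep <hostname>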

Also, the "db_slots" spec parameter is ignored in pacific due to a bug so there is no way to tell the orchestrator to use "block_db_slots". Adding it to the spec like "block_db_size" fails since it is not recognized.

Any help figuring out why these SSDs are being ignored would be much appreciated.

Our spec for this host looks like this:
---
spec:
  data_devices:
    rotational: 1
    size: '3TB:'
  db_devices:
    rotational: 0
    size: ':2T'
    vendor: 'SEAGATE'
  block_db_size: 300G
---
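
(In case it matters, the spec is being applied the usual way - the file name is just an example:)

    # preview what the orchestrator would do with this spec
    ceph orch apply -i osd_spec.yaml --dry-run

    # apply it for real
    ceph orch apply -i osd_spec.yaml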

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx