[[
possibly stuff of general interest here so adding ceph-devel back to CC
]]
On Wed, Mar 08, 2023 at 07:49:01AM -0500, Anthony D'Atri wrote:
> On Mar 8, 2023, at 12:24 AM, Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
>> b2$ lsblk -P -p -o 'NAME' | wc -l
>> 924
> I’m curious how you have that many devices.
> Something like a 90-bay toploader + dmcrypt + shared WAL+DB?
The actual box has 105 bays (head unit + 2 x JBOD), currently with 87
physical devices - see below for more details on what this box is doing.
The "lsblk" line above is what's run by:
src/ceph-volume/ceph_volume/util/disk.py:lsblk_all()
[[ aside...
I think the general recommendation is that, where a command has short and
long versions of the same option, the short option is fine for interactive
use, but the long option is clearer for programming, i.e. it would be
better to have disk.py call lsblk like this (a short Python sketch follows
the aside):
lsblk --pairs --paths --output 'NAME'
]]
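To make that concrete, here's a rough Python sketch of what the long-option
invocation could look like. It uses plain subprocess rather than
ceph-volume's own process helpers, so it's only an illustration of the
idea, not the actual disk.py code:

  import subprocess

  def lsblk_all_names():
      # Same query that lsblk_all() runs today, but spelled with long
      # options so the intent is obvious to the next reader.
      result = subprocess.run(
          ['lsblk', '--pairs', '--paths', '--output', 'NAME'],
          capture_output=True, text=True, check=True,
      )
      return result.stdout.splitlines()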
But it's significantly overcounting: for one thing, I can't see that
lsblk_all() is de-duplicating ("unique-ifying") the output as it should:
b2$ lsblk -P -p -o 'NAME' | sort -u | wc -l
393
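For illustration, this is the kind of de-duplication I'd expect somewhere
in lsblk_all(); the helper name and parsing here are mine, not the actual
disk.py code:

  def unique_names(lsblk_lines):
      # Each line looks like: NAME="/dev/sdx"
      seen = set()
      names = []
      for line in lsblk_lines:
          _, _, value = line.strip().partition('=')
          name = value.strip('"')
          if name and name not in seen:
              seen.add(name)          # keep first occurrence only
              names.append(name)      # preserve original order
      return names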
The lsblk "merge" option might be appropriate for this:
b2$ diff <(lsblk -P -p -o 'NAME' | sort -u) <(lsblk -M -P -p -o 'NAME' | sort) && echo same
same
Perhaps more filtering can or should be done at the lsblk level to further
reduce the device count, but I don't know exactly what ceph is looking for
here, so I can't suggest what might work.
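Purely as an illustration of the kind of filtering that's possible: lsblk
can report TYPE alongside NAME, and the caller can keep only the types it
wants. The WANTED_TYPES set below is a guess on my part, and deciding what
belongs in it is exactly the open question:

  import subprocess

  # Guess only - which TYPEs ceph-volume actually needs is the open question.
  WANTED_TYPES = {'disk', 'part', 'lvm', 'mpath'}

  def candidate_devices():
      result = subprocess.run(
          ['lsblk', '--merge', '--pairs', '--paths', '--output', 'NAME,TYPE'],
          capture_output=True, text=True, check=True,
      )
      devices = {}
      for line in result.stdout.splitlines():
          # Each line looks like: NAME="/dev/sdx" TYPE="disk"
          # (assumes no whitespace inside the quoted values)
          fields = dict(f.split('=', 1) for f in line.split() if '=' in f)
          name = fields.get('NAME', '').strip('"')
          dtype = fields.get('TYPE', '').strip('"')
          if name and dtype in WANTED_TYPES:
              devices[name] = dtype   # dict keys also de-duplicate
      return devices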
To further expand on this box...
Just to get in ahead of the obvious comments: yes, this box is more than a
little frankenstein and the functions it performs can and should be
separated out. But "for historical reasons" this is where we are
currently, so...
The box is running 21 osds, some with separate WAL+DB. It's also running a
bunch of other stuff, e.g. there are:
- 8 MD devices comprised of 55 of the physical devices, which make up:
  - a 300T XFS filesystem for bulk storage
  - some smaller XFS FSs (e.g. root)
  - some mirrored SSDs for WAL+DB
  - an SSD LV writecache on raid6 (5 devices)
- 56 mapped rbds, each with LV writecache
Overall, the 393 unique lsblk entries are made up of:
84 physical (e.g. /dev/sdx)
47 partition (e.g. /dev/sdx1)
8 md (e.g. /dev/mdx)
56 rbds (e.g. /dev/rbdx)
198 mapper (e.g. /dev/mapper/xxxx)
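For reference, that breakdown amounts to bucketing the unique names by path
pattern; a rough sketch of how to reproduce it (the patterns are simplistic
and assume sdX-style disk names):

  import re
  from collections import Counter

  def classify(name):
      # Bucket purely on the device path; crude, but enough for a tally.
      if name.startswith('/dev/mapper/'):
          return 'mapper'
      if re.fullmatch(r'/dev/rbd\d+', name):
          return 'rbd'
      if re.fullmatch(r'/dev/md\d+', name):
          return 'md'
      if re.fullmatch(r'/dev/sd[a-z]+\d+', name):
          return 'partition'
      return 'physical'

  def tally(unique_names):
      return Counter(classify(n) for n in unique_names)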
The mapper stuff is obviously a significant part of my issue. E.g. for a
single mapped rbd with an LV writecache on a 5-device MD raid6 (and there
are currently 56 of these mapped rbds!), "lsblk" shows these entries:
b2$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
sdbp 68:48 0 3.5T 0 disk
└─md10 9:10 0 10.5T 0 raid6
├─aaaaa-fast--bbbbbbbb_cvol 253:46 0 1G 0 lvm
│ └─aaaaa-bbbbbbbb 253:48 0 1024G 0 lvm /aaaaa/bbbbbbbb
...
sdbt 68:112 0 3.5T 0 disk
└─md10 9:10 0 10.5T 0 raid6
├─aaaaa-fast--bbbbbbbb_cvol 253:46 0 1G 0 lvm
│ └─aaaaa-bbbbbbbb 253:48 0 1024G 0 lvm /aaaaa/bbbbbbbb
...
sdce 69:32 0 3.5T 0 disk
└─md10 9:10 0 10.5T 0 raid6
├─aaaaa-fast--bbbbbbbb_cvol 253:46 0 1G 0 lvm
│ └─aaaaa-bbbbbbbb 253:48 0 1024G 0 lvm /aaaaa/bbbbbbbb
...
sdci 69:96 0 3.5T 0 disk
└─md10 9:10 0 10.5T 0 raid6
├─aaaaa-fast--bbbbbbbb_cvol 253:46 0 1G 0 lvm
│ └─aaaaa-bbbbbbbb 253:48 0 1024G 0 lvm /aaaaa/bbbbbbbb
...
sdcw 70:64 0 3.5T 0 disk
└─md10 9:10 0 10.5T 0 raid6
├─aaaaa-fast--bbbbbbbb_cvol 253:46 0 1G 0 lvm
│ └─aaaaa-bbbbbbbb 253:48 0 1024G 0 lvm /aaaaa/bbbbbbbb
...
rbd33 252:528 0 1T 0 disk
└─aaaaa-bbbbbbbb_wcorig 253:47 0 1024G 0 lvm
└─aaaaa-bbbbbbbb 253:48 0 1024G 0 lvm /aaaaa/bbbbbbbb
...
So lsblk_all() sees:
b2$ lsblk -P -p -o 'NAME' | grep bbbbbbbb
NAME="/dev/mapper/aaaaa-fast--bbbbbbbb_cvol"
NAME="/dev/mapper/aaaaa-bbbbbbbb"
NAME="/dev/mapper/aaaaa-fast--bbbbbbbb_cvol"
NAME="/dev/mapper/aaaaa-bbbbbbbb"
NAME="/dev/mapper/aaaaa-fast--bbbbbbbb_cvol"
NAME="/dev/mapper/aaaaa-bbbbbbbb"
NAME="/dev/mapper/aaaaa-fast--bbbbbbbb_cvol"
NAME="/dev/mapper/aaaaa-bbbbbbbb"
NAME="/dev/mapper/aaaaa-fast--bbbbbbbb_cvol"
NAME="/dev/mapper/aaaaa-bbbbbbbb"
NAME="/dev/mapper/aaaaa-bbbbbbbb_wcorig"
NAME="/dev/mapper/aaaaa-bbbbbbbb"
Alternatively, with the --merge option:
b2$ lsblk -M -P -p -o 'NAME' | grep bbbbbbbb
NAME="/dev/mapper/aaaaa-fast--bbbbbbbb_cvol"
NAME="/dev/mapper/aaaaa-bbbbbbbb"
NAME="/dev/mapper/aaaaa-bbbbbbbb_wcorig"
Cheers,
Chris