Re: [External Email] Re: Hardware for new OSD nodes.

Eneko,

# ceph health detail
HEALTH_WARN BlueFS spillover detected on 7 OSD(s)
BLUEFS_SPILLOVER BlueFS spillover detected on 7 OSD(s)
     osd.1 spilled over 648 MiB metadata from 'db' device (28 GiB used of 124 GiB) to slow device
     osd.3 spilled over 613 MiB metadata from 'db' device (28 GiB used of 124 GiB) to slow device
     osd.4 spilled over 485 MiB metadata from 'db' device (28 GiB used of 124 GiB) to slow device
     osd.10 spilled over 1008 MiB metadata from 'db' device (28 GiB used of 124 GiB) to slow device
     osd.17 spilled over 808 MiB metadata from 'db' device (28 GiB used of 124 GiB) to slow device
     osd.18 spilled over 2.5 GiB metadata from 'db' device (28 GiB used of 124 GiB) to slow device
     osd.20 spilled over 1.5 GiB metadata from 'db' device (28 GiB used of 124 GiB) to slow device
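[Editor's note: the "28 GiB used of 124 GiB" figure repeated in every entry is consistent with RocksDB's level geometry as commonly discussed on this list: BlueFS only keeps a RocksDB level on the fast device if the whole level fits, so with the default 256 MiB level base and 10x multiplier the useful DB-device capacities step at roughly 0.25, ~3, ~30, ~300 GiB. A rough sketch of that counting argument — the defaults here are assumptions, not values read from this cluster:]

```python
# Sketch: why a 124 GiB DB partition tops out near ~28 GiB before BlueFS
# spills to the slow device. Assumes BlueStore's RocksDB defaults
# (max_bytes_for_level_base = 256 MiB, multiplier = 10); a level is only
# placed on the fast device if it fits entirely.
BASE_GIB = 0.25   # L1 target: 256 MiB
MULTIPLIER = 10

def usable_db_gib(partition_gib, max_levels=6):
    """Cumulative size of the RocksDB levels that fit entirely on the DB device."""
    usable = cumulative = 0.0
    level = BASE_GIB
    for _ in range(max_levels):
        cumulative += level
        if cumulative <= partition_gib:
            usable = cumulative
        level *= MULTIPLIER
    return usable

# 124 GiB holds L1 + L2 + L3 = 0.25 + 2.5 + 25 = 27.75 GiB; L4 (250 GiB)
# does not fit, so metadata beyond ~28 GiB spills to the slow device.
print(usable_db_gib(124))   # -> 27.75
# ~300 GiB is the next useful step, since it also fits L4.
print(usable_db_gib(300))   # -> 277.75
```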

nvme0n1                                   259:1    0   1.5T  0 disk
├─ceph--block--dbs--a2b7a161--d4da--4b86--a191--37564008adca-osd--block--db--6dcbb748--13f5--45cb--9d49--6c78d6589a71
│                                         253:1    0   124G  0 lvm
├─ceph--block--dbs--a2b7a161--d4da--4b86--a191--37564008adca-osd--block--db--736a22a8--e4aa--4da9--b63b--295d8f5f2a3d
│                                         253:3    0   124G  0 lvm
├─ceph--block--dbs--a2b7a161--d4da--4b86--a191--37564008adca-osd--block--db--751c6623--9870--4123--b551--1fd7fc837341
│                                         253:5    0   124G  0 lvm
├─ceph--block--dbs--a2b7a161--d4da--4b86--a191--37564008adca-osd--block--db--2a376e8d--abb1--42af--a4bd--4ae8734d703e
│                                         253:7    0   124G  0 lvm
├─ceph--block--dbs--a2b7a161--d4da--4b86--a191--37564008adca-osd--block--db--54fbe282--9b29--422b--bdb2--d7ed730bc589
│                                         253:9    0   124G  0 lvm
├─ceph--block--dbs--a2b7a161--d4da--4b86--a191--37564008adca-osd--block--db--c1153cd2--2ec0--4e7f--a3d7--91dac92560ad
│                                         253:11   0   124G  0 lvm
├─ceph--block--dbs--a2b7a161--d4da--4b86--a191--37564008adca-osd--block--db--d613f4eb--6ddc--4dd5--a2b5--cb520b6ba922
│                                         253:13   0   124G  0 lvm
└─ceph--block--dbs--a2b7a161--d4da--4b86--a191--37564008adca-osd--block--db--41f75c25--67db--46e8--a3fb--ddee9e7f7fc4
                                          253:15   0   124G  0 lvm

Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
607-760-2328 (Cell)
607-777-4641 (Office)

On 10/23/2020 6:00 AM, Eneko Lacunza wrote:

Hi Dave,

On 22/10/20 at 19:43, Dave Hall wrote:


On 22/10/20 at 16:48, Dave Hall wrote:


(BTW, Nautilus 14.2.7 on Debian non-container.)

We're about to purchase more OSD nodes for our cluster, but I have a couple of
questions about hardware choices.  Our original nodes were 8 x 12TB SAS
drives and a 1.6TB Samsung NVMe card for WAL, DB, etc.

We chose the NVMe card for performance, since it has an 8-lane PCIe
interface.  However, we're currently seeing BlueFS spillovers.

The Tyan chassis we are considering has the option of 4 x U.2 NVMe bays,
each with 4 PCIe lanes (and 8 SAS bays).  It has occurred to me that I
might stripe four 1TB NVMe drives together to get much more space for WAL/DB
and a net performance of 16 PCIe lanes.

Any thoughts on this approach?

Don't stripe them; if one NVMe fails you'll lose all the OSDs. Just use one
NVMe drive for every two SAS drives and provision 300GB of WAL/DB for each OSD
(see related threads on this mailing list about why that exact size).

This way, if an NVMe fails, you'll only lose 2 OSDs.
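[Editor's note: the failure-domain argument above is just counting, but it's worth making explicit. A sketch, using the 8-SAS / 4-NVMe chassis numbers from this thread; nothing here is measured:]

```python
# Sketch: blast radius of a single NVMe failure under the two layouts
# discussed (one striped 4-drive WAL/DB volume vs. one NVMe per two OSDs).

def osds_lost_per_nvme_failure(total_osds, nvme_count, striped):
    # Striped: every OSD's DB spans every NVMe, so one failure kills them all.
    # Split: only the OSDs whose DB lives on the failed device go down.
    return total_osds if striped else total_osds // nvme_count

print(osds_lost_per_nvme_failure(8, 4, striped=True))   # -> 8
print(osds_lost_per_nvme_failure(8, 4, striped=False))  # -> 2
```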

I was under the impression that everything that BlueStore puts on the
SSD/NVMe could be reconstructed from information on the OSD. Am I mistaken
about this?  If so, my single 1.6TB NVMe card is equally vulnerable.


I don't think so; that info only exists on that partition, as was the case
with the Filestore journal. Your single 1.6TB NVMe is vulnerable, yes.


Also, what size of WAL/DB partitions do you have now, and what spillover
size?


I recently posted another question to the list on this topic, since I now
have spillover on 7 of 24 OSDs.  Since the data layout BlueStore uses on the
NVMe is not traditional, I've never quite figured out how to get this
information.  The current partition size is 1.6TB / 12, since we had the
possibility of adding four more drives to each node.  How that space was
divided between WAL, DB, etc. is something I'd like to be able to understand.
However, we're not going to add the extra four drives, so expanding the LVM
partitions is now a possibility.

Can you paste the warning message? It shows the spillover size. Also, what
size are the partitions on the NVMe disk (lsblk)?
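[Editor's note: once pasted, the per-OSD spillover sizes can be tabulated mechanically. A small parser sketch; the regex assumes the `BLUEFS_SPILLOVER` message format quoted at the top of this thread:]

```python
# Sketch: extract (osd, spilled-size) pairs from `ceph health detail` output.
import re

SPILL_RE = re.compile(r"(osd\.\d+) spilled over ([\d.]+ (?:MiB|GiB)) metadata")

def spillovers(health_detail: str):
    """Return [(osd, spilled)] pairs found in a health detail dump."""
    return SPILL_RE.findall(health_detail)

sample = """\
     osd.1 spilled over 648 MiB metadata from 'db' device (28 GiB used of 124 GiB) to slow device
     osd.18 spilled over 2.5 GiB metadata from 'db' device (28 GiB used of 124 GiB) to slow device
"""
print(spillovers(sample))  # -> [('osd.1', '648 MiB'), ('osd.18', '2.5 GiB')]
```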


Cheers
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



