I've been eyeing a similar icydock product (https://www.icydock.com/goods.php?id=309) to make M.2 drives more serviceable. While M.2 isn't ideal, if you have a 2U/4U box with a ton of available slots in the back, you could use these with some Micron 7300 MAX or similar M.2 drives for WAL/DB. In theory this would make identifying a failed M.2 easier and quicker, and would allow hot-servicing, rather than an on-motherboard slot that requires a full server pull to service. Curious if anyone has experience with it yet.

Reed

> On Sep 9, 2021, at 12:36 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>
> I don't think the bigger tier 1 enterprise vendors have really jumped on, but I've been curious to see if anyone would create a dense hot-swap M.2 setup (possibly combined with traditional 3.5" HDD bays). The only vendor I've really seen even attempt something like this is icydock:
>
> https://www.icydock.com/goods.php?id=287
>
> 8 NVMe M.2 devices in a single 5.25" bay. They also have another version that does 6 M.2 in 2x3.5". One of the tier 1 enterprise vendors could probably do something similar on the back of a traditional 12-bay 2U 3.5" chassis. Stick in some moderately sized, high-write-endurance M.2 devices and you're looking at something like 2 OSD DB/WALs per NVMe. As it is, 6:1 with 2x2.5" seems to be pretty typical and isn't terrible if you use decent drives.
>
> Mark
>
> On 9/9/21 12:04 PM, David Orman wrote:
>> Exactly, we minimize the blast radius/data destruction by allocating more, smaller devices for DB/WAL rather than fewer, larger ones. We encountered this same issue on an earlier iteration of our hardware design. With rotational drives and NVMes, we are now aiming for a 6:1 ratio based on our CRUSH rules, rotational disk sizing, NVMe sizing, server sizing, EC setup, etc.
>>
>> Make sure to use write-friendly NVMes for DB/WAL and the failures should be far fewer and further between.
>>
>> On Thu, Sep 9, 2021 at 9:11 AM Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
>>> On Thu, Sep 9, 2021 at 4:09 PM, Michal Strnad <michal.strnad@xxxxxxxxx> wrote:
>>>> When the disk with the DB dies, it will cause inaccessibility of all dependent OSDs (six or eight in our environment). How do you do it in your environment?
>>> Have two SSDs for 8 OSDs, so only half go away when one SSD dies.
>>>
>>> --
>>> May the most significant bit of your life be positive.
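
To make the ratio/blast-radius trade-off discussed above concrete, here is a minimal back-of-the-envelope sketch in Python. The only figures taken from the thread are the 6:1 HDD-to-NVMe ratio and the 2-SSDs-per-8-OSDs layout; the 12-bay chassis and the 1600 GB NVMe capacity are assumptions for illustration, not recommendations.

```python
# Rough back-of-the-envelope for shared DB/WAL device planning.
# Numbers marked as hypothetical are illustrative assumptions only.

def blast_radius(osds_per_host: int, osds_per_db_device: int) -> float:
    """Fraction of a host's OSDs lost when one shared DB/WAL device fails."""
    db_devices = osds_per_host / osds_per_db_device
    return 1.0 / db_devices  # e.g. 2 DB devices -> half the OSDs go away


def db_size_per_osd_gb(db_device_gb: float, osds_per_db_device: int) -> float:
    """DB/WAL space each OSD gets on a shared flash device."""
    return db_device_gb / osds_per_db_device


if __name__ == "__main__":
    # Janne's layout: 8 HDD OSDs per host, 2 shared SSDs (4 OSDs per SSD).
    print(f"2 SSDs / 8 OSDs:  lose {blast_radius(8, 4):.0%} of OSDs per SSD failure")

    # 6:1 ratio on a hypothetical 12-bay 2U box with 2 NVMe DB/WAL devices.
    print(f"6:1 on 12 bays:   lose {blast_radius(12, 6):.0%} of OSDs per NVMe failure")

    # DB/WAL share per OSD from a hypothetical 1600 GB high-endurance NVMe.
    print(f"6:1 on 1600 GB:   {db_size_per_osd_gb(1600, 6):.0f} GB of DB/WAL per OSD")
```

Either way, more (smaller) shared devices per host shrinks the fraction of OSDs taken out by a single DB/WAL failure, at the cost of more slots and more parts to service, which is where the hot-swap M.2 bays come in.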