Re: Handling node failures.

The OSD will probably not start if the WAL device is lost. You can test this by
removing the corresponding symlink to the block device,
/var/lib/ceph/osd/ceph-ID/block.wal; alternatively, it may fall back to using
block.db for the WAL in that case.
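
For illustration, a quick way to check whether a given OSD has a separate WAL
at all and where it points (osd.0 / ceph-0 are placeholder IDs, not from this
thread):

    # block.wal is only present as a symlink if a dedicated WAL device was configured
    ls -l /var/lib/ceph/osd/ceph-0/block.wal

    # the OSD's metadata also reports any dedicated WAL/DB devices
    ceph osd metadata 0 | grep -i wal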

IOPS should be taken into account as well. I would go with a 1:3 ratio if we
are considering IOPS, but it's still better to use NVMe for the DB and WAL.
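
As a purely illustrative worked example (the disk sizes are assumptions, not
from this thread): with 12 TB HDD OSDs, the 1-4% guideline works out to roughly
120-480 GB of flash per OSD for DB/WAL. At the conservative 4% end a single
2 TB NVMe holds about four such partitions, so once you leave some headroom and
remember that its IOPS are shared by every OSD behind it, ratios like 1:3 or
1:6 start to look sensible.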

On Sat, Nov 13, 2021 at 02:44 Subu Sankara Subramanian <subu.zsked@xxxxxxxxx> wrote:

> Thanks for the answers - some clarifications:
>
> - RE WAL drive loss: Can I assume the WAL drive is used ONLY if there is
> write traffic? IOW, can I protect against a SPOF like that by keeping
> clusters cold after the initial data load and serving only reads?
> - I do see the docs around 1-4% in terms of SSD size. Is there a
> restriction based on IOPS as well? I see mails specifying the ratio of SSD
> to HDD should be 1:3 or 1:6 - Am I reading this correctly?
>
> Thanks. Subu
>
> On Fri, Nov 12, 2021 at 4:27 PM prosergey07 <prosergey07@xxxxxxxxx> wrote:
>
>>
>>
>> > - IIUC, if a root SSD fails, there is pretty much no way to rebuild a new
>> > node with the same OSDs and avoid data shuffling - is this correct?
>>
>> You can still rebuild the node and add the old OSDs back to avoid shuffling.
>> You might need to set the noout flag while you work on the configuration of
>> the new node.
>>
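
For reference, a minimal sketch of the noout step mentioned above (standard
ceph CLI, run from an admin node):

    # keep down OSDs from being marked out and triggering rebalancing during the rebuild
    ceph osd set noout

    # ... replace hardware / reinstall the host and bring the old OSDs back up ...

    # clear the flag once the OSDs have rejoined
    ceph osd unset noout
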
>> > - If the hardware fails - I assume replacing the part and rebooting in
>> > time will bring back the node as is - is this right?
>>
>>  Sounds correct.
>>
>> > - If the root drive fails, is there a way to bring up a new host with the
>> > same OSDs in the same order but with a different host name / ip address?
>>
>> Should be possible, as each OSD authenticates with its own credentials,
>> which do not depend on the IP address. But the IP should be in the same
>> subnet as the cluster.
>>
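
A quick way to convince yourself of this (osd.0 is a placeholder ID):

    # show the OSD's cephx key and caps - nothing here is tied to a hostname or IP
    ceph auth get osd.0

    # show where the cluster currently thinks this OSD lives (ip, host, crush location)
    ceph osd find 0
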
>>
>> > FWIW we are using rook, so I am wondering if the crush map can be
>> > configured with some logical labels instead of host names for this
>> > purpose
>>
>>  That should be possible.
>>
>>
>> > - Assuming we use a shared SSD with partitions for WAL/ Metadata for the
>> > whole node - if this drive fails, I assume we have to recover the entire
>> > node. Correct? I remember seeing a note that this pretty much renders
>> > all the relevant OSDs useless.
>>
>> That's correct. If the DB/WAL device is lost, you would have to recover each
>> OSD whose DB is broken.
>>
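
One possible recovery sequence for a single affected OSD, sketched with
placeholder names (osd.0, /dev/sdb as its data disk, /dev/nvme0n1 as the
replacement DB device); under Rook the operator would normally drive this
rather than ceph-volume being run by hand:

    # mark the OSD destroyed so its ID can be reused by the replacement
    ceph osd destroy 0 --yes-i-really-mean-it

    # wipe the old data disk so it can be re-provisioned
    ceph-volume lvm zap /dev/sdb --destroy

    # recreate the OSD with the same ID, pointing its DB at the new device
    ceph-volume lvm create --osd-id 0 --data /dev/sdb --block.db /dev/nvme0n1
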
>> > Semi-related: What is the ideal ratio of SSDs for WAL/metadata to count
>> > of OSDs? I remember seeing pdfs from Redhat showing a 1:10 ratio. The
>> > mailing list has references to 1:3 or 1:6. I am trying to figure out
>> > what the right number is.
>>
>> It depends. The recommendation is 1-4% of the OSD size for the DB, but it
>> also depends on how many tiny objects you will have, since those mainly
>> occupy RocksDB (the DB).
>>
>> Sent from a Galaxy device
>>
>>
>> -------- Original message --------
>> From: Subu Sankara Subramanian <subu.zsked@xxxxxxxxx>
>> Date: 12.11.21 18:41 (GMT+02:00)
>> To: ceph-users@xxxxxxx
>> Subject: Handling node failures.
>>
>> Folks,
>>
>>   New here - I tried searching for this topic in the archive but couldn't
>> find anything since 2018 or so, so I am starting a new thread. I am looking
>> at the impact of node failures. I found this doc:
>>
>> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/operations_guide/handling-a-node-failure
>> - I have a few questions about this:
>>
>> -  IIUC, if a root SSD fails, there is pretty much no way to rebuild a new
>> node with the same OSDs and avoid data shuffling - is this correct?
>> - If the hardware fails - I assume replacing the part and rebooting in
>> time will bring back the node as is - is this right?
>> - If the root drive fails, is there a way to bring up a new host with the
>> same OSDs in the same order but with a different host name / ip address?
>> FWIW we are using rook, so I am wondering if the crush map can be
>> configured with some logical labels instead of host names for this purpose
>> - Is this possible? (I am evaluating whether I can bring a new node back up
>> with the original host name itself - at least the cloud K8s clusters make
>> this impossible.)
>>
>> - Assuming we use a shared SSD with partitions for WAL/ Metadata for the
>> whole node - if this drive fails, I assume we have to recover the entire
>> node. Correct? I remember seeing a note that this pretty much renders all
>> the relevant OSDs useless.
>> -- Semi-related: What is the ideal ratio of SSDs for WAL/metadata to count
>> of OSDs? I remember seeing pdfs from Redhat showing a 1:10 ratio. The
>> mailing list has references to 1:3 or 1:6. I am trying to figure out what
>> the right number is.
>>
>> Thanks. Subu
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



