Re: Handling node failures.

Thanks for the answers - some clarifications:

- RE WAL drive loss: Can I assume the WAL drive is used ONLY if there is
write traffic? IOW, can I protect against a SPOF like that by keeping
clusters cold after the initial data load and serving only reads?
- I do see the docs' guidance of 1-4% for the SSD (DB) size. Is there a
restriction based on IOPS as well? I see mails specifying that the ratio of
SSD to HDD should be 1:3 or 1:6 - am I reading this correctly?

Thanks. Subu

On Fri, Nov 12, 2021 at 4:27 PM prosergey07 <prosergey07@xxxxxxxxx> wrote:

>
>
> > - IIUC, if a root SSD fails, there is pretty much no way to rebuild a new
> > node with the same OSDs and avoid data shuffling - is this correct?
>
> You can still rebuild the node, re-add the old OSDs, and avoid shuffling. You
> might need to set the NOOUT flag while you work on the configuration of the
> new node.
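
For reference, a minimal sketch of wrapping that maintenance window with the
noout flag; it assumes the ceph CLI and an admin keyring are available on the
host running it, and uses Python purely for illustration:

    import subprocess

    def set_noout(enable: bool) -> None:
        # "ceph osd set noout" keeps the monitors from marking the downed
        # node's OSDs "out" and triggering rebalancing; unset it once the
        # rebuilt node and its OSDs have rejoined the cluster.
        action = "set" if enable else "unset"
        subprocess.run(["ceph", "osd", action, "noout"], check=True)

    set_noout(True)    # before taking the node down
    # ... replace the root drive, re-add the old OSD disks ...
    set_noout(False)   # after the OSDs are back up and in
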
>
> > - If the hardware fails - I assume replacing the part and rebooting in
> > time will bring back the node as is - is this right?
>
>  Sounds correct.
>
> > - If the root drive fails, is there a way to bring up a new host with the
> > same OSDs in the same order but with a different host name / IP address?
>
> It should be possible, as each OSD authenticates with its own credentials,
> which do not depend on the IP address. But the IP should be in the same
> subnet as the cluster.
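
A quick, hedged way to see that (assuming an admin keyring and osd.0 as an
example id): dump the OSD's cephx entity and note that it contains only a key
and capabilities, with no host or address in it:

    import subprocess

    # Print the cephx entity for osd.0: just its secret key and mon/osd
    # capabilities. Nothing here is tied to a hostname or IP, which is why
    # a re-addressed host can present the same OSDs again.
    out = subprocess.run(["ceph", "auth", "get", "osd.0"],
                         capture_output=True, text=True, check=True)
    print(out.stdout)
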
>
>
> > FWIW we are using rook, so I am wondering if the crush map can be
> > configured with some logical labels instead of host names for this
> > purpose
>
>  That should be possible.
>
>
> > - Assuming we use a shared SSD with partitions for WAL/metadata for the
> > whole node - if this drive fails, I assume we have to recover the entire
> > node. Correct? I remember seeing a note that this pretty much renders all
> > the relevant OSDs useless.
>
> That's correct. If the DB/WAL is lost, you would have to recover every OSD
> whose DB is broken.
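
For what it is worth, a rough sketch of the destroy-and-redeploy path for one
affected OSD; the OSD id and device name below are placeholders, and rook
normally drives this itself, so treat it as an outline rather than a recipe:

    import subprocess

    def redeploy_osd(osd_id: int, data_dev: str) -> None:
        def run(*cmd: str) -> None:
            subprocess.run(cmd, check=True)
        # Mark the OSD destroyed so its id can be reused (this drops its
        # cephx key), then wipe the old data device and rebuild the OSD;
        # its contents are backfilled from the surviving replicas.
        run("ceph", "osd", "destroy", str(osd_id), "--yes-i-really-mean-it")
        run("ceph-volume", "lvm", "zap", data_dev, "--destroy")
        run("ceph-volume", "lvm", "create", "--data", data_dev,
            "--osd-id", str(osd_id))

    redeploy_osd(12, "/dev/sdc")   # example values only
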
>
> > Semi-related: What is the ideal ratio of SSDs for WAL/metadata to count
> > of OSDs? I remember seeing PDFs from Red Hat showing a 1:10 ratio; the
> > mailing list has references to 1:3 or 1:6. I am trying to figure out what
> > the right number is.
>
> It depends. The recommendation is 1-4% of the OSD size for the DB, but it
> depends on how many tiny objects you will have, since those mainly occupy
> RocksDB (the DB).
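
A quick worked example of what 1-4% means in practice (the numbers are
illustrative only, not a sizing recommendation):

    # One 12 TB HDD OSD with its DB/WAL on a shared SSD:
    osd_size_gb = 12 * 1000
    low, high = 0.01, 0.04            # the 1-4% guidance
    print(f"DB partition per OSD: {osd_size_gb * low:.0f}-{osd_size_gb * high:.0f} GB")
    # -> 120-480 GB per OSD, so an SSD shared by several OSDs needs roughly
    #    that much per OSD it serves, plus headroom for RocksDB compaction.

As far as I can tell, the 1:3 / 1:6 figures on the list are usually
device-count ratios - how many HDD OSDs share one DB/WAL SSD - which is a
separate constraint (capacity plus IOPS of the shared SSD) from the 1-4%
capacity guidance.
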
>
> Sent from a Galaxy device
>
>
> -------- Original message --------
> From: Subu Sankara Subramanian <subu.zsked@xxxxxxxxx>
> Date: 12.11.21 18:41 (GMT+02:00)
> To: ceph-users@xxxxxxx
> Subject: Handling node failures.
>
> Folks,
>
> New here - I tried searching for this topic in the archives but couldn't
> find anything since 2018 or so, so I am starting a new thread. I am looking
> at the impact of node failures. I found this doc:
>
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/operations_guide/handling-a-node-failure
> - I have a few questions about this:
>
> -  IIUC, if a root SSD fails, there is pretty much no way to rebuild a new
> node with the same OSDs and avoid data shuffling - is this correct?
> - If the hardware fails - I assume replacing the part and rebooting in
> time will bring back the node as is - is this right?
> - If the root drive fails, is there a way to bring up a new host with the
> same OSDs in the same order but with a different host name / IP address?
> FWIW we are using rook, so I am wondering if the crush map can be
> configured with some logical labels instead of host names for this purpose
> - Is this possible? (I am evaluating whether I can bring up a new node with
> the original host name itself - at least the cloud K8s clusters make this
> impossible.)
>
> - Assuming we use a shared SSD with partitions for WAL/metadata for the
> whole node - if this drive fails, I assume we have to recover the entire
> node. Correct? I remember seeing a note that this pretty much renders all
> the relevant OSDs useless.
> -- Semi-related: What is the ideal ratio of SSDs for WAL/metadata to count
> of OSDs? I remember seeing PDFs from Red Hat showing a 1:10 ratio; the
> mailing list has references to 1:3 or 1:6. I am trying to figure out what
> the right number is.
>
> Thanks. Subu
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



