Thanks for the answers - some clarifications:

- RE WAL drive loss: Can I assume the WAL drive is used ONLY if there is
  write traffic? IOW, can I protect against a SPOF like that by keeping
  clusters cold after the initial data load and serving only reads?
- I do see the docs around 1-4% in terms of SSD size. Is there a
  restriction based on IOPS as well? I see mails specifying the ratio of
  SSD to HDD should be 1:3 or 1:6 - am I reading this correctly?

Thanks.
Subu

On Fri, Nov 12, 2021 at 4:27 PM prosergey07 <prosergey07@xxxxxxxxx> wrote:
>
> > - IIUC, if a root SSD fails, there is pretty much no way to rebuild a
> > new node with the same OSDs and avoid data shuffling - is this correct?
>
> You can still rebuild the node and add the old OSDs back without
> shuffling. You might need to enable the NOOUT flag while you work on the
> configuration of the new node.
>
> > - If the hardware fails - I assume replacing the part and rebooting in
> > time will bring back the node as is - is this right?
>
> Sounds correct.
>
> > - If the root drive fails, is there a way to bring up a new host with
> > the same OSDs in the same order but with a different host name / IP
> > address?
>
> Should be possible, as each OSD authenticates with its own credentials,
> which should not be affected by the IP address change. But the IP should
> be in the same subnet as the cluster.
>
> > FWIW we are using rook, so I am wondering if the crush map can be
> > configured with some logical labels instead of host names for this
> > purpose
>
> That should be possible.
>
> > - Assuming we use a shared SSD with partitions for WAL/metadata for the
> > whole node - if this drive fails, I assume we have to recover the
> > entire node. Correct? I remember seeing a note that this pretty much
> > renders all the relevant OSDs useless.
>
> That's correct. If the DB/WAL is lost, you would have to recover any OSD
> whose DB is broken.
>
> > Semi-related: What is the ideal ratio of SSDs for WAL/metadata to count
> > of OSDs? I remember seeing PDFs from Red Hat showing a 1:10 ratio; the
> > mailing list has references to 1:3 or 1:6. I am trying to figure out
> > what the right number is.
>
> It depends. The recommendation is 1-4% of the OSD size for the DB. But it
> also depends on how many tiny objects you would have, which would mainly
> occupy rocksdb (db).
>
> Sent from a Galaxy device
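
To make the 1-4% guidance above concrete, here is the back-of-the-envelope
arithmetic I am working from (a minimal Python sketch; the 12 TB HDD and
1.92 TB SSD sizes are assumed examples, not recommendations):

    # Rough BlueStore DB sizing sketch based on the "1-4% of OSD size"
    # guidance quoted above. All concrete drive sizes are illustrative
    # assumptions.

    def db_size_range_gb(osd_size_tb, low_pct=1.0, high_pct=4.0):
        """Return the (min, max) DB partition size in GB for one OSD."""
        osd_gb = osd_size_tb * 1000
        return osd_gb * low_pct / 100, osd_gb * high_pct / 100

    def osds_per_ssd(ssd_size_gb, db_size_gb):
        """How many DB partitions of db_size_gb fit on one shared SSD."""
        return int(ssd_size_gb // db_size_gb)

    lo, hi = db_size_range_gb(osd_size_tb=12)   # assume 12 TB HDD OSDs
    print(f"DB per OSD: {lo:.0f}-{hi:.0f} GB")  # -> 120-480 GB
    print(osds_per_ssd(1920, hi))               # 1.92 TB shared SSD at 4%: 4 OSDs
    print(osds_per_ssd(1920, lo))               # 1.92 TB shared SSD at 1%: 16 OSDs

At the 4% end, one such SSD only covers about 4 of those HDD OSDs, which
seems to be in the same ballpark as the 1:3 / 1:6 figures from the list -
hence my question above about whether IOPS adds a restriction on top of
raw capacity.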
>
> -------- Original message --------
> From: Subu Sankara Subramanian <subu.zsked@xxxxxxxxx>
> Date: 12.11.21 18:41 (GMT+02:00)
> To: ceph-users@xxxxxxx
> Subject: Handling node failures.
>
> Folks,
>
> New here - I tried searching for this topic in the archive and couldn't
> find anything since 2018 or so, so I am starting a new thread. I am
> looking at the impact of node failures. I found this doc:
>
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/operations_guide/handling-a-node-failure
>
> I have a few questions about it:
>
> - IIUC, if a root SSD fails, there is pretty much no way to rebuild a new
> node with the same OSDs and avoid data shuffling - is this correct?
> - If the hardware fails - I assume replacing the part and rebooting in
> time will bring back the node as is - is this right?
> - If the root drive fails, is there a way to bring up a new host with the
> same OSDs in the same order but with a different host name / IP address?
> FWIW we are using rook, so I am wondering if the crush map can be
> configured with some logical labels instead of host names for this
> purpose - is this possible? (I am evaluating whether I can bring a new
> node back up with the original host name itself - at least cloud K8s
> clusters make this impossible.)
> - Assuming we use a shared SSD with partitions for WAL/metadata for the
> whole node - if this drive fails, I assume we have to recover the entire
> node. Correct? I remember seeing a note that this pretty much renders all
> the relevant OSDs useless.
> - Semi-related: What is the ideal ratio of SSDs for WAL/metadata to count
> of OSDs? I remember seeing PDFs from Red Hat showing a 1:10 ratio; the
> mailing list has references to 1:3 or 1:6. I am trying to figure out what
> the right number is.
>
> Thanks.
> Subu
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx