The OSD will probably not start if the WAL device is lost. You can try
removing the corresponding link to the block device,
/var/lib/ceph/osd/ceph-ID/block.wal; the OSD will then use block.db for the
WAL in that case. IOPS should be taken into account as well. I would go the
1:3 way if we are considering IOPS, but it is still better to use NVMe for
the DB and WAL. (A couple of rough sketches on the sizing math and the
DB/WAL device layout are appended at the bottom of this mail.)

Sat, 13 Nov 2021, 02:44, Subu Sankara Subramanian <subu.zsked@xxxxxxxxx>
wrote:

> Thanks for the answers - some clarifications:
>
> - Re WAL drive loss: Can I assume the WAL drive is used ONLY if there is
> write traffic? IOW, can I protect against a SPOF like that by keeping
> clusters cold after the initial data load and serving only reads?
> - I do see the docs around 1-4% in terms of SSD size. Is there a
> restriction based on IOPS as well? I see mails specifying the ratio of SSD
> to HDD should be 1:3 or 1:6 - am I reading this correctly?
>
> Thanks. Subu
>
> On Fri, Nov 12, 2021 at 4:27 PM prosergey07 <prosergey07@xxxxxxxxx> wrote:
>
>> > - IIUC, if a root SSD fails, there is pretty much no way to rebuild a
>> > new node with the same OSDs and avoid data shuffling - is this correct?
>>
>> You can still rebuild the node and add the old OSDs and avoid shuffling.
>> You might need to enable the NOOUT flag while you work on the
>> configuration of the new node.
>>
>> > - If the hardware fails - I assume replacing the part and rebooting in
>> > time will bring back the node as is - is this right?
>>
>> Sounds correct.
>>
>> > - If the root drive fails, is there a way to bring up a new host with
>> > the same OSDs in the same order but with a different host name / IP
>> > address?
>>
>> Should be possible, as each OSD authenticates with its own credentials,
>> which should not depend on the IP address changing. But the IP should be
>> in the same subnet as the cluster.
>>
>> > FWIW we are using Rook, so I am wondering if the CRUSH map can be
>> > configured with some logical labels instead of host names for this
>> > purpose
>>
>> That should be possible.
>>
>> > - Assuming we use a shared SSD with partitions for WAL/metadata for the
>> > whole node - if this drive fails, I assume we have to recover the
>> > entire node. Correct? I remember seeing a note that this pretty much
>> > renders all the relevant OSDs useless.
>>
>> That's correct. If the DB/WAL is lost, you would have to recover each OSD
>> whose DB is broken.
>>
>> > Semi-related: What is the ideal ratio of SSDs for WAL/metadata to the
>> > count of OSDs? I remember seeing PDFs from Red Hat showing a 1:10
>> > ratio; the mailing list has references to 1:3 or 1:6. I am trying to
>> > figure out what the right number is.
>>
>> It depends. The recommendation is 1-4% of the OSD size for the DB, but it
>> also depends on how many tiny objects you will have, since those mainly
>> occupy RocksDB (the DB).
>>
>> Sent from a Galaxy device
>>
>>
>> -------- Original message --------
>> From: Subu Sankara Subramanian <subu.zsked@xxxxxxxxx>
>> Date: 12.11.21 18:41 (GMT+02:00)
>> To: ceph-users@xxxxxxx
>> Subject: Handling node failures.
>>
>> Folks,
>>
>> New here - I tried searching for this topic in the archive but couldn't
>> find anything since 2018 or so, so I am starting a new thread. I am
>> looking at the impact of node failures.
>> I found this doc:
>> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/operations_guide/handling-a-node-failure
>> - I have a few questions about it:
>>
>> - IIUC, if a root SSD fails, there is pretty much no way to rebuild a new
>> node with the same OSDs and avoid data shuffling - is this correct?
>> - If the hardware fails - I assume replacing the part and rebooting in
>> time will bring back the node as is - is this right?
>> - If the root drive fails, is there a way to bring up a new host with the
>> same OSDs in the same order but with a different host name / IP address?
>> FWIW we are using Rook, so I am wondering if the CRUSH map can be
>> configured with some logical labels instead of host names for this
>> purpose - is this possible? (I am evaluating whether I can bring a new
>> node back up with the original host name itself - at least the cloud K8s
>> clusters make this impossible.)
>>
>> - Assuming we use a shared SSD with partitions for WAL/metadata for the
>> whole node - if this drive fails, I assume we have to recover the entire
>> node. Correct? I remember seeing a note that this pretty much renders all
>> the relevant OSDs useless.
>> - Semi-related: What is the ideal ratio of SSDs for WAL/metadata to the
>> count of OSDs? I remember seeing PDFs from Red Hat showing a 1:10 ratio;
>> the mailing list has references to 1:3 or 1:6. I am trying to figure out
>> what the right number is.
>>
>> Thanks. Subu
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
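Since the sizing question keeps coming back to the 1-4% DB guideline and
the 1:3 / 1:6 / 1:10 device ratios quoted in this thread, here is a rough
back-of-envelope sketch of that arithmetic in Python. It is only a planning
aid under the assumptions in the comments (the example node layout in the
__main__ block is made up), not an official Ceph sizing formula.

```python
"""
Back-of-envelope check for a shared DB/WAL flash device layout, using the
numbers discussed in this thread: the 1-4%-of-OSD-size DB guideline and
the 1:3 .. 1:10 flash-to-HDD ratios. A rough planning sketch only.
"""

def check_layout(hdd_osds: int, osd_size_tb: float,
                 flash_devices: int, flash_size_gb: float,
                 db_pct: float = 4.0) -> None:
    # DB/WAL partition size per OSD using the 1-4% guideline (here db_pct).
    db_per_osd_gb = osd_size_tb * 1000 * db_pct / 100.0

    # How many HDD OSDs share each flash device, and whether it has room.
    osds_per_flash = -(-hdd_osds // flash_devices)   # ceiling division
    needed_gb = osds_per_flash * db_per_osd_gb

    print(f"DB/WAL per OSD at {db_pct}% of {osd_size_tb} TB: "
          f"~{db_per_osd_gb:.0f} GB")
    print(f"Ratio: 1 flash device per {osds_per_flash} HDD OSDs "
          f"(the thread suggests staying in the 1:3 to 1:10 range)")
    print(f"Needed on each flash device: ~{needed_gb:.0f} GB "
          f"of {flash_size_gb:.0f} GB available")

    if needed_gb > flash_size_gb:
        print("WARNING: flash device too small for the chosen DB percentage")
    if osds_per_flash > 10:
        print("WARNING: ratio above 1:10 -- losing this device takes out "
              f"{osds_per_flash} OSDs at once, and it may bottleneck on IOPS")


if __name__ == "__main__":
    # Hypothetical node: 12 x 8 TB HDD OSDs sharing 2 x 1.92 TB NVMe drives,
    # with 2% chosen as a middle ground inside the 1-4% range.
    check_layout(hdd_osds=12, osd_size_tb=8, flash_devices=2,
                 flash_size_gb=1920, db_pct=2.0)
```

Capacity is only half of it: as noted above, IOPS and the blast radius of
losing the shared device are the reasons to stay near the low end of that
ratio range on slower flash and only stretch it on NVMe.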
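On the shared DB/WAL failure-domain point: BlueStore exposes its layout as
symlinks under /var/lib/ceph/osd/ceph-<ID>/ (block, block.db, block.wal),
the same path mentioned above for the WAL link. Below is a small sketch
that walks those links to show which OSDs share a DB/WAL device. It assumes
the classic on-host (non-containerized) layout; note that with
ceph-volume/LVM the links resolve to logical volume nodes, so you may still
need lvs/pvs to map those back to the physical NVMe.

```python
"""
Map BlueStore OSDs to the device nodes behind their block.db / block.wal
symlinks, to gauge the blast radius of losing a shared DB/WAL device.
Assumes the classic /var/lib/ceph/osd/ceph-<ID>/ layout on the host;
containerized deployments keep these paths inside the OSD container.
"""
import glob
import os
from collections import defaultdict


def osds_per_backing_device(base: str = "/var/lib/ceph/osd") -> dict:
    """Return {resolved device node: [OSD links that point at it]}."""
    mapping = defaultdict(list)
    for osd_dir in sorted(glob.glob(os.path.join(base, "ceph-*"))):
        for link in ("block.db", "block.wal"):
            path = os.path.join(osd_dir, link)
            if os.path.islink(path):
                # Resolve the symlink chain down to the actual device node
                # (with LVM this is a /dev/dm-* node, not the raw NVMe).
                device = os.path.realpath(path)
                mapping[device].append(f"{os.path.basename(osd_dir)}:{link}")
    return mapping


if __name__ == "__main__":
    for device, users in osds_per_backing_device().items():
        print(f"{device} backs {len(users)} OSD link(s): {', '.join(users)}")
```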