Hello,

Are you using a BBU-backed RAID controller? If so, it sounds more like your write cache is acting up. Can you check what your RAID controller is showing?

I have sometimes seen RAID controllers performing consistency checks or patrol reads on single-drive RAID0 volumes. You can disable those if they are running. If it's an LSI-based controller, you can use "MegaCli64 -AdpPR -Dsbl -aALL" to stop patrol reads or "MegaCli64 -LDCC -Stop -lall -aall" to stop a consistency check.

You could also have a BBU learn cycle active, which discharges and recharges the backup battery and disables the writeback cache while it runs. Unfortunately, if the cycle is running, you will not be able to enable the writeback cache.

I recommend enabling read cache and controller read-ahead. Use "MegaCli64 -LDSetProp -RA -Immediate -Lall -aAll" to enable read-ahead and "MegaCli64 -LDSetProp -Cached -Immediate -Lall -aAll" to enable cached I/O.

Now, I wouldn't do this, but you can force writeback mode even with the BBU off. YOU CAN AND WILL LOSE ALL THE OSDS ON THE NODE IF SOMETHING BAD HAPPENS. Use at your own risk and discretion: "MegaCli64 -LDSetProp -CachedBadBBU -Immediate -Lall -aAll".

If these options don't work, respond and we will try to help you.

On Tue, Apr 16, 2019 at 3:27 PM M Ranga Swami Reddy <swamireddy@xxxxxxxxx> wrote:
>
> It's the Smart Storage battery, which was disabled due to high ambient temperature.
> All OSD processes/daemons are working as is... but those OSDs are not responding to other OSDs due to high CPU utilization.
> I don't observe the clock skew issue.
>
> On Tue, Apr 16, 2019 at 12:49 PM Marco Gaiarin <gaio@xxxxxxxxx> wrote:
>>
>> Hi, M Ranga Swami Reddy!
>>   On that day, you were saying...
>>
>> > Hello - Recently we had an issue with a storage node's battery failure, which
>> > caused Ceph client IO to drop to '0' bytes. This means the Ceph cluster couldn't
>> > perform IO operations until the node was taken out.
>> > This is not expected from Ceph: when some HW fails, the respective OSDs should
>> > be marked out/down and IO should continue as is.
>> > Please let me know if anyone has seen similar behavior and whether this issue
>> > has been resolved.
>>
>> By 'battery', do you mean the 'CMOS battery'?
>>
>>
>> OSDs and MONs need accurate clock sync between them. So, if a node
>> reboots with a clock skew of more than (if I remember correctly) 5 seconds,
>> the OSD does not start.
>>
>> Provide a stable NTP server for all your OSDs and MONs, and restart
>> the OSDs after the clocks are in sync.
>>
>> --
>> dott. Marco Gaiarin GNUPG Key ID: 240A3D66
>> Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/
>> Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
>> marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797
>>
>> Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
>> http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
>> (tax code 00307430132, category ONLUS or RICERCA SANITARIA)
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com