Also a question: how can I identify this issue? I have 3 high-performing clusters with NVMe for the WAL and RocksDB and SSDs behind them (3 SSDs per 1 NVMe), and for many months the NVMes have been getting hammered. We are using it as an object store. I would assume the issue is the same as with our old cluster, but is there a way I can be sure about this? (Some example commands for checking and compacting are appended at the end of this thread.)

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Szabo, Istvan (Agoda)
Sent: Tuesday, May 18, 2021 1:28 PM
To: Igor Fedotov <ifedotov@xxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: RE: Re: Pool has been deleted before snaptrim finished

Thank you Igor for your help. I've done it on the affected SSDs and the cluster finally seems to have come back to normal. How can I avoid this situation? Should I use buffered_io or not?

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Igor Fedotov <ifedotov@xxxxxxx>
Sent: Monday, May 17, 2021 10:36 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; ceph-users@xxxxxxx
Subject: Re: Pool has been deleted before snaptrim finished

Highly likely you're facing a "degraded" RocksDB caused by bulk data removal. This applies to both your original snaptrim issue and the currently flapping OSDs.

The following tickets refer to the snaptrim issue:

https://tracker.ceph.com/issues/50511

https://tracker.ceph.com/issues/47446

As a workaround you might want to compact the DBs for every OSD using ceph-kvstore-tool (offline) or the ceph admin daemon's compact command (online). Offline compaction is preferred, as the online one might be a no-op under some circumstances. (See the compaction example at the end of this thread.)

Thanks,

Igor

On 5/17/2021 4:31 PM, Szabo, Istvan (Agoda) wrote:
> Hi,
>
> We decided to delete the pool before the snaptrim finished, after 4 days of waiting.
> Now we have a bigger issue: many OSDs started to flap, and 2 of them cannot even restart afterwards.
>
> I did some bluestore fsck on the OSDs that won't start, and there are many messages like this inside:
>
> 2021-05-17 18:37:07.176203 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d0778000~4000
> 2021-05-17 18:37:07.176204 7f416d20bec0 10 freelist enumerate_next 0x482d0784000~4000
> 2021-05-17 18:37:07.176204 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d0784000~4000
> 2021-05-17 18:37:07.176205 7f416d20bec0 10 freelist enumerate_next 0x482d078c000~c000
> 2021-05-17 18:37:07.176206 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d078c000~c000
>
> [root@hk-cephosd-2002 ~]# tail -f /tmp/ceph-osd-44-fsck.log
> 2021-05-17 18:39:16.466967 7f416d20bec0 20 bluefs _read_random read buffered 0x2cd6e8f~ed6 of 1:0x372e0700000+4200000
> 2021-05-17 18:39:16.467154 7f416d20bec0 20 bluefs _read_random got 3798
> 2021-05-17 18:39:16.467179 7f416d20bec0 10 bluefs _read_random h 0x564e4e658500 0x24d6e35~ee2 from file(ino 216551 size 0x43a382d mtime 2021-05-17 13:21:19.839668 bdev 1 allocated 4400000 extents [1:0x35bc7c00000+4400000])
> 2021-05-17 18:39:16.467186 7f416d20bec0 20 bluefs _read_random read buffered 0x24d6e35~ee2 of 1:0x35bc7c00000+4400000
> 2021-05-17 18:39:16.467409 7f416d20bec0 20 bluefs _read_random got 3810
>
> and
>
> uh oh, missing shared_blob
>
> I've set buffered_io back to false, because when restarting the OSDs we always had to wait for the degraded PGs to be fixed.
> Many of the SSDs are being hammered at 100% utilization at the moment, and I don't really know what to do to stop the process and bring back the 2 SSDs :/
>
> Some paste: https://justpaste.it/9bj3a
>
> Some metrics (each column is one server's metrics, 3 servers in total):
> How it is hammering the SSDs: https://i.ibb.co/x3xm0Rj/ssds.png
> IOWAIT super high due to SSD utilization: https://i.ibb.co/683TR9y/iowait.png
> Capacity seems to be coming back: https://i.ibb.co/mz4Lq2r/space.png
>
> Thank you for the help.
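
For the "how can I identify this issue" question above: a few generic checks that can point at a bloated RocksDB after bulk removals. These are standard Ceph commands rather than anything quoted in this thread, and counter names and columns vary between releases, so treat this as a sketch (osd.44 is taken from the fsck log above purely as an example):

  # Per-OSD space usage; on Nautilus and later the OMAP/META columns show the
  # BlueFS/RocksDB footprint - an unusually large META compared to similar OSDs
  # is a hint that the DB needs compaction.
  ceph osd df tree

  # On the OSD host: dump the perf counters and look at the bluestore kv
  # commit/sync latencies; sustained high values while the DB/WAL device sits
  # at 100% utilization point at RocksDB rather than the data SSDs.
  ceph daemon osd.44 perf dump

  # BlueStore also logs warnings when kv commits take too long:
  grep -i "slow operation observed" /var/log/ceph/ceph-osd.44.log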
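
To make the compaction workaround from Igor's mail concrete, roughly how the two variants look (osd.44 and the default data path /var/lib/ceph/osd/ceph-44 are just placeholders, adjust them to your deployment):

  # Online compaction through the admin socket, run on the host that carries
  # the OSD; as noted above, this can end up being a no-op in some cases:
  ceph daemon osd.44 compact

  # Offline compaction (preferred): stop the OSD, compact the RocksDB under
  # BlueStore with ceph-kvstore-tool, then start the OSD again.
  systemctl stop ceph-osd@44
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-44 compact
  systemctl start ceph-osd@44

Offline compaction can take a while on a big DB, so it is usually done one OSD at a time with "ceph osd set noout" in place, so the cluster does not start rebalancing while the OSD is down ("ceph osd unset noout" afterwards).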