Hi,

Today I had a very similar case: 2 NVMe OSDs went down and out. I had a
freshly installed 16.2.1 cluster. Before the failure the disks were under
some load, ~1.5k read IOPS + ~600 write IOPS. When they failed, nothing
helped. After every attempt at restarting them I kept finding log messages
containing:

    bluefs _allocate unable to allocate

The bluestore_allocator was the default, i.e. hybrid. I changed it to
bitmap, just like in the issue mentioned by Neha Ojha (thanks), and the
OSDs came back up and in (a sketch of the change is at the end of this
mail). The disks are OK now, but they are under very little load, so I am
not certain whether the bitmap allocator is stable.

Kind regards,
--
Bartosz Lis

On 5/15/2021 01:10:57 CEST Igor Fedotov wrote:
> This looks similar to #50656 indeed.
>
> Hopefully will fix that next week.
>
> Thanks,
>
> Igor
>
> On 5/14/2021 9:09 PM, Neha Ojha wrote:
> > On Fri, May 14, 2021 at 10:47 AM Andrius Jurkus
> > <andrius.jurkus@xxxxxxxxxx> wrote:
> >> Hello, I will try to keep it sad and short :) :( PS sorry if this is
> >> a duplicate, I tried to post it from the web as well.
> >>
> >> Today I upgraded from 16.2.3 to 16.2.4 and added a few hosts and OSDs.
> >> After a few hours of data migration, 1 SSD failed, then another and
> >> another, 1 by 1. Now I have the cluster in pause and 5 failed SSDs.
> >> The same hosts have both SSDs and HDDs, but only the SSDs are failing,
> >> so I think this has to be a balancing/refilling bug or something, and
> >> probably not an upgrade bug.
> >>
> >> The cluster has been in pause for 4 hours and no more OSDs are failing.
> >>
> >> Full trace:
> >> https://pastebin.com/UxbfFYpb
> >
> > This looks very similar to https://tracker.ceph.com/issues/50656.
> > Adding Igor for more ideas.

[---]
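
For anyone hitting the same thing, here is a minimal sketch of the
allocator change. It assumes the centralized config database is in use and
uses osd.0 only as an example id for whichever OSDs are affected; on setups
managed via ceph.conf the equivalent is "bluestore_allocator = bitmap" in
the [osd] section. The exact commands I ran may have differed slightly.

    # switch the BlueStore allocator for all OSDs from hybrid to bitmap
    ceph config set osd bluestore_allocator bitmap

    # confirm the value an OSD will pick up (osd.0 is just an example)
    ceph config get osd.0 bluestore_allocator

    # restart the affected OSDs so the new allocator is used,
    # e.g. on a systemd-managed host:
    systemctl restart ceph-osd@0

The setting only takes effect after the OSDs restart, so afterwards check
with "ceph -s" / "ceph osd tree" that they come back up and in.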