Hi,

Today I had a very similar case: 2 NVMe OSDs went down and out. I had a
freshly installed 16.2.1 cluster. Before the failure the disks were under
some load, ~1.5k read IOPS + ~600 write IOPS. When they failed, nothing
helped. After every attempt at restarting them I kept finding log messages
containing:

    bluefs _allocate unable to allocate

The bluestore_allocator was the default, i.e. hybrid. I changed it to
bitmap, just like in the issue mentioned by Neha Ojha (thanks), and the
OSDs came back up and in (a sketch of the change is at the end of this
mail). The disks are OK now, but they are under very little load, so I am
not certain whether the bitmap allocator is stable.

Kind regards,
--
Bartosz Lis

On 5/15/2021 01:10:57 CEST Igor Fedotov wrote:
> This looks similar to #50656 indeed.
>
> Hopefully will fix that next week.
>
> Thanks,
>
> Igor
>
> On 5/14/2021 9:09 PM, Neha Ojha wrote:
> > On Fri, May 14, 2021 at 10:47 AM Andrius Jurkus
> > <andrius.jurkus@xxxxxxxxxx> wrote:
> >> Hello, I will try to keep it sad and short :) :( PS sorry if this is
> >> a duplicate, I tried to post it from the web as well.
> >>
> >> Today I upgraded from 16.2.3 to 16.2.4 and added a few hosts and OSDs.
> >> After a few hours of data migration, 1 SSD failed, then another and
> >> another, 1 by 1. Now I have the cluster in pause and 5 failed SSDs.
> >> The same hosts have both SSDs and HDDs, but only the SSDs are failing,
> >> so I think this has to be a balancing/refilling bug or something, and
> >> probably not an upgrade bug.
> >>
> >> The cluster has been in pause for 4 hours and no more OSDs are failing.
> >>
> >> Full trace:
> >> https://pastebin.com/UxbfFYpb
> >
> > This looks very similar to https://tracker.ceph.com/issues/50656.
> > Adding Igor for more ideas.

[---]
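
For anyone hitting the same thing, here is a minimal sketch of the
allocator change. It assumes the centralized config database is in use and
uses osd.0 only as an example id for whichever OSDs are affected; on setups
managed via ceph.conf the equivalent is "bluestore_allocator = bitmap" in
the [osd] section. The exact commands I ran may have differed slightly.

    # switch the BlueStore allocator for all OSDs from hybrid to bitmap
    ceph config set osd bluestore_allocator bitmap

    # confirm the value an OSD will pick up (osd.0 is just an example)
    ceph config get osd.0 bluestore_allocator

    # restart the affected OSDs so the new allocator is used,
    # e.g. on a systemd-managed host:
    systemctl restart ceph-osd@0

The setting only takes effect after the OSDs restart, so afterwards check
with "ceph -s" / "ceph osd tree" that they come back up and in.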