Re: OSD crash with end_of_buffer + bad crc


 



On Monday, 11 April 2022, 10:26:31 CEST, Gilles Mocellin wrote:
> Just a follow-up.
> 
> I've found that a specific network interface is causing this.
> We have these bonds:
> - 1 management bond0
> - 1 storage access on bond1
> - 1 storage replication on bond2
> 
> As the CRC errors are all between clients on the storage access network,
> I focused on bond1.
> => I set one interface down, and immediately got many errors and some
> OSD crashes!
> => I set it back up and took the other one down: no errors. A weekend
> later, still no errors and no OSD crash.
> 
> I now have to understand what is going on with that interface, because
> no errors are reported anywhere else. I will first try changing the AOC
> cable (SFP+ + fibre).
> 
> But it's not a Ceph problem, just a hardware one that only Ceph has
> caught!
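
When isolating one member of a bond like this, it can help to compare the kernel's per-interface error counters before and after each test. The following is a minimal Python sketch, assuming the Linux sysfs layout (`/sys/class/net/<bond>/bonding/slaves` and `/sys/class/net/<iface>/statistics/`); the function names and the choice of counters are illustrative, not something used in this thread. Note that in the case described here the card reported no errors at all, so zero counters do not prove the link is clean.

```python
from pathlib import Path

# Counters that commonly reveal link-level trouble (illustrative selection).
COUNTERS = ("rx_errors", "rx_crc_errors", "rx_dropped", "tx_errors")


def bond_slaves(bond, sysfs_root="/sys/class/net"):
    """Return the member interfaces of a bond, or [] if it does not exist."""
    slaves_file = Path(sysfs_root) / bond / "bonding" / "slaves"
    if not slaves_file.exists():
        return []
    return slaves_file.read_text().split()


def read_counters(iface, sysfs_root="/sys/class/net"):
    """Read the selected statistics for one interface, skipping missing files."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    result = {}
    for name in COUNTERS:
        counter_file = stats_dir / name
        if counter_file.exists():
            result[name] = int(counter_file.read_text().strip())
    return result


def report(bond="bond1", sysfs_root="/sys/class/net"):
    """Map each member interface of the bond to its error counters."""
    return {
        iface: read_counters(iface, sysfs_root)
        for iface in bond_slaves(bond, sysfs_root)
    }


if __name__ == "__main__":
    # "bond1" is the storage access bond named in the message above.
    for iface, counters in report("bond1").items():
        print(iface, counters)
```

Running it once with each member taken down in turn gives a simple side-by-side view of which interface accumulates errors, if the hardware reports any.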

Just to close this thread:
I have replaced the network card, and there have been no more errors in the
Ceph logs for days.

It's really bad if a network card's firmware or driver doesn't see a CRC error
that Ceph can see...
I can't imagine how other applications would react to that, or whether they
would notice at all, and what data corruption could result.
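
To illustrate why an application-level checksum can catch what the lower layers miss, here is a small Python sketch of the general idea: the sender appends a checksum to each message and the receiver recomputes it. This is only an illustration under stated assumptions; it uses `zlib.crc32` as a stand-in (Ceph's messenger, as far as I know, uses the CRC32C variant), and the `frame`, `verify` and `flip_bit` helpers are hypothetical names, not Ceph code.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer to the payload (sender side)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def verify(message: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare it to the trailer."""
    payload, trailer = message[:-4], message[-4:]
    return zlib.crc32(payload) == int.from_bytes(trailer, "big")


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate corruption in transit by flipping a single bit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    msg = frame(b"object data sent to an OSD")
    print(verify(msg))                # True: message arrived intact
    print(verify(flip_bit(msg, 10)))  # False: one flipped bit is detected
```

An application that sends the same bytes without such a check would simply accept the corrupted payload, which is the risk raised above.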






