> > Hi Ilya,
> >
> > hmm, OK, I'm not sure now whether this is the bug I'm
> > experiencing. I've had the read_partial_message / bad crc/signature
> > problem occur on the second cluster within a short period, even though
> > we've been on the same Ceph version (12.2.5) for quite a long time (almost
> > since its release), so it's starting to be a pain. I suppose this must
> > have been caused by some kernel update (we're currently sticking
> > to 4.14.x and have lately been upgrading to 4.14.50).
>
> These "bad crc/signature" errors are usually a sign of faulty hardware.
>
> What was the last "good" kernel and the first "bad" kernel?
>
> You said "on the second cluster". How is it different from the first?
> Are you using the kernel client with both? Is there Xen involved?

It's complicated. Both of those clusters are fairly new, running kernel
4.14.50 and Ceph 12.2.5. Xen is not involved, but KVM is. I think they were
installed with this kernel from the start.

I've been thinking about it, and the main difference compared to our other
(older) clusters is that krbd is used much more heavily. Before, we used krbd
only for PostgreSQL, and qemu-kvm accessed RBD volumes through librbd. On the
new clusters, where the problems occurred, all volumes are accessed through
krbd, since it performs much better.

So we'll revert to librbd for now, and I'll try to find a way to reproduce
the problem. If I find one, we can talk about bisecting. It's possible the
problem has been there for a long time and simply never showed up because we
didn't use krbd heavily. Still, I think we can rule out a hardware problem
here.

>
> Thanks,
>
> Ilya
>
-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobile: +420 777 093 799
www.linuxbox.cz

service mobile: +420 737 238 656
service email:  servis@xxxxxxxxxxx
-------------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com