Re: No fix for 0x6706be76 CRCs ? [SOLVED] (WORKAROUND)

I have Ubuntu servers.

With ukuu I installed kernel 4.8.17-040817 (the last available kernel before 4.9), and I haven't had any 0x6706be76 CRC errors since.

Nor any inconsistencies.
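
For anyone who wants to check a fleet of hosts against the "before 4.9" cutoff discussed in this thread, here is a minimal sketch. The helper name kernel_before is hypothetical, and it only compares the leading major.minor of a release string such as the one returned by platform.release(); it says nothing about whether a given distribution kernel carries the bug.

```python
import re


def kernel_before(release: str, major: int, minor: int) -> bool:
    """Return True if the kernel release string is older than major.minor.

    Only the leading "X.Y" of the release string is compared; suffixes such
    as "-040817-generic" are ignored.
    """
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) < (major, minor)


if __name__ == "__main__":
    import platform

    # Report whether the running kernel predates 4.9.
    print(platform.release(), kernel_before(platform.release(), 4, 9))
```

Comparing tuples of integers avoids the trap of comparing version strings, where "4.17" would sort before "4.9".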


On 19/09/18 12:01, Alfredo Daniel Rezinovsky wrote:
I tried 4.17 and had the same problem.

I just downgraded to 4.8. Let's see whether any more 0x67... errors appear.


On 18/09/18 16:28, Alfredo Daniel Rezinovsky wrote:
This started after I upgraded to Bionic. I had Xenial with LTS kernels (4.13) without problems.

I will try switching to the Ubuntu 4.13 kernel and wait to see what the logs show.

Thanks


On 18/09/18 16:27, Paul Emmerich wrote:
Yeah, it's very likely a kernel bug (one that no one has managed to
reduce to a simpler test case, or even to reproduce reliably with
reasonable effort on a test system).

4.9 and earlier aren't affected as far as we can tell; we only
encountered this after upgrading. But I think Bionic ships with a
broken kernel.
Try raising the issue with the Ubuntu guys if you are using a
distribution kernel.


Paul

2018-09-18 21:23 GMT+02:00 Alfredo Daniel Rezinovsky
<alfredo.rezinovsky@xxxxxxxxxxxxxxxxxxxxxxxx>:
Wait a moment!!!

"Some kernels (4.9+) sometime fail to return data when reading from a block
device under memory pressure."

I didn't know that was the problem. Can't I just downgrade the kernel?

Are there known working versions, or does it just need to be earlier than 4.9?


On 18/09/18 16:19, Paul Emmerich wrote:

We built a work-around here: https://github.com/ceph/ceph/pull/23273
Which hasn't been backported, but we'll ship 13.2.2 in our Debian
packages for the croit OS image.
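
This message does not spell out what the pull request changes. As an illustration only, and assuming a workaround of the general form "re-read the block when the checksum does not match", the idea could be sketched as below. The function name read_with_retry and the max_retries parameter are hypothetical, and zlib.crc32 stands in for crc32c because the Python standard library has no crc32c; this is not the BlueStore code.

```python
import zlib
from typing import Callable


def read_with_retry(
    read_block: Callable[[], bytes],
    expected_crc: int,
    max_retries: int = 3,
) -> bytes:
    """Read a block, re-reading if its checksum does not match.

    read_block is any callable returning the block's bytes. A transient
    bad read (for example a buffer of zeros) is retried up to max_retries
    times before the mismatch is reported as an error.
    """
    for _ in range(max_retries + 1):
        data = read_block()
        # zlib.crc32 is a stand-in for crc32c in this sketch.
        if zlib.crc32(data) == expected_crc:
            return data
    raise IOError(f"checksum mismatch after {max_retries + 1} reads")
```

A persistent mismatch still raises, so real on-disk corruption is not hidden; only reads that succeed on a later attempt are tolerated.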


Paul


2018-09-18 21:10 GMT+02:00 Alfredo Daniel Rezinovsky
<alfredo.rezinovsky@xxxxxxxxxxxxxxxxxxxxxxxx>:

I changed all my hardware. Now I have plenty of free RAM, swap is never
needed, iowait is low, and I still get:

7fdbbb73e700 -1 bluestore(/var/lib/ceph/osd/ceph-6) _verify_csum bad
crc32c/0x1000 checksum at blob offset 0x1e000, got 0x6706be76, expected
0x85a3fefe, device location [0x25ac04be000~1000], logical extent
0x1e000~1000, object #2:fd955b81:::10000729cdb.00000006
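
To count how often this message shows up across OSD logs, a small parser can pull out the checksum fields. This is a minimal sketch: the regular expression is written against the single message quoted above, so it is an assumption that other Ceph versions format the line the same way, and the name parse_csum_error is hypothetical.

```python
import re
from typing import Optional

# Pattern derived from the one log message quoted in this thread.
CSUM_RE = re.compile(
    r"_verify_csum bad\s+(?P<algo>\w+)/(?P<chunk>0x[0-9a-f]+)\s+"
    r"checksum at blob offset\s+(?P<blob_offset>0x[0-9a-f]+),\s+"
    r"got\s+(?P<got>0x[0-9a-f]+),\s+expected\s+(?P<expected>0x[0-9a-f]+)"
)


def parse_csum_error(line: str) -> Optional[dict]:
    """Extract checksum fields from a _verify_csum log message.

    Whitespace (including line wraps added by mail clients) is collapsed
    before matching. Returns None if the message does not match.
    """
    m = CSUM_RE.search(" ".join(line.split()))
    return m.groupdict() if m else None
```

Filtering the parsed results on got == "0x6706be76" separates this recurring value from other checksum failures.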

It happens occasionally, on all my OSDs.

These are BlueStore OSDs with data on HDD and block.db on SSD.

After running pg repair, the PGs were always repaired.

I am running Ceph 13.2.1-1bionic on Ubuntu.

--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






