Re: Socket errors, CRC, lossy con messages

Hi Piotr,

On Tue, Apr 11, 2017 at 2:41 AM, Piotr Dałek <piotr.dalek@xxxxxxxxxxxx> wrote:
> On 04/10/2017 08:16 PM, Alex Gorbachev wrote:
>>
>> I am trying to understand the cause of a problem we started
>> encountering a few weeks ago.  There are 30 or so per hour messages on
>> OSD nodes of type:
>>
>> ceph-osd.33.log:2017-04-10 13:42:39.935422 7fd7076d8700  0 bad crc in
>> data 2227614508 != exp 2469058201
>>
>> and
>>
>> 2017-04-10 13:42:39.939284 7fd722c42700  0 -- 10.80.3.25:6826/5752
>> submit_message osd_op_reply(1826606251
>> rbd_data.922d95238e1f29.00000000000101bf [set-alloc-hint object_size
>> 16777216 write_size 16777216,write 6328320~12288] v103574'18626765
>> uv18626765 ondisk = 0) v6 remote, 10.80.3.216:0/1934733503, failed
>> lossy con, dropping message 0x3b55600 [..]
>
>
> Is that happening on the entire cluster, or just on specific OSDs? That is
> a clear indication of data corruption: in the above example, osd.33
> calculated the crc for the received data block and found that it doesn't
> match what was precalculated by the sending side. Try gathering some more
> examples of such crc errors and isolate the osd/host that sends malformed
> data, then do the usual diagnostics, like a memory test, on that machine.
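
For illustration only, the check described above can be sketched as follows. This is a minimal sketch, not Ceph code: it uses Python's zlib.crc32 as a stand-in (Ceph's messenger uses CRC32C, a different polynomial), and the function names are hypothetical.

```python
import zlib


def checksum(data: bytes) -> int:
    """CRC-32 of the payload (stand-in for Ceph's CRC32C)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def receive(data: bytes, expected_crc: int) -> bool:
    """Receiver side: recompute the crc and compare it with the sender's value."""
    return checksum(data) == expected_crc


payload = b"rbd object data"
sent_crc = checksum(payload)  # precalculated by the sending side

# Flip one bit to simulate corruption somewhere between sender and receiver.
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]

print(receive(payload, sent_crc))    # intact data passes the check
print(receive(corrupted, sent_crc))  # corrupted data is reported as a mismatch
```

A mismatch only says that the bytes changed somewhere after the sender computed its crc; it does not say whether memory, the NIC, or the network path is at fault.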

This is happening on the entire cluster (each of the 11 OSD nodes),
and on practically every OSD.  We have only 5 clients.  Ceph is 0.94.9
on the OSD nodes and 0.94.9 or 0.94.7 on the clients, and the error
messages appear for all clients.
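
To gather more examples, as suggested, something like the following could tally the two message types shown above. It is a rough sketch under assumptions: each log entry sits on a single line in the real log file (the lines above are wrapped by the mail client), the message text matches the excerpts exactly, and the names tally, crc_per_hour and drops_per_peer are made up for this example.

```python
import re
import sys
from collections import Counter

# Patterns taken from the two log excerpts quoted in this thread.
BAD_CRC = re.compile(r"bad crc in data (\d+) != exp (\d+)")
LOSSY = re.compile(r"remote, (\S+), failed lossy con")
HOUR = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}")


def tally(lines):
    """Count crc errors per hour and dropped replies per remote IP."""
    crc_per_hour = Counter()
    drops_per_peer = Counter()
    for line in lines:
        stamp = HOUR.search(line)
        hour = stamp.group(1) if stamp else "unknown"
        if BAD_CRC.search(line):
            crc_per_hour[hour] += 1
        dropped = LOSSY.search(line)
        if dropped:
            # Keep only the IP part of an address like 10.80.3.216:0/1934733503.
            drops_per_peer[dropped.group(1).split(":")[0]] += 1
    return crc_per_hour, drops_per_peer


if __name__ == "__main__":
    # Usage: python tally.py ceph-osd.33.log [more logs...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            crc, drops = tally(log)
        print(path, dict(crc), dict(drops))
```

Running it over every OSD log and comparing the per-peer counts should show whether the drops cluster around one remote address or are spread evenly.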

I saw this thread:
https://www.spinics.net/lists/ceph-devel/msg34640.html
We do indeed mostly use XFS and NFS exports on RBD.  However, our
kernels are newer than 4.3.

All important pools are 3x replicated, and client data seems OK for
the moment, but we are getting more slow requests than before.

Regards,
Alex

>
> --
> Piotr Dałek
> piotr.dalek@xxxxxxxxxxxx
> https://www.ovh.com/us/
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com