Re: bad CRC in data error on ARM

Oh, sorry, I need to refresh my memory.

Yes: if the data is all zeros and the messenger code uses a zero seed,
then the CRC value is 0!
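
To illustrate the point above, here is a minimal bitwise CRC-32C sketch. It assumes a "raw" CRC-32C: the reflected Castagnoli polynomial 0x82F63B78, a caller-supplied seed, and no pre- or post-inversion. That behaviour is consistent with the all-zero data_crc values reported in this thread, but `crc32c_raw` is a hypothetical name. It is not Ceph's `ceph_crc32c` or the kernel's `crc32c`. Python's `zlib.crc32` is deliberately not used, because it computes the IEEE CRC-32, not CRC-32C.

```python
# Hypothetical helper: bitwise, reflected CRC-32C (Castagnoli).
# The caller supplies the seed; no inversion is applied ("raw" CRC-32C).
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c_raw(seed: int, data: bytes) -> int:
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, fold in the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc


zeros = bytes(4096)  # stands in for a block read from /dev/zero

# Zero seed over zero data: every byte XORs in 0 and every shift keeps 0,
# so the result is 0 regardless of the length.
print(crc32c_raw(0, zeros))  # 0

# Non-zero seed over zero data: the shift/XOR step maps a non-zero register
# to a non-zero register, so the result is never 0.
print(crc32c_raw(0xFFFFFFFF, zeros) != 0)  # True
```

This is also why Steve's remark about the seed matters. A zero-filled buffer hashes to 0 only when the seed is 0.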

When decoding fails, can you get a dump of the data content?
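
One way to answer that on the receiving side is to recompute the data CRC and, on a mismatch, dump the bytes around the first non-zero byte. The payload in these tests is expected to be all zeros, so that byte pinpoints any corruption. The sketch below is hypothetical. `check_data_crc` and `first_nonzero_offset` are illustrative names, not Ceph functions. It assumes a raw CRC-32C with seed 0, as discussed in this thread, and its mismatch message only imitates the format of the "bad crc in data" line in the OSD log.

```python
# Hypothetical sketch: receive-side data CRC check with a diagnostic dump.
# Assumes both sides use a raw CRC-32C (no inversion) with seed 0.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c_raw(seed: int, data: bytes) -> int:
    """Bitwise reflected CRC-32C with a caller-supplied seed, no inversion."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc


def first_nonzero_offset(data: bytes) -> int:
    """Return the offset of the first non-zero byte, or -1 if all bytes are zero."""
    for i, byte in enumerate(data):
        if byte:
            return i
    return -1


def check_data_crc(data: bytes, expected_crc: int) -> bool:
    """Recompute the data CRC; on mismatch, report where the data stops being zero."""
    computed = crc32c_raw(0, data)
    if computed == expected_crc:
        return True
    print(f"bad crc in data {computed} != exp {expected_crc}")
    off = first_nonzero_offset(data)
    if off < 0:
        # Under the seed-0 assumption an all-zero payload hashes to 0, so a
        # non-zero expected value was not computed over these same bytes.
        print(f"all {len(data)} bytes are zero")
    else:
        window = data[off:off + 16]
        print(f"first non-zero byte at offset {off}: {window.hex(' ')}")
    return False


payload = bytes(4096)
print(check_data_crc(payload, 0))           # True
print(check_data_crc(payload, 3036014994))  # reports the mismatch, then False
```

If the OSD-side dump shows all zeros while the client's data_crc is non-zero, that points at the client side (the bytes it hashed, or the seed) rather than at corruption in transit.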


On Mon, May 18, 2015 at 9:20 AM, huang jun <hjwsm1989@xxxxxxxxx> wrote:
> Hi Haomai,
> I have tested it again, using "dd if=/dev/zero of=/mnt/test bs=4M
> count=1000" on our x86 cluster, and we confirmed that the data_crc of
> every 4194304-byte object is 0 on both the client and the OSD side.
> Note that we use /dev/zero to generate the data bytes.
>
> 2015-05-16 22:07 GMT+08:00 Haomai Wang <haomaiwang@xxxxxxxxx>:
>> Maybe I'm missing something, but from your osd.log:
>>
>> 2015-05-13 08:12:50.050234 7f378d8d8700  0 bad crc in data 0 != exp 3036014994
>>
>> the OSD side computes a CRC value of "0" from the data; that shouldn't
>> happen if we have any data bytes.
>>
>> On Sat, May 16, 2015 at 7:34 PM, huang jun <hjwsm1989@xxxxxxxxx> wrote:
>>> We think the client sends the wrong CRC value.
>>> We printed the data on the OSD side and it seems OK, no data changed, but
>>> the calculated CRC does not equal the one passed from the client.
>>>
>>>
>>> 2015-05-16 19:25 GMT+08:00 Haomai Wang <haomaiwang@xxxxxxxxx>:
>>>> On Sat, May 16, 2015 at 6:54 PM, huang jun <hjwsm1989@xxxxxxxxx> wrote:
>>>>> <<Even if the data comes from /dev/zero, the data crc shouldn't be 0.
>>>>> We printed the data crc of every 4M object, and so far they are all 0.
>>>>> <<I guess the osd (arm) doesn't do crc computing. But from the code, crc
>>>>> <<for arm should be fine.
>>>>> When decoding a message, it checks the front_crc, middle_crc and also the
>>>>> data_crc, so not only the OSD but also the MON and MDS do crc computing.
>>>>> The OSD side computes a CRC value of 0, which is different from the
>>>>> data_crc in the message footer (footer.data_crc).
>>>>
>>>> I don't follow. Is the core problem that the OSD computes a
>>>> wrong crc value?
>>>>
>>>>>
>>>>> 2015-05-16 18:21 GMT+08:00 huang jun <hjwsm1989@xxxxxxxxx>:
>>>>>> It always happens; every test has such errors. Our cluster and
>>>>>> client running on x86 work fine, and we have never seen a bad crc error there.
>>>>>>
>>>>>>
>>>>>> 2015-05-16 17:30 GMT+08:00 Haomai Wang <haomaiwang@xxxxxxxxx>:
>>>>>>> Does this always happen, or only occasionally?
>>>>>>>
>>>>>>> On Sat, May 16, 2015 at 10:10 AM, huang jun <hjwsm1989@xxxxxxxxx> wrote:
>>>>>>>> Hi Steve,
>>>>>>>>
>>>>>>>> 2015-05-15 16:36 GMT+08:00 Steve Capper <steve.capper@xxxxxxxxxx>:
>>>>>>>>> On 15 May 2015 at 00:51, huang jun <hjwsm1989@xxxxxxxxx> wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Hi HuangJun,
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We run a Ceph cluster on an ARM platform (arm64, Linux kernel 3.14,
>>>>>>>>>> OS Ubuntu 14.10), and use "dd if=/dev/zero of=/mnt/test bs=4M count=125"
>>>>>>>>>> to write data. On the OSD side, we got a bad data CRC error.
>>>>>>>>>>
>>>>>>>>>> The kclient log: (tid=6)
>>>>>>>>>> May 14 17:21:08 node103 kernel: [  180.194312] CPU[0] libceph:
>>>>>>>>>> send_request ffffffc8d252f000 tid-6 to osd0 flags 36 pg 1.9aae829f req
>>>>>>>>>> data size is 4194304
>>>>>>>>>> May 14 17:21:08 node103 kernel: [  180.194316] CPU[0] libceph: tid-6
>>>>>>>>>> ----- ffffffc0702f66c8 to osd0 42=osd_op len 197+0+4194304 -----
>>>>>>>>>> libceph: tid-6 front_crc is 388648745 middle_crc is 0 data_crc is 3036014994
>>>>>>>>>>
>>>>>>>>>> The OSD-0 log:
>>>>>>>>>> 2015-05-13 08:12:50.049345 7f378d8d8700  0 seq  3 tid 6 front_len 197
>>>>>>>>>> mid_len 0 data_len 4194304
>>>>>>>>>> 2015-05-13 08:12:50.049348 7f378d8d8700  0 crc in front 388648745 exp 388648745
>>>>>>>>>> 2015-05-13 08:12:50.049395 7f378d8d8700  0 crc in middle 0 exp 0
>>>>>>>>>> 2015-05-13 08:12:50.049964 7f378d8d8700  0 crc in data 0 exp 3036014994
>>>>>>>>>> 2015-05-13 08:12:50.050234 7f378d8d8700  0 bad crc in data 0 != exp 3036014994
>>>>>>>>>>
>>>>>>>>>> Some considerations:
>>>>>>>>>> 1) We use the Ceph 0.80.7 release and compile it on ARM. Does this
>>>>>>>>>> work? Or does Ceph's code have an ARM branch?
>>>>>>>>>
>>>>>>>>> We did run a Ceph version close to that on 64-bit ARM; I'm checking
>>>>>>>>> out 0.80.7 now to test.
>>>>>>>>> In v9.0.0, there is some code to use the ARM optional crc32c
>>>>>>>>> instructions, but this isn't in 0.80.7.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2) Of the 125 objects we wrote, only a few report a CRC error. The
>>>>>>>>>> good objects' data_crc is 0 on both the OSD and the kclient. The bad
>>>>>>>>>> objects' data_crc is not 0 on the kclient, but the OSD calculates 0.
>>>>>>>>>> The object data came from /dev/zero, so I think the data_crc should
>>>>>>>>>> be 0; am I right?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If the initial CRC seed value is non-zero, then the CRC of a buffer
>>>>>>>>> full of zeros won't be zero.
>>>>>>>>> So ceph_crc32c(somethingnonzero, zerofilledbuffer, len) will be non-zero.
>>>>>>>>>
>>>>>>>>> I would like to reproduce this problem here.
>>>>>>>>> What steps did you take before this error occurred?
>>>>>>>>> Is this a cephfs filesystem or something on top of an RBD image?
>>>>>>>>> Which kernel are you running? Is it the one that comes with Ubuntu?
>>>>>>>>> (If so which package version is it?)
>>>>>>>>>
>>>>>>>> We use Linux kernel 3.14 and Ceph v0.80.7, and we have only tested
>>>>>>>> on Ubuntu. Both CephFS and RBD images have CRC problems.
>>>>>>>> I'm not sure whether it's related to memory: we tested many
>>>>>>>> times, but only a few writes reported a CRC error.
>>>>>>>> As I mentioned, I suspect a memory fault changed the data, because we
>>>>>>>> wrote 125 objects, and every data_crc is 0 except the bad-CRC
>>>>>>>> object's data_crc. Any tips are welcome.
>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> --
>>>>>>>>> Steve
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> thanks
>>>>>>>> huangjun
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>>
>>>>>>> Wheat
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> thanks
>>>>>> huangjun
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> thanks
>>>>> huangjun
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Wheat
>>>
>>>
>>>
>>> --
>>> thanks
>>> huangjun
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
>
> --
> thanks
> huangjun



-- 
Best Regards,

Wheat