We think the client is sending the wrong CRC value. We printed the data on
the OSD side and it looks fine, with no data changed, but the calculated CRC
does not equal the one passed from the client.

2015-05-16 19:25 GMT+08:00 Haomai Wang <haomaiwang@xxxxxxxxx>:
> On Sat, May 16, 2015 at 6:54 PM, huang jun <hjwsm1989@xxxxxxxxx> wrote:
>> <<Even if from /dev/zero, the data crc shouldn't be 0.
>> we print all 4M object's data crc, it seems all 0 until now.
>> <<I guess osd(arm) doesn't do crc computing. But from code, crc for arm
>> <<should be fine
>> When decode a message, it will check the front_crc, middle_crc and also data_crc,
>> so not OSD but MON and MDS will do crc computing, and the OSD side
>> compute the CRC value is 0, which is different with the data_crc in
>> message footer.data_crc.
>
> I'm not following your meaning. The core problem is osd computes a
> wrong crc value?
>
>>
>> 2015-05-16 18:21 GMT+08:00 huang jun <hjwsm1989@xxxxxxxxx>:
>>> that always happen, every test have such errors. And our cluster and
>>> client that running on X86 works fine, never seen bad crc error.
>>>
>>>
>>> 2015-05-16 17:30 GMT+08:00 Haomai Wang <haomaiwang@xxxxxxxxx>:
>>>> is this always happen or occasionally?
>>>>
>>>> On Sat, May 16, 2015 at 10:10 AM, huang jun <hjwsm1989@xxxxxxxxx> wrote:
>>>>> hi,steve
>>>>>
>>>>> 2015-05-15 16:36 GMT+08:00 Steve Capper <steve.capper@xxxxxxxxxx>:
>>>>>> On 15 May 2015 at 00:51, huang jun <hjwsm1989@xxxxxxxxx> wrote:
>>>>>>> hi,all
>>>>>>
>>>>>> Hi HuangJun,
>>>>>>
>>>>>>>
>>>>>>> We run ceph cluster on ARM platform (arm64, linux kernel 3.14, OS
>>>>>>> ubuntu 14.10), and use "dd if=/dev/zero of=/mnt/test bs=4M count=125"
>>>>>>> to write data. On the osd side, we got bad data CRC error.
>>>>>>>
>>>>>>> The kclient log: (tid=6)
>>>>>>> May 14 17:21:08 node103 kernel: [ 180.194312] CPU[0] libceph:
>>>>>>> send_request ffffffc8d252f000 tid-6 to osd0 flags 36 pg 1.9aae829f req
>>>>>>> data size is 4194304
>>>>>>> May 14 17:21:08 node103 kernel: [ 180.194316] CPU[0] libceph: tid-6
>>>>>>> ----- ffffffc0702f66c8 to osd0 42=osd_op len 197+0+4194304 -----
>>>>>>> libceph: tid-6 front_crc is 388648745 middle_crc is 0 data_crc is 3036014994
>>>>>>>
>>>>>>> The OSD-0 log:
>>>>>>> 2015-05-13 08:12:50.049345 7f378d8d8700 0 seq 3 tid 6 front_len 197
>>>>>>> mid_len 0 data_len 4194304
>>>>>>> 2015-05-13 08:12:50.049348 7f378d8d8700 0 crc in front 388648745 exp 388648745
>>>>>>> 2015-05-13 08:12:50.049395 7f378d8d8700 0 crc in middle 0 exp 0
>>>>>>> 2015-05-13 08:12:50.049964 7f378d8d8700 0 crc in data 0 exp 3036014994
>>>>>>> 2015-05-13 08:12:50.050234 7f378d8d8700 0 bad crc in data 0 != exp 3036014994
>>>>>>>
>>>>>>> some considerations:
>>>>>>> 1) we use ceph 0.80.7 release version and compile it on ARM, did this
>>>>>>> works? or does ceph's code has ARM branch?
>>>>>>
>>>>>> We did run a Ceph version close to that for 64-bit ARM, I'm checking
>>>>>> out 0.80.7 now to test.
>>>>>> In v9.0.0, there is some code to use the ARM optional crc32c
>>>>>> instructions, but this isn't in 0.80.7.
>>>>>>
>>>>>>>
>>>>>>> 2) as we have write 125 objects, only few of them report CRC error,
>>>>>>> and the right object's data_crc is 0 both on osd and kclient. the
>>>>>>> wrong object's data_crc is not 0 on kclient, but osd calculate result
>>>>>>> 0. the object data came from /dev/zero, i think the data_crc should be
>>>>>>> 0, am i right?
>>>>>>>
>>>>>>
>>>>>> If the initial CRC seed value is non-zero, then the CRC of a buffer
>>>>>> full of zeros won't be zero.
>>>>>> So ceph_crc32c(somethingnonzero, zerofilledbuffer, len), will be non-zero.
>>>>>>
>>>>>> I would like to reproduce this problem here.
>>>>>> What steps did you take before this error occurred?
>>>>>> Is this a cephfs filesystem or something on top of an RBD image?
>>>>>> Which kernel are you running? Is it the one that comes with Ubuntu?
>>>>>> (If so which package version is it?)
>>>>>>
>>>>> We use linux kernel version 3.14 and we just tested it on Ubuntu, and
>>>>> ceph version v0.80.7. Both cephfs and RBD image have CRC problems.
>>>>> I'm not sure whether it's related to Memory, since we tested many
>>>>> times, but just a few reported CRC error.
>>>>> As i mentioned, i doubt the memory fault changed the data, because we
>>>>> write 125 objects, and the all data_crc is 0 except the Bad CRC
>>>>> object's data_crc. Any tips are welcome.
>>>>>
>>>>>> Cheers,
>>>>>> --
>>>>>> Steve
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> thanks
>>>>> huangjun
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Wheat
>>>
>>>
>>>
>>> --
>>> thanks
>>> huangjun
>>
>>
>>
>> --
>> thanks
>> huangjun
>
>
>
> --
> Best Regards,
>
> Wheat

--
thanks
huangjun
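
For reference, Steve's point about the seed in the quoted thread can be checked with a small standalone sketch. It is a plain bitwise CRC-32C (Castagnoli polynomial) that takes the seed as its first argument, in the spirit of ceph_crc32c(seed, buf, len). The function name crc32c_update, the 4096-byte buffer, and the choice to apply no initial or final bit inversion are assumptions made for illustration; this is not the Ceph or kernel implementation.

```python
# Bitwise CRC-32C (Castagnoli) update with a caller-supplied seed.
# Assumption: the seed is used as-is, with no initial or final inversion.
# This is an illustrative sketch, not Ceph's ceph_crc32c() code.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c_update(seed, data):
    """Fold the bytes of `data` into the running CRC value `seed`."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc


# Small stand-in for a zero-filled object (the thread uses 4M objects).
zero_buf = bytes(4096)

crc_seed_zero = crc32c_update(0, zero_buf)
crc_seed_nonzero = crc32c_update(0xFFFFFFFF, zero_buf)

print("seed 0:          %u" % crc_seed_zero)
print("seed 0xFFFFFFFF: %u" % crc_seed_nonzero)
```

With a zero seed the zero-filled buffer yields 0, which matches the data_crc of 0 reported for the good objects. With a non-zero seed the result is non-zero, as Steve describes. If the data path really does use a zero seed (an assumption here), a non-zero data_crc on the client would suggest that the bytes it checksummed were not all zero at that moment.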