Re: Data corruption in kernel 5.1+ with iSER attached ramdisk


On Tue, Dec 03, 2019 at 02:56:08PM -0500, Stephen Rust wrote:
> Hi Ming,
> 
> Thanks very much for the patch.
> 
> > BTW, you may try the attached test patch. If the issue can be fixed by
> > this patch, that means it is really caused by un-aligned buffer, and
> > the iser driver needs to be fixed.
> 
> I have tried the patch, and re-run the test. Results are mixed.
> 
> To recap, our test writes the last bytes of an iser attached iscsi
> device. The target device is a LIO iblock, backed by a brd ramdisk.
> The client does a simple `dd`, seeking to "size - offset" on the
> device and writing a buffer whose length equals the offset.
> 
> For example, to test a write at a 512-byte offset, seek to device
> "size - 512" and write 512 bytes of data.
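
A minimal sketch of that test, using a plain file as a hypothetical
stand-in for the iSER-attached block device (the path and sizes here are
illustrative, not from the original report):

```shell
# Stand-in for the iSCSI device; the real test targets the attached LUN.
dev=/tmp/fake_dev.img
truncate -s 1M "$dev"

size=$(stat -c %s "$dev")
off=512

# Write the last $off bytes: seek to (size - off), write off bytes.
# conv=notrunc keeps dd from truncating the "device" file.
dd if=/dev/zero of="$dev" bs=1 seek=$((size - off)) count=$off conv=notrunc
```

After the write, the data would be read back and compared to verify it.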
> 
> WITHOUT the patch, writing data at the following offsets from the end
> of the device failed to write all the correct data (rather, the write
> succeeded, but reading the data back it was invalid):
> 
> - failed: 512, 1024, 2048, 4096, 8192
> 
> Anything larger worked fine.
> 
> WITH the patch applied, writing data up to an offset of 4096 all now
> worked and verified correctly. However, offsets between 4096 and 8192
> all still failed. I started at 512, and incremented by 512 all the way
> up to 16384. The following offsets all failed to verify the write:
> 
> - failed: 4608, 5120, 5632, 6144, 6656, 7168, 7680, 8192
> 
> Anything larger continues to work fine with the patch.
> 
> As an example, for the failed 8192 case, the `bpftrace lio.bt` trace shows:
> 
> 8192 76
> 4096 0
> 4096 0
> 8192 76
> 4096 0
> 4096 0
> ...
> [snip]
> 
> What do you think are appropriate next steps?

OK, so my guess appears to be correct: the issue is related to an
unaligned bvec->bv_offset.

So firstly, I'd suggest investigating from the RDMA driver side to see
why an unaligned buffer is passed to the block layer.

According to the previous discussion, a 512-byte-aligned buffer should
be provided to the block layer.

So it looks like the driver needs to be fixed.

> Do you have an idea why the specific "multi-page bvec helpers" commit
> could have exposed this particular latent issue? Please let me know
> what else I can try, or what additional data I can provide for you.

The patch might not cover the big-offset case. Could you collect a
bpftrace log via the following script when you reproduce the issue with
a >4096 offset?

// Only trace bio_add_page() calls made while iblock_execute_rw() is
// on the stack, i.e. pages added by the LIO iblock backend.
kprobe:iblock_execute_rw
{
    @start[tid] = 1;
}

kretprobe:iblock_execute_rw
{
    @start[tid] = 0;
}

// bio_add_page(bio, page, len, offset): print len and offset
kprobe:bio_add_page
/@start[tid]/
{
    printf("%d %d\n", arg2, arg3);
}

// brd_do_bvec(brd, page, len, off, op, sector):
// print len, off, op and sector
kprobe:brd_do_bvec
{
    printf("%d %d %d %d\n", arg2, arg3, arg4, arg5);
}


Thanks,
Ming
