Re: Data corruption in kernel 5.1+ with iSER attached ramdisk

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 27, 2019 at 11:14:46PM -0500, Stephen Rust wrote:
> Hi,
> 
> Thanks for your reply.
> 
> I agree it does seem surprising that the git bisect pointed to this
> particular commit when tracking down this issue.
> 
> The ramdisk we export in LIO is a standard "brd" module ramdisk (ie:
> /dev/ram*). We configure it as a "block" backstore in LIO, not using the
> built-in LIO ramdisk.

Then it isn't strange any more, since iblock code uses bio interface.

> 
> LIO configuration is as follows:
> 
>   o- backstores ..........................................................
> [...]
>   | o- block .............................................. [Storage
> Objects: 1]
>   | | o- Blockbridge-952f0334-2535-5fae-9581-6c6524165067
>  [/dev/ram-bb.952f0334-2535-5fae-9581-6c6524165067.cm2 (16.0MiB) write-thru
> activated]
>   | |   o- alua ............................................... [ALUA
> Groups: 1]
>   | |     o- default_tg_pt_gp ................... [ALUA state:
> Active/optimized]
>   | o- fileio ............................................. [Storage
> Objects: 0]
>   | o- pscsi .............................................. [Storage
> Objects: 0]
>   | o- ramdisk ............................................ [Storage
> Objects: 0]
>   o- iscsi ........................................................
> [Targets: 1]
>   | o-
> iqn.2009-12.com.blockbridge:rda:1:952f0334-2535-5fae-9581-6c6524165067:rda
>  [TPGs: 1]
>   |   o- tpg1 ...................................... [no-gen-acls, auth
> per-acl]
>   |     o- acls ......................................................
> [ACLs: 1]
>   |     | o- iqn.1994-05.com.redhat:115ecc56a5c .. [mutual auth, Mapped
> LUNs: 1]
>   |     |   o- mapped_lun0  [lun0
> block/Blockbridge-952f0334-2535-5fae-9581-6c6524165067 (rw)]
>   |     o- luns ......................................................
> [LUNs: 1]
>   |     | o- lun0  [block/Blockbridge-952f0334-2535-5fae-9581-6c6524165067
> (/dev/ram-bb.952f0334-2535-5fae-9581-6c6524165067.cm2) (default_tg_pt_gp)]
>   |     o- portals ................................................
> [Portals: 1]
>   |       o- 0.0.0.0:3260 ...............................................
> [iser]
> 
> 
> iSER is the iSCSI extension for RDMA, and it is important to note that we
> have _only_ reproduced this when the writes occur over RDMA, with the
> target portal in LIO having enabled "iser". The iscsi client (using
> iscsiadm) connects to the target directly over iSER. We use the Mellanox
> ConnectX-5 Ethernet NICs (mlx5* module) for this purpose, which utilizes
> RoCE (RDMA over Converged Ethernet) instead of TCP.

I may get one machine with Mellanox NIC, is it easy to setup & reproduce
just in the local machine(both host and target are setup on same machine)?

> 
> The identical ramdisk configuration using TCP/IP target in LIO has _not_
> reproduced this issue for us.

Yeah, I just tried iblock over brd, and can't reproduce it.

> 
> I installed bcc and used the stackcount tool to trace rd_execute_rw, but I
> suspect because we are not using the built-in LIO ramdisk this did not
> catch anything. Are there other function traces we can provide for you?

Please try to trace bio_add_page() a bit via 'bpftrace ./ilo.bt'.

[root@ktest-01 func]# cat ilo.bt
kprobe:iblock_execute_rw
{
    @start[tid]=1;
}

kretprobe:iblock_execute_rw
{
    @start[tid]=0;
}

kprobe:bio_add_page
/@start[tid]/
{
  printf("%d %d\n", arg2, arg3);
}



Thanks, 
Ming





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux