Re: [PATCH v1 2/3] pnfs/blocklayout: Revalidate SCSI disks after registration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12 Dec 2017, at 21:23, Benjamin Coddington wrote:

On 12 Dec 2017, at 9:38, Christoph Hellwig wrote:

On Fri, Dec 08, 2017 at 12:52:58PM -0500, Benjamin Coddington wrote:
If a SCSI device already has an exclusive access registrants only
reservation, the device's size cannot be known by the client until after registration. We should therefore delay setting the block device's length
until after the registration and revalidation of the disk.

Table 13 in sbc4r14.pdf clearly indicates READ CAPACITY is allowed even
for a not registered I_T nexus.  What kind of setup do you see this
problem with?

It does indeed, I should have checked. So it seems this patch should be unnecessary, however I think if we do end up with d->len = 0, bad things
happen later.  My memory is fuzzy on this, I'll have to re-test it.

The SCSI devices are presented by the TCM driver on a 4.11 era kernel. I
don't know that code at all, but with quick look it seems that
__target_execute_cmd() wants to check for reservations for all commands.

I think this patch should be dropped, and we can try to fix this on the TCM
side.

So, when the problem occurs with TCM, the bio calculations seem to be
botched.  do_add_page_to_bio ends up with some bad numbers..

[ 1508.078239] scsi host2: iSCSI Initiator over TCP/IP
[ 1508.080686] scsi 2:0:0:0: Direct-Access LIO-ORG block_1 4.0 PQ: 0 ANSI: 5
[ 1508.081864] sd 2:0:0:0: reservation conflict
[ 1508.082221] sd 2:0:0:0: Attached scsi generic sg1 type 0
[ 1508.082754] sd 2:0:0:0: reservation conflict
[ 1508.083091] sd 2:0:0:0: reservation conflict
[ 1508.083173] scsi 2:0:0:1: Direct-Access LIO-ORG block_2 4.0 PQ: 0 ANSI: 5 [ 1508.083994] sd 2:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=DID_NEXUS_FAILURE driverbyte=DRIVER_OK
[ 1508.084707] sd 2:0:0:0: [sda] Sense not available.
[ 1508.085152] sd 2:0:0:0: reservation conflict
[ 1508.085664] sd 2:0:0:0: reservation conflict
[ 1508.085753] sd 2:0:0:1: Attached scsi generic sg2 type 0
[ 1508.086579] sd 2:0:0:0: reservation conflict
[ 1508.086867] sd 2:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=DID_NEXUS_FAILURE driverbyte=DRIVER_OK
[ 1508.087496] sd 2:0:0:0: [sda] Sense not available.
[ 1508.088075] sd 2:0:0:1: [sdb] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[ 1508.088627] sd 2:0:0:1: [sdb] Write Protect is off
[ 1508.088951] sd 2:0:0:1: [sdb] Mode Sense: 43 00 10 08
[ 1508.088968] sd 2:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
[ 1508.089360] sd 2:0:0:0: [sda] 0-byte physical blocks
[ 1508.089742] sd 2:0:0:0: reservation conflict
[ 1508.090081] sd 2:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 1508.090671] sd 2:0:0:0: reservation conflict
[ 1508.091126] sd 2:0:0:0: reservation conflict
[ 1508.091656] sd 2:0:0:0: [sda] Test WP failed, assume Write Enabled
[ 1508.092157] sd 2:0:0:0: reservation conflict
[ 1508.092454] sd 2:0:0:0: [sda] Asking for cache data failed
[ 1508.092805] sd 2:0:0:0: [sda] Assuming drive cache: write through
[ 1508.093734] sd 2:0:0:0: reservation conflict
[ 1508.094235] sd 2:0:0:0: reservation conflict
[ 1508.095052] sd 2:0:0:0: reservation conflict
[ 1508.095460] sd 2:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=DID_NEXUS_FAILURE driverbyte=DRIVER_OK
[ 1508.096083] sd 2:0:0:0: [sda] Sense not available.
[ 1508.096459] sd 2:0:0:0: reservation conflict
[ 1508.096886] sd 2:0:0:0: reservation conflict
[ 1508.097345] sd 2:0:0:0: reservation conflict
[ 1508.097791] sd 2:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=DID_NEXUS_FAILURE driverbyte=DRIVER_OK
[ 1508.098696] sd 2:0:0:0: [sda] Sense not available.
[ 1508.099547] sd 2:0:0:0: reservation conflict
[ 1508.100000] sd 2:0:0:0: reservation conflict
[ 1508.100538] sd 2:0:0:0: reservation conflict
[ 1508.100891] sd 2:0:0:0: reservation conflict
[ 1508.101180] sd 2:0:0:1: [sdb] Attached SCSI disk
[ 1508.101252] sd 2:0:0:0: [sda] Attached SCSI disk
[ 1528.104541] set_pnfs_layoutdriver: Using NFSv4 I/O
[ 1528.105864] set_pnfs_layoutdriver: Using NFSv4 I/O
[ 1532.269993] find_pnfs_driver_locked: Searching for id 5, found ffffffffa0391000
[ 1532.270493] bl_set_layoutdriver enter
[ 1532.270728] set_pnfs_layoutdriver: pNFS module for 5 set
[ 1532.274569] pnfs_find_alloc_layout Begin ino=ffff880076c196c8 layout= (null)
[ 1532.275101] __bl_alloc_layout_hdr enter
[ 1532.275387] pnfs_find_lseg:Begin
[ 1532.275599] pnfs_find_lseg:Return lseg           (null) ref 0
[ 1532.275937] --> send_layoutget
[ 1532.278518] ---> bl_alloc_lseg
[ 1532.278727] bl_alloc_lseg: number of extents 1
[ 1532.279021] nfs4_get_device_info: server ffff8800726d1000 max_resp_sz 266240 max_pages 65
[ 1532.280246] nfs4_get_device_info getdevice info returns 0
[ 1532.280641] pNFS: using block device sda (reservation key 0x5a30918050925eae)
[ 1532.281393] <-- nfs4_get_device_info d ffff88007f1b36c0
[ 1532.281731] bl_alloc_lseg returns 0
[ 1532.282004] pnfs_generic_layout_insert_lseg:Begin
[ 1532.282290] pnfs_generic_layout_insert_lseg: inserted lseg ffff880077a03e40 iomode 2 offset 0 length 8192 at tail
[ 1532.282917] pnfs_generic_layout_insert_lseg:Return
[ 1532.283304] pnfs_update_layout: inode 0:43/100 pNFS layout segment found for (read/write, offset: 0, length: 8192)
[ 1532.283902] pnfs_try_to_write_data: Writing ino:100 8192@0 (how 32)
[ 1532.284418] bl_write_pagelist enter, 8192@0
[ 1532.284714] do_add_page_to_bio: npg 2 rw 1 isect 0 offset 0 len 4096
[ 1532.285080] do_add_page_to_bio: npg 1 rw 1 isect 8388536 offset 0 len 4096
[ 1532.285499] bl_submit_bio submitting write bio 4294930432@72
[ 1532.286277] bl_submit_bio submitting write bio 0@8388608
[ 1532.286610] pnfs_mark_matching_lsegs_invalid:Begin lo ffff88007f1b3480 [ 1532.287125] pnfs_mark_matching_lsegs_invalid: freeing lseg ffff880077a03e40 iomode 2 seq 1offset 0 length 8192
[ 1532.287674] mark_lseg_invalid: lseg ffff880077a03e40 ref 3
[ 1532.287979] pnfs_mark_matching_lsegs_invalid:Return 1
[ 1532.288259] pnfs_layout_io_set_failed Setting layout IOMODE_RW fail bit
[ 1532.288662] ------------[ cut here ]------------
[ 1532.288922] kernel BUG at block/blk-core.c:3023!
[ 1532.289181] invalid opcode: 0000 [#1] SMP
[ 1532.289405] Modules linked in: blocklayoutdriver nfsv4 dns_resolver nfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle iptable_raw iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ebtable_filter ebtables ip6table_filter ip6_tables nfsd auth_rpcgss nfs_acl lockd grace sunrpc virtio_blk virtio_balloon virtio_net virtio_console crct10dif_pclmul ppdev crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw parport_pc i2c_piix4 parport virtio_pci ata_generic virtio_ring pata_acpi virtio [ 1532.293131] CPU: 1 PID: 1570 Comm: dd Not tainted 4.14.0.bebc6082da #2 [ 1532.293490] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
[ 1532.293969] task: ffff88006f19bc00 task.stack: ffffc90001008000
[ 1532.294297] RIP: 0010:__blk_end_request_all+0x5a/0x60
[ 1532.294577] RSP: 0018:ffffc9000100b978 EFLAGS: 00010002
[ 1532.294866] RAX: 0000000000000001 RBX: ffff88007f069920 RCX: 0000000000000000 [ 1532.295260] RDX: 0000000000000001 RSI: ffff88007e8c2f00 RDI: ffff88007f069920 [ 1532.295652] RBP: ffffc9000100b978 R08: ffff88007e8c2f00 R09: 0000000000000000 [ 1532.296045] R10: ffff880077a03e40 R11: 0000000000000000 R12: ffff88007e5f8000 [ 1532.296437] R13: 0000000000000001 R14: 0000000000000296 R15: 0000000000000000 [ 1532.296830] FS: 00007f43df54b700(0000) GS:ffff88007c080000(0000) knlGS:0000000000000000
[ 1532.297275] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1532.297592] CR2: 000055f30c939000 CR3: 000000006d968004 CR4: 00000000001606e0
[ 1532.297987] Call Trace:
[ 1532.298129]  blk_peek_request+0x1bc/0x2c0
[ 1532.298354]  scsi_request_fn+0x3f/0x6c0
[ 1532.298569]  ? __cfq_update_io_thinktime+0x17/0x80
[ 1532.298835]  __blk_run_queue+0x3d/0x60
[ 1532.299047]  queue_unplugged+0x2a/0xa0
[ 1532.299258]  blk_flush_plug_list+0x22a/0x290
[ 1532.299496]  blk_finish_plug+0x2c/0x40
[ 1532.299708]  bl_write_pagelist+0x1f2/0x2b0 [blocklayoutdriver]
[ 1532.300039]  pnfs_generic_pg_writepages+0xb2/0x220 [nfsv4]
[ 1532.300349]  ? pnfs_generic_pg_writepages+0xb2/0x220 [nfsv4]
[ 1532.300667]  nfs_pageio_doio+0x2a/0x60 [nfs]
[ 1532.300912]  nfs_pageio_complete+0x55/0xc0 [nfs]
[ 1532.301172]  nfs_writepages+0xcb/0x120 [nfs]
[ 1532.301412]  do_writepages+0x1c/0x60
[ 1532.301614]  __filemap_fdatawrite_range+0xc6/0x100
[ 1532.301883]  filemap_write_and_wait_range+0x35/0x90
[ 1532.302156]  nfs_file_fsync+0x46/0x1f0 [nfs]
[ 1532.302394]  ? mntput+0x24/0x40
[ 1532.302571]  vfs_fsync_range+0x49/0xa0
[ 1532.302780]  vfs_fsync+0x1c/0x20
[ 1532.302979]  nfs4_file_flush+0x57/0x80 [nfsv4]
[ 1532.303236]  filp_close+0x2f/0x80
[ 1532.303422]  __close_fd+0x81/0xa0
[ 1532.303608]  SyS_close+0x23/0x50
[ 1532.303790]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 1532.304049] RIP: 0033:0x7f43df067820
[ 1532.304249] RSP: 002b:00007ffcf4c59138 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 1532.304664] RAX: ffffffffffffffda RBX: 00007f43df54b698 RCX: 00007f43df067820 [ 1532.305057] RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000001 [ 1532.305448] RBP: 0000000000000002 R08: 0000000000000010 R09: 0000028341f19d4c [ 1532.305840] R10: 000055f30c938fa0 R11: 0000000000000246 R12: 0000000000000002 [ 1532.306233] R13: 0000000000000000 R14: 7fffffffffffffff R15: 0000000000000000 [ 1532.306624] Code: ff ff 84 c0 75 25 5d c3 0f ff 48 8b 87 50 01 00 00 31 c9 48 85 c0 74 de 8b 48 68 8b 57 68 40 0f b6 f6 e8 ba fe ff ff 84 c0 74 db <0f> 0b 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 [ 1532.307661] RIP: __blk_end_request_all+0x5a/0x60 RSP: ffffc9000100b978
[ 1532.308022] ---[ end trace 03e5297ad9a73867 ]---

That looks a bit scary, I wonder if we could end up corrupting a filesystem
if this case isn't handled.  I think something like this is needed:

commit 670d2bebde3368b0244f5eb4af6fc581add7d4a4
Author: Benjamin Coddington <bcodding@xxxxxxxxxx>
Date:   Tue Jun 13 16:02:14 2017 -0400

    pnfs/blocklayout: Ensure disk address in block device map

It's possible that the device map is smaller than the offset into the device for the I/O we're adding. This probably indicates an error of some sort (such as the dev->len = 0 due to the device being unreadable), so let's add a check for it and bail out. Otherwise, we risk botching the bio
    calculations that follow.

    Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index ec110aa87634..227a26dde9e4 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -137,6 +137,11 @@ bl_alloc_init_bio(int npg, struct block_device *bdev, sector_t disk_sector,
        return bio;
 }

+static bool offset_in_map(sector_t isect, struct pnfs_block_dev_map *map)
+{
+       return isect >= map->start && isect < map->start + map->len;
+}
+
 static struct bio *
 do_add_page_to_bio(struct bio *bio, int npg, int rw, sector_t isect,
                struct page *page, struct pnfs_block_dev_map *map,
@@ -156,8 +161,8 @@ do_add_page_to_bio(struct bio *bio, int npg, int rw, sector_t isect,

        /* translate to physical disk offset */
        disk_addr = (u64)isect << SECTOR_SHIFT;
- if (disk_addr < map->start || disk_addr >= map->start + map->len) {
-               if (!dev->map(dev, disk_addr, map))
+       if (!offset_in_map(isect, map)) {
+ if (!dev->map(dev, disk_addr, map) || !offset_in_map(isect, map))
                        return ERR_PTR(-EIO);
                bio = bl_submit_bio(bio);
        }


Any thoughts?

Ben
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux