Re: Seeing this on a RHEL kernel with upstream backports wondering if this was ever fixed

On Fri, 2018-07-27 at 08:05 -0400, Laurence Oberman wrote:
> On Thu, 2018-07-26 at 16:02 -0400, Laurence Oberman wrote:
> > On Thu, 2018-07-26 at 10:28 -0400, Don Dutile wrote:
> > > On 07/26/2018 08:48 AM, Laurence Oberman wrote:
> > > > Hello
> > > > 
> > > > https://www.spinics.net/lists/linux-rdma/msg51334.html
> > > > 
> > > > A RHEL 7.5 kernel with backports from upstream is hitting this.
> > > > Chuck reported it, and Sagi and Max responded, but it's not clear
> > > > whether we ever fixed it.
> > > > 
> > > 
> > > RHEL-7.5 data point:
> > > -- drivers/infiniband/* -r is backported to v4.14.
> > >     i.e., includes the patch(es) mentioned in the above thread.
> > > 
> > > Laurence:
> > > Please test with the 7.6 kernel and report back.
> > > If that passes, RH can bisect the bug fix between v4.14 and v4.16
> > > (the 7.6 update point for its RDMA kernel core) and backport it to
> > > 7.5-zstream. Note: you'll have to update the rdma-core package to
> > > the 7.6 version as well.
> > > All functional and bug-fix patches to mlx* (IB and Ethernet) are in
> > > as well (same kernel references).
> > > 
> > > -dd
> > > 
> > > > In this case we end up in a panic, not just error messages,
> > > > although the messages were logged over and over for a long time
> > > > before we finally panicked.
> > > > 
> > > > crash> log | grep "memreg failure: memor" | wc -l
> > > > 2414
> > > > 
> > > > crash> log
> > > > [1635578.012721]  connection16:0: detected conn error (1011)
> > > > [1635587.050688] mlx5_0:dump_cqe:262:(pid 93128): dump error cqe
> > > > [1635587.089686] 00000000 00000000 00000000 00000000
> > > > [1635587.123989] 00000000 00000000 00000000 00000000
> > > > [1635587.157494] 00000000 00000000 00000000 00000000
> > > > [1635587.190968] 00000000 08007806 250002ad ba6115d3
> > > > 
> > > > [1635587.224331] iser: iser_err_comp: memreg failure: memory management operation error (6) vend_err 78
> > > > [1635587.278876]  connection15:0: detected conn error (1011)
> > > > [1635590.986286] mlx5_1:dump_cqe:262:(pid 0): dump error cqe
> > > > [1635591.021891] 00000000 00000000 00000000 00000000
> > > > [1635591.053944] 00000000 00000000 00000000 00000000
> > > > 
> > > > [1657077.997960] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> > > > [1657077.997967] IP: [<ffffffffc08a541e>] iscsi_verify_itt+0x1e/0x110 [libiscsi]
> > > > [1657077.997970] PGD 80000098de387067 PUD b8d9ffa067 PMD 0
> > > > [1657077.997971] Oops: 0000 [#1] SMP
> > > > [1657077.998009] Modules linked in: oracleasm(O) nfsv3
> > > > rpcsec_gss_krb5
> > > > nfsv4 dns_resolver nfs fscache dm_round_robin bonding rpcrdma
> > > > ib_isert
> > > > iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt
> > > > target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib
> > > > rdma_ucm
> > > > ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx5_ib ib_core
> > > > vfat
> > > > fat
> > > > xfs sb_edac edac_core intel_powerclamp coretemp intel_rapl
> > > > iosf_mbi
> > > > kvm_intel kvm irqbypass iTCO_wdt crc32_pclmul ipmi_ssif
> > > > iTCO_vendor_support ghash_clmulni_intel aesni_intel lrw
> > > > gf128mul
> > > > ipmi_si glue_helper ablk_helper cryptd sg hpwdt hpilo pcspkr
> > > > ipmi_devintf ioatdma dm_multipath i2c_i801 lpc_ich shpchp dca
> > > > wmi
> > > > ipmi_msghandler pcc_cpufreq acpi_power_meter nfsd binfmt_misc
> > > > auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache
> > > > jbd2
> > > > sd_mod crc_t10dif crct10dif_generic
> > > > [1657077.998020]  i2c_algo_bit drm_kms_helper syscopyarea
> > > > sysfillrect
> > > > sysimgblt fb_sys_fops ttm bnx2x mlx5_core crct10dif_pclmul mdio
> > > > tg3(OE)
> > > > devlink libcrc32c crct10dif_common drm hpsa(OE) ptp i2c_core
> > > > crc32c_intel scsi_transport_sas pps_core dm_mirror
> > > > dm_region_hash
> > > > dm_log dm_mod
> > > > [1657077.998023] CPU: 20 PID: 41538 Comm: sh Tainted:
> > > > G           OE  -
> > > > -----------   3.10.0-693.34.1.el7_bz1582551.x86_64 #1
> > > > [1657077.998024] Hardware name: HP ProLiant DL380 Gen9/ProLiant
> > > > DL380
> > > > Gen9, BIOS P89 05/21/2018
> > > > [1657077.998025] task: ffff88587ce38fd0 ti: ffff884dd0af0000
> > > > task.ti:
> > > > ffff884dd0af0000
> > > > [1657077.998029] RIP:
> > > > 0010:[<ffffffffc08a541e>]  [<ffffffffc08a541e>]
> > > > iscsi_verify_itt+0x1e/0x110 [libiscsi]
> > > > [1657077.998030] RSP: 0000:ffff88beff403d78  EFLAGS: 00010286
> > > > [1657077.998031] RAX: 000000000000004c RBX: 00000000b0000036
> > > > RCX:
> > > > 0000000000000002
> > > > [1657077.998032] RDX: 00000000000000cc RSI: 00000000b0000036
> > > > RDI:
> > > > 0000000000000000
> > > > [1657077.998033] RBP: ffff88beff403da0 R08: 0000000040032a20
> > > > R09:
> > > > ffff8896e4eaf91c
> > > > [1657077.998034] R10: 0000000000000000 R11: 00007ffff7763ca0
> > > > R12:
> > > > 0000000000000000
> > > > [1657077.998035] R13: ffff8896e4eaf9e4 R14: ffff8896e4eaf900
> > > > R15:
> > > > 0000000000000000
> > > > [1657077.998036] FS:  00007ffff7fe6740(0000)
> > > > GS:ffff88beff400000(0000)
> > > > knlGS:0000000000000000
> > > > [1657077.998038] CS:  0010 DS: 0000 ES: 0000 CR0:
> > > > 0000000080050033
> > > > [1657077.998039] CR2: 0000000000000010 CR3: 000000ad92eba000
> > > > CR4:
> > > > 00000000003607e0
> > > > [1657077.998040] DR0: 0000000000000000 DR1: 0000000000000000
> > > > DR2:
> > > > 0000000000000000
> > > > [1657077.998041] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> > > > DR7:
> > > > 0000000000000400
> > > > [1657077.998042] Call Trace:
> > > > [1657077.998044]  <IRQ>
> > > > [1657077.998046]  [<ffffffffc08a5527>]
> > > > iscsi_itt_to_ctask+0x17/0x80
> > > > [libiscsi]
> > > > [1657077.998050]  [<ffffffffc05eefea>] iser_task_rsp+0xca/0x360
> > > > [ib_iser]
> > > > [1657077.998061]  [<ffffffffc0587fbb>]
> > > > __ib_process_cq+0x6b/0xe0
> > > > [ib_core]
> > > > [1657077.998066]  [<ffffffffc0588122>]
> > > > ib_poll_handler+0x22/0x80
> > > > [ib_core]
> > > > [1657077.998070]  [<ffffffff81358507>]
> > > > irq_poll_softirq+0xc7/0x100
> > > > [1657077.998076]  [<ffffffff81095195>] __do_softirq+0xf5/0x280
> > > > [1657077.998081]  [<ffffffff816c4e8c>] call_softirq+0x1c/0x30
> > > > [1657077.998086]  [<ffffffff8102d435>] do_softirq+0x65/0xa0
> > > > [1657077.998088]  [<ffffffff81095515>] irq_exit+0x105/0x110
> > > > [1657077.998091]  [<ffffffff816c61d6>] do_IRQ+0x56/0xf0
> > > > [1657077.998098]  [<ffffffff816b837c>]
> > > > common_interrupt+0x17c/0x17c
> > > > [1657077.998099]  <EOI>
> > > > [1657077.998113] Code: ff ff ff eb a9 41 be 95 ff ff ff eb a1
> > > > 0f
> > > > 1f
> > > > 44
> > > > 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 89 f3 48 83 ec 10 c7
> > > > 45
> > > > d8 00
> > > > 00 00 00 <4c> 8b 6f 10 65 48 8b 04 25 28 00 00 00 48 89 45 e0
> > > > 31
> > > > c0
> > > > 83
> > > > fe
> > > > [1657077.998116] RIP  [<ffffffffc08a541e>]
> > > > iscsi_verify_itt+0x1e/0x110
> > > > [libiscsi]
> > > > [1657077.998116]  RSP <ffff88beff403d78>
> > > > [1657077.998117] CR2: 0000000000000010
> > > > crash>
> > > > 
> > > > crash> bt
> > > > PID: 41538  TASK: ffff88587ce38fd0  CPU: 20  COMMAND: "sh"
> > > >   #0 [ffff88beff403a18] machine_kexec at ffffffff8105ddeb
> > > >   #1 [ffff88beff403a78] __crash_kexec at ffffffff81109902
> > > >   #2 [ffff88beff403b48] crash_kexec at ffffffff811099f0
> > > >   #3 [ffff88beff403b60] oops_end at ffffffff816b97a8
> > > >   #4 [ffff88beff403b88] no_context at ffffffff816a8c96
> > > >   #5 [ffff88beff403bd8] __bad_area_nosemaphore at
> > > > ffffffff816a8d2c
> > > >   #6 [ffff88beff403c20] bad_area_nosemaphore at
> > > > ffffffff816a8e96
> > > >   #7 [ffff88beff403c30] __do_page_fault at ffffffff816bc6be
> > > >   #8 [ffff88beff403c90] do_page_fault at ffffffff816bc865
> > > >   #9 [ffff88beff403cc0] page_fault at ffffffff816b8788
> > > >      [exception RIP: iscsi_verify_itt+30]
> > > >      RIP: ffffffffc08a541e  RSP: ffff88beff403d78  RFLAGS:
> > > > 00010286
> > > >      RAX: 000000000000004c  RBX: 00000000b0000036  RCX:
> > > > 0000000000000002
> > > >      RDX: 00000000000000cc  RSI: 00000000b0000036  RDI:
> > > > 0000000000000000
> > > >      RBP: ffff88beff403da0   R8: 0000000040032a20   R9:
> > > > ffff8896e4eaf91c
> > > >      R10: 0000000000000000  R11: 00007ffff7763ca0  R12:
> > > > 0000000000000000
> > > >      R13: ffff8896e4eaf9e4  R14: ffff8896e4eaf900  R15:
> > > > 0000000000000000
> > > >      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
> > > > #10 [ffff88beff403da8] iscsi_itt_to_ctask at ffffffffc08a5527
> > > > [libiscsi]
> > > > #11 [ffff88beff403dc8] iser_task_rsp at ffffffffc05eefea
> > > > [ib_iser]
> > > > #12 [ffff88beff403e10] __ib_process_cq at ffffffffc0587fbb
> > > > [ib_core]
> > > > #13 [ffff88beff403e50] ib_poll_handler at ffffffffc0588122
> > > > [ib_core]
> > > > #14 [ffff88beff403e80] irq_poll_softirq at ffffffff81358507
> > > > #15 [ffff88beff403eb8] __do_softirq at ffffffff81095195
> > > > #16 [ffff88beff403f28] call_softirq at ffffffff816c4e8c
> > > > #17 [ffff88beff403f40] do_softirq at ffffffff8102d435
> > > > #18 [ffff88beff403f60] irq_exit at ffffffff81095515
> > > > #19 [ffff88beff403f78] do_IRQ at ffffffff816c61d6
> > > > --- <IRQ stack> ---
> > > > #20 [ffff884dd0af3f58] ret_from_intr at ffffffff816b837c
> > > >      RIP: 000000000041b866  RSP: 00007fffffffea28  RFLAGS:
> > > > 00000206
> > > >      RAX: 0000000000000000  RBX: 00007fffffffef53  RCX:
> > > > 00000000006f1a70
> > > >      RDX: 00000000006f1a70  RSI: 00000000006f1a90  RDI:
> > > > 0000000000000000
> > > >      RBP: 0000000000000002   R8: 0000000000000001   R9:
> > > > 0000000000000020
> > > >      R10: 0000000000000003  R11: 00007ffff7763ca0  R12:
> > > > ffff88beff4061e8
> > > >      R13: 00000000ffffffff  R14: 0000000000000000  R15:
> > > > 0000000000000063
> > > >      ORIG_RAX: ffffffffffffffbb  CS: 0033  SS: 002b
> > > > 
> > > > crash> ps -p 41538
> > > > PID: 0      TASK: ffffffff81a0e480  CPU: 0   COMMAND:
> > > > "swapper/0"
> > > >   PID: 1      TASK: ffff88012e4c8000  CPU: 7   COMMAND:
> > > > "systemd"
> > > >    PID: 2345   TASK: ffff885ef5eb8fd0  CPU: 14  COMMAND:
> > > > "zabbix_agentd"
> > > >     PID: 2349   TASK: ffff885efcbcaf70  CPU: 1   COMMAND:
> > > > "zabbix_agentd"
> > > >      PID: 41538  TASK: ffff88587ce38fd0  CPU: 20  COMMAND: "sh"
> > > > 
> > > 
> > > 
> > 
> > Don,
> > I misspoke about the kernel version; it's 7.4:
> > 3.10.0-693.34.1.el7_bz1582551.x86_64
> > It's the one we added the missing iSCSI patches to, but the base is 7.4.
> > So I will test with 7.5.
> > 
> 
> Don, I had another look at this.
> 
> It's not the SG_GAPS issue causing a memory registration error that I
> reported and that we fixed in 7.5 from upstream.
> 
> Which upstream commit did we pull into 7.5 to fix that?
> 
> I think this is different and not yet fixed.
> 
> [14556.614551] iser: iser_err_comp: memreg failure: memory management operation error (6) vend_err 78
> [14556.666134]  connection1:0: detected conn error (1011)
> [14562.678414] mlx5_1:dump_cqe:262:(pid 0): dump error cqe
> [14562.678529] mlx5_0:dump_cqe:262:(pid 0): dump error cqe
> [14562.678530] 00000000 00000000 00000000 00000000
> [14562.678531] 00000000 00000000 00000000 00000000
> [14562.678531] 00000000 00000000 00000000 00000000
> [14562.678532] 00000000 08007806 25000344 34681cd2
> [14562.678535] iser: iser_err_comp: memreg failure: memory management operation error (6) vend_err 78
> [14562.678544]  connection1:0: detected conn error (1011)
> [14562.679098] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [14562.679105] IP: [<ffffffffc088141e>] iscsi_verify_itt+0x1e/0x110 [libiscsi]
> [14562.679106] PGD 0
> [14562.679107] Oops: 0000 [#1] SMP
> [14562.679134] Modules linked in: ip6table_filter ip6_tables
> iptable_filter sctp_diag sctp tcp_diag udp_diag inet_diag unix_diag
> af_packet_diag netlink_diag bnx2i cnic uio ip_vs nf_conntrack
> oracleadvm(POE) oracleoks(POE) oracleasm(O) nfsv3 rpcsec_gss_krb5
> nfsv4
> dns_resolver nfs fscache dm_round_robin bonding rpcrdma ib_isert
> iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt
> target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm
> ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx5_ib ib_core xfs vfat
> fat sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi
> kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel
> lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt
> iTCO_vendor_support ipmi_ssif pcspkr ipmi_si dm_multipath ioatdma
> lpc_ich i2c_i801 sg hpilo
> [14562.679152]  hpwdt dca ipmi_devintf ipmi_msghandler pcc_cpufreq
> shpchp wmi acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl
> lockd
> grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
> crct10dif_generic i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
> sysimgblt fb_sys_fops ttm bnx2x mlx5_core devlink mdio tg3(OE)
> libcrc32c drm crct10dif_pclmul hpsa(OE) crct10dif_common ptp i2c_core
> crc32c_intel scsi_transport_sas pps_core dm_mirror dm_region_hash
> dm_log dm_mod
> [14562.679154] CPU: 9 PID: 0 Comm: swapper/9 Tainted:
> P           OE  -
> -----------   3.10.0-693.22.1.el7.x86_64 #1
> [14562.679155] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380
> Gen9, BIOS P89 05/21/2018
> [14562.679156] task: ffff8860aefaaf70 ti: ffff8860ae440000 task.ti:
> ffff8860ae440000
> [14562.679158] RIP: 0010:[<ffffffffc088141e>]  [<ffffffffc088141e>]
> iscsi_verify_itt+0x1e/0x110 [libiscsi]
> [14562.679159] RSP: 0018:ffff88beff2c3d78  EFLAGS: 00010286
> [14562.679160] RAX: 000000000000004c RBX: 00000000d0000041 RCX:
> 0000000000000002
> [14562.679161] RDX: 00000000000000cc RSI: 00000000d0000041 RDI:
> 0000000000000000
> [14562.679161] RBP: ffff88beff2c3da0 R08: 0000000040001038 R09:
> ffff88ae496fe01c
> [14562.679162] R10: 0000000000000000 R11: 7fffffffffffffff R12:
> 0000000000000000
> [14562.679162] R13: ffff88ae496fe0e4 R14: ffff88ae496fe000 R15:
> 0000000000000000
> [14562.679163] FS:  0000000000000000(0000) GS:ffff88beff2c0000(0000)
> knlGS:0000000000000000
> [14562.679164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [14562.679164] CR2: 0000000000000010 CR3: 000000beede48000 CR4:
> 00000000003607e0
> [14562.679165] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [14562.679166] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [14562.679166] Call Trace:
> [14562.679168]  <IRQ>
> [14562.679170]  [<ffffffffc0881527>] iscsi_itt_to_ctask+0x17/0x80
> [libiscsi]
> [14562.679173]  [<ffffffffc069ffea>] iser_task_rsp+0xca/0x360
> [ib_iser]
> [14562.679181]  [<ffffffffc0924fbb>] __ib_process_cq+0x6b/0xe0
> [ib_core]

It starts with the memreg failures:
crash> log | grep "iser: iser_err_comp: memreg failure" | wc -l
1237

Then comes the panic:

[14556.614551] iser: iser_err_comp: memreg failure: memory management operation error (6) vend_err 78
[14556.666134]  connection1:0: detected conn error (1011)
[14562.678414] mlx5_1:dump_cqe:262:(pid 0): dump error cqe
[14562.678529] mlx5_0:dump_cqe:262:(pid 0): dump error cqe
[14562.678530] 00000000 00000000 00000000 00000000
[14562.678531] 00000000 00000000 00000000 00000000
[14562.678531] 00000000 00000000 00000000 00000000
[14562.678532] 00000000 08007806 25000344 34681cd2
[14562.678535] iser: iser_err_comp: memreg failure: memory management operation error (6) vend_err 78
[14562.678544]  connection1:0: detected conn error (1011)

[14562.679098] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[14562.679105] IP: [<ffffffffc088141e>] iscsi_verify_itt+0x1e/0x110 [libiscsi]

crash> bt
PID: 0      TASK: ffff8860aefaaf70  CPU: 9   COMMAND: "swapper/9"
 #0 [ffff88beff2c3a18] machine_kexec at ffffffff8105d77b
 #1 [ffff88beff2c3a78] __crash_kexec at ffffffff81108732
 #2 [ffff88beff2c3b48] crash_kexec at ffffffff81108820
 #3 [ffff88beff2c3b60] oops_end at ffffffff816b8778
 #4 [ffff88beff2c3b88] no_context at ffffffff816a7c7a
 #5 [ffff88beff2c3bd8] __bad_area_nosemaphore at ffffffff816a7d10
 #6 [ffff88beff2c3c20] bad_area_nosemaphore at ffffffff816a7e7a
 #7 [ffff88beff2c3c30] __do_page_fault at ffffffff816bb68e
 #8 [ffff88beff2c3c90] do_page_fault at ffffffff816bb835
 #9 [ffff88beff2c3cc0] page_fault at ffffffff816b7768
    [exception RIP: iscsi_verify_itt+30]
    RIP: ffffffffc088141e  RSP: ffff88beff2c3d78  RFLAGS: 00010286
    RAX: 000000000000004c  RBX: 00000000d0000041  RCX: 0000000000000002
    RDX: 00000000000000cc  RSI: 00000000d0000041  RDI: 0000000000000000
    RBP: ffff88beff2c3da0   R8: 0000000040001038   R9: ffff88ae496fe01c
    R10: 0000000000000000  R11: 7fffffffffffffff  R12: 0000000000000000
    R13: ffff88ae496fe0e4  R14: ffff88ae496fe000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff88beff2c3da8] iscsi_itt_to_ctask at ffffffffc0881527 [libiscsi]
#11 [ffff88beff2c3dc8] iser_task_rsp at ffffffffc069ffea [ib_iser]
#12 [ffff88beff2c3e10] __ib_process_cq at ffffffffc0924fbb [ib_core]
#13 [ffff88beff2c3e50] ib_poll_handler at ffffffffc0925122 [ib_core]
#14 [ffff88beff2c3e80] irq_poll_softirq at ffffffff813572b7
#15 [ffff88beff2c3eb8] __do_softirq at ffffffff81094035
#16 [ffff88beff2c3f28] call_softirq at ffffffff816c3afc
#17 [ffff88beff2c3f40] do_softirq at ffffffff8102d435
#18 [ffff88beff2c3f60] irq_exit at ffffffff810943b5
#19 [ffff88beff2c3f78] do_IRQ at ffffffff816c4d96
--- <IRQ stack> ---
#20 [ffff8860ae443db8] ret_from_intr at ffffffff816b7362
    [exception RIP: cpuidle_enter_state+87]
    RIP: ffffffff81530b07  RSP: ffff8860ae443e60  RFLAGS: 00000202
    RAX: 00000d3e7d729de6  RBX: ffff8860ae443e40  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: ffff8860ae443fd8  RDI: 00000d3e7d729de6
    RBP: ffff8860ae443e88   R8: 000000000000016c   R9: 000000000000001c
    R10: 0000000000000043  R11: 7fffffffffffffff  R12: 0000000000000009
    R13: ffff88beff2d39a0  R14: ffffffff810b77e5  R15: ffff8860ae443de0
    ORIG_RAX: ffffffffffffff5d  CS: 0010  SS: 0018
#21 [ffff8860ae443e90] cpuidle_idle_call at ffffffff81530c5e
#22 [ffff8860ae443ed0] arch_cpu_idle at ffffffff81034f8e
#23 [ffff8860ae443ee0] cpu_startup_entry at ffffffff810eb6da
#24 [ffff8860ae443f28] start_secondary at ffffffff81052222

crash> dis -l iscsi_verify_itt+30
/usr/src/debug/kernel-3.10.0-693.22.1.el7/linux-3.10.0-693.22.1.el7.x86_64/drivers/scsi/libiscsi.c: 1292
0xffffffffc088141e <iscsi_verify_itt+30>:       mov    0x10(%rdi),%r13
crash> 


So it fails here:

int iscsi_verify_itt(struct iscsi_conn *conn, itt_t itt)
{
        struct iscsi_session *session = conn->session;  **** conn is NULL, so loading conn->session faults

RDI held the struct iscsi_conn pointer:

0xffffffffc0881400 <iscsi_verify_itt>:  nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc0881405 <iscsi_verify_itt+5>:        push   %rbp
0xffffffffc0881406 <iscsi_verify_itt+6>:        mov    %rsp,%rbp
0xffffffffc0881409 <iscsi_verify_itt+9>:        push   %r13
0xffffffffc088140b <iscsi_verify_itt+11>:       push   %r12
0xffffffffc088140d <iscsi_verify_itt+13>:       mov    %rdi,%r12
0xffffffffc0881410 <iscsi_verify_itt+16>:       push   %rbx
0xffffffffc0881411 <iscsi_verify_itt+17>:       mov    %esi,%ebx
0xffffffffc0881413 <iscsi_verify_itt+19>:       sub    $0x10,%rsp
0xffffffffc0881417 <iscsi_verify_itt+23>:       movl   $0x0,-0x28(%rbp)
0xffffffffc088141e <iscsi_verify_itt+30>:       mov    0x10(%rdi),%r13

   RIP: ffffffffc088141e  RSP: ffff88beff2c3d78  RFLAGS: 00010286
    RAX: 000000000000004c  RBX: 00000000d0000041  RCX: 0000000000000002
    RDX: 00000000000000cc  RSI: 00000000d0000041  RDI: 0000000000000000
    RBP: ffff88beff2c3da0   R8: 0000000040001038   R9: ffff88ae496fe01c
    R10: 0000000000000000  R11: 7fffffffffffffff  R12: 0000000000000000
    R13: ffff88ae496fe0e4  R14: ffff88ae496fe000  R15: 0000000000000000

Both RDI and R12 are NULL (R12 is just a copy of RDI); adding the 0x10
displacement of the load gives the bad address 0x10 reported in CR2.
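
The address arithmetic can be checked with a small user-space sketch.
The struct below is a hypothetical stand-in, not the kernel's
struct iscsi_conn: it only assumes that two pointer-sized members
precede session, which is what the 0x10 displacement in the
disassembly implies on x86_64. The member names cls_conn and dd_data,
and both helper functions, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct iscsi_conn; only the layout matters.
 * Pointers are modelled as 64-bit integers so the offsets match an
 * x86_64 kernel regardless of the machine this is compiled on. */
struct mock_conn {
    uint64_t cls_conn; /* assumed first member  (offset 0x00) */
    uint64_t dd_data;  /* assumed second member (offset 0x08) */
    uint64_t session;  /* read by `mov 0x10(%rdi),%r13` (offset 0x10) */
};

/* Compile-time check that the mock layout matches the displacement. */
static_assert(offsetof(struct mock_conn, session) == 0x10,
              "session is expected at offset 0x10");

/* Address the CPU touches when it loads conn->session: base + offset.
 * Pure integer arithmetic, so nothing is actually dereferenced. */
static uint64_t session_load_address(uint64_t conn_base)
{
    return conn_base + offsetof(struct mock_conn, session);
}

/* Returns 1 if a fault address (CR2) is what a NULL conn would produce. */
static int fault_matches_null_conn(uint64_t cr2)
{
    return cr2 == session_load_address(0);
}
```

With RDI = 0, session_load_address(0) yields 0x10, the same value the
oops reports in CR2, so the fault is consistent with a NULL conn rather
than a corrupted non-NULL pointer.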

So we have a race somehow that trashes the conn pointer under load.

The workload is clearly hitting resource issues and repeatedly failing
memory registration.
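
The error CQE dumped before each memreg failure can be decoded by hand.
The sketch below pulls the syndrome fields out of the last row of the
dump. It assumes the mlx5 error CQE layout as I understand it (vendor
error syndrome in byte 54, syndrome in byte 55, WQE opcode and QPN in
bytes 56..59) and that dump_cqe prints each big-endian word as a plain
hex number; both should be checked against the driver source for the
kernel in question. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields taken from the last 16 bytes of a 64-byte error CQE.
 * The byte offsets are an assumption, not taken from this thread. */
struct err_cqe_tail {
    uint8_t  vendor_err_synd; /* CQE byte 54 */
    uint8_t  syndrome;        /* CQE byte 55 */
    uint8_t  wqe_opcode;      /* top byte of the word at bytes 56..59 */
    uint32_t qpn;             /* low 24 bits of that word */
};

/* row[] holds the last dump line as four 32-bit words in printed order,
 * i.e. row[0] covers CQE bytes 48..51, row[1] bytes 52..55, and so on. */
static struct err_cqe_tail decode_err_cqe_tail(const uint32_t row[4])
{
    struct err_cqe_tail t;

    t.vendor_err_synd = (uint8_t)((row[1] >> 8) & 0xff);
    t.syndrome        = (uint8_t)(row[1] & 0xff);
    t.wqe_opcode      = (uint8_t)(row[2] >> 24);
    t.qpn             = row[2] & 0x00ffffffu;
    return t;
}
```

For the row "00000000 08007806 25000344 34681cd2" this gives a vendor
error syndrome of 0x78 and a syndrome of 6, which line up with
"error (6) vend_err 78" in the iser message that follows the dump. The
WQE opcode comes out as 0x25, which I believe is the UMR (memory
registration) opcode in mlx5, but treat that as an assumption.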


