kernel crash when using sha1 as csums-alg for drbd

Hello Chandramouli,

Sorry about my previous email.

Recently we have hit a kernel crash five times when using sha1 as the
csums-alg for drbd on CentOS 7.2 (kernel 3.10.0-327.el7.x86_64).

The kernel log is below:
[19839335.792807] BUG: unable to handle kernel paging request at
ffff88007bd4f000
[19839335.793145] IP: [<ffffffff8106a908>] _begin+0x28/0x187
[19839335.793326] PGD 1f32067 PUD 607ffff067 PMD 1f35067 PTE 0 
[19839335.793510] Oops: 0000 [#1] SMP 
[19839335.793683] Modules linked in: dm_service_time iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi nf_conntrack_netlink nf_conntrack_ipv6
nf_defrag_ipv6 xt_mac xt_set xt_physdev xt_CT ip_set_hash_net ip_set
nfnetlink vhost_net vhost macvtap macvlan veth iptable_raw iptable_filter
iptable_nat nf_nat_ipv4 iptable_mangle ip_tables dm_multipath ip6table_raw
vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch xt_multiport
ipmi_devintf xt_comment ext4 mbcache jbd2 xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
nf_conntrack ipt_REJECT tun bridge ebtable_filter ebtables ip6table_filter
ip6_tables drbd(OE) 8021q garp stp mrp llc bonding dm_mirror dm_region_hash
dm_log iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl
kvm_intel kvm
[19839335.795640]  crc32_pclmul dm_mod ghash_clmulni_intel aesni_intel lrw
gf128mul glue_helper ablk_helper cryptd pcspkr ses ipmi_ssif enclosure sg
sb_edac edac_core lpc_ich mei_me i2c_i801 mfd_core mei ioatdma shpchp wmi
ipmi_si ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl
lockd grace sunrpc xfs libcrc32c sd_mod crc_t10dif crct10dif_generic
syscopyarea sysfillrect sysimgblt crct10dif_pclmul crct10dif_common
crc32c_intel drm_kms_helper ttm ixgbe drm igb mdio ptp mpt3sas pps_core
i2c_algo_bit raid_class dca i2c_core scsi_transport_sas [last unloaded:
ip_tables]
[19839335.797216] CPU: 1 PID: 2912 Comm: drbd_w_drbd1 Tainted: G
OE  ------------   3.10.0-327.el7.x86_64 #1
[19839335.797550] Hardware name: Inspur NF5280M4/YZMB-00326-101, BIOS 4.0.18
11/09/2015
[19839335.797877] task: ffff885f749b9700 ti: ffff882f62fc4000 task.ti:
ffff882f62fc4000
[19839335.798203] RIP: 0010:[<ffffffff8106a908>]  [<ffffffff8106a908>]
_begin+0x28/0x187
[19839335.798532] RSP: 0018:ffff882f62fc75f8  EFLAGS: 00010202
[19839335.798702] RAX: 000000002fced277 RBX: 00000000e9cee1cc RCX:
00000000a73b8733
[19839335.799030] RDX: 00000000b573ac7c RSI: 00000000bb6b5097 RDI:
00000000da4f4b14
[19839335.799356] RBP: 0000000058444804 R08: ffffffff81656100 R09:
ffff882f33147998
[19839335.799680] R10: ffff88007bd4ef80 R11: ffff88007bd4f040 R12:
00000000e770e674
[19839335.800010] R13: ffff88007bd4efc0 R14: ffff882f62fc75f8 R15:
ffff882f62fc7898
[19839335.800336] FS:  0000000000000000(0000) GS:ffff882fbf840000(0000)
knlGS:0000000000000000
[19839335.800664] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19839335.800835] CR2: ffff88007bd4f000 CR3: 000000000194a000 CR4:
00000000001427e0
[19839335.801160] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[19839335.801486] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[19839335.801812] Stack:
[19839335.801974]  5a8279995a827999 5a8279995a827999 5a8279995a827999
5a8279995a827999
[19839335.802317]  5a8279995a827999 5a8279995a827999 5a8279995a827999
5a8279995a827999
[19839335.802663]  5a8279995a827999 5a8279995a827999 5a8279995a827999
5a8279995a827999
[19839335.803005] Call Trace:
[19839335.803180]  [<ffffffff81569a41>] ? ip_local_out_sk+0x31/0x40
[19839335.803355]  [<ffffffff8106a31d>] ?
sha1_apply_transform_avx2+0x1d/0x30
[19839335.803530]  [<ffffffff8106a063>] ? __sha1_ssse3_update+0x53/0xd0
[19839335.803704]  [<ffffffff8106a388>] ? sha1_ssse3_update+0x58/0xf0
[19839335.803881]  [<ffffffff812b1878>] ? crypto_shash_update+0x38/0x100
[19839335.804056]  [<ffffffff812b1d6e>] ? shash_compat_update+0x4e/0x80
[19839335.804242]  [<ffffffffa05245ab>] ? drbd_csum_bio+0x9b/0xe0 [drbd]
[19839335.804427]  [<ffffffffa0546701>] ? drbd_send_dblock+0x3b1/0x480
[drbd]
[19839335.804608]  [<ffffffffa0522a80>] ? dequeue_work_batch+0x20/0x90
[drbd]
[19839335.804788]  [<ffffffffa0522d37>] ? wait_for_work+0x67/0x370 [drbd]
[19839335.804969]  [<ffffffffa052726f>] ? w_send_dblock+0xaf/0x1d0 [drbd]
[19839335.805168]  [<ffffffffa052867b>] ? drbd_worker+0xfb/0x390 [drbd]
[19839335.805349]  [<ffffffffa0542430>] ?
drbd_destroy_connection+0x160/0x160 [drbd]
[19839335.805684]  [<ffffffffa054244d>] ? drbd_thread_setup+0x1d/0x110
[drbd]
[19839335.805864]  [<ffffffffa0542430>] ?
drbd_destroy_connection+0x160/0x160 [drbd]
[19839335.806195]  [<ffffffff810a5aef>] ? kthread+0xcf/0xe0
[19839335.806367]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[19839335.806545]  [<ffffffff81645858>] ? ret_from_fork+0x58/0x90
[19839335.806717]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[19839335.806889] Code: 00 00 00 89 f3 c4 e3 7b f0 f6 02 c4 e2 60 f2 e8 21
fb 31 eb 41 03 17 c4 e2 70 f2 ef 8d 14 1a c4 63 7b f0 e1 1b c4 e3 7b f0 d9
02 <c4> c1 7a 6f 82 80 00 00 00 21 f1 31 e9 42 8d 14 22 41 03 47 04 
[19839335.807640] RIP  [<ffffffff8106a908>] _begin+0x28/0x187
[19839335.807814]  RSP <ffff882f62fc75f8>
[19839335.807979] CR2: ffff88007bd4f000     

We debugged it using crash:

crash> bt
PID: 2912   TASK: ffff885f749b9700  CPU: 1   COMMAND: "drbd_w_drbd1"
#0 [ffff882f62fc72c0] machine_kexec at ffffffff81051beb
#1 [ffff882f62fc7320] crash_kexec at ffffffff810f2542
#2 [ffff882f62fc73f0] oops_end at ffffffff8163e1a8
#3 [ffff882f62fc7418] no_context at ffffffff8162e2b8
#4 [ffff882f62fc7468] __bad_area_nosemaphore at ffffffff8162e34e
#5 [ffff882f62fc74b0] bad_area_nosemaphore at ffffffff8162e4b8
#6 [ffff882f62fc74c0] __do_page_fault at ffffffff81640fce
#7 [ffff882f62fc7518] do_page_fault at ffffffff81641113
#8 [ffff882f62fc7540] page_fault at ffffffff8163d408
    [exception RIP: _begin+40]
    RIP: ffffffff8106a908  RSP: ffff882f62fc75f8  RFLAGS: 00010202
    RAX: 000000002fced277  RBX: 00000000e9cee1cc  RCX: 00000000a73b8733
    RDX: 00000000b573ac7c  RSI: 00000000bb6b5097  RDI: 00000000da4f4b14
    RBP: 0000000058444804   R8: ffffffff81656100   R9: ffff882f33147998
    R10: ffff88007bd4ef80  R11: ffff88007bd4f040  R12: 00000000e770e674
    R13: ffff88007bd4efc0  R14: ffff882f62fc75f8  R15: ffff882f62fc7898
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#9 [ffff882f62fc7878] ip_local_out_sk at ffffffff81569a41
#10 [ffff882f62fc7ba8] sha1_apply_transform_avx2 at ffffffff8106a31d
#11 [ffff882f62fc7bb8] __sha1_ssse3_update at ffffffff8106a063
#12 [ffff882f62fc7bf8] sha1_ssse3_update at ffffffff8106a388
#13 [ffff882f62fc7c28] crypto_shash_update at ffffffff812b1878
#14 [ffff882f62fc7c78] shash_compat_update at ffffffff812b1d6e
#15 [ffff882f62fc7cc8] drbd_csum_bio at ffffffffa05245ab [drbd]
#16 [ffff882f62fc7d28] drbd_send_dblock at ffffffffa0546701 [drbd]
#17 [ffff882f62fc7de0] w_send_dblock at ffffffffa052726f [drbd]
#18 [ffff882f62fc7e28] drbd_worker at ffffffffa052867b [drbd]
#19 [ffff882f62fc7e98] drbd_thread_setup at ffffffffa054244d [drbd]
#20 [ffff882f62fc7ec8] kthread at ffffffff810a5aef
#21 [ffff882f62fc7f50] ret_from_fork at ffffffff81645858

crash> dis -l ffffffff8106a908
/usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/arch/x86/crypto/sha1_avx2_x86_64_asm.S: 677
0xffffffff8106a908 <_begin+40>: vmovdqu 0x80(%r10),%xmm0

crash> dis -l _begin
/usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/arch/x86/crypto/sha1_avx2_x86_64_asm.S: 677
0xffffffff8106a8e0 <_begin>:    mov    %esi,%ebx
0xffffffff8106a8e2 <_begin+2>:  rorx   $0x2,%esi,%esi
0xffffffff8106a8e8 <_begin+8>:  andn   %eax,%ebx,%ebp
0xffffffff8106a8ed <_begin+13>: and    %edi,%ebx
0xffffffff8106a8ef <_begin+15>: xor    %ebp,%ebx
0xffffffff8106a8f1 <_begin+17>: add    (%r15),%edx
0xffffffff8106a8f4 <_begin+20>: andn   %edi,%ecx,%ebp
0xffffffff8106a8f9 <_begin+25>: lea    (%rdx,%rbx,1),%edx
0xffffffff8106a8fc <_begin+28>: rorx   $0x1b,%ecx,%r12d
0xffffffff8106a902 <_begin+34>: rorx   $0x2,%ecx,%ebx
0xffffffff8106a908 <_begin+40>: vmovdqu 0x80(%r10),%xmm0   <--------------- crash here
0xffffffff8106a911 <_begin+49>: and    %esi,%ecx
0xffffffff8106a913 <_begin+51>: xor    %ebp,%ecx
0xffffffff8106a915 <_begin+53>: lea    (%rdx,%r12,1),%edx
0xffffffff8106a919 <_begin+57>: add    0x4(%r15),%eax
0xffffffff8106a91d <_begin+61>: andn   %esi,%edx,%ebp
0xffffffff8106a922 <_begin+66>: lea    (%rax,%rcx,1),%eax
0xffffffff8106a925 <_begin+69>: rorx   $0x1b,%edx,%r12d
0xffffffff8106a92b <_begin+75>: rorx   $0x2,%edx,%ecx
0xffffffff8106a931 <_begin+81>: vinsertf128 $0x1,0x80(%r13),%ymm0,%ymm0
0xffffffff8106a93b <_begin+91>: and    %ebx,%edx
0xffffffff8106a93d <_begin+93>: xor    %ebp,%edx
0xffffffff8106a93f <_begin+95>: lea    (%rax,%r12,1),%eax
0xffffffff8106a943 <_begin+99>: add    0x8(%r15),%edi

The crash is in arch/x86/crypto/sha1_avx2_x86_64_asm.S. From the stack
trace I was able to extract the following information:

crash> struct -x sha1_state 0xffff882f33147990
struct sha1_state {
  count = 0x4e000, 
  state = {0xa73b8733, 0xedad425e, 0xda4f4b14, 0x2fced277, 0x90a160ae}, 
  buffer =
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
0\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
0\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
0\000\000\000\000\000\000"
}


crash> rd ffff882f62fc7c78 24
ffff882f62fc7c78:  ffffffff812b1d6e ffff88007bd4e000   n.+........{....
ffff882f62fc7c88:  0000000000000000 ffffea0001ef5380   .........S......
ffff882f62fc7c98:  0000000000000000 ffff882f62fc7ce0   .........|.b/...
ffff882f62fc7ca8:  ffffffff00000000 00000000f6275b17   .........['.....
ffff882f62fc7cb8:  000000000000004e ffff882f62fc7d20   N....... }.b/...
ffff882f62fc7cc8:  ffffffffa05245ab ffff885f66044120   .ER..... A.f_...
ffff882f62fc7cd8:  ffff882f00000000 ffffea0001ef5382   ..../....S......
ffff882f62fc7ce8:  0000100000000000 0000000000000000   ................
ffff882f62fc7cf8:  0000000000000000 00000000f6275b17   .........['.....
ffff882f62fc7d08:  ffff882f73c0a000 ffff880111b94540   ...s/...@E......
ffff882f62fc7d18:  ffff882f6aff0010 ffff882f62fc7dd8   ...j/....}.b/...
ffff882f62fc7d28:  ffffffffa0546701 0000000000000000   .gT.............
crash> 
crash> struct hash_desc ffff882f62fc7cd0
struct hash_desc {
  tfm = 0xffff885f66044120, 
  flags = 0
}
crash> struct scatterlist ffff882f62fc7ce0
struct scatterlist {
  page_link = 18446719884486202242, 
  offset = 0, 
  length = 4096, 
  dma_address = 0, 
  dma_length = 0
}

crash> rd ffff882f62fc7c28 22
ffff882f62fc7c28:  ffffffff812b1878 ffff882f33147980   x.+......y.3/...
ffff882f62fc7c38:  ffff882f6aff0028 ffff882ae84cd500   (..j/.....L.*...
ffff882f62fc7c48:  ffff882f33147980 ffff882f6aff0028   .y.3/...(..j/...
ffff882f62fc7c58:  ffff882ae84cd500 ffff882f70846800   ..L.*....h.p/...
ffff882f62fc7c68:  ffff885f738a12a0 ffff882f62fc7cc0   ...s_....|.b/...
ffff882f62fc7c78:  ffffffff812b1d6e ffff88007bd4e000   n.+........{....
ffff882f62fc7c88:  0000000000000000 ffffea0001ef5380   .........S......
ffff882f62fc7c98:  0000000000000000 ffff882f62fc7ce0   .........|.b/...
ffff882f62fc7ca8:  ffffffff00000000 00000000f6275b17   .........['.....
ffff882f62fc7cb8:  000000000000004e ffff882f62fc7d20   N....... }.b/...
ffff882f62fc7cc8:  ffffffffa05245ab ffff885f66044120   .ER..... A.f_...
crash> 
crash> 
crash> 
crash> struct crypto_hash_walk ffff882f62fc7c80
struct crypto_hash_walk {
  data = 0xffff88007bd4e000 struct: page excluded: kernel virtual address:
ffff88007bd4e000  type: "gdb_readmem_callback"
struct: page excluded: kernel virtual address: ffff88007bd4e000  type:
"gdb_readmem_callback"
<Address 0xffff88007bd4e000 out of bounds>, 
  offset = 0, 
  alignmask = 0, 
  pg = 0xffffea0001ef5380, 
  entrylen = 0, 
  total = 0, 
  sg = 0xffff882f62fc7ce0, 
  flags = 0
}
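
As a cross-check on the dumps above, the decimal page_link can be decoded in
hex and the page pointer converted back to a kernel virtual address. The
sketch below is only illustrative. The two marker bits follow the scatterlist
convention of this kernel generation (bit 0 = chain, bit 1 = last entry).
VMEMMAP_BASE, DIRECT_MAP_BASE and STRUCT_PAGE_SIZE are assumed values typical
of x86_64 3.10 kernels, not values read from this vmcore, and the helper names
are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed x86_64 layout constants (not taken from the vmcore). */
#define VMEMMAP_BASE     0xffffea0000000000ULL /* start of the struct page array */
#define DIRECT_MAP_BASE  0xffff880000000000ULL /* PAGE_OFFSET on this kernel generation */
#define STRUCT_PAGE_SIZE 64ULL                 /* assumed sizeof(struct page) */
#define PAGE_SHIFT_4K    12                    /* 4 KiB pages */

/* The low two bits of page_link are flags; the rest is the struct page pointer. */
static uint64_t sg_page_ptr(uint64_t page_link)
{
    return page_link & ~0x3ULL;
}

static bool sg_is_chain_entry(uint64_t page_link)
{
    return (page_link & 0x1ULL) != 0;
}

static bool sg_is_last_entry(uint64_t page_link)
{
    return (page_link & 0x2ULL) != 0;
}

/* struct page pointer -> page frame number -> direct-map virtual address */
static uint64_t page_to_virt_addr(uint64_t page)
{
    uint64_t pfn = (page - VMEMMAP_BASE) / STRUCT_PAGE_SIZE;
    return DIRECT_MAP_BASE + (pfn << PAGE_SHIFT_4K);
}
```

Under these assumptions, page_link = 18446719884486202242 is
0xffffea0001ef5382: the last-entry bit is set, the page pointer
0xffffea0001ef5380 matches pg in the crypto_hash_walk dump, and it maps to
0xffff88007bd4e000, the same address as walk.data.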

From the above, after shash_compat_update is called and the page is kmapped,
we get a single 4 KiB page that starts at virtual address
0xffff88007bd4e000.
So the arguments passed to
void sha1_transform_avx2(int *hash, const char *data, size_t num_blocks);
are data = 0xffff88007bd4e000 and num_blocks (rounds) = 64, which means we
have 64 blocks of 64 bytes (4 KiB) to process.
However, the BUFFER_END computed in sha1_avx2_x86_64_asm.S is
(rounds << 6) + data + 64 = (64 << 6) + 0xffff88007bd4e000 + 64 =
0xffff88007bd4f040, which lies beyond the end of that page.
I think this may be why we got "BUG: unable to handle kernel paging request
at ffff88007bd4f000".
I am not very familiar with the sha1 algorithm, so I am writing to ask for
your help. Could you give me some suggestions on this issue?
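
The arithmetic above can be replayed with a small sketch. It only restates
the BUFFER_END expression as described in this mail and compares it with the
end of the 4 KiB page. The helper names and the PAGE_SIZE_4K constant are
hypothetical, and this is not the kernel's code.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE_4K    4096ULL /* assumed page size */
#define SHA1_BLOCK_SIZE 64ULL   /* one SHA-1 block is 64 bytes */

/* BUFFER_END as described above: (rounds << 6) + data + 64 */
static uint64_t buffer_end(uint64_t data, uint64_t rounds)
{
    return (rounds << 6) + data + SHA1_BLOCK_SIZE;
}

/* First address past the page that contains addr */
static uint64_t page_end(uint64_t addr)
{
    return (addr & ~(PAGE_SIZE_4K - 1)) + PAGE_SIZE_4K;
}

/* True if BUFFER_END lies beyond the page holding the input data */
static bool buffer_end_crosses_page(uint64_t data, uint64_t rounds)
{
    return buffer_end(data, rounds) > page_end(data);
}

/* Effective address of a load such as "vmovdqu 0x80(%r10),%xmm0" */
static uint64_t load_address(uint64_t base_reg, uint64_t displacement)
{
    return base_reg + displacement;
}
```

For data = 0xffff88007bd4e000 and rounds = 64 this gives BUFFER_END =
0xffff88007bd4f040 (also the value held in R11 in the register dump) against
a page end of 0xffff88007bd4f000. The faulting load reads from R10 + 0x80 =
0xffff88007bd4ef80 + 0x80 = 0xffff88007bd4f000, which is the address in CR2.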



Sincerely

Zhuoyu



--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


