Hit OOPS on FPU save and restore while using AESNI for IPsec on a 32-bit system

Hi All,

Recently I hit an OOPS in FPU save/restore on Linux 2.6.38.8 while using aesni_intel_asm.S and aesni_intel_glue.c for native IPsec (netkey) on a 32-bit system. The same OOPS occurs in versions 2.6.39.4, 3.0.x and 3.1.x. I did not hit this problem on a 64-bit system with any of these versions.

My platform information: 
"Linux dnsubuntu 2.6.38.8 #7 SMP Sat Nov 12 03:11:12 CST 2011 i686 i686 i386 GNU/Linux"

IPsec uses these two crypto drivers through the AEAD interface:
driver       : cryptd(__driver-cbc-aes-aesni)            --- my understanding: in the IRQ path, encryption/decryption is handed to the crypto daemon (cryptd) to run asynchronously
driver       : authenc(hmac(sha1-generic),cbc-aes-aesni) --- my understanding: IPsec calls it in softirq context through the AEAD interface

All the function calls in aesni_intel_glue.c, such as cbc_encrypt/cbc_decrypt, are protected by kernel_fpu_begin()/kernel_fpu_end(). I have done some research on how Linux saves and restores FPU state, but I still cannot figure out where the problem is in this case. How can fxsave/fxrstor cause an OOPS? How can tsk->thread.fpu.state be NULL when PF_USED_MATH or TS_USEDFPU is set?
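
To make the question about calling context concrete, here is a minimal user-space sketch of the decision the glue code appears to make before touching the FPU: run synchronously inside kernel_fpu_begin()/kernel_fpu_end(), or defer to cryptd. It assumes the check in this kernel series resembles irq_fpu_usable(), i.e. the FPU is considered usable outside interrupt context, when the interrupt arrived from user mode, or when CR0.TS is set. The struct, enum and model_* names are hypothetical and exist only for this illustration; they are not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the context a crypto request arrives in. */
struct ctx_model {
    bool in_interrupt;     /* running in hard or soft IRQ context */
    bool interrupted_user; /* the interrupted code was running in user mode */
    bool cr0_ts_set;       /* CR0.TS is set, so no FPU state is live in the registers */
};

enum crypto_path {
    PATH_SYNC_FPU,        /* kernel_fpu_begin(); AES-NI; kernel_fpu_end() */
    PATH_DEFER_TO_CRYPTD  /* queue the request for the cryptd worker */
};

/* Model of the assumed irq_fpu_usable()-style predicate (an assumption, not kernel code). */
static bool model_irq_fpu_usable(const struct ctx_model *c)
{
    return !c->in_interrupt || c->interrupted_user || c->cr0_ts_set;
}

/* Pick the path a request would take under that assumption. */
static enum crypto_path model_choose_path(const struct ctx_model *c)
{
    return model_irq_fpu_usable(c) ? PATH_SYNC_FPU : PATH_DEFER_TO_CRYPTD;
}
```

Under this model, a softirq that interrupts kernel code while FPU state is live in the registers takes the cryptd path, and every other case uses the FPU directly. If the OOPS comes from this area, one of these inputs is presumably being misjudged on 32-bit, but the sketch does not establish that.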

The problem is easy to reproduce with the following steps:
1. Build two 32-bit systems with AESNI enabled in the kernel crypto options, install openswan, and use the netkey kernel IPsec stack. Create an ESP tunnel between the left and right IPsec gateways.
2. Run iperf from a host in the left subnet to a host in the right subnet; the iperf traffic can be bidirectional.
3. Run top or tcpdump on the left and right IPsec gateways.
4. From another client or desktop, log in to both VPN gateways over SSH many times.
5. The SSH connections become unstable, and so do top and tcpdump. Within 5 to 10 minutes there is an OOPS, and then the system hangs.

I have some questions:
1. Can the functions in aesni_intel_glue.c safely be called in softirq context (for example from the IPsec stack)?
2. I think these functions should not be called in interrupt context. Is that correct?
3. Have these functions been used or tested with the Linux native IPsec stack through the AEAD interface on a 32-bit platform? This could be a bug in 32-bit AESNI usage by the native IPsec stack.

I have attached the OOPS image, the backtrace and the decoded code.
Please advise on this OOPS: what do you think of this issue, and how can it be fixed?


OOPS info
<snip>
IP: [<c1009880>] __switch_to+0x150/0x190
*pdpt = 0000000030580001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/module/serpent/initstate
<snip>
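
The "Oops: 0002" field above is the x86 page-fault error code, and decoding it narrows down what kind of access failed. The sketch below uses the architectural bit layout of that code; the struct and function names are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits, as pushed by the CPU on a page fault. */
#define PF_BIT_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_BIT_WRITE (1u << 1) /* 0: read access, 1: write access */
#define PF_BIT_USER  (1u << 2) /* 0: kernel mode, 1: user mode */
#define PF_BIT_RSVD  (1u << 3) /* reserved bit set in a paging entry */
#define PF_BIT_INSTR (1u << 4) /* fault on instruction fetch */

/* Illustrative container for the decoded fields (not a kernel type). */
struct pf_info {
    bool not_present;
    bool write;
    bool user_mode;
    bool reserved_bit;
    bool instr_fetch;
};

static struct pf_info decode_pf_error(uint32_t code)
{
    struct pf_info info;

    info.not_present  = !(code & PF_BIT_PROT);
    info.write        = (code & PF_BIT_WRITE) != 0;
    info.user_mode    = (code & PF_BIT_USER) != 0;
    info.reserved_bit = (code & PF_BIT_RSVD) != 0;
    info.instr_fetch  = (code & PF_BIT_INSTR) != 0;
    return info;
}
```

Calling decode_pf_error(0x0002) yields a kernel-mode write to a page that is not present. That matches the zero *pde in the oops and an fxsave that stores through an invalid pointer.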

<snip>
Code: 00 80 7d e7 00 74 05 e8 ff 23 00 00 64 89 35 2c 82 85 c1 89 d8 83 c4 14 5b 5e 5f 5d c3 8d b6 00 00 00 00 89 f6 8b 83 4c 03 00 00 <0f> ae 00 8b 83 4c 03 00 00 e9 15 ff ff ff 66 90 8b 83 4c 03 00

root@dnsubuntu:/linux-source-2.6.38# find -name decodecode
./scripts/decodecode
root@dnsubuntu:/linux-source-2.6.38# echo "Code: 00 80 7d e7 00 74 05 e8 ff 23 00 00 64 89 35 2c 82 85 c1 89 d8 83 c4 14 5b 5e 5f 5d c3 8d b6 00 00 00 00 89 f6 8b 83 4c 03 00 00 <0f> ae 00 8b 83 4c 03 00 00 e9 15 ff ff ff 66 90 8b 83 4c 03 00" | ./scripts/decodecode
Code: 00 80 7d e7 00 74 05 e8 ff 23 00 00 64 89 35 2c 82 85 c1 89 d8 83 c4 14 5b 5e 5f 5d c3 8d b6 00 00 00 00 89 f6 8b 83 4c 03 00 00 <0f> ae 00 8b 83 4c 03 00 00 e9 15 ff ff ff 66 90 8b 83 4c 03 00
All code
========
   0:   00 80 7d e7 00 74       add    %al,0x7400e77d(%eax)
   6:   05 e8 ff 23 00          add    $0x23ffe8,%eax
   b:   00 64 89 35             add    %ah,0x35(%ecx,%ecx,4)
   f:   2c 82                   sub    $0x82,%al
  11:   85 c1                   test   %eax,%ecx
  13:   89 d8                   mov    %ebx,%eax
  15:   83 c4 14                add    $0x14,%esp
  18:   5b                      pop    %ebx
  19:   5e                      pop    %esi
  1a:   5f                      pop    %edi
  1b:   5d                      pop    %ebp
  1c:   c3                      ret
  1d:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
  23:   89 f6                   mov    %esi,%esi
  25:   8b 83 4c 03 00 00       mov    0x34c(%ebx),%eax
  2b:*  0f ae 00                fxsave (%eax)     <-- trapping instruction
  2e:   8b 83 4c 03 00 00       mov    0x34c(%ebx),%eax
  34:   e9 15 ff ff ff          jmp    0xffffff4e
  39:   66 90                   xchg   %ax,%ax
  3b:   8b                      .byte 0x8b
  3c:   83                      .byte 0x83
  3d:   4c                      dec    %esp
  3e:   03 00                   add    (%eax),%eax

Code starting with the faulting instruction
===========================================
   0:   0f ae 00                fxsave (%eax)
   3:   8b 83 4c 03 00 00       mov    0x34c(%ebx),%eax
   9:   e9 15 ff ff ff          jmp    0xffffff23
   e:   66 90                   xchg   %ax,%ax
  10:   8b                      .byte 0x8b
  11:   83                      .byte 0x83
  12:   4c                      dec    %esp
  13:   03 00                   add    (%eax),%eax
root@dnsubuntu:/linux-source-2.6.38#
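
As a cross-check of the decodecode output, the trapping bytes 0f ae 00 can be decoded by hand: 0f ae is an opcode group whose ModRM reg field selects the instruction (reg 0 is fxsave), and with mod 0 the rm field names the base register. The sketch below handles only that simple register-indirect form in 32-bit addressing; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * If insn is "0f ae /0" (fxsave) with a plain register-indirect operand,
 * return the name of the base register; otherwise return NULL.
 * Forms with a SIB byte or a displacement are not handled in this sketch.
 */
static const char *fxsave_base_register(const uint8_t insn[3])
{
    static const char *const regs[8] = {
        "eax", "ecx", "edx", "ebx",
        NULL, /* rm = 4: SIB byte follows */
        NULL, /* rm = 5: disp32 only */
        "esi", "edi"
    };
    uint8_t mod, reg, rm;

    if (insn[0] != 0x0f || insn[1] != 0xae)
        return NULL;

    mod = insn[2] >> 6;       /* addressing mode */
    reg = (insn[2] >> 3) & 7; /* opcode extension: 0 = fxsave, 1 = fxrstor */
    rm  = insn[2] & 7;        /* base register when mod == 0 */

    if (mod != 0 || reg != 0)
        return NULL;
    return regs[rm];
}
```

For the bytes 0f ae 00 this returns "eax", so the store goes through whatever the preceding mov 0x34c(%ebx),%eax loaded. That is presumably the FPU state pointer of the outgoing task, so the pointer itself looks bad, not the instruction.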
^C^CInterrupted while waiting for the program.
Give up (and stop debugging it)? (y or n) y
(gdb) target remote /dev/ttyS1
Remote debugging using /dev/ttyS1
fpu_fxsave (prev_p=0xf17c71a0, next_p=0xf5891940)
    at /linux-source-2.6.38/arch/x86/include/asm/i387.h:209
209             asm volatile("fxsave %[fx]"
(gdb) bt
#0  fpu_fxsave (prev_p=0xf17c71a0, next_p=0xf5891940)
    at /linux-source-2.6.38/arch/x86/include/asm/i387.h:209
#1  fpu_save_init (prev_p=0xf17c71a0, next_p=0xf5891940)
    at /linux-source-2.6.38/arch/x86/include/asm/i387.h:238
#2  __save_init_fpu (prev_p=0xf17c71a0, next_p=0xf5891940)
    at /linux-source-2.6.38/arch/x86/include/asm/i387.h:261
#3  __unlazy_fpu (prev_p=0xf17c71a0, next_p=0xf5891940)
    at /linux-source-2.6.38/arch/x86/include/asm/i387.h:292
#4  __switch_to (prev_p=0xf17c71a0, next_p=0xf5891940)
    at arch/x86/kernel/process_32.c:316
#5  0xc151fb3b in context_switch () at kernel/sched.c:2946
#6  schedule () at kernel/sched.c:3999
#7  0xc105073b in __cond_resched () at kernel/sched.c:5258
#8  0xc1520318 in _cond_resched () at kernel/sched.c:5265
#9  0xc1120419 in slab_pre_alloc_hook (s=<value optimized out>, gfpflags=208)
    at mm/slub.c:795
#10 slab_alloc (s=<value optimized out>, gfpflags=208) at mm/slub.c:1744
#11 kmem_cache_alloc (s=<value optimized out>, gfpflags=208) at mm/slub.c:1770
#12 0xc113ef91 in d_alloc (parent=0x0, name=0xf09d3f24) at fs/dcache.c:1286
#13 0xc113f1ab in d_alloc_pseudo (sb=0xf58b5800, name=<value optimized out>)
    at fs/dcache.c:1343
#14 0xc1435269 in sock_alloc_file (sock=0xf5667c40, f=0xf09d3f4c, flags=526336)
    at net/socket.c:365
#15 0xc1435326 in sock_map_fd (sock=<value optimized out>,
    flags=<value optimized out>) at net/socket.c:397
#16 0xc14364ac in sys_socket (family=1, type=1, protocol=0)
    at net/socket.c:1313
#17 0xc1437768 in sys_socketcall (call=1, args=0xbfdb6398) at net/socket.c:2256
#18 <signal handler called>
#19 0xb7786424 in ?? ()
#20 0xb7721e11 in ?? ()
#21 0xb77222b9 in ?? ()
#22 0xb771f424 in ?? ()
#23 0xb771f7e2 in ?? ()
#24 0xb76b50c9 in ?? ()
#25 0xb76b4a0f in ?? ()
#26 0x08048627 in ?? ()
#27 0xb7633e37 in ?? ()
#28 0x08048501 in ?? ()
(gdb)
</snip>

Thanks & Regards
TimLee