On Thu, Nov 29, 2012 at 12:24:48PM +1100, Neil Brown wrote:
> On Wed, 28 Nov 2012 10:33:06 +0000 (UTC) bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
> wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=45351
> >
> >
> > Cyril B. <cbay@xxxxxxxxxxxxx> changed:
> >
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >      Kernel Version|3.5.0                       |3.5.0, 3.6.8
> >
> >
> > --- Comment #1 from Cyril B. <cbay@xxxxxxxxxxxxx> 2012-11-28 10:33:05 ---
> > I've just tested 3.6.8, I still get the same bug/trace.
> >
>
> Hi Jim,
> could you look at this bug please?

Hi Neil,

Thank you for bringing this to my attention.

>
> https://bugzilla.kernel.org/show_bug.cgi?id=45351
>
> It seems to be crashing in xor_avx_4:
>
> [48595.135046] general protection fault: 0000 [#1] SMP
> [48595.135093] CPU 0
> [48595.135098] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_multiport coretemp
> hwmon i2c_i801 shpchp pci_hotplug ehci_hcd usbcore usb_common netconsole e1000e
> [last unloaded: scsi_wait_scan]
> [48595.135211]
> [48595.135224] Pid: 2429, comm: md4_raid5 Not tainted 3.5.0 #2
> /DH67BL
> [48595.135263] RIP: 0010:[<ffffffff813512d8>] [<ffffffff813512d8>] xor_avx_4+0x48/0x350
> [48595.135303] RSP: 0018:ffff880213a259d0 EFLAGS: 00010282
> [48595.135323] RAX: 000000008005003b RBX: 0000000000000008 RCX: ffff8802130b5000
> [48595.135346] RDX: ffff880212c9f000 RSI: ffff880212c9e000 RDI: 0000000000001000
> [48595.135368] RBP: ffff880213a25ac0 R08: ffff8802130b4000 R09: ffff880212c9e000
> [48595.135391] R10: ffff880212c9e000 R11: 0000000000000000 R12: 000000008005003b
> [48595.135413] R13: 0000000000000003 R14: ffff880213a25cd0 R15: 0000000000001000
> [48595.135436] FS: 0000000000000000(0000) GS:ffff88021fa00000(0000) knlGS:0000000000000000
> [48595.135471] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [48595.135492] CR2: 000000000235f570 CR3: 0000000001c0b000 CR4: 00000000000407f0
> [48595.135514] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [48595.135537] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> ....
>
> [48595.136063] Code: b5 30 ff ff ff 48 89 95 28 ff ff ff 48 89 8d 20 ff ff ff 4c 89 85 18 ff ff ff e8 c4 04 ce ff 66 90 49 89 c4 0f 06 66 66 90 66 90 <c5> fc 29 85 50 ff ff ff c5 fc 29 8d 70 ff ff ff c5 fc 29 55 90

The code dump above is quite revealing. The relevant instructions are:

    clts
    vmovaps %ymm0, -0xb0(%rbp)
    vmovaps %ymm1, -0x90(%rbp)
    vmovaps %ymm2, -0x70(%rbp)

These instructions save the floating point state before we begin the
actual xor work. Looking at the register dump, -0xb0(%rbp) is not properly
aligned to 32 bytes, hence the #GP.

The question is whether the #GP still occurs after
841e3604d35aa70d399146abdc526d8c89a2c2f5. Before that commit, we manually
saved and restored the floating point state to the stack with the
YMMS_{SAVE,RESTORE} macros. After that commit, we use the
kernel_fpu_{begin,end} routines.

In the former case, it would seem GCC is ignoring our request to align the
stack variable to 32 bytes, and 841e3604d35aa70d399146abdc526d8c89a2c2f5
should resolve the problem. In the latter case, we will need to investigate
further.

Thanks.

-- 
Jim Kukunas
Intel Open Source Technology Center
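
The alignment claim in the message above can be checked arithmetically. The following is a minimal sketch, assuming only the RBP value from the register dump and the three displacements decoded from the Code: bytes; the names `misalignment` and `check_stores` are hypothetical helpers introduced for illustration, not kernel functions.

```python
# Alignment check for the three vmovaps stores, using values from the oops.
RBP = 0xFFFF880213A25AC0          # RBP from the register dump
YMM_ALIGN = 32                    # vmovaps with a ymm operand needs 32-byte alignment
OFFSETS = (-0xB0, -0x90, -0x70)   # displacements of the three vmovaps stores


def misalignment(addr, align=YMM_ALIGN):
    """Return addr modulo align; 0 means the address is aligned."""
    return addr & (align - 1)


def check_stores(rbp=RBP, offsets=OFFSETS):
    """Return (address, misalignment) for each store target."""
    return [(rbp + off, misalignment(rbp + off)) for off in offsets]


if __name__ == "__main__":
    for addr, rem in check_stores():
        print(f"{addr:#018x}: off by {rem} bytes")
```

With these inputs, each of the three store targets sits 16 bytes past a 32-byte boundary (the first is 0xffff880213a25a10), which is consistent with the first vmovaps raising the #GP.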