On Mon 04-07-16 11:49:27, Nikolay Borisov wrote:
> Hello again Jan,
>
> On 06/03/2016 12:19 PM, Jan Kara wrote:
> > Hi,
> >
> > On Fri 03-06-16 11:28:31, Nikolay Borisov wrote:
> >> Recently the following crash was brought to my attention:
> >>
> [SNIP]
> >
> > Hum, this looks most likely like a memory corruption. The value
> > ffffffffd9c01f11 doesn't look like a valid pointer to any dynamically
> > allocated data (it is not aligned to multiple of 4, it does not point to
> > data segment ffff88..........). It is close to a pointer to kernel code
> > (modules start at ffffffffa.......) so if it really points to some kernel
> > code it may be interesting to find out where. I have no clue how such
> > number could get to ei->i_dquot[0]. Usually what I do in such cases is
> > search kernel memory whether something unusual points to that place,
> > whether previous struct members didn't get corrupted as well or whether
> > that value is not also somewhere else in memory. But it's a search for a
> > needle in a haystack.
> >
> > Honza
>
> So I got this exact same crash on a different machine,
> with the exact same value. This rules out it being a random corruption:
>
> [2455521.848677] BUG: unable to handle kernel paging request at ffffffffd9c01fb1
> [2455521.849025] IP: [<ffffffff81204b62>] dquot_free_inode+0xa2/0x230
> [2455521.849315] PGD 1c0b067 PUD 1c0d067 PMD 0
> [2455521.849720] Oops: 0000 [#1] SMP
> [2455521.850062] Modules linked in: <OMITTED>
> [2455521.856549] ipv6 [last unloaded: nf_conntrack_ftp]
> [2455521.856904] CPU: 8 PID: 2955 Comm: rm Tainted: G O 4.4.10-clouder1 #73
> [2455521.857286] Hardware name: Supermicro X10DRi/X10DRi, BIOS 2.0 12/28/2015
> [2455521.857517] task: ffff883506658000 ti: ffff881d50198000 task.ti: ffff881d50198000
> [2455521.857898] RIP: 0010:[<ffffffff81204b62>] [<ffffffff81204b62>] dquot_free_inode+0xa2/0x230
> [2455521.858353] RSP: 0018:ffff881d5019bc48 EFLAGS: 00010286
> [2455521.858581] RAX: ffffffffd9c01f11 RBX: ffff881d5019bc48 RCX: 000000000000fb20
> [2455521.858962] RDX: ffff881d5019bc58 RSI: ffff880996894680 RDI: ffffffff81c09540
> [2455521.859343] RBP: ffff881d5019bcc8 R08: 0000000000000001 R09: ffff881d5019bc58
> [2455521.859724] R10: ffff881d5019bca0 R11: 0000000100000000 R12: ffff880996894680
> [2455521.860105] R13: 0000000000000000 R14: 0000000000000008 R15: ffff881d5019be68
> [2455521.860486] FS: 00007f6ad2fe9700(0000) GS:ffff881fffb00000(0000) knlGS:0000000000000000
> [2455521.860868] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2455521.861096] CR2: ffffffffd9c01fb1 CR3: 0000000151007000 CR4: 00000000001406e0
> [2455521.861476] Stack:
> [2455521.861696] ffff881fa0388c00 ffff880996894368 0000000000000000 0000000000000000
> [2455521.862335] 0000000000000000 ffffffff8123949c ffff881d5019bd28 ffffffff812351c8
> [2455521.862972] ffff881d5019bcb8 ffff883fb9a4d800 ffff881ff093a810 ffff883fb9a4d800
> [2455521.863611] Call Trace:
> [2455521.863838] [<ffffffff8123949c>] ? ext4_evict_inode+0x26c/0x4c0
> [2455521.864069] [<ffffffff812351c8>] ? ext4_mark_iloc_dirty+0x518/0x770
> [2455521.864304] [<ffffffff812312e3>] ext4_free_inode+0x83/0x5a0
> [2455521.864534] [<ffffffff8123949c>] ? ext4_evict_inode+0x26c/0x4c0
> [2455521.864765] [<ffffffff8123673b>] ? ext4_mark_inode_dirty+0x7b/0x260
> [2455521.864999] [<ffffffff812396e5>] ext4_evict_inode+0x4b5/0x4c0
> [2455521.865233] [<ffffffff811ba616>] evict+0xc6/0x1c0
> [2455521.865466] [<ffffffff811ba9dc>] iput+0x1ec/0x260
> [2455521.865696] [<ffffffff811ab128>] ? vfs_unlink+0x128/0x130
> [2455521.865928] [<ffffffff811ae766>] do_unlinkat+0x186/0x2c0
> [2455521.866158] [<ffffffff811ae8e2>] SyS_unlinkat+0x22/0x40
> [2455521.866390] [<ffffffff81635c57>] entry_SYSCALL_64_fastpath+0x12/0x6a
> [2455521.866620] Code: 80 41 be 08 00 00 00 65 ff 0d cf 60 e0 7e e8 f6 0d 43 00 48 8d 53 10 4c 89 e6 4c 8d 55 d8 66 c7 02 00 00 48 8b 06 48 85 c0 74 61 <48> 8b 88 a0 00 00 00 4c 8d 80 a0 00 00 00 83 e1 08 0f 84 a5 00
> [2455521.871376] RIP [<ffffffff81204b62>] dquot_free_inode+0xa2/0x230
> [2455521.871674] RSP <ffff881d5019bc48>
> [2455521.871897] CR2: ffffffffd9c01fb1
>
> The crash again points to test_bit in info_idq_free. I followed
> your advice to search for the address and here is what I got:
>
> crash> search -m ffffffff00000000 d9c01f11
>
> ffff88000181e030: d9c01927d9c01f11
> ffff880996894680: ffffffffd9c01f11
> ffff881d5019b858: ffffffffd9c01f11
> ffff881d5019b998: ffffffffd9c01f11 - <stack frame of crash_kexec>
> ffff881d5019bbe8: ffffffffd9c01f11 - <stack frame of page_fault>
> ffffffff8181e030: d9c01927d9c01f11
>
> So two of the values are in the stack frames of functions involved
> in the crash, so I'd say they are of no interest. What's interesting
> is that ffffffff8181e030 seems to be quota_magics:
>
> readelf -s vmlinux-4.4.10-clouder1 | grep ffffffff8181e030
> 15605: ffffffff8181e030 12 OBJECT LOCAL DEFAULT 4 quota_magics.24849
>
> #define V2_INITQMAGICS {\
>         0xd9c01f11, /* USRQUOTA */\
>         0xd9c01927, /* GRPQUOTA */\
>         0xd9c03f14, /* PRJQUOTA */\
> }
>
> So it seems that somehow the USRQUOTA magic value overwrites
> the dquot pointer. Looking at the code I'm not entirely
> sure how this can happen though.

This is indeed interesting. Can you dump full struct ext4_inode * of the
inode for which dquot_free_inode() was crashing? Command

  kmem -s ffff880996894680

should show you that this address is part of an object in ext4_inode_cache
(please verify that) and give you a pointer to the beginning of the object
which is ext4_inode...

Thanks!

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR