Re: 3.11.4: kernel BUG at fs/buffer.c:1268

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu 28-11-13 00:09:06, George Spelvin wrote:
> Well, it finally triggered.
> 
> 
> Not *that* long before, I fiddled with a USB thumb drive, which
> I'll mention here, but I don't think it's connected.
> 
> [2328294.996152] usb 1-1.3: new high-speed USB device number 6 using ehci-pci
> [2328295.080347] usb 1-1.3: New USB device found, idVendor=0781, idProduct=556c
> [2328295.080351] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> [2328295.080352] usb 1-1.3: Product: Ultra
> [2328295.080353] usb 1-1.3: Manufacturer: SanDisk
> [2328295.080354] usb 1-1.3: SerialNumber: 20054861120C8D407604
> [2328295.829526] usb-storage 1-1.3:1.0: USB Mass Storage device detected
> [2328295.829571] scsi6 : usb-storage 1-1.3:1.0
> [2328295.829615] usbcore: registered new interface driver usb-storage
> [2328296.832215] scsi 6:0:0:0: Direct-Access     SanDisk  Ultra            1.20 PQ: 0 ANSI: 5
> [2328296.832343] sd 6:0:0:0: Attached scsi generic sg3 type 0
> [2328296.833579] sd 6:0:0:0: [sdc] 15633408 512-byte logical blocks: (8.00 GB/7.45 GiB)
> [2328296.834942] sd 6:0:0:0: [sdc] Write Protect is off
> [2328296.834944] sd 6:0:0:0: [sdc] Mode Sense: 43 00 00 00
> [2328296.835947] sd 6:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> [2328296.848345]  sdc: sdc1
> [2328296.851338] sd 6:0:0:0: [sdc] Attached SCSI removable disk
> [2328361.332585] FAT-fs (sdc1): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
> [2335705.838834] usb 1-1.3: USB disconnect, device number 6
> 
> 
> The next thing in the kernel log is the might_sleep() warning (followed by
> the oops):
> 
> [2348070.539862] BUG: sleeping function called from invalid context at fs/ext4/mballoc.c:4791
> [2348070.539865] in_atomic(): 0, irqs_disabled(): 1, pid: 4635, name: iceweasel
> [2348070.539867] CPU: 4 PID: 4635 Comm: iceweasel Tainted: G        W    3.11.5-00009-g06a2442 #100
> [2348070.539868] Hardware name: Gigabyte Technology Co., Ltd. Z68A-D3H-B3/Z68A-D3H-B3, BIOS F13 03/20/2012
> [2348070.539870]  ffff88011379f4e0 ffff8801a30179a8 ffffffff81561017 0000000000000002
> [2348070.539872]  ffff8801a30179b8 ffffffff8106a06f ffff8801a3017a90 ffffffff81197a3e
> [2348070.539874]  ffff880011306f68 ffff8801a3017fd8 00000001a30179f8 0000000000f15fd1
> [2348070.539876] Call Trace:
> [2348070.539881]  [<ffffffff81561017>] dump_stack+0x54/0x74
> [2348070.539884]  [<ffffffff8106a06f>] __might_sleep+0xcf/0xf0
> [2348070.539887]  [<ffffffff81197a3e>] ext4_free_blocks+0x53e/0xa90
  Thanks for the report. So this is the last might_sleep() we have added.
Interesting. I've also realized it isn't a stack overflow problem because
irqs_disabled() check in might_sleep() checks real setting of a flag in the
processor flags.

I went again through the several calls between last two might_sleep() calls
but I didn't find anything suspicious. So let's narrow this down a bit more
by the attached patch (to be applied on top of your previous patch).
Thanks!

								Honza

> [2348070.539889]  [<ffffffff8118c286>] ext4_ext_remove_space+0x806/0xe20
> [2348070.539891]  [<ffffffff8118e568>] ext4_ext_truncate+0xb8/0xe0
> [2348070.539894]  [<ffffffff811749a5>] ext4_truncate+0x2b5/0x300
> [2348070.539895]  [<ffffffff81175458>] ext4_evict_inode+0x3f8/0x430
> [2348070.539898]  [<ffffffff8111acca>] evict+0xba/0x1c0
> [2348070.539899]  [<ffffffff8111b67b>] iput+0x10b/0x1b0
> [2348070.539901]  [<ffffffff81117468>] dput+0x278/0x350
> [2348070.539904]  [<ffffffff811032da>] __fput+0x16a/0x240
> [2348070.539905]  [<ffffffff811033e9>] ____fput+0x9/0x10
> [2348070.539909]  [<ffffffff8105e63c>] task_work_run+0x9c/0xd0
> [2348070.539911]  [<ffffffff81045547>] do_exit+0x2a7/0x9d0
> [2348070.539914]  [<ffffffff8104fc1e>] ? __sigqueue_free.part.13+0x2e/0x40
> [2348070.539915]  [<ffffffff81046aee>] do_group_exit+0x3e/0xb0
> [2348070.539917]  [<ffffffff81052a90>] get_signal_to_deliver+0x1b0/0x5f0
> [2348070.539919]  [<ffffffff81002133>] do_signal+0x43/0x940
> [2348070.539921]  [<ffffffff810519e8>] ? do_send_sig_info+0x58/0x80
> [2348070.539923]  [<ffffffff81002a8d>] do_notify_resume+0x5d/0x80
> [2348070.539925]  [<ffffffff81568f60>] int_signal+0x12/0x17
> [2348070.539931] ------------[ cut here ]------------
> [2348070.539950] kernel BUG at fs/buffer.c:1268!
> [2348070.539962] invalid opcode: 0000 [#1] SMP 
> [2348070.539976] Modules linked in: nls_utf8 nls_cp437 vfat fat usb_storage fuse pl2303 ftdi_sio usbserial iTCO_wdt
> [2348070.540018] CPU: 4 PID: 4635 Comm: iceweasel Tainted: G        W    3.11.5-00009-g06a2442 #100
> [2348070.540040] Hardware name: Gigabyte Technology Co., Ltd. Z68A-D3H-B3/Z68A-D3H-B3, BIOS F13 03/20/2012
> [2348070.540063] task: ffff88021688cf00 ti: ffff8801a3016000 task.ti: ffff8801a3016000
> [2348070.540082] RIP: 0010:[<ffffffff8156053e>]  [<ffffffff8156053e>] check_irqs_on.part.16+0x4/0x6
> [2348070.540108] RSP: 0018:ffff8801a3017798  EFLAGS: 00010046
> [2348070.540122] RAX: 0000000000000082 RBX: ffff8801a3017928 RCX: ffff8802162bd000
> [2348070.540141] RDX: 0000000000001000 RSI: 0000000000980080 RDI: ffff8802171b6b00
> [2348070.540159] RBP: ffff8801a3017798 R08: 0000000000000002 R09: 0000000000000002
> [2348070.540177] R10: ffff8802162bd000 R11: ffff8801a301751e R12: ffff8802171b6b00
> [2348070.540195] R13: 0000000000001000 R14: ffff880213a70600 R15: ffff880215228c00
> [2348070.540214] FS:  0000000000000000(0000) GS:ffff88021fb00000(0000) knlGS:0000000000000000
> [2348070.540235] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
> [2348070.540250] CR2: 00000000f4832a3c CR3: 000000000180c000 CR4: 00000000000407e0
> [2348070.540268] Stack:
> [2348070.540274]  ffff8801a3017808 ffffffff8112e547 ffff8801a30177b0 0000000000000092
> [2348070.540297]  ffffffff81568f60 ffffffff81732ff4 ffffffff81568f60 ffffffff810987bd
> [2348070.540320]  00000000ffe743c4 ffff8801a30177f0 ffff8801a3017928 ffff8802171b6b00
> [2348070.540342] Call Trace:
> [2348070.540352]  [<ffffffff8112e547>] __find_get_block+0x1d7/0x1e0
> [2348070.540369]  [<ffffffff81568f60>] ? int_signal+0x12/0x17
> [2348070.540384]  [<ffffffff81568f60>] ? int_signal+0x12/0x17
> [2348070.540400]  [<ffffffff810987bd>] ? __module_text_address+0xd/0x60
> [2348070.540417]  [<ffffffff8112e570>] __getblk+0x20/0x2f0
> [2348070.540432]  [<ffffffff8116fe46>] __ext4_get_inode_loc+0x106/0x410
> [2348070.540448]  [<ffffffff81171df8>] ext4_get_inode_loc+0x18/0x20
> [2348070.540465]  [<ffffffff811735a1>] ext4_reserve_inode_write+0x21/0x90
> [2348070.540482]  [<ffffffff81173659>] ext4_mark_inode_dirty+0x49/0x1a0
> [2348070.540499]  [<ffffffff811760eb>] ext4_dirty_inode+0x3b/0x60
> [2348070.540515]  [<ffffffff81127d3d>] __mark_inode_dirty+0x2d/0x230
> [2348070.540532]  [<ffffffff81197cbc>] ext4_free_blocks+0x7bc/0xa90
> [2348070.540548]  [<ffffffff8118c286>] ext4_ext_remove_space+0x806/0xe20
> [2348070.540565]  [<ffffffff8118e568>] ext4_ext_truncate+0xb8/0xe0
> [2348070.540581]  [<ffffffff811749a5>] ext4_truncate+0x2b5/0x300
> [2348070.540597]  [<ffffffff81175458>] ext4_evict_inode+0x3f8/0x430
> [2348070.540613]  [<ffffffff8111acca>] evict+0xba/0x1c0
> [2348070.540627]  [<ffffffff8111b67b>] iput+0x10b/0x1b0
> [2348070.540641]  [<ffffffff81117468>] dput+0x278/0x350
> [2348070.540655]  [<ffffffff811032da>] __fput+0x16a/0x240
> [2348070.540669]  [<ffffffff811033e9>] ____fput+0x9/0x10
> [2348070.540683]  [<ffffffff8105e63c>] task_work_run+0x9c/0xd0
> [2348070.540698]  [<ffffffff81045547>] do_exit+0x2a7/0x9d0
> [2348070.540713]  [<ffffffff8104fc1e>] ? __sigqueue_free.part.13+0x2e/0x40
> [2348070.540730]  [<ffffffff81046aee>] do_group_exit+0x3e/0xb0
> [2348070.540745]  [<ffffffff81052a90>] get_signal_to_deliver+0x1b0/0x5f0
> [2348070.540762]  [<ffffffff81002133>] do_signal+0x43/0x940
> [2348070.540777]  [<ffffffff810519e8>] ? do_send_sig_info+0x58/0x80
> [2348070.540793]  [<ffffffff81002a8d>] do_notify_resume+0x5d/0x80
> [2348070.540809]  [<ffffffff81568f60>] int_signal+0x12/0x17
> [2348070.540823] Code: 4d 85 e4 74 1d 41 80 44 24 58 01 65 48 8b 04 25 b0 b7 00 00 ff 88 44 e0 ff ff 4c 89 e7 e8 bb 6c bb ff 5b 41 5c 5d c3 55 48 89 e5 <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 
> [2348070.540943] RIP  [<ffffffff8156053e>] check_irqs_on.part.16+0x4/0x6
> [2348070.540961]  RSP <ffff8801a3017798>
> [2348070.547170] ---[ end trace 41094c9a65f45b94 ]---
> [2348070.547171] Fixing recursive fault but reboot is needed!
> 
> 
> Here's the debug patch from last month's e-mail, for reference.
> The one that tripped is the last hunk.
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 4bbbf13b..e6f0d6b 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4734,6 +4734,7 @@ do_more:
>  	if (err)
>  		goto error_return;
>  
> +	might_sleep();
>  	if ((flags & EXT4_FREE_BLOCKS_METADATA) && ext4_handle_valid(handle)) {
>  		struct ext4_free_data *new_entry;
>  		/*
> @@ -4741,6 +4742,7 @@ do_more:
>  		 * be used until this transaction is committed
>  		 */
>  	retry:
> +		might_sleep();
>  		new_entry = kmem_cache_alloc(ext4_free_data_cachep, GFP_NOFS);
>  		if (!new_entry) {
>  			/*
> @@ -4756,6 +4758,7 @@ do_more:
>  		new_entry->efd_count = count_clusters;
>  		new_entry->efd_tid = handle->h_transaction->t_tid;
>  
> +		might_sleep();
>  		ext4_lock_group(sb, block_group);
>  		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>  		ext4_mb_free_metadata(handle, &e4b, new_entry);
> @@ -4773,7 +4776,7 @@ do_more:
>  					 err);
>  		}
>  
> -
> +		might_sleep();
>  		ext4_lock_group(sb, block_group);
>  		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>  		mb_free_blocks(inode, &e4b, bit, count_clusters);
> @@ -4785,6 +4788,7 @@ do_more:
>  	ext4_group_desc_csum_set(sb, block_group, gdp);
>  	ext4_unlock_group(sb, block_group);
>  	percpu_counter_add(&sbi->s_freeclusters_counter, count_clusters);
> +	might_sleep();
>  
>  	if (sbi->s_log_groups_per_flex) {
>  		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
> 
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
>From 0868c52e3a67c43b2e0f2eb3275e6d994b5f3bdd Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@xxxxxxx>
Date: Thu, 28 Nov 2013 15:36:53 +0100
Subject: [PATCH] Debug irqs

Signed-off-by: Jan Kara <jack@xxxxxxx>
---
 fs/ext4/mballoc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 267c14e3a19e..d22f9690aaf6 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4760,8 +4760,11 @@ do_more:
 
 		might_sleep();
 		ext4_lock_group(sb, block_group);
+		WARN_ON(irqs_disabled());
 		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
+		WARN_ON(irqs_disabled());
 		ext4_mb_free_metadata(handle, &e4b, new_entry);
+		WARN_ON(irqs_disabled());
 	} else {
 		/* need to update group_info->bb_free and bitmap
 		 * with group lock held. generate_buddy look at
@@ -4779,15 +4782,23 @@ do_more:
 
 		might_sleep();
 		ext4_lock_group(sb, block_group);
+		WARN_ON(irqs_disabled());
 		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
+		WARN_ON(irqs_disabled());
 		mb_free_blocks(inode, &e4b, bit, count_clusters);
+		WARN_ON(irqs_disabled());
 	}
 
 	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
+	WARN_ON(irqs_disabled());
 	ext4_free_group_clusters_set(sb, gdp, ret);
+	WARN_ON(irqs_disabled());
 	ext4_block_bitmap_csum_set(sb, block_group, gdp, bitmap_bh);
+	WARN_ON(irqs_disabled());
 	ext4_group_desc_csum_set(sb, block_group, gdp);
+	WARN_ON(irqs_disabled());
 	ext4_unlock_group(sb, block_group);
+	WARN_ON(irqs_disabled());
 	percpu_counter_add(&sbi->s_freeclusters_counter, count_clusters);
 	might_sleep();
 
-- 
1.8.1.4


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux