On 10/30/2013 08:42 AM, Theodore Ts'o wrote: > I tried running xfstests with this patch, and it blew up on > generic/020 test: > > generic/020 [10:21:50][ 105.170352] ------------[ cut here ]------------ > [ 105.171683] kernel BUG at /usr/projects/linux/ext4/include/linux/bit_spinlock.h:76! > [ 105.173346] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC > [ 105.173346] Modules linked in: > [ 105.173346] CPU: 1 PID: 8519 Comm: attr Not tainted 3.12.0-rc5-00008-gffbe1d7-dirty #1492 > [ 105.173346] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > [ 105.173346] task: f5abe560 ti: f2274000 task.ti: f2274000 > [ 105.173346] EIP: 0060:[<c026b464>] EFLAGS: 00010246 CPU: 1 > [ 105.173346] EIP is at hlist_bl_unlock+0x7/0x1c > [ 105.173346] EAX: f488d360 EBX: f488d360 ECX: 00000000 EDX: f2998800 > [ 105.173346] ESI: f29987f0 EDI: 6954c848 EBP: f2275cc8 ESP: f2275cb8 > [ 105.173346] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > [ 105.173346] CR0: 80050033 CR2: b76bcf54 CR3: 34844000 CR4: 000006f0 > [ 105.173346] Stack: > [ 105.173346] c026bc78 f2275d48 6954c848 f29987f0 f2275d24 c02cd7a9 f2275ce4 c02e2881 > [ 105.173346] f255d8c8 00000000 f1109020 f4a67f00 f2275d54 f2275d08 c02cd020 6954c848 > [ 105.173346] f4a67f00 f1109000 f2b0eba8 f2ee3800 f2275d28 f4f811e8 f2275d38 00000000 > [ 105.173346] Call Trace: > [ 105.173346] [<c026bc78>] ? mb_cache_entry_find_first+0x4b/0x55 > [ 105.173346] [<c02cd7a9>] ext4_xattr_block_set+0x248/0x6e7 > [ 105.173346] [<c02e2881>] ? jbd2_journal_put_journal_head+0xe2/0xed > [ 105.173346] [<c02cd020>] ? ext4_xattr_find_entry+0x52/0xac > [ 105.173346] [<c02ce307>] ext4_xattr_set_handle+0x1c7/0x30f > [ 105.173346] [<c02ce4f4>] ext4_xattr_set+0xa5/0xe1 > [ 105.173346] [<c02ceb36>] ext4_xattr_user_set+0x46/0x5f > [ 105.173346] [<c024a4da>] generic_setxattr+0x4c/0x5e > [ 105.173346] [<c024a48e>] ? generic_listxattr+0x95/0x95 > [ 105.173346] [<c024ab0f>] __vfs_setxattr_noperm+0x56/0xb6 > [ 105.173346] [<c024abd2>] vfs_setxattr+0x63/0x7e > [ 105.173346] [<c024ace8>] setxattr+0xfb/0x139 > [ 105.173346] [<c01b200a>] ? __lock_acquire+0x540/0xca6 > [ 105.173346] [<c01877a3>] ? lg_local_unlock+0x1b/0x34 > [ 105.173346] [<c01af8dd>] ? trace_hardirqs_off_caller+0x2e/0x98 > [ 105.173346] [<c0227e69>] ? kmem_cache_free+0xd4/0x149 > [ 105.173346] [<c01b2c2b>] ? lock_acquire+0xdd/0x107 > [ 105.173346] [<c023225e>] ? __sb_start_write+0xee/0x11d > [ 105.173346] [<c0247383>] ? mnt_want_write+0x1e/0x3e > [ 105.173346] [<c01b3019>] ? trace_hardirqs_on_caller+0x12a/0x17e > [ 105.173346] [<c0247353>] ? __mnt_want_write+0x4e/0x60 > [ 105.173346] [<c024af3b>] SyS_lsetxattr+0x6a/0x9f > [ 105.173346] [<c078d0e8>] syscall_call+0x7/0xb > [ 105.173346] Code: 00 00 00 00 5b 5d c3 55 89 e5 53 3e 8d 74 26 00 8b 58 08 89 c2 8b 43 18 e8 3f c9 fb ff f0 ff 4b 0c 5b 5d c3 8b 10 80 e2 01 75 02 <0f> 0b 55 89 e5 0f ba 30 00 89 e0 25 00 e0 ff ff ff 48 14 5d c3 > [ 105.173346] EIP: [<c026b464>] hlist_bl_unlock+0x7/0x1c SS:ESP 0068:f2275cb8 > [ 105.273781] ---[ end trace 1ee45ddfc1df0935 ]--- > > When I tried to find a potential problem, I immediately ran into this. > I'm not entirely sure it's the problem, but it's raised a number of > red flags for me in terms of (a) how much testing you've employed with > this patch set, and (b) how maintaining and easy-to-audit the code > will be with this extra locking. The comments are good start, but > some additional comments about exactly what assumptions a function > assumes about locks that are held on function entry, or especially if > the locking is different on function entry and function exit, might > make it easier for people to audit this patch. > > Or maybe this commit needs to be split up with first a conversion from > using list_head to hlist_hl_node, and the changing the locking? The > bottom line is that we need to somehow make this patch easier to > validate/review. > Thanks for the comemnts. Yes, I did run through xfstests. My guess is that you probably ran into a race condition that I did not. I will try to port the patch to a more recent kernel, including the mb_cache_shrink_scan() sent earlier (BTW, it looks good) and debug the problem. Yes, those are good suggestions. Once I find the problem, I will resubmit with more comments and also split it into two patches, as suggested. >> @@ -520,18 +647,23 @@ __mb_cache_entry_find(struct list_head *l, struct list_head *head, >> ce->e_queued++; >> prepare_to_wait(&mb_cache_queue, &wait, >> TASK_UNINTERRUPTIBLE); >> - spin_unlock(&mb_cache_spinlock); >> + hlist_bl_unlock(head); >> schedule(); >> - spin_lock(&mb_cache_spinlock); >> + hlist_bl_lock(head); >> + mb_assert(ce->e_index_hash_p == head); >> ce->e_queued--; >> } >> + hlist_bl_unlock(head); >> finish_wait(&mb_cache_queue, &wait); >> >> - if (!__mb_cache_entry_is_hashed(ce)) { >> + hlist_bl_lock(ce->e_block_hash_p); >> + if (!__mb_cache_entry_is_block_hashed(ce)) { >> __mb_cache_entry_release_unlock(ce); >> - spin_lock(&mb_cache_spinlock); >> + hlist_bl_lock(head); > > <--- are we missing a "hlist_bl_unlock(ce->e_block_hash_p);" here? > > <---- Why is it that we call "hlist_bl_lock(head);" but not in the else clause? The function __mb_cache_entry_release_unlock() releases both the e_block_hash and e_index_hash locks. So we have to reacquire the e_index_hash (head) lock in the then part, and release the e_block_hash lock in the else part. > >> return ERR_PTR(-EAGAIN); >> - } >> + } else >> + hlist_bl_unlock(ce->e_block_hash_p); >> + mb_assert(ce->e_index_hash_p == head); >> return ce; >> } >> l = l->next; > > > Cheers, > > - Ted > Thanks, Mak. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html