> Subject: BUG() in ecryptfs_send_miscdev()
>
> Hi,
>
> I have a kernel BUG() with ecryptfs and I was hoping to get some help with
> debugging the problem. I run a Gentoo Linux kernel version 3.14.23. I have a
> BUG() trace as well which shows that the problem is in an ecryptfs kernel
> function. Looking at the code, it seems as if the problem occurs when
> ecryptfs kernelspace is trying to send an encrypted blob to the userspace
> daemon ecryptfsd. It attaches the message context to the daemon's queue,
> but that message context is already attached to that list. So the kernel prints
> a warning about a double add. Has anyone else ever seen this problem? I run
> ecryptfs-utils version 104.
>
> Many thanks for your help.
>
> Attempt to access file with crypto metadata only in the extended attribute
> region, but eCryptfs was mounted without xattr support ena0 list_add
> double add: new=ffff880407e41038, prev=ffff880407e41038,
> next=ffff880405cecf40.
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:45!
> invalid opcode: 0000 [#1] PREEMPT SMP
> Modules linked in: microcode vboxnetadp(O) vboxnetflt(O) vboxdrv(O)
> CPU: 3 PID: 6468 Comm: ShFolders Tainted: G O 3.14.23 #3
> Hardware name: Hewlett-Packard HP ProDesk 600 G1 SFF/18E7, BIOS L01 v02.30 04/22/2014
> task: ffff8804048a4ec0 ti: ffff8804048a5458 task.ti: ffff8804048a5458
> RIP: 0010:[<ffffffff812f07fc>] [<ffffffff812f07fc>] __list_add_debug+0x65/0x6b
> RSP: 0018:ffff8803cdbd79f0 EFLAGS: 00010296
> RAX: 0000000000000058 RBX: ffff880407e41038 RCX: 0000000000000007
> RDX: 0000000000000006 RSI: 0000000000000046 RDI: ffff88041eacc030
> RBP: ffff8803cdbd79f0 R08: 00000000c5bc912d R09: ffffffff82080710
> R10: 00000000ffffff66 R11: 0000000000000000 R12: ffff880407e41038
> R13: ffff880405cecf40 R14: ffff880405cecf00 R15: 0000000000000066
> FS: 00007f26b17d6700(0000) GS:ffff88041eac0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f26a803c000 CR3: 00000003cfb2c000 CR4: 00000000001627f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Stack:
> ffff8803cdbd7a18 ffffffff812f0819 ffff880405cecf10 000000000000021c
> 0000000000000214 ffff8803cdbd7a68 ffffffff8122df33 ffff8804082d5c00
> ffff880407e41048 ffff8804082d0c00 ffff880407e41000 ffff8803cdbd7ac8
> Call Trace:
> [<ffffffff812f0819>] __list_add+0x17/0x31
> [<ffffffff8122df33>] ecryptfs_send_miscdev+0xb1/0xf7
> [<ffffffff8122d3ed>] ecryptfs_send_message+0x114/0x17a
> [<ffffffff8122a350>] decrypt_pki_encrypted_session_key+0x12a/0x3b9
> [<ffffffff8122bec9>] ecryptfs_parse_packet_set+0x753/0x903
> [<ffffffff81229132>] ecryptfs_read_headers_virt.part.26+0xfc/0x107
> [<ffffffff8122930c>] ecryptfs_read_metadata+0xf0/0x1b6
> [<ffffffff81224a20>] ecryptfs_open+0x1ae/0x24e
> [<ffffffff81224872>] ? ecryptfs_release+0x25/0x25
> [<ffffffff8116ad7a>] do_dentry_open.isra.15+0x18d/0x235
> [<ffffffff8116ae3d>] finish_open+0x1b/0x25
> [<ffffffff81178c02>] do_last+0xa3b/0xccc
> [<ffffffff811762cf>] ? link_path_walk+0x56/0x765
> [<ffffffff811790d3>] path_openat+0x240/0x626
> [<ffffffff812490fd>] ? cifs_getattr+0x88/0xfb
> [<ffffffff8117a2ab>] do_filp_open+0x35/0x7a
> [<ffffffff811852e0>] ? __alloc_fd+0xdd/0xed
> [<ffffffff8116bd6a>] do_sys_open+0x142/0x1d1
> [<ffffffff810cc093>] ? vtime_account_user+0x4d/0x52
> [<ffffffff8116be12>] SyS_open+0x19/0x1b
> [<ffffffff8180de62>] tracesys+0xd1/0xd6
> Code: 81 31 c0 e8 cf 05 51 00 0f 0b 48 39 d7 74 05 48 39 c7 75 19 48 89 d1 48 89 fe 48 89 c2 48 c7 c7 e5 f8 d6 81 31 c0 e8 ac 05 51 0
> RIP [<ffffffff812f07fc>] __list_add_debug+0x65/0x6b
> RSP <ffff8803cdbd79f0>
> ---[ end trace 32deecf4531f4b92 ]---
> Kernel panic - not syncing: Fatal exception
> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> drm_kms_helper: panic occurred, switching back to text console

After some more debugging I have found out that the problem occurs when a
timeout happens in ecryptfs_wait_for_response(). In my case, ecryptfsd does
not manage to reply in time with the decrypted session key.

Looking through the code, it seems to me as if there is a list_del() missing
in that function. Instead, you call ecryptfs_msg_ctx_alloc_to_free(msg_ctx)
to put the message context back on the free list, but you have never removed
it from the daemon's queue. Is that possible? If a timeout occurs, would you
not still have to remove the message context from
&daemon->msg_ctx_out_queue?

As a workaround, I can prevent the BUG() from occurring by increasing the
ecryptfs_message_wait_timeout kernel module parameter.

Regards,
Anna
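To illustrate the suspected sequence (send, timeout without removal, send again), here is a minimal userspace sketch in C. It is not the eCryptfs code: the list is a simplified stand-in for the kernel's struct list_head, and struct msg_ctx, out_link, send_to_daemon(), timeout_without_del(), timeout_with_del() and second_send_ok() are hypothetical names made up for this example. The check in list_add_tail_checked() is meant to resemble the "new == prev" condition visible in the double-add message above, and it returns false instead of calling BUG().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace stand-in for a circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Add at the tail; report a double add instead of crashing. */
static bool list_add_tail_checked(struct list_head *new, struct list_head *head)
{
    struct list_head *prev = head->prev;

    if (new == prev || new == head)   /* entry is already linked here */
        return false;
    new->next = head;
    new->prev = prev;
    prev->next = new;
    head->prev = new;
    return true;
}

/* Unlink an entry and leave it pointing at itself. */
static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    list_init(e);
}

/* Hypothetical message context with one link for the daemon's out queue. */
struct msg_ctx {
    struct list_head out_link;
    bool in_use;
};

/* Queue the context for the daemon; false means a double add was detected. */
static bool send_to_daemon(struct msg_ctx *ctx, struct list_head *out_queue)
{
    ctx->in_use = true;
    return list_add_tail_checked(&ctx->out_link, out_queue);
}

/* Suspected behaviour: context is recycled but stays on the out queue. */
static void timeout_without_del(struct msg_ctx *ctx)
{
    ctx->in_use = false;
}

/* Proposed behaviour: unlink from the out queue before recycling. */
static void timeout_with_del(struct msg_ctx *ctx)
{
    list_del_init(&ctx->out_link);
    ctx->in_use = false;
}

/* Send, time out, then reuse the same context for a second send. */
static bool second_send_ok(bool remove_on_timeout)
{
    struct list_head out_queue;
    struct msg_ctx ctx;
    bool first;

    list_init(&out_queue);
    list_init(&ctx.out_link);
    ctx.in_use = false;

    first = send_to_daemon(&ctx, &out_queue);
    assert(first);                    /* the first send always succeeds */

    if (remove_on_timeout)
        timeout_with_del(&ctx);
    else
        timeout_without_del(&ctx);

    return send_to_daemon(&ctx, &out_queue);
}
```

Calling second_send_ok(false) models the timeout path without a removal and reports the double add on reuse; second_send_ok(true) models the same path with the entry unlinked first, and the second send goes through.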