On Wed, Oct 09, 2019 at 05:35:29PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 09, 2019 at 05:07:23PM +0300, Jarkko Sakkinen wrote:
> > Baseline before adding Sean's updates. This contains only my updates. I
> > spent this day mostly fixing diff's. Especially these two were somewhat
> > unclean:
> >
> > 1. x86/sgx: Add a page reclaimer
> > 2. x86/sgx: Linux Enclave Driver
> >
> > Now they pile up nicely (I think). So I decided to do this tag since now
> > commit's in the sense of form and shape are legit. And also because
> > things, well, work.
> >
> > I'll continue from this by integrating Sean's changes. You can see below
> > what has been already changed.
> >
> > /Jarkko
> >
> > tag v23-rc1
> > Tagger: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx>
> > Date: Wed Oct 9 16:59:10 2019 +0300
> >
> > x86/sgx: v23-rc1 patch set
> >
> > * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page.
> > * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail
> >   (because of OOM) even in legit behaviour and after EBLOCK the reclaiming
> >   flow can be only reverted by killing the whole enclave.
> > * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact
> >   it should have been bit 6 (Table 37-3 in the SDM).
> > * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX
> >   error code.
> > -----BEGIN PGP SIGNATURE-----
> >
> > iJYEABYIAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCXZ3nxCAcamFya2tvLnNh
> > a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0mKVAQDcmIGs2f8y8hDY
> > b7zaQdNbaAMgsEkQ3ohMA88fbm2UQwD+P7y5AcAxzdccbgh++7RDy6XR2Ow2pluW
> > vCGUvRAhgwY=
> > =LCI3
> > -----END PGP SIGNATURE-----
> >
> > /Jarkko
>
> Getting this with rc1 (after running selftest). Leaving from office.
> No time to check this today but here are anyway logs.
>
> [ 96.906523] ============================================
> [ 96.906600] WARNING: possible recursive locking detected
> [ 96.906679] 5.4.0-rc1-custom #66 Not tainted
> [ 96.906741] --------------------------------------------
> [ 96.906817] test_sgx/1297 is trying to acquire lock:
> [ 96.906889] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x424/0x4f0
> [ 96.907009]
> but task is already holding lock:
> [ 96.907091] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
> [ 96.907217]
> other info that might help us debug this:
> [ 96.907308] Possible unsafe locking scenario:
>
> [ 96.907391] CPU0
> [ 96.907428] ----
> [ 96.907464] lock(&mm->mmap_sem#2);
> [ 96.907516] lock(&mm->mmap_sem#2);
> [ 96.907569]
> *** DEADLOCK ***
>
> [ 96.907650] May be due to missing lock nesting notation
>
> [ 96.907745] 2 locks held by test_sgx/1297:
> [ 96.907804] #0: ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
> [ 96.907935] #1: ffff990322de0080 (&encl->lock){+.+.}, at: sgx_ioc_enclave_add_page+0x212/0x620
> [ 96.910109]
> stack backtrace:
> [ 96.914616] CPU: 0 PID: 1297 Comm: test_sgx Not tainted 5.4.0-rc1-custom #66
> [ 96.918182] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0047.2018.1219.1246 12/19/2018
> [ 96.921795] Call Trace:
> [ 96.925209] dump_stack+0x8e/0xd5
> [ 96.928462] __lock_acquire+0xeab/0x1470
> [ 96.931648] ? __do_fault+0x57/0x11d
> [ 96.934761] lock_acquire+0xa3/0x180
> [ 96.937820] ? __do_page_fault+0x424/0x4f0
> [ 96.940866] down_read+0x30/0x150
> [ 96.943867] ? __do_page_fault+0x424/0x4f0
> [ 96.946889] __do_page_fault+0x424/0x4f0
> [ 96.949792] do_page_fault+0x2c/0x1a0
> [ 96.952602] page_fault+0x39/0x40
> [ 96.955400] RIP: 0010:sgx_ioc_enclave_add_page+0x3aa/0x620
> [ 96.958181] Code: 9d 10 ff ff ff 48 89 c8 48 81 e1 00 f0 ff ff 83 e0 0f 48 8d 14 40 48 8d 04 90 49 8d 04 c0 48 2b 08 48 03 48 08 b8 01 00 00 00 <0f> 01 cf 31 c0 0f 01 ca 85 c0 0f 85 0b 02 00 00 65 48 8b 04 25 c0
> [ 96.964133] RSP: 0018:ffffb46640df3c80 EFLAGS: 00050286
> [ 96.967141] RAX: 0000000000000001 RBX: ffffb46640df3cc0 RCX: ffffb4664ddfd000
> [ 96.970254] RDX: 0000000000000000 RSI: 00007f2d7a30f000 RDI: 0000000000000246
> [ 96.973440] RBP: ffffb46640df3db0 R08: ffffffffa8faf8a0 R09: 0000000000000000
> [ 96.976698] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990322de0000
> [ 96.980048] R13: ffff99032ba198c0 R14: 0000000000000000 R15: ffff99033a10cfe0
> [ 96.983425] ? avc_has_extended_perms+0x1f6/0x610
> [ 96.986830] sgx_ioctl+0x87/0x470
> [ 96.990247] ? sgx_ioctl+0x87/0x470
> [ 96.993696] do_vfs_ioctl+0xa9/0x6d0
> [ 96.997151] ? tomoyo_file_ioctl+0x19/0x20
> [ 97.000571] ksys_ioctl+0x75/0x80
> [ 97.003983] ? do_syscall_64+0x17/0x230
> [ 97.007285] __x64_sys_ioctl+0x1a/0x20
> [ 97.010487] do_syscall_64+0x5f/0x230
> [ 97.013612] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 97.016791] RIP: 0033:0x7f2d79e135d7
> [ 97.019897] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
> [ 97.026332] RSP: 002b:00007ffd983721f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> [ 97.029673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2d79e135d7
> [ 97.033037] RDX: 00007ffd98372260 RSI: 000000004020a401 RDI: 0000000000000003
> [ 97.036426] RBP: 00007ffd98372330 R08: 0000000000000003 R09: 0000000000000000
> [ 97.039813] R10: 00007ffd98372350 R11: 0000000000000202 R12: 00005565ae6e89e0
> [ 97.043190] R13: 00007ffd98373c40 R14: 0000000000000000 R15: 0000000000000000
> [ 681.794211] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
>
> $ sudo cat /sys/kernel/debug/kmemleak
> [sudo] password for jsakkine:
> unreferenced object 0xffff990325b69eb0 (size 16):
> comm "kworker/u8:1", pid 31, jiffies 4294895395 (age 1718.288s)
> hex dump (first 16 bytes):
> 6d 65 6d 73 74 69 63 6b 30 00 01 00 00 00 00 00 memstick0.......
> backtrace:
> [<0000000010512df5>] __kmalloc_track_caller+0x139/0x280
> [<00000000a5374cb0>] kstrdup+0x31/0x60
> [<00000000c59be911>] kstrdup_const+0x24/0x30
> [<00000000ff88e957>] kvasprintf_const+0x86/0xa0
> [<0000000050affb9a>] kobject_set_name_vargs+0x23/0x90
> [<00000000839b8dd7>] dev_set_name+0x4e/0x70
> [<0000000069897a8c>] memstick_check+0xdf/0x3a3 [memstick]
> [<00000000dffb0c9f>] process_one_work+0x281/0x5c0
> [<00000000090981e2>] worker_thread+0x34/0x400
> [<00000000bb117b3c>] kthread+0x121/0x140
> [<000000004d2f4c32>] ret_from_fork+0x24/0x50
>
> /Jarkko

The locking order is all wrong:

	up_read(&current->mm->mmap_sem);
	if (ret)
		goto err_out;

	ret = __sgx_encl_extend(encl, epc_page, addp->mrmask);
	if (ret)
		goto err_out;

	encl_page->encl = encl;
	encl_page->epc_page = epc_page;
	encl->secs_child_cnt++;
	sgx_mark_page_reclaimable(encl_page->epc_page);
	mutex_unlock(&encl->lock);

Sean: what might be the reason for this? It is probably caused by the add
page worker changes. Is this just something that happened by accident when
squashing patches?

/Jarkko
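
For reference, the condition that the "possible recursive locking" report
flags can be modeled roughly as below. This is only a userspace sketch and
not how lockdep is implemented: lockdep tracks lock classes and dependency
chains, while this just keeps a stack of held lock addresses. All names here
(held_locks, track_acquire, track_release, replay_reported_sequence) are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the kernel's lockdep code. */
#define MAX_HELD 8

struct held_locks {
	const void *lock[MAX_HELD];	/* addresses of locks currently held */
	size_t depth;
};

/* Return true if @lock is already on the held stack. */
static bool would_recurse(const struct held_locks *h, const void *lock)
{
	for (size_t i = 0; i < h->depth; i++)
		if (h->lock[i] == lock)
			return true;
	return false;
}

/* Record an acquisition; refuse if it would recurse or the stack is full. */
static bool track_acquire(struct held_locks *h, const void *lock)
{
	if (would_recurse(h, lock) || h->depth == MAX_HELD)
		return false;
	h->lock[h->depth++] = lock;
	return true;
}

/* Drop the most recent record of @lock, if there is one. */
static void track_release(struct held_locks *h, const void *lock)
{
	for (size_t i = h->depth; i-- > 0; ) {
		if (h->lock[i] != lock)
			continue;
		for (size_t j = i; j + 1 < h->depth; j++)
			h->lock[j] = h->lock[j + 1];
		h->depth--;
		return;
	}
}

/*
 * Replay the order from the report: mmap_sem, then encl->lock, then
 * mmap_sem again from the page fault path. Returns true if the third
 * acquisition is flagged.
 */
static bool replay_reported_sequence(void)
{
	struct held_locks h = { .depth = 0 };
	int mmap_sem, encl_lock;	/* stand-ins; only addresses matter */
	bool ok, flagged;

	ok = track_acquire(&h, &mmap_sem);
	assert(ok);
	ok = track_acquire(&h, &encl_lock);
	assert(ok);
	flagged = !track_acquire(&h, &mmap_sem);

	track_release(&h, &encl_lock);
	track_release(&h, &mmap_sem);
	assert(h.depth == 0);
	return flagged;
}
```

In this model the second attempt on mmap_sem is rejected because the same
address is still on the held stack, which corresponds to the two
lock(&mm->mmap_sem#2) entries in the report.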