On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote:
> On Tue, 2020-05-26 at 19:23 +0000, Mario.Limonciello@xxxxxxxx wrote:
> > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote:
> > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897.
> > > >
> > > > This commit has caused regressions for the XPS 9560 containing
> > > > a Nuvoton TPM.
> > >
> > > Presumably this is using the tis driver?
> >
> > Correct.
> >
> > > > As mentioned by the reporter all TPM2 commands are failing
> > > > with:
> > > >   ERROR:tcti:src/tss2-tcti/tcti-device.c:290:tcti_device_receive()
> > > >   Failed to read response from fd 3, got errno 1: Operation not permitted
> > > >
> > > > The reporter bisected this issue back to this commit which was
> > > > backported to stable as commit 4d6ebc4.
> > >
> > > I think the problem is request_locality ... for some inexplicable
> > > reason a failure there returns -1, which is EPERM to user space.
> > >
> > > That seems to be a bug in the async code since everything else
> > > gives an EPIPE error if tpm_try_get_ops fails ... at least no-one
> > > assumes it gives back a sensible return code.
> > >
> > > What I think is happening is that with the patch the TPM goes
> > > through a quick sequence of request, relinquish, request,
> > > relinquish and it's the third request which is failing (likely
> > > timing out).  Without the patch, there's only one request,
> > > relinquish cycle because the ops are held while the async
> > > work is executed.  I have a vague recollection that there is a
> > > problem with too many locality requests in quick succession, but
> > > I'll defer to Jason, who I think understands the intricacies of
> > > localities better than I do.
> >
> > Thanks, I don't pretend to understand the nuances of this particular
> > code, but I was hoping that the request to revert would get some
> > attention, since Alex's kernel Bugzilla and message a few months ago
> > to linux-integrity didn't.
> >
> > > If that's the problem, the solution looks simple enough: just move
> > > the ops get down because the priv state is already protected by the
> > > buffer mutex
> >
> > Yeah, if that works for Alex's situation it certainly sounds like a
> > better solution than reverting this patch as this patch actually does
> > fix a problem reported by Jeffrin originally.
> >
> > Could you propose a specific patch that Alex and Jeffrin can perhaps
> > both try?
>
> Um, what's wrong with the one I originally attached and which you quote
> below?  It's only compile tested, but I think it will work, if the
> theory is correct.
>
> James
>
> > > James
> > >
> > > ---
> > >
> > > diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c
> > > index 87f449340202..1784530b8387 100644
> > > --- a/drivers/char/tpm/tpm-dev-common.c
> > > +++ b/drivers/char/tpm/tpm-dev-common.c
> > > @@ -189,15 +189,6 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf,
> > >  		goto out;
> > >  	}
> > >  
> > > -	/* atomic tpm command send and result receive. We only hold the ops
> > > -	 * lock during this period so that the tpm can be unregistered even if
> > > -	 * the char dev is held open.
> > > -	 */
> > > -	if (tpm_try_get_ops(priv->chip)) {
> > > -		ret = -EPIPE;
> > > -		goto out;
> > > -	}
> > > -
> > >  	priv->response_length = 0;
> > >  	priv->response_read = false;
> > >  	*off = 0;
> > > @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf,
> > >  	if (file->f_flags & O_NONBLOCK) {
> > >  		priv->command_enqueued = true;
> > >  		queue_work(tpm_dev_wq, &priv->async_work);
> > > -		tpm_put_ops(priv->chip);
> > >  		mutex_unlock(&priv->buffer_mutex);
> > >  		return size;
> > >  	}
> > >  
> > > +	/* atomic tpm command send and result receive. We only hold the ops
> > > +	 * lock during this period so that the tpm can be unregistered even if
> > > +	 * the char dev is held open.
> > > +	 */
> > > +	if (tpm_try_get_ops(priv->chip)) {
> > > +		ret = -EPIPE;
> > > +		goto out;
> > > +	}
> > > +
> > >  	ret = tpm_dev_transmit(priv->chip, priv->space, priv->data_buffer,
> > >  			       sizeof(priv->data_buffer));
> > >  	tpm_put_ops(priv->chip);

When using your patch, I get a hang when trying to use tpm2_getcap, and
dmesg shows the following oops:

[  570.913779] BUG: unable to handle page fault for address: ffffb20001247000
[  570.913782] #PF: supervisor write access in kernel mode
[  570.913783] #PF: error_code(0x0002) - not-present page
[  570.913784] PGD 0 P4D 0
[  570.913785] Oops: 0002 [#3] SMP PTI
[  570.913787] CPU: 6 PID: 24744 Comm: tpm2_getcap Tainted: G UD 5.7.0-rc7+ #31
[  570.913788] Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.18.0 11/17/2019
[  570.913791] RIP: 0010:iowrite8+0x9/0x50
[  570.913792] Code: 48 c7 c2 40 43 9f 99 48 89 04 24 e8 14 a7 90 ff 0f 0b 48 8b 04 24 48 83 c4 08 c3 66 0f 1f 44 00 00 48 81 fe ff ff 03 00 76 04 <40> 88 3e c3 48 81 fe 00 00 01 00 76 07 0f b7 d6 89 f8 ee c3 8b 05
[  570.913793] RSP: 0018:ffffb1ff049d7db0 EFLAGS: 00010292
[  570.913794] RAX: ffffffff981bf520 RBX: ffffb1ff049d7df9 RCX: ffffb1ff049d7df8
[  570.913795] RDX: 0000000000000001 RSI: ffffb20001247000 RDI: 0000000000000020
[  570.913796] RBP: ffffb1ff049d7df9 R08: 0000000000000000 R09: ffff8b80de5370f0
[  570.913797] R10: 0000000000b71b00 R11: 000000000000028f R12: ffff8b80b148cda8
[  570.913797] R13: 00000000fffff000 R14: ffff8b80b148cda8 R15: ffff8b80cb44a0ba
[  570.913799] FS: 00007f78f7cd0d80(0000) GS:ffff8b80de500000(0000) knlGS:0000000000000000
[  570.913799] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  570.913800] CR2: ffffb20001247000 CR3: 0000000795618001 CR4: 00000000003606e0
[  570.913801] Call Trace:
[  570.913803]  tpm_tcg_write_bytes+0x2f/0x40
[  570.913805]  release_locality+0x49/0x220
[  570.913807]  tpm_relinquish_locality+0x1f/0x40
[  570.913808]  tpm_chip_stop+0x21/0x40
[  570.913810]  tpm_put_ops+0x9/0x30
[  570.913811]  tpm_common_write+0x179/0x190
[  570.913813]  vfs_write+0xb1/0x1a0
[  570.913815]  ksys_write+0x5a/0xd0
[  570.913816]  do_syscall_64+0x43/0x130
[  570.913819]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  570.913820] RIP: 0033:0x7f78f7e00123
[  570.913821] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
[  570.913822] RSP: 002b:00007fff724e8c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  570.913823] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f78f7e00123
[  570.913824] RDX: 0000000000000016 RSI: 0000564cf24a7220 RDI: 0000000000000003
[  570.913825] RBP: 0000000000000016 R08: 00007f78f7ccc785 R09: 00007f78f7ccca40
[  570.913826] R10: 00007fff724e8b10 R11: 0000000000000246 R12: 0000564cf24a7220
[  570.913826] R13: 0000000000000000 R14: 0000000000000016 R15: 00007f78f7ccc890
[  570.913827] Modules linked in: squashfs rtsx_pci_sdmmc x86_pkg_temp_thermal coretemp rtsx_pci mfd_core
[  570.913831] CR2: ffffb20001247000
[  570.913832] ---[ end trace c84437b00f0d01a0 ]---
[  570.913833] RIP: 0010:iowrite8+0x9/0x50
[  570.913834] Code: 48 c7 c2 40 43 9f 99 48 89 04 24 e8 14 a7 90 ff 0f 0b 48 8b 04 24 48 83 c4 08 c3 66 0f 1f 44 00 00 48 81 fe ff ff 03 00 76 04 <40> 88 3e c3 48 81 fe 00 00 01 00 76 07 0f b7 d6 89 f8 ee c3 8b 05
[  570.913835] RSP: 0018:ffffb1ff030b7db0 EFLAGS: 00010292
[  570.913836] RAX: ffffffff981bf520 RBX: ffffb1ff030b7df9 RCX: ffffb1ff030b7df8
[  570.913837] RDX: 0000000000000001 RSI: ffffb20001247000 RDI: 0000000000000020
[  570.913837] RBP: ffffb1ff030b7df9 R08: 0000000000000000 R09: ffff8b80de2370f0
[  570.913838] R10: 0000000000b71b00 R11: 000000000000019c R12: ffff8b80b148cda8
[  570.913839] R13: 00000000fffff000 R14: ffff8b80b148cda8 R15: ffff8b80c4cfc0ba
[  570.913840] FS: 00007f78f7cd0d80(0000) GS:ffff8b80de500000(0000) knlGS:0000000000000000
[  570.913840] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  570.913841] CR2: ffffb20001247000 CR3: 0000000795618001 CR4: 00000000003606e0