On 18/06/2020 20:18, Kurt Roeckx wrote:
> On Thu, Jun 18, 2020 at 07:24:39PM +0200, Kurt Roeckx wrote:
>>
>> Now that a large fraction of the cost has been found, I can look
>> again to see where the biggest cost in 3.0 comes from now and if we
>> can do something about it.
>
> So a code path that I've noticed before when looking at the
> profile is:
>     /* TODO(3.0): Remove this eventually when no more legacy */
>     if (ctx->op.sig.sigprovctx == NULL)
>         return EVP_PKEY_CTX_ctrl(ctx, -1, EVP_PKEY_OP_TYPE_SIG,
>                                  EVP_PKEY_CTRL_MD, 0, (void *)(md));
>
> I think that is now actually causing most of the CPU usage.
>
> This currently ends up doing an EVP_MAC_dup_ctx(), and I'm
> currently not sure why, and what the effect is going to be
> when sigprovctx != NULL. I think it might be better to wait until
> someone fixes that before I look at that again.

I looked into what was going on here.

The EVP_PKEY -> EVP_MAC bridge is implemented as a *legacy*
EVP_PKEY_METHOD, i.e. the conversion from EVP_PKEY -> EVP_MAC happens
in libcrypto *before* it hits any provider. So in the above code
"ctx->op.sig.sigprovctx" will *always* be NULL because we are using
the bridge.

The answer to why we have the EVP_MAC_dup_ctx() lies in the
implementation of EVP_PKEY_new_CMAC_key(). In EVP_MAC terms the cipher
and key to be used are parameters set on an EVP_MAC_CTX - there is no
long term "key" object to store these in. By contrast an EVP_PKEY
considers these part of the long term "key" that can be reused in
multiple EVP_PKEY_CTX operations.

To resolve this difference in approach the EVP_PKEY -> EVP_MAC bridge
creates an EVP_MAC_CTX during construction of the EVP_PKEY and sets
the cipher and key parameters on it. Then, every time we do an
EVP_DigestSignInit() call, we create a new EVP_PKEY_CTX and "dup" the
EVP_MAC_CTX from the EVP_PKEY. In this way we can reuse the same
EVP_PKEY in many EVP_DigestSign*() operations.

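To make the shape of that concrete, here is a toy model of the
template-and-dup pattern in plain C. It is only a sketch under stated
assumptions: none of the toy_* names are OpenSSL APIs, and the real
bridge carries a cipher and CMAC state rather than a byte counter.

```c
/* Toy model only: the toy_* names are hypothetical, not OpenSSL APIs. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_KEY 32

/* Stands in for EVP_MAC_CTX: parameters plus per-operation state. */
typedef struct {
    unsigned char key[TOY_MAX_KEY];
    size_t keylen;
    size_t absorbed;            /* bytes fed into this operation so far */
} toy_mac_ctx;

/* Stands in for the EVP_PKEY: owns a template context that is never
 * used for a MAC computation itself, only as a source for copies. */
typedef struct {
    toy_mac_ctx *tmpl;
    unsigned long dups;         /* how many times the template was copied */
} toy_pkey;

static toy_mac_ctx *toy_mac_ctx_dup(const toy_mac_ctx *src)
{
    toy_mac_ctx *dst = malloc(sizeof(*dst));

    if (dst != NULL)
        memcpy(dst, src, sizeof(*dst));
    return dst;
}

/* Key construction: build the template once and set the key on it. */
static toy_pkey *toy_pkey_new_mac_key(const unsigned char *key, size_t keylen)
{
    toy_pkey *pk;

    if (key == NULL || keylen > TOY_MAX_KEY)
        return NULL;
    pk = calloc(1, sizeof(*pk));
    if (pk == NULL)
        return NULL;
    pk->tmpl = calloc(1, sizeof(*pk->tmpl));
    if (pk->tmpl == NULL) {
        free(pk);
        return NULL;
    }
    memcpy(pk->tmpl->key, key, keylen);
    pk->tmpl->keylen = keylen;
    return pk;
}

/* Every sign-init copies the template, so the key object stays reusable. */
static toy_mac_ctx *toy_sign_init(toy_pkey *pk)
{
    toy_mac_ctx *ctx;

    assert(pk != NULL && pk->tmpl != NULL);
    ctx = toy_mac_ctx_dup(pk->tmpl);
    if (ctx != NULL)
        pk->dups++;
    return ctx;
}

/* Only the per-operation copy accumulates state. */
static void toy_sign_update(toy_mac_ctx *ctx, size_t len)
{
    ctx->absorbed += len;
}

static void toy_pkey_free(toy_pkey *pk)
{
    if (pk != NULL) {
        free(pk->tmpl);
        free(pk);
    }
}
```

Two toy_sign_init() calls on one toy_pkey give two independent
contexts, and the template's own state is never touched.
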
That all seems to work but has the impact that if you only ever create
the EVP_PKEY, use it once and then throw it away, then we have to
create the underlying EVP_MAC_CTX and then dup it (never actually using
the original EVP_MAC_CTX for anything other than a template for the
subsequent dup).

I find it slightly surprising that EVP_MAC_dup_ctx() is quite so
expensive. It mainly seems to end up doing a CMAC_CTX_copy() so I guess
this is where the time is going?

Matt