On 8/19/20 5:29 AM, Matt Caswell wrote: > We should really have a proper callback for this purpose. PRs welcome! > (Doesn't help you right now though). Thank you for a prompt, thoughtful, and useful response. I believe that we are on the same page as far as async API overall intentions, and I am also very glad to hear that the OpenSSL team may welcome an addition of a proper callback to address Squid's use case. I know Squid needs are not unique. I do not yet know whether I can contribute (or facilitate contribution of) such an enhancement, but this green light is meaningful progress already! >> "Somewhat counter-intuitively, the OpenSSL async API is meant for >> activities that can work correctly _without_ becoming asynchronous (i.e. >> without being paused to temporary give way to other activities)" > I have no idea what you mean by this. Sorry for not detailing this accusation. I was worried that my email was already too long/verbose... I will detail it below. > The whole point of ASYNC_pause_job() is to temporarily give way to > other activities. Yes, giving way to other activities is the whole point of the async API optimization. Unfortunately, being only an optimization, the API is not enough when the callback MUST "give way to other activities". AFAICT, OpenSSL does not guarantee that ASYNC_pause_job() in a callback will actually "give way to other activities" because OpenSSL does guarantee that some engine or OpenSSL native code does not hold a ASYNC_block_pause() "lock" while calling the callback. The engine code that async API supports well may look like this[1]: myengine() { while (!something_happened()) ASYNC_pause_job(); // application MAY get control here ... use something ... } The callback code that async API does not support looks like this: mycallback() { if (!something_happened()) ASYNC_pause_job(); // application MUST get control here assert(something_happened()); ... use something ... } Please note that replacing "if" with "while" in mycallback() would make the compiled code identical with myengine() but would not solve the problem: Instead of the failed assertion, the callback would get into an infinite loop... The callback _relies_ on the application making progress (e.g., fetching the missing intermediate certificates or declaring a fetch failure before resuming SSL_connect()). This callback cannot work correctly without the application actually getting control. That is why the pausing call comments are different: MAY vs. MUST. Does this clarify what I meant? Do you agree that OpenSSL async API is not suitable for callbacks that _require_ ASYNC_pause_job() to return control to the application? [1] This myengine() example is inspired by your explanation at https://mta.openssl.org/pipermail/openssl-dev/2015-October/003031.html > ASYNC_block_pause() ... does not appear in the library code True, but it did appear there in the past, right? I am looking at commit 625146d as an example. Those calls were removed more than a year later in 75e2c87 AFAICT, but I see no guarantee that they will not reappear again. And even if OpenSSL now has a policy against using ASYNC_block_pause() internally, or a policy against holding an ASYNC_block_pause() "lock" while calling any callback, some custom engine might do that at the "wrong" for the above mycallback() moment, right? If you think that fears about something inside OpenSSL/engines preventing our callback from returning control to the application are unfounded, then using async API may be the best long-term solution for Squid. Short-term, it does not work "as is" because OpenSSL STACKSIZE appears to be too small (leading to weird crashes that disappear if we increase STACKSIZE from 32768 to 524288 bytes), but perhaps we can somehow hack around that. > One possibility that springs to mind (which is also an ugly hack) is to > defer the validation of the certificates. So, you have a verify callback > that always says "ok". But any further reads on the underlying BIO > always return with "retry" until such time as any intermediate > certificates have been fetched and the chain has been verified "for > real". The main problem I can see with this approach is there is no easy > way to send the right alert back to the server in the event of failure. We were also concerned that X509_verify_cert() is not enough to fully mimic the existing OpenSSL certificate validation procedure because the internal OpenSSL ssl_verify_cert_chain() does not just call X509_verify_cert(). It also does some DANE-related manipulations, for example. Are those fears unfounded? In other words, is calling X509_verify_cert() directly always enough to make the right certificate validation decision? Thanks a lot, Alex.