Hello all,
Not sure this is exactly the right list, so feel free to point me to
another one as appropriate.
While working on a higher-level binding to the libpq library, I've
(likely) discovered a problem with non-blocking operation when using
OpenSSL. It looks striking enough that I'd like to share my observation.
For libpq, non-blocking operation is documented as a normal, supported
feature, e.g. [1].
Likewise, the OpenSSL transport is documented as a normal, supported
feature, e.g. [2].
I have not found any clear warning anywhere in the documentation that
non-blocking operation and the OpenSSL transport are mutually exclusive,
or that they might not work as specified in some way.
From [1] we learn (through some intricate wording) that in order to
avoid blocking in PQgetResult() one can use PQsetnonblocking(),
PQflush(), PQconsumeInput() and PQisBusy(), all of which are supposedly
non-blocking once PQsetnonblocking() has been called. This is not stated
explicitly, but otherwise it would make no sense whatsoever, right?
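As I read [1], the documented flush loop for a non-blocking connection is: after sending, call PQflush(); if it returns 1, wait for the socket to become read- or write-ready; on write-ready call PQflush() again; on read-ready call PQconsumeInput() and then PQflush() again; repeat until PQflush() returns 0. The sketch below only encodes that decision table as a pure function so it can be checked in isolation; the enum and function names are hypothetical and are not libpq identifiers.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; these are NOT libpq identifiers. */
typedef enum {
    FLUSH_DONE,               /* PQflush() returned 0: all queued output was sent */
    FLUSH_WAIT_READ_OR_WRITE, /* PQflush() returned 1: wait for read- or write-ready */
    FLUSH_CALL_AGAIN,         /* socket became write-ready: call PQflush() again */
    FLUSH_CONSUME_THEN_FLUSH, /* socket became read-ready: PQconsumeInput(), then PQflush() */
    FLUSH_FAILED              /* PQflush() returned -1 */
} flush_step;

/* flush_rc is the last PQflush() return value; readable/writable describe
   what select()/poll() reported after waiting (both false before any wait). */
static flush_step next_flush_step(int flush_rc, bool readable, bool writable)
{
    if (flush_rc < 0)
        return FLUSH_FAILED;
    if (flush_rc == 0)
        return FLUSH_DONE;

    /* flush_rc == 1: output is still queued */
    if (readable)
        return FLUSH_CONSUME_THEN_FLUSH; /* read-ready takes precedence, per the documented sequence */
    if (writable)
        return FLUSH_CALL_AGAIN;
    return FLUSH_WAIT_READ_OR_WRITE;
}
```

Once the flush step reaches FLUSH_DONE, the application waits for read-ready, calls PQconsumeInput(), and only calls PQgetResult() when PQisBusy() returns 0. Every step in this table is supposed to return promptly, which is exactly the assumption examined below.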
Now let's have a look at, e.g., PQconsumeInput():
===================
    .....
    /*
     * Load more data, if available. We do this no matter what state we are
     * in, since we are probably getting called because the application wants
     * to get rid of a read-select condition. Note that we will NOT block
     * waiting for more input.
     */
    if (pqReadData(conn) < 0)
        return 0;

    /* Parsing of the data waits till later. */
    return 1;
}
===================
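At the plain-socket level, a "will NOT block waiting for more input" contract usually means: put the descriptor into O_NONBLOCK mode and treat EAGAIN/EWOULDBLOCK as "nothing available yet". The minimal POSIX sketch below demonstrates that on a pipe; set_nonblocking() and read_available() are hypothetical helper names, and this is not the actual pqReadData() code.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a file descriptor into non-blocking mode (POSIX O_NONBLOCK). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read whatever is available right now, never waiting.
   Returns the number of bytes read (> 0), 0 if no data is available yet,
   or -1 on EOF or a real error. */
static ssize_t read_available(int fd, char *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n > 0)
            return n;
        if (n == 0)
            return -1;                 /* EOF: the writer has closed */
        if (errno == EINTR)
            continue;                  /* interrupted by a signal: retry immediately */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                  /* nothing there yet: return, do not wait */
        return -1;                     /* genuine I/O error */
    }
}
```

The point is that "no data yet" is reported back to the caller rather than waited for inside the helper.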
It is stated that pqReadData() will NOT block. Now let's look inside it:
===================
    .....
    /* OK, try to read some data */
retry3:
    nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd,
                          conn->inBufSize - conn->inEnd);
    .....
    /*
     * Still not sure that it's EOF, because some data could have just
     * arrived.
     */
retry4:
    nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd,
                          conn->inBufSize - conn->inEnd);
    ....
====================
Now, in the case of SSL, pqsecure_read() is just a wrapper around
pgtls_read(), so let's look further:
====================
pgtls_read(PGconn *conn, void *ptr, size_t len)
{
    .....
rloop:
    SOCK_ERRNO_SET(0);
    n = SSL_read(conn->ssl, ptr, len);
    err = SSL_get_error(conn->ssl, n);
    switch (err)
    {
        ......
            break;
        case SSL_ERROR_WANT_WRITE:
            /* Returning 0 here would cause caller to wait for read-ready,
             * which is not correct since what SSL wants is wait for
             * write-ready. The former could get us stuck in an infinite
             * wait, so don't risk it; busy-loop instead. */
            goto rloop;
======================
So, going PQconsumeInput() -> pqReadData() -> pqsecure_read() -> pgtls_read()
in a supposedly non-blocking operation, we finally arrive at a tight
busy-loop that spins until SSL_ERROR_WANT_WRITE goes away! How can such
a thing exist,
- with not even a sleep(1),
- no timeout,
- no diagnostics of any sort,
- and a comment implying that getting stuck in a (potentially) infinite
sleepless loop deep inside a library is OK?
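If a library really must wait internally, a less CPU-hungry alternative to spinning would be to block in poll() until the socket is writable, bounded by a timeout, and only then retry SSL_read(). The sketch below assumes POSIX poll(); wait_for_socket() is a hypothetical helper, not part of libpq. In a genuinely non-blocking API the library should arguably not wait at all, but this at least shows what "wait for write-ready" means instead of a busy-loop.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper, not part of libpq: wait until fd is writable
   (want_write != 0) or readable (want_write == 0), for at most timeout_ms
   milliseconds per poll() call; poll() is simply restarted after EINTR.
   Returns 1 if poll() reported an event (including POLLERR/POLLHUP, so the
   caller's next I/O call will surface the error), 0 on timeout, -1 on failure. */
static int wait_for_socket(int fd, int want_write, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = want_write ? POLLOUT : POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

A timeout of this kind also gives a natural place to emit a diagnostic, which the spinning loop above cannot do.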
And looking further into pgtls_read(), it seems its interface is simply
inadequate: it has no way to reliably report some important details to
its caller, namely the need to wait for write-readiness. It looks as if
SSL support was a quick-and-dirty hack rather than a consistently
integrated feature. Or am I reading it all wrong?
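To make the interface point concrete: a transport read routine could return a status saying which readiness it needs, and the caller (or the application, via the descriptor from PQsocket()) would then wait for exactly that. The names below are hypothetical and are not the libpq API; the only fact relied on is that SSL_read() can report SSL_ERROR_WANT_WRITE, e.g. during renegotiation.

```c
#include <poll.h>

/* Hypothetical result codes for a transport-level read/write routine. */
typedef enum {
    IO_OK,          /* bytes were transferred */
    IO_WANT_READ,   /* cannot make progress until the socket is readable */
    IO_WANT_WRITE,  /* cannot make progress until the socket is writable,
                       e.g. SSL_ERROR_WANT_WRITE reported by SSL_read() */
    IO_EOF,         /* peer closed the connection */
    IO_ERROR        /* unrecoverable error */
} io_status;

/* Translate a status into the poll() events the application should wait for.
   Returns 0 when there is nothing to wait for. */
static short poll_events_for(io_status st)
{
    switch (st) {
    case IO_WANT_READ:
        return POLLIN;
    case IO_WANT_WRITE:
        return POLLOUT;
    default:
        return 0;
    }
}
```

With something like this, the SSL_ERROR_WANT_WRITE branch of pgtls_read() could hand IO_WANT_WRITE up the stack instead of looping, and a non-blocking application could add POLLOUT to its own poll() set.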
Any thoughts?
[1] https://www.postgresql.org/docs/9.5/static/libpq-async.html
[2] https://www.postgresql.org/docs/9.5/static/libpq-ssl.html
Thank you,
Regards,
Nikolai
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general