Re: Blocking on a non-blocking socket?


 



----- Original Message -----
> From: "Wiebe Cazemier" <wiebe@xxxxxxxxxxxx>
> To: openssl-users@xxxxxxxxxxx
> Sent: Thursday, 23 May, 2024 12:22:31
> Subject: Blocking on a non-blocking socket?
>
> Hi List,
> 
> I have a very obscure problem with an application using O_NONBLOCK still
> blocking. Over the course of a year of running with hundreds of thousands of
> clients, it has happened twice over the last month that a worker thread froze.
> It's a long story, but I'm pretty sure it's not a deadlock or spinning event
> loop or something, primarily because the application recovers after about 20
> minutes with a client erroring out with ETIMEDOUT. Coincidentally, those 20
> minutes match the timeout described in the tcp man page [1].
> 
> It really looks like a non-blocking socket is still blocking. I found a report
> of a similar problem ([2]), but their understanding of SSL_MODE_AUTO_RETRY does
> not match the documentation.
> 
> So, is there any way an application that has SSL_MODE_AUTO_RETRY on
> (which is the default since 1.1.1) can block? Looking at the source code, I
> don't see any calls to fcntl() that remove O_NONBLOCK.
> 
> My IO method is SSL_read() and SSL_write() with an SSL object given to
> SSL_set_fd().
> 
> The only change I make to the default SSL modes is setting
> SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER.
> 
> There are two primary deployments of this application, one with OpenSSL 1.1.1
> and one with 3.0.0. Only 1.1.1 has shown this problem, but it may be a
> coincidence.
> 
> Side question: is it a problem to call SSL_set_fd() before using fcntl() to
> set O_NONBLOCK on the fd? I ask because the docs say "The BIO and hence the SSL
> engine inherit the behaviour of fd. If fd is non-blocking, the ssl will also
> have non-blocking behaviour." 'Inherit' may be the key word here; I'm not sure
> when that happens.
> 
> Regards,
> 
> Wiebe Cazemier


As a follow-up, the fault did turn out to be my own, as I imagine it is in [1] as well. They describe SSL_MODE_AUTO_RETRY as something that 'attempts to renegotiate a broken SSL connection', but all SSL_MODE_AUTO_RETRY really does is make SSL_read() keep reading further records, rather than returning, after it has processed a record that carried no application data.

Contrary to what I thought before, my code did have an unfortunate edge case: a while loop that kept retrying SSL_write() when there was no room in the socket's send buffer. From the outside the thread looked frozen, and the loop only ended when the connection eventually failed with ETIMEDOUT.

Well, it was educational at least...


[1] https://github.com/alanxz/rabbitmq-c/issues/586







