On Tue, Feb 28, 2023 at 12:53 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Mon, 27 Feb 2023 21:35:41 +0100 Eric Dumazet wrote:
> > This looks suspicious to me
> >
> > commit 79ffe6087e9145d2377385cac48d0d6a6b4225a5
> > Author: Jakub Kicinski <kuba@xxxxxxxxxx>
> > Date: Tue Nov 5 14:24:35 2019 -0800
> >
> >     net/tls: add a TX lock
> >
> >
> > If tls_sw_sendpage() has to call sk_stream_wait_memory(),
> > sk_stream_wait_memory() is properly releasing the socket lock,
> > but knows nothing about mutex_{un}lock(&tls_ctx->tx_lock);
>
> That's supposed to be the point of the lock, prevent new writers from
> messing with the partially pushed records when the original writer
> is waiting for write space.
>
> Obvious hack but the async crypto support makes TLS a bit of a mess :|
>
> sendpage_lock not taking tx_lock may lead to obvious problems, I'm not
> seeing where the deadlock is, tho..
>

This report mentions sendpage, but sendmsg() would have the same issue.

A thread might be blocked in sk_stream_wait_memory() with the mutex
held, for an arbitrary amount of time, say if the remote peer stays in
RWIN 0 for hours.

This prevents tx_work from making progress, and
tls_sw_cancel_work_tx() would be stuck forever.

The consensus is that the kernel shouts a warning if a thread has been
waiting on a mutex more than 120 seconds
(check_hung_uninterruptible_tasks()).
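
To make the pattern concrete, here is a rough userspace sketch using
pthreads. It is only an analogy, not the kernel code: the two mutexes
and the condition variable stand in for tls_ctx->tx_lock, the socket
lock and the write-space wakeup, and the helper name
demo_tx_lock_stall() is made up for illustration. The writer drops the
"socket lock" while it sleeps (as sk_stream_wait_memory() does) but
keeps the outer mutex, so anyone else needing that mutex cannot make
progress until write space shows up.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins (assumed names, not the kernel objects). */
static pthread_mutex_t tx_lock   = PTHREAD_MUTEX_INITIALIZER; /* ~ tls_ctx->tx_lock */
static pthread_mutex_t sock_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ socket lock */
static pthread_cond_t  wspace_cv = PTHREAD_COND_INITIALIZER;  /* ~ write space wakeup */
static pthread_cond_t  parked_cv = PTHREAD_COND_INITIALIZER;  /* test-only handshake */
static bool write_space;   /* false while the "peer" advertises RWIN 0 */
static bool writer_parked; /* true once the writer is about to sleep */

/* Models a sender that must wait for write space. */
static void *writer(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&tx_lock);
    pthread_mutex_lock(&sock_lock);
    writer_parked = true;
    pthread_cond_signal(&parked_cv);
    /* Like sk_stream_wait_memory(): sock_lock is released while
     * sleeping, but tx_lock stays held the whole time. */
    while (!write_space)
        pthread_cond_wait(&wspace_cv, &sock_lock);
    pthread_mutex_unlock(&sock_lock);
    pthread_mutex_unlock(&tx_lock);
    return NULL;
}

/* Reports whether tx_lock can be taken while the writer sleeps,
 * and again after the writer has been woken and has exited. */
static void demo_tx_lock_stall(int *rc_while_blocked, int *rc_after_wakeup)
{
    pthread_t tid;
    int rc;

    write_space = false;
    writer_parked = false;
    rc = pthread_create(&tid, NULL, writer, NULL);
    assert(rc == 0);

    /* Wait until the writer sleeps; we can take sock_lock here
     * precisely because the writer released it. */
    pthread_mutex_lock(&sock_lock);
    while (!writer_parked)
        pthread_cond_wait(&parked_cv, &sock_lock);
    pthread_mutex_unlock(&sock_lock);

    /* The role of tx_work or a second sender: it needs tx_lock. */
    *rc_while_blocked = pthread_mutex_trylock(&tx_lock);
    if (*rc_while_blocked == 0)
        pthread_mutex_unlock(&tx_lock);

    /* The "peer" finally opens its window. */
    pthread_mutex_lock(&sock_lock);
    write_space = true;
    pthread_cond_signal(&wspace_cv);
    pthread_mutex_unlock(&sock_lock);
    rc = pthread_join(tid, NULL);
    assert(rc == 0);

    *rc_after_wakeup = pthread_mutex_trylock(&tx_lock);
    if (*rc_after_wakeup == 0)
        pthread_mutex_unlock(&tx_lock);
}
```

Calling demo_tx_lock_stall(&a, &b) should leave a == EBUSY and b == 0.
The sketch uses trylock so that it terminates; a real waiter doing a
plain mutex_lock() would simply sleep for as long as the peer keeps
its window closed.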