On 06/08/2024 13:07, Tariq Toukan wrote:
On 06/08/2024 11:09, Sagi Grimberg wrote:
On 06/08/2024 7:43, Tariq Toukan wrote:
On 05/08/2024 14:43, Sagi Grimberg wrote:
On 05/08/2024 13:40, Tariq Toukan wrote:
Hi,
A recent patch [1] to 'fs' broke the TX TLS device-offloaded flow
starting from v6.11-rc1.
The kernel crashes. Different runs result in different kernel traces.
See below [2].
All of them disappear once patch [1] is reverted.
The issue appears only with "sendfile on and zerocopy on".
We couldn't repro with "sendfile off", or with "sendfile on and
zerocopy off".
The repro test is as simple as a repeated client/server
communication (wrk/nginx), with sendfile on and zc on, and with
"tls-hw-tx-offload: on".
$ for i in `seq 10`; do wrk -b::2:2:2:3 -t10 -c100 -d15 --timeout 5s https://[::2:2:2:2]:20448/16000b.img; done
We can provide more details if needed, to help with the analysis
and debug.
Does tls sw (i.e. no offload) also break?
No it doesn't.
Only the "sendfile with ZC" flow of the TX device-offloaded TLS.
Adding Maxim Mikityanskiy, he might have some insights.
Not familiar with the TLS offload code, are there any assumptions on
PAGE_SIZE contig buffers? Or assumptions on individual
page references/lifetime?
The sporadic panics you reported look like a result of memory
corruption or use-after-free conditions.
You can find the original patch that implements it here:
c1318b39c7d3 tls: Add opt-in zerocopy mode of sendfile()
In this flow (sendfile + ZC), the page is shared between the kernel and
userspace, and the extra copy is skipped.
There were a few code changes in this area since the feature was introduced.
Adding relevant people, including David Howells <dhowells@xxxxxxxxxx>, who
removed the sendpage() routine and added MSG_SPLICE_PAGES support to
tls_device.
Regards,
Tariq