On Thu, Dec 14, 2023 at 12:19:43PM +0300, Arseniy Krasnov wrote:
Hello,

                               DESCRIPTION

This patchset fixes an old problem where both the rx and tx sides hang,
and adds a test for it. The hang happens due to a non-default
SO_RCVLOWAT value and the deferred credit update in virtio/vsock.
Link to the previous old patchset:
https://lore.kernel.org/netdev/39b2e9fd-601b-189d-39a9-914e5574524c@xxxxxxxxxxxxxx/

Here is what happens step by step:

                  TEST

            INITIAL CONDITIONS

1) Vsock buffer size is 128KB.
2) Maximum packet size is 64KB as defined in the header (yes, it is
   hardcoded, just to remind about that value).
3) SO_RCVLOWAT is default, i.e. 1 byte.

                 STEPS

            SENDER                          RECEIVER
1) sends 128KB + 1 byte in a
   single buffer. 128KB will
   be sent, but for 1 byte
   sender will wait for free
   space at peer. Sender goes
   to sleep.

2)                                     reads 64KB, credit update not sent
3)                                     sets SO_RCVLOWAT to 64KB + 1
4)                                     poll() -> wait forever, there is
                                       only 64KB available to read.

So in step 4) the receiver also goes to sleep, waiting for enough data
or a connection shutdown message from the sender. The idea of the fix
is that rx kicks the tx side to continue transmission (and maybe close
the connection) when rx changes the number of bytes needed to be woken
up (i.e. SO_RCVLOWAT) and this value is bigger than the number of bytes
available to read.

I've added a small test for this, but I'm not sure about it, as it uses
a hardcoded value for the maximum packet length. This value is defined
in a kernel header and used to control the deferred credit update. As
it is not available to userspace, I can't control the test parameters
correctly (if one day this define is changed, the test may become
useless).
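
To make the sequence above concrete, here is a minimal userspace sketch in C of the credit accounting. It is a simplified model under stated assumptions, not the kernel code: the struct, all function names, and the constants `BUF_SIZE` and `MAX_PKT_BUF` are hypothetical and chosen only to mirror the 128KB / 64KB values from the scenario. The "fix" branch models the idea described above (send a credit update when the new SO_RCVLOWAT exceeds the bytes available to read); the exact condition used by the patches may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KB          1024u
#define BUF_SIZE    (128u * KB) /* vsock buffer size from the scenario */
#define MAX_PKT_BUF (64u * KB)  /* stand-in for the hardcoded max packet size */

/* Hypothetical model of one connection; not the kernel's structures. */
struct model {
	/* receiver side */
	uint32_t rx_bytes;     /* bytes queued and not yet read */
	uint32_t fwd_cnt;      /* total bytes consumed by the reader */
	uint32_t last_fwd_cnt; /* fwd_cnt value last advertised to the sender */
	uint32_t rcvlowat;     /* modelled SO_RCVLOWAT */
	/* sender side */
	uint32_t tx_cnt;       /* total bytes sent so far */
	uint32_t peer_fwd_cnt; /* last fwd_cnt the sender has heard about */
	uint32_t pending;      /* bytes the sender still wants to send */
};

/* Sender transmits as much as its view of the peer's credit allows. */
static void sender_push(struct model *m)
{
	uint32_t in_flight = m->tx_cnt - m->peer_fwd_cnt;
	uint32_t credit = BUF_SIZE - in_flight;
	uint32_t n = m->pending < credit ? m->pending : credit;

	m->tx_cnt += n;
	m->rx_bytes += n;
	m->pending -= n;
}

/* Receiver advertises its fwd_cnt; the sleeping sender wakes and continues. */
static void send_credit_update(struct model *m)
{
	m->last_fwd_cnt = m->fwd_cnt;
	m->peer_fwd_cnt = m->fwd_cnt;
	sender_push(m);
}

/*
 * Receiver reads n bytes. The credit update is deferred for as long as
 * the buffer size minus the not-yet-advertised bytes is >= MAX_PKT_BUF.
 */
static void receiver_read(struct model *m, uint32_t n)
{
	uint32_t free_space;

	assert(n <= m->rx_bytes);
	m->rx_bytes -= n;
	m->fwd_cnt += n;

	free_space = BUF_SIZE - (m->fwd_cnt - m->last_fwd_cnt);
	if (free_space < MAX_PKT_BUF)
		send_credit_update(m);
}

/* Modelled fix: kick the sender if the new low-water mark cannot be met. */
static void receiver_set_rcvlowat(struct model *m, uint32_t val, bool with_fix)
{
	m->rcvlowat = val;
	if (with_fix && val > m->rx_bytes && m->fwd_cnt != m->last_fwd_cnt)
		send_credit_update(m);
}

/* Returns true if poll() on the receiver would report readable data. */
static bool run_scenario(bool with_fix)
{
	struct model m = { .rcvlowat = 1, .pending = BUF_SIZE + 1 };

	sender_push(&m);                                   /* step 1 */
	receiver_read(&m, 64u * KB);                       /* step 2 */
	receiver_set_rcvlowat(&m, 64u * KB + 1, with_fix); /* step 3 */
	return m.rx_bytes >= m.rcvlowat;                   /* step 4 */
}
```

In this model, `run_scenario(false)` leaves the receiver with exactly 64KB queued against a low-water mark of 64KB + 1 while the sender still holds 1 pending byte, so neither side can make progress. `run_scenario(true)` lets the deferred update go out in step 3, the last byte arrives, and the poll condition is satisfied.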
Head for this patchset is:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=9bab51bd662be4c3ebb18a28879981d69f3ef15a

Link to v1:
https://lore.kernel.org/netdev/20231108072004.1045669-1-avkrasnov@xxxxxxxxxxxxxxxxx/
Link to v2:
https://lore.kernel.org/netdev/20231119204922.2251912-1-avkrasnov@xxxxxxxxxxxxxxxxx/
Link to v3:
https://lore.kernel.org/netdev/20231122180510.2297075-1-avkrasnov@xxxxxxxxxxxxxxxxx/
Link to v4:
https://lore.kernel.org/netdev/20231129212519.2938875-1-avkrasnov@xxxxxxxxxxxxxxxxx/
Link to v5:
https://lore.kernel.org/netdev/20231130130840.253733-1-avkrasnov@xxxxxxxxxxxxxxxxx/
Link to v6:
https://lore.kernel.org/netdev/20231205064806.2851305-1-avkrasnov@xxxxxxxxxxxxxxxxx/
Link to v7:
https://lore.kernel.org/netdev/20231206211849.2707151-1-avkrasnov@xxxxxxxxxxxxxxxxx/
Link to v8:
https://lore.kernel.org/netdev/20231211211658.2904268-1-avkrasnov@xxxxxxxxxxxxxxxxx/

Changelog:
v1 -> v2:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 * New patch is added as 0001 - it removes return from SO_RCVLOWAT set
   callback in 'af_vsock.c' when transport callback is set - with that
   we can set 'sk_rcvlowat' only once in 'af_vsock.c' and in future do
   not copy-paste it to every transport. It was discussed in v1.
 * See per-patch changelog after ---.
v2 -> v3:
 * See changelog after --- in 0003 only (0001 and 0002 still same).
v3 -> v4:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 * See per-patch changelog after ---.
v4 -> v5:
 * Change patchset tag 'RFC' -> 'net-next'.
 * See per-patch changelog after ---.
v5 -> v6:
 * New patch 0003 which sends credit update during reading bytes from
   socket.
 * See per-patch changelog after ---.
v6 -> v7:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 * See per-patch changelog after ---.
v7 -> v8:
 * See per-patch changelog after ---.
v8 -> v9:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 * Add 'Fixes' tag for the current 0002.
 * Reorder patches by moving two fixes first.

Arseniy Krasnov (4):
  virtio/vsock: fix logic which reduces credit update messages
  virtio/vsock: send credit update during setting SO_RCVLOWAT
  vsock: update SO_RCVLOWAT setting callback
  vsock/test: two tests to check credit update logic

This order will break bisectability, since patch 2 will now not build
unless patch 3 is applied.

So you need to implement `set_rcvlowat` in patch 2 and update it to
`notify_set_rcvlowat` in patch 3; otherwise we always need to backport
patch 3 to stable branches, and it would have to be applied before
patch 2.

You have 2 options:
a. move patch 3 before patch 2 without changing the code
b. change patch 2 to use `set_rcvlowat` and update that code in patch 3

I don't have a strong opinion, but I slightly prefer option a. However,
that forces us to backport more patches to stable branches, so I'm fine
with option b as well.

That said:
Nacked-by: Stefano Garzarella <sgarzare@xxxxxxxxxx>