On Mon, Dec 09, 2019 at 03:19:59PM +0100, Marek Marczykowski-Górecki wrote: > On Mon, Dec 09, 2019 at 03:35:43PM +0900, Suwan Kim wrote: > > On Mon, Dec 09, 2019 at 04:37:40AM +0100, Marek Marczykowski-Górecki wrote: > > > On Mon, Dec 09, 2019 at 11:01:30AM +0900, Suwan Kim wrote: > > > > On Fri, Dec 06, 2019 at 09:57:42PM +0100, Marek Marczykowski-Górecki wrote: > > > > > [ 212.890519] usb 1-1: recv xbuf, 42 > > > > > > > > This message is printed by receive error and before that, driver > > > > canceled URB transmission. we need to know the exact situation > > > > before this message. > > > > > > I've added some more messages and found recv_size is 0. > > > > That is the bug point. "size" is urb->actual_length that means > > amount of data actually received from device. And "copy" is > > amount of data received from usbip server. So, in this situation, > > vhci-hcd received all the data from usbip server even if there > > are more sg entries left. So, "copy == 0" means vhci-hcd receives > > all data from the server and we should check "if (copy == 0)" in > > for_each_sg() loop of usbip_recv_xbuff() to exit the loop and not > > to add error event. > > That makes sense. But I think there is also another issue here: hang in > case of an error. Here it was EINVAL, but there are probably other > reasons why usbip_recv can fail, like network error or misbehaving > server. This definitely should not cause the client to fail this way... > And also, the actual error code is lost. I agree. I have been taking a look at it and trying to reproduce the same issue on my machine. I guess race condition between hub irq thread and driver (vhci_urb_enqueue or hcd?) But I'm not sure... > > > > > Could you send me a longer log messages showing the situation > > > > before "[ 212.890519] usb 1-1: recv xbuf, 42"? > > > > > > Sure, with added extra messages (debug patch below). > > > > > > [ 131.397522] usb 1-1: num_sgs 0 > > > [ 131.406588] usb 1-1: num_sgs 0 > > > [ 131.410621] usb 1-1: num_sgs 0 > > > [ 131.411950] usb 1-1: num_sgs 0 > > > [ 131.413186] usb 1-1: num_sgs 0 > > > [ 131.414590] usb 1-1: num_sgs 0 > > > [ 131.417086] usb 1-1: num_sgs 0 > > > [ 131.418188] usb 1-1: num_sgs 0 > > > [ 131.419228] usb 1-1: num_sgs 0 > > > [ 131.420248] usb 1-1: num_sgs 0 > > > [ 131.457315] usb 1-1: num_sgs 5 > > > [ 131.457345] usb 1-1: size 42, copy 42 recv 42, recv_size 42, sg->length 16384 > > > > Device sent 42 bytes data (size 42) and vhci-hcd received 42 bytes > > data from the server. vhci-hcd received all the data and It should > > exit the loop. > > > > > [ 131.457359] usb 1-1: size 42, copy 0 recv -22, recv_size 0, sg->length 16384 > > > [ 131.457372] usb 1-1: recv xbuf, 42 size 42 > > > [ 131.458263] vhci_hcd: vhci_shutdown_connection:1024: stop threads > > > [ 131.458318] vhci_hcd: vhci_shutdown_connection:1032: release socket > > > [ 131.458431] vhci_hcd: vhci_shutdown_connection:1058: disconnect device > > > [ 131.460171] usb 1-1: USB disconnect, device number 2 > > > > > > (...) > > > > > > If I add "if (!recv_size) continue;" there, it works! > > > > I think we should check "copy" not the "recv_size" because "copy" > > shows the amount of data received from the server. > > > > int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb) > > ... > > ... > > if (urb->num_sgs) { > > copy = size; > > for_each_sg(urb->sg, sg, urb->num_sgs, i) { > > int recv_size; > > > > if (copy < sg->length) > > recv_size = copy; > > else > > recv_size = sg->length; > > > > recv = usbip_recv(ud->tcp_socket, sg_virt(sg), > > recv_size); > > > > if (recv != recv_size) > > goto error; > > > > copy -= recv; > > ret += recv; > > > > /* Add here */ > > if (!copy) > > break; > > ^^^^^^^^^^^^^^ > > } > > This helps too. Good. We should fix this issue first. I will submit a patch soon. Regrads, Suwan Kim