On Sat, Jan 11, 2025 at 2:31 AM Junzhong Pan <panjunzhong@xxxxxxxxxxx> wrote: > > Hi Maciej > > Thanks for the reply, I am sorry for the unclear description. > > +remove hgajjar@xxxxxxxxxxxxxx from CC list due to mail delivery failure. > > On 2025/1/11 3:00, Maciej Żenczykowski wrote: > > On Thu, Jan 9, 2025 at 11:37 PM Junzhong Pan <panjunzhong@xxxxxxxxxxx> wrote: > >> Lecroy shows the wSequence=5765 have 10 Datagram consisting a 31*512 > >> bytes=15872 bytes OUT Transfer but have no ZLP: > >> > >> OUT Transfer wSequence=5765 > >> NTH32 Datagrams: 1514B * 8 + 580B NDP32 > >> Transfer length: 512B * 31 > >> NO ZLP > >> OUT Transfer wSequence=5766 > >> NTH32 Datagrams: 1514B * 8 NDP32 > >> Transfer length: 512B * 29 + 432 > >> > >> This lead to a result that the first givebacked 16K skb correponding to > >> a usb_request contains two NTH but not complete: > >> > >> USB Request 1 SKB 16384B > >> (NTH32) (Datagrams) (NDP32) | (NTH32) (Datagrams piece of wSequence=5766) > >> USB Request 2 SKB 14768B > >> (Datagrams piece of wSequence=5766) (NDP32) > >> > >> From the context, it seems the first report of Wrong NDP SIGN is caused > >> by out-of-bound accessing, the second report of Wrong NTH SIGN is caused > >> by a wrong beginning of NTB parsing. > > > > Could you clarify which Linux Kernel you're testing against? > > I am using linux 6.6.63, but the related codes have not newer updates since. > > > Could you provide some pcap of the actual usb frames? > > Or perhaps describe better the problem, because I'm not quite > > following from your email. > > (I'm not sure if the problem is what windows is sending, or how Linux > > is parsing it) > > I think the root cause of the problem is because Windows 10 ncm class > driver doesn't send ZLP, meanwhile the current parsing is depend on a condition > that NTB won't spread across usb_request. > > This is observed only on Windows 10 but not on Windows 11. > > To reproduce the issue, you just need a Windows 10 machine and run iperf3 -c > from the windows and iperf3 -s on the gadget board, configure the gadget > with the following os_desc, windows10 will bind its ncm driver automatically: > echo 0xEF > $GADGET/bDeviceClass > echo 0x02 > $GADGET/bDeviceSubClass > echo 0x01 > $GADGET/bDeviceProtocol > echo 1 > $GADGET/os_desc/use > echo 0x1 > $GADGET/os_desc/b_vendor_code > echo "MSFT100" > $GADGET/os_desc/qw_sign > mkdir -p $FUNCTIONS/ncm.0/os_desc/interface.ncm > echo WINNCM > $FUNCTIONS/ncm.0/os_desc/interface.ncm/compatible_id > > Reproduced on dwc2 and dwc3 controller with linux 6.6.63. > > > I *think* what you're saying is that wSequence=5765 & 5766 are being > > treated as a single ncm message due to their being a multiple of 512 > > in the former, not followed by a ZLP? I thought that was precisely > > when microsoft ncm added an extra zero byte... > > > > What's at the end of 5755? Padding? No padding? > > Is there an NTH32 header in 5766? Should there be? > > > > Okay, I will explain it more precisely (some numbers are corrected). > On the USB wire, the transfers on OUT endpoint is like this: > > OUT Transfer #1 issued by win10, Transfer length: 512B * 31 = 15872 Bytes > OUT Transaction 512B for 31 times, no ZLP and other things. Parsed as below: > [Offset 0x0000] A NTH32 Header with wSequence=5765, dwNdpIndex=0x3D90 > [Offset 0x0012] Datagram blocks > [Offset 0x3D90] A NDP32 describing 11 Datagram blocks plus one zero length item and padding, len = 112 Bytes. > (0)index=0x0012 length=1514 > (1)index=0x05FE length=1514 > (2)index=0x0BEA length=1514 > (3)index=0x11D6 length=1514 > (4)index=0x17C2 length=1514 > (5)index=0x1DAE length=1514 > (6)index=0x239A length=1514 > (7)index=0x2986 length=1514 > (8)index=...... length=1514 > (9)index=...... length=1514 > (10)index=..... length=580 > (11)index=0x0 length=0 > ...padding... > > OUT Transfer #2 issued by win10, Transfer length: 512B * 29 + 432 = 15280 Bytes > OUT Transaction 512B for 29 times, and OUT Transaction 432B for one time. > [Offset 0x0000] A NTH32 with wSequence=5766, dwNdpIndex=0x3B48 > [Offset 0x0012] Datagram block > [Offset 0x3B48] A NDP32 describing Datagram blocks > > In the u_ether driver, we first queued a usb_request(buf=skb.data, length=16384) > to the udc driver, the udc should give it back when it receive a 16384B data or > encounter ZLP/Short packet. > > Since the OUT Xfer #1 doesn't have ZLP or SP, the udc won't giveback the usb_request > with req->actual = 15872, instead, it gather some piece of the data from OUT Transfer #2 > to make a req->actual=16384 then return to rx_complete. > > For f_ncm, we now have a the first usb_request givebacked from udc driver having > skb->data organized as the following structure: > > usb_request #1 > -----------------------from OUT Transfer #1:------------------- > [Offset 0x0000] A NTH32 Header with wSequence=5765, dwNdpIndex=0x3D90 > [Offset 0x0012] Datagram blocks > [Offset 0x3D90] A NDP32 describing 10 Datagram blocks plus one zero length item and padding, len = 112 Bytes. > (0)index=0x0012 length=1514 > (1)index=0x05FE length=1514 > (2)index=0x0BEA length=1514 > (3)index=0x11D6 length=1514 > (4)index=0x17C2 length=1514 > (5)index=0x1DAE length=1514 > (6)index=0x239A length=1514 > (7)index=0x2986 length=1514 > (8)index=...... length=1514 > (9)index=...... length=1514 > (10)index=..... length=580 > (11)index=0x0 length=0 > ...padding... > --------------------from OUT Transfer #2:------------------- > [Offset 0x3E00/15872] A NTH32 Header with wSequence=5766, dwNdpIndex=0x3B48 > [Offset 0x3E00+???] Datagram block piece from next NDP (wSequence=5766) > > During the unwrap, we will first report a "Wrong NTH SIGN" when we try to redirect to > the NDP of wSequence=5765 parsing the second NTB in the usb_request, since theie is no > bound checking, we read some garbage data from skb->data + [0x3E00 + 0x3B48]. > > Raising the following message (without modification): > [ 174] configfs-gadget.0: Wrong NDP SIGN > > After this, we go to the next usb_request, at this time, we have a short package(432B) from > the OUT Transfer #2, thus the udc would giveback next usb_request to us like this: > > usb_request #2 > --------------------from OUT Transfer #2:------------------- > [Offset 0x0000] Datagram block piece from NTB of wSequence=5765 > 3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa > [Offset 0x????] A NDP32 describing Datagram blocks > > At this point, the unwrap function try to read a NTH sign but got unexpected user data, > then it raise this message: > [ 174] configfs-gadget.0: Wrong NTH SIGN, skblen 14768 > [ 174] HEAD:00000000b1a72bfc: 3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa > > In short, there is two NTBs across two usb_requests, the beginning of a usb_request is > not necessary a NTH sign. > Do this make sense to you? > > Best Regards, > Pan Ok, this was *very* helpful. I'll dig into this more during the week, but here's a few quick questions/statements that immediately come to mind. (a) I think this looks like a bug on the sending Win10 side, rather than a parsing bug in Linux, with there being no ZLP, and no short (<512) frame, there's simply no way for the receiver to split at the right spot. Indeed, fixing this on the Linux/parsing side seems non-trivial... I guess we could try to treat the connection as simply a serial connection (ie. ignore frame boundaries), but then we might have issues with other senders... I guess the most likely 'correct' hack/fix would be to hold on to the extra 'N*512' bytes (it doesn't even have to be 1, though likely the N is odd), if it starts with a NTH header... (b) I notice the '512' not '1024', I think this implies a USB2 connection instead of USB3 -- could you try replicating this with a USB3 capable data cable (and USB3 ports), this should result in 1024 block size instead of 512. I'm wondering if the win10 stack is avoiding generating N*1024, but then hitting N*512 with odd N... Presumably '512' would be '64' with USB1.0/1.1, but I guess finding a USB1.x port/host to test against is likely to be near impossible... I'll try to see if I can find the source of the bug in the Win driver's sources (though based on it being Win10 only, may need to search history) -- Maciej Żenczykowski, Kernel Networking Developer @ Google