Re: [PATCH v3] usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jan 11, 2025 at 2:31 AM Junzhong Pan <panjunzhong@xxxxxxxxxxx> wrote:
>
> Hi Maciej
>
> Thanks for the reply, I am sorry for the unclear description.
>
> +remove hgajjar@xxxxxxxxxxxxxx from CC list due to mail delivery failure.
>
> On 2025/1/11 3:00, Maciej Żenczykowski wrote:
> > On Thu, Jan 9, 2025 at 11:37 PM Junzhong Pan <panjunzhong@xxxxxxxxxxx> wrote:
> >> Lecroy shows the wSequence=5765 have 10 Datagram consisting a 31*512
> >> bytes=15872 bytes OUT Transfer but have no ZLP:
> >>
> >> OUT Transfer wSequence=5765
> >>         NTH32 Datagrams: 1514B * 8 + 580B NDP32
> >>         Transfer length: 512B * 31
> >>         NO ZLP
> >> OUT Transfer wSequence=5766
> >>         NTH32 Datagrams: 1514B * 8 NDP32
> >>         Transfer length: 512B * 29  + 432
> >>
> >> This lead to a result that the first givebacked 16K skb correponding to
> >> a usb_request contains two NTH but not complete:
> >>
> >> USB Request 1 SKB 16384B
> >>         (NTH32) (Datagrams) (NDP32) | (NTH32) (Datagrams piece of wSequence=5766)
> >> USB Request 2 SKB 14768B
> >>         (Datagrams piece of wSequence=5766) (NDP32)
> >>
> >>  From the context, it seems the first report of Wrong NDP SIGN is caused
> >> by out-of-bound accessing, the second report of Wrong NTH SIGN is caused
> >> by a wrong beginning of NTB parsing.
> >
> > Could you clarify which Linux Kernel you're testing against?
>
> I am using linux 6.6.63, but the related codes have not newer updates since.
>
> > Could you provide some pcap of the actual usb frames?
> > Or perhaps describe better the problem, because I'm not quite
> > following from your email.
> > (I'm not sure if the problem is what windows is sending, or how Linux
> > is parsing it)
>
> I think the root cause of the problem is because Windows 10 ncm class
> driver doesn't send ZLP, meanwhile the current parsing is depend on a condition
> that NTB won't spread across usb_request.
>
> This is observed only on Windows 10 but not on Windows 11.
>
> To reproduce the issue, you just need a Windows 10 machine and run iperf3 -c
> from the windows and iperf3 -s on the gadget board, configure the gadget
> with the following os_desc, windows10 will bind its ncm driver automatically:
>         echo 0xEF >  $GADGET/bDeviceClass
>         echo 0x02 > $GADGET/bDeviceSubClass
>         echo 0x01 > $GADGET/bDeviceProtocol
>         echo 1 > $GADGET/os_desc/use
>         echo 0x1 > $GADGET/os_desc/b_vendor_code
>         echo "MSFT100" > $GADGET/os_desc/qw_sign
>         mkdir -p $FUNCTIONS/ncm.0/os_desc/interface.ncm
>         echo WINNCM > $FUNCTIONS/ncm.0/os_desc/interface.ncm/compatible_id
>
> Reproduced on dwc2 and dwc3 controller with linux 6.6.63.
>
> > I *think* what you're saying is that wSequence=5765 & 5766 are being
> > treated as a single ncm message due to their being a multiple of 512
> > in the former, not followed by a ZLP?  I thought that was precisely
> > when microsoft ncm added an extra zero byte...
> >
> > What's at the end of 5755?  Padding? No padding?
> > Is there an NTH32 header in 5766?  Should there be?
> >
>
> Okay, I will explain it more precisely (some numbers are corrected).
> On the USB wire, the transfers on OUT endpoint is like this:
>
> OUT Transfer #1 issued by win10, Transfer length: 512B * 31 = 15872 Bytes
>         OUT Transaction 512B for 31 times, no ZLP and other things. Parsed as below:
>         [Offset 0x0000] A NTH32 Header with wSequence=5765, dwNdpIndex=0x3D90
>         [Offset 0x0012] Datagram blocks
>         [Offset 0x3D90] A NDP32 describing 11 Datagram blocks plus one zero length item and padding, len = 112 Bytes.
>                         (0)index=0x0012 length=1514
>                         (1)index=0x05FE length=1514
>                         (2)index=0x0BEA length=1514
>                         (3)index=0x11D6 length=1514
>                         (4)index=0x17C2 length=1514
>                         (5)index=0x1DAE length=1514
>                         (6)index=0x239A length=1514
>                         (7)index=0x2986 length=1514
>                         (8)index=...... length=1514
>                         (9)index=...... length=1514
>                         (10)index=..... length=580
>                         (11)index=0x0 length=0
>                         ...padding...
>
> OUT Transfer #2 issued by win10, Transfer length: 512B * 29  + 432 = 15280 Bytes
>         OUT Transaction 512B for 29 times, and OUT Transaction 432B for one time.
>         [Offset 0x0000] A NTH32 with wSequence=5766, dwNdpIndex=0x3B48
>         [Offset 0x0012] Datagram block
>         [Offset 0x3B48] A NDP32 describing Datagram blocks
>
> In the u_ether driver, we first queued a usb_request(buf=skb.data, length=16384)
> to the udc driver, the udc should give it back when it receive a 16384B data or
> encounter ZLP/Short packet.
>
> Since the OUT Xfer #1 doesn't have ZLP or SP, the udc won't giveback the usb_request
> with req->actual = 15872, instead, it gather some piece of the data from OUT Transfer #2
> to make a req->actual=16384 then return to rx_complete.
>
> For f_ncm, we now have a the first usb_request givebacked from udc driver having
> skb->data organized as the following structure:
>
>       usb_request #1
>         -----------------------from OUT Transfer #1:-------------------
>         [Offset 0x0000] A NTH32 Header with wSequence=5765, dwNdpIndex=0x3D90
>         [Offset 0x0012] Datagram blocks
>         [Offset 0x3D90] A NDP32 describing 10 Datagram blocks plus one zero length item and padding, len = 112 Bytes.
>                         (0)index=0x0012 length=1514
>                         (1)index=0x05FE length=1514
>                         (2)index=0x0BEA length=1514
>                         (3)index=0x11D6 length=1514
>                         (4)index=0x17C2 length=1514
>                         (5)index=0x1DAE length=1514
>                         (6)index=0x239A length=1514
>                         (7)index=0x2986 length=1514
>                         (8)index=...... length=1514
>                         (9)index=...... length=1514
>                         (10)index=..... length=580
>                         (11)index=0x0 length=0
>                         ...padding...
>         --------------------from OUT Transfer #2:-------------------
>         [Offset 0x3E00/15872] A NTH32 Header with wSequence=5766, dwNdpIndex=0x3B48
>         [Offset 0x3E00+???] Datagram block piece from next NDP (wSequence=5766)
>
> During the unwrap, we will first report a "Wrong NTH SIGN" when we try to redirect to
> the NDP of wSequence=5765 parsing the second NTB in the usb_request, since theie is no
> bound checking, we read some garbage data from skb->data + [0x3E00 + 0x3B48].
>
> Raising the following message (without modification):
> [  174] configfs-gadget.0: Wrong NDP SIGN
>
> After this, we go to the next usb_request, at this time, we have a short package(432B) from
> the OUT Transfer #2, thus the udc would giveback next usb_request to us like this:
>
>       usb_request #2
>         --------------------from OUT Transfer #2:-------------------
>         [Offset 0x0000]  Datagram block piece from NTB of wSequence=5765
>                          3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa
>         [Offset 0x????] A NDP32 describing Datagram blocks
>
> At this point, the unwrap function try to read a NTH sign but got unexpected user data,
> then it raise this message:
> [  174] configfs-gadget.0: Wrong NTH SIGN, skblen 14768
> [  174] HEAD:00000000b1a72bfc: 3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa
>
> In short, there is two NTBs across two usb_requests, the beginning of a usb_request is
> not necessary a NTH sign.
> Do this make sense to you?
>
> Best Regards,
> Pan

Ok, this was *very* helpful.  I'll dig into this more during the week,
but here's a few quick questions/statements that immediately come to
mind.

(a) I think this looks like a bug on the sending Win10 side, rather
than a parsing bug in Linux,
with there being no ZLP, and no short (<512) frame, there's simply no
way for the receiver to split at the right spot.
Indeed, fixing this on the Linux/parsing side seems non-trivial...
I guess we could try to treat the connection as simply a serial
connection (ie. ignore frame boundaries), but then we might have
issues with other senders...

I guess the most likely 'correct' hack/fix would be to hold on to the
extra 'N*512' bytes (it doesn't even have to be 1, though likely the N
is odd), if it starts with a NTH header...

(b) I notice the '512' not '1024', I think this implies a USB2
connection instead of USB3
-- could you try replicating this with a USB3 capable data cable (and
USB3 ports), this should result in 1024 block size instead of 512.

I'm wondering if the win10 stack is avoiding generating N*1024, but
then hitting N*512 with odd N...

Presumably '512' would be '64' with USB1.0/1.1, but I guess finding a
USB1.x port/host to test against is likely to be near impossible...

I'll try to see if I can find the source of the bug in the Win
driver's sources (though based on it being Win10 only, may need to
search history)

--
Maciej Żenczykowski, Kernel Networking Developer @ Google





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux