Re: Receiving report multiple times when changing cpu

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Wed, Oct 12, 2022 at 04:35:52PM +0200, Thilo Rörig wrote:
> > Dear maintainer, dear community,
> > 
> > I'm new to looking into the linux kernel, but am hoping that this is the
> > right spot to address my problem.
> > Otherwise I would be very thankful, if you could point me in the right
> > direction.
> > 
> > I have a problem with a custom usb device on a raspberry pi running a
> > buildroot 64bit linux with a v5.19.14 kernel.
> > I ran into a problem reading data from a custom usb hid device using
> > interrupt transfers. I'm using
> > the python hidapi library (https://pypi.org/project/hidapi/) with the
> > libusb backend. After sending
> > a custom command to the device, the driver changes from usbhid to usbfs.
> > The device starts sending
> > reports of size 1023 bytes to endpoint 0x83. `usb-devices` yields the
> > following output:
> > 
> > T:  Bus=3D01 Lev=3D03 Prnt=3D13 Port=3D00 Cnt=3D01 Dev#=3D 14 Spd=3D480 MxC=
> > h=3D 0
> > D:  Ver=3D 2.00 Cls=3D00(>ifc ) Sub=3D00 Prot=3D00 MxPS=3D64 #Cfgs=3D  1
> > C:  #Ifs=3D 1 Cfg#=3D 1 Atr=3D80 MxPwr=3D500mA
> > I:  If#=3D 0 Alt=3D 0 #EPs=3D 1 Cls=3D03(HID  ) Sub=3D00 Prot=3D00 Driver=
> > =3Dusbfs
> > E:  Ad=3D83(I) Atr=3D03(Int.) MxPS=3D1024 Ivl=3D125us
> > 
> > The reports are numbered by an increasing report id and are available via
> > interrupt transfer approx. every
> > 230 microseconds (1.85 microframes). The device does not retain any old
> > reports.
> > 
> > Initially, I get all reports in the correct order, but after some time I
> > start getting reports that I have already
> > received previously. The report ids start to include a shift after a
> > certain number of reports and
> > schematically they look like the following:
> > 
> > ... 1,2,...,578,  238,239,...,816,  476,477,...,1054, and so on
> > 
> > So there is approximately the same number of consecutive "forward" reports
> > followed by a jump "backwards"
> > by approximately the same number of reports. I snooped the transferred data
> > using the `usbcore.usbfs_snoop` kernel
> > parameter but this slowed things down, so that I started missing reports
> > and could not reproduce the problem.
> 
> As Greg mentioned, usbmon is a better way to snoop the data transfers.

I have continued to investigate the issue using usbmon, but unfortunately, I
could not find any other irregularity, but the jumps in the report id pattern.

> > Then I traced the `xhci-hcd` events and observed that the pattern started
> > to occur, when my python process was
> > rescheduled from one cpu-core to another
> 
> How on earth do you know when your python process is rescheduled from 
> one CPU core to another?  That information certainly is not part of the 
> xhci-hcd trace output.
> 

Sorry for the messed up output of the xhci_hcd trace. Maybe I am not
interpreting the information in the log correctly as it is the first time that
I'm looking at kernel traces. To get a better view I have reformatted the log:

...

                         _-----=> irqs-off/BH-disabled
                        / _----=> need-resched
                       | / _---=> hardirq/softirq
                       || / _--=> preempt-depth
                       ||| / _-=> migrate-disable
                       |||| /     delay
    TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
       | |         |   |||||     |         |
----------------------------------------------------------------------------
  python-22331   [000] ..... 174497.068723: xhci_urb_enqueue: ep3in-intr: ...
  python-22331   [000] d.... 174497.068727: xhci_queue_trb: ...
  python-22331   [000] d.... 174497.068728: xhci_inc_enq: INTR ...
  python-22331   [000] d.... 174497.068729: xhci_ring_ep_doorbell: ...
  <idle>-0       [000] d.h.. 174497.068814: xhci_handle_transfer: INTR: ...
  <idle>-0       [000] d.h.. 174497.068820: xhci_inc_deq: INTR ...
  <idle>-0       [000] d.h.. 174497.068822: xhci_urb_giveback: ep3in-intr: ...
  python-22331   [000] ..... 174497.068894: xhci_urb_enqueue: ep3in-intr: ...
  python-22331   [000] d.... 174497.068899: xhci_queue_trb: INTR: ...
  python-22331   [000] d.... 174497.068900: xhci_inc_enq: INTR ...
  python-22331   [000] d.... 174497.068901: xhci_ring_ep_doorbell: ...
  <idle>-0       [000] d.h.. 174497.069957: xhci_handle_transfer: INTR: ...
  <idle>-0       [000] d.h.. 174497.069959: xhci_inc_deq: INTR ...
  <idle>-0       [000] d.h.. 174497.069961: xhci_urb_giveback: ep3in-intr: ...
  python-22331   [003] ..... 174497.070160: xhci_urb_enqueue: ep3in-intr: ...
  python-22331   [003] d.... 174497.070171: xhci_queue_trb: INTR: ...
  python-22331   [003] d.... 174497.070172: xhci_inc_enq: INTR ...
  python-22331   [003] d.... 174497.070174: xhci_ring_ep_doorbell: ...
  <idle>-0       [000] d.h.. 174497.071204: xhci_handle_transfer: INTR: ...
  <idle>-0       [000] d.h.. 174497.071206: xhci_inc_deq: INTR ...
  <idle>-0       [000] d.h.. 174497.071208: xhci_urb_giveback: ep3in-intr: ...
  python-22331   [003] ..... 174497.071269: xhci_urb_enqueue: ep3in-intr: ...
  python-22331   [003] d.... 174497.071272: xhci_queue_trb: INTR: ...
  python-22331   [003] d.... 174497.071273: xhci_inc_enq: INTR ...
  python-22331   [003] d.... 174497.071274: xhci_ring_ep_doorbell: ...
  <idle>-0       [000] d.h.. 174497.072331: xhci_handle_transfer: INTR: ...
  <idle>-0       [000] d.h.. 174497.072332: xhci_inc_deq: INTR ...
  <idle>-0       [000] d.h.. 174497.072334: xhci_urb_giveback: ep3in-intr: ...
...

In the middle, the python-22331 process changes from CPU# 000 to CPU# 003. 
This is the reason for my statement in the original question. I first observed
in htop that the irregularities in the report pattern started, when the cpu
usage changed. So maybe I am also only seeing what I want to see here and 
this might no trigger my problem.

> > ...
> >           python-22331   [000] ..... 174497.068723: xhci_urb_enqueue:
> > ep3in-intr: urb 0000000070a812bc pipe 1073852544 slot 3 length 0/1024
> > sgs 0/0 stream 0 flags 00010200
> >           python-22331   [000] d.... 174497.068727: xhci_queue_trb:
> > INTR: Buffer 00000004669c9800 length 1024 TD size 0 intr 0 type
> > 'Normal' flags b:i:I:c:s:I:e:c
> >           python-22331   [000] d.... 174497.068728: xhci_inc_enq: INTR
> > 000000006122f915: enq 0x00000004526bf8f0(0x00000004526bf000) deq
> > 0x00000004526bf8e0(0x00000004526bf000) segs 2 stream 0 free_trbs 500
> > bounce 1024 cycle 1
> >           python-22331   [000] d.... 174497.068729:
> > xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep3in
> >           <idle>-0       [000] d.h.. 174497.068811: xhci_handle_event:
> > EVENT: TRB 00000004526bf8e0 status 'Short Packet' len 1 slot 3 ep 7
> > type 'Transfer Event' flags e:c
> >           <idle>-0       [000] d.h.. 174497.068814:
> > xhci_handle_transfer: INTR: Buffer 00000004669c9800 length 1024 TD
> > size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:C
> >           <idle>-0       [000] d.h.. 174497.068820: xhci_inc_deq: INTR
> > 000000006122f915: enq 0x00000004526bf8f0(0x00000004526bf000) deq
> > 0x00000004526bf8f0(0x00000004526bf000) segs 2 stream 0 free_trbs 501
> > bounce 1024 cycle 1
> >           <idle>-0       [000] d.h.. 174497.068822: xhci_urb_giveback:
> > ep3in-intr: urb 0000000070a812bc pipe 1073852544 slot 3 length
> > 1023/1024 sgs 0/0 stream 0 flags 00010200
> >           <idle>-0       [000] d.h.. 174497.068824: xhci_inc_deq:
> > EVENT 000000002a1d7c0b: enq 0x0000000440f9c000(0x0000000440f9c000) deq
> > 0x0000000440f9f620(0x0000000440f9f000) segs 8 stream 0 free_trbs 2039
> > bounce 0 cycle 0
> >           python-22331   [000] ..... 174497.068894: xhci_urb_enqueue:
> > ep3in-intr: urb 0000000070a812bc pipe 1073852544 slot 3 length 0/1024
> > sgs 0/0 stream 0 flags 00010200
> >           python-22331   [000] d.... 174497.068899: xhci_queue_trb:
> > INTR: Buffer 00000004669c9800 length 1024 TD size 0 intr 0 type
> > 'Normal' flags b:i:I:c:s:I:e:c
> >           python-22331   [000] d.... 174497.068900: xhci_inc_enq: INTR
> > 000000006122f915: enq 0x00000004526bf900(0x00000004526bf000) deq
> > 0x00000004526bf8f0(0x00000004526bf000) segs 2 stream 0 free_trbs 500
> > bounce 1024 cycle 1
> >           python-22331   [000] d.... 174497.068901:
> > xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep3in
> >           <idle>-0       [000] d.h.. 174497.069954: xhci_handle_event:
> > EVENT: TRB 00000004526bf8f0 status 'Short Packet' len 1 slot 3 ep 7
> > type 'Transfer Event' flags e:c
> >           <idle>-0       [000] d.h.. 174497.069957:
> > xhci_handle_transfer: INTR: Buffer 00000004669c9800 length 1024 TD
> > size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:C
> >           <idle>-0       [000] d.h.. 174497.069959: xhci_inc_deq: INTR
> > 000000006122f915: enq 0x00000004526bf900(0x00000004526bf000) deq
> > 0x00000004526bf900(0x00000004526bf000) segs 2 stream 0 free_trbs 501
> > bounce 1024 cycle 1
> >           <idle>-0       [000] d.h.. 174497.069961: xhci_urb_giveback:
> > ep3in-intr: urb 0000000070a812bc pipe 1073852544 slot 3 length
> > 1023/1024 sgs 0/0 stream 0 flags 00010200
> >           <idle>-0       [000] d.h.. 174497.069962: xhci_inc_deq:
> > EVENT 000000002a1d7c0b: enq 0x0000000440f9c000(0x0000000440f9c000) deq
> > 0x0000000440f9f630(0x0000000440f9f000) segs 8 stream 0 free_trbs 2039
> > bounce 0 cycle 0
> >           python-22331   [003] ..... 174497.070160: xhci_urb_enqueue:
> > ep3in-intr: urb 00000000a553f707 pipe 1073852544 slot 3 length 0/1024
> > sgs 0/0 stream 0 flags 00010200
> >           python-22331   [003] d.... 174497.070171: xhci_queue_trb:
> > INTR: Buffer 00000004799b0000 length 1024 TD size 0 intr 0 type
> > 'Normal' flags b:i:I:c:s:I:e:c
> >           python-22331   [003] d.... 174497.070172: xhci_inc_enq: INTR
> > 000000006122f915: enq 0x00000004526bf910(0x00000004526bf000) deq
> > 0x00000004526bf900(0x00000004526bf000) segs 2 stream 0 free_trbs 500
> > bounce 1024 cycle 1
> >           python-22331   [003] d.... 174497.070174:
> > xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep3in
> >           <idle>-0       [000] d.h.. 174497.071201: xhci_handle_event:
> > EVENT: TRB 00000004526bf900 status 'Short Packet' len 1 slot 3 ep 7
> > type 'Transfer Event' flags e:c
> >           <idle>-0       [000] d.h.. 174497.071204:
> > xhci_handle_transfer: INTR: Buffer 00000004799b0000 length 1024 TD
> > size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:C
> >           <idle>-0       [000] d.h.. 174497.071206: xhci_inc_deq: INTR
> > 000000006122f915: enq 0x00000004526bf910(0x00000004526bf000) deq
> > 0x00000004526bf910(0x00000004526bf000) segs 2 stream 0 free_trbs 501
> > bounce 1024 cycle 1
> >           <idle>-0       [000] d.h.. 174497.071208: xhci_urb_giveback:
> > ep3in-intr: urb 00000000a553f707 pipe 1073852544 slot 3 length
> > 1023/1024 sgs 0/0 stream 0 flags 00010200
> >           <idle>-0       [000] d.h.. 174497.071210: xhci_inc_deq:
> > EVENT 000000002a1d7c0b: enq 0x0000000440f9c000(0x0000000440f9c000) deq
> > 0x0000000440f9f640(0x0000000440f9f000) segs 8 stream 0 free_trbs 2039
> > bounce 0 cycle 0
> >           python-22331   [003] ..... 174497.071269: xhci_urb_enqueue:
> > ep3in-intr: urb 00000000a553f707 pipe 1073852544 slot 3 length 0/1024
> > sgs 0/0 stream 0 flags 00010200
> >           python-22331   [003] d.... 174497.071272: xhci_queue_trb:
> > INTR: Buffer 00000004799b0000 length 1024 TD size 0 intr 0 type
> > 'Normal' flags b:i:I:c:s:I:e:c
> >           python-22331   [003] d.... 174497.071273: xhci_inc_enq: INTR
> > 000000006122f915: enq 0x00000004526bf920(0x00000004526bf000) deq
> > 0x00000004526bf910(0x00000004526bf000) segs 2 stream 0 free_trbs 500
> > bounce 1024 cycle 1
> >           python-22331   [003] d.... 174497.071274:
> > xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep3in
> >           <idle>-0       [000] d.h.. 174497.072330: xhci_handle_event:
> > EVENT: TRB 00000004526bf910 status 'Short Packet' len 1 slot 3 ep 7
> > type 'Transfer Event' flags e:c
> >           <idle>-0       [000] d.h.. 174497.072331:
> > xhci_handle_transfer: INTR: Buffer 00000004799b0000 length 1024 TD
> > size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:C
> >           <idle>-0       [000] d.h.. 174497.072332: xhci_inc_deq: INTR
> > 000000006122f915: enq 0x00000004526bf920(0x00000004526bf000) deq
> > 0x00000004526bf920(0x00000004526bf000) segs 2 stream 0 free_trbs 501
> > bounce 1024 cycle 1
> >           <idle>-0       [000] d.h.. 174497.072334: xhci_urb_giveback:
> > ep3in-intr: urb 00000000a553f707 pipe 1073852544 slot 3 length
> > 1023/1024 sgs 0/0 stream 0 flags 00010200
> >           <idle>-0       [000] d.h.. 174497.072334: xhci_inc_deq:
> > EVENT 000000002a1d7c0b: enq 0x0000000440f9c000(0x0000000440f9c000) deq
> > 0x0000000440f9f650(0x0000000440f9f000) segs 8 stream 0 free_trbs 2039
> > bounce 0 cycle 0
> > ...
> > 
> > Unfortunately, I do not understand the details of the linux usb stack in
> > depth and the details of
> > this log enough to know where to look any further. I am hoping to get some
> > help or pointers to
> > how to attack my problem. I have tried using different libraries (hidapi C,
> > libusb), linux systems
> > (raspberry pi OS, ubuntu), but the problem persisted.
> > 
> > Thank you very much for taking the time for reading about my problem and
> > thank you in advance
> > for any advice.
> 
> The most important thing you should do, to ensure that reports do not 
> get lost, is submit a large queue of interrupt transfers in advance.  
> Each time a transfer completes, add another submission to the queue.
>
> That way, even if your process loses control of the CPU for some period 
> of time, the transfers will continue (unless the period is so long that 
> your entire queue gets consumed before the process is able to submit 
> more requests).

Thank you for your help! Initially, I was submitting 4 transfers only.
I'll try to submit more transfers once I start loosing reports. Currently
I am able to get all reports using only 4 transfers. I'm loosing reports
only if I'm debugging with `usbfs_snoop`. In this case, I tried submitting
more transfers, but I was still loosing some reports, so I decided, that 
I will try to find another debugging possibility iother then usbfs_snoop.

> The fact that reports get lost if you don't retrieve them every 230 
> microseconds may indicate that your custom device should not be using 
> the HID protocol at all, but rather should use isochronous transfers.
> Isochronous is meant for real-time applications where data is simply 
> lost if it's not transferred within a fixed interval of time.

Thank you very much for your comments and pointers. I will look into the
possibility to work with isochronous transfers. Due to the regularity
of the wrong report pattern, I'm suspecting that there is some ring
buffer involved, where dequeue and enqueue pointers are getting out of
sync. Unfortunately, I could not find a suspicious buffer looking at
the usbfs driver or libusb. 

Thank you very much again for your time and help.

Best
Thilo




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux