Re: bus: mhi: parse_xfer_event running transfer completion callbacks more than once for a given buffer

On Thu, 2021-08-26 at 09:54 -0600, Jeffrey Hugo wrote:
> On 8/23/2021 12:47 AM, Paul Davey wrote:
> > Hi Hemant, Jeffrey
> > 
> > I have some more information after some testing.
> > 
> > > > Do you have a log which prints the TRE being processed?
> > > > Basically I am trying to understand this: by the time you get
> > > > the double free issue, is there any pattern with respect to the
> > > > TRE that is being processed? For example, when the host
> > > > processed the given TRE for the first time with RP1, a stale
> > > > TRE was posted by event RP2 right after RP1
> > > > 
> > > > ->RP1 [TRE1]
> > > > ->RP2 [TRE1]
> > > > 
> > > > or occurrence of stale TRE event is random?
> > 
> > I have now collected some information by adding buffers which
> > record some of the information desired, and searching or printing
> > this information only when the issue is detected, in order to
> > avoid constant verbose debug information and potential slowdowns.
> > 
> > From this information I can report that when this issue happens,
> > two consecutive transfer completion events occur with the same TRE
> > pointer in them. I did not record events which are not transfer
> > completion events, or the event ring RP during processing.
> > 
> > So the event is as follows:
> > 
> > mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: 77c94780
> > mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: 77c94780
> 
> This isn't good.  I would suspect that the device is glitching then,
> which should be fixed on the device side, but that doesn't help you
> here and now.
>
> I'm thinking your change is probably a good idea based on this, but
> I have additional questions.
>
> Can you check the address of the completion events in the shared
> memory (basically the event ring) when you see this?  I want to rule
> out the possibility that host is double processing the same event,
> and this is truly a case of the device duplicating an event.
> 
> I hope that makes sense to you.

I have added recording of the event TRE address to my debug collection,
so that should answer this question when the results come back from
that test, which will run over the weekend.

Additionally, I have another update with some results that have come in
since my last mail.

Two incidents occurred, but I am not sure about their relative timing;
the TRE addresses suggest they were not immediately after each other.

First, my check for whether cb_buf in buf_info is NULL went off, but
nothing otherwise untoward seemed to be happening.  I have added a
check to parse_xfer_event that the wp in the buf_info matches the TRE
address being processed, to try to catch the rings somehow getting out
of sync, but aside from that I can only think of concurrency-related
issues causing this problem.  Perhaps I should also check whether the
ring in question is full, in case that gives any insight into this.
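
A minimal sketch of the two checks described above, written as
standalone C rather than as the actual driver change.  The struct and
function names here (buf_info_sketch, check_completion, complete_once)
are hypothetical stand-ins and are not the kernel's definitions; only
the idea of comparing wp against the event's TRE address and treating
a NULL cb_buf as "already completed" comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-TRE bookkeeping entry. */
struct buf_info_sketch {
	uint64_t wp;   /* address of the TRE this entry was queued for */
	void *cb_buf;  /* client buffer; NULL once completion has run */
};

enum tre_check {
	TRE_OK,
	TRE_ALREADY_COMPLETED, /* cb_buf is NULL: a repeat completion */
	TRE_RING_MISMATCH,     /* wp differs from the TRE in the event */
};

/* Classify a completion event against the bookkeeping entry. */
static enum tre_check check_completion(const struct buf_info_sketch *info,
				       uint64_t ev_tre_addr)
{
	assert(info != NULL);
	if (info->wp != ev_tre_addr)
		return TRE_RING_MISMATCH;
	if (info->cb_buf == NULL)
		return TRE_ALREADY_COMPLETED;
	return TRE_OK;
}

/* Run the check and, on success, mark the buffer as consumed so that
 * a second event for the same TRE is caught instead of completing the
 * buffer twice. */
static enum tre_check complete_once(struct buf_info_sketch *info,
				    uint64_t ev_tre_addr)
{
	enum tre_check rc = check_completion(info, ev_tre_addr);

	if (rc == TRE_OK)
		info->cb_buf = NULL;
	return rc;
}
```

Keeping the two outcomes distinct is deliberate: a mismatch points at
the rings being out of sync, while a NULL cb_buf with a matching wp
points at the same TRE being reported twice.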

Secondly we saw the following pattern in completion events:

mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: 7c4004e0
mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: 7c400520
mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: 7c4004c0
mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: 7c4004b0
mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: 7c4004a0

Here we can see that instead of a completion event for 7c4004d0 we have
one for 7c400520, which is significantly ahead of the other pointers.
From the list of TREs I store in mhi_gen_tre, I suspect that 7c400520
is the next TRE to be used in the TRE ring at this time, as the other
information shows it would be the oldest entry in that list.  I am not
sure what could have caused this, but it is a different case from the
modem repeating the same TRE in a completion event.
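
As a rough illustration of how such a sequence could be flagged
automatically, here is a small standalone C sketch that walks a list
of completion pointers and reports the first one that is not the
successor of the previous one.  It assumes a 0x10 stride between TREs
(inferred from the logged ptr values), and that the entries above are
listed newest first so they are reversed before checking.  The ring
base and element count, and the names ring_sketch, next_tre and
first_out_of_order, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRE_SIZE 0x10u /* assumed stride between consecutive TREs */

/* Hypothetical ring description: base address and element count. */
struct ring_sketch {
	uint64_t base;
	size_t nelems;
};

/* Address of the TRE that follows 'cur', wrapping at the ring end. */
static uint64_t next_tre(const struct ring_sketch *r, uint64_t cur)
{
	uint64_t next = cur + TRE_SIZE;

	if (next >= r->base + (uint64_t)r->nelems * TRE_SIZE)
		next = r->base;
	return next;
}

/* Return the index of the first completion whose pointer is not the
 * successor of the previous one, or -1 if the sequence is in order.
 * 'ptrs' must be in chronological order (oldest first). */
static long first_out_of_order(const struct ring_sketch *r,
			       const uint64_t *ptrs, size_t n)
{
	assert(r != NULL && ptrs != NULL);
	for (size_t i = 1; i < n; i++)
		if (ptrs[i] != next_tre(r, ptrs[i - 1]))
			return (long)i;
	return -1;
}
```

The same helper also flags the earlier duplicate case, since a
repeated pointer is not the successor of itself.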

Thanks,
Paul






