On Fri, 2021-08-27 at 16:51 +1200, Paul Davey wrote: > On Thu, 2021-08-26 at 09:54 -0600, Jeffrey Hugo wrote: > > On 8/23/2021 12:47 AM, Paul Davey wrote: > > > Hi Hemant, Jeffery > > > > > > I have some more information after some testing. > > > > > > > > Do you have a log which prints the TRE being processed? > > > > > Basically i > > > > > am > > > > > trying understand this : by the time you get double free > > > > > issue, > > > > > is > > > > > there > > > > > any pattern with respect to the TRE that is being processed. > > > > > For > > > > > example > > > > > when host processed the given TRE for the first time with > > > > > RP1, > > > > > stale > > > > > TRE > > > > > was posted by Event RP2 right after RP1 > > > > > > > > > > ->RP1 [TRE1] > > > > > ->RP2 [TRE1] > > > > > > > > > > or occurrence of stale TRE event is random? > > > > > > [...] > > > > Secondly we saw the following pattern in completion events: > > mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: > 7c4004e0 > mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: > 7c400520 > mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: > 7c4004c0 > mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: > 7c4004b0 > mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr: > 7c4004a0 > > Here we can see that instead of a completion event for 7c4004d0 we > have > one for 7c400520 which is significantly ahead of the other point and > from the list of TREs I store in mhi_gen_tre I suspect that 7c400520 > is > the next TRE to be used in the TRE ring at this time, as the other > information shows it would be the oldest entry in that list. I am > not > sure what could have caused this but this is a different case to the > modem repeating the same TRE in a completion event. > I have considered the code further, and while I have seen cases of identical TRE completion events occurring, I do not think these result in the double free case because if the event is actually the same as the last one then the new dev_rp which parse_xfer_event will attempt to advance the local_rp to will already be equal to the local_rp and the whole loop will be skipped in the first place. The troublesome behaviour comes from the case I describe above where a jump to a "future" TRE's completion event seems to occur followed by a continuation of the order, this results in the tre_ring rp being advanced to that future TRE and then the next completion event following the previous ordered pattern would be before that rp location and the loop will run through the entire tre_ring to reach the new rp location. I do have another question though, the driver code seems to in some cases take the mhi_chan->lock when updating the doorbell register, but not when queueing new transfers. What is the actual purpose of this lock and why does it seem so inconsistently used? Is there any chance that some of my problems may be the result of queueing new transfers racing somehow with completion event processing? Thanks, Paul