Hi Paul,
On 8/6/2021 2:43 AM, Loic Poulain wrote:
+ MHI people
On Fri, 6 Aug 2021 at 06:20, Paul Davey <Paul.Davey@xxxxxxxxxxxxxxxxxxx> wrote:
Hi linux-arm-msm list,
We have been using the mhi driver with a Sierra EM9191 5G modem module
and have seen an occasional issue where the kernel crashes with
"BUG: Bad page state" messages. We debugged this further and found it
was due to mhi_net_ul_callback freeing the same skb multiple times.
Further debugging tracked this down to a case where parse_xfer_event
computed a dev_rp from the passed event's ev_tre which does not lie
within the region of valid "in flight" transfers for the tre_ring.
See the patch below for how this was detected.
I believe that receiving such an event results in the loop which runs
completion events for the transfers to re-run some completion
callbacks as it walks all the way around the ring again to reach the
invalid dev_rp position.
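To illustrate the wrap-around described above, here is a minimal,
self-contained sketch. It is not the mhi driver code: RING_LEN,
run_completions and the index-based ring are hypothetical stand-ins,
and the loop only models "advance the read position until it reaches
the position derived from the event".

```c
#include <assert.h>
#include <stddef.h>

#define RING_LEN 8 /* hypothetical ring size, for illustration only */

/*
 * Walk from rp up to (but not including) stop, wrapping at RING_LEN,
 * and count one completion per slot visited. done[] records how many
 * times each slot's completion would have been run.
 * Returns the number of completions run by this call.
 */
size_t run_completions(size_t rp, size_t stop, unsigned int done[RING_LEN])
{
    size_t calls = 0;

    assert(rp < RING_LEN && stop < RING_LEN);

    while (rp != stop) {
        done[rp]++;               /* stand-in for the completion callback */
        calls++;
        rp = (rp + 1) % RING_LEN; /* wrap at the end of the ring */
    }
    return calls;
}
```

With slots 2 and 3 in flight (rp = 2, wp = 4), a valid event for slot 3
gives stop = 4 and the loop runs two completions. A stale event for
slot 0 gives stop = 1, and the same loop runs seven completions,
visiting slots 4 to 7 and 0 even though none of them is in flight.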
Do you have a log which prints the TRE being processed? Basically, I am
trying to understand this: by the time you hit the double-free issue, is
there any pattern with respect to the TRE that is being processed? For
example, after the host processed a given TRE for the first time with
RP1, was the stale TRE posted by event RP2 right after RP1:
->RP1 [TRE1]
->RP2 [TRE1]
or is the occurrence of the stale TRE event random?
What could cause parse_xfer_event to receive a transfer event with a
tre pointer which would be outside the range of in-flight transfers?
For example, receiving events where the tre pointers do not strictly
increase, or receiving a second event of type MHI_EV_CC_OVERFLOW,
MHI_EV_CC_EOB, or MHI_EV_CC_EOT for a previous tre.
In theory this is not supposed to happen. Once a xfer completion event
is posted on the event ring, the TRE belongs to the host MHI; the device
is not supposed to work on this TRE any more.
The existing mhi driver code appears to assume that transfer events
are received strictly in order such that you can never receive a
transfer completion event for a transfer descriptor outside the
current set of "in flight" transfers in the tre ring (those between
the read pointer and write pointer).
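The "in flight" condition above can be written as a small predicate.
The following is only a sketch under stated assumptions: struct
ring_idx and ring_idx_in_flight are hypothetical names, the ring is
modelled with indices rather than TRE pointers, and rp == wp is taken
to mean that nothing is in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transfer ring, using indices. */
struct ring_idx {
    size_t rp;  /* read index: oldest element not yet completed */
    size_t wp;  /* write index: next free slot */
    size_t len; /* number of elements in the ring */
};

/*
 * Return true if idx lies in the half-open range [rp, wp),
 * taking wrap-around at the end of the ring into account.
 */
bool ring_idx_in_flight(const struct ring_idx *r, size_t idx)
{
    assert(r->len > 0);

    if (idx >= r->len)
        return false;                       /* not inside the ring at all */
    if (r->rp <= r->wp)
        return idx >= r->rp && idx < r->wp; /* contiguous range */
    return idx >= r->rp || idx < r->wp;     /* range wraps past the end */
}
```

A completion event whose element fails such a check could be logged and
dropped instead of being used as the stop position for the completion
loop.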
This assumption is as per the MHI spec.
I am checking internally whether there is any known issue on the device
side. This model seems to be Qualcomm® Snapdragon™ X55?
[..]
Thanks,
Hemant
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project