Re: [PATCH v2 1/2] bus: mhi: host: Add spinlock to protect WP access when queueing TREs

Qiang Yu <quic_qianyu@xxxxxxxxxxx> · Tue, 7 Nov 2023 15:59:49 +0800

On 11/6/2023 12:51 PM, Manivannan Sadhasivam wrote:
On Fri, Oct 20, 2023 at 09:07:35AM -0600, Jeffrey Hugo wrote:
On 10/16/2023 2:46 AM, Qiang Yu wrote:
On 9/29/2023 11:22 PM, Jeffrey Hugo wrote:
On 9/24/2023 9:10 PM, Qiang Yu wrote:
On 9/22/2023 10:44 PM, Jeffrey Hugo wrote:
On 9/13/2023 2:47 AM, Qiang Yu wrote:
From: Bhaumik Bhatt <bbhatt@xxxxxxxxxxxxxx>

Protect WP accesses such that multiple threads queueing buffers for
incoming data do not race and access the same WP twice.
Ensure read and
write locks for the channel are not taken in succession
by dropping the
read lock from parse_xfer_event() such that a callback given to client
can potentially queue buffers and acquire the write lock
in that process.
Any queueing of buffers should be done without channel
read lock acquired
as it can result in multiple locks and a soft lockup.

Signed-off-by: Bhaumik Bhatt <bbhatt@xxxxxxxxxxxxxx>
Signed-off-by: Qiang Yu <quic_qianyu@xxxxxxxxxxx>
---
   drivers/bus/mhi/host/main.c | 11 ++++++++++-
   1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index dcf627b..13c4b89 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -642,6 +642,7 @@ static int parse_xfer_event(struct
mhi_controller *mhi_cntrl,
               mhi_del_ring_element(mhi_cntrl, tre_ring);
               local_rp = tre_ring->rp;
   +            read_unlock_bh(&mhi_chan->lock);
This doesn't work due to the
write_lock_irqsave(&mhi_chan->lock, flags); on line 591.
Write_lock_irqsave(&mhi_chan->lock, flags) is used in case of
ev_code >= MHI_EV_CC_OOB. We only read_lock/read_unlock the
mhi_chan while ev_code < MHI_EV_CC_OOB.
Sorry.  OOB != EOB

I really don't like that we are unlocking the mhi_chan while
still using it.  It opens up a window where the mhi_chan
state can be updated between here and the client using the
callback to queue a buf.

Perhaps we need a new lock that just protects the wp, and
needs to be only grabbed while mhi_chan->lock is held?
Since we have employed mhi_chan lock to protect the channel and
what we are concerned here is that client may queue buf to a
disabled or stopped channel, can we check channel state after
getting mhi_chan->lock like line 595.

We can add the check after getting write lock in mhi_gen_tre()
and after getting read lock again here.
I'm not sure that is sufficient.  After you unlock to notify the
client, MHI is going to manipulate the packet count and runtime_pm
without the lock (648-652).  It seems like that adds additional
races which won't be covered by the additional check you propose.
I don't think read_lock_bh(&mhi_chan->lock) can protect runtime_pm and
the packet count here. Even if we do not unlock, mhi state and packet
count can still be changed because we did not get pm_lock here, which is
used in all mhi state transition function.

I also checked all places that mhi_chan->lock is grabbed, did not see
packet count and runtime_pm be protected by write_lock(&mhi_chan->lock).


If you really don't like the unlock operation, we can also take a new
lock. But I think we only need to add the new lock in two places,
mhi_gen_tre and mhi_pm_m0_transition while mhi_chan->lock is held.
Mani, if I recall correctly, you were the architect of the locking.  Do you
have an opinion?

TBH, the locking situation is a mess with MHI. Initially, we happen to have
separate locks for protecting various operations, but then during review, it was
advised to reuse existing locks and avoid having too many separate locks.

This worked well but then we kind of abused the locks over time. I asked Hemant
and Bhaumik to audit the locks and fix them, but both of them left Qcom.

So in this situation, the intent of the pm_lock was to protect concurrent access
against updating the pm_state. And it also happen to protect _other_things_ such
as runtime_put, pending_pkts etc... But not properly, because most of the time
read lock is taken in places where pm_state is being read. So there is still a
possibility of race while accessing these _other_things_.

For this patch, I'm happy with dropping chan->lock before calling xfer_cb() and
I want someone (maybe Qiang) to do the audit of locking in general and come up
with fixes where needed.

- Mani

As discussed with Jeff before, we also need to check channel state 
before queue buffer and after re-lock

in parse_xfer_event, so I also add the channel state check in next 
version patch.

Probably I can do the audit of locking. It's a good chance for me to 
understand various locks in MHI host

driver completely.