On Wed, Apr 13, 2022 at 10:41:53AM -0600, Jeffrey Hugo wrote:
> mhi_recycle_ev_ring() computes the shared write pointer for the ring
> (ctxt_wp) using a read/modify/write pattern where the ctxt_wp value in the
> shared memory is read, incremented, and written back.  There are no checks
> on the read value; it is assumed that it is kept in sync with the locally
> cached value.  Per the MHI spec, this is correct.  The device should only
> read ctxt_wp, never write it.
>
> However, there are devices in the wild that violate the spec, and can
> update the ctxt_wp in a specific scenario.  This can cause corruption, and
> violate the above assumption that the ctxt_wp is in sync with the cached
> value.
>
> This can occur when the device has loaded firmware from the host, and is
> transitioning from the SBL EE to the AMSS EE.  As part of shutting down
> SBL, the SBL flushes its local MHI context to the shared memory since
> the local context will not persist across an EE change.  In the case of
> the event ring, SBL will flush its entire context, not just the parts that
> it is allowed to update.  This means SBL will write to ctxt_wp, and
> possibly corrupt it.
>
> An example:
>
> Host                            Device
> ----                            ------
> Update ctxt_wp to 0x1f0
>                                 SBL observes 0x1f0
> Update ctxt_wp to 0x0
>                                 Starts transition to AMSS EE
>                                 Context flush, writes 0x1f0 to ctxt_wp
> Update ctxt_wp to 0x200
> Update ctxt_wp to 0x210
>                                 AMSS observes 0x210
>                                 0x210 exceeds ring size
>                                 AMSS signals syserr
>
> The reason the ctxt_wp goes off the end of the ring is that the rollover
> check is only performed on the cached wp, which is out of sync with
> ctxt_wp.
>
> Since the host is the authority of the value of ctxt_wp per the MHI spec,
> we can fix this issue by not reading ctxt_wp from the shared memory, and
> instead compute it based on the cached value.  If SBL corrupts ctxt_wp,
> the host won't observe it, and will correct the value at some point later.
>
> Signed-off-by: Jeffrey Hugo <quic_jhugo@xxxxxxxxxxx>
> Reviewed-by: Hemant Kumar <hemantk@xxxxxxxxxxxxxx>
> Reviewed-by: Bhaumik Bhatt <bbhatt@xxxxxxxxxxxxxx>

Applied to mhi-next!

Thanks,
Mani

> ---
>
> v3:
> Rebase to -next
>
> v2:
> Fix typo on the ring base
>
>  drivers/bus/mhi/host/main.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
> index 142eea1..f3aef77a 100644
> --- a/drivers/bus/mhi/host/main.c
> +++ b/drivers/bus/mhi/host/main.c
> @@ -534,18 +534,13 @@ irqreturn_t mhi_intvec_handler(int irq_number, void *dev)
>  static void mhi_recycle_ev_ring_element(struct mhi_controller *mhi_cntrl,
>  					struct mhi_ring *ring)
>  {
> -	dma_addr_t ctxt_wp;
> -
>  	/* Update the WP */
>  	ring->wp += ring->el_size;
> -	ctxt_wp = le64_to_cpu(*ring->ctxt_wp) + ring->el_size;
>  
> -	if (ring->wp >= (ring->base + ring->len)) {
> +	if (ring->wp >= (ring->base + ring->len))
>  		ring->wp = ring->base;
> -		ctxt_wp = ring->iommu_base;
> -	}
>  
> -	*ring->ctxt_wp = cpu_to_le64(ctxt_wp);
> +	*ring->ctxt_wp = cpu_to_le64(ring->iommu_base + (ring->wp - ring->base));
>  
>  	/* Update the RP */
>  	ring->rp += ring->el_size;
> -- 
> 2.7.4
>
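
For anyone following the thread, the example from the commit message can be replayed in user space. What follows is only a rough sketch, not driver code: struct sim_ring, recycle_rmw(), recycle_from_cache() and replay() are hypothetical names, and the ring geometry (0x200-byte ring, 0x10-byte elements, iommu_base of 0x0, host base of 0x1000) is assumed so that the numbers line up with the example. Endianness conversion and DMA are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct mhi_ring (no DMA, no LE64). */
struct sim_ring {
	uint64_t base;       /* host-side start of the ring */
	uint64_t iommu_base; /* device-visible start of the ring */
	uint64_t len;        /* ring size in bytes */
	uint64_t el_size;    /* element size in bytes */
	uint64_t wp;         /* host's cached write pointer */
	uint64_t ctxt_wp;    /* write pointer as stored in shared memory */
};

/* Pre-patch behaviour: read/modify/write of the shared value. */
static void recycle_rmw(struct sim_ring *r)
{
	uint64_t ctxt_wp = r->ctxt_wp + r->el_size;

	r->wp += r->el_size;
	if (r->wp >= r->base + r->len) {
		r->wp = r->base;
		ctxt_wp = r->iommu_base;
	}
	r->ctxt_wp = ctxt_wp;
}

/* Post-patch behaviour: shared value derived from the cached wp only. */
static void recycle_from_cache(struct sim_ring *r)
{
	r->wp += r->el_size;
	if (r->wp >= r->base + r->len)
		r->wp = r->base;
	r->ctxt_wp = r->iommu_base + (r->wp - r->base);
}

/*
 * Replay the example: the "device" snapshots ctxt_wp, the host wraps,
 * the device flushes its stale snapshot, and the host recycles twice more.
 * Returns the final ctxt_wp seen in shared memory.
 */
static uint64_t replay(void (*recycle)(struct sim_ring *))
{
	struct sim_ring r = {
		.base = 0x1000, .iommu_base = 0x0, .len = 0x200, .el_size = 0x10,
	};
	uint64_t stale;

	r.wp = r.base + 0x1e0;
	r.ctxt_wp = r.iommu_base + 0x1e0;

	recycle(&r);            /* host: ctxt_wp becomes 0x1f0 */
	stale = r.ctxt_wp;      /* device: SBL observes 0x1f0 */
	assert(stale == 0x1f0);
	recycle(&r);            /* host: wraps, ctxt_wp becomes 0x0 */
	r.ctxt_wp = stale;      /* device: context flush writes 0x1f0 */
	recycle(&r);
	recycle(&r);
	return r.ctxt_wp;
}
```

Under these assumptions, replay(recycle_rmw) ends at 0x210, which is past the end of the 0x200-byte ring, while replay(recycle_from_cache) ends at 0x20 because the stale device write is simply overwritten on the next recycle.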