Re: [PATCH] spi: spi-geni-qcom: Fix NULL pointer access in geni_spi_isr

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Wed, Dec 9, 2020 at 7:17 PM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
>
> Quoting Doug Anderson (2020-12-03 08:40:46)
>
> > I would guess that if "mas->cur_xfer" is NULL then
> > geni_spi_handle_rx() should read all data in the FIFO and throw it
> > away and geni_spi_handle_tx() should set SE_GENI_TX_WATERMARK_REG to
> > 0.  NOTE: I _think_ that with the synchronize_irq() I'm suggesting
> > above we'll avoid this case, but it never hurts to be defensive.
> >
> >
> > Does that all make sense?  So the summary is that instead of your patch:
>
> Can we get a CPU diagram describing the race and scenario where this
> happens? Something like:
>
>   CPU0                                CPU1
>   ----                                ----
>   setup_fifo_xfer()
>    spin_lock_irq(&mas->lock);
>    spin_unlock_irq(&mas->lock);
>    mas->cur_xfer = xfer
>    ...
>    <IRQ>
>                                       geni_spi_isr()
>                                        geni_spi_handle_rx()
>                                         <NULL deref boom explosion!>
>
> But obviously this example diagram is incorrect and some timeout happens
> instead? Sorry, I'm super lazy and don't want to read many paragraphs of
> text. :) I'd rather have a diagram like above that clearly points out
> the steps taken to the NULL pointer deref.

This is my untested belief of what's happening

 CPU0                                CPU1
 ----                                ----
                                     setup_fifo_xfer()
                                      ...
                                      geni_se_setup_m_cmd()
                                      <hardware starts transfer>
 <unrelated interrupt storm>          spin_unlock_irq()
 <continued interrupt storm>         <time passes>
 <continued interrupt storm>         <transfer complets in hardware>
 <continued interrupt storm>         <hardware sets M_RX_FIFO_WATERMARK_EN>
 <continued interrupt storm>         <time passes>
 <continued interrupt storm>         handle_fifo_timeout()
 <continued interrupt storm>          spin_lock_irq()
 <continued interrupt storm>          mas->cur_xfer = NULL
 <continued interrupt storm>          geni_se_cancel_m_cmd()
 <continued interrupt storm>          spin_unlock_irq()
 <continued interrupt storm>          wait_for_completion_timeout() => timeout
 <continued interrupt storm>          spin_lock_irq()
 <continued interrupt storm>          geni_se_abort_m_cmd()
 <continued interrupt storm>          spin_unlock_irq()
 <continued interrupt storm>          wait_for_completion_timeout() => timeout
 <interrupt storm ends>
 geni_spi_isr()
  spin_lock()
  if (m_irq & M_RX_FIFO_WATERMARK_EN)
   geni_spi_handle_rx()
    mas->cur_xfer NULL derefrence

With my proposed fix, I believe that would transform into:

 CPU0                                CPU1
 ----                                ----
                                     setup_fifo_xfer()
                                      ...
                                      geni_se_setup_m_cmd()
                                      <hardware starts transfer>
 <unrelated interrupt storm>          spin_unlock_irq()
 <continued interrupt storm>         <time passes>
 <continued interrupt storm>         <transfer complets in hardware>
 <continued interrupt storm>         <hardware sets M_RX_FIFO_WATERMARK_EN>
 <continued interrupt storm>         <time passes>
 <continued interrupt storm>         handle_fifo_timeout()
 <continued interrupt storm>          synchronize_irq()
 <continued interrupt storm>           <time passes>
 <interrupt storm ends>
 geni_spi_isr()
  ...
                                       <synchronize_irq() finishes>
                                      spin_lock_irq()
                                      mas->cur_xfer = NULL
                                      geni_se_cancel_m_cmd()
                                      spin_unlock_irq()
 geni_spi_isr()
   ...
                                      wait_for_completion_timeout() => success

The extra synchronize_irq() I was suggesting at the end of the
function would be an extra bit of paranoia.  Maybe a new storm showed
up while we were processing the timeout?


> > 1. Add synchronize_irq() at the start and end of
> > handle_fifo_timeout().  Not under lock.
> >
> > 2. In geni_spi_handle_rx(), check for NULL "mas->cur_xfer".  Read all
> > data in the FIFO (don't cap at rx_rem_bytes), but throw it away.
> >
> > 3. In geni_spi_handle_tx(), check for NULL "mas->cur_xfer".  Don't
> > write any data.  Just write 0 to SE_GENI_TX_WATERMARK_REG.
> >
> > I think #1 is the real fix, but #2 and #3 will avoid crashes in case
> > there's another bug somewhere.
>
> Aren't 2 and 3 papering over some weird problem though where irqs are
> coming in unexpectedly?

I think that's what I said but in different words?  #1 is the real fix
but #2 and #3 will keep us from crashing (AKA paper over) if we have
some other (unexpected) bug.  We'll already have an error in the log
in this case "Failed to cancel/abort m_cmd" so it doesn't feel
necessary to crash with a NULL dereference...

-Doug



[Index of Archives]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux