On Mon, May 09, 2022 at 09:05:45AM +0300, Dan Carpenter wrote:
> > If there is no other way to avoid the Smatch warning,
> 
> The Smatch warning is not the issue.  If we're holding a spinlock and
> we call might_sleep() then that generates a stack trace at runtime if
> you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.  Probably enabling
> CONFIG_DEBUG_ATOMIC_SLEEP should just be a standard part of the QC
> process.
> 
> Anyway, it sounds like you're just doing the locking to silence a
> warning in xarray.  Let's ask Matthew if he has a hint.

I suspect that the tcmu code is doing something horrible & wrong again.
It keeps poking at VM internals without understanding of what's going
on or why.  Anyway ...

> On Sun, May 08, 2022 at 08:03:14PM +0200, Bodo Stroesser wrote:
> > I'm quite sure that our code does not cause any problems, because
> > in tcmu we explicitly or implicitly take the xarray's lock only
> > while holding the so-called cmdr_lock mutex.  Also, we take the lock
> > without disabling irq or bh.  So I think there is no problem if
> > lock_page sleeps while we hold the xarray's lock.
> > 
> > Of course, we wouldn't need the xarray lock at all, but xarray code
> > forces us to take it to avoid warnings.

You mean "The XArray warns us when we break its locking rules".

> > In tcmu_blocks_release we use the advanced xarray API to keep the
> > overhead small.  It allows us to lock/unlock before and after the
> > loop only.  If there is no other way to avoid the Smatch warning, we
> > could easily put additional xas_unlock() and xas_lock() around the
> > lock_page/unlock_page block.

... then you'd have to call xas_reset(), and you might as well use
xa_for_each().

> > >   1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
> > >   1662                                unsigned long last)
> > >   1663 {
> > >   1664         XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
> > >   1665         struct page *page;
> > >   1666         u32 pages_freed = 0;
> > >   1667 
> > >   1668         xas_lock(&xas);
> > >                ^^^^^^^^^^^^^^
> > > We take a spinlock here.
> > > 
> > > 
> > >   1669         xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
> > >   1670                 xas_store(&xas, NULL);
> > >   1671                 /*
> > >   1672                  * While reaching here there may be page faults occurring on
> > >   1673                  * the to-be-released pages.  A race condition may occur if
> > >   1674                  * unmap_mapping_range() is called before page faults on these
> > >   1675                  * pages have completed; a valid but stale map is created.
> > >   1676                  *
> > >   1677                  * If another command subsequently runs and needs to extend
> > >   1678                  * dbi_thresh, it may reuse the slot corresponding to the
> > >   1679                  * previous page in data_bitmap.  Though we will allocate a new
> > >   1680                  * page for the slot in data_area, no page fault will happen
> > >   1681                  * because we have a valid map.  Therefore the command's data
> > >   1682                  * will be lost.
> > >   1683                  *
> > >   1684                  * We lock and unlock pages that are to be released to ensure
> > >   1685                  * all page faults have completed.  This way
> > >   1686                  * unmap_mapping_range() can ensure stale maps are cleanly
> > >   1687                  * removed.
> > >   1688                  */
> > > --> 1689                 lock_page(page);
> > >                          ^^^^^^^^^^^^^^^
> > > The lock_page() function calls might_sleep() (inside the declaration
> > > block).
> > > 
> > >   1690                 unlock_page(page);
> > >   1691                 __free_page(page);
> > >   1692                 pages_freed++;
> > >   1693         }
> > >   1694         xas_unlock(&xas);

There are a number of things you can do.
One is to remove pages into a pagevec until it is full, then
xas_unlock(); xas_reset(); lock and unlock each page and pass the
pagevec to __pagevec_release().  Then xas_lock() and continue the loop.

Another possibility is to trylock each page; if the trylock fails, put
the page into the pagevec (and do the above dance if the pagevec is now
full), but if it succeeds, you can unlock the page and __free_page() it
right away.

The key to going fast is batching, and that goes for __free_page() vs
__pagevec_release() as much as it does for walking the XArray.
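A rough, untested sketch of the first option, assuming the tcmu fields
from the quoted code (data_pages, data_pages_per_blk) and the usual
pagevec helpers from <linux/pagevec.h>; the batching details are
illustrative, not a tested patch:

#include <linux/pagevec.h>	/* struct pagevec, __pagevec_release() */

static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
			       unsigned long last)
{
	XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
	struct pagevec pvec;
	struct page *page;
	u32 pages_freed = 0;
	int i;

	pagevec_init(&pvec);

	xas_lock(&xas);
	xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
		xas_store(&xas, NULL);
		/* pagevec_add() returns the number of slots still free */
		if (pagevec_add(&pvec, page))
			continue;

		/*
		 * The pagevec is full.  Drop the xa_lock so lock_page() may
		 * sleep, and forget the cached walk state, since the tree
		 * can change while the lock is not held.
		 */
		xas_unlock(&xas);
		xas_reset(&xas);

		/* Wait for any in-flight page faults on these pages. */
		for (i = 0; i < pagevec_count(&pvec); i++) {
			lock_page(pvec.pages[i]);
			unlock_page(pvec.pages[i]);
		}
		pages_freed += pagevec_count(&pvec);
		__pagevec_release(&pvec);	/* drops the refs, reinits pvec */

		xas_lock(&xas);
	}
	xas_unlock(&xas);

	/* Flush the final, partially filled batch outside the lock. */
	for (i = 0; i < pagevec_count(&pvec); i++) {
		lock_page(pvec.pages[i]);
		unlock_page(pvec.pages[i]);
	}
	pages_freed += pagevec_count(&pvec);
	pagevec_release(&pvec);

	return pages_freed;
}

The trylock variant would be the same loop, except that a successful
trylock_page()/unlock_page() lets you __free_page() on the spot, and
only the pages whose trylock failed go into the pagevec for the
unlock-and-flush dance above.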