On Mon, May 09, 2022 at 09:05:45AM +0300, Dan Carpenter wrote:
> > If there is no other way to avoid the Smatch warning,
> 
> The Smatch warning is not the issue.  If we're holding a spinlock and
> we call might_sleep() then that generates a stack trace at runtime if
> you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.  Probably enabling
> CONFIG_DEBUG_ATOMIC_SLEEP should just be a standard part of the QC
> process.
> 
> Anyway, it sounds like you're just doing the locking to silence a
> warning in xarray.  Let's ask Matthew if he has a hint.

I suspect that the tcmu code is doing something horrible & wrong again.
It keeps poking at VM internals without understanding of what's going
on or why.  Anyway ...

> On Sun, May 08, 2022 at 08:03:14PM +0200, Bodo Stroesser wrote:
> > I'm quite sure that our code does not cause any problems, because
> > in tcmu we explicitly or implicitly take the xarray's lock only
> > while holding the so-called cmdr_lock mutex.  Also, we take the lock
> > without disabling irq or bh.  So I think there is no problem if
> > lock_page sleeps while we hold the xarray's lock.
> > 
> > Of course, we wouldn't need the xarray lock at all, but xarray code
> > forces us to take it to avoid warnings.

You mean "The XArray warns us when we break its locking rules".

> > In tcmu_blocks_release we use the advanced xarray API to keep the
> > overhead small.  It allows us to lock/unlock before and after the
> > loop only.  If there is no other way to avoid the Smatch warning, we
> > could easily put additional xas_unlock() and xas_lock() around the
> > lock_page/unlock_page block.

... then you'd have to call xas_reset(), and you might as well use
xa_for_each().

> > >   1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
> > >   1662                                unsigned long last)
> > >   1663 {
> > >   1664         XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
> > >   1665         struct page *page;
> > >   1666         u32 pages_freed = 0;
> > >   1667 
> > >   1668         xas_lock(&xas);
> > >                ^^^^^^^^^^^^^^
> > > We take a spinlock here.
> > > 
> > > 
> > >   1669         xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
> > >   1670                 xas_store(&xas, NULL);
> > >   1671                 /*
> > >   1672                  * While reaching here there may be page faults occurring on
> > >   1673                  * the to-be-released pages.  A race condition may occur if
> > >   1674                  * unmap_mapping_range() is called before page faults on these
> > >   1675                  * pages have completed; a valid but stale map is created.
> > >   1676                  *
> > >   1677                  * If another command subsequently runs and needs to extend
> > >   1678                  * dbi_thresh, it may reuse the slot corresponding to the
> > >   1679                  * previous page in data_bitmap.  Though we will allocate a new
> > >   1680                  * page for the slot in data_area, no page fault will happen
> > >   1681                  * because we have a valid map.  Therefore the command's data
> > >   1682                  * will be lost.
> > >   1683                  *
> > >   1684                  * We lock and unlock pages that are to be released to ensure
> > >   1685                  * all page faults have completed.  This way
> > >   1686                  * unmap_mapping_range() can ensure stale maps are cleanly
> > >   1687                  * removed.
> > >   1688                  */
> > > --> 1689                 lock_page(page);
> > >                          ^^^^^^^^^^^^^^^
> > > The lock_page() function calls might_sleep() (inside the declaration
> > > block).
> > > 
> > >   1690                 unlock_page(page);
> > >   1691                 __free_page(page);
> > >   1692                 pages_freed++;
> > >   1693         }
> > >   1694         xas_unlock(&xas);

There are a number of things you can do.
One is to remove pages into a pagevec until it is full, then
xas_unlock(); xas_reset(); lock and unlock each page and pass the
pagevec to __pagevec_release().  Then xas_lock() and continue the loop.

Another possibility is to trylock each page; if the trylock fails, put
the page into the pagevec (and do the above dance if the pagevec is now
full), but if it succeeds, you can unlock the page and __free_page() it
right away.

The key to going fast is batching, and that goes for __free_page() vs
__pagevec_release() as much as it does for walking the XArray.
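A rough, untested sketch of the first option, assuming the tcmu fields
from the quoted code (data_pages, data_pages_per_blk) and the usual
pagevec helpers from <linux/pagevec.h>; the batching details are
illustrative, not a tested patch:

#include <linux/pagevec.h>	/* struct pagevec, __pagevec_release() */

static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
			       unsigned long last)
{
	XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
	struct pagevec pvec;
	struct page *page;
	u32 pages_freed = 0;
	int i;

	pagevec_init(&pvec);

	xas_lock(&xas);
	xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
		xas_store(&xas, NULL);
		/* pagevec_add() returns the number of slots still free */
		if (pagevec_add(&pvec, page))
			continue;

		/*
		 * The pagevec is full.  Drop the xa_lock so lock_page() may
		 * sleep, and forget the cached walk state, since the tree
		 * can change while the lock is not held.
		 */
		xas_unlock(&xas);
		xas_reset(&xas);

		/* Wait for any in-flight page faults on these pages. */
		for (i = 0; i < pagevec_count(&pvec); i++) {
			lock_page(pvec.pages[i]);
			unlock_page(pvec.pages[i]);
		}
		pages_freed += pagevec_count(&pvec);
		__pagevec_release(&pvec);	/* drops the refs, reinits pvec */

		xas_lock(&xas);
	}
	xas_unlock(&xas);

	/* Flush the final, partially filled batch outside the lock. */
	for (i = 0; i < pagevec_count(&pvec); i++) {
		lock_page(pvec.pages[i]);
		unlock_page(pvec.pages[i]);
	}
	pages_freed += pagevec_count(&pvec);
	pagevec_release(&pvec);

	return pages_freed;
}

The trylock variant would be the same loop, except that a successful
trylock_page()/unlock_page() lets you __free_page() on the spot, and
only the pages whose trylock failed go into the pagevec for the
unlock-and-flush dance above.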