hi,
Sorry for the late response. Currently I'm quite busy.
Really, never mind :)
In your earlier mail you described a possible deadlock.
With this patch applied, are you sure a similar deadlock cannot
happen?
AFAIK, this patch will solve the deadlock.
Additionally, let's assume tcmu_vma_fault()/tcmu_try_get_data_page()
- after having found a valid page to map - is interrupted after
releasing the invalidate_lock. Are there any locks held to prevent
find_free_blocks() from jumping in, possibly removing that page from
the xarray and trying to remove it from the mmapped area?
If not, we might end up mapping a no longer valid page.
Yeah, after tcmu_try_get_data_page() returns, find_free_blocks() may
indeed come in and do unmap_mapping_range() and tcmu_blocks_release(),
but I think it won't cause problems:
1) Since the page fault procedure and unmap_mapping_range() are designed
to be able to run concurrently, they synchronize at pte_offset_map_lock().
See:
=> do_user_addr_fault
==> handle_mm_fault
===> __handle_mm_fault
====> do_fault
=====> do_shared_fault
======> finish_fault
=======> pte_offset_map_lock
=======> do_set_pte
=======> pte_unmap_unlock
and in find_free_blocks():
=> unmap_mapping_range
==> unmap_mapping_range_tree
===> zap_page_range_single
====> unmap_page_range
=====> zap_p4d_range
======> zap_pud_range
=======> zap_pmd_range
========> zap_pte_range
=========> pte_offset_map_lock
=========> pte_clear_not_present_full
=========> pte_unmap_unlock(start_pte, ptl);
So what I want to express is that, because of this synchronization between
the page fault procedure and unmap_mapping_range(), one will either see a
valid mapping or no mapping at all. And if not, since this page's index
exceeds dbi_max, a later page fault will happen and get SIGBUS, which is
reasonable.
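To make the synchronization point concrete, here is a minimal sketch
(not the actual mm code; error handling and details omitted) of the two
sides, both taking the same pte lock:

/* fault side, roughly what finish_fault() does */
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
			       &vmf->ptl);
if (pte_none(*vmf->pte))
	do_set_pte(vmf, page, vmf->address);	/* install the mapping */
pte_unmap_unlock(vmf->pte, vmf->ptl);

/* unmap side, roughly what zap_pte_range() does */
start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
/* ... clear the ptes of the truncated range under the same lock ... */
pte_unmap_unlock(start_pte, ptl);

Whoever takes the pte lock second sees a consistent view: either the pte
is already installed (and gets cleared), or it is still empty (and the
later install maps a page that unmap has already skipped).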
As for your question: if tcmu_try_get_data_page() finds a page
successfully, this page properly gets a refcount. If unmap_mapping_range()
and tcmu_blocks_release() then come in, just after tcmu_try_get_data_page()
returns and before tcmu_vma_fault() returns, tcmu_blocks_release() won't
actually free this page, because there is still that one refcount. So yes,
we'll map a no longer valid page, but this page also won't be re-used
unless the mapping is unmapped later (process exits or is killed); then
put_page() will be called and the page will finally be given back to the
mm subsystem.
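To illustrate the lifetime argument, a sketch of both sides as they look
with this patch applied (the comments are my reading, not authoritative):

/* fault side: tcmu_try_get_data_page() */
filemap_invalidate_lock_shared(mapping);
page = xa_load(&udev->data_pages, dpi);
if (likely(page))
	get_page(page);		/* fault path now pins the page */
filemap_invalidate_unlock_shared(mapping);

/* reclaim side: find_free_blocks() */
filemap_invalidate_lock(mapping);
unmap_mapping_range(mapping, off, 0, 1);
pages_freed = tcmu_blocks_release(udev, start, end - 1);
/*
 * tcmu_blocks_release() drops the xarray's reference, but the fault
 * path's get_page() keeps the page alive until the last pte is torn
 * down and the final put_page() runs.
 */
filemap_invalidate_unlock(mapping);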
After thinking more about this problem: suppose we now have a valid
mapping which points to a truncated page, and the slot of this page in
data_bitmap has been freed. If another command comes in later, it may
reuse the previously freed slot in data_bitmap. We'll then allocate a new
page for this slot in the data area, but it seems no page fault will
happen again, because we still have a valid mapping, so the real request's
data will be lost.
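To spell out the ordering I have in mind (hypothetical timeline):

fault path                             find_free_blocks() / new command
----------                             --------------------------------
tcmu_try_get_data_page() returns
  page P for dpi X, releases the
  invalidate_lock
                                       unmap_mapping_range(): no pte for
                                         X installed yet, nothing to clear
                                       tcmu_blocks_release() frees bit X
fault handler installs pte -> P
                                       new command reuses bit X and stores
                                         a new page P' in the xarray
userspace keeps accessing P (the pte
is valid, no further fault), while
the command's data lives in P'    ==>  request data is lost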
As you say, this would indeed be a long-standing problem; we'll need to
take a deeper look at the code.
Regards,
Xiaoguang Wang
Of course, this would be a long-standing problem not caused by your
change. But if there is a problem, we should try to fix it when touching
this code, I think.
Unfortunately, I haven't yet managed to check which locks are involved
during page fault handling and unmap_mapping_range().
To my knowledge, the page fault path holds mmap_read_lock() and the pte
lock, while unmap_mapping_range() holds mapping->i_mmap_rwsem and the
pte lock.
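In table form (as far as I can tell):

path                     locks held
----                     ----------
page fault               mmap_read_lock(), pte lock
unmap_mapping_range()    mapping->i_mmap_rwsem, pte lock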
Regards,
Xiaoguang Wang
Bodo
On 16.03.22 11:43, Xiaoguang Wang wrote:
hello,
Gentle ping.
Regards,
Xiaoguang Wang
Currently tcmu_vma_fault() uses udev->cmdr_lock to avoid racing with
find_free_blocks(), which unmaps idle pages and truncates them. This
work is much like many filesystems' truncate operations, but those use
address_space->invalidate_lock to protect against such races.
This patch replaces cmdr_lock with address_space->invalidate_lock in
the tcmu fault procedure, which also allows page faults to run
concurrently.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@xxxxxxxxxxxxxxxxx>
---
drivers/target/target_core_user.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index 06a5c4086551..e0a62623ccd7 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -1815,13 +1815,14 @@ static int tcmu_find_mem_index(struct vm_area_struct *vma)
static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi)
{
+ struct address_space *mapping = udev->inode->i_mapping;
struct page *page;
- mutex_lock(&udev->cmdr_lock);
+ filemap_invalidate_lock_shared(mapping);
page = xa_load(&udev->data_pages, dpi);
if (likely(page)) {
get_page(page);
- mutex_unlock(&udev->cmdr_lock);
+ filemap_invalidate_unlock_shared(mapping);
return page;
}
@@ -1831,7 +1832,7 @@ static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi)
*/
pr_err("Invalid addr to data page mapping (dpi %u) on device
%s\n",
dpi, udev->name);
- mutex_unlock(&udev->cmdr_lock);
+ filemap_invalidate_unlock_shared(mapping);
return NULL;
}
@@ -3111,6 +3112,7 @@ static void find_free_blocks(void)
loff_t off;
u32 pages_freed, total_pages_freed = 0;
u32 start, end, block, total_blocks_freed = 0;
+ struct address_space *mapping;
if (atomic_read(&global_page_count) <= tcmu_global_max_pages)
return;
@@ -3134,6 +3136,7 @@ static void find_free_blocks(void)
continue;
}
+ mapping = udev->inode->i_mapping;
end = udev->dbi_max + 1;
block = find_last_bit(udev->data_bitmap, end);
if (block == udev->dbi_max) {
@@ -3152,12 +3155,14 @@ static void find_free_blocks(void)
udev->dbi_max = block;
}
+ filemap_invalidate_lock(mapping);
/* Here will truncate the data area from off */
off = udev->data_off + (loff_t)start * udev->data_blk_size;
- unmap_mapping_range(udev->inode->i_mapping, off, 0, 1);
+ unmap_mapping_range(mapping, off, 0, 1);
/* Release the block pages */
pages_freed = tcmu_blocks_release(udev, start, end - 1);
+ filemap_invalidate_unlock(mapping);
mutex_unlock(&udev->cmdr_lock);
total_pages_freed += pages_freed;