Hi,

> Hi,
>
> Thank you again for the patch!
>
> You might call me a nitpicker, but ...
>
> Wouldn't it be good to add a comment in find_free_blocks explaining
> why it is safe to first call tcmu_blocks_release and then
> unmap_mapping_range?
Not at all. I agree that better comments will help developers
understand the code.

Regards,
Xiaoguang Wang

>
> Regards,
> Bodo
>
> On 15.04.22 17:34, Xiaoguang Wang wrote:
>> When tcmu_vma_fault() gets one page successfully, find_free_blocks()
>> may run in before the current context completes the page fault
>> procedure and call unmap_mapping_range() to unmap this page. If
>> find_free_blocks() finishes its job first and the previous page fault
>> procedure then resumes and completes, a truncated page has been
>> mapped to user space. Note that tcmu_vma_fault() has taken a refcount
>> on this page, so no other subsystem will reuse it until the user
>> space address is unmapped later.
>>
>> If another command runs in later and needs to extend dbi_thresh, it
>> may reuse the data_bitmap slot that corresponds to the previous page.
>> We will then allocate a new page for this slot in data_area, but no
>> page fault will happen again because the stale mapping is still
>> valid, and the real request's data will be lost.
>>
>> Filesystem implementations also run into this issue, but they
>> usually lock the page when vm_operations_struct->fault gets it and
>> unlock it after finish_fault() completes. On the truncate side, they
>> lock pages in truncate_inode_pages() to protect against racing page
>> faults. We can use similar code to fix this issue.
>>
>> To fix this possible data corruption, apply the same method as
>> filesystems: for pages that are to be freed, find_free_blocks() locks
>> and unlocks these pages, and tcmu_vma_fault() also locks the found
>> page under cmdr_lock. With this change, find_free_blocks() waits for
>> all page faults to complete before calling unmap_mapping_range(),
>> and unmap_mapping_range() then removes stale mappings cleanly.
>>
>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@xxxxxxxxxxxxxxxxx>
>> ---
>> V3:
>>   Just lock/unlock_page in tcmu_blocks_release(), and call
>>   tcmu_blocks_release() before unmap_mapping_range().
>>
>> V2:
>>   Wait for all possible in-flight page faults to complete in
>>   find_free_blocks() to fix the possible stale map.
>> ---
>>  drivers/target/target_core_user.c | 30 +++++++++++++++++++++++++++---
>>  1 file changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
>> index fd7267baa707..ff4a575a14d2 100644
>> --- a/drivers/target/target_core_user.c
>> +++ b/drivers/target/target_core_user.c
>> @@ -20,6 +20,7 @@
>>  #include <linux/configfs.h>
>>  #include <linux/mutex.h>
>>  #include <linux/workqueue.h>
>> +#include <linux/pagemap.h>
>>  #include <net/genetlink.h>
>>  #include <scsi/scsi_common.h>
>>  #include <scsi/scsi_proto.h>
>> @@ -1667,6 +1668,25 @@ static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
>>  	xas_lock(&xas);
>>  	xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
>>  		xas_store(&xas, NULL);
>> +		/*
>> +		 * When we reach here, page faults may still be in flight on
>> +		 * the pages being released, and there is a race in which
>> +		 * unmap_mapping_range() is called before those page faults
>> +		 * have finished, leaving a valid but stale map behind.
>> +		 *
>> +		 * If another command runs in later and needs to extend
>> +		 * dbi_thresh, it may reuse the data_bitmap slot of the
>> +		 * previous page. We then allocate a new page for this slot
>> +		 * in data_area, but no page fault will happen again because
>> +		 * the stale map is still valid: the command's data is lost.
>> +		 *
>> +		 * So here we lock and unlock the pages that are to be
>> +		 * released to wait for all in-flight page faults to
>> +		 * complete; the following unmap_mapping_range() can then
>> +		 * remove the stale maps cleanly.
>> +		 */
>> +		lock_page(page);
>> +		unlock_page(page);
>>  		__free_page(page);
>>  		pages_freed++;
>>  	}
>> @@ -1822,6 +1842,7 @@ static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi)
>>  	page = xa_load(&udev->data_pages, dpi);
>>  	if (likely(page)) {
>>  		get_page(page);
>> +		lock_page(page);
>>  		mutex_unlock(&udev->cmdr_lock);
>>  		return page;
>>  	}
>> @@ -1863,6 +1884,7 @@ static vm_fault_t tcmu_vma_fault(struct vm_fault *vmf)
>>  	struct page *page;
>>  	unsigned long offset;
>>  	void *addr;
>> +	vm_fault_t ret = 0;
>>  	int mi = tcmu_find_mem_index(vmf->vma);
>>
>>  	if (mi < 0)
>> @@ -1887,10 +1909,11 @@ static vm_fault_t tcmu_vma_fault(struct vm_fault *vmf)
>>  		page = tcmu_try_get_data_page(udev, dpi);
>>  		if (!page)
>>  			return VM_FAULT_SIGBUS;
>> +		ret = VM_FAULT_LOCKED;
>>  	}
>>
>>  	vmf->page = page;
>> -	return 0;
>> +	return ret;
>>  }
>>
>>  static const struct vm_operations_struct tcmu_vm_ops = {
>> @@ -3205,12 +3228,13 @@ static void find_free_blocks(void)
>>  			udev->dbi_max = block;
>>  		}
>>
>> +		/* Release the block pages */
>> +		pages_freed = tcmu_blocks_release(udev, start, end - 1);
>> +
>>  		/* Here will truncate the data area from off */
>>  		off = udev->data_off + (loff_t)start * udev->data_blk_size;
>>  		unmap_mapping_range(udev->inode->i_mapping, off, 0, 1);
>>
>> -		/* Release the block pages */
>> -		pages_freed = tcmu_blocks_release(udev, start, end - 1);
>>  		mutex_unlock(&udev->cmdr_lock);
>>
>>  		total_pages_freed += pages_freed;
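
To illustrate the locking pattern the patch borrows from filesystems,
here is a minimal user-space sketch. All names (fake_page,
fault_thread, release_thread) are hypothetical and a pthread mutex
stands in for the kernel page lock; this is not TCMU code. The release
side briefly takes each page lock so that any fault handler still
holding the lock finishes installing its mapping before the unmap step
runs:

/*
 * Sketch of the lock/unlock quiescing pattern, under the assumptions
 * above: the mutex plays the role of the struct page lock, the
 * "mapped" flag plays the role of a PTE in the user VMA.
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct fake_page {
	pthread_mutex_t lock;	/* stands in for the struct page lock  */
	bool mapped;		/* stands in for a PTE in the user VMA */
};

static struct fake_page page = { PTHREAD_MUTEX_INITIALIZER, false };
static sem_t fault_started;

/* Fault side: like tcmu_vma_fault(), return with the page locked. */
static void *fault_thread(void *arg)
{
	pthread_mutex_lock(&page.lock);
	sem_post(&fault_started);

	/* window in which core MM is still completing the fault */
	usleep(100 * 1000);
	page.mapped = true;		  /* mapping installed */

	pthread_mutex_unlock(&page.lock); /* VM_FAULT_LOCKED: MM unlocks */
	return NULL;
}

/* Release side: like find_free_blocks()/tcmu_blocks_release(). */
static void *release_thread(void *arg)
{
	sem_wait(&fault_started);	/* a fault is now in flight */

	/*
	 * Lock/unlock each page so that any fault still holding the
	 * page lock finishes before we remove the mapping.
	 */
	pthread_mutex_lock(&page.lock);
	pthread_mutex_unlock(&page.lock);

	page.mapped = false;		/* unmap_mapping_range() stand-in */
	return NULL;
}

int main(void)
{
	pthread_t f, r;

	sem_init(&fault_started, 0, 0);
	pthread_create(&f, NULL, fault_thread, NULL);
	pthread_create(&r, NULL, release_thread, NULL);
	pthread_join(f, NULL);
	pthread_join(r, NULL);

	/* With the barrier, no stale mapping survives the release. */
	printf("stale mapping left behind: %s\n",
	       page.mapped ? "yes" : "no");
	return 0;
}

Built with "cc -pthread", this always prints "no". Dropping the
lock/unlock pair in release_thread reopens the window in which the
fault installs its mapping after the release has already run, which is
exactly the stale-map data corruption described in the commit message.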