hi,

> On 05.04.22 18:03, Xiaoguang Wang wrote:
>> hi,
>>
>>> On 23.03.22 14:49, Xiaoguang Wang wrote:
>>>> When tcmu_vma_fault() gets one page successfully, before the current
>>>> context completes the page fault procedure, find_free_blocks() may run
>>>> in and call unmap_mapping_range() to unmap this page. Assume
>>>> find_free_blocks() completes its job first; the previous page fault
>>>> procedure then resumes and completes, so one truncated page has been
>>>> mapped to user space. But note that tcmu_vma_fault() has taken a
>>>> refcount on this page, so no other subsystem will use this page unless
>>>> the user space address is unmapped later.
>>>>
>>>> If another command runs in later and needs to extend dbi_thresh, it may
>>>> reuse the slot corresponding to the previous page in data_bitmap. Then,
>>>> though we'll allocate a new page for this slot in data_area, no page
>>>> fault will happen again because we still have a valid map, and the real
>>>> request's data will be lost.
>>>
>>> I don't think this is a safe fix. It is possible that not only
>>> find_free_blocks runs before the page fault procedure completes, but
>>> also the allocation for the next cmd happens. In that case the new call
>>> to unmap_mapping_range would also happen before the page fault
>>> completes -> data corruption.
>>>
>>> AFAIK, no one has ever seen this bug in real life, as
>> Yeah, I know, I just found this possible issue by reading the code :)
>>
>>> find_free_blocks only runs seldom and userspace would have to access
>>> a data page for the very first time while the cmd that owned this page
>>> has already been completed by userspace. Therefore I think we should
>>> apply a perfect fix only.
>>>
>>> I'm wondering whether there really is such a race. If so, couldn't the
>>> same race happen in other drivers or even when truncating mapped files?
>> Indeed, I have described how filesystem implementations avoid this issue
>> in the patch's commit message:
>>
>> Filesystem implementations will also run into this issue, but they
>> usually lock the page when vm_operations_struct->fault gets one page,
>> and unlock the page after finish_fault() completes. On the truncate
>> side, they lock pages in truncate_inode_pages() to protect against
>> races with page faults. We can add similar code, like filesystems do,
>> to fix this issue.
>>
>> Take ext4 as an example: a file in ext4 is mapped to user space, and if
>> a truncate operation then occurs, ext4 calls truncate_pagecache():
>>
>> void truncate_pagecache(struct inode *inode, loff_t newsize)
>> {
>>         struct address_space *mapping = inode->i_mapping;
>>         loff_t holebegin = round_up(newsize, PAGE_SIZE);
>>
>>         /*
>>          * unmap_mapping_range is called twice, first simply for
>>          * efficiency so that truncate_inode_pages does fewer
>>          * single-page unmaps.  However after this first call, and
>>          * before truncate_inode_pages finishes, it is possible for
>>          * private pages to be COWed, which remain after
>>          * truncate_inode_pages finishes, hence the second
>>          * unmap_mapping_range call must be made for correctness.
>>          */
>>         unmap_mapping_range(mapping, holebegin, 0, 1);
>>         truncate_inode_pages(mapping, newsize);
>>         unmap_mapping_range(mapping, holebegin, 0, 1);
>> }
>>
>> truncate_inode_pages() locks each page and sets page->mapping to NULL,
>> and ext4's filemap_fault() locks the page and checks whether
>> page->mapping has changed; if it has, it simply fails the page fault
>> procedure.
>>
>> For tcmu, though the data area's pages don't have a valid mapping,
>> we can apply a similar method.
>> In tcmu_vma_fault(), we lock the page and set the VM_FAULT_LOCKED
>> flag,
>
> Yeah, looking into page fault handling I'm wondering why tcmu didn't do
> that from the beginning!
>
>> and in find_free_blocks(), we first try to lock the pages which are
>> going to be released; once lock_page() returns,
>
> I assume we immediately unlock the page again, right?

Yeah, and I'll add proper code comments in the new version of the patch set.

>
>> we can ensure that there is no in-flight page fault procedure still
>> running, and the following unmap_mapping_range() will also ensure that
>> all user mappings are cleared.
>> It seems this resolves the possible issue; please have a check, thanks.
>
> AFAICS, this is the clean solution we were searching for.

Agree, I'll prepare a new patch set soon.
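Roughly, what I have in mind is the following (only a sketch to show the
idea; field names like data_pages/data_off are taken from the current
tcmu code, I've left out the cmdr_lock handling, error paths and the
exact dpi/offset calculation, and start_dpi/end_dpi/off below are just
placeholders for the range being freed, so the real patch may look a bit
different):

static vm_fault_t tcmu_vma_fault(struct vm_fault *vmf)
{
        struct tcmu_dev *udev = vmf->vma->vm_private_data;
        struct page *page;
        uint32_t dpi;

        /* ... mailbox and cmd ring pages handled as before ... */

        /* simplified data area page index calculation */
        dpi = vmf->pgoff - (udev->data_off >> PAGE_SHIFT);
        page = xa_load(&udev->data_pages, dpi);
        if (!page)
                return VM_FAULT_SIGBUS;

        get_page(page);
        /*
         * Keep the page locked while the fault is being completed, so
         * that find_free_blocks() cannot unmap and release it in
         * between. Because we return VM_FAULT_LOCKED, the mm core will
         * unlock the page for us once the pte has been installed.
         */
        lock_page(page);
        vmf->page = page;
        return VM_FAULT_LOCKED;
}

and in find_free_blocks(), before the existing unmap_mapping_range()
call, something like:

        /*
         * Locking and immediately unlocking each page we are about to
         * release makes sure that any page fault which already picked
         * up the page has finished installing its mapping. The
         * following unmap_mapping_range() then removes all user
         * mappings of these pages.
         */
        xa_for_each_range(&udev->data_pages, dpi, page, start_dpi, end_dpi) {
                lock_page(page);
                unlock_page(page);
        }
        unmap_mapping_range(udev->inode->i_mapping, off, 0, 1);

The lock/unlock pair is only used as a barrier against faults that are
already in flight; we don't need to hold the page lock across
unmap_mapping_range() itself.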
Regards,
Xiaoguang Wang

>
> Thank you
> Bodo