On Wed, 9 Jul 2014, Johannes Weiner wrote: > On Thu, Jun 26, 2014 at 10:36:20PM -0700, Hugh Dickins wrote: > > Hannes, a question for you please, I just could not make up my mind. > > In mm/truncate.c truncate_inode_pages_range(), what should be done > > with a failed clear_exceptional_entry() in the case of hole-punch? > > Is that case currently depending on the rescan loop (that I'm about > > to revert) to remove a new page, so I would need to add a retry for > > that rather like the shmem_free_swap() one? Or is it irrelevant, > > and can stay unchanged as below? I've veered back and forth, > > thinking first one and then the other. > > I realize you have given up on changing truncate.c in the meantime, > but I'm still asking myself about the swap retry case: why retry for > swap-to-page changes, yet not for page-to-page changes? > > In case faults are disabled through i_size, concurrent swapin could > still turn swap entries into pages, so I can see the need to retry. > There is no equivalent for shadow entries, though, and they can only > be turned through page faults, so no retry necessary in that case. > > However, you explicitely mentioned the hole-punch case above: if that > can't guarantee the hole will be reliably cleared under concurrent > faults, I'm not sure why it would put in more effort to free it of > swap (or shadow) entries than to free it of pages. > > What am I missing? In dropping the pincer effect, I am conceding that data written (via mmap) racily into the hole, behind the punching cursor, between the starting and the ending of the punch operation, may be allowed to remain. It will not often happen (given the two loops), but it might. But I insist that all data in the hole at the starting of the punch operation must be removed by the ending of the punch operation (though of course, given the paragraph above, identical data might be written in its place concurrently, via mmap, if the application chooses). I think you probably agree with both of those propositions. As the punching cursor moves along the radix_tree, it gathers page pointers and swap entries (the emply slots are already skipped at the level below; and tmpfs takes care that there is no instant in switching between page and swap when the slot appears empty). Dealing with the page pointers is easy: a reference is already held, then shmem_undo_range takes the page lock which prevents swizzling to swap, then truncates that page out of the tree. But dealing with swap entries is slippery: there is no reference held, and no lock to prevent swizzling to page (outside of the tree_lock taken in shmem_free_swap). So, as I see it, the page lock ensures that any pages present at the starting of the punch operation will be removed, without any need to go back and retry. But a swap entry present at the starting of the punch operation might be swizzled back to page (and, if we imagine massive preemption, even back to swap again, and to page again, etc) at the wrong moment: so for swap we do need to retry. (What I said there is not quite correct: that swap would actually have to be a locked page at the time when the first loop meets it.) Does that make sense? Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>