Nick Piggin wrote:
> Again it comes back to the whole writeout thing, which makes it more
> constraining on the kernel to optimise.

Cute :-)
It was intended to make it easier to optimise, but maybe it failed.

> For example, my fsync "livelock" avoidance patches did the following:
>
> 1. find all pages which are dirty or under writeout first.
> 2. write out the dirty pages.
> 3. wait for our set of pages.
>
> Simple, obvious, and the kernel can optimise this well because the
> userspace has asked for a high level request "make this data safe"
> rather than low level directives. We can't do this same nice simple
> sequence with sync_file_range because SYNC_FILE_RANGE_WAIT_AFTER
> means we have to wait for all writeout pages in the range, including
> unrelated ones, after the dirty writeout. SYNC_FILE_RANGE_WAIT_BEFORE
> means we have to wait for clean writeout pages before we even start
> doing real work.

As noted in my other mail just now, although sync_file_range() is
described as though it does the three bulk operations consecutively,
I think it wouldn't be too shocking to think the intended semantics
_could_ be:

   "wait and initiate writeouts _as if_ we did, for each page _in
    parallel_ {
        if (SYNC_FILE_RANGE_WAIT_BEFORE && page->writeout)
            wait(page)
        if (SYNC_FILE_RANGE_WRITE)
            start_writeout(page)
        if (SYNC_FILE_RANGE_WAIT_AFTER && page->writeout)
            wait(page)
    }"

That permits many strategies, and I think one of them is the nice
livelock-avoiding fsync you describe up above.

You might be able to squeeze the sync_file_range() flags into that by
chopping it up like this.  Btw, you omitted step 1.5 "wait for dirty
pages which are already under writeout", but it's made explicit here:

   1. find all pages which are dirty or under writeout first,
      and remember which of them are dirty _and_ under writeout (DW).
   2. if (SYNC_FILE_RANGE_WRITE)
          write out the dirty pages not in DW.
   3. if (SYNC_FILE_RANGE_WAIT_BEFORE) {
          wait for the set of pages in DW.
          write out the pages in DW.
      }
   4. if (SYNC_FILE_RANGE_WAIT_BEFORE || SYNC_FILE_RANGE_WAIT_AFTER)
          wait for our set of pages.

However, maybe the flags aren't all that useful really, and maybe
sync_file_range() could be replaced by a stub which ignores the flags
and calls fsync_range().

-- Jamie