> On Jun 24, 2022, at 3:17 PM, Peter Xu <peterx@xxxxxxxxxx> wrote: > > On Fri, Jun 24, 2022 at 05:58:17PM -0400, Peter Xu wrote: >> [Sorry for replying late] >> >> Said that, I think it doesn't really necessary need to be that complex, >> since make_huge_pte() already sets dirty bit when "writable=1", so IIUC >> what you need to do is simply make sure dirty bit set when write_hint=1. >> >> Does it sounds correct to you? > > Hmm, hold on... I failed to figure out how that write-likely hint could > help us for either huge or non-huge pages, since: > > (1) Old code always set dirty, so no perf degrade anyway with/without the > hint > > (2) If we want to rework dirty bit (which I'm totally fine with..), then > we don't apply it when we shouldn't, and afaict we should set D bit > whenever we should... if the user assumes this page is likely to be > written but made it read-only, say, with UFFDIO_COPY(wp_mode=1), > setting D bit will not help, instead, the user should simply use an > UFFDIO_COPY(wp_mode=0) then the dirty will be set with write=1.. > > It'll be helpful but only helpful for UFFDIO_ZEROCOPY because it avoids one > COW. But that seems to be it. > > In short: I'm wondering whether we only really need the ACCESS_LIKELY hint > as you proposed earlier. We may want UFFDIO_ZEROPAGE_MODE_ALLOCATE > separately, but keep that only for zeropage op (and it shouldn't really be > called WRITE_LIKELY)? Or did I miss something? Let’s see if I get you correctly. I am not sure whether we had this discussion before. We are talking about a scenario in which WP=0. You argue that if the page is already set as dirty, what is the benefit of not setting the dirty-bit, right? So first, IIUC, there are cases in which the page would not be set as dirty, e.g., UFFDIO_CONTINUE. [ I am admittedly not too familiar with this use-case, so I say it based on the comments. ] Second, even if the page is dirty (e.g., following UFFDIO_COPY), but it is not written by the user after UFFDI_COPY, marking the PTE as dirty when it is mapped would induce overhead, as we discussed before, since if/when the PTE is unmapped, TLB flush batching might not be possible. So I don’t think there is a problem in having WRITE_LIKELY hint. Moreover, I would reiterate my position (which you guys convinced me in!) that having hints that indicate what the user does (WRITE_LIKELY) is a better API than something that indicates directly what the kernel should do (e.g., UFFDIO_ZEROPAGE_MODE_ALLOCATE). But this discussion made me think that there are two somewhat related matters that we may want to address: 1. mwriteprotect_range() should use MM_CP_TRY_CHANGE_WRITABLE when !wp to proactively make entries writable and save . 2. The WRITE_LIKELY hint should be propagated from mwriteprotect_range() to change_pte_range() through cp_flags, and the entry should be set dirty accordingly. With that, and with leaving hugetlbfs as it is (I meant before - leaving it as it is in the patchset that I sent), would it be ok with you?