On Sat, Jun 25, 2022 at 07:49:54AM +0000, Nadav Amit wrote: > > > > On Jun 24, 2022, at 3:17 PM, Peter Xu <peterx@xxxxxxxxxx> wrote: > > > > On Fri, Jun 24, 2022 at 05:58:17PM -0400, Peter Xu wrote: > >> [Sorry for replying late] > >> > >> Said that, I think it doesn't really necessary need to be that complex, > >> since make_huge_pte() already sets dirty bit when "writable=1", so IIUC > >> what you need to do is simply make sure dirty bit set when write_hint=1. > >> > >> Does it sounds correct to you? > > > > Hmm, hold on... I failed to figure out how that write-likely hint could > > help us for either huge or non-huge pages, since: > > > > (1) Old code always set dirty, so no perf degrade anyway with/without the > > hint > > > > (2) If we want to rework dirty bit (which I'm totally fine with..), then > > we don't apply it when we shouldn't, and afaict we should set D bit > > whenever we should... if the user assumes this page is likely to be > > written but made it read-only, say, with UFFDIO_COPY(wp_mode=1), > > setting D bit will not help, instead, the user should simply use an > > UFFDIO_COPY(wp_mode=0) then the dirty will be set with write=1.. > > > > It'll be helpful but only helpful for UFFDIO_ZEROCOPY because it avoids one > > COW. But that seems to be it. > > > > In short: I'm wondering whether we only really need the ACCESS_LIKELY hint > > as you proposed earlier. We may want UFFDIO_ZEROPAGE_MODE_ALLOCATE > > separately, but keep that only for zeropage op (and it shouldn't really be > > called WRITE_LIKELY)? Or did I miss something? > > Let’s see if I get you correctly. I am not sure whether we had this > discussion before. > > We are talking about a scenario in which WP=0. You argue that if the page > is already set as dirty, what is the benefit of not setting the dirty-bit, > right? > > So first, IIUC, there are cases in which the page would not be set as > dirty, e.g., UFFDIO_CONTINUE. [ I am admittedly not too familiar with this > use-case, so I say it based on the comments. ] > > Second, even if the page is dirty (e.g., following UFFDIO_COPY), but it > is not written by the user after UFFDI_COPY, marking the PTE as dirty > when it is mapped would induce overhead, as we discussed before, since > if/when the PTE is unmapped, TLB flush batching might not be possible. I'd hope we don't make an interface design just to service that purpose of when write=0 and dirty=1 use case that is internal to the kernel so far, and I still think it's the tlb flush code to change.. or do we have other use case for this WRITE_LIKELY hint? For UFFDIO_CONTINUE, if we want to make things clear on dirty bit, then IMHO for UFFDIO_CONTINUE the right place for the dirty process is where the user writes to the page in the other mapping, where PageDirty() will start to be true already even if the pte that to be CONTINUEd will have dirty=0 in the pte entry. From that pov I still don't see why we need to grant the user on the dirty bit control, no matter with a hint only, or explicit. > > So I don’t think there is a problem in having WRITE_LIKELY hint. Moreover, > I would reiterate my position (which you guys convinced me in!) David convinced you I think :) > that having hints that indicate what the user does (WRITE_LIKELY) is a > better API than something that indicates directly what the kernel should > do (e.g., UFFDIO_ZEROPAGE_MODE_ALLOCATE). The hint idea sounds good to me, it's just that we actually have two steps here: (1) We think providing user the control of dirty bit makes sense, then, (2) We think the flag should be a hint not explicit "set dirty bit" I agree with (2) in this case if (1) is applicable. And now I think I'm questioning myself on (1). Fundamentally, access bit has more meaningful context (0 means cold, 1 means hot), for dirty it's really more a perf thing to me (when clear, it'll take extra cycles to set it when memory write happens to it; being clear _may_ help only for the tlb flush example you mentioned but I'm not fully convinced that's correct). Maybe with the to be proposed RFC patch for tlb flush we can know whether that should be something we can rely on. It'll add more dependency on this work which I'm sorry to say. It's just that IMHO we should think carefully for the write-hint because this is a solid new uABI we're talking about. The other option is we can introduce the access hint first and think more on the dirty one (we can always add it when proper). What do you think? Also, David please chim in anytime if I missed the whole point when you proposed the idea. > > But this discussion made me think that there are two somewhat related > matters that we may want to address: > > 1. mwriteprotect_range() should use MM_CP_TRY_CHANGE_WRITABLE when !wp > to proactively make entries writable and save . I'm not sure I'm right here, but I think David's patch should have covered that case? The new helper only checks pte_uffd_wp() based on my memory, and when resolving page faults uffd-wp bit should have been gone, so it should be treated the same as normal ptes. > > 2. The WRITE_LIKELY hint should be propagated from mwriteprotect_range() > to change_pte_range() through cp_flags, and the entry should be set > dirty accordingly. Sounds correct. Though again I hope we can think more thoroughly on whether we need the write-hint first. Thanks, -- Peter Xu