On Tue, Oct 30, 2018 at 12:28:41PM -0600, Tycho Andersen wrote: > On Tue, Oct 30, 2018 at 10:58:14AM -0700, Matthew Wilcox wrote: > > On Tue, Oct 30, 2018 at 10:06:51AM -0700, Andy Lutomirski wrote: > > > > On Oct 30, 2018, at 9:37 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: > > > I support the addition of a rare-write mechanism to the upstream kernel. > > > And I think that there is only one sane way to implement it: using an > > > mm_struct. That mm_struct, just like any sane mm_struct, should only > > > differ from init_mm in that it has extra mappings in the *user* region. > > > > I'd like to understand this approach a little better. In a syscall path, > > we run with the user task's mm. What you're proposing is that when we > > want to modify rare data, we switch to rare_mm which contains a > > writable mapping to all the kernel data which is rare-write. > > > > So the API might look something like this: > > > > void *p = rare_alloc(...); /* writable pointer */ > > p->a = x; > > q = rare_protect(p); /* read-only pointer */ > > > > To subsequently modify q, > > > > p = rare_modify(q); > > q->a = y; > > Do you mean > > p->a = y; > > here? I assume the intent is that q isn't writable ever, but that's > the one we have in the structure at rest. Yes, that was my intent, thanks. To handle the list case that Igor has pointed out, you might want to do something like this: list_for_each_entry(x, &xs, entry) { struct foo *writable = rare_modify(entry); kref_get(&writable->ref); rare_protect(writable); } but we'd probably wrap it in list_for_each_rare_entry(), just to be nicer.