Re: [PATCH 1/4] mm: Trial do_wp_page() simplification

On Fri, Sep 18, 2020 at 01:59:41PM -0700, Linus Torvalds wrote:

> Honestly, if we had a completely *reliable* sign of "this page is
> pinned", then I think the much nicer option would be to just say
> "pinned pages will not be copied at all". Kind of an implicit
> VM_DONTCOPY.

It would be simpler to implement, but it makes the programming model
really sketchy. For instance O_DIRECT is using FOLL_PIN, so imagine
this program:

        CPU0                                      CPU1

 a = malloc(1024);
                                                b = malloc(1024);
 read(fd, a, 1024); // FD is O_DIRECT
 ...                                            fork()
                                                  *b = ...
 read completes

Here a and b happened to land in the same page, purely because of how
the allocator placed them.
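
As a minimal sketch of what "the same page" means here, the check below
compares the page-aligned parts of two addresses. The helper name
same_page() and its page_size parameter are hypothetical, introduced only
for illustration; they are not part of the kernel or of the patch under
discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the original mail: returns true when
 * both addresses fall inside the same page of page_size bytes.
 */
static bool same_page(const void *a, const void *b, size_t page_size)
{
    /* The mask trick below only works for power-of-two page sizes. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    /* Clear the offset-within-page bits and compare what is left. */
    uintptr_t mask = ~((uintptr_t)page_size - 1);
    return ((uintptr_t)a & mask) == ((uintptr_t)b & mask);
}
```

In a userspace test the page size would typically come from
sysconf(_SC_PAGESIZE), and two small malloc() results may or may not
pass this check depending on the allocator.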

In this case the fork() child created on CPU1 would be very surprised
to find that 'b' was not mapped in the child.

Similarly, CPU0 would see silent data corruption if the read didn't
deposit its data into 'a', which is a bug we have today. In this race
the COW break of *b might steal the physical page for the child, and *a
won't see the data. For this reason John is right: fork eventually
needs to do this for O_DIRECT as well.

The copy on fork nicely fixes all of this weird oddball stuff.

Jason



