Re: [PATCH 0/1] soft_dirty: fix soft_dirty during THP split

On Fri, Aug 19, 2016 at 04:52:51PM +0300, Pavel Emelyanov wrote:
> Hm... Are you talking about some in-kernel test, or just any? We have
> tests in CRIU tree for UFFD (not sure we've wired up the non-cooperative
> part though).

Nice. I wasn't aware you had uffd specific tests in CRIU, I'll check.

I was referring to tools/testing/selftests/vm/userfault*, but I
suppose it's fine in CRIU as well. A self-contained test suitable for
tools/testing/selftests would be nice too, as not everyone will run
the CRIU tests to test the kernel.

What's currently tested is missing-page faults on anon, tmpfs and
hugetlbfs memory, and they all work (I fixed two tmpfs bugs yesterday
thanks to the tmpfs test, which crashed my workstation when I first
tried it; now it passes fine :).

> And my main worry about this is COW-sharing. If we have two tasks that
> fork()-ed from each other and we try to lazily restore a page that
> is still COW-ed between them, the uffd API doesn't give us anything to
> do it. So we effectively break COW on lazy restore. Do you have any
> ideas what can be done about it?

Building a shared page is tricky; not even khugepaged was doing that
for anon memory.

Kirill extended khugepaged to do it, along with the THP on tmpfs
support, as it's more important for tmpfs (I haven't yet checked if it
landed upstream with the rest of tmpfs in 4.8-rc though).

The main API problem is that the uffd is different between parent and
child: fork with your non-cooperative patches gives you a new uffd
that represents the child mm.

To create a page shared between two "mm"s, the API should be able to
specify the two "mm"s and the two "addresses" atomically in the same
ioctl. And the uffd _is_ the "mm" with the current API.

So what it takes to do it is to add a UFFDIO_COPY_COW that takes as
parameters an address for the current "uffd" and a list of "int uffd,
unsigned long address" pairs.
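
To make the shape of that proposal concrete, here is a minimal sketch
of what the ioctl argument might look like. Everything in it is
hypothetical: no UFFDIO_COPY_COW exists in the kernel, and the struct
names, the "src" and "len" fields (borrowed by analogy from
UFFDIO_COPY) and the helper uffdio_copy_cow_prepare() are assumptions
made for illustration only. It only builds and validates the request;
it does not talk to the kernel.

```c
#include <stddef.h>

/* Hypothetical: one extra "mm", named by its uffd, plus the address in
 * that mm which should end up mapping the same physical page. */
struct uffd_cow_target {
    int uffd;
    unsigned long address;
};

/* Hypothetical argument for the proposed UFFDIO_COPY_COW. The layout is
 * illustrative; nothing like it is defined in linux/userfaultfd.h. */
struct uffdio_copy_cow_sketch {
    unsigned long dst;   /* address in the mm of the uffd the ioctl is issued on */
    unsigned long src;   /* assumed: source buffer, as in UFFDIO_COPY */
    unsigned long len;   /* assumed: length in bytes, as in UFFDIO_COPY */
    const struct uffd_cow_target *targets; /* the "int uffd, unsigned long address" pairs */
    size_t nr_targets;
};

/* Fill in a request after basic sanity checks.
 * Returns 0 on success, -1 if the arguments are obviously invalid. */
int uffdio_copy_cow_prepare(struct uffdio_copy_cow_sketch *req,
                            unsigned long dst, unsigned long src,
                            unsigned long len,
                            const struct uffd_cow_target *targets,
                            size_t nr_targets)
{
    size_t i;

    if (!req || len == 0 || (nr_targets > 0 && !targets))
        return -1;
    for (i = 0; i < nr_targets; i++) {
        if (targets[i].uffd < 0)
            return -1;
    }
    req->dst = dst;
    req->src = src;
    req->len = len;
    req->targets = targets;
    req->nr_targets = nr_targets;
    return 0;
}
```

Passing all the (uffd, address) pairs in one argument is what would
let the kernel install the page in every mm atomically, which separate
per-uffd calls cannot do.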

Even with plain UFFDIO_COPY things should still work reliably; it'll
just take more memory and break COW during restore. The important
thing is that "break" means "allocate more memory", not "crash" :).
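
For comparison, a rough sketch of the fallback with the existing API:
the same page content is installed once per uffd, so parent and child
each get a private copy. struct uffdio_copy and UFFDIO_COPY are the
existing interface from linux/userfaultfd.h; the helper names
uffd_copy_page() and restore_unshared() are made up for this example,
and it assumes each uffd is already registered over the corresponding
address.

```c
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Resolve a missing-page fault at dst in the mm behind uffd by copying
 * len bytes from src. Returns 0 on success, -1 on failure (errno set). */
int uffd_copy_page(int uffd, unsigned long dst, const void *src,
                   unsigned long len)
{
    struct uffdio_copy copy;

    memset(&copy, 0, sizeof(copy));
    copy.dst = dst;
    copy.src = (__u64)(unsigned long)src;
    copy.len = len;
    copy.mode = 0;      /* default mode: wake the faulting thread */

    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
        return -1;
    return 0;
}

/* Install the same content at addrs[i] through uffds[i] for each mm.
 * Every call allocates a fresh page, so the COW sharing is lost, but
 * each task still sees the correct data. */
int restore_unshared(const int *uffds, const unsigned long *addrs,
                     size_t n, const void *src, unsigned long len)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (uffd_copy_page(uffds[i], addrs[i], src, len) == -1)
            return -1;
    }
    return 0;
}
```

With n tasks that shared the page before the dump, this costs n pages
after restore instead of one.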

> We have ... readiness to do it :) since once CRIU hits this we'll have to.

Ok great.

I also thought about it a bit and I think it's just a matter of
specifying which uffd should get the notification first. The manager
will then take the notification first and, if the page was missing in
the source container, call a UFFDIO_FAULT_PASS to cascade to the
second uffd registered in the region, without waking up the task
blocked in handle_userfault. To find out whether the page is missing
in the source container you could use pagemap.
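
The pagemap check could look roughly like the sketch below. It relies
on the documented /proc/PID/pagemap layout (one 64-bit entry per
virtual page, bit 63 = present, bit 62 = swapped); the helper names
pagemap_entry_missing() and pagemap_read_entry() are made up for this
example, and treating "neither present nor swapped" as "missing" is an
assumption about what the manager would want here.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT (1ULL << 63)   /* page is present in RAM */
#define PM_SWAPPED (1ULL << 62)   /* page is in swap */

/* A page counts as missing if it is neither in RAM nor in swap. */
int pagemap_entry_missing(uint64_t entry)
{
    return !(entry & (PM_PRESENT | PM_SWAPPED));
}

/* Read the pagemap entry covering addr in process pid.
 * Returns 0 on success, -1 on failure. */
int pagemap_read_entry(pid_t pid, unsigned long addr, uint64_t *entry)
{
    char path[64];
    long page_size = sysconf(_SC_PAGESIZE);
    off_t offset;
    ssize_t n;
    int fd;

    if (page_size <= 0)
        return -1;
    /* One 8-byte entry per virtual page, indexed by page number. */
    offset = (off_t)(addr / (unsigned long)page_size) * (off_t)sizeof(*entry);

    snprintf(path, sizeof(path), "/proc/%ld/pagemap", (long)pid);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    n = pread(fd, entry, sizeof(*entry), offset);
    close(fd);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

A manager would call pagemap_read_entry() for the faulting address in
the source task and only pass the fault on when
pagemap_entry_missing() reports the page as absent.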

Thanks,
Andrea
