Re: COW in XArray

Shawn Landden <slandden@xxxxxxxxx> · Sun, 12 May 2019 22:42:11 -0500

On Sun, May 12, 2019 at 9:22 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Sun, May 12, 2019 at 09:56:47AM -0500, Shawn Landden wrote:
> > I am trying to implement epochs for pids. For this I need to allow
> > radix tree operations to be specified COW (deletion does not need to
> > change). Radix trees look like they are under a lot of work by you,
> > so how can I best get this feature, and have some code I can work
> > with to write my feature?
>
> Hi Shawn,
>
> I'd love to help, but I don't quite understand what you want.
>
> Here's the conversion of the PID allocator from the IDR to the XArray:
>
> http://git.infradead.org/users/willy/linux-dax.git/commitdiff/223ad3ae5dfffdfc5642b1ce54df2c7836b57ef1
>
> What semantics do you want to change?
When allocating a pid, you pass an epoch number. If the pids being
allocated wrap, then the epoch is incremented, and a new radix tree is
created that is a COW copy of the last epoch's tree. If the page that
is found for allocation is of an older epoch, it is copied and the
allocation only happens in the copy.
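
To make the COW step concrete, here is a rough userspace sketch. It is
not XArray or IDR code: the two-level layout and the names (struct
leaf, struct pid_tree, tree_new_epoch, tree_alloc, tree_lookup) are all
made up for illustration, and the sizes are tiny on purpose. Starting
an epoch copies only the top level; a leaf is copied the first time the
new epoch allocates in it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LEAF_SLOTS 8   /* pids per leaf (tiny, for illustration only) */
#define NUM_LEAVES 4   /* leaves per tree */

/* Hypothetical leaf "page": owned by exactly one epoch. */
struct leaf {
    uint32_t epoch;             /* epoch allowed to modify this leaf */
    uint8_t  used[LEAF_SLOTS];  /* 1 if the pid slot is allocated */
};

/* Hypothetical per-epoch tree; leaves may be shared with older epochs. */
struct pid_tree {
    uint32_t     epoch;
    struct leaf *leaves[NUM_LEAVES];
};

/* Start a new epoch: copy only the top level, share every leaf. */
static struct pid_tree *tree_new_epoch(const struct pid_tree *prev)
{
    struct pid_tree *t = calloc(1, sizeof(*t));

    if (!t)
        return NULL;
    if (prev) {
        memcpy(t->leaves, prev->leaves, sizeof(t->leaves));
        t->epoch = prev->epoch + 1;
    }
    return t;
}

/*
 * Allocate pid in tree t. If the leaf is missing or belongs to an
 * older epoch, it is copied first and only the copy is modified.
 * Returns 0 on success, -1 if the pid is taken or memory runs out.
 */
static int tree_alloc(struct pid_tree *t, unsigned pid)
{
    assert(pid < LEAF_SLOTS * NUM_LEAVES);

    struct leaf **slot = &t->leaves[pid / LEAF_SLOTS];
    struct leaf *l = *slot;

    if (!l || l->epoch != t->epoch) {
        struct leaf *copy = calloc(1, sizeof(*copy));

        if (!copy)
            return -1;
        if (l)
            memcpy(copy, l, sizeof(*copy));
        copy->epoch = t->epoch;   /* the copy now belongs to this epoch */
        *slot = copy;
        l = copy;
    }
    if (l->used[pid % LEAF_SLOTS])
        return -1;
    l->used[pid % LEAF_SLOTS] = 1;
    return 0;
}

/* Returns nonzero if pid is allocated as seen from tree t. */
static int tree_lookup(const struct pid_tree *t, unsigned pid)
{
    assert(pid < LEAF_SLOTS * NUM_LEAVES);

    const struct leaf *l = t->leaves[pid / LEAF_SLOTS];

    return l && l->used[pid % LEAF_SLOTS];
}
```

With this, an allocation made in epoch 1 is invisible to a lookup
through the epoch 0 tree, while pids allocated in epoch 0 stay visible
from both. Freeing leaves and locking are left out entirely.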

On freeing a pid, there is a single radix-tree bit for every
still-active epoch that is set to indicate that this slot has expired.
This will be used for the (new) waitpidv syscall, which can provide all
the functionality of wait4() and more, and allows processes to
synchronize their references to the current epoch.
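
A minimal sketch of the expired bits, assuming at most 32 epochs are
active at once and using a flat array where the real thing would use
radix-tree tags. The names (struct expiry_table, pid_free_mark,
pid_expired, epoch_retire) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PIDS 64           /* tiny pid space, for illustration only */
#define MAX_ACTIVE_EPOCHS 32  /* one bit per still-active epoch slot */

/* Hypothetical table: one bitmask per pid, one bit per active epoch. */
struct expiry_table {
    uint32_t active;             /* bit i set: epoch slot i is in use */
    uint32_t expired[NUM_PIDS];  /* bit i set: pid expired for slot i */
};

/* On free: mark the pid expired in every still-active epoch. */
static void pid_free_mark(struct expiry_table *t, unsigned pid)
{
    assert(pid < NUM_PIDS);
    t->expired[pid] |= t->active;
}

/* Has this pid expired, as seen from the given epoch slot? */
static int pid_expired(const struct expiry_table *t, unsigned pid,
                       unsigned slot)
{
    assert(pid < NUM_PIDS && slot < MAX_ACTIVE_EPOCHS);
    return (t->expired[pid] >> slot) & 1u;
}

/* Nobody needs this epoch any more: drop its bit everywhere. */
static void epoch_retire(struct expiry_table *t, unsigned slot)
{
    assert(slot < MAX_ACTIVE_EPOCHS);
    t->active &= ~(UINT32_C(1) << slot);
    for (unsigned pid = 0; pid < NUM_PIDS; pid++)
        t->expired[pid] &= ~(UINT32_C(1) << slot);
}
```

An epoch that becomes active after the free never sees the bit, which
is what lets the slot be reused there.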

The current versions of the pid syscalls will continue to operate with
the same existing racy semantics. New pid syscalls will be added that
take an epoch argument. A current pid epoch u32 is added to
task_sched; it is reset on fork(), when a new process is created and a
new pid is allocated, and the epoch has a prctl setter and getter.

If a syscall comes in and the epoch passed is not current AND
allocation has passed the pid of the process (this is not a lock,
because the current and previous epochs are always available), then it
might fail with EEPOCH. The caller then has to call a new syscall,
waitpidv(pid_t *pidv, epoch, O_NONBLOCK), providing a list of pids it
has references to in a specific epoch, and it gets back a list of which
processes have exited.
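
Roughly, the check and the recovery call could be modelled like this in
userspace. Everything here is an assumption for illustration: the
EEPOCH_SKETCH value, the reading of "passed" as "the allocation cursor
of the current epoch has gone beyond this pid", and the names
epoch_check and waitpidv_model. None of it is an existing kernel
interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EEPOCH_SKETCH 1000  /* hypothetical errno value, not a real one */

/* A reference to a process as the caller remembers it. */
struct pid_ref {
    int      pid;
    uint32_t epoch;
};

/* Hypothetical allocator state: current epoch and allocation cursor. */
struct epoch_state {
    uint32_t current_epoch;
    int      next_pid;  /* next pid to hand out in the current epoch */
};

/*
 * Returns 0 if the reference can still be trusted, -EEPOCH_SKETCH if
 * the caller must resynchronize. The previous epoch stays valid for
 * pids the cursor has not reached yet in the current epoch.
 */
static int epoch_check(const struct epoch_state *s, struct pid_ref ref)
{
    if (ref.epoch == s->current_epoch)
        return 0;
    if (ref.epoch + 1 == s->current_epoch && ref.pid >= s->next_pid)
        return 0;
    return -EEPOCH_SKETCH;
}

/*
 * Model of the proposed waitpidv(): copies into out[] every pid from
 * pidv[] that has exited and returns how many were copied. exited[]
 * (indexed by pid) stands in for the expired bits of one epoch.
 */
static size_t waitpidv_model(const int *pidv, size_t n,
                             const uint8_t *exited, size_t exited_len,
                             int *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        assert(pidv[i] >= 0 && (size_t)pidv[i] < exited_len);
        if (exited[pidv[i]])
            out[count++] = pidv[i];
    }
    return count;
}
```

A caller that gets the error back would hand its whole list of
remembered pids to the waitpidv call, drop the ones reported as exited,
and then move to the current epoch.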

The epoch of a process is always relative to its pid (not thread-id),
so the same epoch number can mean different things in different
places.

The process can then invalidate its own internal pids and use prctl to
indicate it doesn't need the old epoch. Processes also get a signal if
they haven't updated and are 2 full epochs behind. Being behind should
also count against a process in kernel memory accounting. I am sure
there is much more to consider....