>>>>> If it's not allowed, another approach should be new system call. >>>>> >>>>> int sys_softdirty(pid_t pid, void *addr, size_t len); >>>> >>>> This looks like existing sys_madvise() one. >>> >>> Except pid part. It is added by your purpose, which external task >>> can control any process. In CRIU we can work with pid-less syscalls just fine :) So extending regular madvise would work. >>>> >>>>> If we approach new system call, we don't need to maintain current >>>>> proc interface and it would be very handy to get a information >>>>> without pagemap (open/read/close) so we can add a parameter to >>>>> get a dirty information easily. >>>>> >>>>> int sys_softdirty(pid_t pid, void *addr, size_t len, unsigned char *vec) >>>>> >>>>> What do you think about it? >>>>> >>>> >>>> This is OK for me, though there's another issue with this API I'd like >>>> to mention -- consider your app is doing these tricks with soft-dirty >>>> and at the same time CRIU tools live-migrate it using the soft-dirty bits >>>> to optimize the freeze time. >>>> >>>> In that case soft-dirty bits would be in wrong state for both -- you app >>>> and CRIU, but with the proc API we could compare the ctime-s of the >>>> clear_refs file and find out, that someone spoiled the soft-dirty state >>>> from last time we messed with it and handle it somehow (copy all the memory >>>> in the worst case). Can we somehow handle this with your proposal? >>> >>> Good point I didn't think over that. >>> A simple idea popped from my mind is we can use read/write lock >>> so if pid is equal to calling process's one or pid is NULL, >>> we use read side lock, which can allow marking soft-dirty >>> several vmas with parallel. And pid is not equal to calling >>> process's one, the API should try to hold write-side lock >>> then, if it's fail, the API should return EAGAIN so that CRIU >>> can progress other processes and retry it after a while. >>> Of course, it would make live-lock so that sys_softdirty might >>> need another argument like "int block". >> >> And we need a flag to show SELF_SOFT_DIRTY or EXTERNAL_SOFT_DIRTY >> and the flag will be protected by above lock. It could prevent mixed >> case by self and external. > > I realized it's not enough. Another idea is here. > The intenion is followin as, > > self softdirty VS self softdirty -> NOT exclusive > self softdirty VS external softdirty -> exclusive > external softdirty VS external softdirty-> excluisve I think it might work for us. However, I have two comments to the implementation, please see below. > struct softdirty token { > u64 external; > u64 internal; > }; > > int sys_set_softdirty(pid_t pid, unsigned long start, size_t len, > struct softdirty *token); > int sys_get_softdirty(pid_t pid, unsigned long start, size_t len, > struct softdirty token, char *vec); Can you please show an example how to use these two, I don't quite get how can I do external soft-dirty tracking in atomic manner. > > SYSCALL(set_softdirty, ..., token) > { > struct task_struct *tsk = task_from_pid(pid); > mutex_lock(&mm->st_lock); > if (tsk == current) > tsk->mm->token.internal++; > else > tsk->mm->token.external++; > token->external = mm->token.external; > token->internal = mm->token.internal; > mutex_unlock(&mm->st_lock); > .. > .. > > } > > SYSCALL(get_softdirty, ..., token, ...) > { > struct task_struct *tsk = task_from_pid(pid); > mutex_lock(&mm->st_lock); > if (tsk == current) { > if (tsk->mm->token.external != token.external) { > mutex_unlock > return -EAGAIN; > } > } else { > if (tsk->mm->token.external != token.external || > tsk->mm->token.internal != token.internal) { > mutex_unlock; > return -EAGAIN; > } > } > mutex_unlock(&mm->st_lock); Presumably the critical section should be longer, as if tokens match and we release the lock and proceed with working on pagemap, the concurrent call to set_softdirty can proceed and spoil the picture. > ... > } > > > > >> >> -- >> Kind regards, >> Minchan Kim >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@xxxxxxxxx. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>