On Fri, Jun 14, 2013 at 09:41:33AM +0900, Minchan Kim wrote:
> On Fri, Jun 14, 2013 at 09:32:13AM +0900, Minchan Kim wrote:
> > Hello Pavel,
> >
> > On Thu, Jun 13, 2013 at 01:10:50PM +0400, Pavel Emelyanov wrote:
> > > On 06/13/2013 05:53 AM, Minchan Kim wrote:
> > > > Hi all,
> > > >
> > > > Sorry for late interrupting to promote patchset to the mainline.
> > > > I'd like to discuss our usecase so I'd like to change per-process
> > > > interface with per-range interface.
> > > >
> > > > Our usecase is following as,
> > > >
> > > > A application allocates a big buffer(A) and makes backup buffer(B)
> > > > for it and copy B from A.
> > > > Let's assume A consists of subranges (A-1, A-2, A-3, A-4).
> > > > As time goes by, application can modify anywhere of A.
> > > > In this example, let's assume A-1 and A-2 are modified.
> > > > When the time happen, we compare A-1 with B-1 to make
> > > > diff of the range(On every iteration, we don't need all range's diff by design)
> > > > and do something with diff, then we'd like to remark only the A-1 with
> > > > soft-dirty, NOT A's all range of the process to track the A-1's
> > > > further difference in future while keeping dirty information (A-2, A-3, A-4)
> > > > because we will make A-2's diff in next iteration.
> > > >
> > > > We can't do it by existing interface.
> > >
> > > So you need to track changes not in the whole range, but in sub-ranges.
> > > OK.
> >
> > Right.
> >
> > >
> > > > So, I'd like to add [addr, len] argument with using proc
> > > >
> > > > echo 4 0x100000 0x3000 > /proc/self/clear_refs
> > > >
> > > > It doesn't break anything but not sure everyone like the interface
> > > > because recently I heard from akpm following comment.
> > > >
> > > > https://lkml.org/lkml/2013/5/21/529
> > > >
> > > > Although per-process reclaim is another story with this,
> > > > I feel he seems to hate doing something on proc interface with
> > > > /proc/pid/maps like above range parameter.
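
For reference, a rough sketch of how the existing whole-process interface
is driven from user space. It assumes the conventions of the soft-dirty
patchset (writing "4" to clear_refs clears the bits, bit 55 of a 64-bit
pagemap entry reports them). All helper names are made up for
illustration. Note that clear_all_soft_dirty() has no way to name a
sub-range, which is the limitation described above.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PM_SOFT_DIRTY_BIT 55	/* assumed position of the soft-dirty flag */

/* Byte offset of the 64-bit pagemap entry that describes vaddr. */
off_t pagemap_offset(uintptr_t vaddr, long page_size)
{
	return (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/* 1 if the pagemap entry has the soft-dirty bit set, else 0. */
int entry_soft_dirty(uint64_t entry)
{
	return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/* Clear soft-dirty for every page of the calling process: 0 or -1. */
int clear_all_soft_dirty(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, "4", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}

/* Soft-dirty state of the page containing vaddr: 1, 0, or -1 on error. */
int page_soft_dirty(uintptr_t vaddr)
{
	uint64_t entry;
	ssize_t n;
	long page_size = sysconf(_SC_PAGESIZE);
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return -1;
	n = pread(fd, &entry, sizeof(entry), pagemap_offset(vaddr, page_size));
	close(fd);
	return n == (ssize_t)sizeof(entry) ? entry_soft_dirty(entry) : -1;
}
```

With only these calls, tracking A-1 separately from A-2 means clearing
everything and rebuilding the state of the other sub-ranges by hand.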
> > > >
> > > > If it's not allowed, another approach should be new system call.
> > > >
> > > > int sys_softdirty(pid_t pid, void *addr, size_t len);
> > >
> > > This looks like existing sys_madvise() one.
> >
> > Except pid part. It is added by your purpose, which external task
> > can control any process.
> >
> > >
> > > > If we approach new system call, we don't need to maintain current
> > > > proc interface and it would be very handy to get a information
> > > > without pagemap (open/read/close) so we can add a parameter to
> > > > get a dirty information easily.
> > > >
> > > > int sys_softdirty(pid_t pid, void *addr, size_t len, unsigned char *vec)
> > > >
> > > > What do you think about it?
> > > >
> > >
> > > This is OK for me, though there's another issue with this API I'd like
> > > to mention -- consider your app is doing these tricks with soft-dirty
> > > and at the same time CRIU tools live-migrate it using the soft-dirty bits
> > > to optimize the freeze time.
> > >
> > > In that case soft-dirty bits would be in wrong state for both -- you app
> > > and CRIU, but with the proc API we could compare the ctime-s of the
> > > clear_refs file and find out, that someone spoiled the soft-dirty state
> > > from last time we messed with it and handle it somehow (copy all the memory
> > > in the worst case). Can we somehow handle this with your proposal?
> >
> > Good point I didn't think over that.
> > A simple idea popped from my mind is we can use read/write lock
> > so if pid is equal to calling process's one or pid is NULL,
> > we use read side lock, which can allow marking soft-dirty
> > several vmas with parallel. And pid is not equal to calling
> > process's one, the API should try to hold write-side lock
> > then, if it's fail, the API should return EAGAIN so that CRIU
> > can progress other processes and retry it after a while.
> > Of course, it would make live-lock so that sys_softdirty might
> > need another argument like "int block".
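
A rough single-threaded model of the locking rule quoted above, just to
pin down who gets which side of the lock. All names are hypothetical and
this is not a real lock: in the kernel, self callers would presumably
sleep on the read side, whereas the model returns -EAGAIN there only
because it cannot block.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the proposed per-mm read/write lock state. */
struct sd_lock_model {
	int readers;	/* self callers currently inside */
	bool writer;	/* an external caller currently inside */
};

/* Self caller takes the shared side; several may be inside at once. */
int sd_enter_self(struct sd_lock_model *l)
{
	assert(l);
	if (l->writer)
		return -EAGAIN;	/* model only: a real read lock would wait */
	l->readers++;
	return 0;
}

/* External caller try-locks the exclusive side and never waits. */
int sd_enter_external(struct sd_lock_model *l)
{
	assert(l);
	if (l->writer || l->readers > 0)
		return -EAGAIN;	/* caller moves on and retries later */
	l->writer = true;
	return 0;
}

void sd_leave_self(struct sd_lock_model *l)
{
	assert(l && l->readers > 0);
	l->readers--;
}

void sd_leave_external(struct sd_lock_model *l)
{
	assert(l && l->writer);
	l->writer = false;
}
```

An external caller that keeps getting -EAGAIN while self callers come and
go is the live-lock case that the "int block" argument is meant to cover.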
>
> And we need a flag to show SELF_SOFT_DIRTY or EXTERNAL_SOFT_DIRTY
> and the flag will be protected by above lock. It could prevent mixed
> case by self and external.

I realized that is not enough. Here is another idea.

The intention is as follows:

	self softdirty     VS self softdirty     -> NOT exclusive
	self softdirty     VS external softdirty -> exclusive
	external softdirty VS external softdirty -> exclusive

	struct softdirty_token {
		u64 external;	/* bumped by set_softdirty from another task */
		u64 internal;	/* bumped by set_softdirty from the task itself */
	};

	int sys_set_softdirty(pid_t pid, unsigned long start, size_t len,
				struct softdirty_token *token);
	int sys_get_softdirty(pid_t pid, unsigned long start, size_t len,
				struct softdirty_token token, char *vec);

	SYSCALL(set_softdirty, ..., token)
	{
		struct task_struct *tsk = task_from_pid(pid);
		struct mm_struct *mm = tsk->mm;

		mutex_lock(&mm->st_lock);
		if (tsk == current)
			mm->token.internal++;
		else
			mm->token.external++;
		/* hand the caller a snapshot of both counters */
		token->external = mm->token.external;
		token->internal = mm->token.internal;
		mutex_unlock(&mm->st_lock);
		..
		..
	}

	SYSCALL(get_softdirty, ..., token, ...)
	{
		struct task_struct *tsk = task_from_pid(pid);
		struct mm_struct *mm = tsk->mm;

		mutex_lock(&mm->st_lock);
		if (tsk == current) {
			/* self: only an external clear invalidates the token */
			if (mm->token.external != token.external) {
				mutex_unlock(&mm->st_lock);
				return -EAGAIN;
			}
		} else {
			/* external: any clear since the snapshot invalidates it */
			if (mm->token.external != token.external ||
			    mm->token.internal != token.internal) {
				mutex_unlock(&mm->st_lock);
				return -EAGAIN;
			}
		}
		mutex_unlock(&mm->st_lock);
		...
	}

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
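
A small user-space model of the token check sketched above, to show which
callers see -EAGAIN after which clears. It is only an illustration under
assumed names (mm_model, model_set_softdirty, model_get_softdirty); there
is no locking and no real soft-dirty state behind it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the pair of generation counters from the proposal. */
struct softdirty_token {
	uint64_t external;	/* clears issued by other tasks */
	uint64_t internal;	/* clears issued by the owning task */
};

/* Hypothetical stand-in for the state that would live in the mm. */
struct mm_model {
	struct softdirty_token token;
};

/* Model of set_softdirty: bump the caller's counter, return a snapshot. */
void model_set_softdirty(struct mm_model *mm, bool self,
			 struct softdirty_token *out)
{
	assert(mm && out);
	if (self)
		mm->token.internal++;
	else
		mm->token.external++;
	*out = mm->token;
}

/* Model of get_softdirty: 0 if the snapshot is still valid, else -EAGAIN. */
int model_get_softdirty(const struct mm_model *mm, bool self,
			struct softdirty_token snap)
{
	assert(mm);
	/* an external clear invalidates every outstanding snapshot */
	if (mm->token.external != snap.external)
		return -EAGAIN;
	/* a self clear invalidates only external snapshots */
	if (!self && mm->token.internal != snap.internal)
		return -EAGAIN;
	return 0;
}
```

In this model an application that clears its own sub-ranges never
disturbs itself, while a checkpoint tool holding an older snapshot gets
-EAGAIN and can fall back to copying all the memory.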