Oren Laadan [orenl@xxxxxxxxxxxxxxx] wrote: | | | Sukadev Bhattiprolu wrote: | > Oren Laadan [orenl@xxxxxxxxxxxxxxx] wrote: | > | Add a 'users' count to objhash items, and, for a !CHECKPOINT_SUBTREE | > | checkpoint, return an error code if the actual objects' counts are | > | higher, indicating leaks (references to the objects from a task not | > | being checkpointed). Of course, by this time most of the checkpoint | > | image has been written out to disk, so this is purely advisory. But | > | then, it's probably naive to argue that anything more than an advisory | > | 'this went wrong' error code is useful. | > | | > | The comparison of the objhash user counts to object refcounts as a | > | basis for checking for leaks comes from Alexey's OpenVZ-based c/r | > | patchset. | > | | > | Signed-off-by: Oren Laadan <orenl@xxxxxxxxxxxxxxx> | > | --- | > | checkpoint/checkpoint.c | 8 +++ | > | checkpoint/memory.c | 2 + | > | checkpoint/objhash.c | 108 +++++++++++++++++++++++++++++++++++++++---- | > | include/linux/checkpoint.h | 2 + | > | 4 files changed, 110 insertions(+), 10 deletions(-) | > | | > | diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c | > | index 4319976..32a0a8e 100644 | > | --- a/checkpoint/checkpoint.c | > | +++ b/checkpoint/checkpoint.c | > | @@ -498,6 +498,14 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid) | > | if (ret < 0) | > | goto out; | > | | > | + if (!(ctx->flags & CHECKPOINT_SUBTREE)) { | > | + /* verify that all objects are contained (no leaks) */ | > | + if (!ckpt_obj_contained(ctx)) { | > | + ret = -EBUSY; | > | + goto out; | > | + } | > | + } | > | + | > | /* on success, return (unique) checkpoint identifier */ | > | ctx->crid = atomic_inc_return(&ctx_count); | > | ret = ctx->crid; | > | diff --git a/checkpoint/memory.c b/checkpoint/memory.c | > | index 7637c1e..5ae2b41 100644 | > | --- a/checkpoint/memory.c | > | +++ b/checkpoint/memory.c | > | @@ -687,6 +687,8 @@ static int do_checkpoint_mm(struct ckpt_ctx *ctx, struct mm_struct *mm) | > | ret = exe_objref; | > | goto out; | > | } | > | + /* account for all references through vma/exe_file */ | > | + ckpt_obj_users_inc(ctx, mm->exe_file, mm->num_exe_file_vmas); | > | > Do we really need to add num_exe_file_vmas here ? | > | > A quick look at all callers for added_exe_file_vma() seems to show that | > those callers also do a get_file(). | | Each vma whose file is the same as mm->exe_file causes the refcount | of that file to increase by 2: once by vma->vm_file, and once via | added_exe_file_vma(). The c/r code calls ckpt_obj_checkpoint() only | once, thus once one obj_file_grab() for that file. The code above | accounts for the missing count. | | > | > Anyway, when I try to C/R a simple process tree, I get -EBUSY because | > ckpt_obj_contained() finds a ref-count mismatch for the executable file. | | Are you certain the culprit is this count and not, possibly, some | other "leak" ? Well, I will look some more tomorrow. | If so, do you know what's the count difference ? Yes, dmesg had: c/r: FILE users 10 != count 16 The process tree had 3 processes (parent, child, grant-child) all executing the same file (i.e none of them called exec()). The difference of 6 (or 2 per process made me suspect the num_file_exe_vmas) | Posting the test program and a description of what you did would | be useful... Attached (ptree2.c). I ran it as: $ ./ptree2 -n 1 -d 2 _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers