Quoting Andrew G. Morgan (morgan@xxxxxxxxxx): > You guys are well on the road to having something functional. I do not > want to get in the way of that. > > What I'm trying to question is the way the code is abstracted. My > sense is that its not very friendly to the subsystems being > checkpointed. I could offer an alternative patch for the capabilities > code, but if it breaks some invisible constraint I'd rather you tell > me about it first. > > There are three sets of code in these changes: > > 1. the capabilities code (but I think I'm arguing about that because > it is an example I'm familiar with and this probably applies to other > subsystems in the kernel) > 2. the glue code in the kernel that implements sys_restore() etc. > 3. the user space code > > What seems to have happened with these patches is that the glue code > and the user space code have knowledge of each other and things are > written to make an API/ABI that they can agree on. (Part of this is > evidenced by the u64 reference to capabilities and non-use of the more > natural kernel_cap_t data that the kernel itself uses.) > > It seems to me that a more natural abstraction is that 1 and 3 know > about each other (actually 3 knows about how 1 chooses to define > things), and 2 is a generic data packager interface. One reason (3) and (2) need to know about each is other is bc userspace needs to read the checkpoint image enough to read the task states, clone off suitable children, and then each child calls sys_restart(). So to this end, there are certainly details which userspace doesn't need to be able to parse. The other reason is that userspace will expect to be able to make changes to the checkpoint file. For instance, if restarting a process from one kernel version with 34 capabilities defined, on a kernel version with 35 defined, then userspace may want to fill in an extra cap. But I strongly agree with your concern that subsystems will get out of sync with the checkpoint code. If we just move the struct ckpt_capabilities definition into include/linux/capabilities.h, does that suffice? The c/r userspace could then do something like cat > ckpthdr.sed << EOF /^[^ \t]*struct ckpt[^\n]*{/,/}/p EOF cat /usr/include/linux/*.h /usr/include/asm/*.h | sed -n -f ckpthdr.sed \ > checkpoint_headers.h Now you mention using kernel_cap_t's... we can go partway down that road, but the inherent problem is serializing the relevant data so it can be streamed to some file. So I think it's better if the subsystems are required to specify a useful format (like struct ckpt_capabilities) and define the checkpoint/restart helpers to serialize data into the struct. We can try and get cute with dual mirrored struct defs, one which is defined in terms the caps code can understand, and one defined in more arch-independent terms (which is why we need __u64s and packed, for instance). But that seems more fragile than having clear and simple requirements for the $SUBYSTEM_checkpoint() and $SUBSYSTEM_restart() helpers. (I'll wait to hear whether you think I'm on the right track before reworking my patch) > This should be functionally equivalent, but my belief is an > implementation based on the current abstraction will be broken more > often because some subsystem (eg. 1) evolves its critical state and > the corresponding (2+3)'s relationship is too confusing/constraining > for the kernel subsystem developer to keep in sync (someone else's > problem). And subtle problems of ignored state in checkpoints can be > really hard to debug! > > Bottom line. Don't wait for me to Ack this change. I intend to lurk > and if I see a way to "clean it up", I'll offer a patch, which you can > of course reject... But don't wait on me. Thanks, it's much appreciated. But please don't hold back, as your points are imo valid, and we need to work out how to address them. -serge _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers