On Wed, Jun 3, 2009 at 9:45 AM, Serge E. Hallyn <serue@xxxxxxxxxx> wrote: > Quoting Andrew G. Morgan (morgan@xxxxxxxxxx): >> You guys are well on the road to having something functional. I do not >> want to get in the way of that. >> >> What I'm trying to question is the way the code is abstracted. My >> sense is that its not very friendly to the subsystems being >> checkpointed. I could offer an alternative patch for the capabilities >> code, but if it breaks some invisible constraint I'd rather you tell >> me about it first. >> >> There are three sets of code in these changes: >> >> 1. the capabilities code (but I think I'm arguing about that because >> it is an example I'm familiar with and this probably applies to other >> subsystems in the kernel) >> 2. the glue code in the kernel that implements sys_restore() etc. >> 3. the user space code >> >> What seems to have happened with these patches is that the glue code >> and the user space code have knowledge of each other and things are >> written to make an API/ABI that they can agree on. (Part of this is >> evidenced by the u64 reference to capabilities and non-use of the more >> natural kernel_cap_t data that the kernel itself uses.) >> >> It seems to me that a more natural abstraction is that 1 and 3 know >> about each other (actually 3 knows about how 1 chooses to define >> things), and 2 is a generic data packager interface. > > One reason (3) and (2) need to know about each is other is bc > userspace needs to read the checkpoint image enough to read > the task states, clone off suitable children, and then each > child calls sys_restart(). So to this end, there are certainly > details which userspace doesn't need to be able to parse. > > The other reason is that userspace will expect to be able to > make changes to the checkpoint file. For instance, if restarting > a process from one kernel version with 34 capabilities defined, > on a kernel version with 35 defined, then userspace may want to > fill in an extra cap. > > But I strongly agree with your concern that subsystems will get > out of sync with the checkpoint code. > > If we just move the struct ckpt_capabilities definition into > include/linux/capabilities.h, does that suffice? The c/r > userspace could then do something like > > cat > ckpthdr.sed << EOF > /^[^ \t]*struct ckpt[^\n]*{/,/}/p > EOF > cat /usr/include/linux/*.h /usr/include/asm/*.h | sed -n -f ckpthdr.sed \ > > checkpoint_headers.h > > Now you mention using kernel_cap_t's... we can go partway > down that road, but the inherent problem is serializing the > relevant data so it can be streamed to some file. So I > think it's better if the subsystems are required to specify > a useful format (like struct ckpt_capabilities) and define > the checkpoint/restart helpers to serialize data into the > struct. We can try and get cute with dual mirrored > struct defs, one which is defined in terms the caps code > can understand, and one defined in more arch-independent > terms (which is why we need __u64s and packed, for instance). > But that seems more fragile than having clear and simple > requirements for the $SUBYSTEM_checkpoint() and $SUBSYSTEM_restart() > helpers. I like this $SUBSYSTEM_checkpoint() etc. thing. I like the ckpthdr.sed thing. I think a similar rule could be used to generate the calls to the list of $SUBSYSTEM_checkpoint() functions. For serialization, could a kernel "gcc -E checkpoint-headers.h > this-kernel-checkpoint-file.h" build rule be enough? Cheers Andrew > > (I'll wait to hear whether you think I'm on the right track > before reworking my patch) > >> This should be functionally equivalent, but my belief is an >> implementation based on the current abstraction will be broken more >> often because some subsystem (eg. 1) evolves its critical state and >> the corresponding (2+3)'s relationship is too confusing/constraining >> for the kernel subsystem developer to keep in sync (someone else's >> problem). And subtle problems of ignored state in checkpoints can be >> really hard to debug! >> >> Bottom line. Don't wait for me to Ack this change. I intend to lurk >> and if I see a way to "clean it up", I'll offer a patch, which you can >> of course reject... But don't wait on me. > > Thanks, it's much appreciated. But please don't hold back, > as your points are imo valid, and we need to work out how to > address them. > > -serge > -- > To unsubscribe from this list: send the line "unsubscribe linux-security-module" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers