Dan Smith wrote: > OL> What about leak detection ? > OL> Aren't we missing {netns,netdev}_users()? > > This is something I need to give more thought to, but it's not as easy > as it sounds. Network devices aren't released at the last put() like > a lot of other things, and my initial attempts to reconcile the > refcount after a checkpoint operation have not been successful. > > However, I'm not sure that it's as important here, because AFAIK, a > network device can only exist in one network namespace at a time. If > we're checkpointing a netdev, it's because we are checkpointing the > namespace that it lives in. Making sure the netns isn't leaked out of > the process tree would be much easier and just as effective, no? We should guarantee that neither netns nor netdev leaks outside the container; currently none is. If a netdev can only belong to a single netns, then it suffices to only care about netns. > >>> +config CHECKPOINT_NETNS >>> + bool >>> + default y if NET && NET_NS && CHECKPOINT >>> + > > OL> Did you mean this to be visible (settable) by the user ? > > No, it was specifically supposed to enable itself when those other > items are enabled, but not be a user adjustable toggle. I had a > discussion with Serge about it and we came to this as a solution, > although I don't remember what the problem we started with was. I'll > dig through my IRC logs to see if I can figure it out. Duh.. my bad, I misinterpreted the code. That's fine. BTW, there is a similar SYSVIPC_CHECKPOINT - we should decide if we do X_CHECKPOINT or CHECKPOINT_X for a subsystem X, and stick to that convention. I prefer the latter - what you did... > >>> + retry: >>> + if (++pages > 4) { >>> + addrs = -E2BIG; >>> + goto out; >>> + } > > OL> Why 4 ? > > It's just a sanity limit. Hmm... let me be more explicit: why not keep trying until it realloc fails ? or switch to vmalloc() at some point ? > > OL> Do we really need this special case ? I'd be happy with a ckpt_err() > OL> for any error - and the actual error number would be useful to tell > OL> which case it was. > > Unless I'm missing something, you asked for this specifically: > > https://lists.linux-foundation.org/pipermail/containers/2010-February/022844.html Lol .. that was me :o But looking at the code it feels wrong, because the errno already reveals the type of the problem. I'm thinking - wouldn't it make sense to do error reporting in checkpoint_netdev() if the call to ->ndo_checkpoint() fails ? > > OL> Isn't this check redundant ? I expect it to fail promptly in > OL> checkpoint_netdev() above. > > No, as I've said a couple of times previously, this isn't the only way > we can arrive at a netdev for checkpointing. This case is the one > where we're marching through the netns and find a netdev that is not > supported. The other is where we arrive at a device as a peer of > another device, so the other check may come into play at times where > this one doesn't and vice versa. I'm confused: in checkpoint_ns() inside the for_each_netdev() loop you first test for dev->netdev_ops->ndo_checkpoint and then call checkpoint_obj(... CKPT_OBJ_NETDEV) - which in turn will call checkpoint_netdev(), which will again test for dev->netdev_ops->ndo_checkpoint ... am I reading it wrongly ? > > OL> This may be a bit simpler if you move the first deferqueue_add() > OL> forward to just before the other one. Or better: change dq_netdev > OL> to have two pointers, dev and peer (if any is null, the cleanup > OL> function will skip). > > The reason it is this messy is because of the way network devices are > deallocated. Since they don't destroy themselves on the final put(), > we have to explicitly call unregister_netdev() on them when we know > they're no longer used (else we block). Once we've added them to the > deferqueue, we can no longer destroy them here because a reference is > held and the deferqueue will run afterwards. > > The ordering of this is a result of me injecting failures at each step > and working it out until I got it to not block on unregistering either > of the devices in all of the error paths. That's not to say it's the > best way, but there is a reason it's ordered the way it is. > How about this - to me it feels simpler: dev = rtnl_newlink(veth_new_link_msg, &veth, this_name); if (IS_ERR(dev)) return dev; peer = dev_get_by_name(current->nsproxy->net_ns, peer_name); if (!peer) { ret = -EINVAL; goto err_dev; } ret = ckpt_obj_insert(ctx, peer, h->veth.peer_ref, CKPT_OBJ_NETDEV); if (ret < 0) goto err_peer; dev_put(peer); dq.dev = dev; dq.peer = peer; ret = deferqueue_add(ctx->deferqueue, &dq, sizeof(dq), netdev_noop, netdev_cleanup); if (ret) goto err_peer; (yes, you need to adjust struct dq_netdev and netdev_cleanup). BTW, the variable "didreg" should disappear from restore_veth(). Oren. _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers