On Thu 11-07-13 09:32:38, Tejun Heo wrote: > Hello, > > On Thu, Jul 11, 2013 at 06:22:15PM +0200, Michal Hocko wrote: > > I would rather not put vmpressure clean up code into memcg offlining. > > We have reference counting for exactly this purposes so it feels strange > > to overcome it like that. > > It's not something white and black but for things which can be made > trivially synchrnous, it usually is better to do it that way, True in general but it is also true (in general) that once we have a reference counting for controlling life cycle for an object we should not bypass it. > especially while shutting down objects as they enter a lingering stage > where they're de-registered but not destroyed and you should be > careful which parts of the object are still accessible. I haven't > read it carefully but here I'm not sure whether it's safe to do event > related operations after removal. From cgroup core side, event list > is shut down synchronously from cgroup_destroy_locked(). It doesn't > seem like that part is explicitly built to remain accessible > afterwards. /me goes and checks the code vmpressure_event sends signals to _registered_ events but those are unregistered from the work queue context by cgroup_event_remove (via vmpressure_unregister_event) queued from cgroup_destroy_locked. I am not sure what are the guarantees for ordering on the workqueue but this all suggests that either vmpressure_event sees an empty vmpr->events or it can safely send signals as cgroup_event_remove is pending on the queue. cgroup_event_remove drops a reference to cgrp->dentry after everything is unregistered and event->wait removed from the wait queue so cgroup_free_fn couldn't have been called yet and so memcg is still alive. This means that even css_get/put is not necessary. So I guess we are safe with the code as is but this all is really _tricky_ and deserves a fat comment. So rather than adding flushing work item code we should document it properly. Or am I missing something? > > Besides that wouldn't be that racy? The work item could be already > > executing and preempted and we do not want vmpr to disappear from under > > our feet. I know this is _highly_ unlikely but why to play games? > > Hmmm? flush_work() and cancel_work_sync() guarantee that the work > item isn't executing on return unless it's being requeued. There's no > race condition. OK, I haven't realized the action waits for finishing. /me is not regular work_queue user... [...] -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>