On Wed, Sep 08, 2021 at 09:06:24AM +0200, Michal Hocko wrote: > On Wed 08-09-21 09:50:14, Feng Tang wrote: > > On Tue, Sep 07, 2021 at 10:44:32AM +0200, Michal Hocko wrote: > [...] > > > While this is a good fix from the functionality POV I believe you can go > > > a step further. Please add a detection to the cpuset code and complain > > > to the kernel log if somebody tries to configure movable only cpuset. > > > Once you have that in place you can easily create a static branch for > > > cpuset_insane_setup() and have zero overhead for all reasonable > > > configuration. There shouldn't be any reason to pay a single cpu cycle > > > to check for something that almost nobody does. > > > > > > What do you think? > > > > I thought about the implementation, IIUC, the static_branch_enable() is > > easy, it could be done when cpuset.mems is set with movable only nodes, > > but disable() is much complexer, > > Do we care about disable at all? The point is to not have 99,999999% > users pay overhead of the check which is irrelevant to them. Once > somebody wants to use this "creative" setup then paying an extra check > sounds perfectly sensible to me. If somebody cares enough then the > disable logic could be implemented. But for now I believe we should be > OK with only enable case. Makes sense to me, thanks! > > as we may need a global reference > > counter to track the set/unset, and the unset could be the time when > > freeing the cpuset data structure, also one cpuset.mems could be changed > > runtime, and system could have multiple cpuset dirs (user space usage > > could be creative or crazy :)). > > > > While checking cpuset code, I thought more about configuring cpuset with > > movable only nodes, that we may still have normal usage: mallocing a big > > trunk of memory and do some scientific calculation, or AI training. It > > works with current code. > > It might work but it would be inherently subtle because a single > non-movable allocation will throw the whole thing off the cliff. Yes, this is a valid concern. Though I think when there is really usage reuqirement for cpuset binding to HBM (High Bandwidth Memory) or PMEM, we may need to reconsider to loose the limit for GFP_HIGHUSER, as the GFP_KERNEL type of allocation is permitted. Thanks, Feng > I do not think we want to even pretend we support such a setup. > -- > Michal Hocko > SUSE Labs