On Tue, May 10, 2022 at 03:36:34PM +0200, Michal Hocko wrote:
> On Tue 10-05-22 11:52:51, CGEL wrote:
> > On Tue, May 10, 2022 at 12:00:04PM +0200, Michal Hocko wrote:
> > > On Tue 10-05-22 01:43:38, CGEL wrote:
> > > > On Mon, May 09, 2022 at 01:48:39PM +0200, Michal Hocko wrote:
> > > > > On Mon 09-05-22 11:26:43, CGEL wrote:
> > > > > > On Mon, May 09, 2022 at 12:00:28PM +0200, Michal Hocko wrote:
> > > > > > > On Sat 07-05-22 02:05:25, CGEL wrote:
> > > > > > > [...]
> > > > > > > > If there are many containers to run on one host, and some of them have high
> > > > > > > > performance requirements, the administrator could turn on THP for them:
> > > > > > > > # docker run -it --thp-enabled=always
> > > > > > > > Then all the processes in those containers will always use THP.
> > > > > > > > While other containers turn off THP by:
> > > > > > > > # docker run -it --thp-enabled=never
> > > > > > >
> > > > > > > I do not know. The THP config space is already too confusing and complex
> > > > > > > and this just adds on top. E.g. is the behavior of the knob
> > > > > > > hierarchical? What is the policy if the parent memcg says madvise while
> > > > > > > the child says always? How does the per-application configuration align
> > > > > > > with all that (e.g. memcg policy madvise but application says never via
> > > > > > > prctl while still using some madvised regions - e.g. via library).
> > > > > > >
> > > > > >
> > > > > > The cgroup THP behavior is aligned with the host and totally independent, just like
> > > > > > /sys/fs/cgroup/memory.swappiness. That means if one cgroup configures 'always'
> > > > > > for THP, it has no effect on the host or other cgroups. This makes it simple for
> > > > > > the user to understand and control.
> > > > >
> > > > > All controls in cgroup v2 should be hierarchical. This is really
> > > > > required for a proper delegation semantic.
> > > > >
> > > >
> > > > Could we align to the semantic of /sys/fs/cgroup/memory.swappiness?
> > > > Some distributions like Ubuntu are still using cgroup v1.
> > >
> > > cgroup v1 interface is mostly frozen. All new features are added to the
> > > v2 interface.
> > >
> >
> > So what about we add this interface to cgroup v2?
>
> Can you come up with a sane hierarchical behavior?
>
> [...]
> > > > For micro-service architecture, the application in one container is not a
> > > > set of loosely tied processes, it aims to provide one certain service,
> > > > so different containers mean different services, and different services
> > > > have different QoS demands.
> > >
> > > OK, if they are tightly coupled you could apply the same THP policy by
> > > an existing prctl interface. Why is that not feasible? As you are noting
> > > below...
> > >
> > > > 5. containers are usually managed by compose software, which treats the
> > > > container as the base management unit;
> > >
> > > ..so the compose software can easily start up the workload by using prctl
> > > to disable THP for whatever workloads it is not suitable for.
> >
> > prctl(PR_SET_THP_DISABLE..) cannot elegantly support the semantic we
> > need. If only some containers need THP, other containers and the host do not need
> > THP. We must set host THP to always first, and call prctl() to close THP for
> > host tasks and other containers one by one,
>
> It might not be the most elegant solution but it should work.
> Maintaining user interfaces for ever has some cost and the THP
> configuration space is quite large already. So I would rather not add
> more complication in unless that is absolutely necessary.
>

By the way, should we let prctl() support PR_SET_THP_ALWAYS? Just like
PR_TASK_PERF_EVENTS_DISABLE and PR_TASK_PERF_EVENTS_ENABLE. This would
make it simpler to let certain processes use THP while others do not.

> > in this process some tasks that start before we call prctl() may
> > already use THP with no need.
>
> As long as all those processes have a common ancestor I do not see how
> that would be possible.
>
> --
> Michal Hocko
> SUSE Labs
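
For reference, the launcher approach suggested above (set the flag once in
a common ancestor, then start the workload) could look roughly like the
sketch below. This is only an illustration under stated assumptions, not
code from any existing compose tool: the helper names thp_disable_self(),
thp_is_disabled() and exec_without_thp() are made up here, and only
prctl(PR_SET_THP_DISABLE), prctl(PR_GET_THP_DISABLE) and execvp() are real
interfaces. Per prctl(2), the "THP disable" flag is inherited by fork(2)
children and preserved across execve(2), which is what makes the common
ancestor idea work.

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Fallbacks for older userspace headers; values match linux/prctl.h. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42
#endif

/* Set the "THP disable" flag on the calling process.
 * Returns 0 on success, -1 on error (errno set by prctl). */
int thp_disable_self(void)
{
	return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
}

/* Query the flag: 1 if set, 0 if clear, -1 on error. */
int thp_is_disabled(void)
{
	return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}

/* Hypothetical launcher step: disable THP, then exec the workload.
 * Because the flag survives execve() and is inherited on fork(),
 * the workload and all of its descendants start with THP disabled.
 * Returns only if something failed. */
int exec_without_thp(char *const argv[])
{
	if (thp_disable_self() != 0) {
		perror("prctl(PR_SET_THP_DISABLE)");
		return -1;
	}
	execvp(argv[0], argv);
	perror("execvp");
	return -1;
}
```

A launcher would call exec_without_thp(argv) in the child after fork(),
before any workload code runs, so no task in that tree gets a chance to
use THP first. Note this only covers the "opt out" direction; there is no
existing prctl() counterpart for the PR_SET_THP_ALWAYS idea asked about
above.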