(cc'ing Johannes for the memory sizing part)

Hello,

On Mon, Oct 18, 2021 at 08:59:16PM +0530, Pratik Sampat wrote:
...
> Also, I agree with your point about the variability of requirements.
> Even if the interface we provide is in conjunction with the limits
> set, if applications have to derive metrics from it or from other
> kernel information regardless, then the interface would not be useful.
> If the solution to this problem lies in userspace, then I'm all for it
> as well. However, the intention is to probe whether this could
> potentially be solved cleanly in the kernel.

Just to be clear, avoiding application changes would have to involve
userspace (at least parameterization from it), and I think setting that
as a goal for the kernel would be more of a distraction. Please note
that we should definitely provide metrics which actually capture what's
going on in terms of resource availability, in a way which can be used
to size workloads automatically.

> Yes, these shortcomings exist even without containerization. On a
> dynamically loaded multi-tenant system it becomes very difficult to
> determine the maximum amount of a resource that can be requested
> before we hurt our own performance.

As I mentioned before, a feedback loop on PSI can work really well in
finding the saturation points for cpu/mem/io and regulating workload
size automatically and dynamically (a rough sketch of such a loop is
appended below). While such dynamic sizing can work without any other
inputs, it sucks to have to probe the entire range each time, and it'd
be really useful if the kernel could provide the ballpark numbers
needed to estimate the saturation points.

What gets challenging is that there doesn't seem to be a good way to
consistently describe availability for each of the three resources and
the different distribution rules they may be under.

e.g. for CPU, the affinity restrictions from cpuset determine the
maximum number of threads that a workload would need to saturate the
available CPUs (see the affinity sketch below). However, conveying the
results of the cpu.max and cpu.weight controls isn't as
straightforward.

For memory, it's even trickier because in a lot of cases it's
impossible to tell how much memory is actually available without trying
to use it, as the active working set can only be learned by trying to
reclaim memory.

IO is in a somewhat similar boat as CPU in that there are both io.max
and io.weight. However, if io.cost is in use and configured according
to the hardware, we can map those two in terms of iocost.

Another thing is that the dynamic nature of these control mechanisms
means that the numbers can keep changing from moment to moment, so we'd
need to provide some time-averaged numbers. We can probably take the
same approach as PSI and load-avgs and provide running averages over a
few time intervals (see the last sketch below).

> The question that I have essentially tries to understand the
> implications of overloading existing interfaces' definitions to be
> context sensitive.
> The way the prototype works today is that it does not interfere with
> the information when the system boots or even when it is run in a new
> namespace.
> The effects are only observed when restrictions are applied.
> Therefore, what would potentially break if interfaces like these were
> made to divulge information based on restrictions rather than the
> whole-system view?

I don't think the problem is that something would necessarily break by
doing that. It's more that it's a dead-end approach which won't get us
far, for all the reasons that have been discussed so far.
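To make the PSI point above concrete, here's a minimal sketch of such a
feedback loop. Everything here except the PSI file format itself is an
assumption for illustration: the cgroup path, the 10%/1% thresholds and
the "nr_workers" knob standing in for whatever sizing control the
workload actually has.

  #include <stdio.h>
  #include <unistd.h>

  /* parse "some avg10=X.XX ..." from a cgroup2 *.pressure file */
  static double psi_some_avg10(const char *path)
  {
          double avg10;
          FILE *f = fopen(path, "r");

          if (!f || fscanf(f, "some avg10=%lf", &avg10) != 1)
                  avg10 = -1.0;
          if (f)
                  fclose(f);
          return avg10;
  }

  int main(void)
  {
          int nr_workers = 1;     /* hypothetical sizing knob */

          for (;;) {
                  /* assumed cgroup path, adjust to the actual hierarchy */
                  double p = psi_some_avg10(
                          "/sys/fs/cgroup/workload/memory.pressure");

                  if (p < 0.0)
                          return 1;
                  if (p > 10.0 && nr_workers > 1)
                          nr_workers--;   /* saturated, back off */
                  else if (p < 1.0)
                          nr_workers++;   /* headroom left, grow */
                  /* ... apply nr_workers to the workload here ... */
                  sleep(10);      /* roughly one avg10 window */
          }
  }

A real controller would watch cpu.pressure and io.pressure too and damp
oscillation, but this is the shape of the loop, and also shows why the
probing is painful: without ballpark numbers from the kernel it has to
walk the whole range from nr_workers = 1 every time.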
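For the cpuset side, the affinity-derived thread count is already
cheaply available to userspace today; a sketch:

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int main(void)
  {
          cpu_set_t set;

          /* pid 0 == calling thread; the mask reflects cpuset limits */
          if (sched_getaffinity(0, sizeof(set), &set))
                  return 1;
          printf("max useful threads: %d\n", CPU_COUNT(&set));
          return 0;
  }

What this can't convey is exactly the cpu.max/cpu.weight part: the mask
says how many CPUs the workload can saturate, not what fraction of them
it will actually get.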
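And on the averaging point, something in the style of the loadavg/PSI
exponentially decaying averages would probably do. A toy illustration
(the 10s sampling period and 60s window are made up, not the kernel's
actual constants; link with -lm):

  #include <math.h>
  #include <stdio.h>

  /* avg <- avg * e^(-dt/window) + sample * (1 - e^(-dt/window)) */
  static double run_avg(double avg, double sample, double dt, double window)
  {
          double decay = exp(-dt / window);

          return avg * decay + sample * (1.0 - decay);
  }

  int main(void)
  {
          double avg = 0.0;
          int i;

          /* feed a constant "80% available" sample every 10s for 5min */
          for (i = 0; i < 30; i++)
                  avg = run_avg(avg, 80.0, 10.0, 60.0);
          printf("avg60 ~= %.1f\n", avg);   /* converges toward 80.0 */
          return 0;
  }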
It'd be more productive to focus on long-term solutions and leave
backward compatibility to the domains where it can actually be solved,
by applying the necessary local knowledge to emulate and fake whatever
numbers are necessary.

Thanks.

--
tejun