On 10/31/21 1:39 PM, Mina Almasry wrote: > On Wed, Oct 27, 2021 at 4:36 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote: >> >> On 10/20/21 12:09 PM, Mina Almasry wrote: >> >> I have no objections to adding this functionality, and do not see any >> blocking issues in hugetlb code. However, it would be GREAT if someone >> more familiar/experienced with cgroups would comment. My cgroup >> experience is very limited. >> > > I will send V2 to Shakeel as well from our team and ask him to take a > look and he has more than enough experience to review. If anyone else > reading with cgroups knowledge can Review/Ack that would be great. > > It's possible I'm a bit off the mark here, but FWIW I don't think > there is much that is continuous or ambiguous here. For > memory.numa_stat it's a bit nuanced because there is anon, rss, shmem, > etc.. but for hugetlb we just have hugetlb memory and the only care > needed is hierarchical vs not in cgroups v1. > I agree. It is straight forward from a hugetlb POV. Just would like to get an ack from someone more familiar with cgroups. To me it also looks fine from a cgroups POV, but my cgroup knowledge is so limited that does not mean much. >> alloc_mem_cgroup_per_node_info provides similar functionality and has >> the following comment. >> >> * TODO: this routine can waste much memory for nodes which will >> * never be onlined. It's better to use memory hotplug callback >> * function. >> > > So the memory allocated per node in total is (HUGE_MAX_HSTATE * > unsigned long * num_css_on_the system). HUGE_MAX_HSTATE is 2 on x86. > This is significant, but to address this comment I have to complicate > the hugetlb_cgroup code quite a bit so I thought I'd check with you if > you think it's still OK/worth it. slub.c implements these changes > (slab_memory_callback) and they are: > > - When creating a new hugetlb_cgroup, I create per_node data for only > online nodes. > - On node online I need to loop over all existing hugetlb_cgroups and > allocate per_node data. I need rcu_read_lock() here and below. > - On node offline I need to loop over all existing hugetlb_cgroups and > free per_node data. > - If I follow the slub example, I need a lock to synchronize > onlining/offlining with references to per_node data in commit_charge() > uncharge_page() and show_numa_stats(). > > I don't think it's worth it TBH, but I'm happy to oblige if you think so. > No need to do anything extra/special here. I just noted that you would be allocating memory for offline nodes and took a look to see if the memory cgroup code handled this case. They do not, and have this as a TODO. I think that is sufficient here. Thanks, -- Mike Kravetz