On 9/3/19 10:57 AM, Mike Kravetz wrote: > On 8/29/19 12:18 AM, Michal Hocko wrote: >> [Cc cgroups maintainers] >> >> On Wed 28-08-19 10:58:00, Mina Almasry wrote: >>> On Wed, Aug 28, 2019 at 4:23 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote: >>>> >>>> On Mon 26-08-19 16:32:34, Mina Almasry wrote: >>>>> mm/hugetlb.c | 493 ++++++++++++------ >>>>> mm/hugetlb_cgroup.c | 187 +++++-- >>>> >>>> This is a lot of changes to an already subtle code which hugetlb >>>> reservations undoubly are. >>> >>> For what it's worth, I think this patch series is a net decrease in >>> the complexity of the reservation code, especially the region_* >>> functions, which is where a lot of the complexity lies. I removed the >>> race between region_del and region_{add|chg}, refactored the main >>> logic into smaller code, moved common code to helpers and deleted the >>> duplicates, and finally added lots of comments to the hard to >>> understand pieces. I hope that when folks review the changes they will >>> see that! :) >> >> Post those improvements as standalone patches and sell them as >> improvements. We can talk about the net additional complexity of the >> controller much easier then. > > All such changes appear to be in patch 4 of this series. The commit message > says "region_add() and region_chg() are heavily refactored to in this commit > to make the code easier to understand and remove duplication.". However, the > modifications were also added to accommodate the new cgroup reservation > accounting. I think it would be helpful to explain why the existing code does > not work with the new accounting. For example, one change is because > "existing code coalesces resv_map entries for shared mappings. new cgroup > accounting requires that resv_map entries be kept separate for proper > uncharging." > > I am starting to review the changes, but it would help if there was a high > level description. I also like Michal's idea of calling out the region_* > changes separately. If not a standalone patch, at least the first patch of > the series. This new code will be exercised even if cgroup reservation > accounting not enabled, so it is very important than no subtle regressions > be introduced. While looking at the region_* changes, I started thinking about this no coalesce change for shared mappings which I think is necessary. Am I mistaken, or is this a requirement? If it is a requirement, then think about some of the possible scenarios such as: - There is a hugetlbfs file of size 10 huge pages. - Task A has reservations for pages at offset 1 3 5 7 and 9 - Task B then mmaps the entire file which should result in reservations at 0 2 4 6 and 8. - region_chg will return 5, but will also need to allocate 5 resv_map entries for the subsequent region_add which can not fail. Correct? The code does not appear to handle this. BTW, this series will BUG when running libhugetlbfs test suite. It will hit this in resv_map_release(). VM_BUG_ON(resv_map->adds_in_progress); -- Mike Kravetz