On 10.07.19 22:45, Dave Hansen wrote:
> On 7/10/19 12:51 PM, Nitesh Narayan Lal wrote:
>> +struct zone_free_area {
>> +	unsigned long *bitmap;
>> +	unsigned long base_pfn;
>> +	unsigned long end_pfn;
>> +	atomic_t free_pages;
>> +	unsigned long nbits;
>> +} free_area[MAX_NR_ZONES];
>
> Why do we need an extra data structure. What's wrong with putting
> per-zone data in ... 'struct zone'? The cover letter claims that it
> doesn't touch core-mm infrastructure, but if it depends on mechanisms
> like this, I think that's a very bad thing.
>
> To be honest, I'm not sure this series is worth reviewing at this point.
> It's horribly lightly commented and full of kernel antipatterns like
>
> void func()
> {
> 	if () {
> 		... indent entire logic
> 		... of function
> 	}
> }

"full of". Hmm.

>
> It has big "TODO"s. It's virtually comment-free. I'm shocked it's at
> the 11th version and still looking like this.
>
>> +
>> +	for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) {
>> +		unsigned long pages = free_area[zone_idx].end_pfn -
>> +				free_area[zone_idx].base_pfn;
>> +		bitmap_size = (pages >> PAGE_HINTING_MIN_ORDER) + 1;
>> +		if (!bitmap_size)
>> +			continue;
>> +		free_area[zone_idx].bitmap = bitmap_zalloc(bitmap_size,
>> +							     GFP_KERNEL);
>
> This doesn't support sparse zones. We can have zones with massive
> spanned page sizes, but very few present pages. On those zones, this
> will exhaust memory for no good reason.

Yes, AFAIKS, sparse zones are problematic when we have NORMAL/MOVABLE
mixed.

1 bit for 2MB, 1 byte for 16MB, 64 bytes for 1GB

IOW, this isn't optimal but only really problematic for big systems /
very huge sparse zones.

>
> Comparing this to Alex's patch set, it's of much lower quality and at a
> much earlier stage of development. The two sets are not really even
> comparable right now. This certainly doesn't sell me on (or even really

To be honest, I find this statement quite harsh.
Nitesh's hard work in the previous RFCs and many discussions with Alex
essentially resulted in the two approaches we have right now. Alex's
approach would not look the way it looks today without Nitesh's RFCs. So
much for that.

> enumerate the deltas in) this approach vs. Alex's.

I am aware that memory hotplug is not properly supported yet (future
work). Sparse zones work but eventually waste a handful of pages (!) -
future work. Anything else you are aware of that is missing?

My opinion:

1. Alex's solution is clearly beneficial, as we don't need to manage/scan
a bitmap. *However*, we were concerned right from the beginning whether
core-buddy modifications would be accepted upstream for a purely
virtualization-specific (as of now!) feature. If we can get it upstream,
perfect. Back when we discussed the idea with Alex I was skeptical - I
was expecting way more core modifications.

2. We were looking for an alternative solution that doesn't require
modifying the buddy. We have that now - yes, some things have to be
worked out and cleaned up, not arguing against that. A cleaned-up version
of this RFC with some fixes and enhancements should be ready to be used
in *many* (not all) setups. Which is perfectly fine.

So in summary, I think we should try our best to get Alex's series into
shape and accepted upstream. However, if we get upstream resistance or it
takes ages to get it in, I think we can start with this series here
(which requires no major buddy modifications as of now) and then slowly
see if we can convert it into Alex's approach.

The important part for me is that the core<->driver interface and the
virtio interface are in clean shape, so we can essentially swap out the
implementation-specific parts in the core.

Cheers.

-- 

Thanks,

David / dhildenb
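
For reference, the bitmap overhead figures quoted above (1 bit for 2MB,
1 byte for 16MB, 64 bytes for 1GB) can be reproduced with a small
userspace sketch. It is not code from the series. It assumes 4 KiB base
pages and a minimum hinting order of 9 (one bit per 512 pages), which is
what "1 bit for 2MB" implies, and it rounds up instead of using the
patch's "+ 1". All sketch_* and SKETCH_* names are made up for
illustration.

```c
#include <stdint.h>

/* Assumptions, not taken from the patch: 4 KiB pages, order-9 chunks. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MIN_ORDER   9

/* Convert a size in bytes to a number of base pages (PFN count). */
uint64_t sketch_bytes_to_pages(uint64_t bytes)
{
	return bytes >> SKETCH_PAGE_SHIFT;
}

/*
 * Bits needed to cover the spanned range [base_pfn, end_pfn), one bit
 * per 2^SKETCH_MIN_ORDER pages, rounded up. Only the spanned size
 * matters here; holes (non-present pages) are still covered by bits.
 */
uint64_t sketch_bitmap_bits(uint64_t base_pfn, uint64_t end_pfn)
{
	uint64_t pages = end_pfn - base_pfn;
	uint64_t chunk = 1ULL << SKETCH_MIN_ORDER;

	return (pages + chunk - 1) >> SKETCH_MIN_ORDER;
}

/* Bytes of bitmap storage for the same range, rounded up to whole bytes. */
uint64_t sketch_bitmap_bytes(uint64_t base_pfn, uint64_t end_pfn)
{
	return (sketch_bitmap_bits(base_pfn, end_pfn) + 7) / 8;
}
```

Under these assumptions a zone spanning 1 TiB needs 64 KiB of bitmap
(16 pages) no matter how few of its pages are present, which is the
sparse-zone overhead being discussed.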