On Wed, Jan 27, 2016 at 09:49:57AM -0800, Mike Kravetz wrote:
> On 01/25/2016 05:50 AM, Mike Kravetz wrote:
> >> Do you have any thoughts how it's going to be implemented? It would be
> >> nice to have some design overview or better proof-of-concept patch before
> >> the summit to be able analyze implications for the kernel.
> >>
> >
> > Good to know the hugetlbfs implementation is considered a hack. I just
> > started looking at this, and was going to use hugetlbfs as a starting
> > point. I'll reconsider that decision.
>
> Kirill, can you (or others) explain your reasons for saying the hugetlbfs
> implementation is an ugly hack? I do not have enough history/experience
> with this to say what is most offensive. I would be happy to start by
> cleaning up issues with the current implementation.
>

Historically, it was considered a hack because it had special handling in
a number of paths in the VM. Of course THP also has similar handling now
so it's less of a concern but there are differences that cause base pages,
transparent hugepages and hugetlbfs pages to all be special cases. That
does not sit comfortably with everyone.

For a long time, it was considered ugly because a fault on private child
mappings was so unreliable and a fork could cause a parent to unexpectedly
fail a fault and die. These days it's different as only the child can die
so while it's less of a concern, hugetlbfs pages allow a child to be killed
if enough huge pages are not available.

It was also considered ugly because application-awareness was required in
so many cases. Granted, libhugetlbfs can hide some of that ugliness but
even that was considered hacky.

The fact that hugetlbfs pages cannot be swapped even without mlock is
another fact that makes them different to the rest of the VM. It has its
own reservation scheme that is different to everything else.

One that crippled it to some extent with the label was the fact that
fixing swap on it was effectively impossible because of POWER.
Once huge pages had been installed on that architecture for a long time,
it was impossible to remap them at a different size. The limitation has
been relaxed to some extent but those around long enough remember it.

So it is a bit of a hack that behaves differently to other page types.
It's fairly complex and while the semantics used to be a lot uglier than
they are now, the "ugly hack" label has stuck.

> If we do shared page tables for DAX, it makes sense that it and hugetlbfs
> should be similar (or common) if possible.
>

It's been a long time since I looked at shared page tables so I can't
remember why but it was a difficult area. A few years were spent on it so
if shared page tables are being considered, I would make damn sure first
that they actually help on modern hardware before jumping into that hole.

--
Mel Gorman
SUSE Labs