On Wed, Mar 27, 2019 at 9:13 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Tue 26-03-19 17:20:41, Dan Williams wrote:
> > On Tue, Mar 26, 2019 at 1:04 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > On Mon 25-03-19 13:03:47, Dan Williams wrote:
> > > > On Mon, Mar 25, 2019 at 3:20 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > [...]
> > > > > > User-defined memory namespaces have this problem, but 2MB is the
> > > > > > default alignment and is sufficient for most uses.
> > > > >
> > > > > What does prevent users to go and use a larger alignment?
> > > >
> > > > Given that we are living with 64MB granularity on mainstream platforms
> > > > for the foreseeable future, the reason users can't rely on a larger
> > > > alignment to address the issue is that the physical alignment may
> > > > change from one boot to the next.
> > >
> > > I would love to learn more about this inter-boot volatility. Could you
> > > expand on that some more? I thought that the HW configuration presented
> > > to the OS would be more or less stable unless the underlying HW changes.
> >
> > Even if the configuration is static there can be hardware failures
> > that prevent a DIMM or a PCI device from being included in the memory
> > map. When that happens the BIOS needs to re-layout the map, and the
> > result is not guaranteed to maintain the previous alignment.
> >
> > > > No, you can't just wish hardware / platform firmware won't do this,
> > > > because there are not enough platform resources to give every hardware
> > > > device a guaranteed alignment.
> > >
> > > A guarantee is one part, and I can see how nobody wants to give you
> > > something that strong, but how often does that happen in real life?
> >
> > I expect a "rare" event to happen every day in a data-center fleet.
> > Failure rates tend towards 100% daily occurrence at scale, and in this
> > case the kernel has everything it needs to mitigate such an event.
> >
> > Setting aside the success rate of a software-alignment mitigation, the
> > reason I am charging this hill again after a 2-year hiatus is the
> > realization that this problem is more widespread than the original
> > failing scenario. Back in 2017 the problem seemed limited to custom
> > memmap= configurations and collisions between PMEM and System RAM.
> > Now it is clear that the collisions can happen between PMEM regions
> > and namespaces as well, and the problem spans platforms from multiple
> > vendors. Here is the most recent collision problem:
> > https://github.com/pmem/ndctl/issues/76, from a third-party platform.
> >
> > The fix for that issue uncovered a bug in the padding implementation,
> > and a fix for that bug would result in even more hacks in the nvdimm
> > code for what is a core kernel deficiency. Code review of those
> > changes resulted in a change of direction: go after the core
> > deficiency instead.
>
> This kind of information, along with real-world examples, is exactly
> what you should have added to the cover letter. The previous, very
> vague claims were not really convincing, nor something that could be
> considered a proper justification. Please do realize that people who
> are not working with the affected HW are unlikely to have an idea how
> serious/relevant those problems really are.
>
> People are asking for a smaller memory hotplug granularity for other
> use cases (e.g. memory ballooning into VMs) which are quite dubious to
> be honest and not really worth all the code rework. If we are talking
> about something that can be worked around elsewhere, then that is
> preferred, because the code base is not in excellent shape and putting
> more on top is just going to cause more headaches.
>
> I will try to find some time to review this more deeply (no promises
> though, because time is hectic and this is not a simple feature). For
> the future, please try harder to write up a proper justification and a
> high-level design description which tells a bit about all the
> important parts of the new scheme.

Fair enough. I've been steeped in this for too long, and should have
taken a wider view to bring reviewers up to speed.
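
To make the alignment point above concrete, here is a minimal standalone
sketch (userspace C, not kernel code; the 64MB section size, the 4GB
region size, and the base addresses are made-up values for illustration).
It shows how a 2MB shift in the platform-assigned base address, e.g.
after the BIOS re-lays out the map, changes how much of a PMEM range
lands on whole hotplug sections, regardless of the alignment chosen
inside the namespace:

/*
 * Illustrative sketch only -- not kernel code. Only whole hotplug
 * sections can be mapped, so a small shift in the BIOS-assigned base
 * address changes the usable, section-aligned capacity no matter how
 * the namespace itself is aligned. Section size and addresses below
 * are assumptions for illustration.
 */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define SZ_1M           (1ULL << 20)
#define SECTION_SIZE    (64 * SZ_1M)    /* hotplug granularity, platform dependent */

#define ALIGN_UP(x, a)          (((x) + (a) - 1) & ~((a) - 1))
#define ALIGN_DOWN(x, a)        ((x) & ~((a) - 1))

/* Capacity left once the range is trimmed to whole, aligned sections. */
static uint64_t section_usable(uint64_t base, uint64_t size)
{
        uint64_t start = ALIGN_UP(base, SECTION_SIZE);
        uint64_t end = ALIGN_DOWN(base + size, SECTION_SIZE);

        return end > start ? end - start : 0;
}

int main(void)
{
        /* Hypothetical 4GB PMEM region; base shifts by 2MB across boots. */
        uint64_t size = 4096 * SZ_1M;
        uint64_t boot_a = 0x100000000ULL;               /* section aligned */
        uint64_t boot_b = 0x100000000ULL + 2 * SZ_1M;   /* re-laid-out map */

        printf("boot A usable: %" PRIu64 " MB\n", section_usable(boot_a, size) / SZ_1M);
        printf("boot B usable: %" PRIu64 " MB\n", section_usable(boot_b, size) / SZ_1M);
        return 0;
}

Built with any C99 toolchain this prints 4096 MB usable for boot A and
4032 MB for boot B, i.e. the shifted layout strands a full section's
worth of capacity even though nothing about the namespace changed.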