On 9/6/18 11:41 AM, Alexander Duyck wrote:
> On Thu, Sep 6, 2018 at 8:13 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>>
>> On Thu 06-09-18 07:59:03, Dave Hansen wrote:
>>> On 09/05/2018 10:47 PM, Michal Hocko wrote:
>>>> why do you have to keep DEBUG_VM enabled for workloads where the boot
>>>> time matters so much that few seconds matter?
>>>
>>> There are a number of distributions that run with it enabled in the
>>> default build. Fedora, for one. We've basically assumed for a while
>>> that we have to live with it in production environments.
>>>
>>> So, where does leave us? I think we either need a _generic_ debug
>>> option like:
>>>
>>> CONFIG_DEBUG_VM_SLOW_AS_HECK
>>>
>>> under which we can put this an other really slow VM debugging. Or, we
>>> need some kind of boot-time parameter to trigger the extra checking
>>> instead of a new CONFIG option.
>>
>> I strongly suspect nobody will ever enable such a scary looking config
>> TBH. Besides I am not sure what should go under that config option.
>> Something that takes few cycles but it is called often or one time stuff
>> that takes quite a long but less than aggregated overhead of the former?
>>
>> Just consider this particular case. It basically re-adds an overhead
>> that has always been there before the struct page init optimization
>> went it. The poisoning just returns it in a different form to catch
>> potential left overs. And we would like to have as many people willing
>> to running in debug mode to test for those paths because they are
>> basically impossible to review by the code inspection. More importantnly
>> the major overhead is boot time so my question still stands. Is this
>> worth a separate config option almost nobody is going to enable?
>>
>> Enabling DEBUG_VM by Fedora and others serves us a very good testing
>> coverage and I appreciate that because it has generated some useful bug
>> reports.
>> Those people are paying quite a lot of overhead in runtime
>> which can aggregate over time is it so much to ask about one time boot
>> overhead?
>
> The kind of boot time add-on I saw as a result of this was about 170
> seconds, or 2 minutes and 50 seconds on a 12TB system. I spent a
> couple minutes wondering if I had built a bad kernel or not as I was
> staring at a dead console the entire time after the grub prompt since
> I hit this so early in the boot. That is the reason why I am so eager
> to slice this off and make it something separate. I could easily see
> this as something that would get in the way of other debugging that is
> going on in a system.
>
> If we don't want to do a config option, then what about adding a
> kernel parameter to put a limit on how much memory we will initialize
> like this before we just start skipping it. We could put a default
> limit on it like 256GB and then once we cross that threshold we just
> don't bother poisoning any more memory. With that we would probably be
> able to at least cover most of the early memory init, and that value
> should cover most systems without getting into delays on the order of
> minutes.

I am OK with a boot parameter to optionally disable it when DEBUG_VM is
enabled. But I do not think it is a good idea to make that parameter
"smart": basically, always poison memory with DEBUG_VM unless booted
with a parameter that tells it not to poison memory.

CONFIG_DEBUG_VM is disabled on:
RedHat, Oracle Linux, CentOS, Ubuntu, Arch Linux, SUSE

Enabled on:
Fedora

Are there other distros where it is enabled? I think this could be filed
as a performance bug against the Fedora distro, and let them decide what
to do about it.

I do not want to make this feature less tested.
Poisoning memory allowed us to catch corner case bugs like these:

ab1e8d8960b68f54af42b6484b5950bd13a4054b
mm: don't allow deferred pages with NEED_PER_CPU_KM

e181ae0c5db9544de9c53239eb22bc012ce75033
mm: zero unavailable pages before memmap init

And several more that were fixed by other people. For a very long time
Linux relied on the assumption that boot memory is zeroed, and I am sure
we will continue to detect more bugs over time.

Thank you,
Pavel

>
> - Alex
>