On 01/28/2020 09:03 AM, Qian Cai wrote: > > >> On Jan 27, 2020, at 10:06 PM, Anshuman Khandual <anshuman.khandual@xxxxxxx> wrote: >> >> >> >> On 01/28/2020 07:41 AM, Qian Cai wrote: >>> >>> >>>> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual <Anshuman.Khandual@xxxxxxx> wrote: >>>> >>>> This adds tests which will validate architecture page table helpers and >>>> other accessors in their compliance with expected generic MM semantics. >>>> This will help various architectures in validating changes to existing >>>> page table helpers or addition of new ones. >>>> >>>> This test covers basic page table entry transformations including but not >>>> limited to old, young, dirty, clean, write, write protect etc at various >>>> level along with populating intermediate entries with next page table page >>>> and validating them. >>>> >>>> Test page table pages are allocated from system memory with required size >>>> and alignments. The mapped pfns at page table levels are derived from a >>>> real pfn representing a valid kernel text symbol. This test gets called >>>> right after page_alloc_init_late(). >>>> >>>> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with >>>> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to >>>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and >>>> arm64. Going forward, other architectures too can enable this after fixing >>>> build or runtime problems (if any) with their page table helpers. >> >> Hello Qian, >> >>> >>> What’s the value of this block of new code? It only supports x86 and arm64 >>> which are supposed to be good now. >> >> We have been over the usefulness of this code many times before as the patch is >> already in it's V12. Currently it is enabled on arm64, x86 (except PAE), arc and >> ppc32. There are build time or runtime problems with other archs which prevent > > I am not sure if I care too much about arc and ppc32 which are pretty much legacy > platforms. Okay but FWIW the maintainers for all these enabled platforms cared for this test at the least and really helped in shaping the test to it's current state. Besides I am still failing to understand your point here about evaluating particular feature's usefulness based on it's support on relative and perceived importance of some platforms compared to others. Again the idea is to integrate all platforms eventually but we had discovered build and runtime issues which needs to be resolved at platform level first. Unless I am mistaken, debug feature like this which is putting down a framework while also benefiting some initial platforms to start with, will be a potential candidate for eventual inclusion in the mainline. Otherwise, please point to any other agreed upon community criteria for debug feature's mainline inclusion which I will try to adhere. I wonder if all other similar debug features from the past ever met 'the all inclusive at the beginning' criteria which you are trying to propose here. This test also adds a feature file, enlisting all supported archs as suggested by Ingo for the exact same reason. This is not the first time, a feature is listing out archs which are supported and archs which are not. > >> enablement of this test (for the moment) but then the goal is to integrate all >> of them going forward. The test not only validates platform's adherence to the >> expected semantics from generic MM but also helps in keeping it that way during >> code changes in future as well. > > Another option maybe to get some decent arches on board first before merging this > thing, so it have more changes to catch regressions for developers who might run this. > >> >>> Did those tests ever find any regression or this is almost only useful for new >> >> The test has already found problems with s390 page table helpers. > > Hmm, that is pretty weak where s390 is not even official supported with this version. And there were valid reasons why s390 could not be enabled just yet as explained by s390 folks during our previous discussions. I just pointed out an example where this test was useful as you had asked previously. Not being official supported in this version does not take away the fact the it was indeed useful for that platform in discovering a bug. > >> >>> architectures which only happened once in a few years? >> >> Again, not only it validates what exist today but its also a tool to make >> sure that all platforms continue adhere to a common agreed upon semantics >> as reflected through the tests here. >> >>> The worry if not many people will use this config and code those that much in >> >> Debug features or tests in the kernel are used when required. These are never or >> should not be enabled by default. AFAICT this is true even for entire DEBUG_VM >> packaged tests. Do you have any particular data or precedence to substantiate >> the fact that this test will be used any less often than the other similar ones >> in the tree ? I can only speak for arm64 platform but the very idea for this >> test came from Catalin when we were trying to understand the semantics for THP >> helpers while enabling THP migration without split. Apart from going over the >> commit messages from the past, there were no other way to figure out how any >> particular page table helper is suppose to change given page table entry. This >> test tries to formalize those semantics. > > I am thinking about how we made so many mistakes before by merging too many of > those debugging options that many of them have been broken for many releases > proving that nobody actually used them regularly. We don’t need to repeat the same Again will ask for some data to substantiate these claims. Though I am not really sure but believe that there are integration test frameworks out there which regularly validates each of these code path on multiple platforms. One such automation found that V11 of the test was broken on X86 PAE platform which I fixed. Nonetheless, I can speak only for arm64 platform and we intend to use this test to validate arm64 exported page table helpers. Citing unsubstantiated past examples should not really block these enabled platforms (arm64 at the very least) from getting this debug feature which has already demonstrated it's usefulness during arm64 THP migration development and on s390 platforms as well. > mistake again. I am actually thinking about to remove things like page_poisoning often > which is almost are never found any bug recently and only cause pains when interacting > with other new features that almost nobody will test them together to begin with. > We even have some SLUB debugging code sit there for almost 15 years that almost > nobody used it and maintainers refused to remove it. Unlike those, the proposed test here is isolated as a stand alone test and stays clear off from any other code path. I have not been involved in or aware of the usefulness of existing MM debug features and hence will just leave them upto the judgment of the maintainers whether to keep or discard them. > >> >>> the future because it is inefficient to find bugs, it will simply be rotten >> Could you be more specific here ? What parts of the test are inefficient ? I >> am happy to improve upon the test. Do let me know you if you have suggestions. >> >>> like a few other debugging options out there we have in the mainline that >> will be a pain to remove later on. >>> >> >> Even though I am not agreeing to your assessment about the usefulness of the >> test without any substantial data backing up the claims, the test case in >> itself is very much compartmentalized, staying clear from generic MM and >> debug_vm_pgtable() is only function executing the test which is getting >> called from kernel_init_freeable() path. > > I am thinking exactly the other way around. You are proposing to merge this tests > without proving how useful it will be able to find regressions for future developers > to make sure it will actually get used. As I had mentioned before, the test attempts to formalize page table helper semantics as expected from generic MM code paths and intend to catch deviations when enabled on a given platform. How else should we test semantics errors otherwise ? There are past examples of usefulness for this procedure on arm64 and on s390. I am wondering how else to prove the usefulness of a debug feature if these references are not enough. >