Hello folks,

This mail raises a requirement for dumping the pmem memmap and sketches some possible solutions, all of which are still immature. I really hope you can provide some feedback.

In this mail, "pmem memmap" and "pmem metadata" mean the same thing.

### Background and motivation overview ###
---
Crash dump is an important feature for troubleshooting the kernel. It is the last resort for finding out what happened at a kernel panic, slowdown, and so on, and it is the most important tool for customer support. However, part of the data on pmem is not included in the crash dump, which can make it difficult to analyze problems around pmem (especially Filesystem-DAX).

A pmem namespace in "fsdax" or "devdax" mode requires allocation of per-page metadata[1]. The allocation can be drawn from either mem (system memory) or dev (the pmem device itself); see `ndctl help create-namespace` for more details. In fsdax, the struct page array is very important: it is one of the key pieces of data for finding the status of the reverse map.

However, when the metadata is stored in pmem, not even pmem's per-page metadata is dumped. That means troubleshooters cannot check any further details about pmem from the dumpfile.

### Make pmem memmap dump support ###
---
Our goal is that, whether the metadata is stored in mem or in pmem, it can be dumped so that the crash utilities can read more details about the pmem. Of course, this feature can be enabled/disabled.

First, based on our previous investigation, we can divide the problem into the following four cases, A, B, C and D, according to the location of the metadata and the scope of the dump.
Note that although cases A&B are mentioned below, we do not want them to be part of this feature: dumping the entire pmem would consume a lot of space and, more importantly, the dump may contain sensitive user data.

+-------------+-----------------------+
|             |   metadata location   |
| dump scope  +----------+------------+
|             |   mem    |    pmem    |
+-------------+----------+------------+
| entire pmem |    A     |     B      |
+-------------+----------+------------+
| metadata    |    C     |     D      |
+-------------+----------+------------+

Case A&B: unsupported
- Only the regions listed in PT_LOAD in vmcore are dumpable. This can be resolved by adding the pmem region to vmcore's PT_LOADs in kexec-tools.
- makedumpfile assumes that the page objects of the entire region described in the PT_LOADs are readable, and then skips/excludes specific pages according to their attributes. In the case of pmem, however, the 1st kernel only allocates page objects for the namespaces of pmem, so makedumpfile throws errors[2] when specific -d options are specified.
  Accordingly, we would have to make makedumpfile ignore these errors for a pmem region.

Because these cases are not part of our goal, we must also consider how to prevent the dump application (makedumpfile) from reading the data part of pmem.

Case C: natively supported
The metadata is stored in mem, and the entire mem/ram is dumpable.

Case D: unsupported && need your input
To support this case, makedumpfile needs to know where the metadata of each pmem namespace is located, that is, the address and size [start, end) of the metadata in the pmem.

We have thought of a few possible options:

1) In the 2nd kernel, with the help of the information exported by the pmem drivers under /sys/bus/nd/devices/{namespaceX.Y, daxX.Y, pfnX.Y}, makedumpfile is able to calculate the address and size of the metadata.
2) In the 1st kernel, add a new symbol to the vmcore. The symbol is associated with the layout of each namespace. makedumpfile reads the symbol and figures out the address and size of the metadata.
3) others ?

But then we found that we had been overlooking one use case all along: the user may want to save the dumpfile to the pmem.
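
To make the [start, end) calculation for Case D a bit more concrete, here is a rough Python sketch; it is not a proposed implementation. It assumes the layout drawn in [1], with the metadata placed at the start of the namespace. It further assumes 64 bytes of metadata per 4 KiB page, an 8 KiB reserved area in front of the page array, and a 2 MiB aligned data area. The function names and all constants are hypothetical; real values would have to come from the kernel or from sysfs.

```python
# Rough estimate of the on-pmem metadata range of one namespace.
# All constants below are assumptions for illustration only.
PAGE_SIZE = 4096             # assumed base page size
STRUCT_PAGE_SIZE = 64        # assumed per-page metadata size in bytes
RESERVED_BYTES = 8 * 1024    # assumed reserved area before the page array
DATA_ALIGN = 2 * 1024 * 1024 # assumed alignment of the data area


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def estimate_metadata_range(ns_start, ns_size, align=DATA_ALIGN):
    """Return a hypothetical (start, end) physical range of the metadata.

    ns_start and ns_size are the physical start address and the size of
    the namespace in bytes. The metadata is assumed to begin at ns_start
    and to end where the aligned data area begins.
    """
    npages = ns_size // PAGE_SIZE
    memmap_bytes = npages * STRUCT_PAGE_SIZE
    end = align_up(ns_start + RESERVED_BYTES + memmap_bytes, align)
    return ns_start, end


if __name__ == "__main__":
    # Example: a 1 GiB namespace starting at the 4 GiB physical address.
    start, end = estimate_metadata_range(0x100000000, 1 << 30)
    print(f"metadata: [{start:#x}, {end:#x}), {(end - start) >> 20} MiB")
```

Whichever option supplies the namespace start and size (sysfs in the 2nd kernel, or a vmcore symbol from the 1st kernel), makedumpfile would only need to read a range like this instead of the whole pmem region.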
Neither option 1) nor option 2) handles the case where the dumpfile is saved to pmem, because the pmem drivers re-initialize the metadata while they are loading in the 2nd kernel. As a result, the metadata we dump is no longer consistent with the metadata at the moment of the crash.

Put simply, can we just disable pmem in the 2nd kernel so that the previous metadata is not destroyed? The drawback is that the 2nd kernel would then not allow the user to store the dumpfile on a pmem-based filesystem/partition.

So I hope you can share some ideas about this feature/requirement and about possible solutions for the cases A&B&D mentioned above; it would be greatly appreciated.

If I'm missing something, feel free to let me know. Any feedback and comments are very welcome.

[1] Pmem region layout:
   ^<--namespace0.0---->^<--namespace0.1------>^
   |                    |                      |
   +--+m----------------+--+m------------------+---------------------+-+a
   |++|e                |++|e                  |                     |+|l
   |++|t                |++|t                  |                     |+|i
   |++|a                |++|a                  |                     |+|g
   |++|d  namespace0.0  |++|d  namespace0.1    |     un-allocated    |+|n
   |++|a    fsdax       |++|a     devdax       |                     |+|m
   |++|t                |++|t                  |                     |+|e
   +--+a----------------+--+a------------------+---------------------+-+n
   |                                                                   |t
   v<-----------------------pmem region------------------------------->v

[2] https://lore.kernel.org/linux-mm/70F971CF-1A96-4D87-B70C-B971C2A1747C@xxxxxxxxxxxxxxxx/T/

Thanks
Zhijian