On 28/11/2022 23:03, Eliot Moss wrote:
> On 11/28/2022 9:46 AM, lizhijian@xxxxxxxxxxx wrote:
>>
>>
>> On 28/11/2022 20:53, Eliot Moss wrote:
>>> On 11/28/2022 7:04 AM, lizhijian@xxxxxxxxxxx wrote:
>>>> Hi folks,
>>>>
>>>> I'm going to make crash coredump support pmem regions. So
>>>> I have modified kexec-tools to add the pmem region to the PT_LOAD
>>>> segments of the vmcore.
>>>>
>>>> But it failed in makedumpfile; the logs are as follows:
>>>>
>>>> In my environment, I found that the last 512 pages in the pmem region
>>>> cause the error.
>>>
>>> I wonder if an issue I reported is related: when set up to map
>>> 2 MB (huge) pages, the last 2 MB of a large region got mapped as
>>> 4 KB pages, and then later, half of a large region was treated
>>> that way.
>>>
>> Could you share the URL/link? I'd like to take a look.
>
> It was in a previous email to the nvdimm list. The title was:
>
> "Possible PMD (huge pages) bug in fs dax"
>
> And here is the body. I just sent it directly to the list, so there
> is no URL (if I should be engaging in a different way, please let me know):

I found it :) at https://www.mail-archive.com/nvdimm@xxxxxxxxxxxxxxx/msg02743.html

> ================================================================================
> Folks - I posted already on nvdimm, but perhaps the topic did not quite grab
> anyone's attention. I had had some trouble figuring out all the details to get
> dax mapping of files from an xfs file system with underlying Optane DC memory
> going, but now have that working reliably. But there is an odd behavior:
>
> When first mapping a file, I request mapping a 32 GB range, aligned on a 1 GB
> (and thus clearly on a 2 MB) boundary.
>
> For each group of 8 GB, the first 4095 entries map with a 2 MB huge (PMD)
> page. The 4096th one does FALLBACK. I suspect some problem in
> dax.c:grab_mapping_entry or its callees, but am not personally well enough
> versed in either the dax code or the xarray implementation to dig further.
>
>
> If you'd like a second puzzle 😄 ... after completing this mapping, another
> thread accesses the whole range sequentially. This results in NOPAGE fault
> handling for the first 4095+4095 2 MB regions that previously resulted in
> NOPAGE -- so far so good. But it gives FALLBACK for the upper 16 GB (except
> the two PMD regions it already gave FALLBACK for).
>
>
> I can provide trace output from a run, and all the ndctl, gdisk -l,
> fdisk -l, and xfs_info details, if you like.
>
>
> In my application, it would be nice if dax.c could deliver 1 GB PUD-size
> mappings as well, though it would appear that that would require more surgery
> on dax.c. It would be somewhat analogous to what's already there, of course,
> but I don't mean to minimize the possible trickiness of it. I realize I
> should submit that request as a separate thread 😄, which I intend to do
> later.
> ================================================================================
>
> Regards - Eliot Moss
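
For anyone who wants to poke at the PMD fallback behavior described above, below
is a minimal sketch (not Eliot's actual test program) of the access pattern he
describes: map a large file from a dax-mounted xfs filesystem at a 1 GB aligned
address with MAP_SYNC, then touch one byte per 2 MB so each fault can be served
by a PMD-sized DAX mapping. The file path is a placeholder, and
MAP_SHARED_VALIDATE / MAP_SYNC need glibc >= 2.28 (or <linux/mman.h>); which
faults complete as NOPAGE vs FALLBACK can be observed via the fs_dax
tracepoints.

#define _GNU_SOURCE  /* MAP_SHARED_VALIDATE / MAP_SYNC (glibc >= 2.28) */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define GB      (1UL << 30)
#define PMD_SZ  (2UL << 20)

int main(void)
{
	size_t len = 32 * GB;

	/* Placeholder path: a pre-sized file on a dax-mounted xfs fs. */
	int fd = open("/mnt/pmem/testfile", O_RDWR);
	if (fd < 0) { perror("open"); return 1; }

	/* Reserve an oversized PROT_NONE window, then pick a 1 GB aligned
	 * address inside it so the file mapping is 1 GB (and hence 2 MB)
	 * aligned, as in the report above. */
	void *win = mmap(NULL, len + GB, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (win == MAP_FAILED) { perror("mmap window"); return 1; }
	void *aligned = (void *)(((uintptr_t)win + GB - 1) & ~(GB - 1));

	char *p = mmap(aligned, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC | MAP_FIXED, fd, 0);
	if (p == MAP_FAILED) { perror("mmap dax"); return 1; }

	/* One store per 2 MB: each fault is PMD-sized if dax can serve it.
	 * Watch /sys/kernel/tracing/events/fs_dax/dax_pmd_fault_done to see
	 * which faults report NOPAGE and which report FALLBACK. */
	for (size_t off = 0; off < len; off += PMD_SZ)
		p[off] = 1;

	munmap(p, len);
	close(fd);
	return 0;
}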