Re: [PATCH v13] mm: report per-page metadata information

++ nvdimm, linux-cxl, Yu Zhang

On Wed, Jun 05, 2024 at 10:27:51PM +0000, Sourav Panda wrote:
> Today, we do not have any observability of per-page metadata
> and how much it takes away from the machine capacity. Thus,
> we want to describe the amount of memory that is going towards
> per-page metadata, which can vary depending on build
> configuration, machine architecture, and system use.
> 
> This patch adds 2 fields to /proc/vmstat that can be used as shown
> below:
> 
> Accounting per-page metadata allocated by boot-allocator:
> 	/proc/vmstat:nr_memmap_boot * PAGE_SIZE
> 
> Accounting per-page metadata allocated by buddy-allocator:
> 	/proc/vmstat:nr_memmap * PAGE_SIZE
> 
> Accounting total per-page metadata allocated on the machine:
> 	(/proc/vmstat:nr_memmap_boot +
> 	 /proc/vmstat:nr_memmap) * PAGE_SIZE
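
[A small, hedged sketch of the accounting above: the Python below is not
part of the patch, but shows how userspace might combine the two
/proc/vmstat counters named above (nr_memmap_boot and nr_memmap) with
the system page size to get the totals.]

import resource

def read_vmstat():
    """Parse /proc/vmstat into a {name: value} dict."""
    stats = {}
    with open("/proc/vmstat") as f:
        for line in f:
            name, value = line.split()
            stats[name] = int(value)
    return stats

page_size = resource.getpagesize()
vmstat = read_vmstat()

# Per-page metadata from the boot allocator and the buddy allocator.
boot = vmstat.get("nr_memmap_boot", 0) * page_size
buddy = vmstat.get("nr_memmap", 0) * page_size

print(f"memmap (boot allocator):  {boot} bytes")
print(f"memmap (buddy allocator): {buddy} bytes")
print(f"memmap total:             {boot + buddy} bytes")
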
> 
> Utility for userspace:
> 
> Observability: Describe the amount of memory overhead that is
> going to per-page metadata on the system at any given time since
> this overhead is not currently observable.
> 
> Debugging: Tracking the changes or absolute value in struct pages
> can help detect anomalies as they can be correlated with other
> metrics in the machine (e.g., memtotal, number of huge pages,
> etc).
> 
> page_ext overheads: Some kernel features, such as page_owner and
> page_table_check, that use page_ext can be optionally enabled via
> kernel parameters. Having the total per-page metadata information
> helps users precisely measure their impact. Furthermore, page-metadata
> metrics will reflect the amount of struct page memory relinquished
> (or overhead reduced) when hugetlbfs pages are reserved, which
> will vary depending on whether the hugetlb vmemmap optimization is
> enabled or not.
> 
> For background and results see:
> lore.kernel.org/all/20240220214558.3377482-1-souravpanda@xxxxxxxxxx
> 
> Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
> Signed-off-by: Sourav Panda <souravpanda@xxxxxxxxxx>
> Reviewed-by: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>

This patch leads to an Oops in 6.11-rc1 when CONFIG_MEMORY_HOTPLUG
is enabled. Folks hitting it have had success reverting this patch;
disabling CONFIG_MEMORY_HOTPLUG is not a long-term solution.

Reported here:
https://lore.kernel.org/linux-cxl/CAHj4cs9Ax1=CoJkgBGP_+sNu6-6=6v=_L-ZBZY0bVLD3wUWZQg@xxxxxxxxxxxxxx/

A bit of detail below; follow the link above for more:
dmesg:
[ 1408.632268] Oops: general protection fault, probably for
non-canonical address 0xdffffc0000005650: 0000 [#1] PREEMPT SMP KASAN
PTI
[ 1408.644006] KASAN: probably user-memory-access in range
[0x000000000002b280-0x000000000002b287]
[ 1408.652699] CPU: 26 UID: 0 PID: 1868 Comm: ndctl Not tainted 6.11.0-rc1 #1
[ 1408.659571] Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS
2.20.1 09/13/2023
[ 1408.667136] RIP: 0010:mod_node_page_state+0x2a/0x110
[ 1408.672112] Code: 0f 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 41
54 55 48 89 fd 48 81 c7 80 b2 02 00 53 48 89 f9 89 d3 48 c1 e9 03 48
83 ec 10 <80> 3c 01 00 0f 85 b8 00 00 00 48 8b bd 80 b2 02 00 41 89 f0
83 ee
[ 1408.690856] RSP: 0018:ffffc900246d7388 EFLAGS: 00010286
[ 1408.696088] RAX: dffffc0000000000 RBX: 00000000fffffe00 RCX: 0000000000005650
[ 1408.703222] RDX: fffffffffffffe00 RSI: 000000000000002f RDI: 000000000002b280
[ 1408.710353] RBP: 0000000000000000 R08: ffff88a06ffcb1c8 R09: 1ffffffff218c681
[ 1408.717486] R10: ffffffff93d922bf R11: ffff88855e790f10 R12: 00000000000003ff
[ 1408.724619] R13: 1ffff920048dae7b R14: ffffea0081e00000 R15: ffffffff90c63408
[ 1408.731750] FS:  00007f753c219200(0000) GS:ffff889bf2a00000(0000)
knlGS:0000000000000000
[ 1408.739834] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1408.745581] CR2: 0000559f5902a5a8 CR3: 00000001292f0006 CR4: 00000000007706f0
[ 1408.752713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1408.759843] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1408.766976] PKRU: 55555554
[ 1408.769690] Call Trace:
[ 1408.772143]  <TASK>
[ 1408.774248]  ? die_addr+0x3d/0xa0
[ 1408.777577]  ? exc_general_protection+0x150/0x230
[ 1408.782297]  ? asm_exc_general_protection+0x22/0x30
[ 1408.787182]  ? mod_node_page_state+0x2a/0x110
[ 1408.791548]  section_deactivate+0x519/0x780
[ 1408.795740]  ? __pfx_section_deactivate+0x10/0x10
[ 1408.800449]  __remove_pages+0x6c/0xa0
[ 1408.804119]  arch_remove_memory+0x1a/0x70
[ 1408.808141]  pageunmap_range+0x2ad/0x5e0
[ 1408.812067]  memunmap_pages+0x320/0x5a0
[ 1408.815909]  release_nodes+0xd6/0x170
[ 1408.819581]  ? lockdep_hardirqs_on+0x78/0x100
[ 1408.823941]  devres_release_all+0x106/0x170
[ 1408.828126]  ? __pfx_devres_release_all+0x10/0x10
[ 1408.832834]  device_unbind_cleanup+0x16/0x1a0
[ 1408.837198]  device_release_driver_internal+0x3d5/0x530
[ 1408.842423]  ? klist_put+0xf7/0x170
[ 1408.845916]  bus_remove_device+0x1ed/0x3f0
[ 1408.850017]  device_del+0x33b/0x8c0
[ 1408.853518]  ? __pfx_device_del+0x10/0x10
[ 1408.857532]  unregister_dev_dax+0x112/0x210
[ 1408.861722]  release_nodes+0xd6/0x170
[ 1408.865387]  ? lockdep_hardirqs_on+0x78/0x100
[ 1408.869749]  devres_release_all+0x106/0x170
[ 1408.873933]  ? __pfx_devres_release_all+0x10/0x10
[ 1408.878643]  device_unbind_cleanup+0x16/0x1a0
[ 1408.883007]  device_release_driver_internal+0x3d5/0x530
[ 1408.888235]  ? __pfx_sysfs_kf_write+0x10/0x10
[ 1408.892598]  unbind_store+0xdc/0xf0
[ 1408.896093]  kernfs_fop_write_iter+0x358/0x530
[ 1408.900539]  vfs_write+0x9b2/0xf60
[ 1408.903954]  ? __pfx_vfs_write+0x10/0x10
[ 1408.907891]  ? __fget_light+0x53/0x1e0
[ 1408.911646]  ? __x64_sys_openat+0x11f/0x1e0
[ 1408.915835]  ksys_write+0xf1/0x1d0
[ 1408.919249]  ? __pfx_ksys_write+0x10/0x10
[ 1408.923264]  do_syscall_64+0x8c/0x180
[ 1408.926934]  ? __debug_check_no_obj_freed+0x253/0x520
[ 1408.931997]  ? __pfx___debug_check_no_obj_freed+0x10/0x10
[ 1408.937405]  ? kasan_quarantine_put+0x109/0x220
[ 1408.941944]  ? lockdep_hardirqs_on+0x78/0x100
[ 1408.946304]  ? kmem_cache_free+0x1a6/0x4c0
[ 1408.950408]  ? do_sys_openat2+0x10a/0x160
[ 1408.954424]  ? do_sys_openat2+0x10a/0x160
[ 1408.958434]  ? __pfx_do_sys_openat2+0x10/0x10
[ 1408.962794]  ? lockdep_hardirqs_on+0x78/0x100
[ 1408.967153]  ? __pfx___debug_check_no_obj_freed+0x10/0x10
[ 1408.972554]  ? __x64_sys_openat+0x11f/0x1e0
[ 1408.976737]  ? __pfx___x64_sys_openat+0x10/0x10
[ 1408.981269]  ? rcu_is_watching+0x11/0xb0
[ 1408.985204]  ? lockdep_hardirqs_on_prepare+0x179/0x400
[ 1408.990351]  ? do_syscall_64+0x98/0x180
[ 1408.994191]  ? lockdep_hardirqs_on+0x78/0x100
[ 1408.998549]  ? do_syscall_64+0x98/0x180
[ 1409.002386]  ? do_syscall_64+0x98/0x180
[ 1409.006227]  ? lockdep_hardirqs_on+0x78/0x100
[ 1409.010585]  ? do_syscall_64+0x98/0x180
[ 1409.014425]  ? lockdep_hardirqs_on_prepare+0x179/0x400
[ 1409.019565]  ? do_syscall_64+0x98/0x180
[ 1409.023401]  ? lockdep_hardirqs_on+0x78/0x100
[ 1409.027763]  ? do_syscall_64+0x98/0x180
[ 1409.031600]  ? do_syscall_64+0x98/0x180
[ 1409.035439]  ? do_syscall_64+0x98/0x180
[ 1409.039281]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1409.044331] RIP: 0033:0x7f753c0fda57
[ 1409.047911] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7
0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89
74 24
[ 1409.066655] RSP: 002b:00007ffc19323e28 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 1409.074220] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f753c0fda57
[ 1409.081352] RDX: 0000000000000007 RSI: 0000559f5901f740 RDI: 0000000000000003
[ 1409.088483] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffc19323d20
[ 1409.095616] R10: 0000000000000000 R11: 0000000000000246 R12: 0000559f5901f740
[ 1409.102748] R13: 00007ffc19323e90 R14: 00007f753c219120 R15: 0000559f5901fc30
[ 1409.109887]  </TASK>
[ 1409.112082] Modules linked in: kmem device_dax rpcsec_gss_krb5
auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs rfkill sunrpc
dm_multipath intel_rapl_msr intel_rapl_common intel_uncore_frequency
intel_uncore_frequency_common skx_edac skx_edac_common
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mgag200
rapl cdc_ether iTCO_wdt dell_pc i2c_algo_bit iTCO_vendor_support
ipmi_ssif usbnet acpi_power_meter drm_shmem_helper mei_me dell_smbios
platform_profile intel_cstate dcdbas wmi_bmof dell_wmi_descriptor
intel_uncore pcspkr mii drm_kms_helper i2c_i801 mei i2c_smbus
intel_pch_thermal lpc_ich ipmi_si acpi_ipmi dax_pmem ipmi_devintf
ipmi_msghandler drm fuse xfs libcrc32c sd_mod sg nd_pmem nd_btt
crct10dif_pclmul crc32_pclmul crc32c_intel ahci ghash_clmulni_intel
libahci bnxt_en megaraid_sas tg3 libata wmi nfit libnvdimm dm_mirror
dm_region_hash dm_log dm_mod
[ 1409.189120] ---[ end trace 0000000000000000 ]---

-- snip
> 



