Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

On 14.10.20 13:53, Michal Privoznik wrote:
> On 10/14/20 10:26 AM, David Hildenbrand wrote:
>> On 14.10.20 08:30, Michal Privoznik wrote:
> 
> Sorry for hijacking this thread, but I need to report it somewhere (what 
> is the best place?)
> 
> When I try to start a guest with this feature turned on + memfd + 
> hugepages, the host kernel prints the warning below into dmesg, and 
> hugepages stop working from then on (meaning that even if the pool of 
> allocated HPs is large enough, I can't start any guest with HPs):
> 
> 
> 
> 
> [  139.434748] ------------[ cut here ]------------
> [  139.434754] WARNING: CPU: 2 PID: 6280 at mm/page_counter.c:57 page_counter_uncharge+0x33/0x40
> [  139.434754] Modules linked in: kvm_amd amdgpu kvm btusb btrtl btbcm btintel sp5100_tco watchdog k10temp mfd_core gpu_sched ttm
> [  139.434759] CPU: 2 PID: 6280 Comm: CPU 1/KVM Not tainted 5.8.13-gentoo-x86_64 #2
> [  139.434759] Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 1005 08/01/2019
> [  139.434760] RIP: 0010:page_counter_uncharge+0x33/0x40
> [  139.434762] Code: 48 85 ff 74 24 4c 89 c8 f0 48 0f c1 07 4c 29 c0 48 89 c1 48 89 c6 e8 7c fe ff ff 48 85 c9 78 0a 48 8b 7f 28 48 85 ff 75 dc c3 <0f> 0b eb f2 66 0f 1f 84 00 00 00 00 00 48 8b 17 48 39 d6 72 41 41
> [  139.434762] RSP: 0018:ffffc9000355fb38 EFLAGS: 00010286
> [  139.434763] RAX: fffffffffffb4000 RBX: ffff888fc267e900 RCX: fffffffffffb4000
> [  139.434763] RDX: 0000000000000402 RSI: fffffffffffb4000 RDI: ffff888fd8411dd0
> [  139.434764] RBP: ffff888fcba983c0 R08: 0000000000080400 R09: fffffffffff7fc00
> [  139.434764] R10: ffffc9000355fb40 R11: 000000000000000a R12: 0000000000000001
> [  139.434765] R13: ffff888fc3d89140 R14: 00000000000001b2 R15: 00000000000001b1
> [  139.434765] FS:  00007fc9d4c35700(0000) GS:ffff888fde880000(0000) knlGS:0000000000000000
> [  139.434766] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  139.434766] CR2: 00007f09a4003000 CR3: 0000000fc06fe000 CR4: 0000000000340ee0
> [  139.434767] Call Trace:
> [  139.434769]  hugetlb_cgroup_uncharge_file_region+0x46/0x70
> [  139.434772]  region_del+0x1a0/0x260
> [  139.434773]  hugetlb_unreserve_pages+0x32/0xa0
> [  139.434775]  remove_inode_hugepages+0x19d/0x3a0
> [  139.434776]  hugetlbfs_fallocate+0x3f2/0x4a0
> [  139.434778]  ? __seccomp_filter+0x75/0x6a0
> [  139.434779]  vfs_fallocate+0x124/0x260
> [  139.434780]  __x64_sys_fallocate+0x39/0x60
> [  139.434783]  do_syscall_64+0x38/0x60
> [  139.434784]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  139.434785] RIP: 0033:0x7fc9e0994de7
> [  139.434786] Code: 89 7c 24 08 48 89 4c 24 18 e8 45 fc f8 ff 41 89 c0 4c 8b 54 24 18 48 8b 54 24 10 b8 1d 01 00 00 8b 74 24 0c 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 41 44 89 c7 89 44 24 08 e8 75 fc f8 ff 8b 44
> [  139.434787] RSP: 002b:00007fc9d4c337a0 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
> [  139.434787] RAX: ffffffffffffffda RBX: 0000000036400000 RCX: 00007fc9e0994de7
> [  139.434788] RDX: 0000000036200000 RSI: 0000000000000003 RDI: 000000000000001d
> [  139.434788] RBP: 00007fc9d4c33800 R08: 0000000000000000 R09: 0000000000000000
> [  139.434789] R10: 0000000000200000 R11: 0000000000000293 R12: 00007fff9a75c3fe
> [  139.434789] R13: 00007fff9a75c3ff R14: 00007fc9d4c35700 R15: 00007fc9d4c33dc0
> [  139.434790] ---[ end trace fb9808303959fc01 ]---
> 
> 
> Is this a known problem?

No, not at all. Thanks for reporting!

And the "bad" thing is that QEMU doesn't do anything too fancy: all it
does is fallocate(FALLOC_FL_PUNCH_HOLE) on hugetlbfs when trying to
zap reported pages. The same mechanism is also used for postcopy live
migration and for virtio-mem with hugetlbfs.

Which kernel are you running?

1. If it's an upstream kernel, the lkml + -mm lists are the right place
(please cc me, or I can try to reproduce and report it).

2. If it's a distro kernel, then file a BUG there.

I was testing virtio-mem with hugetlbfs just recently and it worked on a
fairly recent upstream Fedora kernel. But maybe I simply wasn't able to
trigger it.

> 
> Michal
> 


-- 
Thanks,

David / dhildenb



