On 2016/7/20 10:07, Eric W. Biederman wrote:
> zhongjiang <zhongjiang at huawei.com> writes:
>
>> From: zhong jiang <zhongjiang at huawei.com>
>>
>> I hit the following issue when running trinity on my system. The
>> kernel is version 3.4, but mainline has the same issue. The root
>> cause is that the segment size is too large: it can cover most or
>> all of memory, so a lot of time may be wasted trying to obtain a
>> usable page, and other tasks block until the test case quits. At
>> the same time, OOM may occur.
> 5MiB is way too small. I have seen vmlinux images, not to mention
> ramdisks, that get larger than that. Depending on the system
> 1GiB might not be an unreasonable ramdisk size. AKA run an entire live
> system out of a ramfs. It works well if you have enough memory.
>
> I think there is a practical limit at about 50% of memory (because we
> need two copies in memory, the source and the destination pages), but
> anything else is pretty much reasonable and should have a fair chance of
> working.
>
> A limit that reflected the reality above would be interesting.
> Anything else will likely cause someone trouble in the future.
>
> Eric

In addition, I tested with the max segment size set to 1G on a system
with 32G of memory; the rlock came up intermittently while trinity was
running.

>> ck time:20160628120131-243c5
>> rlock reason:SOFT-WATCHDOG detected! on cpu 5.
>> CPU 5 Pid: 9485, comm: trinity-c5
>> RIP: 0010:[<ffffffff8111a4cf>] [<ffffffff8111a4cf>] next_zones_zonelist+0x3f/0x60
>> RSP: 0018:ffff88088783bc38 EFLAGS: 00000283
>> RAX: ffff8808bffd9b08 RBX: ffff88088783bbb8 RCX: ffff88088783bd30
>> RDX: ffff88088f15a248 RSI: 0000000000000002 RDI: 0000000000000000
>> RBP: ffff88088783bc38 R08: ffff8808bffd8d80 R09: 0000000412c4d000
>> R10: 0000000412c4e000 R11: 0000000000000000 R12: 0000000000000002
>> R13: 0000000000000000 R14: ffff8808bffd9b00 R15: 0000000000000000
>> FS: 00007f91137ee700(0000) GS:ffff88089f2a0000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 000000000016161a CR3: 0000000887820000 CR4: 00000000000407e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process trinity-c5 (pid: 9485, threadinfo ffff88088783a000, task ffff88088f159980)
>> Stack:
>> ffff88088783bd88 ffffffff81106eac ffff8808bffd8d80 0000000000000000
>> 0000000000000000 ffffffff8124c2be 0000000000000001 000000000000001e
>> 0000000000000000 ffffffff8124c2be 0000000000000002 ffffffff8124c2be
>> Call Trace:
>> [<ffffffff81106eac>] __alloc_pages_nodemask+0x14c/0x8f0
>> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>> [<ffffffff8113e5ef>] alloc_pages_current+0xaf/0x120
>> [<ffffffff810a0da0>] kimage_alloc_pages+0x10/0x60
>> [<ffffffff810a15ad>] kimage_alloc_control_pages+0x5d/0x270
>> [<ffffffff81027e85>] machine_kexec_prepare+0xe5/0x6c0
>> [<ffffffff810a0d52>] ? kimage_free_page_list+0x52/0x70
>> [<ffffffff810a1921>] sys_kexec_load+0x141/0x600
>> [<ffffffff8115e6b0>] ? vfs_write+0x100/0x180
>> [<ffffffff8145fbd9>] system_call_fastpath+0x16/0x1b
>>
>> The patch just adds a condition to sanity_check_segment_list to
>> restrict the segment size.
>>
>> Signed-off-by: zhong jiang <zhongjiang at huawei.com>
>> ---
>>  arch/x86/include/asm/kexec.h |  1 +
>>  kernel/kexec_core.c          | 12 ++++++++++++
>>  2 files changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index d2434c1..b31a723 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -67,6 +67,7 @@ struct kimage;
>>  /* Memory to backup during crash kdump */
>>  #define KEXEC_BACKUP_SRC_START	(0UL)
>>  #define KEXEC_BACKUP_SRC_END	(640 * 1024UL)	/* 640K */
>> +#define KEXEC_MAX_SEGMENT_SIZE (5 * 1024 * 1024UL) /* 5M */
>>
>>  /*
>>   * CPU does not save ss and sp on stack if execution is already
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 448127d..35c5159 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -209,6 +209,18 @@ int sanity_check_segment_list(struct kimage *image)
>>  			return result;
>>  	}
>>
>> +
>> +	/* Verity all segment size donnot exceed the specified size.
>> +	 * if segment size from user space is too large, a large
>> +	 * amount of time will be wasted when allocating page. so,
>> +	 * softlockup may be come up.
>> +	 */
>> +	for (i = 0; i< nr_segments; i++) {
>> +		if (image->segment[i].memsz > KEXEC_MAX_SEGMENT_SIZE)
>> +			return result;
>> +	}
>> +
>> +
>>  	/*
>>  	 * Verify we have good destination addresses.  Normally
>>  	 * the caller is responsible for making certain we don't
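
For discussion, here is a rough userspace sketch of what a limit tied to memory size, as Eric suggests, might look like in place of a fixed 5M cap. It is not kernel code and not a proposed patch. The names segment_model, segments_fit_half_of_ram and total_ram are hypothetical, and applying the 50% bound to the sum of all segments (rather than to each segment separately) is my assumption about what "about 50% of memory" should mean.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the memsz field of a kexec segment. */
struct segment_model {
	uint64_t memsz;	/* size of the segment in bytes */
};

/*
 * Return true if the segments together need no more than half of
 * total_ram. The 50% figure comes from the need to hold both the
 * source and the destination pages in memory at the same time.
 * Summing over all segments is an assumption made for this sketch.
 */
static bool segments_fit_half_of_ram(const struct segment_model *segs,
				     size_t nr_segments, uint64_t total_ram)
{
	uint64_t limit = total_ram / 2;
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < nr_segments; i++) {
		/* sum <= limit always holds here, so this cannot wrap. */
		if (segs[i].memsz > limit - sum)
			return false;
		sum += segs[i].memsz;
	}
	return true;
}
```

With a check of this shape, a 1G segment on a 32G machine is accepted while a 17G segment is rejected. Note that the 1G/32G case is the one where I still saw the soft lockup above, so a size limit of this kind would not by itself prevent that case.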