On 2021-04-22 17:33, HAGIO KAZUHITO(萩尾 一仁) wrote:
> -----Original Message-----
>> -----Original Message-----
>>> On 2021-01-12 16:24, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>>> Hi Bhupesh,
>>>>
>>>> -----Original Message-----
>>>>> We have hard-coded the HZ value for some ARCHs to either 1000 or 100
>>>>> (mainly for kernel versions > 2.6.0), which causes 'help -m' to show
>>>>> an incorrect hz value for various architectures.
>>>>
>>>> Good catch, but it seems crash uses (cfq_slice_async * 25) for machdep->hz
>>>> if it exists (please see task_init()). RHEL7 has it, but RHEL8 does not.
>>>> What do you see on RHEL8 for x86_64 with your patch?
>>>>
>>>
>>> The symbol 'cfq_slice_async' has been removed from the upstream kernel:
>>> f382fb0bcef4 ("block: remove legacy IO schedulers")
>>>
>>> And RHEL8 also removed it.
>>>
>>>> We should search for an alternate way like the current one first.
>>>>
>>>
>>> Currently, there are several ways to get the value of HZ, as below:
>>>
>>> [1] calculate hz via the symbol 'cfq_slice_async'
>>>     But this symbol has been removed from the upstream kernel
>>
>> According to [0] below, 'cfq_slice_async' cannot be used for the HZ
>> calculation on 4.8 and later kernels. I've not found a perfect alternative,
>> but how about using 'bfq_timeout' for 4.12 and later, including RHEL8?
>
> e.g. like this:
>
> --- a/task.c
> +++ b/task.c
> @@ -417,7 +417,16 @@ task_init(void)
>
>  	STRUCT_SIZE_INIT(cputime_t, "cputime_t");
>
> -	if (symbol_exists("cfq_slice_async")) {
> +	if (symbol_exists("bfq_timeout")) {
> +		uint bfq_timeout;
> +		get_symbol_data("bfq_timeout", sizeof(int), &bfq_timeout);
> +		if (bfq_timeout) {
> +			machdep->hz = bfq_timeout * 8;
> +			if (CRASHDEBUG(2))
> +				fprintf(fp, "bfq_timeout exists: setting hz to %d\n",
> +					machdep->hz);
> +		}
> +	} else if (symbol_exists("cfq_slice_async")) {
>  		uint cfq_slice_async;
>
>  		get_symbol_data("cfq_slice_async", sizeof(int),
>
>
> Lianbo, could you try this on ppc64le if it looks good?
>
Sure.
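A minimal standalone sketch of the round trip in the bfq_timeout patch quoted above (the function names are hypothetical and this is not crash code): the kernel value is computed as HZ / 8 with C integer division, and the patch multiplies it back by 8, so the recovered value is exact only when HZ is a multiple of 8.

```c
#include <stdio.h>

/*
 * Kernel side, as quoted in this thread:
 *     const int bfq_timeout = HZ / 8;
 * C integer division truncates, so any remainder of HZ / 8 is lost.
 */
int kernel_bfq_timeout(int hz)
{
	return hz / 8;
}

/* crash side, as in the proposed task_init() patch: machdep->hz = bfq_timeout * 8; */
int hz_from_bfq_timeout(int bfq_timeout)
{
	return bfq_timeout * 8;
}

/* Print the round trip HZ -> bfq_timeout -> recovered HZ for one HZ value. */
void show_bfq_round_trip(int hz)
{
	int timeout = kernel_bfq_timeout(hz);
	int recovered = hz_from_bfq_timeout(timeout);

	printf("HZ=%4d  bfq_timeout=%3d  recovered=%4d%s\n", hz, timeout,
	       recovered, recovered == hz ? "" : "  (mismatch)");
}
```

Calling show_bfq_round_trip() from a small driver for HZ=100 reports a recovered value of 96, while HZ=1000 (a multiple of 8) round-trips exactly.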
On my ppc64le machine, crash reported hz as 96 after applying the above patch. The reason is that the kernel calculates bfq_timeout as:

bfq_timeout = HZ / 8;

The actual value of HZ is 100, so bfq_timeout = 100 / 8 = 12 (integer division), and crash then computes:

HZ = bfq_timeout * 8 = 12 * 8 = 96

That is not the result we expected.

> btw, I thought 'read_expire' was better than the 'bfq_timeout' because it
> was introduced at 2.6.16 and has been unchanged, but most of the kernels (vmlinux)

Sounds good. Unfortunately, 'read_expire' is a static variable in the kernel, so we cannot get it directly by a symbol search. Maybe we should find another way to look up static kernel variables.

If that is possible, I would prefer to use 'write_expire' to calculate HZ in crash, as below. That avoids the rounding issue above and gives the correct result, because write_expire is an exact multiple of HZ:

HZ = write_expire / 5;

/*
 * source: block/mq-deadline.c
 */
static const int write_expire = 5 * HZ;

For example:

+	if (symbol_exists("write_expire")) {	<---- This check fails here; maybe we can find the symbol in another way.
+		uint write_expire;
+		get_symbol_data("write_expire", sizeof(int), &write_expire);
+		if (write_expire) {
+			machdep->hz = write_expire / 5;
+			if (CRASHDEBUG(2))
+				fprintf(fp, "write_expire exists: setting hz to %d\n",
+					machdep->hz);
+		}
+	} else

> that I have do not have a symbol for it. (some optimization?)
>

I can get the values of 'read_expire' and 'write_expire' with the p command on the latest RHEL8 and later:

crash> p read_expire
$1 = 50
crash> p write_expire
$2 = 500

Thanks.
Lianbo

> static const int read_expire = HZ / 2;  /* max time before a read is submitted. */
>
> RELEASE: 4.18.0-80.el8.x86_64
>
> crash> p read_expire
> No symbol "read_expire" in current context.
> p: gdb request failed: p read_expire
>
> Thanks,
> Kazu
>
>>
>> const int bfq_timeout = HZ / 8;
>>
>> RELEASE: 4.18.0-80.el8.x86_64
>>
>> crash> p bfq_timeout
>> bfq_timeout = $1 = 125
>>
>> This value has not been changed since its introduction (aee69d78dec0).
>> Recent kernels configured with CONFIG_IOSCHED_BFQ=y can be covered with this?
>>
>> [0] https://listman.redhat.com/archives/crash-utility/2021-April/msg00026.html
>>
>> Thanks,
>> Kazu
>>
>>
>>>
>>> [2] hardcode hz with the value 1000 (if kernel version > 2.6.0)
>>>
>>> [3] get the hz value from the vmcore, but that relies on kernel config
>>>     such as CONFIG_IKCONFIG, etc.
>>>
>>> [4] use sysconf(_SC_CLK_TCK) on some arches, but not all arches.
>>>     See the macro definition of HZ in defs.h
>>>
>>> There seems to be no perfect solution. Any ideas?
>>>
>>>
>>> Thanks.
>>> Lianbo
>>>
>>>> Thanks,
>>>> Kazu
>>>>
>>>>>
>>>>> I tested this on ppc64le and x86_64 and the hz value reported is 1000,
>>>>> whereas the kernel CONFIG_HZ_100 is set to Y. See some logs below:
>>>>>
>>>>> crash> help -m
>>>>>               flags: 124000f5
>>>>>   (KSYMS_START|MACHDEP_BT_TEXT|VM_4_LEVEL|VMEMMAP|VMEMMAP_AWARE|PHYS_ENTRY_L4|SWAP_ENTRY_L4|RADIX_MMU|OPAL_FW)
>>>>>              kvbase: c000000000000000
>>>>>   identity_map_base: c000000000000000
>>>>>            pagesize: 65536
>>>>>           pageshift: 16
>>>>>            pagemask: ffffffffffff0000
>>>>>          pageoffset: ffff
>>>>>           stacksize: 16384
>>>>>                  hz: 1000
>>>>>                 mhz: 2800
>>>>>
>>>>> [host@rhel7]$ grep CONFIG_HZ_100= redhat/configs/kernel-3.10.0-ppc64le.config
>>>>> CONFIG_HZ_100=y
>>>>>
>>>>> Fix the same by using the sysconf(_SC_CLK_TCK) value instead of the
>>>>> hardcoded HZ values depending on kernel versions.
>>>>>
>>>>
>>
>>
>> --
>> Crash-utility mailing list
>> Crash-utility@xxxxxxxxxx
>> https://listman.redhat.com/mailman/listinfo/crash-utility
>
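A minimal standalone sketch of recovering HZ from the mq-deadline values discussed in this thread, once they have been read from the dump (for example with p, as shown above). The helper names and the cross-check policy are hypothetical, not crash code. Since write_expire = 5 * HZ, dividing by 5 is exact for any HZ; read_expire = HZ / 2 is exact only for even HZ, so it is used here only to confirm the write_expire result. This does not solve the lookup problem for static symbols.

```c
/*
 * Kernel side, as quoted in this thread:
 *     static const int read_expire = HZ / 2;
 *     static const int write_expire = 5 * HZ;
 */

/* Recover HZ from write_expire: 5 * HZ divides by 5 exactly for any HZ. */
int hz_from_write_expire(int write_expire)
{
	return write_expire / 5;
}

/* Recover HZ from read_expire: exact only when HZ is even. */
int hz_from_read_expire(int read_expire)
{
	return read_expire * 2;
}

/*
 * Hypothetical cross-check: accept the write_expire-derived HZ only if
 * read_expire agrees with it. Returns 0 when either value is missing or
 * the two disagree, so a caller could fall back to another method.
 */
int hz_cross_checked(int read_expire, int write_expire)
{
	int hz;

	if (read_expire <= 0 || write_expire <= 0)
		return 0;
	hz = hz_from_write_expire(write_expire);
	if (hz_from_read_expire(read_expire) != hz)
		return 0;
	return hz;
}
```

With the values shown in this thread (read_expire = 50, write_expire = 500), hz_cross_checked() returns 100, which matches CONFIG_HZ_100=y.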