On 2021-04-22 22:26, lijiang wrote:
> On 2021-04-22 17:33, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> -----Original Message-----
>>> -----Original Message-----
>>>> On 2021-01-12 16:24, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>>>> Hi Bhupesh,
>>>>>
>>>>> -----Original Message-----
>>>>>> We have hard-coded the HZ value for some ARCHs to either 1000 or 100
>>>>>> (mainly for kernel versions > 2.6.0), which causes 'help -m' to show
>>>>>> an incorrect hz value for various architectures.
>>>>>
>>>>> Good catch, but it seems crash uses (cfq_slice_async * 25) for machdep->hz
>>>>> if it exists (please see task_init()). RHEL7 has it, but RHEL8 does not.
>>>>> What do you see on RHEL8 for x86_64 with your patch?
>>>>>
>>>>
>>>> The symbol 'cfq_slice_async' has been removed from the upstream kernel:
>>>> f382fb0bcef4 ("block: remove legacy IO schedulers")
>>>>
>>>> And RHEL8 also removed it.
>>>>
>>>>> We should search for an alternate way like the current one first.
>>>>>
>>>>
>>>> Currently, there are several ways to get the value of HZ, as below:
>>>>
>>>> [1] calculate hz via the symbol 'cfq_slice_async'
>>>>     But this symbol has been removed from the upstream kernel
>>>
>>> According to [0] below, the 'cfq_slice_async' cannot be used for the HZ
>>> calculation on 4.8 and later kernels. I've not found a perfect alternative,
>>> but how about using 'bfq_timeout' for 4.12 and later including RHEL8?
>>
>> e.g.
>> like this:
>>
>> --- a/task.c
>> +++ b/task.c
>> @@ -417,7 +417,16 @@ task_init(void)
>>
>>  	STRUCT_SIZE_INIT(cputime_t, "cputime_t");
>>
>> -	if (symbol_exists("cfq_slice_async")) {
>> +	if (symbol_exists("bfq_timeout")) {
>> +		uint bfq_timeout;
>> +		get_symbol_data("bfq_timeout", sizeof(int), &bfq_timeout);
>> +		if (bfq_timeout) {
>> +			machdep->hz = bfq_timeout * 8;
>> +			if (CRASHDEBUG(2))
>> +				fprintf(fp, "bfq_timeout exists: setting hz to %d\n",
>> +					machdep->hz);
>> +		}
>> +	} else if (symbol_exists("cfq_slice_async")) {
>>  		uint cfq_slice_async;
>>
>>  		get_symbol_data("cfq_slice_async", sizeof(int),
>>
>>
>> Lianbo, could you try this on ppc64le if it looks good?
>>
> Sure. On my ppc64le machine, crash got 96 for hz after applying the above patch. The reason
> is that the kernel calculates the value of bfq_timeout as below:
>
>   bfq_timeout = HZ / 8;
>
> The actual value of HZ is 100, so bfq_timeout = 100 / 8 = 12 (integer division), but in
> crash, we calculate the value of HZ as:
>
>   HZ = bfq_timeout * 8 = 12 * 8 = 96
>
> This is not the result we expected.
>
>> btw, I thought 'read_expire' was better than the 'bfq_timeout' because it
>> was introduced at 2.6.16 and has been unchanged, but most of the kernels (vmlinux)
>
> Sounds good. But unfortunately, 'read_expire' is a static variable in the kernel, so we
> cannot get it directly by the symbol search. Maybe we should try to find a static
> kernel variable in another way.
>
> If it is possible, I would tend to use 'write_expire' to calculate the value of HZ
> in crash as below, which avoids the above issue and gives a correct result.
>
>   HZ = write_expire / 5;
>
>   /*
>    * source: block/mq-deadline.c
>    */
>   static const int write_expire = 5 * HZ;
>
> For example:
> +	if (symbol_exists("write_expire")) {  ----> Here, it failed, maybe we can try to find the symbol in another way.
> +		uint write_expire;
> +		get_symbol_data("write_expire", sizeof(int), &write_expire);
> +		if (write_expire) {
> +			machdep->hz = write_expire / 5;
> +			if (CRASHDEBUG(2))
> +				fprintf(fp, "write_expire exists: setting hz to %d\n",
> +					machdep->hz);
> +		}
> +	} else
>
>> that I have do not have a symbol for it. (some optimization?)
>>
> I can get the values of 'read_expire' and 'write_expire' in the latest rhel8 or later.
>
> crash> p read_expire
> $1 = 50
> crash> p write_expire
> $2 = 500
>
> Thanks.
> Lianbo
>

What do you think about the following changes? It works for me.

/*
 * source: net/ipv4/inetpeer.c
 * int inet_peer_minttl __read_mostly = 120 * HZ; /* TTL under high load: 120 sec */
 */

diff --git a/task.c b/task.c
index 423cd45..4af3ef3 100644
--- a/task.c
+++ b/task.c
@@ -417,7 +417,17 @@ task_init(void)

 	STRUCT_SIZE_INIT(cputime_t, "cputime_t");

-	if (symbol_exists("cfq_slice_async")) {
+	if (symbol_exists("inet_peer_minttl")) {
+		uint inet_peer_minttl;
+		get_symbol_data("inet_peer_minttl", sizeof(int), &inet_peer_minttl);
+		if (inet_peer_minttl) {
+			machdep->hz = inet_peer_minttl / 120;
+			if (CRASHDEBUG(2))
+				fprintf(fp, "inet_peer_minttl exists: setting hz to %d\n",
+					machdep->hz);
+		}
+	} else if (symbol_exists("cfq_slice_async")) {
 		uint cfq_slice_async;

Thanks.
Lianbo

>>   static const int read_expire = HZ / 2;  /* max time before a read is submitted. */
>>
>>   RELEASE: 4.18.0-80.el8.x86_64
>>
>>   crash> p read_expire
>>   No symbol "read_expire" in current context.
>>   p: gdb request failed: p read_expire
>>
>> Thanks,
>> Kazu
>>
>>>
>>>   const int bfq_timeout = HZ / 8;
>>>
>>>   RELEASE: 4.18.0-80.el8.x86_64
>>>
>>>   crash> p bfq_timeout
>>>   bfq_timeout = $1 = 125
>>>
>>> This value has not been changed since its introduction (aee69d78dec0).
>>> Recent kernels configured with CONFIG_IOSCHED_BFQ=y can be covered with this?
>>>
>>> [0] https://listman.redhat.com/archives/crash-utility/2021-April/msg00026.html
>>>
>>> Thanks,
>>> Kazu
>>>
>>>
>>>>
>>>> [2] hardcode hz with the value 1000 (if kernel version > 2.6.0)
>>>>
>>>> [3] get the hz value from vmcore, but that relies on kernel config
>>>>     such as CONFIG_IKCONFIG, etc.
>>>>
>>>> [4] Use sysconf(_SC_CLK_TCK) on some arches, not all arches.
>>>>     See the macro definition of HZ in defs.h
>>>>
>>>> There seems to be no perfect solution. Any ideas?
>>>>
>>>>
>>>> Thanks.
>>>> Lianbo
>>>>
>>>>> Thanks,
>>>>> Kazu
>>>>>
>>>>>>
>>>>>> I tested this on ppc64le and x86_64 and the hz value reported is 1000,
>>>>>> whereas the kernel CONFIG_HZ_100 is set to Y. See some logs below:
>>>>>>
>>>>>> crash> help -m
>>>>>>               flags: 124000f5 (KSYMS_START|MACHDEP_BT_TEXT|VM_4_LEVEL|VMEMMAP|VMEMMAP_AWARE|PHYS_ENTRY_L4|SWAP_ENTRY_L4|RADIX_MMU|OPAL_FW)
>>>>>>              kvbase: c000000000000000
>>>>>>   identity_map_base: c000000000000000
>>>>>>            pagesize: 65536
>>>>>>           pageshift: 16
>>>>>>            pagemask: ffffffffffff0000
>>>>>>          pageoffset: ffff
>>>>>>           stacksize: 16384
>>>>>>                  hz: 1000
>>>>>>                 mhz: 2800
>>>>>>
>>>>>> [host@rhel7]$ grep CONFIG_HZ_100= redhat/configs/kernel-3.10.0-ppc64le.config
>>>>>> CONFIG_HZ_100=y
>>>>>>
>>>>>> Fix the same by using the sysconf(_SC_CLK_TCK) value instead of the
>>>>>> hardcoded HZ values depending on kernel versions.
>>>>>>
>>>>>
>>>
>>>
>>> --
>>> Crash-utility mailing list
>>> Crash-utility@xxxxxxxxxx
>>> https://listman.redhat.com/mailman/listinfo/crash-utility
>>
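
For reference, the rounding problem reported in this thread (HZ=100 coming back as 96 via bfq_timeout) can be reproduced outside of crash with a small sketch. It only models the arithmetic quoted in the thread: the kernel stores HZ / 8 in bfq_timeout and 120 * HZ in inet_peer_minttl, and crash inverts each. All function names below are hypothetical and are not part of crash or the kernel. The sketch also assumes inet_peer_minttl still holds its compiled-in default; as far as I know it can be changed at runtime through sysctl, in which case dividing by 120 would no longer give HZ.

```c
#include <assert.h>
#include <stdio.h>

/* Kernel side (quoted in the thread): const int bfq_timeout = HZ / 8; */
unsigned int kernel_bfq_timeout(unsigned int hz)
{
	return hz / 8;		/* integer division drops the remainder */
}

/* Kernel side (quoted in the thread): int inet_peer_minttl = 120 * HZ; */
unsigned int kernel_inet_peer_minttl(unsigned int hz)
{
	return 120 * hz;	/* an exact multiple of HZ */
}

/* crash side, mirroring the bfq_timeout patch: hz = bfq_timeout * 8 */
unsigned int hz_from_bfq_timeout(unsigned int bfq_timeout)
{
	return bfq_timeout * 8;
}

/* crash side, mirroring the inet_peer_minttl patch: hz = minttl / 120 */
unsigned int hz_from_inet_peer_minttl(unsigned int minttl)
{
	return minttl / 120;
}

/* Hypothetical helper: show what each method recovers for a given HZ. */
void print_hz_roundtrip(unsigned int hz)
{
	unsigned int via_bfq = hz_from_bfq_timeout(kernel_bfq_timeout(hz));
	unsigned int via_ttl = hz_from_inet_peer_minttl(kernel_inet_peer_minttl(hz));

	/* Multiplying by HZ and dividing back loses nothing (absent overflow). */
	assert(via_ttl == hz);
	printf("HZ=%u: via bfq_timeout=%u, via inet_peer_minttl=%u\n",
	       hz, via_bfq, via_ttl);
}
```

Calling print_hz_roundtrip() with 100 and then 1000 shows the difference: the bfq_timeout route is exact only when HZ is a multiple of 8, while the inet_peer_minttl route recovers HZ for any value, subject to the default-value assumption above.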