在 2021年04月23日 14:41, lijiang 写道: > 在 2021年04月22日 22:26, lijiang 写道: >> 在 2021年04月22日 17:33, HAGIO KAZUHITO(萩尾 一仁) 写道: >>> -----Original Message----- >>>> -----Original Message----- >>>>> 在 2021年01月12日 16:24, HAGIO KAZUHITO(萩尾 一仁) 写道: >>>>>> Hi Bhupesh, >>>>>> >>>>>> -----Original Message----- >>>>>>> We have hard-coded the HZ value for some ARCHs to either 1000 or 100 >>>>>>> (mainly for kernel versions > 2.6.0), which causes 'help -m' to show >>>>>>> an incorrect hz value for various architectures. >>>>>> >>>>>> Good catch. but seems crash uses (cfq_slice_async * 25) for machdep->hz >>>>>> if it exists (please see task_init()), RHEL7 has it, but RHEL8 does not. >>>>>> What do you see on RHEL8 for x86_64 with your patch? >>>>>> >>>>> >>>>> The symbol 'cfq_slice_async' has been removed from upstream kernel: >>>>> f382fb0bcef4 ("block: remove legacy IO schedulers") >>>>> >>>>> And RHEL8 also removed it. >>>>> >>>>>> We should search for an alternate way like the current one first. >>>>>> >>>>> >>>>> Currently, there are several ways to get the value of HZ as below: >>>>> >>>>> [1] calculate hz via the symbol 'cfq_slice_async' >>>>> But this symbol has been removed from upstream kernel >>>> >>>> According to [0] below, the 'cfq_slice_async' cannot be used for the HZ >>>> calculation on 4.8 and later kernels. I've not found a perfect alternate, >>>> but how about using 'bfq_timeout' for 4.12 and later including RHEL8? >>> >>> e.g. like this: >>> >>> --- a/task.c >>> +++ b/task.c >>> @@ -417,7 +417,16 @@ task_init(void) >>> >>> STRUCT_SIZE_INIT(cputime_t, "cputime_t"); >>> >>> - if (symbol_exists("cfq_slice_async")) { >>> + if (symbol_exists("bfq_timeout")) { >>> + uint bfq_timeout; >>> + get_symbol_data("bfq_timeout", sizeof(int), &bfq_timeout); >>> + if (bfq_timeout) { >>> + machdep->hz = bfq_timeout * 8; >>> + if (CRASHDEBUG(2)) >>> + fprintf(fp, "bfq_timeout exists: setting hz to %d\n", >>> + machdep->hz); >>> + } >>> + } else if (symbol_exists("cfq_slice_async")) { >>> uint cfq_slice_async; >>> >>> get_symbol_data("cfq_slice_async", sizeof(int), >>> >>> >>> Lianbo, could you try this on ppc64le if it looks good? >>> >> Sure. On my ppc64le machine, crash got 96hz after applying the above patch. The reason >> is that kernel calculates the value of bfq_timeout as below: >> >> bfq_timeout = HZ / 8; >> >> The actual value of HZ is 100, so bfq_timeout = 100 / 8 = 12, but in crash, we calculate >> the value of HZ: >> >> HZ = bfq_timeout * 8 = 12 * 8 = 96 >> >> It seems that this is not the result what we expected. >> >>> btw, I thought 'read_expire' was better than the 'bfq_timeout' because it >>> was introduced at 2.6.16 and has been unchanged, but most of kernels(vmlinux) >> >> Sounds good. But unfortunately, the 'read_expire' is a static variable in kernel, we >> can not get it directly by the symbol search. Maybe we should try to find a static >> variable(kernel) in another ways. >> >> If it is possible, I would tend to use the 'write_expire' to calculate the value of HZ >> in crash as below, that can avoid the above issues and get a correct result. >> >> HZ = write_expire / 5; >> >> /* >> * source: block/mq-deadline.c >> */ >> static const int write_expire = 5 * HZ >> >> For example: >> + if (symbol_exists("write_expire")) { ----> Here, it failed, maybe we can try to find the symbol in another way. >> + uint write_expire; >> + get_symbol_data("write_expire", sizeof(int), &write_expire); >> + if (write_expire) { >> + machdep->hz = write_expire / 5; >> + if (CRASHDEBUG(2)) >> + fprintf(fp, "write_expire exists: setting hz to %d\n", >> + machdep->hz); >> + } >> + } else >> >>> that I have do not have a symbol for it. (some optimization?) >>> >> I can get the values of 'read_expire' and 'write_expire' in the latest rhel8 or later. >> >> crash> p read_expire >> $1 = 50 >> crash> p write_expire >> $2 = 500 >> >> Thanks. >> Linabo >> > > How do you think about the following changes? It works for me. > > /* > * source: net/ipv4/inetpeer.c > * int inet_peer_minttl __read_mostly = 120 * HZ; /* TTL under high load: 120 sec */ > */ > > diff --git a/task.c b/task.c > index 423cd45..4af3ef3 100644 > --- a/task.c > +++ b/task.c > @@ -417,7 +417,17 @@ task_init(void) > > STRUCT_SIZE_INIT(cputime_t, "cputime_t"); > > - if (symbol_exists("cfq_slice_async")) { > + if (symbol_exists("inet_peer_minttl")) { > + uint inet_peer_minttl; > + get_symbol_data("inet_peer_minttl", sizeof(int), &inet_peer_minttl); > + if (inet_peer_minttl) { > + machdep->hz = inet_peer_minttl / 120; > + if (CRASHDEBUG(2)) > + fprintf(fp, "inet_peer_minttl exists: setting hz to %d\n", > + machdep->hz); > + } > + } else if (symbol_exists("cfq_slice_async")) { > uint cfq_slice_async; > And, I would tend to replace the 'cfq_slice_async' with the 'inet_peer_minttl' as below, the reason is that it has hardly changed so far(v2.6.12-rc2), and the variable is in the net/ipv4/inetpeer.c module, which is supported by most kernel configuration. What's your opinion? diff --git a/task.c b/task.c index 423cd454502b..5994fe2b7351 100644 --- a/task.c +++ b/task.c @@ -417,18 +417,18 @@ task_init(void) STRUCT_SIZE_INIT(cputime_t, "cputime_t"); - if (symbol_exists("cfq_slice_async")) { - uint cfq_slice_async; + if (symbol_exists("inet_peer_minttl")) { + int inet_peer_minttl; - get_symbol_data("cfq_slice_async", sizeof(int), - &cfq_slice_async); + get_symbol_data("inet_peer_minttl", sizeof(int), + &inet_peer_minttl); - if (cfq_slice_async) { - machdep->hz = cfq_slice_async * 25; + if (inet_peer_minttl) { + machdep->hz = inet_peer_minttl / 120; if (CRASHDEBUG(2)) fprintf(fp, - "cfq_slice_async exists: setting hz to %d\n", + "inet_peer_minttl exists: setting hz to %d\n", machdep->hz); } } -- Thanks. > Thanks. > Lianbo > >>> static const int read_expire = HZ / 2; /* max time before a read is submitted. */ >>> >>> RELEASE: 4.18.0-80.el8.x86_64 >>> >>> crash> p read_expire >>> No symbol "read_expire" in current context. >>> p: gdb request failed: p read_expire >>> >>> Thanks, >>> Kazu >>> >>>> >>>> const int bfq_timeout = HZ / 8; >>>> >>>> RELEASE: 4.18.0-80.el8.x86_64 >>>> >>>> crash> p bfq_timeout >>>> bfq_timeout = $1 = 125 >>>> >>>> This value has not been changed since its introduction (aee69d78dec0). >>>> Recent kernels configured with CONFIG_IOSCHED_BFQ=y can be covered with this? >>>> >>>> [0] https://listman.redhat.com/archives/crash-utility/2021-April/msg00026.html >>>> >>>> Thanks, >>>> Kazu >>>> >>>> >>>>> >>>>> [2] hardcode hz with the value 1000 (if kernel version > 2.6.0) >>>>> >>>>> [3] get the hz value from vmcore, but that relies on kernel config >>>>> such as CONFIG_IKCONFIG, etc. >>>>> >>>>> [4] Use sysconf(_SC_CLK_TCK) on some arches, not all arches. >>>>> See the micro definition of HZ in the defs.h >>>>> >>>>> There seems to be no perfect solution. Any ideas? >>>>> >>>>> >>>>> Thanks. >>>>> Lianbo >>>>> >>>>>> Thanks, >>>>>> Kazu >>>>>> >>>>>>> >>>>>>> I tested this on ppc64le and x86_64 and the hz value reported is 1000, >>>>>>> whereas the kernel CONFIG_HZ_100 is set to Y. See some logs below: >>>>>>> >>>>>>> crash> help -m >>>>>>> flags: 124000f5 >>>>>>> >>>>> >>>> (KSYMS_START|MACHDEP_BT_TEXT|VM_4_LEVEL|VMEMMAP|VMEMMAP_AWARE|PHYS_ENTRY_L4|SWAP_ENTRY_L4|RADIX_MMU|OP >>>>>>> AL_FW) >>>>>>> kvbase: c000000000000000 >>>>>>> identity_map_base: c000000000000000 >>>>>>> pagesize: 65536 >>>>>>> pageshift: 16 >>>>>>> pagemask: ffffffffffff0000 >>>>>>> pageoffset: ffff >>>>>>> stacksize: 16384 >>>>>>> hz: 1000 >>>>>>> mhz: 2800 >>>>>>> >>>>>>> [host@rhel7]$ grep CONFIG_HZ_100= redhat/configs/kernel-3.10.0-ppc64le.config >>>>>>> CONFIG_HZ_100=y >>>>>>> >>>>>>> Fix the same by using the sysconf(_SC_CLK_TCK) value instead of the >>>>>>> hardcoded HZ values depending on kernel versions. >>>>>>> >>>>>> >>>> >>>> >>>> -- >>>> Crash-utility mailing list >>>> Crash-utility@xxxxxxxxxx >>>> https://listman.redhat.com/mailman/listinfo/crash-utility >>> -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/crash-utility