Hi Martin, Thank you for the detailed report and interesting idea. The "log -T" option imitates the "dmesg -T" command, and it also looks inaccurate in nature and has a warning in its help text. # systemctl restart psacct ; date Wed Sep 22 17:30:49 JST 2021 # dmesg -T | tail -n 1 [Wed Sep 22 17:30:22 2021] Process accounting resumed # dmesg --help | grep -- -T -T, --ctime show human-readable timestamp (may be inaccurate!) Maybe at least we should add a similar warning to the help page. Personally I would prefer users to use it with the acknowledgement of that inaccuracy like dmesg -T, rather than doing a crash specific adjustment.. Thanks, Kazu -----Original Message----- > From: crash-utility-bounces@xxxxxxxxxx <crash-utility-bounces@xxxxxxxxxx> On Behalf Of Moore, Martin > (Linux ERT) > Sent: Tuesday, September 21, 2021 1:58 AM > To: Discussion list for crash utility usage, maintenance and development <crash-utility@xxxxxxxxxx> > Subject: Inaccurate timestamps in log -T output > > Hello, > > > > As the current maintainer of the PyKdump extension, I have been investigating a reported discrepancy between > the timestamps generated by crash’s log -T and PyKdump’s tslog command, which provides the same functionality > of timestamping the kernel log entries. What I have found is that *both* log -T and tslog generate some > inaccurate timestamps on vmcores with more than a small amount of uptime, but in different ways. > Specifically, log -T timestamps are accurate at boot time but become less accurate later in the log, while > tslog timestamps are just the opposite: they’re accurate around crash time but less accurate earlier in > the log. > > > > Here’s an example from a panic on a kernel 3.10.0-957.5.1 system with a moderate uptime: > > > > DATE: Tue Jul 6 09:50:58 +03 2021 > > UPTIME: 32 days, 16:29:29 > > PANIC: "BUG: unable to handle kernel " > > > > Crash correctly computes the boot time based on the date (crash time) and uptime shown above. From “help > -k”: > > > > boot_date: Thu Jun 3 17:21:29 +03 2021 > > > > log -T shows the correct timestamps of the boot messages: > > > > [Thu Jun 3 17:21:29 +03 2021] Initializing cgroup subsys cpuset > > [Thu Jun 3 17:21:29 +03 2021] Initializing cgroup subsys cpu > > [Thu Jun 3 17:21:29 +03 2021] Initializing cgroup subsys cpuacct > > > > However, the timestamps of the BUG and surrounding messages are off by 150 seconds or 2.5 minutes (09:53:28, > but the crash occurred at 09:50:58): > > > > [Tue Jul 6 09:53:28 +03 2021] BUG: unable to handle kernel > > [Tue Jul 6 09:53:28 +03 2021] NULL pointer dereference at (null) > > [Tue Jul 6 09:53:28 +03 2021] IP: [<ffffffff91cd6e66>] scsi_uninit_cmd+0x26/0x40 > > > > For comparison, tslog has the correct timestamps around the crash: > > > > 2021-07-06 09:50:58 [2824319.188837] BUG: unable to handle kernel > > 2021-07-06 09:50:58 [2824319.193440] NULL pointer dereference at (null) > > 2021-07-06 09:50:58 [2824319.197853] IP: [<ffffffff91cd6e66>] scsi_uninit_cmd+0x26/0x40 > > > > But it’s off by the same 2.5-minute interval during the boot messages (17:18:59, but it should be 17:21:29): > > > > 2021-06-03 17:18:59 [ 0.000000] Initializing cgroup subsys cpuset > > 2021-06-03 17:18:59 [ 0.000000] Initializing cgroup subsys cpu > > 2021-06-03 17:18:59 [ 0.000000] Initializing cgroup subsys cpuacct > > > > The reasons for these inaccuracies are: > > > > 1. log -T computes boot time as (crash time minus uptime), and then adds the seconds part of the time > stored in each log entry to generate the entry’s timestamp. > > > > 2. tslog computes a base time as (crash time minus seconds part of the final log entry) and then adds > the seconds part of the time stored in each log entry. > > > > 3. The key to the problem: printk() stores the log entry timestamp based on a call to local_clock(), > which provides the raw time in the local CPU, which is not adjusted by NTP. As such, inaccuracy accumulates > as the clock drifts over time; in the example above, the discrepancy has reached 150 seconds by the time > of the crash. > > > > I have seen this now in numerous vmcores from kernels 3.0 through 4.12, which is the latest version I’ve > checked. The discrepancy exists in all of them with more than a few hours of uptime and can be positive > or negative. In one vmcore it was over 13 hours after 239 days of uptime. > > > > From the PyKdump perspective, I could live with the current tslog behavior because it’s accurate for log > entries near the crash time, which are generally the most useful for crash analysis. But in an attempt > to improve it, I’m testing an experimental version of tslog that smooths out the discrepancy. It computes > the boot time the same way that crash does (crash time minus uptime) and a scale factor for the time discrepancy > equal to (uptime)/(seconds part of final log entry). Then each log entry’s timestamp is computed as (boot > time) + (seconds part of log entry)*(scale factor). This version results in correct timestamps at both > ends of the log: > > > > 2021-06-03 17:21:29 [ 0.000000] Initializing cgroup subsys cpuset > > 2021-06-03 17:21:29 [ 0.000000] Initializing cgroup subsys cpu > > 2021-06-03 17:21:29 [ 0.000000] Initializing cgroup subsys cpuacct > > : > > 2021-07-06 09:50:58 [2824319.188837] BUG: unable to handle kernel > > 2021-07-06 09:50:58 [2824319.193440] NULL pointer dereference at (null) > > 2021-07-06 09:50:58 [2824319.197853] IP: [<ffffffff91cd6e66>] scsi_uninit_cmd+0x26/0x40 > > > > This is something of a hack since there’s no guarantee that the clock will drift uniformly during the system > uptime, so timestamps in the middle of the log may still be somewhat inaccurate (but probably more accurate > than without this adjustment). > > > > Regards, > > Martin > > > > > > Explore the HPE digital support experience: https://support.hpe.com <https://support.hpe.com> > > > > Martin Moore > Linux Engineering Resolution Team > > HPE Pointnext Services > Hewlett Packard Enterprise > > Martin.Moore@xxxxxxx <mailto:Martin.Moore@xxxxxxx> > 8AM-5PM EDT (UTC-4) Monday-Friday > Manager: Becky McBride (becky.mcbride@xxxxxxx <mailto:becky.mcbride@xxxxxxx> ) > > -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/crash-utility