On Thu, Jan 9, 2020 at 10:50 AM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> On Wed, Jan 08, 2020 at 09:17:38AM -0800, Bhaskar Upadhaya wrote:
> > Currently Linux register ghes_poll_func with TIMER_DEFERRABLE flag,
> > because of which it is serviced when the CPU eventually wakes up with a
> > subsequent non-deferrable timer and not at the configured polling interval.
> >
> > For polling mode, the polling interval configured by firmware should not
> > be exceeded as per ACPI_6_3 spec[refer Table 18-394], So Timer need to
> > be configured in non-deferrable mode by removing TIMER_DEFERRABLE flag.
> > With NO_HZ enabled and timer callback being configured in non-deferrable
> > mode, timer callback will get called exactly after polling interval.
> >
> > Definition of poll interval as per spec (referred ACPI 6.3):
> > "Indicates the poll interval in milliseconds OSPM should use to
> > periodically check the error source for the presence of an error
> > condition"
> >
> > We are observing an issue in our ThunderX2 platforms wherein
> > ghes_poll_func is not called within poll interval when timer is
> > configured with TIMER_DEFERRABLE flag(For NO_HZ kernel) and hence
> > we are losing the error records.
> >
> > Impact of removing TIMER_DEFFERABLE flag
> > - With NO_HZ enabled, additional timer ticks and unnecessary wakeups of
> > the cpu happens exactly after polling interval.
> >
> > - If polling interval is too small than polling function will be called
> > too frequently which may stall the cpu.
>
> If that becomes a problem, the polling interval setting should be fixed
> to filter too small values.
>
> Anyway, I went and streamlined your commit message:
>
> apei/ghes: Do not delay GHES polling
>
> Currently, the ghes_poll_func() timer callback is registered with the
> TIMER_DEFERRABLE flag. Thus, it is run when the CPU eventually wakes
> up together with a subsequent non-deferrable timer and not at the precisely
> configured polling interval.
>
> For polling mode, the polling interval configured by firmware should not
> be exceeded according to the ACPI spec 6.3, Table 18-394. The definition
> of the polling interval is:
>
> "Indicates the poll interval in milliseconds OSPM should use to
> periodically check the error source for the presence of an error
> condition."
>
> If this interval is extended due to the timer callback deferring, error
> records can get lost. Which we are observing on our ThunderX2 platforms.
>
> Therefore, remove the TIMER_DEFERRABLE flag so that the timer callback
> executes at the precise interval.
>
> and made it more readable, hopefully.
>
> Rafael, pls fixup when applying.

Done.

> With that:
>
> Acked-by: Borislav Petkov <bp@xxxxxxx>

Thanks!