On Fri, Oct 22, 2021 at 09:44:24PM +0800, Shuai Xue wrote: > When injecting an error into the platform, the OSPM executes an > EXECUTE_OPERATION action to instruct the platform to begin the injection > operation. And then, the OSPM busy waits for a while by continually > executing CHECK_BUSY_STATUS action until the platform indicates that the > operation is complete. More specifically, the platform is limited to > respond within 1 millisecond right now. This is too strict for some > platforms. > > For example, in Arm platform, when injecting a Processor Correctable error, > the OSPM will warn: > Firmware does not respond in time. > > And a message is printed on the console: > echo: write error: Input/output error > > We observe that the waiting time for DDR error injection is about 10 ms > and that for PCIe error injection is about 500 ms in Arm platform. > > In this patch, we relax the response timeout to 1 second and allow user to > pass the time out value as a argument. > > Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> > --- > Changelog v1 -> v2: > - Implemented the timeout in msleep instead of udelay. > - Link to the v1 patch: https://lkml.org/lkml/2021/10/14/1402 > --- > drivers/acpi/apei/einj.c | 16 +++++++++------- > 1 file changed, 9 insertions(+), 7 deletions(-) > > diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c > index 133156759551..e411eb30e0ee 100644 > --- a/drivers/acpi/apei/einj.c > +++ b/drivers/acpi/apei/einj.c > @@ -28,9 +28,9 @@ > #undef pr_fmt > #define pr_fmt(fmt) "EINJ: " fmt > > -#define SPIN_UNIT 100 /* 100ns */ > -/* Firmware should respond within 1 milliseconds */ > -#define FIRMWARE_TIMEOUT (1 * NSEC_PER_MSEC) > +#define SLEEP_UNIT 1 /* 1ms */ I know I pointed you to msleep() ... sorry, I was wrong. For a 1 ms sleep the recommendation is to use usleep_range() See this write-up in Documentation/timers/timers-howto.rst: - Why not msleep for (1ms - 20ms)? Explained originally here: https://lore.kernel.org/r/15327.1186166232@xxxxxxx msleep(1~20) may not do what the caller intends, and will often sleep longer (~20 ms actual sleep for any value given in the 1~20ms range). In many cases this is not the desired behavior. To answer the question posed in that document on "What is a good range?" I don't think injection cares too much about precision here. Maybe go with usleep_range(1000, 5000); [with #defines for SLEEP_UNIT_MIN, SLEEP_UNIT_MAX instead of those numbers] > +/* Firmware should respond within 1 seconds */ > +#define FIRMWARE_TIMEOUT (1 * MSEC_PER_SEC) > #define ACPI5_VENDOR_BIT BIT(31) > #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \ > ACPI_EINJ_MEMORY_UNCORRECTABLE | \ > @@ -40,6 +40,8 @@ > * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action. > */ > static int acpi5; > +static int timeout_default = FIRMWARE_TIMEOUT; > +module_param(timeout_default, int, 0644); You've set the default to 1 second. Who would use this parameter? Do you anticipate systems that take even longer to inject? A user might set a shorter limit ... but I don't see why they would want to. > > struct set_error_type_with_address { > u32 type; > @@ -171,12 +173,12 @@ static int einj_get_available_error_type(u32 *type) > > static int einj_timedout(u64 *t) > { > - if ((s64)*t < SPIN_UNIT) { > + if ((s64)*t < SLEEP_UNIT) { > pr_warn(FW_WARN "Firmware does not respond in time\n"); > return 1; > } > - *t -= SPIN_UNIT; > - ndelay(SPIN_UNIT); > + *t -= SLEEP_UNIT; > + msleep(SLEEP_UNIT); > touch_nmi_watchdog(); Since we are sleeping instead of spinning, maybe we don't need to touch the nmi watchdog? > return 0; > } > @@ -403,7 +405,7 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, > u64 param3, u64 param4) > { > struct apei_exec_context ctx; > - u64 val, trigger_paddr, timeout = FIRMWARE_TIMEOUT; > + u64 val, trigger_paddr, timeout = timeout_default; > int rc; > > einj_exec_ctx_init(&ctx); > -- > 2.20.1.12.g72788fdb > -Tony