On Apr 13, 2023 / 09:20, Shin'ichiro Kawasaki wrote: > On Apr 12, 2023 / 10:49, alan.adamson@xxxxxxxxxx wrote: > > > > On 4/12/23 1:59 AM, Shin'ichiro Kawasaki wrote: > > > From: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> > > > > > > The test case nvme/039 tests that expected error messages are printed > > > for errors injected to the nvme driver. However, the test case fails by > > > chance when previous test cases generate many error messages. In this > > > case, the kernel function pr_err_ratelimited() may suppress the error > > > messages that the test case expects. Also, it may print messages that > > > the test case does not expect, such as "blk_print_req_error: xxxx > > > callbacks suppressed". > > > > > > To avoid the failure, make two improvements for the test case. Firstly, > > > wait DEFAULT_RATE_LIMIT seconds at the beginning of the test to ensure > > > the expected error messages are not suppressed. Secondly, exclude the > > > unexpected message for the error message check. Introduce a helper > > > function last_dmesg() for the second improvement. > > > > Why are we seeing the callback messages? By the time the test starts > > generating errors (after a 5 sec delay) we should be able to log 10 messages > > without any being suppressed. > > That is because other test cases before nvme/039 can generate errors. For > instance, block/014 generates many errors. When I ran block/014 and nvme/039 in > sequence, I always observe nvme/039 failure even with the 5 seconds wait in > nvme/039. I suggest to excldue the "callbacks message" to avoid the nvme/038 > failure regardless of the errors generated before nvme/039. Reading back my explanation above, I found it may not be clear enough. Let me ammend it. My understanding is as follows. The test case block/014 generates many error messages by blk_print_req_error(). The messages are rate limited and some of them are suppressed. At that time, __ratelimit() calls printk_deferred() to print the "callbacks suppressed" message, which is deferred and not printed during block/014. When nvme/039 triggers an error message (after the 5 sec delay), the "callbacks suppressed" message for blk_print_req_error() and block/014 is printed. It makes the nvme/039 fail, since nvme/039 expects the error message it triggered, but it finds the "callbacks suppressed" message instead. In theory, not only block/014 but also other workload with rate limited error messages can cause this nvme/039 failure. The 5 sec delay in nvme/039 ensures that the error messages for nvme/039 are not suppressed, but it can not avoid the "callbacks suppressed" messages that deferred before nvme/039.