Re: [PATCH blktests] nvme/039: avoid failure by error message rate limit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 4/12/23 8:56 PM, Shin'ichiro Kawasaki wrote:
On Apr 13, 2023 / 09:20, Shin'ichiro Kawasaki wrote:
On Apr 12, 2023 / 10:49, alan.adamson@xxxxxxxxxx wrote:
On 4/12/23 1:59 AM, Shin'ichiro Kawasaki wrote:
From: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>

The test case nvme/039 tests that expected error messages are printed
for errors injected to the nvme driver. However, the test case fails by
chance when previous test cases generate many error messages. In this
case, the kernel function pr_err_ratelimited() may suppress the error
messages that the test case expects. Also, it may print messages that
the test case does not expect, such as "blk_print_req_error: xxxx
callbacks suppressed".

To avoid the failure, make two improvements for the test case. Firstly,
wait DEFAULT_RATE_LIMIT seconds at the beginning of the test to ensure
the expected error messages are not suppressed. Secondly, exclude the
unexpected message for the error message check. Introduce a helper
function last_dmesg() for the second improvement.
Why are we seeing the callback messages?  By the time the test starts
generating errors (after a 5 sec delay) we should be able to log 10 messages
without any being suppressed.
That is because other test cases before nvme/039 can generate errors. For
instance, block/014 generates many errors. When I ran block/014 and nvme/039 in
sequence, I always observe nvme/039 failure even with the 5 seconds wait in
nvme/039. I suggest to excldue the "callbacks message" to avoid the nvme/038
failure regardless of the errors generated before nvme/039.
Reading back my explanation above, I found it may not be clear enough. Let me
ammend it. My understanding is as follows.

The test case block/014 generates many error messages by blk_print_req_error().
The messages are rate limited and some of them are suppressed. At that time,
__ratelimit() calls printk_deferred() to print the "callbacks suppressed"
message, which is deferred and not printed during block/014. When nvme/039
triggers an error message (after the 5 sec delay), the "callbacks suppressed"
message for blk_print_req_error() and block/014 is printed. It makes the
nvme/039 fail, since nvme/039 expects the error message it triggered, but it
finds the "callbacks suppressed" message instead. In theory, not only block/014
but also other workload with rate limited error messages can cause this nvme/039
failure.

The 5 sec delay in nvme/039 ensures that the error messages for nvme/039 are not
suppressed, but it can not avoid the "callbacks suppressed" messages that
deferred before nvme/039.

That's right.  I went through the rate limit code and that's how it works, you will get a "callback suppressed" message along with the first message nvme/039 logs.

I ran the following for a while:

while true

do

./check block/014

./check nvme/039

done


Reviewed-by: Alan Adamson <alan.adamson@xxxxxxxxxx>

Tested-by: Alan Adamson <alan.adamson@xxxxxxxxxx>


Alan





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux