Re: blktest failures

Bob Pearson <rpearsonhpe@xxxxxxxxx> · Fri, 15 Apr 2022 02:46:30 -0500

On 4/15/22 02:37, Yanjun Zhu wrote:
> 
> On 2022/4/15 15:29, Bob Pearson wrote:
>> On 4/15/22 02:12, Yanjun Zhu wrote:
>>> On 2022/4/10 5:43, Bob Pearson wrote:
>>>> On 4/9/22 00:04, Christoph Hellwig wrote:
>>>>> On Fri, Apr 08, 2022 at 04:25:12PM -0700, Bart Van Assche wrote:
>>>>>> One of the functions in the above call stack is sd_remove(). sd_remove()
>>>>>> calls del_gendisk() just before calling sd_shutdown(). sd_shutdown()
>>>>>> submits the SYNCHRONIZE CACHE command. In del_gendisk() I found the
>>>>>> following comment: "Fail any new I/O". Do you agree that failing new I/O
>>>>>> before sd_shutdown() is called is wrong? Is there any other way to fix this
>>>>>> than moving the blk_queue_start_drain() etc. calls out of del_gendisk() and
>>>>>> into a new function?
>>>>> That SYNCHRONIZE CACHE is a passthrough command sent on the request_queue
>>>>> and should not be affected by stopping all file system I/O.
>>>> When I run check -q srp
>>>> all the test cases pass but each one stops for 3+ minutes at synchronize cache.
>>>> The rxe device is still active until sync cache returns when the last QP and the PD
>>>> are destroyed. It may be that the queues are blocked waiting for something else
>>>> even though they have reported success??
>>> If you remove all the xarray patches and use the original source code, this will not occur.
>>>
>>> Zhu Yanjun
>>>
>> I missed one other point. The 3 minute delay is actually not a rxe bug at all but was recently
>> caused by a bad scsi patch which has since been reverted.
> 
> I am not sure about this because wr NULL problem exists with xarray patches.
> 
> Please let us find the root cause of wr NULL.
> 
> This can make RXE more stable.
> 
> Zhu Yanjun
> 

You mean mr = NULL. And it is not happening in my tree. I have WARN_ONs looking for it
and it isn't happening.
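
For illustration only, the kind of check meant here can be sketched in userspace C.
This is not the rxe code: WARN_ON_SKETCH(), struct mr_sketch and use_mr() are
hypothetical stand-ins, and the macro only imitates the kernel's WARN_ON() in that it
reports the condition and evaluates to it, so the caller can bail out.

```c
#include <stdio.h>
#include <stddef.h>

/* Counts how many times the sketch warning has fired. */
static int warn_count;

/* Hypothetical userspace stand-in for the kernel's WARN_ON(): logs when
 * the condition is true and returns the (normalized) condition. */
static int warn_on_sketch(int cond, const char *expr,
			  const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

#define WARN_ON_SKETCH(cond) \
	warn_on_sketch(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical placeholder for a memory region object. */
struct mr_sketch {
	int lkey;
};

/* Returns 0 when mr is valid, -1 (after warning) when mr is NULL. */
static int use_mr(struct mr_sketch *mr)
{
	if (WARN_ON_SKETCH(mr == NULL))
		return -1;
	return 0;
}
```

With a check like this on every path that dereferences the mr, a NULL mr would show
up as a warning in the log instead of going unnoticed.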