On 4/15/22 02:37, Yanjun Zhu wrote:
>
> On 2022/4/15 15:29, Bob Pearson wrote:
>> On 4/15/22 02:12, Yanjun Zhu wrote:
>>> On 2022/4/10 5:43, Bob Pearson wrote:
>>>> On 4/9/22 00:04, Christoph Hellwig wrote:
>>>>> On Fri, Apr 08, 2022 at 04:25:12PM -0700, Bart Van Assche wrote:
>>>>>> One of the functions in the above call stack is sd_remove(). sd_remove()
>>>>>> calls del_gendisk() just before calling sd_shutdown(). sd_shutdown()
>>>>>> submits the SYNCHRONIZE CACHE command. In del_gendisk() I found the
>>>>>> following comment: "Fail any new I/O". Do you agree that failing new I/O
>>>>>> before sd_shutdown() is called is wrong? Is there any other way to fix this
>>>>>> than moving the blk_queue_start_drain() etc. calls out of del_gendisk() and
>>>>>> into a new function?
>>>>> That SYNCHRONIZE CACHE is a passthrough command sent on the request_queue
>>>>> and should not be affected by stopping all file system I/O.
>>>> When I run check -q srp
>>>> all the test cases pass but each one stops for 3+ minutes at synchronize cache.
>>>> The rxe device is still active until sync cache returns when the last QP and the PD
>>>> are destroyed. It may be that the queues are blocked waiting for something else
>>>> even though they have reported success??
>>> If you remove all the xarray patches and use the original source code. This will not occur.
>>>
>>> Zhu Yanjun
>>>
>> I missed one other point. The 3 minute delay is actually not a rxe bug at all but was recently
>> caused by a bad scsi patch which has since been reverted.
>
> I am not sure about this because wr NULL problem exists with xarray patches.
>
> Please let us find the root cause of wr NULL.
>
> This can make RXE more stable.
>
> Zhu Yanjun
>

You mean mr = NULL. And it is not happening in my tree. I have WARN_ONs looking for it
and it isn't happening.