On 2018/2/27 22:57, Bart Van Assche wrote:
On Tue, 2018-02-27 at 15:09 +0800, chenxiang (M) wrote:
On 2018/2/26 23:25, Bart Van Assche wrote:
On Mon, 2018-02-26 at 17:37 +0800, chenxiang (M) wrote:
While testing kernel 4.16-rc1 I found an issue: when IO is running on a SATA disk and the disk is then
disabled through the sysfs interface (echo 0 > /sys/class/sas_phy/phy-1:0:0/enable), IO hangs and SCSI EH
is never entered. The issue reproduces every time.
I added some prints to the code and found that those IOs time out after 30s and all of them enter
scsi_eh_scmd_add(), but only some of them reach scsi_eh_inc_host_failed(), so SCSI EH is never
entered. I suspect this is related to commit 3bd6f43f5cb371 ("scsi: core: Ensure that the
SCSI error handler gets woken up"). Please have a look.
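For readers following the report above, a toy model of the symptom may help. It assumes, as a simplification, that the error handler thread is woken only once the host's failed-command count equals its busy-command count. ToyHost, simulate and eh_woken are hypothetical names used for illustration only; this is a Python sketch, not kernel code.

```python
# Toy model (NOT kernel code): illustrates why error handling may never
# start if some timed-out commands are never counted as failed.
# Assumption: EH is woken only when host_failed == host_busy.
from dataclasses import dataclass


@dataclass
class ToyHost:
    host_busy: int = 0      # commands currently outstanding on the host
    host_failed: int = 0    # commands handed over to error handling
    eh_woken: bool = False  # whether the EH thread has been woken

    def queue_command(self) -> None:
        self.host_busy += 1

    def eh_wakeup(self) -> None:
        # Simplified wakeup check: every busy command must have failed.
        if self.host_busy == self.host_failed:
            self.eh_woken = True

    def inc_host_failed(self) -> None:
        # Stand-in for the step named scsi_eh_inc_host_failed in the report:
        # count the failure, then re-evaluate the wakeup condition.
        self.host_failed += 1
        self.eh_wakeup()


def simulate(timed_out: int, counted: int) -> ToyHost:
    """Time out `timed_out` commands, but count only `counted` as failed."""
    host = ToyHost()
    for _ in range(timed_out):
        host.queue_command()
    for _ in range(counted):
        host.inc_host_failed()
    return host


if __name__ == "__main__":
    print(simulate(4, 4).eh_woken)  # all failures counted
    print(simulate(4, 2).eh_woken)  # some failures never counted
```

In this model, the second case matches the reported behavior: the counters never become equal, so the wakeup condition is never satisfied and IO stays hung.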
Hello chenxiang,
Have you already noticed patch "[PATCH v2] Avoid that ATA error handling can
trigger a kernel hang or oops"? If not, can you apply that patch to your
kernel and verify whether it fixes this behavior? See also
https://www.mail-archive.com/linux-scsi@xxxxxxxxxxxxxxx/msg71189.html or
https://patchwork.kernel.org/patch/10236213/.
After applying your patch, the issue I reported seems to be solved.
Thanks for testing that patch!
But when I run a long-duration test (disabling/enabling the disk while IO is running) with
this test case, a NULL pointer dereference occurs.
It does not seem related to the current issue, but I am not sure.
I previously ran the test case for a long time on kernel 4.15-rc5, and it was okay.
Part of the log follows; the full log is attached to this email:
[ 485.716578] pc : blk_abort_request+0x14/0x68
(+Tejun)
Hello chenxiang,
Please check whether the following patch fixes the kernel crash you ran into:
https://marc.info/?l=linux-block&m=151895951207014
It seems the patch is for blk-mq, but the issue I encountered is on the
legacy block path, as CONFIG_SCSI_MQ_DEFAULT is not enabled.
Thanks,
Bart.