On 2022/12/19 22:53, John Garry wrote:
On 19/12/2022 12:59, yangxingui wrote:
Firstly, I think that there is a bug in sas_ata_device_link_abort()
-> ata_link_abort() code in that the host lock in not grabbed, as the
comment in ata_port_abort() mentions. Having said that, libsas had
already some dodgy host locking usage - specifically dropping the
lock for the queuing path (that's something else to be fixed up ... I
think that is due to queue command CB calling task_done() in some
cases), but I still think that sas_ata_device_link_abort() should be
fixed (to grab the host lock).
ok, I agree with you very much for this, I had doubts about whether we
needed to grab lock before.
ok, I hope that you can fix this up separately.
Secondly, this just seems like a half solution to the age-old problem
- that is, EH eventually kicking in only after 30 seconds when a disk
is removed with active IO. I say half solution as SAS disks still
have this issue for libsas. Can we instead push to try to solve both
of them now?
Jason said you must have such an opinion "a half solution". As libsas
does not have any interface to mark all outstanding commands as failed
for SAS disk currently and SAS disk support I/O resumable transmission
after intermittent disconnections
I don't know what you mean by "resumable transmission after intermittent
disconnections".
, so I want to optimize sata disk first.
If we want to achieve a complete solution, perhaps we need to define
such an interface in libsas and implement it by lldd. My current idea
is to call sas_abort_task() for all outstanding commands in lldd. I
wonder if you approve of this?
Are you sure you mean sas_abort_task()? That is for the LLDD to issue an
abort TMF. I assume that you mean sas_task_abort(). If so, I am not too
keen on the idea of libsas calling into the LLDD to inform of such an
event.
I've implemented this solution. The verification seems to be ok both for
sas/sata device. I'll update the version again. Please have a look?
Thanks,
Xingui
Note that maybe a tagset iter function could be used by libsas to
abort each active IO, but I don't like libsas messing with such a thing;
in addition, there may be some conflict between libsas aborting the IO
and the IO completing with error in the LLDD.
Please note that I need to refresh my memory on this whole EH topic...
Thanks,
John
.