On Thu, Oct 14, 2021 at 09:09:07AM +0200, Hannes Reinecke wrote:
> On 10/13/21 4:53 PM, Konstantin Shelekhin wrote:
> > On Wed, Oct 13, 2021 at 04:22:41PM +0200, Hannes Reinecke wrote:
> > > On 10/13/21 3:21 PM, Konstantin Shelekhin wrote:
> > > Short answer: you can't.
> > >
> > > There is no feasible path in the I/O stack to abort running I/O; the
> > > only chance you have here is to wait for it to time-out.
> > >
> > > We have run into similar issues, and found that the only sane solution
> > > was to wait for the I/O to come back and then retry.
> > > As this would take some time (30 seconds if you are unlucky) most
> > > initiators will get unhappy and try to reset.
> > > Which won't work, either, as the I/O is still stuck.
> > > So we finally delayed relogin until all I/O was cleared.
> > >
> > > Not the best solution, but the only thing we can do in the absence of a
> > > proper I/O abort mechanism.
> >
> > I'm not sure we are talking about the same bug. In this case the relogin
> > is not possible, because new connections are rejected by the target and
> > the existing one is not going anywhere, because it's deadlocked on ABORT
> > TASK. The only solution is to reset the server.
> >
> Precisely.
>
> Relogin fails as there is I/O outstanding on the original session, and you
> try to relogin into the same session. Which is still busy, hence you cannot
> login.
>
> And I/O is outstanding as it can't be aborted, as the only transport
> implementing abort is target_core_user.c; for all the others you are
> screwed.

If I understand you correctly, you're talking about a very different
case, where bios sent to a backend's block device get stuck. True, we
can do little in that case.

In this case, however, there are no bios yet: TCM is still waiting for
the data from the initiator. We can do anything we want here, because at
this point TCM has complete control over the request execution.
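
To make the distinction concrete, here is a rough sketch of the two
cases. This is not TCM code: the state names, struct cmd and try_abort()
are all hypothetical and only model the argument above, namely that an
abort can complete immediately while the target is still waiting for
data from the initiator, and has to wait only once bios are in flight.

```c
#include <stdbool.h>

/* Hypothetical command states; these are NOT the real TCM state names. */
enum cmd_state {
	CMD_WAITING_FOR_DATA_OUT,	/* still waiting for data from the initiator */
	CMD_SUBMITTED_TO_BACKEND,	/* bios already sent to the backend device */
	CMD_COMPLETED,
	CMD_ABORTED,
};

/* Hypothetical stand-in for a target command. */
struct cmd {
	enum cmd_state state;
};

/*
 * Returns true if the abort finished immediately, false if the caller
 * has to wait for the backend to return the I/O first.
 */
static bool try_abort(struct cmd *c)
{
	switch (c->state) {
	case CMD_WAITING_FOR_DATA_OUT:
		/* No bios exist yet, so the target owns the command entirely. */
		c->state = CMD_ABORTED;
		return true;
	case CMD_SUBMITTED_TO_BACKEND:
		/* In-flight bios cannot be recalled; nothing to do but wait. */
		return false;
	case CMD_COMPLETED:
	case CMD_ABORTED:
		/* Nothing left to abort. */
		return true;
	}
	return false;
}
```

The deadlock discussed in this thread belongs to the first branch, which
is why waiting for a backend timeout should not be needed here.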