Re: PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




James Bottomley wrote:
> On Wed, 2009-12-02 at 08:27 -0600, Michael Reed wrote:
>> James Bottomley wrote:
>>> On Tue, 2009-12-01 at 16:28 -0600, Michael Reed wrote:
>>>> James Bottomley wrote:
>>>>> On Tue, 2009-12-01 at 15:45 -0600, Michael Reed wrote:
>>>>>> Prevent delays and hangs due to sd_remove() waiting for the completion of
>>>>>> async threads executing sd_probe_async of disks on unrelated host adapters.
>>>>>> This patch executes every sd_probe_async in its own async domain allowing
>>>>>> sd_remove() to wait for just the completion of the async thread associated with
>>>>>> the scsi_disk being removed.
>>>>> This patch was thought of a while ago. Unfortunately, some of the
>>>>> unrelated threads we end up waiting on are libata ones.  you confine sd
>>>>> to only its own probes, we end up unsynchronised with respect to libata
>>>>> probes and we might cause ordering problems amongst the ata devices.
>>>>>
>>>> Isn't sd_remove() only concerned with the removal of a a single scsi_disk?
>>>> Shouldn't libata use reference counting if it has is an issue with a scsi_disk
>>>> being prematurely removed?  Or is this a concern about "sd" naming?  Or something
>>>> else that I admittedly don't understand (but should)?
>>> Well, no ... the sync on remove is preventing us removing a device whose
>>> async part is still running.  That async part for libata includes pieces
>>> kicked off from the sd probe.
>> What does the call stack look like that spawns the async part of libata?
> 
> it's in libata-core (and some in ahci) just do a git grep async in
> drivers/ata
> 
>> What kind of hardware do I need to demonstrate this?
> 
> To demonstrate what?  The out of order sequencing that can arise?  any
> ata based system with > 1 device should do.
> 
> There's actually another problem with the previous patch ...
> scsi_wait_scan() also needs to synchronise with the sd probes.
> 
>>>> I would truly like to better understand the issue.  Would someone mind expanding
>>>> upon the concern about ata ordering issues associated with the removal of a
>>>> single scsi_disk?
>>> The problem isn't removal per se ... it's the fact that remove can't
>>> complete until any async pieces remaining from probe have run. 
>> Yes, I understand that.  I didn't realize that the sd_probe resulted in any
>> async work other than sd_probe_async().  It does complicate serialization
>> at removal.
>>
>> I'll try to capture the "motivation" from within my fibre channel centric world
>> for the change and see if anyone's got some ideas on how to resolve the issue.
> 
> OK ... actually describing the problem would be helpful.  The async
> schedule is only in sd_remove() to guard against add/remove races ...
> usually when we do removal, the async probing parts should be long
> finished so I don't understand why you think we would be waiting for
> stuff.

As I've been working out of 2.6.31 and things have changed pretty dramatically
in scsi-misc, I'm going to update my test environment to scsi-misc and see
if I continue to observe the same behavior.

Moving targets are good.  Moving targets are good.  Moving targets are good.

:)

Mike

> 
> James
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux