Re: SRPt oops with 4.5-rc3-ish

On 02/28/2016 12:00 AM, Nicholas A. Bellinger wrote:
> On Sat, 2016-02-27 at 20:49 -0800, Bart Van Assche wrote:
>> On 02/27/16 20:47, Nicholas A. Bellinger wrote:
>>> On Sat, 2016-02-27 at 20:18 -0800, Bart Van Assche wrote:
>>>> On 02/27/16 19:37, Nicholas A. Bellinger wrote:
>>>>> This is a fairly recent srpt shutdown regression, right..?
>>>>
>>>> Hi Nic,
>>>>
>>>> My patch series to make TMR handling synchronous fixes what Doug
>>>> reported. If you want I can rebase and repost that patch series.
>>>>
>>>
>>> There aren't even any TMRs being processed, so I don't see how that has
>>> anything to do with it.
>>>
>>> From the logs, this OOPsen is related to some manner of recent srpt
>>> configfs se_node_acl + se_session active I/O shutdown regression.
>>>
>>> So short of sitting down and reproducing myself on v4.5-rc code,
>>> commit 59fae4de's removal of ib_create_cq() + ib_comp_handler callback
>>> usage looks like a good place to start the investigation.
>>>
>>> It would be useful to first find out what changes introduced this
>>> regression, and how far back Doug is able to reproduce.
>>
>> As I wrote before, this patch series is 100% stable on top of my most
>> recent LIO core patch series, a patch series I have also made available
>> on github. So what Doug ran into is a LIO core bug and not an ib_srpt bug.
>>
> 
> Active I/O shutdown with srpt has not always triggered this OOPs.
> 
> There is a reason why this is happening now, and it needs to be
> identified.
> 
> Either you can help out doing that, or not.  Either way, I'm certainly
> not going to let you hack up LIO TMR code when there aren't even signs
> that ABORT_TASK and friends are occurring in Doug's particular shutdown case.
> 

Sorry, I didn't notice this thread had picked back up; I was off on other
stuff.

I can't say if this is new or not.  We added some new testing that had
considerably more LUNs in use and more transfers taking place, and while
I was rebooting some actively used servers, I saw this issue.  It might
exist on earlier kernels; I would have to try them to know for sure.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: 0E572FDD



