Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

On 11/07/2017 07:58 PM, Ming Lei wrote:
> On Tue, Nov 07, 2017 at 07:55:32PM -0700, Jens Axboe wrote:
>> On 11/07/2017 05:39 PM, Ming Lei wrote:
>>> On Tue, Nov 07, 2017 at 04:20:08PM +0000, Bart Van Assche wrote:
>>>> On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
>>>>> If you can reproduce, please provide me at least the following log
>>>>> first:
>>>>>
>>>>> 	find /sys/kernel/debug/block -name tags | xargs cat | grep busy
>>>>>
>>>>> If any pending requests aren't completed, please provide the related
>>>>> info from debugfs about where the request is.
>>>>
>>>> Every time I ran the above or a similar command its output was empty. I
>>>> assume that's because the hang usually occurs in a phase where these debugfs
>>>> attributes either have not yet been created or have already disappeared.
>>>
>>> Could you dump all tags? Then you can see whether this attribute has disappeared.
>>>
>>> If that output is empty, it often means there are no pending requests left
>>> uncompleted. So if that is true, your hang is _not_ related to RESTART.
>>
>> You need to check sched_tags as well. It could still be a restart race
>> or problem, if tags is empty but sched_tags has busy bits.
> 
> Yeah, I didn't mention it because SRP is MQ hardware, and the default
> scheduler is none, but if Bart changes that, the sched_tags need to be
> checked first.
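
For reference, the check discussed above (looking at both tags and sched_tags) could be sketched roughly as follows. This is only an illustration: the function name dump_busy and its optional root-directory argument are made up here, and it assumes debugfs is mounted at /sys/kernel/debug and that the per-hctx tags and sched_tags files report a busy= line, as the grep in the quoted command implies.

```shell
# dump_busy [ROOT]
# Hypothetical helper: print every "busy=" line from the blk-mq "tags" and
# "sched_tags" debugfs files below ROOT, prefixed with the file it came from.
# ROOT defaults to /sys/kernel/debug/block (assumes debugfs is mounted there).
dump_busy() {
    root="${1:-/sys/kernel/debug/block}"
    # -name matches the exact file names, so tags_bitmap and
    # sched_tags_bitmap are not picked up.
    find "$root" -type f \( -name tags -o -name sched_tags \) \
        -exec grep -H 'busy=' {} +
}
```

Running dump_busy as root while the hang is in progress shows which hardware queues still hold driver tags or scheduler tags. If it prints nothing at all, the files themselves are missing, which is a different case from files that are present and report no busy tags.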

At this point, I have no idea what Bart's setup looks like. Bart, it
would be REALLY helpful if you could tell us how you are reproducing
your hang. I don't know why this has to be dragged out.

Ming/Bart - there seems to be an increasing amount of tension between
you two, for reasons that are unknown to me. I suggest you put that
aside in the pursuit of fixing the current issue, and then we can
discuss how to best resolve these going forward. But right now the top
priority is getting to the bottom of this. There's a chance that the
issue I can reproduce is the same that Bart is seeing, in which case we
might be fixing both in one fell swoop. But if that isn't the case, then
we have some work to do this week.

-- 
Jens Axboe



