[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Wed, 25 Apr 2018 14:50:48 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #17 from Don (don.brace@xxxxxxxxxxxxx) ---
(In reply to Anthony Hausman from comment #16)
> Don,
> 
> So I'm currently running kernel 4.16.3 (build 18-04-19) with the hpsa
> module patched to use a local work-queue instead of the system work-queue.
> 
> I have reproduced a reset with no stack trace (which is good news).
> The only problem is that 2 hours passed between the "resetting logical"
> message and its completion, and the server was under heavy load the whole time:
> 
> Apr 25 01:31:09 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: resetting logical 
> Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
> Apr 25 03:31:00 kernel: hpsa 0000:08:00.0: device is ready.
> Apr 25 03:31:00 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: reset logical 
> completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0
> SSDSmartPathCap- En- Exp=1
> 
> The good thing is that after the reset completed, the device was removed:
> 
> Apr 25 03:31:45 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: removed
> Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1

The P420i notified the driver that the volume had gone offline, so the
driver removed it from the SCSI mid-layer (SML).

> Apr 25 03:31:48 kernel: scsi 0:1:0:0: rejecting I/O to dead device

I/O requests were still arriving for the device, but the SML had already
deleted it, so it rejected them.

> 
> So the question is whether it is normal for the logical reset to take such
> a long time (and cause trouble on the server)?

It is not normal.

For a Logical Volume (LV) reset, the P420i flushes any outstanding I/O
requests and then returns. The SML should block any new requests from coming
down while the reset is in progress.

Do you know what process was consuming the CPU cycles?
ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
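Note that the command above sorts on the first column (psr, the processor a task last ran on), so the first 20 lines are grouped by CPU number rather than by CPU usage. If the goal is to find the heaviest consumer, a minimal variant, assuming procps-ng ps (which supports --sort; BusyBox ps does not), sorts on %CPU directly. The function name top_cpu_consumers is only illustrative:

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): list the processes with the
# highest %CPU, plus the processor (psr) and kernel wait channel (wchan).
# Assumes procps-ng ps; BusyBox ps does not support these options.
top_cpu_consumers() {
    local n="${1:-20}"
    # --sort=-pcpu orders by descending %CPU; head keeps the header + n rows.
    ps -eo psr,pid,pcpu,pmem,wchan:30,comm:30 --sort=-pcpu | head -n "$(( n + 1 ))"
}
```

For example, `top_cpu_consumers 20` while the reset is outstanding. Keep in mind that ps reports %CPU averaged over each process's lifetime; for a snapshot of current usage, `top -b -n 1 | head -20` may be more telling.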

Are you using sg_reset to test LV resets? Or does the device have some
intermittent issue that is causing the SML to issue the reset?
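To trigger an LV reset on demand rather than waiting for the SML to issue one, sg_reset from sg3_utils can request a device (LU) reset with --device. A minimal sketch, assuming sg3_utils is installed; the wrapper name lv_reset, the DRY_RUN variable and the /dev/sg1 path are hypothetical. It only prints the command unless DRY_RUN=0, so nothing is reset by accident:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around sg3_utils' sg_reset for a logical-volume
# (device/LU) reset. Prints the command by default; set DRY_RUN=0 to run it.
lv_reset() {
    local dev="$1"
    local cmd=(sg_reset --device "$dev")
    if [ "${DRY_RUN:-1}" = "0" ]; then
        # time reports how long sg_reset took to return
        time "${cmd[@]}"
    else
        printf '%s\n' "${cmd[*]}"
    fi
}
```

For example, `lv_reset /dev/sg1` only prints the command, while `DRY_RUN=0 lv_reset /dev/sg1` actually issues the reset.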

If you turn off the agents, do the resets complete more quickly?

I wonder whether the agents are frequently probing the P420i for changes
while the reset is active, and whether they are what is consuming the CPU
cycles.
