"Power-on or device reset occurred" after a LUN resize

Hi,

Last week, an operation that we perform regularly caused some issues.

00:50:31 CEST: we resized an iSCSI LUN on a SAN from 3 TB to 4 TB.

The clients detected the change fine and resized their devices:

Sep 22 00:51:07 server001 kernel: sd 16:0:0:1: Capacity data has changed
Sep 22 00:51:07 server001 kernel: sd 16:0:0:1: Inquiry data has changed
Sep 22 00:51:07 server001 kernel: sd 16:0:0:1: alua: supports implicit TPGS
Sep 22 00:51:07 server001 kernel: sd 16:0:0:1: alua: device t10.NETAPP   LUN 80Vcx]PVRq4F        port group 3e9 rel port 8
Sep 22 00:51:07 server001 kernel: sd 17:0:0:1: Capacity data has changed
Sep 22 00:51:07 server001 kernel: sd 16:0:0:1: [sdf] 8589934592 512-byte logical blocks: (4.40 TB/4.00 TiB)
Sep 22 00:51:07 server001 kernel: sd 16:0:0:1: [sdf] 4096-byte physical blocks
Sep 22 00:51:07 server001 kernel: sdf: detected capacity change from 3298534883328 to 4398046511104
Sep 22 00:51:07 server001 kernel: sd 16:0:0:1: alua: port group 3e9 state A non-preferred supports TolUsNA
Sep 22 00:51:07 server001 kernel: sd 17:0:0:1: Inquiry data has changed
Sep 22 00:51:07 server001 kernel: sd 17:0:0:1: alua: supports implicit TPGS
Sep 22 00:51:07 server001 kernel: sd 17:0:0:1: alua: device t10.NETAPP   LUN 80Vcx]PVRq4F        port group 3e9 rel port 7
Sep 22 00:51:07 server001 kernel: sd 17:0:0:1: [sdi] 8589934592 512-byte logical blocks: (4.40 TB/4.00 TiB)
Sep 22 00:51:07 server001 kernel: sd 17:0:0:1: [sdi] 4096-byte physical blocks
Sep 22 00:51:07 server001 kernel: sdi: detected capacity change from 3298534883328 to 4398046511104
Sep 22 00:51:07 server001 kernel: sd 17:0:0:1: alua: port group 3e9 state A non-preferred supports TolUsNA
Sep 22 00:51:12 server001 kernel: sd 18:0:0:1: Capacity data has changed
Sep 22 00:51:12 server001 kernel: sd 18:0:0:1: Inquiry data has changed
Sep 22 00:51:12 server001 kernel: sd 18:0:0:1: alua: supports implicit TPGS
Sep 22 00:51:12 server001 kernel: sd 18:0:0:1: alua: device t10.NETAPP   LUN 80Vcx]PVRq4F        port group 3e8 rel port 6
Sep 22 00:51:12 server001 kernel: sd 18:0:0:1: [sdl] 8589934592 512-byte logical blocks: (4.40 TB/4.00 TiB)
Sep 22 00:51:12 server001 kernel: sd 18:0:0:1: [sdl] 4096-byte physical blocks
Sep 22 00:51:12 server001 kernel: sdl: detected capacity change from 3298534883328 to 4398046511104
Sep 22 00:51:12 server001 kernel: sd 18:0:0:1: alua: port group 3e8 state N non-preferred supports TolUsNA
Sep 22 00:51:18 server001 kernel: sd 15:0:0:1: Capacity data has changed
Sep 22 00:51:18 server001 kernel: sd 15:0:0:1: Inquiry data has changed
Sep 22 00:51:18 server001 kernel: sd 15:0:0:1: alua: supports implicit TPGS
Sep 22 00:51:18 server001 kernel: sd 15:0:0:1: alua: device t10.NETAPP   LUN 80Vcx]PVRq4F        port group 3e8 rel port 5
Sep 22 00:51:18 server001 kernel: sd 15:0:0:1: [sdc] 8589934592 512-byte logical blocks: (4.40 TB/4.00 TiB)
Sep 22 00:51:18 server001 kernel: sd 15:0:0:1: [sdc] 4096-byte physical blocks
Sep 22 00:51:18 server001 kernel: sdc: detected capacity change from 3298534883328 to 4398046511104
Sep 22 00:51:18 server001 kernel: sd 15:0:0:1: alua: port group 3e8 state N non-preferred supports TolUsNA
Sep 22 00:52:09 server001 kernel: sd 16:0:0:1: Power-on or device reset occurred
Sep 22 00:52:09 server001 kernel: sd 16:0:0:1: alua: port group 3e9 state A non-preferred supports TolUsNA
Sep 22 00:52:09 server001 kernel: sd 17:0:0:1: Power-on or device reset occurred
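
As a sanity check on the sizes reported above, the block count and byte totals can be cross-checked with a few lines of Python. This is only a sketch using the figures copied from the log; the helper name `describe` is made up for illustration.

```python
# Figures copied from the kernel log; nothing here is measured independently.
LOGICAL_BLOCK_SIZE = 512          # "512-byte logical blocks"
new_blocks = 8589934592           # block count reported after the resize
old_bytes = 3298534883328         # "detected capacity change from ..."
new_bytes = new_blocks * LOGICAL_BLOCK_SIZE

TIB = 2**40                       # binary terabyte (TiB)
TB = 10**12                       # decimal terabyte (TB)


def describe(n):
    """Format a byte count in both decimal TB and binary TiB."""
    return f"{n} bytes = {n / TB:.2f} TB = {n / TIB:.2f} TiB"


print("old:", describe(old_bytes))
print("new:", describe(new_bytes))
```

The new byte total equals the reported block count times 512, so all four paths agree on the resized capacity.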

But then it kept doing resets:
Sep 22 00:54:39 server001 kernel: sd 16:0:0:1: Power-on or device reset occurred
Sep 22 00:54:39 server001 kernel: sd 16:0:0:1: alua: port group 3e9 state A non-preferred supports TolUsNA
Sep 22 00:54:39 server001 kernel: sd 17:0:0:1: Power-on or device reset occurred
Sep 22 00:54:39 server001 kernel: sd 17:0:0:1: alua: port group 3e9 state A non-preferred supports TolUsNA
Sep 22 00:54:42 server001 kernel: sd 15:0:0:1: Power-on or device reset occurred
Sep 22 00:54:42 server001 kernel: sd 15:0:0:1: alua: port group 3e8 state N non-preferred supports TolUsNA

This caused some multipath failovers until it stopped after ~10 minutes.
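
To see how the resets are spread over the paths and over time, the kernel log can be tallied per SCSI address. The following is a rough sketch, not a tested tool: the regular expression assumes the syslog prefix format shown above, the function name `resets_by_path` is hypothetical, and the sample lines are copied from the excerpts in this mail.

```python
import re
from collections import defaultdict

# Matches "<Mon> <day> <hh:mm:ss> <host> kernel: sd H:C:T:L: Power-on or device reset occurred".
# Assumes the syslog timestamp/host prefix seen in the excerpts above.
RESET_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"sd (?P<hctl>\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)


def resets_by_path(text):
    """Return {H:C:T:L: [timestamps]} for every reset message found in text."""
    found = defaultdict(list)
    for m in RESET_RE.finditer(text):
        found[m.group("hctl")].append(m.group("ts"))
    return dict(found)


sample = "\n".join([
    "Sep 22 00:52:09 server001 kernel: sd 16:0:0:1: Power-on or device reset occurred",
    "Sep 22 00:52:09 server001 kernel: sd 16:0:0:1: alua: port group 3e9 state A non-preferred supports TolUsNA",
    "Sep 22 00:52:09 server001 kernel: sd 17:0:0:1: Power-on or device reset occurred",
    "Sep 22 00:54:39 server001 kernel: sd 16:0:0:1: Power-on or device reset occurred",
    "Sep 22 00:54:39 server001 kernel: sd 17:0:0:1: Power-on or device reset occurred",
    "Sep 22 00:54:42 server001 kernel: sd 15:0:0:1: Power-on or device reset occurred",
])

for hctl, stamps in sorted(resets_by_path(sample).items()):
    print(hctl, len(stamps), stamps)
```

Running the same function over the full log from each server would show whether all paths and all hosts report the resets at the same moments.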

We do use ALUA multipath:
3600a098038305663785d505652713446 dm-15 NETAPP,LUN C-Mode
size=4.0T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 16:0:0:1 sdf 8:80  active ready running
| `- 17:0:0:1 sdi 8:128 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 15:0:0:1 sdc 8:32  active ready running
  `- 18:0:0:1 sdl 8:176 active ready running


Who is sending the "Power-on or device reset"?
Is that the SAN?
Or does the client trigger the reset (and if so, for what reason)?
The LUN is attached to multiple servers (all CentOS 8), and all of them showed the same resets.

It would be nice to find out what caused this!

Thanks for having a look :)
Jean-Louis




