On Tue, Jun 05, 2018 at 10:18:53AM -0600, Keith Busch wrote: > On Wed, May 30, 2018 at 03:26:54AM -0400, Yi Zhang wrote: > > Hi Keith > > I found blktest block/019 also can lead my NVMe server hang with 4.17.0-rc7, let me know if you need more info, thanks. > > > > Server: Dell R730xd > > NVMe SSD: 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 172X (rev 01) > > > > Console log: > > Kernel 4.17.0-rc7 on an x86_64 > > > > storageqe-62 login: [ 6043.121834] run blktests block/019 at 2018-05-30 03:16:34 > > [ 6049.108476] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3 > > [ 6049.108478] {1}[Hardware Error]: event severity: fatal > > [ 6049.108479] {1}[Hardware Error]: Error 0, type: fatal > > [ 6049.108481] {1}[Hardware Error]: section_type: PCIe error > > [ 6049.108482] {1}[Hardware Error]: port_type: 6, downstream switch port > > [ 6049.108483] {1}[Hardware Error]: version: 1.16 > > [ 6049.108484] {1}[Hardware Error]: command: 0x0407, status: 0x0010 > > [ 6049.108485] {1}[Hardware Error]: device_id: 0000:83:05.0 > > [ 6049.108486] {1}[Hardware Error]: slot: 0 > > [ 6049.108487] {1}[Hardware Error]: secondary_bus: 0x85 > > [ 6049.108488] {1}[Hardware Error]: vendor_id: 0x10b5, device_id: 0x8734 > > [ 6049.108489] {1}[Hardware Error]: class_code: 000406 > > [ 6049.108489] {1}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0003 > > [ 6049.108491] Kernel panic - not syncing: Fatal hardware error! > > [ 6049.108514] Kernel Offset: 0x25800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Could you attach 'lspci -vvv -s 0000:83:05.0'? Just want to see your switch's capabilities to confirm the pre-test checks are really sufficient.