Hi,

We have 4 QEMU hosts with identical hardware in a cluster, which had been running fine for a year or so. Recently one host started having its disks go offline. When it happens, all 10 disks attached to the controller go offline at once. I managed to capture some logs; they are included further down.

According to our supplier the errors are hardware related. But if I boot the machine with Ubuntu 18.04.2 LTS and open 10 consoles, each of them looping

    while true; do dd if=/dev/sdX of=/dev/null bs=8M status=progress; done

there is constantly over 3GB/s being pulled over the controller, and this remains stable for days (I haven't tested longer yet, it is still running ;). The 5 SSDs are constantly (well) over 300MB/s, occasionally 1 or 2 will even go beyond 1TB/s, and the 5 HDDs are constantly around 200-245MB/s, starting higher and dropping a bit as the heads get closer to the spindle. I don't mount the filesystems in the test.

Any ideas on what might be going on? If it's hardware, I have a hard time figuring out which part it would be. If the cables were bad or something, I wouldn't expect Ubuntu to be able to pull this much data constantly in a loop. SMART tests on the disks don't show any errors either, besides a lot of error corrections on the SAS HDDs (5 SATA SSDs and 5 SAS HDDs are connected), but I have yet to see an HDD that ran for a while without any error correction ;).

The supplier uses a modified CentOS kernel with mpt3sas v16.100.00.00. With this kernel the host usually doesn't remain stable for more than 2 days anymore. Frequently the disks go offline within hours, under much less load than the dd loops on Ubuntu generate. Then again, the dd load is all sequential; I doubt that has much to do with hardware errors though, only with disk speeds.
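In case it helps anyone reproduce the read test above without juggling 10 consoles, here is a rough sketch of the same loop as one script. The function names read_loop and run_all and the pass-count argument are made up for illustration (0 means loop forever, as in my test). It drops status=progress, since ten progress lines would interleave on one terminal, and uses status=none instead, which assumes GNU dd.

```shell
#!/bin/sh
# Sketch of the parallel sequential-read test (hypothetical helper names).

# read_loop DEV PASSES
# Read DEV from start to end, PASSES times; PASSES=0 means loop forever.
# DEV is only ever opened as dd's input, so nothing is written to it.
read_loop() {
    dev=$1
    passes=$2
    n=0
    while [ "$passes" -eq 0 ] || [ "$n" -lt "$passes" ]; do
        # status=none (GNU dd) hides the transfer stats but still shows errors.
        dd if="$dev" of=/dev/null bs=8M status=none || return 1
        n=$((n + 1))
    done
}

# run_all PASSES DEV...
# Start one read_loop per device in the background and wait for all of them.
# Returns non-zero if any of the readers failed.
run_all() {
    passes=$1
    shift
    pids=""
    for dev in "$@"; do
        read_loop "$dev" "$passes" &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}
```

Something like `run_all 0 /dev/sda /dev/sdb` (continuing through to the last disk) would then match the 10 consoles; with a non-zero pass count the script exits once every disk has been read that many times, and a failing dd on any disk shows up in the exit status.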
# modinfo mpt3sas
filename:       /lib/modules/3.10.0-862.3.3.el7.ABCDE0005.07278de50e2c.x86_64/kernel/drivers/scsi/mpt3sas/mpt3sas.ko.xz
license:        GPL
version:        16.100.00.00
description:    LSI MPT Fusion SAS 3.0 Device Driver
author:         Avago Technologies <MPT-FusionLinux.pdl@xxxxxxxxxxxxx>
retpoline:      Y
rhelversion:    7.5
srcversion:     0DA8EFC10B0842ECF25AEDC
alias:          pci:v00001000d000000D1sv*sd*bc*sc*i*
alias:          pci:v00001000d000000ACsv*sd*bc*sc*i*
alias:          pci:v00001000d000000ABsv*sd*bc*sc*i*
alias:          pci:v00001000d000000AAsv*sd*bc*sc*i*
alias:          pci:v00001000d000000AFsv*sd*bc*sc*i*
alias:          pci:v00001000d000000AEsv*sd*bc*sc*i*
alias:          pci:v00001000d000000ADsv*sd*bc*sc*i*
alias:          pci:v00001000d000000C3sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C2sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C1sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C0sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C8sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C7sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C6sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C5sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C4sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C9sv*sd*bc*sc*i*
alias:          pci:v00001000d00000095sv*sd*bc*sc*i*
alias:          pci:v00001000d00000094sv*sd*bc*sc*i*
alias:          pci:v00001000d00000091sv*sd*bc*sc*i*
alias:          pci:v00001000d00000090sv*sd*bc*sc*i*
alias:          pci:v00001000d00000097sv*sd*bc*sc*i*
alias:          pci:v00001000d00000096sv*sd*bc*sc*i*
depends:        scsi_transport_sas,raid_class
intree:         Y
vermagic:       3.10.0-862.3.3.el7.ABCDE0005.07278de50e2c.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        50:74:09:55:7C:DE:2C:54:03:15:E0:A6:EE:37:54:F6:92:34:3A:29
sig_hashalgo:   sha256
parm:           logging_level: bits for enabling additional logging info (default=0)
parm:           max_sectors:max sectors, range 64 to 32767 default=32767 (ushort)
parm:           missing_delay: device missing delay , io missing delay (array of int)
parm:           max_lun: max lun, default=16895 (int)
parm:           diag_buffer_enable: post diag buffers (TRACE=1/SNAPSHOT=2/EXTENDED=4/default=0) (int)
parm:           disable_discovery: disable discovery (int)
parm:           prot_mask: host protection capabilities mask, def=7 (int)
parm:           max_queue_depth: max controller queue depth (int)
parm:           max_sgl_entries: max sg entries (int)
parm:           msix_disable: disable msix routed interrupts (default=0) (int)
parm:           smp_affinity_enable:SMP affinity feature enable/disbale Default: enable(1) (int)
parm:           max_msix_vectors: max msix vectors (int)
parm:           mpt3sas_fwfault_debug: enable detection of firmware fault and halt firmware - (default=0)

[ 470.880878] XFS (dm-23): Mounting V5 Filesystem
[ 470.984565] XFS (dm-23): Ending clean mount
[ 471.016086] XFS (dm-23): Unmounting Filesystem
[ 471.164072] XFS (dm-23): Mounting V5 Filesystem
[ 471.285423] XFS (dm-23): Ending clean mount
[ 471.299501] XFS (dm-23): Unmounting Filesystem
[ 471.432076] XFS (dm-23): Mounting V5 Filesystem
[ 471.457987] XFS (dm-23): Ending clean mount
[ 487.787187] mpt3sas_cm0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
[ 488.620705] sd 0:0:0:0: device_block, handle(0x000a)
[ 488.620767] sd 0:0:1:0: device_block, handle(0x000b)
[ 488.620813] sd 0:0:2:0: device_block, handle(0x000c)
[ 488.620858] sd 0:0:3:0: device_block, handle(0x000d)
[ 488.620903] sd 0:0:4:0: device_block, handle(0x000e)
[ 488.620948] sd 0:0:5:0: device_block, handle(0x000f)
[ 488.620993] sd 0:0:6:0: device_block, handle(0x0010)
[ 488.621038] sd 0:0:7:0: device_block, handle(0x0011)
[ 488.621084] sd 0:0:8:0: device_block, handle(0x0012)
[ 488.621129] sd 0:0:9:0: device_block, handle(0x0013)
[ 488.621177] ses 0:0:10:0: _scsih_block_io_device skip device_block for SES handle(0x0014)
[ 488.621273] ses 0:0:10:0: _scsih_block_io_device skip device_block for SES handle(0x0014)
[ 489.022953] sd 0:0:5:0: device_unblock and setting to running, handle(0x000f)
[ 489.023224] sd 0:0:6:0: device_unblock and setting to running, handle(0x0010)
[ 489.023501] sd 0:0:7:0: device_unblock and setting to running, handle(0x0011)
[ 489.023777] sd 0:0:8:0: device_unblock and setting to running, handle(0x0012)
[ 489.024049] sd 0:0:9:0: device_unblock and setting to running, handle(0x0013)
[ 489.614432] sd 0:0:0:0: device_unblock and setting to running, handle(0x000a)
[ 489.614527] sd 0:0:1:0: device_unblock and setting to running, handle(0x000b)
[ 489.614586] sd 0:0:2:0: device_unblock and setting to running, handle(0x000c)
[ 489.614643] sd 0:0:3:0: device_unblock and setting to running, handle(0x000d)
[ 489.614700] sd 0:0:4:0: device_unblock and setting to running, handle(0x000e)
[ 489.619660] blk_update_request: I/O error, dev sda, sector 0
[ 489.619776] Aborting journal on device dm-1-8.
[ 489.619783] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 489.619789] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 30 05 95 10 00 00 08 00
[ 489.619791] blk_update_request: I/O error, dev sda, sector 805672208
[ 489.619812] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 489.619816] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 06 74 d5 50 00 00 48 00
[ 489.619818] blk_update_request: I/O error, dev sda, sector 108320080
[ 489.619877] Aborting journal on device dm-3-8.
[ 489.619901] Buffer I/O error on dev dm-3, logical block 1606659, lost sync page write
[ 489.619906] JBD2: Error -5 detected when updating journal superblock for dm-3-8.
[ 489.619965] Buffer I/O error on dev dm-3, logical block 0, lost sync page write
[ 489.619985] EXT4-fs error (device dm-3): ext4_journal_check_start:56: Detected aborted journal
[ 489.619988] EXT4-fs (dm-3): Remounting filesystem read-only
[ 489.620012] ------------[ cut here ]------------
[ 489.620025] WARNING: CPU: 11 PID: 1288 at fs/buffer.c:1118 mark_buffer_dirty+0xcd/0xe0
[ 489.620069] Buffer I/O error on dev dm-3, logical block 0, lost sync page write
[ 489.620074] Modules linked in: iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) xt_comment xt_multiport ipt_REJECT nf_reject_ipv4 xfs iptable_nat xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack mlx5_core(OE)
[ 489.620076] EXT4-fs error (device dm-3): ext4_journal_check_start:56:
[ 489.620077] mlxfw(OE)
[ 489.620078] Detected aborted journal
[ 489.620168] mlx4_en(OE) mlx4_core(OE) mlx_compat(OE) devlink iTCO_wdt skx_edac iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass pcspkr joydev ses enclosure sg mei_me mei lpc_ich shpchp i2c_i801 acpi_cpufreq acpi_power_meter nbd nfsd auth_rpcgss nfs_acl lockd grace ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c sd_mod crc_t10dif crct10dif_generic ast i2c_algo_bit crct10dif_pclmul crct10dif_common drm_kms_helper crc32_pclmul crc32c_intel ghash_clmulni_intel syscopyarea sysfillrect aesni_intel scsi_transport_iscsi sysimgblt lrw fb_sys_fops gf128mul ttm glue_helper ablk_helper e1000e cryptd ixgbe mpt3sas ahci drm libahci raid_class dm_multipath scsi_transport_sas ptp libata pps_core mdio dca i2c_core wmi ipmi_si ipmi_devintf nfit ipmi_msghandler
[ 489.621695] Buffer I/O error on dev dm-1, logical block 2655236, lost sync page write
[ 489.621701] JBD2: Error -5 detected when updating journal superblock for dm-1-8.
[ 489.635832] libnvdimm acpi_pad sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: zunicode]
[ 489.635838] CPU: 11 PID: 1288 Comm: systemd-journal Kdump: loaded Tainted: P OE ------------ 3.10.0-862.3.3.el7.strato0005.07278de50e2c.x86_64 #1
[ 489.635840] Hardware name: Supermicro SYS-1029U-E1CRTP/X11DPU, BIOS 2.0 11/29/2017
[ 489.635841] Call Trace:
[ 489.635857] [<ffffffff88f7378b>] dump_stack+0x19/0x1b
[ 489.635866] [<ffffffff88891998>] __warn+0xd8/0x100
[ 489.635871] [<ffffffff88891add>] warn_slowpath_null+0x1d/0x20
[ 489.635878] [<ffffffff88a5104d>] mark_buffer_dirty+0xcd/0xe0
[ 489.635886] [<ffffffff88ad311a>] ext4_commit_super+0x18a/0x240
[ 489.635890] [<ffffffff88ad3c13>] __ext4_abort+0x153/0x170
[ 489.635901] [<ffffffff88abab2a>] ? ext4_dirty_inode+0x2a/0x60
[ 489.635906] [<ffffffff88ae581d>] ext4_journal_check_start+0x6d/0x90
[ 489.635911] [<ffffffff88ae5966>] __ext4_journal_start_sb+0x36/0xe0
[ 489.635916] [<ffffffff88abab2a>] ext4_dirty_inode+0x2a/0x60
[ 489.635921] [<ffffffff88a493bd>] __mark_inode_dirty+0x16d/0x270
[ 489.635928] [<ffffffff88a37b19>] update_time+0x89/0xd0
[ 489.635937] [<ffffffff88a1df98>] ? __sb_start_write+0x58/0x110
[ 489.635941] [<ffffffff88a37c00>] file_update_time+0xa0/0xf0
[ 489.635947] [<ffffffff88abad6c>] ext4_page_mkwrite+0x6c/0x470
[ 489.635956] [<ffffffff889c0a3a>] do_page_mkwrite+0x8a/0xe0
[ 489.635962] [<ffffffff889c411f>] do_wp_page+0x41f/0x710
[ 489.635971] [<ffffffff88f7893c>] ? __schedule+0x41c/0xa20
[ 489.635975] [<ffffffff889c56ad>] handle_pte_fault+0x36d/0xc30
[ 489.635982] [<ffffffff88a67264>] ? ep_send_events_proc+0x174/0x1d0
[ 489.635987] [<ffffffff889c77bd>] handle_mm_fault+0x39d/0x9b0
[ 489.635994] [<ffffffff88f805b7>] __do_page_fault+0x197/0x4f0
[ 489.635998] [<ffffffff88f80945>] do_page_fault+0x35/0x90
[ 489.636002] [<ffffffff88f7c788>] page_fault+0x28/0x30
[ 489.636006] ---[ end trace ff541faaddb03c28 ]---
[ 489.636054] Buffer I/O error on dev dm-3, logical block 0, lost sync page write
[ 489.636562] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 489.636614] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 489.637674] Buffer I/O error on dev dm-1, logical block 0, lost sync page write
[ 489.637680] EXT4-fs error (device dm-1): ext4_journal_check_start:56: Detected aborted journal
[ 489.637682] EXT4-fs (dm-1): Remounting filesystem read-only
[ 489.637685] EXT4-fs (dm-1): previous I/O error to superblock detected
[ 489.637709] Buffer I/O error on dev dm-1, logical block 0, lost sync page write
[ 489.763482] mpt3sas_cm0: removing handle(0x000a), sas_addr(0x500304801f580c80)
[ 489.763484] mpt3sas_cm0: enclosure logical id(0x500304801f580cbf), slot(0)
[ 489.763486] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 489.770742] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[ 489.770802] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 489.792424] mpt3sas_cm0: removing handle(0x000b), sas_addr(0x500304801f580c81)
[ 489.792427] mpt3sas_cm0: enclosure logical id(0x500304801f580cbf), slot(1)
[ 489.792429] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 489.795686] sd 0:0:2:0: [sdc] Synchronizing SCSI cache
[ 489.795730] sd 0:0:2:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 489.806968] systemd-journald[1288]: /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Journal file corrupted, rotating.
[ 489.807162] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.807216] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.809826] systemd-journald[1288]: Failed to write entry (11 items, 380 bytes) despite vacuuming, ignoring: Bad message
[ 489.815896] systemd-journald[1288]: /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: IO error, rotating.
[ 489.815946] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.815976] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.816654] systemd-journald[1288]: Failed to write entry (19 items, 484 bytes) despite vacuuming, ignoring: Input/output error
[ 489.817338] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.817368] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.818018] systemd-journald[1288]: Failed to write entry (19 items, 484 bytes), ignoring: Input/output error
[ 489.857345] mpt3sas_cm0: removing handle(0x000c), sas_addr(0x500304801f580c82)
[ 489.857347] mpt3sas_cm0: enclosure logical id(0x500304801f580cbf), slot(2)
[ 489.857349] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 489.872289] sd 0:0:3:0: [sdd] Synchronizing SCSI cache
[ 489.872809] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.872844] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.873581] systemd-journald[1288]: Failed to write entry (23 items, 605 bytes), ignoring: Input/output error
[ 489.873646] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.873678] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.874379] systemd-journald[1288]: Failed to write entry (9 items, 296 bytes), ignoring: Input/output error
[ 489.874437] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.874467] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.875179] systemd-journald[1288]: Failed to write entry (9 items, 293 bytes), ignoring: Input/output error
[ 489.875233] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.875261] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.875905] systemd-journald[1288]: Failed to write entry (9 items, 289 bytes), ignoring: Input/output error
[ 489.875995] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.876025] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.876638] systemd-journald[1288]: Failed to write entry (11 items, 322 bytes), ignoring: Input/output error
[ 489.877558] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.877589] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.878249] systemd-journald[1288]: Failed to write entry (20 items, 527 bytes), ignoring: Input/output error
[ 489.878780] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.878799] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.879260] systemd-journald[1288]: Failed to write entry (22 items, 613 bytes), ignoring: Input/output error
[ 489.879794] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.879811] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.880265] systemd-journald[1288]: Failed to write entry (18 items, 483 bytes), ignoring: Input/output error
[ 489.880712] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.880729] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.881192] systemd-journald[1288]: Failed to write entry (18 items, 483 bytes), ignoring: Input/output error
[ 489.881643] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.881660] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.882117] systemd-journald[1288]: Failed to write entry (18 items, 483 bytes), ignoring: Input/output error
[ 489.884119] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.884139] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.884623] systemd-journald[1288]: Failed to write entry (19 items, 484 bytes), ignoring: Input/output error
[ 489.885105] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/system.journal: Read-only file system
[ 489.885122] systemd-journald[1288]: Failed to rotate /var/log/journal/258ea9c6be66478a9b85e74b36234621/user-42435.journal: Read-only file system
[ 489.885574] systemd-journald[1288]: Failed to write entry (19 items, 484 bytes), ignoring: Input/output error
[ 489.890626] sd 0:0:3:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK

Any ideas? Would be much obliged :).

Kind regards,