Re: 5.1.21 Dell 2950 terrible swraid5 I/O performance with swraid on top of Perc 5/i raid0/jbod

> On 19 Aug 2019, at 11:18, Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
> 
> 
> 
>> On 19 Aug 2019, at 09:08, Marc MERLIN <marc@xxxxxxxxxxx> wrote:
>> 
>> (Please Cc me on replies so that I can see them more quickly)
>> 
>> Dear Block Folks,
>> 
> 
> Hi Marc,
> 
>> I just inherited a Dell 2950 with a Perc 5/i.
>> I really don't want to use that Perc 5/i card, but from all the reading
>> I did, there is no IT/unraid mode for it, so I was stuck setting up the 6
>> 2TB drives as 6 independent single-drive raid0 virtual disks on the card.
>> I wish I could just bypass the card and connect the drives directly to a
>> SATA controller, but the case and backplane do not seem to make this possible.
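>> 
>> (One single-drive raid0 VD per disk can be created from Linux with
>> something like the following; this is only a sketch, the enclosure:slot
>> pairs are assumed from the -LdPdInfo output further down, and the exact
>> MegaCli option syntax can vary by version:)
>> 
>> # one raid0 virtual disk per drive; enclosure 8, slots 0-5 assumed
>> for slot in 0 1 2 3 4 5; do
>>     megacli -CfgLdAdd -r0 [8:$slot] -a0
>> done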
>> 
>> I'm getting very weird and effectively unusable I/O performance when I do
>> a swraid resync, which ends up throttled to around 5MB/s.
>> 
>> By bad, I mean really bad; see this (more details below):
>> Timing buffered disk reads:   2 MB in 36.15 seconds =  56.65 kB/sec
>> 
>> Dear linux-raid folks,
>> 
>> I realize I have a perc 5/i card underneath that I'd very much like to
>> remove, but can't on that system.
>> Still, I'm hitting some quite unexpectedly bad swraid performance, including
>> a kernel warning and an unclean raid shutdown on sysrq poweroff.
>> 
>> 
>> So, the 6 perc5/i raid0 drives show up in Linux as 6 drives. I partitioned
>> them and created various software raid arrays on top (raid1, raid5 and
>> raid6), roughly as sketched below.  They work fine, but something is very
>> wrong in the block layer somewhere. If I send a bunch of writes, the
>> I/O scheduler seems to introduce terrible latency: my whole system hangs
>> for a few seconds trying to read simple binaries while, from what I can
>> tell, the disk platters spend all their time writing the backlog of
>> what's being sent.
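>> 
>> For reference, the arrays were created with plain mdadm on those
>> partitions, roughly like this (a sketch only; levels and partition
>> numbers taken from the /proc/mdstat output further down):
>> 
>> mdadm --create /dev/md0 --level=1 --raid-devices=6 /dev/sd[a-f]1
>> mdadm --create /dev/md1 --level=6 --raid-devices=6 /dev/sd[a-f]3
>> mdadm --create /dev/md2 --level=6 --raid-devices=6 /dev/sd[a-f]5
>> mdadm --create /dev/md3 --level=5 --raid-devices=6 /dev/sd[a-f]6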
>> 
> 
> Solving this kind of problem is one of the goals of the BFQ I/O scheduler [1].
> Have you tried it?  If you want to, then start by switching
> to BFQ on both the
> physical and the virtual block devices in your stack.
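> 
> A minimal sketch of the switch via sysfs, assuming the six
> controller-exported disks are sda..sdf as in your mdstat output
> (bfq is only offered on multi-queue, blk-mq, request queues):
> 
> modprobe bfq                        # only needed if bfq is built as a module
> for d in sda sdb sdc sdd sde sdf; do
>     echo bfq > /sys/block/$d/queue/scheduler
> done
> cat /sys/block/sda/queue/scheduler  # the active scheduler shows in brackets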
> 
> Thanks,
> Paolo
> 
> [1] https://algo.ing.unimo.it/people/paolo/BFQ/
> 
>> You'll read below that I have a swraid6 running on those same 6 drives,
>> and it seems to run at an OK speed.  But I also have a bigger swraid5
>> across the same 6 drives, and that one is running at a terrible speed
>> right now.
>> 
>> 
>> I tried disabling the drives' write cache to let Linux, with its 32GB of
>> RAM, do the caching instead, but I didn't see a real improvement:
>> newmagic:~# megacli -LDSetProp -DisDskCache -L0 -a0   (repeated for L0..L5)
>> newmagic:~# megacli -LDGetProp -DskCache -Lall -a0
>>> Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
>>> Adapter 0-VD 1(target id: 1): Disk Write Cache : Disabled
>>> Adapter 0-VD 2(target id: 2): Disk Write Cache : Disabled
>>> Adapter 0-VD 3(target id: 3): Disk Write Cache : Disabled
>>> Adapter 0-VD 4(target id: 4): Disk Write Cache : Disabled
>>> Adapter 0-VD 5(target id: 5): Disk Write Cache : Disabled
>> 
>> For the raid card, I installed the latest BIOS I could find; here is what
>> the driver reports:
>>> megasas: 07.707.51.00-rc1
>>> megaraid_sas 0000:02:0e.0: PCI IRQ 78 -> rerouted to legacy IRQ 18
>>> megaraid_sas 0000:02:0e.0: FW now in Ready state
>>> megaraid_sas 0000:02:0e.0: 63 bit DMA mask and 32 bit consistent mask
>>> megaraid_sas 0000:02:0e.0: firmware supports msix	: (0)
>>> megaraid_sas 0000:02:0e.0: current msix/online cpus	: (0/4)
>>> megaraid_sas 0000:02:0e.0: RDPQ mode	: (disabled)
>>> megaraid_sas 0000:02:0e.0: controller type	: MR(256MB)
>>> megaraid_sas 0000:02:0e.0: Online Controller Reset(OCR)	: Enabled
>>> megaraid_sas 0000:02:0e.0: Secure JBOD support	: No
>>> megaraid_sas 0000:02:0e.0: NVMe passthru support	: No
>>> megaraid_sas 0000:02:0e.0: FW provided TM TaskAbort/Reset timeout	: 0 secs/0 secs
>>> megaraid_sas 0000:02:0e.0: megasas_init_mfi: fw_support_ieee=0
>>> megaraid_sas 0000:02:0e.0: INIT adapter done
>>> megaraid_sas 0000:02:0e.0: fw state:c0000000
>>> megaraid_sas 0000:02:0e.0: Jbod map is not supported megasas_setup_jbod_map 5388
>>> megaraid_sas 0000:02:0e.0: fwstate:c0000000, dis_OCR=0
>>> megaraid_sas 0000:02:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware
>>> megaraid_sas 0000:02:0e.0: DCMD not supported by firmware - megasas_ld_list_query 4590
>>> megaraid_sas 0000:02:0e.0: pci id		: (0x1028)/(0x0015)/(0x1028)/(0x1f03)
>>> megaraid_sas 0000:02:0e.0: unevenspan support	: no
>>> megaraid_sas 0000:02:0e.0: firmware crash dump	: no
>>> megaraid_sas 0000:02:0e.0: jbod sync map		: no
>> 
>> I'm also only getting about 5MB/s sustained write speed, which is
>> pathetic. I have lots of servers with normal SATA cards and software raid,
>> and I normally get 50 to 100MB/s.
>> I'm hoping the Perc 5/i card is not _that_ bad?  See below.
>> md0 : active raid1 sde1[4] sdb1[1] sdd1[3] sda1[0] sdc1[2] sdf1[5]
>>     975872 blocks super 1.2 [6/6] [UUUUUU]
>> md1 : active raid6 sde3[4] sdb3[1] sdd3[3] sdf3[5] sda3[0] sdc3[2]
>>     419164160 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]
>> 
>> md2 : active raid6 sde5[4] sdb5[1] sdf5[5] sdd5[3] sdc5[2] sda5[0]
>>     1677193216 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]
>>     bitmap: 1/4 pages [4KB], 65536KB chunk
>> 
>> md3 : active raid5 sde6[4] sdb6[1] sdd6[3] sdf6[6] sdc6[2] sda6[0]
>>     7118330880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUUU_]
>>     [=>...................]  recovery =  7.7% (109702192/1423666176) finish=5790.5min speed=3781K/sec
>>     bitmap: 0/11 pages [0KB], 65536KB chunk
>> 
>> If I access drives plugged directly into the motherboard's SATA ports, I
>> get perfect speed. I've also added an SSD with bcache as a cache in front
>> of one of the raid arrays that is so slow, and sure enough, it becomes
>> usable (sketch below).
>> When my system is slow as crap due to this issue, I can still do full-speed
>> I/O to a different drive plugged into the motherboard's SATA chip (but due
>> to the case, that drive is actually sitting on the motherboard, there is
>> nowhere to mount it).
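>> 
>> The bcache layer was set up roughly like this (a sketch; the SSD
>> partition name is hypothetical, md2 as the backing device matches the
>> bcache0/md2 messages in the log below):
>> 
>> make-bcache -C /dev/sdX1     # format the SSD partition as a cache set
>> make-bcache -B /dev/md2      # format the backing device -> /dev/bcache0
>> # attach backing to cache; cset UUID from 'bcache-super-show /dev/sdX1'
>> echo <cset-uuid> > /sys/block/bcache0/bcache/attach
>> echo writeback > /sys/block/bcache0/bcache/cache_mode  # optional: cache writes too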
>> 
>> The main problem is that all my raid arrays use the same 6 devices, so if
>> anything spams them with a huge queue, I/O to the other arrays on those
>> devices is completely starved.
>> The terrible write performance, on top of being bad in itself, prevents
>> pretty much any other I/O to those drives.
>> 
>> After the unclean shutdown explained below, a resync of the other 2 raid
>> arrays on the same drives is much faster and does not make the system
>> unresponsive:
>> md1 : active raid6 sda3[0] sdb3[1] sdf3[5] sdc3[2] sde3[4] sdd3[3]
>>     419164160 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]
>>     [==================>..]  resync = 91.1% (95553272/104791040) finish=1.7min speed=86952K/sec
>> 
>> 
>> If I start the recovery or a big copy/rsync towards md2, things get so
>> slow that other I/O hangs for multiple seconds, or sometimes even 2 minutes
>> or more. Yes, the trace below is from the stock Debian kernel, but I see
>> similar problems with 5.1.21:
>>> [13900.007277] INFO: task sendmail:30862 blocked for more than 120 seconds.
>>> [13900.030181]       Not tainted 4.19.0-5-amd64 #1 Debian 4.19.37-5+deb10u2
>>> [13900.053131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [13900.078495] sendmail        D    0 30862  30812 0x00000000
>>> [13900.099272] Call Trace:
>>> [13900.113941]  ? __schedule+0x2a2/0x870
>>> [13900.131022]  ? lookup_fast+0xc8/0x2e0
>>> [13900.148085]  schedule+0x28/0x80
>>> [13900.163959]  rwsem_down_write_failed+0x183/0x3a0
>>> [13900.182741]  ? inode_permission+0xbe/0x180
>>> [13900.200431]  call_rwsem_down_write_failed+0x13/0x20
>>> [13900.219731]  down_write+0x29/0x40
>>> [13900.235849]  path_openat+0x615/0x15c0
>>> [13900.252665]  ? mem_cgroup_commit_charge+0x7a/0x560
>>> [13900.271680]  do_filp_open+0x93/0x100
>>> [13900.288163]  ? __check_object_size+0x15d/0x189
>>> [13900.306276]  do_sys_open+0x186/0x210
>>> [13900.322529]  do_syscall_64+0x53/0x110
>>> [13900.338867]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [13900.358047] RIP: 0033:0x7fa715212c8b
>>> [13900.374306] Code: Bad RIP value.
>>> [13900.389850] RSP: 002b:00007ffc26ba42a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
>>> [13900.414289] RAX: ffffffffffffffda RBX: 00005584ee809978 RCX: 00007fa715212c8b
>>> [13900.437957] RDX: 00000000000000c2 RSI: 00005584ee8198f0 RDI: 00000000ffffff9c
>>> [13900.461660] RBP: 00005584ee8198f0 R08: 0000000000007fdd R09: 0000000000000000
>>> [13900.485361] R10: 00000000000001a0 R11: 0000000000000246 R12: 0000000000000000
>>> [13900.509096] R13: 0000000000000000 R14: 000000000000000a R15: 0000000000000000
>> 
>> I know I can slow down the raid recovery speed; to be able to use the
>> system at all, I actually have to do this:
>> echo 1000 > /proc/sys/dev/raid/speed_limit_min
>> but of course, at 1MB/s, it will take weeks to resync...
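>> 
>> For reference, these are the two knobs involved, both in KB/s per
>> device; speed_limit_max is what actually caps a resync when the disks
>> are otherwise idle:
>> 
>> cat /proc/sys/dev/raid/speed_limit_min     # default 1000
>> cat /proc/sys/dev/raid/speed_limit_max     # default 200000
>> echo 50000 > /proc/sys/dev/raid/speed_limit_max   # e.g. cap resync at ~50MB/s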
>> 
>> At this point, you could ask whether my drives are OK speed-wise; the
>> raid6 resync I showed above already runs at over 80MB/s.
>> 
>> I did some basic I/O read-write tests when the resync wasn't running:
>>> dd if=/dev/mdx of=/dev/null bs=1M count=40000
>>> f=/var/space/test; dd if=/dev/zero of=$f bs=1M count=3000 conv=fdatasync; \rm $f
>>> 
>>> dd read test: /dev/md0 419430400 bytes (419 MB, 400 MiB) copied, 3.13387 s, 134 MB/s, hdparm -t 208.18MB/s
>>> 104857600 bytes (105 MB, 100 MiB) copied, 16.1961 s, 6.5 MB/s
>>> 
>>> /dev/md1 419430400 bytes (419 MB, 400 MiB) copied, 1.58549 s, 265 MB/s, hdparm -t 335.11MB/s
>>> 3145728000 bytes (3.1 GB, 2.9 GiB) copied, 6.51223 s, 483 MB/s
>>> 
>>> /dev/md2 419430400 bytes (419 MB, 400 MiB) copied, 1.75172 s, 239 MB/s, hdparm -t 256.08MB/s
>>> 3145728000 bytes (3.1 GB, 2.9 GiB) copied, 5.25801 s, 598 MB/s
>>> 
>>> /dev/md3 419430400 bytes (419 MB, 400 MiB) copied, 1.81613 s, 231 MB/s, hdparm -t 382.33MB/s
>> 
>> Then, when the resync is running at a mere 4MB/s and apparently eating all the available I/O:
>> newmagic:~# for i in md0 md1 md2 md3; do hdparm -t /dev/$i; done
>> 
>> /dev/md0:
>> HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
>> Timing buffered disk reads: 190 MB in  3.00 seconds =  63.26 MB/sec
>> 
>> /dev/md1:
>> HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
>> Timing buffered disk reads:   4 MB in  3.21 seconds =   1.25 MB/sec
>> /dev/md2:
>> HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
>> Timing buffered disk reads:   6 MB in  9.08 seconds = 676.33 kB/sec
>> 
>> /dev/md3:
>> HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
>> Timing buffered disk reads:   2 MB in 36.15 seconds =  56.65 kB/sec
>> 
>> 
>> I may also have found a bug in software raid during shutdown:
>>> [14847.171978] sysrq: SysRq : Power Off
>>> [14852.341924] WARNING: CPU: 0 PID: 2530 at drivers/md/md.c:8180 md_write_inc+0x15/0x40 [md_mod]
>>> [14852.359192] Modules linked in: fuse ufs qnx4 hfsplus hfs minix vfat msdos fat jfs xfs dm_mod cpuid ipt_MASQUERADE ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_state xt_conntrack nf_log_ipv4 nf_log_common xt_LOG nft_compat nft_counter nft_chain_nat_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_chain_route_ipv4 nf_tables nfnetlink binfmt_misc ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 ipmi_ssif radeon coretemp ttm drm_kms_helper kvm drm evdev dcdbas iTCO_wdt iTCO_vendor_support serio_raw irqbypass sg pcspkr rng_core i2c_algo_bit ipmi_si i5000_edac ipmi_devintf i5k_amb ipmi_msghandler button ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress xxhash raid10 raid0 multipath linear sata_sil24 e1000e r8169 realtek libphy mii uas usb_storage
>>> [14852.502352]  raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid1 raid6_pq libcrc32c crc32c_generic hid_generic bcache crc64 usbhid md_mod hid ses enclosure sr_mod scsi_transport_sas cdrom sd_mod ata_generic uhci_hcd ehci_pci ehci_hcd ata_piix libata psmouse lpc_ich megaraid_sas usbcore scsi_mod usb_common bnx2
>>> [14852.562340] CPU: 0 PID: 2530 Comm: sendmail Not tainted 4.19.0-5-amd64 #1 Debian 4.19.37-5+deb10u2
>>> [14852.580463] Hardware name: Dell Inc. PowerEdge 2950/0DT021, BIOS 2.7.0 10/30/2010
>>> [14852.595607] RIP: 0010:md_write_inc+0x15/0x40 [md_mod]
>>> [14852.605820] Code: ff e8 9f 54 32 f3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 66 66 66 66 90 f6 46 10 01 74 1b 8b 97 c4 01 00 00 85 d2 74 12 <0f> 0b 48 8b 87 e0 02 00 00 a8 03 75 0e 65 48 ff 00 c3 8b 47 40 85
>>> [14852.643807] RSP: 0000:ffffb1c287767ac0 EFLAGS: 00010002
>>> [14852.654378] RAX: ffff9615c93a4cf8 RBX: ffff9615c93a4910 RCX: 0000000000000001
>>> [14852.668807] RDX: 0000000000000001 RSI: ffff96162aa17f00 RDI: ffff961625000000
>>> [14852.683235] RBP: ffff9615c93a4978 R08: 0000000000000000 R09: ffff961624c3a918
>>> [14852.697661] R10: 0000000000000000 R11: ffff961625a1f800 R12: 0000000000000001
>>> [14852.712089] R13: 0000000000000001 R14: ffff961623b6e000 R15: ffff96162aa17f00
>>> [14852.726518] FS:  00007f2ca54d3f40(0000) GS:ffff96162fa00000(0000) knlGS:0000000000000000
>>> [14852.742891] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [14852.754505] CR2: 00007f8e63e501c0 CR3: 000000031fb6e000 CR4: 00000000000006f0
>>> [14852.768931] Call Trace:
>>> [14852.773891]  add_stripe_bio+0x205/0x7c0 [raid456]
>>> [14852.783405]  raid5_make_request+0x1bd/0xb60 [raid456]
>>> [14852.793619]  ? finish_wait+0x80/0x80
>>> [14852.800851]  ? finish_wait+0x80/0x80
>>> [14852.808093]  md_handle_request+0x119/0x190 [md_mod]
>>> [14852.817964]  md_make_request+0x78/0x160 [md_mod]
>>> [14852.827311]  generic_make_request+0x1a4/0x410
>>> [14852.836116]  submit_bio+0x45/0x140
>>> [14852.842991]  ? guard_bio_eod+0x32/0x100
>>> [14852.850747]  submit_bh_wbc+0x163/0x190
>>> [14852.858377]  write_all_supers+0x22f/0xa60 [btrfs]
>>> [14852.867905]  btrfs_commit_transaction+0x581/0x870 [btrfs]
>>> [14852.878819]  ? finish_wait+0x80/0x80
>>> [14852.886071]  btrfs_sync_file+0x380/0x3d0 [btrfs]
>>> [14852.895415]  do_fsync+0x38/0x70
>>> [14852.901764]  __x64_sys_fsync+0x10/0x20
>>> [14852.909342]  do_syscall_64+0x53/0x110
>>> [14852.916742]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [14852.926952] RIP: 0033:0x7f2ca6944a71
>>> [14852.934185] Code: 6d a5 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 0f 1f 80 00 00 00 00 8b 05 da e9 00 00 85 c0 75 16 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3f c3 66 0f 1f 44 00 00 53 89 fb 48 83 ec 10
>>> [14852.972172] RSP: 002b:00007fffe32a0368 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
>>> [14852.987483] RAX: ffffffffffffffda RBX: 000056297ca540d0 RCX: 00007f2ca6944a71
>>> [14853.001908] RDX: 0000000000000000 RSI: 000056297ca541b0 RDI: 0000000000000004
>>> [14853.016334] RBP: 00000000000001d7 R08: 000056297ca541b0 R09: 00007f2ca54d3f40
>>> [14853.030760] R10: 7541203831202c6e R11: 0000000000000246 R12: 000056297bfbe369
>>> [14853.045189] R13: 00007fffe32a03b0 R14: 000000000000000a R15: 0000000000000000
>>> [14853.059617] ---[ end trace 407005be9d52ae9f ]---
>>> [14854.715315] md: md3: recovery interrupted.
>>> [14877.083807] bcache: bcache_reboot() Stopping all devices:
>>> [14879.097334] bcache: bcache_reboot() Timeout waiting for devices to be closed
>>> [14879.111948] sd 4:0:0:0: [sdh] Synchronizing SCSI cache
>>> [14879.122617] sd 4:0:0:0: [sdh] Stopping disk
>>> [14879.615609] sd 3:0:0:0: [sdg] Synchronizing SCSI cache
>>> [14879.626667] sd 3:0:0:0: [sdg] Stopping disk
>>> [14881.520158] sd 0:2:2:0: [sdc] tag#684 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> [14881.538216] sd 0:2:2:0: [sdc] tag#684 CDB: Write(10) 2a 00 13 17 1f e8 00 00 08 00
>>> [14881.553614] print_req_error: I/O error, dev sdc, sector 320282600
>>> [14881.566001] sd 0:2:4:0: [sde] tag#684 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> [14881.583982] sd 0:2:4:0: [sde] tag#684 CDB: Write(10) 2a 00 13 17 1f e8 00 00 08 00
>>> [14881.599303] print_req_error: I/O error, dev sde, sector 320282600
>>> [14881.611638] sd 0:2:5:0: [sdf] tag#684 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> [14881.629587] sd 0:2:5:0: [sdf] tag#684 CDB: Write(10) 2a 00 13 17 1f e8 00 00 08 00
>>> [14881.661536] print_req_error: I/O error, dev sdf, sector 320282600
>>> [14881.690648] sd 0:2:5:0: [sdf] tag#684 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> [14881.725455] sd 0:2:5:0: [sdf] tag#684 CDB: Write(10) 2a 00 13 17 20 00 00 02 80 00
>>> [14881.757640] print_req_error: I/O error, dev sdf, sector 320282624
>>> [14881.786840] sd 0:2:3:0: [sdd] tag#684 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> [14881.821202] sd 0:2:3:0: [sdd] tag#684 CDB: Write(10) 2a 00 13 17 1f e8 00 00 08 00
>>> [14881.852497] print_req_error: I/O error, dev sdd, sector 320282600
>>> [14881.880392] sd 0:2:0:0: [sda] tag#684 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> [14881.913429] sd 0:2:0:0: [sda] tag#684 CDB: Write(10) 2a 00 13 17 1f e8 00 00 08 00
>>> [14881.943303] print_req_error: I/O error, dev sda, sector 320282600
>>> [14881.969675] sd 0:2:1:0: [sdb] tag#684 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> [14882.001626] sd 0:2:1:0: [sdb] tag#684 CDB: Write(10) 2a 00 13 17 1f e8 00 00 08 00
>>> [14882.030904] print_req_error: I/O error, dev sdb, sector 320282600
>>> [14882.057411] sd 0:2:4:0: [sde] tag#299 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> [14882.088845] sd 0:2:4:0: [sde] tag#299 CDB: Write(10) 2a 00 13 17 20 00 00 02 80 00
>>> [14882.117051] print_req_error: I/O error, dev sde, sector 320282624
>>> [14882.142352] sd 0:2:5:0: [sdf] tag#299 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> [14882.142430] sd 0:2:4:0: [sde] tag#300 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> [14882.173313] sd 0:2:5:0: [sdf] tag#299 CDB: Write(10) 2a 00 13 17 22 80 00 01 80 00
>>> [14882.173315] print_req_error: I/O error, dev sdf, sector 320283264
>>> [14882.257818] sd 0:2:4:0: [sde] tag#300 CDB: Write(10) 2a 00 13 17 22 80 00 01 80 00
>>> [14882.286196] print_req_error: I/O error, dev sde, sector 320283264
>>> [14882.372678] md: super_written gets error=10
>>> [14882.394226] md/raid:md2: Disk failure on sdc5, disabling device.
>>> [14882.394226] md/raid:md2: Operation continuing on 5 devices.
>>> [14882.396634] md: super_written gets error=10
>>> [14882.443706] md: super_written gets error=10
>>> [14882.465231] md/raid:md2: Disk failure on sde5, disabling device.
>>> [14882.465231] md/raid:md2: Operation continuing on 4 devices.
>>> [14885.396071] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.423090] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.450404] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.476946] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.503344] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.530389] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.563027] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.589494] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.615995] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.642142] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.667968] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.693224] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.717937] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.743191] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.767407] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14885.792214] bcache: bch_count_backing_io_errors() md2: IO error on backing device, unrecoverable
>>> [14890.416424] btrfs_dev_stat_print_on_error: 1409 callbacks suppressed
>>> [14890.416429] BTRFS error (device bcache0): bdev /dev/bcache0 errs: wr 1417, rd 0, flush 0, corrupt 0, gen 0
>>> [14890.460838] BTRFS error (device bcache0): bdev /dev/bcache0 errs: wr 1418, rd 0, flush 0, corrupt 0, gen 0
>>> [14890.486347] BTRFS error (device bcache0): bdev /dev/bcache0 errs: wr 1419, rd 0, flush 0, corrupt 0, gen 0
>>> [14890.511308] BTRFS error (device bcache0): bdev /dev/bcache0 errs: wr 1420, rd 0, flush 0, corrupt 0, gen 0
>>> [14890.536129] Emergency Sync complete
>>> [14891.398791] ACPI: Preparing to enter system sleep state S5
>>> [14891.460410] reboot: Power down
>>> [14891.471830] acpi_power_off called
>> 
>> 
>> megacli -LdPdInfo -a0 output for the first drive is below:
>>> Number of Virtual Disks: 6
>>> Virtual Drive: 0 (Target Id: 0)
>>> Name                :0
>>> RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
>>> Size                : 1.818 TB
>>> Sector Size         : 512
>>> Parity Size         : 0
>>> State               : Optimal
>>> Strip Size          : 64 KB
>>> Number Of Drives    : 1
>>> Span Depth          : 1
>>> Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
>>> Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
>>> Default Access Policy: Read/Write
>>> Current Access Policy: Read/Write
>>> Disk Cache Policy   : Disabled
>>> Encryption Type     : None
>>> Is VD Cached: No
>>> Number of Spans: 1
>>> Span: 0 - Number of PDs: 1
>>> 
>>> PD: 0 Information
>>> Enclosure Device ID: 8
>>> Slot Number: 0
>>> Drive's position: DiskGroup: 0, Span: 0, Arm: 0
>>> Enclosure position: N/A
>>> Device Id: 0
>>> WWN: 
>>> Sequence Number: 2
>>> Media Error Count: 0
>>> Other Error Count: 1
>>> Predictive Failure Count: 0
>>> Last Predictive Failure Event Seq Number: 0
>>> PD Type: SATA
>>> 
>>> Raw Size: 1.819 TB [0xe8e088b0 Sectors]
>>> Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
>>> Coerced Size: 1.818 TB [0xe8d00000 Sectors]
>>> Sector Size:  0
>>> Firmware state: Online, Spun Up
>>> Device Firmware Level: AB50
>>> Shield Counter: 0
>>> Successful diagnostics completion on :  N/A
>>> SAS Address(0):
>>> 0x1221000000000000
>>> Connected Port Number: 0 
>>> Inquiry Data:      WD-WMAZA0374092WDC WD20EARS-00MVWB0                    50.0AB50
>>> FDE Capable: Not Capable
>>> FDE Enable: Disable
>>> Secured: Unsecured
>>> Locked: Unlocked
>>> Needs EKM Attention: No
>>> Foreign State: None 
>>> Device Speed: Unknown 
>>> Link Speed: Unknown 
>>> Media Type: Hard Disk Device
>>> Drive Temperature : N/A
>>> PI Eligibility:  No 
>>> Drive is formatted for PI information:  No
>>> PI: No PI
>>> Port-0 :
>>> Port status: Active
>>> Port's Linkspeed: Unknown 
>>> Drive has flagged a S.M.A.R.T alert : No
>> 
>> Thanks,
>> Marc
>> -- 
>> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>> Microsoft is to operating systems ....
>>                                     .... what McDonalds is to gourmet cooking
>> Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08




