Several LIO(/mdadm) issues


 



Hi,

running into some weird issues. At first I set up a 12-disk md RAID-10 (/dev/md0) and exported it with LIO using buffered fileio. It did 119 MB/s (GbE saturated) with just one IP/portal.

After a reboot, a disk had been removed from the md array, and trying to re-add it segfaulted mdadm. The disk has some bad sectors in the first 4096-sector region, but I've never seen mdadm segfault on that before. Rebooted again; the array is now degraded. I added another IP on the portal and activated multipathing, and all looked well, but performance while creating an eagerly zeroed VMDK is now 8-9 MB/s instead of 119 MB/s, which is quite a difference. I tried switching paths, but that made no difference.

I suspect the backstore is no longer buffered, but I don't know how to verify this; after the initial creation, the setting no longer seems to be visible in targetcli.
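For what it's worth, one way I've been meaning to try is reading each fileio backstore's `info` attribute straight out of configfs. This is a sketch based on the usual LIO layout (/sys/kernel/config/target/core/fileio_N/&lt;name&gt;/info); both the path and the exact wording of the mode line are assumptions that may vary by kernel version, so it greps loosely:

```shell
# Sketch: scan fileio backstores for a mode line in their configfs
# 'info' attribute. The base path defaults to the usual LIO core dir;
# pass a different one if your configfs is mounted elsewhere.
check_fileio_mode() {
    base=${1:-/sys/kernel/config/target/core}
    for f in "$base"/fileio_*/*/info; do
        [ -r "$f" ] || continue
        printf '%s: ' "$f"
        # Wording differs across kernels, so match any "Mode..." text.
        grep -io 'mode.*' "$f" || echo '(no mode line found)'
    done
}
# check_fileio_mode    # run as-is on the target box
```

If the mode line says synchronous rather than buffered after a reboot, that would confirm the setting isn't being restored.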

What's even worse is that the mdadm rebuild dropped to 1 MB/s while the iSCSI initiator was doing 8-9 MB/s. iostat -x 2 showed disk utilization (the last column) around 20% on average, with nothing above 25-30%; one would expect that to leave plenty of headroom for md to at least exceed 1 MB/s (the minimum rebuild/sync speed), but it did not. I don't know how accurate these iostat values are, but I can tell you it does not get nearly this bad with IET. Not by a long shot.

By the way, I had also never seen mdadm segfault on a bad disk until now. I had some issues in the past with the 3.2 kernel in combination with mdadm as well; I hadn't seen such issues in 15+ years, and only when used with LIO, though that might be coincidence. At that point I went back to IET, and was hoping that 3.5 on Ubuntu, having been out for ~3 months now, would have stabilized a bit.

This array is a 12-disk RAID-10 consisting of 1TB SAS drives.

On another target, which is less important to me, I see a similar drop in performance (hence my suspicion that buffered mode is not being restored; I can't see this in targetcli though). I wanted to copy the configfs tree for the target so I could diff the two after setting it up again, but cp refuses because the files keep changing underneath it (all of them).
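Since cp chokes on configfs, the diff I was after can be approximated by dumping each readable file by hand into a flat snapshot. A sketch, assuming the usual /sys/kernel/config/target mount point (write-only attributes are simply marked as unreadable):

```shell
# Dump every readable file under a configfs tree into one flat text
# file, so two snapshots taken before/after a change can be diffed.
snapshot_cfs() {
    src=$1; out=$2
    : > "$out"
    find "$src" -type f 2>/dev/null | sort | while read -r f; do
        printf '== %s ==\n' "$f" >> "$out"
        # Some configfs attributes are write-only; note and skip them.
        cat "$f" >> "$out" 2>/dev/null || echo '(unreadable)' >> "$out"
    done
}
# Usage (paths are examples):
#   snapshot_cfs /sys/kernel/config/target /tmp/lio-before.txt
#   ...delete and recreate the backstore...
#   snapshot_cfs /sys/kernel/config/target /tmp/lio-after.txt
#   diff -u /tmp/lio-before.txt /tmp/lio-after.txt
```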

Anyway, I figured I'd just quickly delete the backstore and recreate it. After 20 minutes the delete still hangs:

/backstores/fileio> ls
o- fileio .................................................. [2 Storage Objects]
  o- BACKUPVOL1 ............................................ [/dev/md4 activated]
  o- BACKUPVOL2 ............................................ [/dev/md5 activated]
/backstores/fileio> delete BACKUPVOL2
^C
^C
^C
^C
<remains hanging>

Although the delete still hangs, the I/O on the device died immediately: all performance counters for the volume flat-lined at once.

Starting targetcli at this point from another console hangs too:

 Copyright (c) 2011 by RisingTide Systems LLC.

Visit us at http://www.risingtidesystems.com.

Using qla2xxx fabric module.
Using loopback fabric module.
Using iscsi fabric module.
<hangs>


So basically I'm left with some questions:
* How prime-time ready is LIO? The VMware Ready certification that some devices get with it seems to imply something very different from what I'm seeing now.
* Can I verify that buffered mode is on? Synchronous iSCSI kills performance, that is well known. IIRC buffered mode on blockio was removed, but should have returned in 3.7; did that actually happen? I'll try the 3.7 kernel with buffered blockio if it exists. I know the risks, don't bother :).
* Why are there weird issues with mdadm, like segfaults and huge sync performance drops?

This is all running on Ubuntu 12.10 Server (64-bit), as I wanted/needed a reasonably recent kernel for LIO and don't really do anything else with the box anyway. Fully updated yesterday.

I'll be able to test/debug some things for maybe a couple of days; any advice is appreciated :). After that I'll need it running again, which will probably mean moving back to IET.

Kind regards,
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

