Hello all mvsas testers,

Some good test progress related to mvsas and the Srinivas patch: I have dumped 10TB of data onto a 10.5TB net softraid RAID6 (9x 1.5TB) volume while the initial resync was running. 8 drives were connected to an mvsas controller, 1 to sata_nv. No XFS corruption, no drives kicked out of the raid, and the raid synced without problems.

I'm now repeating the experiment with a RAID6 based on 12 1.5TB drives: 8 on mvsas, 4 on sata_nv (a recap of the setup commands is at the bottom of this mail). All of the mvsas-attached drives remain stable, but every device connected to mvsas except device[4] has experienced one or more mvs_I_T_nexus_reset:

drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[0]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[2]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[1]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[7]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[3]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[2]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[7]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[6]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[3]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[3]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[2]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[3]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[3]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[1]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[7]:rc= 0
drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[5]:rc= 0

(time window = +/- 1 day)

This does not seem to affect the raid; all drives are still in the set after 3TB of copying. Are the above warnings something to worry about?
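For other testers wondering the same thing, a quick way to check whether the resets have actually cost the array any members (my array is /dev/md2, as in the mail below; adjust device and array names for your own setup):

  # resync progress and member status ([UUUU...] = all members up, _ = failed)
  watch -n 60 'cat /proc/mdstat'

  # overall array state and failed-device count
  mdadm --detail /dev/md2 | grep -iE 'state|failed'

  # count the nexus resets per device so far (pattern matches the messages above)
  dmesg | grep -o 'mvs_I_T_nexus_reset for device\[[0-9]*\]' | sort | uniq -c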
On Thu, Feb 25, 2010 at 2:35 PM, Audio Haven <audiohaven@xxxxxxxxx> wrote:
> Hello,
>
> I have similar findings related to the Srinivas patch.
>
> 1) With the patch from Srinivas applied to 2.6.32.7, I cannot get my
> raid6 on its knees (yet). Using 8 drives on the Marvell controller, 1
> drive on the onboard sata_nv.
>
> Created raid6:
> mdadm --create /dev/md2 --verbose --level=6 --chunk=1024
> --raid-devices=9 /dev/sd[bcdefghij]1
>
> XFS on top:
> mkfs.xfs -f -d su=1m,sw=7 /dev/md2
>
> During the first raid resync, I'm also dumping 2TB of data on this
> 11TB xfs volume. It no longer drops drives. Currently copied 1.4T
> without glitches.
>
> So if I can fill my 11TB volume with data, no drives are ever
> kicked out, and XFS does not get corrupted, this patch is a huge
> improvement. But it will take some more days to fill up. I'll report
> the status when done.
>
> Thanks!
>
> On Tue, Feb 23, 2010 at 11:11 AM, Caspar Smit <c.smit@xxxxxxxxxx> wrote:
>> Hi Srinivas,
>>
>> I finally had some time to test your new patch.
>>
>> 1) After numerous hotplug actions with SAS and SATA disks I still can't
>> get any kernel panic to occur :)
>>
>> 2) I can finally boot a system with 3x 6480 controllers loaded with SATA
>> disks without a kernel panic.
>>
>> 3) RAID5/6 initialization completes without dropping the disks one after
>> another.
>>
>> 4) One thing that occurred was the following: during a raid1 initialization
>> of 2 SAS disks and a raid5 init of 8x SSDs, I got a call trace from
>> libata-core.c (see attachment for details). The system continued to work
>> fine after the trace.
>>
>> Great work, this is a much more stable driver now!
>>
>> Kind regards,
>> Caspar Smit
>>
>>> On Wed, Feb 17, 2010 at 12:53 PM, Srinivas Naga Venkatasatya
>>> Pasagadugula - ERS, HCL Tech <satyasrinivasp@xxxxxx> wrote:
>>>> Hi Smit,
>>>>
>>>> This patch is not exactly a replacement for the Nov-09 patches.
>>>> My patch addresses the RAID5/6 issues as well. The following issues
>>>> are addressed by my patch:
>>>> 1. Tape issues.
>>>> 2. RAID-5/6 I/O fails.
>>>> 3. LVM I/O fails and subsequent init 6 hang (connect SAS+SATA in cascaded
>>>> expanders, create a volume group and logical volumes, run file I/O
>>>> (alltest), unplug one drive).
>>>> 4. Disk stress I/O on 4096 sector size.
>>>> 5. Hot insertion of drives causing a panic.
>>>> 6. 'fdisk -l' hangs and I/O stops when hot plugging SATA/SAS drives in an
>>>> expander while I/O (Diskstress and alltest) is running.
>>>>
>>>> I can't combine my patch with the November-09 patches. James also rejected
>>>> those patches as they were not proper. Let me know if you have issues
>>>> with my patch.
>>>>
>>>> --Srini.
>>>
>>> I haven't tested yet, but it looks like you're doing excellent work, and
>>> your documentation/overview of the work is superb.
>>
>
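PS: for anyone who wants to reproduce this kind of stress test, below is a rough recap of the sequence described in this thread, scaled to the 12-drive layout. It is only a sketch: the device list, mount point, data source and kernel log path are placeholders, su must match the md chunk size, and sw is the number of data disks (raid-devices minus 2 for RAID6).

  # create the 12-drive RAID6 (8 drives on mvsas, 4 on sata_nv)
  mdadm --create /dev/md2 --verbose --level=6 --chunk=1024 \
        --raid-devices=12 /dev/sd[b-m]1

  # XFS on top, stripe-aligned to the md layout: 1024k chunk -> su=1m, 12-2=10 data disks -> sw=10
  mkfs.xfs -f -d su=1m,sw=10 /dev/md2
  mount /dev/md2 /mnt/raid

  # start dumping data while the initial resync is still running ...
  rsync -a /data/source/ /mnt/raid/ &

  # ... and watch for mvsas resets or dropped drives in another terminal
  tail -f /var/log/kern.log | grep --line-buffered mvsas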