On Tue, Mar 2, 2010 at 1:31 AM, Ilia Mirkin <imirkin@xxxxxxxxxxxx> wrote:
> On Mon, Mar 1, 2010 at 11:24 PM, Srinivas Naga Venkatasatya
> Pasagadugula - ERS, HCL Tech <satyasrinivasp@xxxxxx> wrote:
>> Hi,
>>
>> Did you try the latest mvsas patch submitted by me? If not, try the
>> attached patch and let me know if you have any issues.
>> Note: Apply this patch on 2.6.32 kernels.
>
> Thanks for the suggestion. Graham Reed also suggested I try your
> patch. I glanced at the patch -- it seems fairly similar to patch 6/7
> from Andy Yan from Nov 2009, although with some differences. I will
> try both your patch and also Andy Yan's patch series (separately) and
> see how it goes.

I've now tried patches 1-6 from Andy Yan as well as your patch, Srinivas,
and so far -- no go. Andy's patches actually lasted through 1.5 runs of my
"dd" test (i.e. dd'ing both ways from all but one of the drives); yours
only made it ~10% of the way through the first run. (For all I know it's
some random condition that triggers it, so on a "luckier" run the two
patchsets might have had the opposite success levels.) Either way, there's
no notable stability improvement from either set of patches.
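For anyone who wants to compare runs by something more precise than "how far dd got", one option is to tally the libsas timeout lines per second of kernel uptime. What follows is only a minimal Python sketch, not something that was run as part of these tests. The names `tally_timeouts`, `TIMEOUT_RE` and `SAMPLE` are made up for illustration, the sample lines are taken from the logs in this message, and the pattern deliberately tolerates a token that a mail client has broken in the middle.

```python
import re
from collections import Counter

# Matches libsas "timed out" lines. The optional whitespace inside the token
# copes with "BLK_EH_NOT_HANDLED" having been split by mail line-wrapping.
TIMEOUT_RE = re.compile(
    r"\[\s*(\d+)\.\d+\]\s+sas: command 0x[0-9a-f]+, task 0x[0-9a-f]+, "
    r"timed out: BLK_EH_NOT_HA\s*NDLED"
)


def tally_timeouts(log_text):
    """Return a Counter mapping kernel uptime (whole seconds) to timeouts."""
    per_second = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        per_second[int(match.group(1))] += 1
    return per_second


# Sample lines from this message; the first one is shown in wrapped form.
SAMPLE = """\
Mar 2 20:12:32 172.16.0.35 [ 4634.237122] sas: command 0xffff880061ad5900, task 0xffff880100217cc0, timed out: BLK_EH_NOT_HA NDLED
Mar 2 20:12:32 172.16.0.35 [ 4634.237629] sas: command 0xffff88033c519f00, task 0xffff88033d214b40, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.230715] sas: command 0xffff8803bb2ee200, task 0xffff8803511ce1c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.269769] sas: Enter sas_scsi_recover_host
"""

if __name__ == "__main__":
    for second, count in sorted(tally_timeouts(SAMPLE).items()):
        print(second, count)
```

Feeding each run's full log through this would give a per-burst count of timed-out commands, which is easier to compare across patch sets than eyeballing the raw output.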
<sorry about linewrapping nastiness below...>

Here are the errors with Srinivas's patch running:

Mar 2 20:12:32 172.16.0.35 [ 4634.237122] sas: command 0xffff880061ad5900, task 0xffff880100217cc0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:32 172.16.0.35 [ 4634.237629] sas: command 0xffff88033c519f00, task 0xffff88033d214b40, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:32 172.16.0.35 [ 4634.238097] sas: command 0xffff88033c519c00, task 0xffff88033d2158c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:32 172.16.0.35 [ 4634.238565] sas: command 0xffff880061ad5100, task 0xffff880100216ac0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.230715] sas: command 0xffff8803bb2ee200, task 0xffff8803511ce1c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.231212] sas: command 0xffff880061ad4600, task 0xffff880351093840, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.231694] sas: command 0xffff880061ad5700, task 0xffff880351092f40, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.232173] sas: command 0xffff880061ad4900, task 0xffff8803510906c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.232715] sas: command 0xffff88033e5ad700, task 0xffff880351091440, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.233186] sas: command 0xffff88033e5adb00, task 0xffff880351090d80, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.233723] sas: command 0xffff88033d0e3f00, task 0xffff880351090fc0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.233726] sas: command 0xffff88033d0e3600, task 0xffff880351091b00, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.242725] sas: command 0xffff88033d0e3200, task 0xffff880351093600, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.243223] sas: command 0xffff88033d675100, task 0xffff88033d00cb40, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.243703] sas: command 0xffff880061ad4000, task 0xffff880351090240, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.244187] sas: command 0xffff880061ad4e00, task 0xffff880351091f80, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.254662] sas: command 0xffff88033d03c800, task 0xffff88033d7ba640, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.255135] sas: command 0xffff88033e5adc00, task 0xffff880351091200, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.257661] sas: command 0xffff880061ad5000, task 0xffff880351092d00, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.258134] sas: command 0xffff880061ad5400, task 0xffff880351090900, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.263541] sas: command 0xffff880061ad4200, task 0xffff880351093180, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.264012] sas: command 0xffff88033d0cb600, task 0xffff88033d24aac0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.264473] sas: command 0xffff88033d0ca800, task 0xffff88033d24a1c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.264930] sas: command 0xffff880061ad5800, task 0xffff880351093a80, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.265402] sas: command 0xffff88033d0e2e00, task 0xffff880351091d40, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.265862] sas: command 0xffff88033d0e2a00, task 0xffff880351091680, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.266347] sas: command 0xffff88033d0e3400, task 0xffff880351090480, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.266617] sas: command 0xffff88033d0e2700, task 0xffff8803510921c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.266620] sas: command 0xffff88033e5ad800, task 0xffff880351092ac0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.266623] sas: command 0xffff880061ad5600, task 0xffff880351090b40, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.266626] sas: command 0xffff880061ad5300, task 0xffff8803510918c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.268789] sas: command 0xffff8803bb2eed00, task 0xffff8803511cc000, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.269272] sas: command 0xffff8803bb2eec00, task 0xffff8803511cd8c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:12:35 172.16.0.35 [ 4637.269769] sas: Enter sas_scsi_recover_host
Mar 2 20:12:35 172.16.0.35 [ 4637.270036] sas: trying to find task 0xffff880100217cc0
Mar 2 20:12:35 172.16.0.35 [ 4637.270309] sas: sas_scsi_find_task: aborting task 0xffff880100217cc0
Mar 2 20:12:35 172.16.0.35 [ 4637.270314] sas: sas_scsi_find_task: querying task 0xffff880100217cc0
Mar 2 20:12:35 172.16.0.35 [ 4637.270316] drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
Mar 2 20:12:35 172.16.0.35 [ 4637.270318] sas: sas_scsi_find_task: task 0xffff880100217cc0 failed to abort
Mar 2 20:12:35 172.16.0.35 [ 4637.270319] sas: task 0xffff880100217cc0 is not at LU: I_T recover
Mar 2 20:12:35 172.16.0.35 [ 4637.270321] sas: I_T nexus reset for dev 5001b4d5021b401b
Mar 2 20:12:36 172.16.0.35 [ 4638.268757] sas: broadcast received: 0
Mar 2 20:12:36 172.16.0.35 [ 4638.269022] sas: REVALIDATING DOMAIN on port 0, pid:1110
Mar 2 20:12:39 172.16.0.35 [ 4641.279345] sas: Expander phy change count has changed
Mar 2 20:12:39 172.16.0.35 [ 4641.297198] sas: ex 5001b4d5021b403f phy27 originated BROADCAST(CHANGE)
Mar 2 20:12:39 172.16.0.35 [ 4641.297835] sas: ex 5001b4d5021b403f phy 0x1b broadcast flutter
Mar 2 20:12:39 172.16.0.35 [ 4641.298473] sas: ex 5001b4d5021b403f phy27:T attached: 5001b4d5021b401b
Mar 2 20:12:39 172.16.0.35 [ 4641.299743] sas: done REVALIDATING DOMAIN on port 0, pid:1110, res 0x0
Mar 2 20:12:39 172.16.0.35 [ 4641.300027] sas: broadcast received: 0
Mar 2 20:12:39 172.16.0.35 [ 4641.300300] sas: broadcast received: 0
Mar 2 20:12:39 172.16.0.35 [ 4641.300570] sas: broadcast received: 0
Mar 2 20:12:39 172.16.0.35 [ 4641.300832] sas: REVALIDATING DOMAIN on port 0, pid:1110
Mar 2 20:12:39 172.16.0.35 [ 4641.301193] sas: done REVALIDATING DOMAIN on port 0, pid:1110, res 0x0
Mar 2 20:12:41 172.16.0.35 [ 4643.258099] drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[c]:rc= 0
Mar 2 20:12:41 172.16.0.35 [ 4643.258581] drivers/scsi/mvsas/mv_sas.c 1968:Release slot [19] tag[19], task [ffff88033d214b40]:
Mar 2 20:12:41 172.16.0.35 [ 4643.259058] drivers/scsi/mvsas/mv_sas.c 1968:Release slot [1a] tag[1a], task [ffff88033d2158c0]:
Mar 2 20:12:41 172.16.0.35 [ 4643.259530] drivers/scsi/mvsas/mv_sas.c 1968:Release slot [12] tag[12], task [ffff880100216ac0]:
Mar 2 20:12:41 172.16.0.35 [ 4643.260014] sas: I_T 5001b4d5021b401b recovered
Mar 2 20:12:41 172.16.0.35 [ 4643.260277] sas: sas_ata_task_done: SAS error 8d
Mar 2 20:12:41 172.16.0.35 [ 4643.260539] ata12: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 2 20:12:41 172.16.0.35 [ 4643.261011] ata12: status=0x01 {
Mar 2 20:12:41 172.16.0.35 Error
Mar 2 20:12:41 172.16.0.35 }
Mar 2 20:12:41 172.16.0.35 [ 4643.261354] ata12: error=0x04 {
Mar 2 20:12:41 172.16.0.35 DriveStatusError
Mar 2 20:12:41 172.16.0.35 }
Mar 2 20:12:41 172.16.0.35 [ 4643.261703] sas: sas_ata_task_done: SAS error 8d
Mar 2 20:12:41 172.16.0.35 [ 4643.261961] ata12: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 2 20:12:41 172.16.0.35 [ 4643.262437] ata12: status=0x01 {
Mar 2 20:12:41 172.16.0.35 Error
Mar 2 20:12:41 172.16.0.35 }
Mar 2 20:12:41 172.16.0.35 [ 4643.262782] ata12: error=0x04 {
Mar 2 20:12:41 172.16.0.35 DriveStatusError
Mar 2 20:12:41 172.16.0.35 }
Mar 2 20:12:41 172.16.0.35 [ 4643.263131] sas: sas_ata_task_done: SAS error 8d

... this goes on for a while ...
Mar 2 20:13:11 172.16.0.35 [ 4673.276562] sas: trying to find task 0xffff880351091d40
Mar 2 20:13:11 172.16.0.35 [ 4673.276835] sas: sas_scsi_find_task: aborting task 0xffff880351091d40
Mar 2 20:13:11 172.16.0.35 [ 4673.277113] sas: sas_scsi_find_task: querying task 0xffff880351091d40
Mar 2 20:13:11 172.16.0.35 [ 4673.277381] drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
Mar 2 20:13:11 172.16.0.35 [ 4673.277661] sas: sas_scsi_find_task: task 0xffff880351091d40 failed to abort
Mar 2 20:13:11 172.16.0.35 [ 4673.277932] sas: task 0xffff880351091d40 is not at LU: I_T recover
Mar 2 20:13:11 172.16.0.35 [ 4673.278218] sas: I_T nexus reset for dev 5001b4d5021b401a
Mar 2 20:13:15 172.16.0.35 [ 4677.270363] drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[b]:rc= 0
Mar 2 20:13:15 172.16.0.35 [ 4677.270910] drivers/scsi/mvsas/mv_sas.c 1968:Release slot [5] tag[5], task [ffff880351091680]:
Mar 2 20:13:15 172.16.0.35 [ 4677.271392] drivers/scsi/mvsas/mv_sas.c 1968:Release slot [11] tag[11], task [ffff880351090480]:
Mar 2 20:13:15 172.16.0.35 [ 4677.271920] sas: I_T 5001b4d5021b401a recovered
Mar 2 20:13:15 172.16.0.35 [ 4677.272189] sas: sas_ata_task_done: SAS error 8d
Mar 2 20:13:15 172.16.0.35 [ 4677.272453] ata11: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 2 20:13:15 172.16.0.35 [ 4677.272834] sas: ex 5001b4d5021b403f phy9 originated BROADCAST(CHANGE)
Mar 2 20:13:15 172.16.0.35 [ 4677.273191] ata11: status=0x01 {
Mar 2 20:13:15 172.16.0.35 Error
Mar 2 20:13:15 172.16.0.35 }
Mar 2 20:13:15 172.16.0.35 [ 4677.273475] sas: ex 5001b4d5021b403f phy 0x9 broadcast flutter
Mar 2 20:13:15 172.16.0.35 [ 4677.273793] ata11: error=0x04 {
Mar 2 20:13:15 172.16.0.35 DriveStatusError
Mar 2 20:13:15 172.16.0.35 }
...
Mar 2 20:13:23 172.16.0.35 [ 4685.286128] sas: ex 5001b4d5021b403f phy8 originated BROADCAST(CHANGE)
Mar 2 20:13:23 172.16.0.35 [ 4685.286766] sas: ex 5001b4d5021b403f phy 0x8 broadcast flutter
Mar 2 20:13:23 172.16.0.35 [ 4685.287406] sas: ex 5001b4d5021b403f phy08:D attached: 5001b4d5021b4008
Mar 2 20:13:23 172.16.0.35 [ 4685.298244] sas: ex 5001b4d5021b403f phy25 originated BROADCAST(CHANGE)
Mar 2 20:13:23 172.16.0.35 [ 4685.298882] sas: ex 5001b4d5021b403f phy 0x19 broadcast flutter
Mar 2 20:13:23 172.16.0.35 [ 4685.299521] sas: ex 5001b4d5021b403f phy25:T attached: 5001b4d5021b4019
Mar 2 20:13:23 172.16.0.35 [ 4685.302066] sas: done REVALIDATING DOMAIN on port 0, pid:1110, res 0x0
Mar 2 20:13:23 172.16.0.35 [ 4685.302070] sas: broadcast received: 0
Mar 2 20:13:23 172.16.0.35 [ 4685.302076] sas: broadcast received: 0
Mar 2 20:13:23 172.16.0.35 [ 4685.302079] sas: broadcast received: 0
Mar 2 20:13:23 172.16.0.35 [ 4685.302083] sas: broadcast received: 0
Mar 2 20:13:23 172.16.0.35 [ 4685.302086] sas: REVALIDATING DOMAIN on port 0, pid:1110
Mar 2 20:13:23 172.16.0.35 [ 4685.329480] sas: Expander phys DID NOT change
Mar 2 20:13:23 172.16.0.35 [ 4685.329742] sas: done REVALIDATING DOMAIN on port 0, pid:1110, res 0x0
Mar 2 20:13:54 172.16.0.35 [ 4716.104914] sas: command 0xffff880061ad5100, task 0xffff88033d00fa80, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:13:54 172.16.0.35 [ 4716.105423] sas: command 0xffff88033c519c00, task 0xffff88033d00f840, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:13:54 172.16.0.35 [ 4716.105945] sas: command 0xffff88033c519f00, task 0xffff88033d00e1c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:13:54 172.16.0.35 [ 4716.106410] sas: command 0xffff880061ad5900, task 0xffff8803510918c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:13:57 172.16.0.35 [ 4719.068583] sas: command 0xffff8803bb2efd00, task 0xffff8803511cd440, timed out: BLK_EH_NOT_HANDLED
Mar 2 20:13:57 172.16.0.35 [ 4719.069074] sas: command 0xffff8803bb2ef900, task 0xffff8803511cf3c0, timed out: BLK_EH_NOT_HA
...
Mar 2 20:13:57 172.16.0.35 [ 4719.103355] sas: Enter sas_scsi_recover_host
Mar 2 20:13:57 172.16.0.35 [ 4719.103647] sas: trying to find task 0xffff88033d00fa80
Mar 2 20:13:57 172.16.0.35 [ 4719.103929] sas: sas_scsi_find_task: aborting task 0xffff88033d00fa80
Mar 2 20:13:57 172.16.0.35 [ 4719.104220] sas: sas_scsi_find_task: querying task 0xffff88033d00fa80
Mar 2 20:13:57 172.16.0.35 [ 4719.104514] drivers/scsi/mvsas/mv_sas.c 1630:mvs_query_task:rc= 5
Mar 2 20:13:57 172.16.0.35 [ 4719.104789] sas: sas_scsi_find_task: task 0xffff88033d00fa80 failed to abort
Mar 2 20:13:57 172.16.0.35 [ 4719.105080] sas: task 0xffff88033d00fa80 is not at LU: I_T recover
Mar 2 20:13:57 172.16.0.35 [ 4719.105366] sas: I_T nexus reset for dev 5001b4d5021b401b
Mar 2 20:14:01 172.16.0.35 [ 4723.097210] drivers/scsi/mvsas/mv_sas.c 1584:mvs_I_T_nexus_reset for device[c]:rc= 0
Mar 2 20:14:01 172.16.0.35 [ 4723.097399] sas: broadcast received: 0
Mar 2 20:14:01 172.16.0.35 [ 4723.097405] sas: broadcast received: 0
Mar 2 20:14:01 172.16.0.35 [ 4723.097408] sas: broadcast received: 0
Mar 2 20:14:01 172.16.0.35 [ 4723.097411] sas: broadcast received: 0
Mar 2 20:14:01 172.16.0.35 [ 4723.097414] sas: REVALIDATING DOMAIN on port 0, pid:1110
Mar 2 20:14:01 172.16.0.35 [ 4723.099314] drivers/scsi/mvsas/mv_sas.c 1968:Release slot [1d] tag[1d], task [ffff88033d00f840]:
Mar 2 20:14:01 172.16.0.35 [ 4723.099820] drivers/scsi/mvsas/mv_sas.c 1968:Release slot [1e] tag[1e], task [ffff88033d00e1c0]:
Mar 2 20:14:01 172.16.0.35 [ 4723.100281] drivers/scsi/mvsas/mv_sas.c 1968:Release slot [20] tag[20], task [ffff8803510918c0]:
Mar 2 20:14:01 172.16.0.35 [ 4723.100754] sas: I_T 5001b4d5021b401b recovered
Mar 2 20:14:01 172.16.0.35 [ 4723.101016] sas: sas_ata_task_done: SAS error 8d
Mar 2 20:14:01 172.16.0.35 [ 4723.101301] ata12: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 2 20:14:01 172.16.0.35 [ 4723.101801] ata12: status=0x01 {
Mar 2 20:14:01 172.16.0.35 Error
Mar 2 20:14:01 172.16.0.35 }
Mar 2 20:14:01 172.16.0.35 [ 4723.102234] ata12: error=0x04 {
Mar 2 20:14:01 172.16.0.35 DriveStatusError
Mar 2 20:14:01 172.16.0.35 }

and so on -- I let it run for ~5 mins before giving up. The system was
basically frozen by this point -- ssh'ing in/etc didn't work, existing
terminals didn't respond.

Errors from the run with Andy Yan's patches applied:

Mar 2 18:39:53 172.16.0.35 [44315.684919] sas: command 0xffff88006e9a1d00, task 0xffff88006121e840, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:53 172.16.0.35 [44315.685391] sas: command 0xffff88006e9a0100, task 0xffff88006121d180, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:53 172.16.0.35 [44315.685903] sas: command 0xffff8803dd844500, task 0xffff88063d1a6d80, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:53 172.16.0.35 [44315.686415] sas: command 0xffff8803dd844800, task 0xffff88063d1a72c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.672530] sas: command 0xffff8802cd05fb00, task 0xffff88006121e4c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.672996] sas: command 0xffff8802cd05f400, task 0xffff88006121cfc0, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.673523] sas: command 0xffff8802cd05ec00, task 0xffff88006121e300, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.674049] sas: command 0xffff8802cd05ee00, task 0xffff88006121c700, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.674575] sas: command 0xffff88006e9a1c00, task 0xffff8801003064c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.675110] sas: command 0xffff88006e9a1700, task 0xffff880100307640, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.675568] sas: command 0xffff88006e9a0200, task 0xffff880100305500, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.675573] sas: command 0xffff88063e4a3100, task 0xffff880627db4380, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.675575] sas: command 0xffff88063e4a2000, task 0xffff880627db6840, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.677270] sas: command 0xffff88006e9a0d00, task 0xffff880100305180, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.677804] sas: command 0xffff88006e9a1b00, task 0xffff880100307b80, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.678332] sas: command 0xffff88006e9a0000, task 0xffff880100304a80, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.678512] sas: command 0xffff8802cd05ff00, task 0xffff88006121d340, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.678514] sas: command 0xffff8802cd05f500, task 0xffff88006121e680, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.678516] sas: command 0xffff8802cd05ea00, task 0xffff88006121ed80, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.678519] sas: command 0xffff88006e9a1100, task 0xffff880100305340, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.678521] sas: command 0xffff88006e9a0f00, task 0xffff8801003056c0, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.678524] sas: command 0xffff88021396bc00, task 0xffff88033cf52680, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.681997] sas: command 0xffff8802cd05f000, task 0xffff88006121e140, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.682516] sas: command 0xffff8802cd05fe00, task 0xffff88006121c000, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.683036] sas: command 0xffff8802cd05e500, task 0xffff88006121f100, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.683635] sas: command 0xffff8802cd05f700, task 0xffff88006121f480, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.684166] sas: command 0xffff8802cd05e900, task 0xffff88006121fb80, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.684695] sas: command 0xffff8802cd05f100, task 0xffff88006121ddc0, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.687693] sas: command 0xffff88006e9a0a00, task 0xffff880100305a40, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.688157] sas: command 0xffff88006e9a0c00, task 0xffff880100304380, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.714441] sas: command 0xffff88063e4a3600, task 0xffff880627db4700, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.714904] sas: command 0xffff8802cd05e400, task 0xffff88006121f800, timed out: BLK_EH_NOT_HANDLED
Mar 2 18:39:56 172.16.0.35 [44318.715545] sas: Enter sas_scsi_recover_host
Mar 2 18:39:56 172.16.0.35 [44318.715798] sas: trying to find task 0xffff88006121e840
Mar 2 18:39:56 172.16.0.35 [44318.716051] sas: sas_scsi_find_task: aborting task 0xffff88006121e840
Mar 2 18:39:56 172.16.0.35 [44318.716305] mvs_abort_task() mvi=ffff88063da00000 task=ffff88006121e840 slot=ffff88063da244b8 slot_idx=x0
Mar 2 18:39:56 172.16.0.35 [44318.716786] sas: sas_scsi_find_task: task 0xffff88006121e840 is aborted
Mar 2 18:39:56 172.16.0.35 [44318.721983] sas: sas_eh_handle_sas_errors: task 0xffff88006121e840 is aborted
Mar 2 18:39:56 172.16.0.35 [44318.722240] sas: trying to find task 0xffff88006121d180
Mar 2 18:39:56 172.16.0.35 [44318.722492] sas: sas_scsi_find_task: aborting task 0xffff88006121d180
Mar 2 18:39:56 172.16.0.35 [44318.722745] mvs_abort_task() mvi=ffff88063da00000 task=ffff88006121d180 slot=ffff88063da24bf0 slot_idx=x15
Mar 2 18:39:56 172.16.0.35 [44318.723200] sas: sas_scsi_find_task: task 0xffff88006121d180 is aborted
Mar 2 18:39:56 172.16.0.35 [44318.723456] sas: sas_eh_handle_sas_errors: task 0xffff88006121d180 is aborted
Mar 2 18:39:56 172.16.0.35 [44318.723711] sas: trying to find task 0xffff88063d1a6d80
Mar 2 18:39:56 172.16.0.35 [44318.723962] sas: sas_scsi_find_task: aborting task 0xffff88063d1a6d80
Mar 2 18:39:56 172.16.0.35 [44318.724213] mvs_abort_task() mvi=ffff88063da00000 task=ffff88063d1a6d80 slot=ffff88063da25068 slot_idx=x22
Mar 2 18:39:56 172.16.0.35 [44318.724672] sas: sas_scsi_find_task: task 0xffff88063d1a6d80 is aborted
Mar 2 18:39:56 172.16.0.35 [44318.724924] sas: sas_eh_handle_sas_errors: task 0xffff88063d1a6d80 is aborted
Mar 2 18:39:56 172.16.0.35 [44318.725177] sas: trying to find task 0xffff88063d1a72c0
Mar 2 18:39:56 172.16.0.35 [44318.725431] sas: sas_scsi_find_task: aborting task 0xffff88063d1a72c0
Mar 2 18:39:56 172.16.0.35 [44318.725684] mvs_abort_task() mvi=ffff88063da00000 task=ffff88063d1a72c0 slot=ffff88063da250c0 slot_idx=x23
Mar 2 18:39:56 172.16.0.35 [44318.726138] sas: sas_scsi_find_task: task 0xffff88063d1a72c0 is aborted
Mar 2 18:39:56 172.16.0.35 [44318.726391] sas: sas_eh_handle_sas_errors: task 0xffff88063d1a72c0 is aborted
Mar 2 18:39:56 172.16.0.35 [44318.726646] sas: trying to find task 0xffff88006121e4c0
Mar 2 18:39:56 172.16.0.35 [44318.726897] sas: sas_scsi_find_task: aborting task 0xffff88006121e4c0
Mar 2 18:39:56 172.16.0.35 [44318.727150] mvs_abort_task() mvi=ffff88063da00000 task=ffff88006121e4c0 slot=ffff88063da24828 slot_idx=xa
Mar 2 18:39:56 172.16.0.35 [44318.727612] sas: sas_scsi_find_task: task 0xffff88006121e4c0 is aborted
Mar 2 18:39:56 172.16.0.35 [44318.727866] sas: sas_eh_handle_sas_errors: task 0xffff88006121e4c0 is aborted
Mar 2 18:39:56 172.16.0.35 [44318.728123] sas: trying to find task 0xffff88006121cfc0
Mar 2 18:39:56 172.16.0.35 [44318.728375] sas: sas_scsi_find_task: aborting task 0xffff88006121cfc0
Mar 2 18:39:56 172.16.0.35 [44318.728628] mvs_abort_task() mvi=ffff88063da00000 task=ffff88006121cfc0 slot=ffff88063da25010 slot_idx=x21
Mar 2 18:39:56 172.16.0.35 [44318.729844] sas: sas_scsi_find_task: aborting task 0xffff88006121e300
Mar 2 18:39:56 172.16.0.35 [44318.730097] mvs_abort_task() mvi=ffff88063da00000 task=ffff88006121e300 slot=ffff88063da25118 slot_idx=x24
...
Mar 2 18:39:56 172.16.0.35 [44318.773027] sas: --- Exit sas_scsi_recover_host
Mar 2 18:39:56 172.16.0.35 [44318.773794] ata12: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 2 18:39:56 172.16.0.35 [44318.774249] ata12: status=0x41 {
Mar 2 18:39:56 172.16.0.35 DriveReady
Mar 2 18:39:56 172.16.0.35 Error
Mar 2 18:39:56 172.16.0.35 }
Mar 2 18:39:56 172.16.0.35 [44318.774623] ata12: error=0x04 {
Mar 2 18:39:56 172.16.0.35 DriveStatusError
Mar 2 18:39:56 172.16.0.35 }
Mar 2 18:39:56 172.16.0.35 [44318.774977] ata12: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 2 18:39:56 172.16.0.35 [44318.775438] ata12: status=0x41 { DriveReady Error }
Mar 2 18:39:56 172.16.0.35 [44318.775439] ata12: error=0x04 { DriveStatusError }
Mar 2 18:39:56 172.16.0.35 [44318.775452] ata12: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 2 18:39:56 172.16.0.35 [44318.775454] ata12: status=0x41 { DriveReady Error }
Mar 2 18:39:56 172.16.0.35 [44318.775455] ata12: error=0x04 { DriveStatusError }
...
Mar 2 18:39:56 172.16.0.35 [44318.784273] sd 0:0:11:0: [sdl] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 2 18:39:56 172.16.0.35 [44318.784278] sd 0:0:11:0: [sdl] Sense Key : Aborted Command [current] [descriptor]
Mar 2 18:39:56 172.16.0.35 [44318.784284] Descriptor sense data with sense descriptors (in hex):
Mar 2 18:39:56 172.16.0.35 [44318.784288] ata12: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 2 18:39:56 172.16.0.35 [44318.784292]
Mar 2 18:39:56 172.16.0.35 [44318.784295] ata12: status=0x41 {
Mar 2 18:39:56 172.16.0.35 [44318.784298] DriveReady 72 Error 0b }
Mar 2 18:39:56 172.16.0.35 [44318.784306] 00
Mar 2 18:39:56 172.16.0.35 [44318.784309] ata12: error=0x04 { 00 DriveStatusError 00 }
Mar 2 18:39:56 172.16.0.35 [44318.784318] 00 00 0c 00 0a 80 00 00 00 00 00
Mar 2 18:39:56 172.16.0.35 [44318.784321] 00 00 00 23
Mar 2 18:39:56 172.16.0.35 [44318.784323] sd 0:0:11:0: [sdl] Add. Sense: No additional sense information
Mar 2 18:39:56 172.16.0.35 [44318.784326] sd 0:0:11:0: [sdl] CDB: Read(10): 28 00 1f 8d bc 13 00 00 80 00
Mar 2 18:39:56 172.16.0.35 [44318.784332] end_request: I/O error, dev sdl, sector 529382419
Mar 2 18:39:56 172.16.0.35 [44318.784336] Buffer I/O error on device sdl1, logical block 264691178
Mar 2 18:39:56 172.16.0.35 [44318.784339] Buffer I/O error on device sdl1, logical block 264691179
Mar 2 18:39:56 172.16.0.35 [44318.784343] Buffer I/O error on device sdl1, logical block 264691180
Mar 2 18:39:56 172.16.0.35 [44318.784345] Buffer I/O error on device sdl1, logical block 264691181
Mar 2 18:39:56 172.16.0.35 [44318.784346] Buffer I/O error on device sdl1, logical block 264691182
Mar 2 18:39:56 172.16.0.35 [44318.784348] Buffer I/O error on device sdl1, logical block 264691183
Mar 2 18:39:56 172.16.0.35 [44318.784350] Buffer I/O error on device sdl1, logical block 264691184
Mar 2 18:39:56 172.16.0.35 [44318.784351] Buffer I/O error on device sdl1, logical block 264691185
Mar 2 18:39:56 172.16.0.35 [44318.784353] Buffer I/O error on device sdl1, logical block 264691186
Mar 2 18:39:56 172.16.0.35 [44318.784399] ata12: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 2 18:39:56 172.16.0.35 [44318.784404] Buffer I/O error on device sdl1, logical block 264691187
Mar 2 18:39:56 172.16.0.35 [44318.784408] ata12: status=0x41 { DriveReady Error }
Mar 2 18:39:56 172.16.0.35 [44318.784411] ata12: error=0x04 { DriveStatusError }
Mar 2 18:39:56 172.16.0.35 [44318.784564] ata12: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 2 18:39:56 172.16.0.35 [44318.784566] ata12: status=0x41 { DriveReady Error }
Mar 2 18:39:56 172.16.0.35 [44318.784567] ata12: error=0x04 { DriveStatusError }
Mar 2 18:39:56 172.16.0.35 [44318.784739] ata12: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 2 18:39:56 172.16.0.35 [44318.784740] ata12: status=0x41 { DriveReady Error }
Mar 2 18:39:56 172.16.0.35 [44318.784742] ata12: error=0x04 { DriveStatusError }
Mar 2 18:39:56 172.16.0.35 [44318.784882] sd 0:0:11:0: [sdl] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 2 18:39:56 172.16.0.35 [44318.784885] sd 0:0:11:0: [sdl] Sense Key : Aborted Command [current] [descriptor]
Mar 2 18:39:56 172.16.0.35 [44318.784887] Descriptor sense data with sense descriptors (in hex):
Mar 2 18:39:56 172.16.0.35 [44318.784888] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Mar 2 18:39:56 172.16.0.35 [44318.784892] 00 00 00 23
Mar 2 18:39:56 172.16.0.35 [44318.784894] sd 0:0:11:0: [sdl] Add. Sense: No additional sense information
Mar 2 18:39:56 172.16.0.35 [44318.784896] sd 0:0:11:0: [sdl] CDB: Read(10): 28 00 1f 8d bd 11 00 00 02 00
Mar 2 18:39:56 172.16.0.35 [44318.784900] end_request: I/O error, dev sdl, sector 529382673
...

and so on with errors continuing without apparent end, even after
killing the dd's, although in lower volume.

Well that's rather sad... anything else for me to try?
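As a sanity check that the failing commands are ordinary reads from the dd test, the `CDB: Read(10)` bytes in the log above can be decoded by hand and compared with the sector numbers in the `end_request` lines. Below is a minimal Python sketch of that decoding, assuming the standard READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The helper name `decode_read10` is made up for illustration and is not part of any kernel tool.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (lba, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks


if __name__ == "__main__":
    # The two CDBs quoted in the log above.
    for cdb in ("28 00 1f 8d bc 13 00 00 80 00",
                "28 00 1f 8d bd 11 00 00 02 00"):
        lba, blocks = decode_read10(cdb)
        print(f"{cdb} -> LBA {lba}, {blocks} blocks")
```

The decoded LBAs (529382419 and 529382673) match the sectors reported by `end_request`, so the logged I/O errors do correspond to those aborted reads on sdl.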