On Sun, Nov 30, 2014 at 10:40 PM, Jacobo Pantoja <jacobopantoja@xxxxxxxxx> wrote:
> On 1 December 2014 at 01:01, Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
>> On Sun, Nov 30, 2014 at 5:03 AM, Jacobo Pantoja <jacobopantoja@xxxxxxxxx> wrote:
>>> Hello,
>>>
>>> It took me a while, but I got time to recompile and reproduce the
>>> lockup with ultra-verbose output.
>>>
>>> Three out of four lockups seem identical (1, 2 and 4) but number 3
>>> seems different. The trigger mechanism was the same: connect through
>>> ssh (verbose screen made impossible working locally), start dd'ing
>>> from disk to /dev/null in an area with some bad sectors, and wait
>>> until lockup.
>>>
>>> It is 100% reproducible, at least for the moment.
>>>
>>> The link with the 4 photos:
>>> https://drive.google.com/folderview?id=0B4EqBXYvV-kTR2daRm1GYVBDbWs&usp=sharing
>>>
>>> Any idea about what to test now?
>>
>> It would appear that (in at least 3 of the 4 pictures) the lockup is
>> happening during softreset. You can try changing this code in
>> sata_nv.c:
>>
>>	/* Do hardreset iff it's post-boot probing, please read the
>>	 * comment above port ops for details.
>>	 */
>>	if (!(link->ap->pflags & ATA_PFLAG_LOADING) &&
>>	    !ata_dev_enabled(link->device))
>>		sata_link_hardreset(link, sata_deb_timing_hotplug, deadline,
>>				    NULL, NULL);
>>	else {
>>		const unsigned long *timing = sata_ehc_deb_timing(ehc);
>>		int rc;
>>
>>		if (!(ehc->i.flags & ATA_EHI_QUIET))
>>			ata_link_info(link,
>>				      "nv: skipping hardreset on occupied port\n");
>>
>>		/* make sure the link is online */
>>		rc = sata_link_resume(link, timing, deadline);
>>		/* whine about phy resume failure but proceed */
>>		if (rc && rc != -EOPNOTSUPP)
>>			ata_link_warn(link, "failed to resume link (errno=%d)\n",
>>				      rc);
>>	}
>>
>> to just hard-reset unconditionally:
>>
>>	sata_link_hardreset(link, sata_deb_timing_hotplug, deadline,
>>			    NULL, NULL);
>>
>> and see what that does to the behavior.
>> This function has to deal with
>> quite the comedy of errors that is reset handling on NV SATA, and it
>> may be that the actual error-handling case is one where a hardreset is
>> actually needed.
>>
>
> Still same behaviour. I don't understand why does it softreset still
> (but my knowledge is limited), I have checked several times that I
> have modified the code as you proposed. Perhaps the code deciding
> whether soft or hard is placed in a different area or file?
>
> I have uploaded 4 new pictures, and again, one is different than the rest.

Looks like it's doing a hardreset now (apparently successfully).
However, the reason it still does a softreset afterwards is this at the
end of nv_hardreset:

	/* device signature acquisition is unreliable */
	return -EAGAIN;

Try changing that to:

	return 0;

and see if that changes the behavior. That should make it skip the
softreset. Whether the device works after that, or whether it still
locks up at some later point, we'll see.
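
The effect of that return value on the reset sequence can be sketched as
a small standalone C model. This is not the libata code: do_reset, the
callback names and the trace counters are all hypothetical, and the only
assumption carried over from the discussion above is that a hardreset
returning -EAGAIN gets followed by a softreset, while a return of 0 does
not.

```c
#include <errno.h>

/* Hypothetical stand-in for a driver reset callback; not a kernel type. */
typedef int (*reset_fn)(void);

/* Counts how often each kind of reset ran, so the sequence can be inspected. */
struct reset_trace {
	int hardreset_calls;
	int softreset_calls;
};

struct reset_trace trace;

/* Models the current behaviour: the hardreset itself works, but the
 * callback reports -EAGAIN to request a follow-up softreset. */
int hardreset_returns_eagain(void)
{
	trace.hardreset_calls++;
	return -EAGAIN;
}

/* Models the suggested change: report plain success instead. */
int hardreset_returns_zero(void)
{
	trace.hardreset_calls++;
	return 0;
}

/* A softreset that always succeeds in this model. */
int softreset_ok(void)
{
	trace.softreset_calls++;
	return 0;
}

/* Simplified dispatcher (assumption): run the hardreset, and only if it
 * returns -EAGAIN run the softreset as a follow-up. */
int do_reset(reset_fn hardreset, reset_fn softreset)
{
	int rc = hardreset();

	if (rc == -EAGAIN)
		rc = softreset();

	return rc;
}
```

Calling do_reset(hardreset_returns_eagain, softreset_ok) bumps both
counters. With hardreset_returns_zero the softreset counter stays at 0,
which is the behaviour the suggested change is meant to produce.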