On 1 December 2014 at 06:52, Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
> On Sun, Nov 30, 2014 at 10:40 PM, Jacobo Pantoja
> <jacobopantoja@xxxxxxxxx> wrote:
>> On 1 December 2014 at 01:01, Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
>>> On Sun, Nov 30, 2014 at 5:03 AM, Jacobo Pantoja <jacobopantoja@xxxxxxxxx> wrote:
>>>> Hello,
>>>>
>>>> It took me a while, but I found time to recompile and reproduce the
>>>> lockup with ultra-verbose output.
>>>>
>>>> Three of the four lockups seem identical (1, 2 and 4), but number 3
>>>> seems different. The trigger was the same each time: connect through
>>>> ssh (the verbose console output made working locally impossible), start
>>>> dd'ing from the disk to /dev/null over an area with some bad sectors,
>>>> and wait for the lockup.
>>>>
>>>> It is 100% reproducible, at least for the moment.
>>>>
>>>> The link with the 4 photos:
>>>> https://drive.google.com/folderview?id=0B4EqBXYvV-kTR2daRm1GYVBDbWs&usp=sharing
>>>>
>>>> Any idea about what to test now?
>>>
>>> It would appear that (in at least 3 of the 4 pictures) the lockup is
>>> happening during softreset. You can try changing this code in
>>> sata_nv.c:
>>>
>>>         /* Do hardreset iff it's post-boot probing, please read the
>>>          * comment above port ops for details.
>>>          */
>>>         if (!(link->ap->pflags & ATA_PFLAG_LOADING) &&
>>>             !ata_dev_enabled(link->device))
>>>                 sata_link_hardreset(link, sata_deb_timing_hotplug, deadline,
>>>                                     NULL, NULL);
>>>         else {
>>>                 const unsigned long *timing = sata_ehc_deb_timing(ehc);
>>>                 int rc;
>>>
>>>                 if (!(ehc->i.flags & ATA_EHI_QUIET))
>>>                         ata_link_info(link,
>>>                                       "nv: skipping hardreset on occupied port\n");
>>>
>>>                 /* make sure the link is online */
>>>                 rc = sata_link_resume(link, timing, deadline);
>>>                 /* whine about phy resume failure but proceed */
>>>                 if (rc && rc != -EOPNOTSUPP)
>>>                         ata_link_warn(link, "failed to resume link (errno=%d)\n",
>>>                                       rc);
>>>         }
>>>
>>> to just hard-reset unconditionally:
>>>
>>>         sata_link_hardreset(link, sata_deb_timing_hotplug, deadline,
>>>                             NULL, NULL);
>>>
>>> and see what that does to the behavior. This function has to deal with
>>> quite the comedy of errors that is reset handling on NV SATA, and it
>>> may be that the actual error-handling case is one where a hardreset is
>>> actually needed.
>>>
>>
>> Still the same behaviour. I don't understand why it still does a
>> softreset (but my knowledge is limited); I have checked several times
>> that I modified the code as you proposed. Perhaps the code that decides
>> between a soft and a hard reset is in a different place or file?
>>
>> I have uploaded 4 new pictures, and again, one is different from the rest.
>
> Looks like it's doing a hardreset now (apparently successfully).
> However, the reason it still does a softreset anyway is this at the end
> of nv_hardreset:
>
>         /* device signature acquisition is unreliable */
>         return -EAGAIN;
>
> Try changing that to:
>
>         return 0;
>
> and see if that changes the behavior. That should make it skip the
> softreset. Whether or not the device works after that, or whether it
> still locks up at some later point, we'll see.

Ok, after changing -EAGAIN to 0, I cannot boot completely (it cannot find
the rootfs). Approximately every 10 s
it stops, but after about 10 tries it gives up with a panic. I have taken
pictures but have not processed them yet (I will soon, but I doubt they
will be useful).

I guess the only useful thing I can do now is boot from USB. With my
motherboard that is difficult; I will have to play with GRUB.

I will be back.