Re: [nForce4] - Repeatable issues with nForce 4

Robert Hancock <hancockrwd@xxxxxxxxx> · Sun, 30 Nov 2014 23:52:40 -0600

On Sun, Nov 30, 2014 at 10:40 PM, Jacobo Pantoja
<jacobopantoja@xxxxxxxxx> wrote:
> On 1 December 2014 at 01:01, Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
>> On Sun, Nov 30, 2014 at 5:03 AM, Jacobo Pantoja <jacobopantoja@xxxxxxxxx> wrote:
>>> Hello,
>>>
>>> It took me a while, but I got time to recompile and reproduce the
>>> lockup with ultra-verbose output.
>>>
>>> Three out of four lockups seem identical (1, 2 and 4) but number 3
>>> seems different. The trigger mechanism was the same: connect through
>>> ssh (the verbose output made working locally impossible), start dd'ing
>>> from disk to /dev/null in an area with some bad sectors, and wait
>>> until lockup.
>>>
>>> It is 100% reproducible, at least for the moment.
>>>
>>> The link with the 4 photos:
>>> https://drive.google.com/folderview?id=0B4EqBXYvV-kTR2daRm1GYVBDbWs&usp=sharing
>>>
>>> Any idea about what to test now?
>>
>> It would appear that (in at least 3 of the 4 pictures) the lockup is
>> happening during softreset. You can try changing this code in
>> sata_nv.c:
>>
>>     /* Do hardreset iff it's post-boot probing, please read the
>>      * comment above port ops for details.
>>      */
>>     if (!(link->ap->pflags & ATA_PFLAG_LOADING) &&
>>         !ata_dev_enabled(link->device))
>>         sata_link_hardreset(link, sata_deb_timing_hotplug, deadline,
>>                     NULL, NULL);
>>     else {
>>         const unsigned long *timing = sata_ehc_deb_timing(ehc);
>>         int rc;
>>
>>         if (!(ehc->i.flags & ATA_EHI_QUIET))
>>             ata_link_info(link,
>>                       "nv: skipping hardreset on occupied port\n");
>>
>>         /* make sure the link is online */
>>         rc = sata_link_resume(link, timing, deadline);
>>         /* whine about phy resume failure but proceed */
>>         if (rc && rc != -EOPNOTSUPP)
>>             ata_link_warn(link, "failed to resume link (errno=%d)\n",
>>                       rc);
>>     }
>>
>> to just hard-reset unconditionally:
>>
>>         sata_link_hardreset(link, sata_deb_timing_hotplug, deadline,
>>                     NULL, NULL);
>>
>> and see what that does to the behavior. This function has to deal with
>> quite the comedy of errors that is reset handling on NV SATA, and it
>> may be that the actual error-handling case is one where a hardreset is
>> actually needed.
>>
>
> Still the same behaviour. I don't understand why it still does a
> softreset (but my knowledge is limited). I have checked several times
> that I have modified the code as you proposed. Perhaps the code
> deciding between soft and hard reset is in a different area or file?
>
> I have uploaded 4 new pictures, and again, one is different from the rest.

Looks like it's doing a hardreset now (apparently successfully).
However, the reason it still does a softreset afterwards is this
return at the end of nv_hardreset:

        /* device signature acquisition is unreliable */
        return -EAGAIN;

Try changing that to:

        return 0;

and see if that changes the behavior. That should make it skip the
soft-reset. Whether the device works after that, or still locks up
at some later point, we'll see.
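
To illustrate why the return value matters, here is a toy model, not the
libata code. It assumes the error handler treats -EAGAIN from the
hardreset method as a request for a follow-up softreset, and 0 as "done".
All toy_* names are made up for illustration; the real logic lives in the
libata EH code and has more conditions than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not libata symbols. */
struct toy_reset_trace {
    bool did_hardreset;
    bool did_softreset;
};

typedef int (*toy_hardreset_fn)(void);

/* Mimics the current ending of nv_hardreset: signature is unreliable. */
static int toy_hardreset_eagain(void)
{
    return -EAGAIN;
}

/* Mimics the proposed change: report plain success. */
static int toy_hardreset_ok(void)
{
    return 0;
}

/*
 * Simplified reset step (assumption): run the hardreset method, and
 * schedule a follow-up softreset only if it returned -EAGAIN.
 */
static struct toy_reset_trace toy_eh_reset(toy_hardreset_fn hardreset)
{
    struct toy_reset_trace trace = { false, false };
    int rc;

    assert(hardreset != NULL);

    rc = hardreset();
    trace.did_hardreset = true;

    if (rc == -EAGAIN)
        trace.did_softreset = true; /* follow-up SRST would run here */

    return trace;
}
```

With toy_hardreset_eagain the trace records both resets, which matches
what the new pictures show. With toy_hardreset_ok only the hardreset
runs, which is the behavior the one-line change is meant to produce.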