Re: ssd keeps dying (OT)

By dead, do you mean it just quits answering on the bus at all?

I recently had a similar issue with a Crucial 2TB SSD.  The first one
failed in under 10 days.  I got a replacement, and the second one did
pretty much the same thing after about the same amount of time, so I
returned it for a refund.

It makes me think that whatever is going on with them is related to
the SSD controller in some way, and maybe both have the same
controller.  I have previously (5+ years ago) had an SSD firmware bug
cause consistent failures at a given power-on hours (POH) value, and I
also know there have been defects of that same sort in other brands,
where drives "die" at some POH value.

I would check to see if they have any firmware updates for the drive.
Some of the POH failures leave the drive permanently dead, and some
drives stay up long enough (after hitting the magic POH number) to get
the firmware updated (in the case I saw, the self test after xxx POH
would fail, but it did not run until the drive had been powered on for
an hour after being reset).
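
If you want to keep an eye on the POH value and the firmware revision
on the next drive, something like the following might help.  It is
only a rough sketch: it assumes smartmontools 7.0 or later (for
"smartctl --json"), that the JSON field names are the ones shown in
the comments, and that the drive is /dev/nvme0.  The function names
are mine, not part of any tool.

```python
import json
import subprocess


def summarize_smart(report: dict) -> dict:
    """Pick out the fields that matter for a power-on-hours firmware bug.

    `report` is the parsed output of `smartctl --json -a <device>`.
    The key names below are what smartctl is assumed to emit.
    """
    nvme_log = report.get("nvme_smart_health_information_log", {})
    # Generic location first, then the NVMe health log as a fallback.
    poh = report.get("power_on_time", {}).get("hours")
    if poh is None:
        poh = nvme_log.get("power_on_hours")
    return {
        "model": report.get("model_name"),
        "firmware": report.get("firmware_version"),
        "power_on_hours": poh,
        "media_errors": nvme_log.get("media_errors"),
    }


def read_smart(device: str) -> dict:
    """Run smartctl (usually needs root) and return its JSON output."""
    # smartctl's exit status is a bitmask of conditions, so a non-zero
    # status is not treated as a failure here.
    proc = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(proc.stdout)
```

Running summarize_smart(read_smart("/dev/nvme0")) from cron and
appending the result to a file on a different disk would leave a
record of the last POH value seen before the drive drops off the bus,
which could then be compared across the failed drives.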

There seem to be a lot of ways to screw up SSD firmware such that the
devices die.

On Tue, Feb 22, 2022 at 8:04 AM Neal Becker <ndbecker2@xxxxxxxxx> wrote:
>
> Thanks Richard.  Yes, I talked with Titan; they suggested trying the pcie-m.2 adapter.  I will try them again.
> I have not checked for bios updates.  Not sure how to go about that (last time I did that it required an msdos floppy disc).
>
> Haven't tried the SSDs in another device because I don't have one.  But the fact that replacing the SSD gets the machine working again, when it wasn't working before, tells me they were damaged.  At least once I powered the workstation off and on, and the BIOS did not find any SSD to boot from.  So a power cycle didn't fix it, but replacing the SSD did.
>
> I will try Titan again later today, but just looking for ideas.
>
> Thanks,
> Neal
>
> On Tue, Feb 22, 2022 at 8:44 AM Richard Shaw <hobbes1069@xxxxxxxxx> wrote:
>>
>> On Tue, Feb 22, 2022 at 7:34 AM Neal Becker <ndbecker2@xxxxxxxxx> wrote:
>>>
>>> I know this is a bit OT, but you guys are great at answering all questions.
>>>
>>> I bought a workstation from Titan computers around 1/2020 (dual EPYC cpu).  After about 1 year it stopped working.  I could ssh to it, and almost any command would return Input/Output error.  Unfortunately journalctl gave input/output error so I can't see logs.  cat /proc/partitions did not show any nvme device (the root device) on which the OS was installed.
>>>
>>> I replaced the SSD with a Samsung 980 Pro.  Reinstalled Fedora.  It then worked for a few weeks, then showed the exact same symptoms.
>>>
>>> I replaced the SSD with another Samsung 980 Pro, this time with a heatsink.  Reinstalled Fedora.  It worked for a few weeks.  Then the same symptoms.
>>>
>>> Then I replaced it with a 4th SSD (another Samsung 980 Pro), but this time instead of using the M.2 socket I used a PCIe-to-M.2 adapter (in case something was wrong with the M.2 socket).  Also added a surge protector outlet for good measure.  Reinstalled.  Watched smartctl.  No errors.  Temperature was always low.
>>>
>>> Now it's failed again, exactly same symptoms.
>>>
>>> Any ideas?
>>
>>
>> I remember your other email about a month or so ago and thought it was really strange. Have you tried the drives in another system to confirm they're truly dead?
>>
>> I would check for BIOS updates just for good measure. Other than that, have you had any communication with Titan about it?
>>
>> Thanks,
>> Richard
>> _______________________________________________
>> users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
>> To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
>> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>> List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx
>> Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
>
>
>
> --
> Those who don't understand recursion are doomed to repeat it