Re: Repetitive catastrophic failure of a SSD


On 01/08/2016 06:59 AM, Marc Planard wrote:
First of all, I'm not sure I'm writing to the right place, but it's my best
guess. If it's not, I'm sorry for the noise, and I'll be happy to be
re-routed to the proper person/mailing-list.

I'll detail my story at the end of this mail, but as a quick summary:

* Got an Asus ROG G551JW, including a Kingston smsm151s3128gd SSD
* SSD running Ubuntu failed after 1 month, was replaced by support
* The replacement failed again in the very same way 4 months later.

It's possible that I've been unlucky and got two bad units, but it is
also possible that this model of SSD has some intrinsic defect or
peculiarity that is handled by the OS it ships with, but not by Linux.

In that case, I guess I could help improve the Linux driver handling
this SSD and prevent other users from bricking their SSDs as I did.

(My experience with kernel development is minimal, but I have a good
background in system programming, and I used to compile my own
kernels when I was young :) )

Detailed story:

I bought this Asus ROG G551JW five months ago. It has an HDD
(/dev/sda), an SSD (/dev/sdb), and a DVD drive (/dev/sdc).

Right after buying it, I wiped Windows from the SSD and installed
Ubuntu 15.04 instead. As I don't need much space at this point, I left
the HDD alone and ran everything from the SSD. I simply used the "use
the whole disk" option while installing.

After exactly 1 month of satisfactory use, the SSD just failed. It
started with every Chrome tab crashing. When I got to a terminal to
see what was going on, all I saw were "INPUT/OUTPUT" errors. I
switched the machine off, then on again. After that, the system never
booted again: after a long delay, the EFI/BIOS either displayed a "NO
SYSTEM DISK" message or went straight to its settings menu. In that
menu, the SSD was sometimes listed (rarely) and sometimes simply absent
(mostly), while the HDD and DVD drive were always listed.

After much difficulty (random stalls, reboots...), I managed to boot a
live session and install a new system on the HDD to investigate what
was going on. After that, the boot process still took ages to get from
the EFI/BIOS to GRUB (I guess it was trying to probe the failing SSD
and finally timed out...) but was otherwise quite normal.
Once logged in, /dev/sdb (the SSD) sometimes showed up and was sometimes
missing, randomly at each boot. When it was present, it was impossible
to extract any information from it (smartctl -a showed nothing
interesting, and hexdump -C /dev/sdb returned immediately without
printing anything).

dmesg showed some interesting output, but googling it turned up nothing
useful except: "your drive is dead dude, get over it".
(typical output here: http://pastebin.com/v7eJxmg9 )
So I called support, explained the situation, sent them the computer,
and got it back in no time with a brand new SSD inside.
I wiped Windows again and reinstalled Ubuntu, all on the SSD, exactly
like the first time. During the next 4 months I saw some worrying
warnings, like the one time my root partition was remounted read-only
after errors, but after a reboot everything was fine. I also noticed
that I/O was sluggish when the SSD was almost full (sometimes with
stalls of several seconds on reads/writes), so I always kept about 20%
of the drive free.

But last week, the exact same failure happened again: same input/output
errors, no SSD listed in the EFI/BIOS menu, same messages in dmesg.
At this point, it's still possible that I'm very, very, very unlucky (I
haven't found an uproar from users of this laptop online, so I don't
think it's a defect affecting the whole series...) and got 2 defective
units in a row, but I doubt it.

Right now I'm not really sure what to do. I could call support again
and get it replaced again, but if I can't use this SSD reliably it's
pretty useless. I could simply remove it and use only the HDD, but
unfortunately it's not a standard disk form factor, and I fear that
removing it would void the warranty.

If anyone has any idea what is going on, what diagnostics I could run
to get more insight, or whom I could contact, I'd be grateful.

It's hard to advise without the exact errors that you saw. However, if the device dies in IO errors and subsequently doesn't even show up in a BIOS scan, it very much sounds like a hardware issue to me... There are a lot of crappy SSDs out there, unfortunately.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
