On Mon, 2020-12-14 at 20:19 +1100, Eyal Lebedinsky wrote:
> On 14/12/2020 13.20, Chris Murphy wrote:
> > On Sun, Dec 13, 2020 at 4:42 PM Eyal Lebedinsky <fedora@xxxxxxxxxxxxxx> wrote:
> > > I am not sure which list this should go to, so I am starting here.
> > >
> > > I run f32 fully updated
> > >     5.9.13-100.fc32.x86_64
> > > on relatively new hardware
> > >     kernel: DMI: Gigabyte Technology Co., Ltd. Z390 UD/Z390 UD, BIOS F8 05/24/2019
> > > boot/root/swap/data is on nvme
> > >     WD Blue SN550 1TB M.2 2280 NVMe SSD WDS100T2B0C
> >
> > I can't tell from WD's website if there's any newer firmware
> > available. They seem to hide this information within the Windows-only
> > software "Western Digital Dashboard". If you have Windows already
> > installed, it's straightforward to install this and find out if the
> > firmware is up to date.
>
> Option 1) My nvme disk is on the mobo, which has only one slot. I have
> access to a Windows laptop, but I will also need an external nvme/USB
> adapter - will the Dashboard work this way? Will a fw update leave the
> disk content safe?
>
> I will try something else first.
>
> > There is a boot parameter 'nvme_core.default_ps_max_latency_us' which
> > takes a value in usec, but I can't find a value specific to this
> > make/model NVMe. My gut instinct is, it's a hack put in by upstream
> > kernel developers to work around the lack of a proper autodetect
> > solution between the PCIe interface and the drive. I would sooner
> > return the drive and get one known to work. I can vouch for Crucial,
> > Seagate, and Samsung SSD and NVMe for the most part.
>
> Option 2) Reading the reports (and more) I decided to test the boot param
>     nvme_core.default_ps_max_latency_us=0
> which I understand turns off the APST feature.
>
> > Oh here's a bug report
> >
> > That leads here:
> >
> > comment 1 is a more solid lead than comment 2, because comment 2 is a
> > value that is based on what? A guess? Reading the rest of the thread,
> > it's still uncertain.
>
> For the second time this disk stopped working (the first was about two
> months ago). It seems that the disk failed hard and could not be reset,
> so the machine was powered off/on. I think (not sure) that last time I
> just hit the reset button but it did not boot.
>
> The machine was booted (after dnf update) around 8pm, and crashed at 4am.
> Following the earlier crash a serial console was set up, which is how I
> can see the failure messages.
>
> == nvme related messages
>
> [    7.488638] nvme nvme0: pci function 0000:06:00.0
> [    7.536593] nvme nvme0: allocated 32 MiB host memory buffer.
> [    7.541819] nvme nvme0: 8/0/0 default/read/poll queues
> [    7.558122] nvme0n1: p1 p2 p3 p4
> [   19.590010] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
> [   20.653500] Adding 16777212k swap on /dev/nvme0n1p2. Priority:-2 extents:1 across:16777212k SSFS
> [   20.820539] EXT4-fs (nvme0n1p3): re-mounted. Opts: (null)
> [   23.137206] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
> [   23.210717] EXT4-fs (nvme0n1p4): mounted filesystem with ordered data mode. Opts: (null)
>
> ## nothing unusual for 8 hours, then
>
> [28972.459036] nvme nvme0: I/O 840 QID 6 timeout, aborting
> [28972.464757] nvme nvme0: I/O 565 QID 7 timeout, aborting
> [28972.470277] nvme nvme0: I/O 566 QID 7 timeout, aborting
> [28973.291025] nvme nvme0: I/O 989 QID 1 timeout, aborting
> [28978.603061] nvme nvme0: I/O 990 QID 1 timeout, aborting
> [29002.667243] nvme nvme0: I/O 840 QID 6 timeout, reset controller
> [29032.875421] nvme nvme0: I/O 24 QID 0 timeout, reset controller
> [29074.097644] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
> [29074.110354] nvme nvme0: Abort status: 0x371
> [29074.114953] nvme nvme0: Abort status: 0x371
> [29074.119523] nvme nvme0: Abort status: 0x371
> [29074.124114] nvme nvme0: Abort status: 0x371
> [29074.128710] nvme nvme0: Abort status: 0x371
> [29096.645478] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
> [29096.652210] nvme nvme0: Removing after probe failure status: -19
> [29119.165921] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
>
> ## many I/O errors on nvme0 (p2/p3/p4) repeating until a reboot at 8:30am
> ## one different message, appearing just once:
>
> [29123.800844] nvme nvme0: failed to set APST feature (-19)
>
> > I'd take the position that it's defective and permit the manufacturer
> > a short leash to convince me otherwise via a tech support call or
> > email. But I really wouldn't just wait around for another 2 months not
> > knowing if it's going to fail again. I'd like some kind of answer for
> > this problem from support folks. And if they can't give support, get
> > rid of it.
> >
> > The time frame for a repeat of the problem is why I'm taking this
> > slightly different view than the tinker-with-firmware view earlier.
> > It's not horrible to update firmware and give it a go, if this
> > problem happens once a week or more often. But every two months?
> > Forget it. Make it their problem.
> >
> > And seriously, I give them one chance. If they b.s. me and it flakes
> > out again in a month or two, no more chances. So the quandary is,
> > what's your return policy window?
> > If it's about to end, just return it now. It should just work out of
> > the box. WDC does contribute to the kernel. Whether this is a product
> > supported on Linux I don't know.
>
> Option 3) Get a new disk from a reliable brand (as mentioned on this
> thread) and keep this one as a spare. I will do this if the problem
> happens again.
>
> I will log an issue with WD and see what they have to say.
>
> Thanks everyone
>
> --
> Eyal Lebedinsky (fedora@xxxxxxxxxxxxxx)
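For anyone else trying the Option 2 route, here is a minimal sketch of how the parameter can be made persistent on Fedora. The grubby invocation and the sysfs path are the standard mechanisms, but treat the value 0 as an assumption: it disables APST entirely, and some reports instead use a large latency cap tuned to the drive.

```shell
# Config fragment: /etc/default/grub -- append the parameter to the
# kernel command line ("..." stands for whatever arguments already exist),
# then regenerate grub.cfg with grub2-mkconfig.
GRUB_CMDLINE_LINUX="... nvme_core.default_ps_max_latency_us=0"

# Or, without editing grub config, apply it to all installed kernels:
#   sudo grubby --update-kernel=ALL --args="nvme_core.default_ps_max_latency_us=0"

# After rebooting, the value in effect can be read back from sysfs:
#   cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
```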
You mentioned only a single NVMe slot on your motherboard. If you have an available PCIe slot, there's a nifty adapter you can buy for a second NVMe drive:

https://www.amazon.com/GLOTRENDS-Adapter-Aluminum-Heatsink-PA09_HS/dp/B07FN3YZ8P/ref=sr_1_2_sspa?dchild=1&keywords=nvme-PCIe+adapter&qid=1608832943&sr=8-2-spons&psc=1&smid=A36DOQ8QSJXCYP&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzMEtDMUI4RFZMTTVJJmVuY3J5cHRlZElkPUEwMjkxMzY0MkNPNkxaV0ZHVEFEOSZlbmNyeXB0ZWRBZElkPUEwNzYyNDAxM0hXNDQ0MzFHOVBZVCZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU=
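And if you do keep the drive and wait to see whether it fails again, it may be worth periodically scanning the kernel log for the first "timeout, aborting" line rather than waiting for a hard hang. A rough sketch, using a few of the lines quoted earlier in the thread as a stand-in for live `journalctl -k` output:

```shell
# Sample file standing in for saved kernel-log output; the lines mirror
# the nvme messages quoted earlier in this thread.
cat > /tmp/kmsg-sample.txt <<'EOF'
[28972.459036] nvme nvme0: I/O 840 QID 6 timeout, aborting
[28973.291025] nvme nvme0: I/O 989 QID 1 timeout, aborting
[29074.097644] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
EOF

# Count I/O timeout events; any nonzero count is the early-warning signal.
grep -c 'nvme nvme0: I/O .* timeout' /tmp/kmsg-sample.txt   # prints 2
```

On the real system you'd feed the same grep from `journalctl -k --no-pager` (or `dmesg`) instead of the sample file, e.g. from a daily cron job that mails you when the count is nonzero.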
--
Doc Savage
Fairview Heights, IL

_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx