Re: Questions (and a possible bug) regarding the ata_device_blacklist and ATA_HORKAGE_ZERO_AFTER_TRIM

Hi,

Sorry for the "small" delay... I got distracted and only now revisited
this topic. I want to use discard to improve backup space efficiency
and am considering the devices_handle_discard_safely parameter of the
raid456 module (I run ext4 on LVM on LUKS on RAID5 on 3 SSDs), since
otherwise I cannot trim at all.

My inquiry deals with two points:
 - Discussing the addition of ATA_HORKAGE_ZERO_AFTER_TRIM for Crucial
   CT500MX500 (or CT*MX500 to include the 250 GB, 1 TB and 2 TB models)
 - Determining why the Samsung SSD 860 EVO is not recognized as
   zeroing after trim

On Thu, 26 Sep 2019 18:01:03 -0400
"Martin K. Petersen" <martin.petersen@xxxxxxxxxx> wrote:

> > I don't know the technical details how this is communicated by the
> > drive but I assume it's the same thing that smartctl and hdparm output
> > as "Model Number" and "Device Model" respectively.  
> 
> Yes.
> 
> > If this is correct (is it?) then there is a problem with the list
> > AFAICT because the Crucial SSD I have reports this field simply as
> > "CT500MX500SSD4" but the kernel expects "Crucial" at the beginning of
> > almost all Crucial drives (line 4523+) including the vendor wildcard at
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-core.c#n4586
> > Interestingly, in line 4520 there is an entry for the CT500BX100SSD1
> > that does not start with "Crucial".  
> 
> With a few exceptions, the entries in the libata white/blacklist were
> submitted by Crucial/Micron themselves. But it's possible that they
> changed their naming scheme.

I can look for smartctl logs of similar models, but the "Crucial"
prefix is obviously missing on mine.

> > After looking into smartctl's drive database I guess the MX500 [2] (as
> > well as BX100, BX200, BX300 and BX500 [1]) series stand out in this
> > regard. This means that all of them do *not* get the
> > ATA_HORKAGE_ZERO_AFTER_TRIM flag set because they are not matched by
> > any of the model-specific entries nor the cumulative "Crucial*" vendor
> > entry.  
> 
> The newest drives I have are M550 models.

Since Crucial has stopped producing new models I think it makes sense
to eventually conclude this topic and make some (final?) changes if
need be. Apparently the queued trim issues are not fully figured out
yet (saw commits to Linus' tree a short while ago on that) - so maybe
final-ish changes ;)

> > I have not tested my drive to actually return zeros after trimming but
> > from the kernel code I would assume that its intent is to match all
> > Crucial SSDs and thus it is a bug mine is not matched. If someone
> > tells me to the preferred method to test it I am happy to do this. If
> > need be I can also submit a patch (just for MX500? all of the above?).  
> 
> There's no way to exhaustively test. Many drives will return zeroes most
> of the time but can have corner conditions that cause them to ignore
> TRIM commands.

Sure, but since the whitelist was filled with devices that have been
tested/validated empirically, I wonder how thorough this needs to be
to add a drive with good confidence. After all, the vendor wildcard
for Crucial SSDs[1] has been quite broad and only restricted later
with blacklist entries (only due to NCQ trim and LPM problems AFAICT)...
So while queued trim is not blacklisted on my device, the
zero-after-trim assumption is not whitelisted either, and the only
reason is that the model string lacks "Crucial" at the beginning.
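
To illustrate the matching problem, here is a rough sketch that uses
shell "case" globbing as a stand-in for the kernel's own glob matcher.
The pattern list is an illustrative subset and not a copy of the kernel
table: "Crucial*" is the vendor wildcard mentioned above, "Samsung*" is
my assumption of how the Samsung drive gets matched, and "CT*MX500*" is
the hypothetical new entry under discussion. The Samsung model string
assumes that the underscores lsblk prints stand for spaces.

```shell
#!/bin/sh
# Illustrative subset of vendor wildcards; NOT a copy of the kernel table.
# "CT*MX500*" is the hypothetical entry proposed in this thread.
zero_after_trim() {
    # $1: model string as reported by the drive
    # $2: pass "patched" to also try the hypothetical CT*MX500* entry
    case "$1" in
        Crucial*|Samsung*) echo yes; return 0 ;;
    esac
    if [ "${2:-}" = patched ]; then
        case "$1" in
            CT*MX500*) echo yes; return 0 ;;
        esac
    fi
    echo no
}

# "Crucial_EXAMPLE_MODEL" is a made-up string with the vendor prefix.
for model in "Crucial_EXAMPLE_MODEL" "CT500MX500SSD4" \
             "Samsung SSD 860 EVO mSATA 500GB"; do
    printf '%-34s current=%s patched=%s\n' "$model" \
        "$(zero_after_trim "$model")" \
        "$(zero_after_trim "$model" patched)"
done
```

With only the vendor wildcard, "CT500MX500SSD4" falls through to "no";
a pattern such as "CT*MX500*" would also catch the other MX500
capacities.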
 
> > Is there any way to see which flags the kernel applies to a drive?  
> 
> # grep . /sys/class/ata_device/*/trim
> /sys/class/ata_device/dev1.0/trim:unqueued
> /sys/class/ata_device/dev2.0/trim:queued

But that only reflects ATA_HORKAGE_NO_NCQ_TRIM, I guess? While this
seems to be the major culprit of trim-related issues, I don't care
about that (yet).

> > Interestingly, "lsblk -D" does only show "0" for the Samsung device
> > (although AFAICT it is matched by the white list AND reports
> > "Deterministic read ZEROs after TRIM" according to hdparm. But I don't
> > know what lsblk actually looks at...?  
> 
> lsblk looks at /sys/block/*/queue/discard*

Yes, I could have checked strace :)

> You get "0" for the discard granularity on the Samsung?

Not for the granularity - that's fine I presume - but for the zeroing
capability. This is still the case (with Linux 5.10). I would have
expected that to be non-zero for devices with
ATA_HORKAGE_ZERO_AFTER_TRIM.

# lsblk -o PATH,MODEL,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO -d
PATH     MODEL                           DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
/dev/sda CT500MX500SSD4                         0        4K       2G         0
/dev/sdb CT500MX500SSD4                         0        4K       2G         0
/dev/sdc Samsung_SSD_860_EVO_mSATA_500GB        0      512B       2G         0

Just to make sure lsblk is not lying:
# cat /sys/block/sdc/queue/discard_zeroes_data 
0

I don't understand why that's the case.
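
For comparing all devices at once, a small sketch that dumps every
discard* attribute below /sys/block/*/queue, i.e. the files lsblk looks
at according to the reply above. The function name and the optional
directory argument are my own additions; the argument only exists so
the function can be pointed at a different tree.

```shell
#!/bin/sh
# Print the discard-related queue attributes of every block device.
# $1: sysfs block directory (default: /sys/block)
dump_discard() {
    root=${1:-/sys/block}
    for q in "$root"/*/queue; do
        [ -d "$q" ] || continue
        dev=${q%/queue}     # strip the trailing "/queue"
        dev=${dev##*/}      # keep only the device name
        for f in "$q"/discard*; do
            if [ -f "$f" ] && [ -r "$f" ]; then
                printf '%s %s=%s\n' "$dev" "${f##*/}" "$(cat "$f")"
            fi
        done
    done
    return 0
}

dump_discard "$@"
```

This prints one "device attribute=value" line per file, so the three
drives can be compared side by side without going through lsblk.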


1: https://github.com/torvalds/linux/blob/7a8526a5cd51cf5f070310c6c37dd7293334ac49/drivers/ata/libata-core.c#L4030

KR
-- 
Dipl.-Ing. Stefan Tauner
Lecturer and former researcher
Embedded Systems Department

University of Applied Sciences Technikum Wien
Hoechstaedtplatz 6, 1200 Vienna, Austria
E: stefan.tauner@xxxxxxxxxxxxxxxxx
I: embsys.technikum-wien.at


