On Sat, Jul 18, 2020 at 02:32:10PM +0200, Maciej S. Szmigiero wrote:
> It has been observed that Toshiba DT01ACA family drives have
> WRITE FPDMA QUEUED command timeouts and sometimes just freeze until
> power-cycled under heavy write loads when their temperature is getting
> polled in SCT mode. The SMART mode seems to be fine, though.
>
> Let's make sure we don't use SCT mode for these drives then.
>
> While only the 3 TB model was actually caught exhibiting the problem let's
> play safe here to avoid data corruption and extend the ban to the whole
> family.
>
> Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Maciej S. Szmigiero <mail@xxxxxxxxxxxxxxxxxxxxx>

Applied.

Thanks,
Guenter

> ---
>
> Notes:
>     This behavior was observed on two different DT01ACA3 drives.
>
>     Usually, a series of queued WRITE FPDMA QUEUED commands just time out,
>     but sometimes the whole drive freezes. Merely disconnecting and
>     reconnecting SATA interface cable then does not unfreeze the drive.
>
>     One has to disconnect and reconnect the drive power connector for the
>     drive to be detected again (suggesting the drive firmware itself has
>     crashed).
>
>     This only happens when the drive temperature is polled very often (like
>     every second), so occasional SCT usage via smartmontools is probably
>     safe.
>
>     Changes from v1:
>     'SCT blacklist' -> 'SCT avoid models'
>
>     Changes from v2:
>     * Switch to prefix matching and use it to match the DT01ACAx family,
>
>     * Use "!" instead of "== 0",
>
>     * Add a comment about the contents of the "model" field.
>
>  drivers/hwmon/drivetemp.c | 43 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
>
> diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
> index 0d4f3d97ffc6..72c760373957 100644
> --- a/drivers/hwmon/drivetemp.c
> +++ b/drivers/hwmon/drivetemp.c
> @@ -285,6 +285,42 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
>  	return err;
>  }
>  
> +static const char * const sct_avoid_models[] = {
> +/*
> + * These drives will have WRITE FPDMA QUEUED command timeouts and sometimes just
> + * freeze until power-cycled under heavy write loads when their temperature is
> + * getting polled in SCT mode. The SMART mode seems to be fine, though.
> + *
> + * While only the 3 TB model (DT01ACA3) was actually caught exhibiting the
> + * problem let's play safe here to avoid data corruption and ban the whole
> + * DT01ACAx family.
> +
> + * The models from this array are prefix-matched.
> + */
> +	"TOSHIBA DT01ACA",
> +};
> +
> +static bool drivetemp_sct_avoid(struct drivetemp_data *st)
> +{
> +	struct scsi_device *sdev = st->sdev;
> +	unsigned int ctr;
> +
> +	if (!sdev->model)
> +		return false;
> +
> +	/*
> +	 * The "model" field contains just the raw SCSI INQUIRY response
> +	 * "product identification" field, which has a width of 16 bytes.
> +	 * This field is space-filled, but is NOT NULL-terminated.
> +	 */
> +	for (ctr = 0; ctr < ARRAY_SIZE(sct_avoid_models); ctr++)
> +		if (!strncmp(sdev->model, sct_avoid_models[ctr],
> +			     strlen(sct_avoid_models[ctr])))
> +			return true;
> +
> +	return false;
> +}
> +
>  static int drivetemp_identify_sata(struct drivetemp_data *st)
>  {
>  	struct scsi_device *sdev = st->sdev;
> @@ -326,6 +362,13 @@ static int drivetemp_identify_sata(struct drivetemp_data *st)
>  	/* bail out if this is not a SATA device */
>  	if (!is_ata || !is_sata)
>  		return -ENODEV;
> +
> +	if (have_sct && drivetemp_sct_avoid(st)) {
> +		dev_notice(&sdev->sdev_gendev,
> +			   "will avoid using SCT for temperature monitoring\n");
> +		have_sct = false;
> +	}
> +
>  	if (!have_sct)
>  		goto skip_sct;
>
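
For anyone who wants to try the prefix match outside the kernel, here is a
rough userspace sketch of the same idea. It is not the driver code: the names
INQUIRY_PRODUCT_LEN, fill_product_field() and product_field_avoided() are
made up for illustration, and the model strings used with it are only
examples. It assumes, as the comment in the patch says, a 16-byte field that
is space-filled and carries no terminator, so the comparison length is always
taken from the prefix and never from the field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Width of the "product identification" field, per the patch comment. */
#define INQUIRY_PRODUCT_LEN 16

/* Prefixes to avoid; mirrors the single entry added by the patch. */
static const char * const avoid_prefixes[] = {
	"TOSHIBA DT01ACA",
};

/*
 * Hypothetical helper for this sketch only: build a fixed-width,
 * space-filled field with no terminating NUL from a C string.
 * Over-long input is truncated to the field width.
 */
static void fill_product_field(char field[INQUIRY_PRODUCT_LEN],
			       const char *model)
{
	size_t len = strlen(model);

	if (len > INQUIRY_PRODUCT_LEN)
		len = INQUIRY_PRODUCT_LEN;
	memset(field, ' ', INQUIRY_PRODUCT_LEN);
	memcpy(field, model, len);
}

/* Return true if the field starts with any of the avoided prefixes. */
static bool product_field_avoided(const char field[INQUIRY_PRODUCT_LEN])
{
	size_t i;

	for (i = 0; i < sizeof(avoid_prefixes) / sizeof(avoid_prefixes[0]); i++) {
		size_t len = strlen(avoid_prefixes[i]);

		/* The field has no terminator, so never compare past it. */
		if (len <= INQUIRY_PRODUCT_LEN &&
		    !strncmp(field, avoid_prefixes[i], len))
			return true;
	}

	return false;
}
```

Bounding strncmp() by the prefix length is what makes this safe on an
unterminated field: with a 15-character prefix the comparison never reads
beyond byte 15 of the 16-byte field.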