On 7/11/20 1:41 PM, Maciej S. Szmigiero wrote: > It has been observed that Toshiba DT01ACA family drives have > WRITE FPDMA QUEUED command timeouts and sometimes just freeze until > power-cycled under heavy write loads when their temperature is getting > polled in SCT mode. The SMART mode seems to be fine, though. > > Let's make sure we don't use SCT mode for these drives then. > > While only the 3 TB model was actually caught exhibiting the problem let's > play safe here to avoid data corruption and extend the ban to the whole > family. > > Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors") > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Maciej S. Szmigiero <mail@xxxxxxxxxxxxxxxxxxxxx> > --- > > Notes: > This behavior was observed on two different DT01ACA3 drives. > > Usually, a series of queued WRITE FPDMA QUEUED commands just time out, > but sometimes the whole drive freezes. Merely disconnecting and > reconnecting SATA interface cable then does not unfreeze the drive. > > One has to disconnect and reconnect the drive power connector for the > drive to be detected again (suggesting the drive firmware itself has > crashed). > > This only happens when the drive temperature is polled very often (like > every second), so occasional SCT usage via smartmontools is probably > safe. > > Changes from v1: > 'SCT blacklist' -> 'SCT avoid models' > > drivers/hwmon/drivetemp.c | 37 +++++++++++++++++++++++++++++++++++++ > 1 file changed, 37 insertions(+) > > diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c > index 0d4f3d97ffc6..da9cfcbecc96 100644 > --- a/drivers/hwmon/drivetemp.c > +++ b/drivers/hwmon/drivetemp.c > @@ -285,6 +285,36 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val) > return err; > } > > +static const char * const sct_avoid_models[] = { > +/* > + * These drives will have WRITE FPDMA QUEUED command timeouts and sometimes just > + * freeze until power-cycled under heavy write loads when their temperature is > + * getting polled in SCT mode. The SMART mode seems to be fine, though. > + * > + * While only the 3 TB model was actually caught exhibiting the problem > + * let's play safe here to avoid data corruption and ban the whole family. > + */ > + "TOSHIBA DT01ACA0", > + "TOSHIBA DT01ACA1", > + "TOSHIBA DT01ACA2", > + "TOSHIBA DT01ACA3", > +}; > + > +static bool drivetemp_sct_avoid(struct drivetemp_data *st) > +{ > + struct scsi_device *sdev = st->sdev; > + unsigned int ctr; > + > + if (!sdev->model) > + return false; > + > + for (ctr = 0; ctr < ARRAY_SIZE(sct_avoid_models); ctr++) > + if (strncmp(sdev->model, sct_avoid_models[ctr], 16) == 0) Why strncmp, and why length 16 ? Both strings are, as far as I can see, 0 terminated. A fixed length only asks for trouble later on as more models are added to the list. Also, please use "!" instead of "== 0". > + return true; > + > + return false; > +} > + > static int drivetemp_identify_sata(struct drivetemp_data *st) > { > struct scsi_device *sdev = st->sdev; > @@ -326,6 +356,13 @@ static int drivetemp_identify_sata(struct drivetemp_data *st) > /* bail out if this is not a SATA device */ > if (!is_ata || !is_sata) > return -ENODEV; > + > + if (have_sct && drivetemp_sct_avoid(st)) { > + dev_notice(&sdev->sdev_gendev, > + "will avoid using SCT for temperature monitoring\n"); > + have_sct = false; > + } > + > if (!have_sct) > goto skip_sct; > >