On 7/14/20 10:47 AM, Maciej S. Szmigiero wrote: > On 14.07.2020 19:14, Guenter Roeck wrote: >> On 7/14/20 7:11 AM, Maciej S. Szmigiero wrote: >>> On 14.07.2020 16:02, Guenter Roeck wrote: >>>> On 7/11/20 1:41 PM, Maciej S. Szmigiero wrote: >>>>> It has been observed that Toshiba DT01ACA family drives have >>>>> WRITE FPDMA QUEUED command timeouts and sometimes just freeze until >>>>> power-cycled under heavy write loads when their temperature is getting >>>>> polled in SCT mode. The SMART mode seems to be fine, though. >>>>> >>>>> Let's make sure we don't use SCT mode for these drives then. >>>>> >>>>> While only the 3 TB model was actually caught exhibiting the problem let's >>>>> play safe here to avoid data corruption and extend the ban to the whole >>>>> family. >>>>> >>>>> Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors") >>>>> Cc: stable@xxxxxxxxxxxxxxx >>>>> Signed-off-by: Maciej S. Szmigiero <mail@xxxxxxxxxxxxxxxxxxxxx> >>>>> --- >>>>> >>>>> Notes: >>>>> This behavior was observed on two different DT01ACA3 drives. >>>>> >>>>> Usually, a series of queued WRITE FPDMA QUEUED commands just time out, >>>>> but sometimes the whole drive freezes. Merely disconnecting and >>>>> reconnecting SATA interface cable then does not unfreeze the drive. >>>>> >>>>> One has to disconnect and reconnect the drive power connector for the >>>>> drive to be detected again (suggesting the drive firmware itself has >>>>> crashed). >>>>> >>>>> This only happens when the drive temperature is polled very often (like >>>>> every second), so occasional SCT usage via smartmontools is probably >>>>> safe. >>>>> >>>>> Changes from v1: >>>>> 'SCT blacklist' -> 'SCT avoid models' >>>>> >>>>> drivers/hwmon/drivetemp.c | 37 +++++++++++++++++++++++++++++++++++++ >>>>> 1 file changed, 37 insertions(+) >>>>> >>>>> diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c >>>>> index 0d4f3d97ffc6..da9cfcbecc96 100644 >>>>> --- a/drivers/hwmon/drivetemp.c >>>>> +++ b/drivers/hwmon/drivetemp.c >>>>> @@ -285,6 +285,36 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val) >>>>> return err; >>>>> } >>>>> >>>>> +static const char * const sct_avoid_models[] = { >>>>> +/* >>>>> + * These drives will have WRITE FPDMA QUEUED command timeouts and sometimes just >>>>> + * freeze until power-cycled under heavy write loads when their temperature is >>>>> + * getting polled in SCT mode. The SMART mode seems to be fine, though. >>>>> + * >>>>> + * While only the 3 TB model was actually caught exhibiting the problem >>>>> + * let's play safe here to avoid data corruption and ban the whole family. >>>>> + */ >>>>> + "TOSHIBA DT01ACA0", >>>>> + "TOSHIBA DT01ACA1", >>>>> + "TOSHIBA DT01ACA2", >>>>> + "TOSHIBA DT01ACA3", >>>>> +}; >>>>> + >>>>> +static bool drivetemp_sct_avoid(struct drivetemp_data *st) >>>>> +{ >>>>> + struct scsi_device *sdev = st->sdev; >>>>> + unsigned int ctr; >>>>> + >>>>> + if (!sdev->model) >>>>> + return false; >>>>> + >>>>> + for (ctr = 0; ctr < ARRAY_SIZE(sct_avoid_models); ctr++) >>>>> + if (strncmp(sdev->model, sct_avoid_models[ctr], 16) == 0) >>>> >>>> Why strncmp, and why length 16 ? Both strings are, as far as I can see, >>>> 0 terminated. A fixed length only asks for trouble later on as more models >>>> are added to the list. >>> >>> The first 16 bytes of sdev->model contain the model number, the rest >>> seems to be the drive serial number. >>> There is no NULL separator between them. >>> >> >> If the "16" is based on some SCSI standard, there should be >> a define for it somewhere. > > The "model" field seems to contain just the raw SCSI INQUIRY > "product identification" field, which is space-filled to 16 bytes, but > is not NULL-terminated. > > The SCSI layer seems to just open-code the number 16 everywhere, can't > see any specific define for this. > >> Otherwise, the comparison should >> use strlen(sct_avoid_models[ctr]) and explain the reason. >> In the latter case, it might possibly make sense to match >> "TOSHIBA DT01ACA" to also catch later models. > > Will use strlen() for prefix matching and match on the > "TOSHIBA DT01ACA" prefix then. > sgtm Thanks, Guenter