On Mon, Aug 22, 2016 at 12:02 PM, Tom Yan <tom.ty89@xxxxxxxxx> wrote: > On 22 August 2016 at 15:04, Shaun Tancheff <shaun.tancheff@xxxxxxxxxxx> wrote: >> On Mon, Aug 22, 2016 at 3:33 AM, Tom Yan <tom.ty89@xxxxxxxxx> wrote: >>> On 22 August 2016 at 08:31, Tom Yan <tom.ty89@xxxxxxxxx> wrote: > Since I have no experience with SCT Write Same at all, and neither do > I own any spinning HDD at all, I cannot firmly suggest what to do. All > I can suggest is: should we decrease it per sector size? Or would 2G > per command still be too large to avoid timeout? Timeout for WS is 120 seconds so we should be fine there. The number to look for is the: Max. Sustained Transfer Rate OD (MB/s): 190 8TB (180 5TB) Which means the above drives should complete a 2G write in about 10 to 11 seconds. If these were 4Kn drives and we allowed a 16G max then it would be 80-90 seconds, assuming the write speed didn't get any better. So holding the maximum to around 2G is probably the best overall, in my opinion. -- Shaun Tancheff On Mon, Aug 22, 2016 at 12:02 PM, Tom Yan <tom.ty89@xxxxxxxxx> wrote: > On 22 August 2016 at 15:04, Shaun Tancheff <shaun.tancheff@xxxxxxxxxxx> wrote: >> On Mon, Aug 22, 2016 at 3:33 AM, Tom Yan <tom.ty89@xxxxxxxxx> wrote: >>> On 22 August 2016 at 08:31, Tom Yan <tom.ty89@xxxxxxxxx> wrote: >>>> As mentioned before, as of the latest draft of ACS-4, nothing about a >>>> larger payload size is mentioned. Conservatively speaking, it sort of >>> >>> *payload block size >>> >>>> means that we are allowing four 512-byte block payload on 4Kn device >>> >>> *eight 512-byte-block payload >>> >>>> regardless of the reported limit in the IDENTIFY DEVICE data. I am >>>> really not sure if it's a good thing to do. Doesn't seem necessary >>>> anyway, especially when our block layer does not support such a large >>>> bio size (well, yet), so each request will end up using a payload of >>>> two 512-byte blocks at max anyway. >>>> >>>> Also, it's IMHO better to do it in a seperate patch (series) after the >>>> SCT Write Same support has entered libata's repo too, because this has >>>> nothing to with it but TRIM translation. In case the future ACS >>>> standards has clearer/better instruction on this, it will be easier >>>> for us to revert/follow up too. >> >> I am certainly fine with dropping this patch as it is not critical to >> the reset of the series. >> >> Nothing will break if we stick with the 512 byte fixed limit. This >> is at most a prep patch for handling increased limits should >> they be reported. >> >> All it really is doing is acknowledging that any write same >> must have a payload of sector_size which can be something >> larger than 512 bytes. > > Actually I am not sure if we should hard code the limit > ata_format_dsm_trim_descr() / ata_set_lba_range_entries() at all. The > current implementation (with or without your patch) seems redundant > and unnecessary to me. > > All we need to do should be: making sure that the block limits VPD > advertises a safe Maximum Write Same Length, and reject Write Same > (16) commands that have "number of blocks" that exceeds the limit > (which is what I introduced in commit 5c79097a28c2, "libata-scsi: > reject WRITE SAME (16) with n_block that exceeds limit"). > > In that case, we don't need to hard code the limit in the > while-condition again; instead we should just make it end with the > request size, since the accepted request could never be larger than > the limit we advertise. > >> >>>> And you'll need to fix the Block Limits VPD simulation >>>> (ata_scsiop_inq_b0) too, so that it will advertise the Maximum Write >>>> Same Length dynamically as per the logical sector size, otherwise your >>>> effort will be completely in vain, even if our block layer is >>>> overhauled in the future. >> >> Martin had earlier suggested that I leave the write same defaults >> as is due to concerns with misbehaving hardware. > > It doesn't really apply in libata's anyway. SD_MAX_WS10_BLOCKS means > nothing to ATA drives, except from coincidentally being the same value > as ATA_MAX_SECTORS_LBA48 (which technically should have been 65536 > instead). > >> >> I think your patch adjusting the reported limits is reasonable >> enough. It seems to me we should have the hardware >> report it's actual limits, for example, report what the spec >> allows. > > As you mentioned yourself before, technically SCT Write Same does not > have a limit. The only practical limit is the timeout in the SCSI > layer, so the actual bytes being (over)written is probably our only > concern. > > For the case of TRIM, devices do report a limit in their IDENTIFY > DEVICE data. However, as Martin always said, it is not an always-safe > piece of data for us to refer to, that's why we have been statically > allowing only 1-block payload. > > Therefore, it seems convenient (and consistent) that we make SCT Write > Same always use the same limit as TRIM, no matter if it is supported > on a certain device. And to make sure the actual bytes being written / > time required per command does not increase enormously as per the > sector size, we decrease the limit accordingly. Certainly that's not > necessary if 16G per command is fine on most devices. > > Also, does SCT Write Same commands that write 32M/256M per command > make any sense? I mean would we benefit from such small SCT Write Same > commands at all? > >> >> Of course there are lots of reasons to limit the absolute >> maximums. >> >> So in this case we are just enabling the limit to be >> increased but not changing the current black-listing >> that distrusts DSM Trim. Once we have 4Kn devices >> to test then we can start white-listing and see if there >> is an overall increase in performance. >> >>>> Please be noted that, since your haven't touched ata_scsiop_inq_b0 at >>>> all, the reported Maximum Write Same Length will be: >>>> >>>> On device with TRIM support: >>>> - 4194240 LOGICAL sector per request split / command >>>> -- ~=2G on non-4Kn drives >>>> -- ~=16G on non-4Kn drives >>>> >>>> On device without TRIM support: >>>> - 0 --> SD_MAX_WS10_BLOCKS (65535) per request split / command >>>> -- ~= 32M on non-4Kn drives >>>> -- ~=256M on non-4Kn drives >>>> >>>> Even if we ignore the upper limit(s) of the block layer, do we want >>>> such inconsistencies? >> >> Hmm. Overall I think it is still okay if a bit confusing. >> It is possible that for devices which support SCT Write Same >> and DSM TRIM will still Trim faster than they can Write Same, >> However the internal implementation is opaque so I can't >> say if Write Same is often implemented in terms of TRIM >> or not. I mean that's how _I_ do it [Write 1 block and map >> N blocks to it], But not every FTL will have come to the >> same conclusion. > > Why would SCT Write Same be implemented in terms of TRIM? Neither > would we need to care about that anyway. Considering we will unlikely > allow multi-block payload TRIM, and we probably have no reason to > touch the SCSI Write Same timeout, the only thing we need to consider > is whether we want to decrease the advertised limit base on the > typical SCT Write Same speed on traditional HDDs and the timeout, > especially in the 4Kn case. > > Since I have no experience with SCT Write Same at all, and neither do > I own any spinning HDD at all, I cannot firmly suggest what to do. All > I can suggest is: should we decrease it per sector size? Or would 2G > per command still be too large to avoid timeout? > >> >> I also suspect that given the choice for most use casess that >> TRIM is preferred over WS when TRIM supports returning >> zeroed blocks. > > Well, for devices with discard_zeroes_data = 1, the block layer will > not issue write same requests (see blkdev_issue_zeroout in > block/blk-lib.c). However, libata only consider the RZAT support bit > from a white list of devices (see ata_scsiop_read_cap in libata-scsi > and the white list in libata-core). > >> >> -- >> Shaun Tancheff -- Shaun Tancheff -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html