Re: scsi: convert discard to REQ_TYPE_FS instead of REQ_TYPE_BLOCK_PC

Douglas Gilbert <dgilbert@xxxxxxxxxxxx> · Tue, 06 Jul 2010 23:35:53 -0400

On 10-07-06 09:39 PM, Martin K. Petersen wrote:
"Mike" == Mike Snitzer<snitzer@xxxxxxxxxx>  writes:

That is 0x7fffff (over 8 million) blocks (4 GB) being unmapped in one
operation! That may exceed the "maximum unmap lba count" field in the
Block Limits VPD page.  The latest SBC draft (sbc3r22.pdf) says that
field applies to the SCSI UNMAP command and does not mention the
WRITE SAME (16) command but that is probably an oversight.

Maximum Unmap LBA Count > 0 (in combination with the descriptor count)
are what indicate that the device server supports UNMAP.

That has been superseded by the TPU and TPWS bits
in the Thin provisioning VPD page (B2h) in sbc3r22.
TPU and TPWS indicate support for the UNMAP and WRITE
SAME (16) with UNMAP bit ** commands respectively.

You could argue, then, that a Maximum Unmap LBA Count > 0 but a Maximum
Unmap Descriptor Count of 0 would provide a means to indicate the maximum
range for WRITE SAME.  But the T10 people I have talked to all agree
that the LBA count for WRITE SAME is gated by the command's LBA count
and nothing else.  So no special casing for when the UNMAP bit is set.
I.e. the max for WRITE SAME(16) is 32-bits times logical_block_size.

I think sbc3r22 is just flaky in that area and will
be cleaned up soon. As the words stand now, in the
Block limits VPD page "maximum unmap lba count" only
applies to the UNMAP command while "optimal unmap
granularity" applies to both the UNMAP command and
the WRITE SAME(16) command. Inconsistent.
And "maximum unmap lba count"==0 implying no UNMAP
command is pointless given the TPU bit.
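
To make the reading above concrete, here is a minimal Python sketch of
how an initiator might pick a discard command and its block-count
ceiling from the fields named in this thread. The function and
parameter names are hypothetical, preferring UNMAP over WRITE SAME(16)
is an assumption, and so is treating a zero "maximum unmap lba count"
as "no limit reported". It is not the sd driver's code.

```python
WRITE_SAME_16_MAX_BLOCKS = 0xFFFFFFFF  # 32-bit block count field in the CDB


def pick_discard_method(tpu, tpws, max_unmap_lba_count):
    """Return (command, max_blocks), or (None, 0) if nothing is supported.

    tpu, tpws: bits from the Thin Provisioning VPD page (B2h).
    max_unmap_lba_count: field from the Block Limits VPD page. It is
    applied to UNMAP only; 0 is treated as "no limit reported"
    (an assumption made for this sketch).
    """
    if tpu:
        return ("UNMAP", max_unmap_lba_count or 0xFFFFFFFF)
    if tpws:
        # Gated only by the command's own block count field.
        return ("WRITE SAME(16)", WRITE_SAME_16_MAX_BLOCKS)
    return (None, 0)
```

With 512-byte logical blocks the WRITE SAME(16) ceiling works out to
0xFFFFFFFF * 512 = 2199023255040 bytes, just under 2 TiB.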

Mike>  # cat /sys/block/sda/queue/discard_granularity
Mike>  512
Mike>  # cat /sys/block/sda/queue/discard_max_bytes
Mike>  4294966784

Mike>  I'll look to understand why 'discard_max_bytes' is so large for
Mike>  this LUN despite the standard Block limits VPD page not reflecting
Mike>  this.

discard_max_bytes is 0xFFFFFFFF for WRITE SAME(16).
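
The figures quoted in this thread are arithmetically consistent with
one another: 0xFFFFFFFF bytes truncated to whole 512-byte blocks gives
the 0x7fffff block count and the 4294966784 bytes Mike reported. The
Python lines below only check that arithmetic. The 512-byte block size
is an assumption taken from the discard_granularity value, and nothing
here is taken from the kernel source.

```python
BLOCK_SIZE = 512  # bytes; assumed from the discard_granularity shown above

limit_bytes = 0xFFFFFFFF
whole_blocks = limit_bytes // BLOCK_SIZE      # drop the partial block
reported_bytes = whole_blocks * BLOCK_SIZE

print(hex(whole_blocks))   # 0x7fffff
print(reported_bytes)      # 4294966784
```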

FORMAT UNIT has several associated mechanisms (e.g. the
IMMED bit and REQUEST SENSE polling) that let it
run for a long time. WRITE SAME has no such mechanisms.
There was a proposal put to T10 to place an upper limit
on WRITE SAME's lba count but I think that has been
dropped. IMO if we want to give large block counts to
UNMAP or WRITE SAME in the absence of guidance from the
block limits VPD page, then we need to cope with
the device saying "nope".
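
One way to cope with a rejection is to halve the block count and retry.
The Python sketch below illustrates that idea only. issue() is a
hypothetical callable standing in for whatever sends a single UNMAP or
WRITE SAME(16) and reports whether the device accepted it; the halving
policy is an assumption, not something the kernel or the standard
prescribes.

```python
def discard_range(issue, lba, num, min_chunk=1):
    """Discard blocks [lba, lba + num) via issue(lba, count) -> bool.

    issue is hypothetical: it sends one command and returns False if
    the device rejects it. On rejection the chunk size is halved and
    the same starting LBA is retried. Returns True once the whole
    range has been accepted, False if even min_chunk is refused.
    """
    chunk = num
    end = lba + num
    while lba < end:
        count = min(chunk, end - lba)
        if issue(lba, count):
            lba += count                      # accepted, move on
        elif count > min_chunk:
            chunk = max(min_chunk, count // 2)  # back off and retry
        else:
            return False                      # smallest request refused
    return True
```

A caller would likely want to remember the chunk size that worked so
that later requests do not repeat the back-off.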

Whatever device Mike has, it seems to be failing the
WRITE SAME(16) command due to the huge lba block count.
Does the device work with a smaller lba block count?
For example:
   sg_write_same --unmap --lba 0 --num 1024 /dev/sda

** WRITE SAME (32) also has an UNMAP bit.

Doug Gilbert
