Re: [PATCH 0/9] block/scsi: Implement SMR drive support

On 04/08/2016 08:35 PM, Shaun Tancheff wrote:
On Mon, Apr 4, 2016 at 5:00 PM, Hannes Reinecke <hare@xxxxxxx> wrote:
Hi all,

here's a patchset implementing SMR (shingled magnetic recording)
device support for the block and SCSI layer.

There are two main parts to it:
- mapping the 'RESET WRITE POINTER' command to the 'discard' functionality.
   The 'RESET WRITE POINTER' operation is pretty close to the existing
   'discard' functionality with the 'discard_zeroes_blocks' bit set.
   So I've added a new 'reset_wp' provisioning mode for this.

Completely agree: the REQ_OP_DISCARD -> Reset WP translation
seems like a good idea. I have tried something similar and ended up
essentially adding a 'reset wp' flag instead.
Now I am optimistic that I can use your patch to get the
discard -> reset wp working in my device mapper.

It works quite well here with my setup, although I've tripped across two caveats:

- We currently don't handle conventional zones.
  It would make sense to fall back to normal block zeroing here.
- Issuing 'RESET WP' is dead slow (at least on the prototypes I've had)
  Short-circuiting it for empty zones is a _major_ performance win here;
  the time for issuing discards for an entire drive is reduced by
  several orders of magnitude. So you absolutely need an in-kernel
  zone tree for this.
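
A minimal user-space sketch of the short-circuit described above, not the code from the patchset. It assumes a cached, sorted zone table in place of the in-kernel RB-tree; `struct zone_info`, `zone_lookup()` and `zone_discard_action()` are hypothetical names, and zeroing conventional zones is only the fallback suggested above, not something the patches do today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached zone descriptor; field names are illustrative only. */
struct zone_info {
	uint64_t start;        /* first LBA of the zone */
	uint64_t len;          /* zone length in sectors */
	uint64_t wp;           /* cached write pointer (absolute LBA) */
	bool     conventional; /* conventional zones have no write pointer */
};

enum discard_action { DISCARD_SKIP, DISCARD_ZERO_OUT, DISCARD_RESET_WP };

/* Binary search over zones sorted by start LBA; NULL if lba is unmapped. */
static struct zone_info *zone_lookup(struct zone_info *zones, size_t n,
				     uint64_t lba)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (lba < zones[mid].start)
			hi = mid;
		else if (lba >= zones[mid].start + zones[mid].len)
			lo = mid + 1;
		else
			return &zones[mid];
	}
	return NULL;
}

/* Decide what a discard of this zone should turn into. */
static enum discard_action zone_discard_action(const struct zone_info *z)
{
	assert(z != NULL);
	if (z->conventional)
		return DISCARD_ZERO_OUT;  /* assumed fallback: plain zeroing */
	if (z->wp == z->start)
		return DISCARD_SKIP;      /* already empty: no RESET WP needed */
	return DISCARD_RESET_WP;
}
```

Only zones that return DISCARD_RESET_WP would cost a command to the drive; after a successful reset the cached `wp` would be set back to `start`.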

- Adding a 'zone' pointer to the request queue. This pointer holds an
   RB-tree with the zone information, which can be used by other layers
   to access the write pointer.

Here is where I have some concerns. Having a common in-kernel
shadow of the drive's zone state seems problematic to me.

Well, this is the general SMR programming model, is it not?
And as already pointed out above, you really want this tree to be present to avoid unnecessary RESET WP calls. You also need it to format READ calls correctly for host-managed drives; from my understanding of the programming model, any READ call crossing the write pointer will be aborted. You could easily circumvent that by splitting the READ call into two parts, one up to the write pointer and another beyond it, for which you again need the zone tree.
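
The READ split can be sketched as follows. This is a stand-alone illustration under the assumption that the caller has already looked up the zone's write pointer; `struct read_seg` and `split_read_at_wp()` are made-up names, not part of the patchset.

```c
#include <assert.h>
#include <stdint.h>

/* One contiguous piece of a READ: starting LBA and sector count. */
struct read_seg {
	uint64_t lba;
	uint64_t nr;
};

/*
 * Split the range [lba, lba + nr) at the write pointer wp.
 * Returns the number of segments written to out[] (1 or 2).
 * A range entirely below or entirely at/above wp is left untouched.
 */
static int split_read_at_wp(uint64_t lba, uint64_t nr, uint64_t wp,
			    struct read_seg out[2])
{
	assert(nr > 0);

	if (lba < wp && lba + nr > wp) {
		out[0] = (struct read_seg){ .lba = lba, .nr = wp - lba };
		out[1] = (struct read_seg){ .lba = wp,  .nr = lba + nr - wp };
		return 2;
	}
	out[0] = (struct read_seg){ .lba = lba, .nr = nr };
	return 1;
}
```

For instance, a 20-sector READ at LBA 90 with the write pointer at LBA 100 would become two 10-sector READs, at LBA 90 and at LBA 100.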

Also if I am understanding the direction here it is to hold the zone
information in an rbtree. Since that comes to just under 30,000
entries I think it would be better to shift to an array of
write pointer offsets.

The thing is that using an rbtree might actually be faster than an array; the rbtree entries easily fit into the processor cache, whereas the array doesn't. So you might end up with slower access when using arrays, despite arrays being easier to code.
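
For comparison, the array alternative could look roughly like the sketch below. It assumes every zone has the same power-of-two size (256 MiB of 512-byte sectors here, which is an assumption and not stated in this thread) so that a zone index is a plain shift; all names are hypothetical. It makes no claim about which layout is faster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed uniform zone size: 2^19 sectors of 512 bytes = 256 MiB. */
#define ZONE_SHIFT   19
#define ZONE_SECTORS (UINT64_C(1) << ZONE_SHIFT)

/* Index of the zone containing lba. */
static inline size_t zone_index(uint64_t lba)
{
	return (size_t)(lba >> ZONE_SHIFT);
}

/* Absolute write pointer of the zone containing lba. */
static inline uint64_t zone_wp(const uint32_t *wp_off, uint64_t lba)
{
	size_t idx = zone_index(lba);

	assert(wp_off != NULL);
	return ((uint64_t)idx << ZONE_SHIFT) + wp_off[idx];
}

/* A zone is empty when its write pointer offset is zero. */
static inline bool zone_is_empty(const uint32_t *wp_off, uint64_t lba)
{
	return wp_off[zone_index(lba)] == 0;
}

/* A zone is full when its write pointer offset equals the zone size. */
static inline bool zone_is_full(const uint32_t *wp_off, uint64_t lba)
{
	return wp_off[zone_index(lba)] == ZONE_SECTORS;
}
```

With one 32-bit offset per zone, 30,000 zones take 120,000 bytes, and a lookup is a shift plus one array access. The trade-off is that per-zone type and condition would need a second array or spare bits, and non-uniform zone sizes would break the shift-based index.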

At the moment my translation layer keeps track of activity and state
of all the zones on the drive so that is how I have been handling
the zone data up to this point.

As outlined above: any driver/filesystem needs access to the zone states, as it might need to align its internal structures to the zones. But you also need to keep track of the zones in the SCSI layer so as to format the RESET WP correctly, which means you basically need a common tree.

As you might've seen, I've also programmed my own zoned device-mapper device, caching individual zones. We should discuss whether those two approaches can be merged, to end up with a common device-mapper target.

Cheers,

Hannes
--
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)