Re: [RFC PATCH] bcache: enable zoned device support

On 12/6/19 5:37 AM, Coly Li wrote:
> On 2019/12/6 8:30 AM, Damien Le Moal wrote:
>> On 2019/12/06 9:22, Eric Wheeler wrote:
>>> On Thu, 5 Dec 2019, Coly Li wrote:
>>>> This is very basic zoned device support. With this patch, the bcache
>>>> device is able to:
>>>> - Export zoned device attributes via sysfs
>>>> - Respond to report zones requests, e.g. from the command 'blkzone report'
>>>> But the bcache device is still NOT able to:
>>>> - Respond to any zoned device management request or IOCTL command
>>>>
>>>> Here are the tests I have done:
>>>> - read /sys/block/bcache0/queue/zoned, content is 'host-managed'
>>>> - read /sys/block/bcache0/queue/nr_zones, content is number of zones
>>>>   including all zone types.
>>>> - read /sys/block/bcache0/queue/chunk_sectors, content is zone size
>>>>   in sectors.
>>>> - run 'blkzone report /dev/bcache0', all zones information displayed.
>>>> - run 'blkzone reset /dev/bcache0', operation is rejected with error
>>>>   information: "blkzone: /dev/bcache0: BLKRESETZONE ioctl failed:
>>>>   Operation not supported"
>>>> - Sequential writes by dd: I can see some zones' write pointer ('wptr')
>>>>   values updated.
>>>>
>>>> All of these are very basic tests; if you have better testing
>>>> tools or cases, please give me a hint.
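For scripting the sysfs checks listed above, a minimal user-space sketch in C might look like the following. It assumes the attribute names quoted in the test list (`zoned`, `chunk_sectors`) under `/sys/block/<dev>/queue/`, and that `chunk_sectors` is expressed in 512-byte sectors; the helper names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Zoned models as printed in /sys/block/<dev>/queue/zoned. */
enum zoned_model { ZONED_NONE, ZONED_HOST_AWARE, ZONED_HOST_MANAGED, ZONED_UNKNOWN };

/* Map the text of the "zoned" attribute to an enum, ignoring a trailing newline. */
static enum zoned_model parse_zoned_model(const char *s)
{
    size_t n = strcspn(s, "\n");

    if (n == strlen("none") && strncmp(s, "none", n) == 0)
        return ZONED_NONE;
    if (n == strlen("host-aware") && strncmp(s, "host-aware", n) == 0)
        return ZONED_HOST_AWARE;
    if (n == strlen("host-managed") && strncmp(s, "host-managed", n) == 0)
        return ZONED_HOST_MANAGED;
    return ZONED_UNKNOWN;
}

/* Read the first line of /sys/block/<dev>/queue/<attr> into buf (0 on success, -1 on error). */
static int read_queue_attr(const char *dev, const char *attr, char *buf, size_t len)
{
    char path[256];
    FILE *f;

    assert(dev && attr && buf && len > 0);
    snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}

/* Zone size in bytes, derived from chunk_sectors (512-byte units); -1 on error. */
static long long zone_size_bytes(const char *dev)
{
    char buf[64];
    char *end;
    unsigned long long sectors;

    if (read_queue_attr(dev, "chunk_sectors", buf, sizeof(buf)) != 0)
        return -1;
    sectors = strtoull(buf, &end, 10);
    if (end == buf)
        return -1;
    return (long long)(sectors * 512ULL);
}
```

On a kernel with the patch applied, reading `zoned` for "bcache0" and passing it to parse_zoned_model() would be expected to give ZONED_HOST_MANAGED, matching the 'host-managed' result reported above.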
>>>
>>> Interesting. 
>>>
>>> 1. should_writeback() could benefit from hinting true when an IO would fall
>>>    in a zoned region.
>>>
>>> 2. The writeback thread could write back data such that it prefers
>>>    fully (or mostly) populated zones when choosing what to write out.
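The two writeback ideas could look roughly like the following user-space sketch. `struct zone_info`, zone_hint_writeback() and pick_writeback_zone() are hypothetical names invented for illustration; they are not bcache code, and the real should_writeback() has a different signature. Zone "fullness" is approximated here, as an assumption, by the fraction of a zone's sectors that are dirty in the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-zone bookkeeping (bcache tracks nothing like this today). */
struct zone_info {
    uint64_t start;        /* first sector of the zone */
    uint64_t len;          /* zone length in sectors */
    uint64_t dirty;        /* dirty sectors cached for this zone */
    bool     seq_required; /* sequential-write-required zone? */
};

/* Idea 1: return true ("cache it, write back later") if `sector` falls in a
 * sequential-write-required zone; false for conventional zones or no match. */
static bool zone_hint_writeback(const struct zone_info *z, size_t nr, uint64_t sector)
{
    for (size_t i = 0; i < nr; i++)
        if (sector >= z[i].start && sector < z[i].start + z[i].len)
            return z[i].seq_required;
    return false;
}

/* Idea 2: pick the zone with the largest dirty fraction (dirty/len), so that
 * writeback fills zones as completely as possible. Returns -1 if nothing is dirty. */
static long pick_writeback_zone(const struct zone_info *z, size_t nr)
{
    long best = -1;

    assert(z != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++) {
        if (z[i].dirty == 0)
            continue;
        /* a/b > c/d  <=>  a*d > c*b for positive b, d; avoids floating point */
        if (best < 0 || z[i].dirty * z[best].len > z[best].dirty * z[i].len)
            best = (long)i;
    }
    return best;
}

/* Made-up example layout: one conventional zone, three 256 MiB sequential zones. */
static const struct zone_info demo_zones[] = {
    { 0,       524288, 0,      false },  /* conventional, clean */
    { 524288,  524288, 131072, true  },  /* 25% dirty */
    { 1048576, 524288, 471859, true  },  /* ~90% dirty */
    { 1572864, 524288, 262144, true  },  /* 50% dirty */
};
```

With this layout, a write to sector 600000 gets the "cache it" hint, and writeback would start with the ~90% dirty zone (index 2). The linear scan is only for clarity; a real implementation would presumably keep zones in a sorted structure.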
>>
>> That would definitely be a good idea, since it would certainly benefit
>> backend GC (which will be needed).
>>
>> However, I do not see the point in exposing the /dev/bcacheX block
>> device itself as a zoned disk. In fact, I think we want exactly the
>> opposite: expose it as a regular disk so that any FS or application can
>> run. If the bcache backend disk is zoned, then the writeback handles
>> sequential writes. This would be in the end a solution similar to
>> dm-zoned, that is, a zoned disk becomes usable as a regular block
>> device (random writes anywhere are possible), but likely far more
>> efficient and faster. That may impose some limitations on
>> bcache operations though, e.g. it can only be set up with writeback, no
>> writethrough allowed (not sure though...).
>> Thoughts?
>>
> 
> I have come to realize this idea is really the opposite. Let me try to
> explain what I understand; please correct me if I am wrong. The idea you
> proposed is indeed to make bcache act as something like an FTL for the
> backend zoned SMR drive, that is, bcache may convert all random writes
> into sequential writes onto the backend zoned SMR drive. In the
> meantime, for hot data, bcache continues to act as a caching device to
> accelerate read requests.
> 
> Yes, if I understand your proposal correctly, writeback mode might be
> mandatory and backend GC will be needed. The idea is interesting; it
> looks like adding a log-structured storage layer between the current bcache
> B+tree indexing and the zoned SMR hard drive.
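To make the "FTL-like" log-structured layer concrete, here is a toy sketch: every write, random or not, is appended at the open zone's write pointer, a table maps logical to physical blocks, and an overwritten block's old copy becomes stale, which is why backend GC would be needed. All names and sizes are hypothetical and chosen for illustration; this is not bcache code, and GC and zone reset are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_LBAS     16          /* logical blocks exposed upward (toy size) */
#define ZONE_BLOCKS  8          /* blocks per zone (toy size) */
#define NR_ZONES     4          /* 32 physical blocks: 2x overprovisioned */
#define UNMAPPED    UINT32_MAX

/* Hypothetical toy translation layer. */
struct log_layer {
    uint32_t l2p[NR_LBAS];      /* logical block -> physical block */
    uint32_t wp[NR_ZONES];      /* write pointer (offset within each zone) */
    uint32_t valid[NR_ZONES];   /* live blocks per zone; the rest are stale */
    uint32_t open_zone;         /* zone currently receiving appends */
};

static void log_init(struct log_layer *l)
{
    memset(l, 0, sizeof(*l));
    for (int i = 0; i < NR_LBAS; i++)
        l->l2p[i] = UNMAPPED;
}

/* Append a write for `lba` at the open zone's write pointer. Returns the
 * physical block used, or UNMAPPED if every zone is full (GC would be needed). */
static uint32_t log_write(struct log_layer *l, uint32_t lba)
{
    assert(lba < NR_LBAS);
    while (l->open_zone < NR_ZONES && l->wp[l->open_zone] == ZONE_BLOCKS)
        l->open_zone++;                     /* zone full: move to the next one */
    if (l->open_zone == NR_ZONES)
        return UNMAPPED;                    /* out of space without GC */

    uint32_t old = l->l2p[lba];
    if (old != UNMAPPED)
        l->valid[old / ZONE_BLOCKS]--;      /* previous copy becomes stale */

    uint32_t pba = l->open_zone * ZONE_BLOCKS + l->wp[l->open_zone]++;
    l->l2p[lba] = pba;
    l->valid[l->open_zone]++;
    return pba;
}
```

For example, logical writes 9, 2, 9 land on physical blocks 0, 1, 2 of the first zone, leaving block 0 stale. Once every zone is full, log_write() fails until a GC pass (not shown) relocates live blocks and resets zones.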
> 
Well, not sure if that's required.

Or, to be precise, we actually have _two_ use-cases:
1) Having an SMR drive as a backing device. This was my primary goal for
handling these devices, as SMR devices are typically not _that_ fast.
(Damien once proudly reported getting the incredible speed of 1 IOPS :-)
So having bcache running on top of those will be a clear win.
But in this scenario the cache device will be a normal device (typically
an SSD), and we shouldn't need much modification here.
In fact, a good testcase would be the btrfs patches which got posted
earlier this week. With them you should be able to create a btrfs
filesystem on the SMR drive, and use an SSD as a cache device.
Getting this scenario to run would indeed be my primary goal, and I
guess your patches should be more or less sufficient for that.
2) Using an SMR drive as a _cache_ device. This seems to contradict
the above statement about SMR drives not being fast, but the NVMe WG is
working on a similar mechanism for flash devices called 'ZNS' (zoned
namespaces). And for those it really would make sense for bcache
to be able to handle zoned devices as a cache device.
But to my understanding this is really in the early stages, with no real
hardware available yet. Damien might disagree, though :-)
And the implementation is still in the works on the Linux side, so it's
more of a long-term goal.

But the first use-case is definitely something we should be looking at;
SMR drives are available _and_ with large capacity, so any speedup there
would be greatly appreciated.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@xxxxxxx			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


