On 12/6/19 5:37 AM, Coly Li wrote:
> On 2019/12/6 8:30 AM, Damien Le Moal wrote:
>> On 2019/12/06 9:22, Eric Wheeler wrote:
>>> On Thu, 5 Dec 2019, Coly Li wrote:
>>>> This is very basic zoned device support. With this patch, the
>>>> bcache device is able to:
>>>> - Export zoned device attributes via sysfs
>>>> - Respond to report zones requests, e.g. from the command
>>>>   'blkzone report'
>>>> But the bcache device is still NOT able to:
>>>> - Respond to any zoned device management request or IOCTL command
>>>>
>>>> Here are the tests I have done:
>>>> - Read /sys/block/bcache0/queue/zoned; the content is
>>>>   'host-managed'.
>>>> - Read /sys/block/bcache0/queue/nr_zones; the content is the
>>>>   number of zones, including all zone types.
>>>> - Read /sys/block/bcache0/queue/chunk_sectors; the content is the
>>>>   zone size in sectors.
>>>> - Run 'blkzone report /dev/bcache0'; all zone information is
>>>>   displayed.
>>>> - Run 'blkzone reset /dev/bcache0'; the operation is rejected with
>>>>   the error message: "blkzone: /dev/bcache0: BLKRESETZONE ioctl
>>>>   failed: Operation not supported"
>>>> - Sequential writes by dd; I can see some zones' write pointer
>>>>   'wptr' values updated.
>>>>
>>>> All of these are very basic tests. If you have better testing
>>>> tools or cases, please offer me a hint.
>>>
>>> Interesting.
>>>
>>> 1. should_writeback() could benefit by hinting true when an IO
>>> would fall in a zoned region.
>>>
>>> 2. The writeback thread could write back such that it prefers
>>> fully- (or mostly-) populated zones when choosing what to write
>>> out.
>>
>> That definitely would be a good idea, since that would certainly
>> benefit backend-GC (which will be needed).
>>
>> However, I do not see the point in exposing the /dev/bcacheX block
>> device itself as a zoned disk. In fact, I think we want exactly the
>> opposite: expose it as a regular disk so that any FS or application
>> can run. If the bcache backend disk is zoned, then the writeback
>> handles sequential writes. This would be, in the end, a solution
>> similar to dm-zoned, that is, a zoned disk becomes usable as a
>> regular block device (random writes anywhere are possible), but
>> likely far more efficient and faster. That may result in imposing
>> some limitations on bcache operations though, e.g. it can only be
>> set up with writeback, no writethrough allowed (not sure though...).
>> Thoughts?
>>
>
> I have come to realize this is really an idea in the opposite
> direction. Let me try to explain what I understand; please correct
> me if I am wrong. The idea you proposed is indeed to make bcache act
> as something like an FTL for the backend zoned SMR drive, that is,
> for all random writes, bcache may convert them into sequential
> writes to the backend zoned SMR drive. In the meantime, if there is
> hot data, bcache continues to act as a caching device to accelerate
> read requests.
>
> Yes, if I understand your proposal correctly, writeback mode might
> be mandatory and backend-GC will be needed. The idea is interesting;
> it looks like adding a log-structured storage layer between the
> current bcache B+tree indexing and the zoned SMR hard drive.
>
Well, not sure if that's required.
Or, to be correct, we actually have _two_ use-cases:

1) Have an SMR drive as a backing device.
This was my primary goal for handling these devices, as SMR devices
are typically not _that_ fast. (Damien once proudly reported getting
the incredible speed of 1 IOPS :-)
So having bcache running on top of those will be a clear win.
But in this scenario the cache device will be a normal device
(typically an SSD), and we shouldn't need much modification here.
In fact, a good testcase would be the btrfs patches which got posted
earlier this week. With them you should be able to create a btrfs
filesystem on the SMR drive and use an SSD as a cache device.
Getting this scenario to run would indeed be my primary goal, and I
guess your patches should be more or less sufficient for that (a
quick way of cross-checking the zone reporting is sketched after
this list).

2) Using an SMR drive as a _cache_ device.
This seems to be contrary to the above statement of SMR drives not
being fast, but then the NVMe WG is working on a similar mechanism
for flash devices called 'ZNS' (zoned namespaces). And for those it
really would make sense to have bcache able to handle zoned devices
as cache devices.
But this is, to my understanding, really in the early stages, with
no real hardware being available. Damien might disagree, though :-)
And the implementation is still in the works on the Linux side, so
it's more of a long-term goal.
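Coly asked above for better testing tools; here is a minimal
user-space sketch that can serve as a cross-check. It assumes nothing
bcache-specific beyond the device node name and relies only on the
BLKGETNRZONES and BLKREPORTZONE ioctls from <linux/blkzoned.h> (the
same interface 'blkzone report' uses), so the zone count can be
compared against queue/nr_zones and the wptr values against blkzone
after the sequential dd writes. Untested against these patches, just
a sketch:

/*
 * zone-check: walk a zoned block device with BLKREPORTZONE and print
 * each zone's start, length, write pointer, type and condition.
 * Minimal error handling; sketch only.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

#define REPORT_BATCH 64	/* zones requested per BLKREPORTZONE call */

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/bcache0";
	struct blk_zone_report *rep;
	unsigned long long sector = 0;
	unsigned int nr_zones = 0, i;
	int fd;

	fd = open(dev, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Should match /sys/block/<dev>/queue/nr_zones */
	if (ioctl(fd, BLKGETNRZONES, &nr_zones) < 0)
		perror("BLKGETNRZONES");
	printf("%s: %u zones\n", dev, nr_zones);

	rep = calloc(1, sizeof(*rep) +
		     REPORT_BATCH * sizeof(struct blk_zone));
	if (!rep)
		return 1;

	/* Walk the device zone by zone, like 'blkzone report' does */
	for (;;) {
		memset(rep, 0, sizeof(*rep));
		rep->sector = sector;
		rep->nr_zones = REPORT_BATCH;

		if (ioctl(fd, BLKREPORTZONE, rep) < 0) {
			perror("BLKREPORTZONE");
			break;
		}
		if (!rep->nr_zones)
			break;

		for (i = 0; i < rep->nr_zones; i++) {
			struct blk_zone *z = &rep->zones[i];

			printf("start %llu len %llu wptr %llu "
			       "type %u cond %u\n",
			       (unsigned long long)z->start,
			       (unsigned long long)z->len,
			       (unsigned long long)z->wp,
			       (unsigned)z->type, (unsigned)z->cond);
		}
		sector = rep->zones[rep->nr_zones - 1].start +
			 rep->zones[rep->nr_zones - 1].len;
	}

	free(rep);
	close(fd);
	return 0;
}

Build with something like 'cc -O2 -o zone-check zone-check.c' (the
file name is made up) and compare the output before and after the dd
run; if the report forwarding is correct, the wptr of the written
zones should advance while BLKRESETZONE keeps failing with
EOPNOTSUPP.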
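And on Eric's point 1 above, a very rough sketch of what the
should_writeback() hint could look like. To be clear, this is not
against any particular tree, and bio_targets_seq_zone() is a made-up
helper: a real implementation would need a zone-type map of the
backing device, e.g. built at attach time from blkdev_report_zones():

/*
 * Sketch: bias the writeback decision when the I/O would land in a
 * sequential-write-required zone of the backing device, so the cache
 * absorbs random writes instead of passing them through.
 */
static bool bio_targets_seq_zone(struct cached_dev *dc, struct bio *bio)
{
	if (!bdev_is_zoned(dc->bdev))
		return false;
	/*
	 * Assumption for this sketch: on a host-managed backing device
	 * every zone is treated as sequential-write-required. A real
	 * version would look up bio->bi_iter.bi_sector in a per-zone
	 * map.
	 */
	return true;
}

/* ...then, early in should_writeback(): */
	if (bio_op(bio) == REQ_OP_WRITE && bio_targets_seq_zone(dc, bio))
		return true;

That would also line up with Damien's point that writeback mode
becomes effectively mandatory for zoned backing devices.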
But the first use-case is definitely something we should be looking
at; SMR drives are available _and_ have large capacities, so any
speedup there would be greatly appreciated.

Cheers,

Hannes
--
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer