On 12/6/19 8:42 AM, Damien Le Moal wrote:
> On 2019/12/06 16:09, Hannes Reinecke wrote:
[ .. ]
>> So having bcache running on top of those will be a clear win.
>> But in this scenario the cache device will be a normal device
>> (typically an SSD), and we shouldn't need much modification here.
>
> I agree. That should work mostly as is since the user will be zone
> aware and already be issuing sequential writes. bcache write-through
> only needs to follow the same pattern, not reordering any write, and
> write-back only has to replay the same.
>
Bcache really should just act as a block-based cache; the only trick
here is to align the internal bcache buckets with the zone size, so
that in the optimal case only full zones will be written.

>> In fact, a good testcase would be the btrfs patches which got posted
>> earlier this week. With them you should be able to create a btrfs
>> filesystem on the SMR drive, and use an SSD as a cache device.
>> Getting this scenario to run would indeed be my primary goal, and I
>> guess your patches should be more or less sufficient for that.
>
> + Will need the zone revalidation and zone type & write lock bitmaps
> to prevent reordering from the block IO stack, unless bcache is a BIO
> driver? My knowledge of bcache is limited. Would need to look into
> the details a little more to be able to comment.
>
bcache is a bio-based driver, so it won't do any request reordering
itself. So from that perspective we should be fine.

>> 2) Using an SMR drive as a _cache_ device. This seems to be contrary
>> to the above statement of SMR drives not being fast, but then the
>> NVMe WG is working on a similar mechanism for flash devices called
>> 'ZNS' (zoned namespaces). And for those it really would make sense
>> to have bcache being able to handle zoned devices as a cache device.
>> But this is to my understanding really in the early stages, with no
>> real hardware being available. Damien might disagree, though :-)
>
> Yes, that would be another potential use case and ZNS indeed could
> fit this model, assuming that zone sizes align (multiples) between
> front and back devices.
>
Indeed, but I would defer to my friendly drive manufacturer to figure
that out :-)

>> And the implementation is still in the works on the Linux side, so
>> it's more of a long-term goal.
>>
>> But the first use-case is definitely something we should be looking
>> at; SMR drives are available _and_ with large capacity, so any
>> speedup there would be greatly appreciated.
>
> Yes. And what I was talking about in my earlier email is actually a
> third use case:
> 3) SMR drive as backend + regular SSD as frontend, with the resulting
> bcache device advertising itself as a regular disk, hiding all the
> zone & sequential write constraints from the user. Since bcache
> already has some form of indirection table for cached blocks, I
> thought we could hijack this to implement a sort of FTL that would
> allow serializing random writes to the backend with the help of the
> frontend as a write staging buffer. Doing so, we get full random
> write capability with the benefit of "hot" blocks staying in the
> cache. But again, not knowing enough details about bcache, I may be
> talking too lightly here. Not sure if that is reasonably easily
> feasible with the current bcache code.
>
That, however, will be tricky, as the underlying drive will _still_
have to contain a normal filesystem. While this mode of operation
should be trivial for btrfs with the hmzoned patches, others like ext4
or xfs will be ... interesting.
I wouldn't discount it out of hand, but there's a fair chance that it
will lead to intense cache-thrashing, as we'd need to cover up for
random writes within zones _and_ would have to read in entire zones.

But sure, worth a shot anyway. Once we get the btrfs case working,
that is.
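
Just to illustrate the bucket/zone alignment point above: here is a
rough, untested userspace sketch (not actual bcache code; the 2 MiB
bucket size is only an example value, not a recommendation) that reads
the zone size of a device via the BLKGETZONESZ ioctl and checks whether
a candidate bucket size divides it evenly. Run against the SMR backing
device; for a conventional drive it simply reports that there is no
alignment constraint.

/*
 * Sketch: check whether a candidate bcache bucket size aligns with
 * the zone size of a zoned block device, so that in the ideal case
 * only whole zones get written.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

int main(int argc, char **argv)
{
	unsigned int zone_sectors = 0;			/* zone size in 512B sectors */
	unsigned long long bucket_bytes = 2ULL << 20;	/* example: 2 MiB bucket */
	unsigned long long zone_bytes;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <block device>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Returns 0 (without error) for non-zoned devices */
	if (ioctl(fd, BLKGETZONESZ, &zone_sectors) < 0) {
		perror("BLKGETZONESZ");
		close(fd);
		return 1;
	}
	close(fd);

	if (!zone_sectors) {
		printf("%s is not zoned, no alignment constraint\n", argv[1]);
		return 0;
	}

	zone_bytes = (unsigned long long)zone_sectors * 512;
	printf("zone size: %llu bytes\n", zone_bytes);

	if (zone_bytes % bucket_bytes)
		printf("bucket size %llu does not divide the zone size\n",
		       bucket_bytes);
	else
		printf("bucket size %llu aligns with the zone size\n",
		       bucket_bytes);
	return 0;
}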
Cheers,

Hannes
--
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@xxxxxxx                               +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer