On 01/04/2017 08:24 AM, Damien Le Moal wrote:
>
> Slava,
>
> On 1/4/17 11:59, Slava Dubeyko wrote:
>> What's the goal of SMR compatibility? Any unification or interface
>> abstraction has the goal to hide the peculiarities of underlying
>> hardware. But we have block device abstraction that hides all
>> hardware's peculiarities perfectly. Also FTL (or any other
>> Translation Layer) is able to represent the device as sequence of
>> physical sectors without real knowledge on software side about
>> sophisticated management activity on the device side. And, finally,
>> guys will be completely happy to use the regular file systems (ext4,
>> xfs) without necessity to modify software stack. But I believe that
>> the goal of Open-channel SSD approach is completely opposite. Namely,
>> provide the opportunity for software side (file system, for example)
>> to manage the Open-channel SSD device with smarter policy.
>
> The Zoned Block Device API is part of the block layer. So as such, it
> does abstract many aspects of the device characteristics, as so many
> other APIs of the block layer do (look at blkdev_issue_discard or zeroout
> implementations to see how far this can be pushed).
>
> Regarding the use of open channel SSDs, I think you are absolutely
> correct: (1) some users may be very happy to use a regular, unmodified
> ext4 or xfs on top of an open channel SSD, as long as the FTL
> implementation does a complete abstraction of the device special
> features and presents a regular block device to upper layers. And
> conversely, (2) some file system implementations may prefer to directly
> use those special features and characteristics of open channel SSDs. No
> arguing with this.
>
> But you are missing the parallel with SMR. For SMR, or more correctly
> zoned block devices since the ZBC or ZAC standards can equally apply to
> HDDs and SSDs, 3 models exist: drive-managed, host-aware and host-managed.
>
> Case (1) above corresponds *exactly* to the drive-managed model, with
> the difference that the abstraction of the device characteristics (SMR
> here) is in the drive FW and not in a host-level FTL implementation as
> it would be for open channel SSDs. Case (2) above corresponds to the
> host-managed model, that is, the device user has to deal with the device
> characteristics itself and use it correctly. The host-aware model lies
> in between these 2 extremes: it offers the possibility of complete
> abstraction by default, but also allows a user to optimize its operation
> for the device by allowing access to the device characteristics. So this
> would correspond to a possible third way of implementing an FTL for open
> channel SSDs.
>
>> So, my key worry is that trying to hide the two different
>> technologies (SMR and NAND flash) under the same interface will
>> result in the loss of opportunity to manage the device in a smarter
>> way. Because any unification has the goal to create a simple interface.
>> But SMR and NAND flash are significantly different technologies. And
>> if somebody creates technology-oriented file system, for example,
>> then it needs to have access to really special features of the
>> technology. Otherwise, interface will be overloaded by features of
>> both technologies and it will look like a mess.
>
> I do not think so, as long as the device "model" is exposed to the user
> as the zoned block device interface does. This allows a user to adjust
> its operation depending on the device. This is true of course as long as
> each "model" has a clearly defined set of features associated. Again,
> that is the case for zoned block devices and an example of how this can
> be used is now in f2fs (which allows different operation modes for
> host-aware devices, but only one for host-managed devices). Again, I can
> see a clear parallel with open channel SSDs here.
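
To make the "adjust its operation depending on the device model" point concrete, here is a minimal Python sketch. The three model names are the ones used in this thread; the `write_policy` helper and its "in-place" / "sequential" return values are hypothetical illustrations and are not taken from the kernel or from f2fs.

```python
from enum import Enum


class ZonedModel(Enum):
    """The three zoned block device models named in the discussion."""
    DRIVE_MANAGED = "drive-managed"
    HOST_AWARE = "host-aware"
    HOST_MANAGED = "host-managed"


def write_policy(model: ZonedModel, prefer_sequential: bool = False) -> str:
    """Pick a write policy from the device model (hypothetical helper).

    - drive-managed: the drive firmware hides the constraints, so the
      user can treat the device as a regular block device.
    - host-managed: the user must write zones sequentially.
    - host-aware: either works; sequential writes are an opt-in.
    """
    if model is ZonedModel.DRIVE_MANAGED:
        return "in-place"
    if model is ZonedModel.HOST_MANAGED:
        # Only one mode is possible, whatever the caller prefers.
        return "sequential"
    # Host-aware: complete abstraction by default, optimization optional.
    return "sequential" if prefer_sequential else "in-place"
```

With this sketch, a host-aware device yields two possible modes and a host-managed device only one, which mirrors the f2fs behaviour described above.
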
>
>> SMR zone and NAND flash erase block look comparable but, finally,
>> they are significantly different stuff. Usually, an SMR zone is 256 MB
>> in size but NAND flash erase block can vary from 512 KB to 8 MB (it
>> will be slightly larger in the future but not more than 32 MB, I
>> suppose). It is possible to group several erase blocks into an
>> aggregated entity but it could be not a very good policy from the file
>> system point of view.
>
> Why not? For f2fs, the 2MB segments are grouped together into sections
> with a size matching the device zone size. That works well and can
> actually even reduce the garbage collection overhead in some cases.
> Nothing in the kernel zoned block device support limits the zone size to
> a particular minimum or maximum. The only direct implication of the zone
> size on the block I/O stack is that BIOs and requests cannot cross zone
> boundaries. In an extreme setup, a zone size of 4KB would work too and
> result in read/write commands of 4KB at most to the device.
>
>> Another point is that a QLC device could have more tricky features of
>> erase blocks management. Also we should apply erase operation on NAND
>> flash erase block but it is not mandatory for the case of SMR zone.
>
> Incorrect: host-managed devices require a zone "reset" (equivalent to
> discard/trim) to be reused after being written once. So again, the
> "tricky features" you mention will depend on the device "model",
> whatever this ends up to be for an open channel SSD.
>
>> Because SMR zone could be simply re-written in sequential order if
>> all zone's data is invalid, for example. Also conventional zone could
>> be really tricky point. Because it is one zone only for the whole
>> device that could be updated in-place. Raw NAND flash usually has no
>> such conventional zone.
>
> Conventional zones are optional in zoned block devices.
> There may be none at all and an implementation may well decide to not
> support a device without any conventional zones if some are required.
> In the case of open channel SSDs, the FTL implementation may well decide
> to expose a particular range of LBAs as "conventional zones" and have a
> lower level exposure for the remaining capacity which can then be
> optimally used by the file system based on the features available for
> that remaining LBA range. Again, a parallel is possible with SMR.
>
>> Finally, if I really like to develop SMR- or NAND flash oriented file
>> system then I would like to play with peculiarities of concrete
>> technologies. And any unified interface will destroy the opportunity
>> to create the really efficient solution. Finally, if my software
>> solution is unable to provide some fancy and efficient features then
>> guys will prefer to use the regular stack (ext4, xfs + block layer).
>
> Not necessarily. Again think in terms of device "model" and associated
> feature set. An FS implementation may decide to support all possible
> models, with likely a resulting incredible complexity. More likely,
> similarly with what is happening with SMR, only models that make sense
> will be supported by FS implementations that can be easily modified.
> Example again here of f2fs: changes to support SMR were rather simple,
> whereas the initial effort to support SMR with ext4 was pretty much
> abandoned as it was too complex to integrate in the existing code while
> keeping the existing on-disk format.
>
> Your argument above is actually making the same point: you want your
> implementation to use the device features directly. That is, your
> implementation wants a "host-managed" like device model. Using ext4 will
> require a "host-aware" or "drive-managed" model, which could be provided
> through a different FTL or device-mapper implementation in the case of
> open channel SSDs.
>
> I am not trying to argue that open channel SSDs and zoned block devices
> should be supported under the exact same API. But I can definitely see
> clear parallels worth a discussion. As a first step, I would suggest
> trying to define open channel SSD "models" and their feature set
> and see how these fit with the existing ZBC/ZAC defined models and at
> least estimate the implications on the block I/O stack. If adding the
> new models only results in the addition of a few top level functions or
> ioctls, it may be entirely feasible to integrate the two together.
>

Thanks Damien. I couldn't have said it better myself.

The OCSSD 1.3 specification has been made with an eye towards the SMR
interface:

 - "Identification" - Follows the same "global" size definitions, and
   also supports that each zone has its own local size.
 - "Get Report" command follows a very similar structure as SMR, such
   that it can sit behind the "Report Zones" interface.
 - "Erase/Prepare Block" command follows the Reset block interface.

Those should fit right in. If the layout is planar, such that the OCSSD
only exposes a set of zones, it should be able to fit right into the
framework with minor modifications. A couple of details are added when
going towards managing multiple parallel units, which is one of the
things that requires a bit of discussion.
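
The constraint Damien mentions earlier in the thread, that BIOs and requests cannot cross zone boundaries, can be illustrated with a small Python sketch. It assumes fixed-size zones and byte units; `split_at_zone_boundaries` is a hypothetical helper written for this illustration, not a kernel function.

```python
def split_at_zone_boundaries(offset: int, length: int, zone_size: int):
    """Split the range [offset, offset + length) into (offset, length)
    chunks that each stay inside a single zone.

    Assumes every zone has the same size (an assumption of this sketch).
    """
    if zone_size <= 0:
        raise ValueError("zone_size must be positive")
    chunks = []
    end = offset + length
    while offset < end:
        # First byte of the zone following the one that contains `offset`.
        zone_end = (offset // zone_size + 1) * zone_size
        chunk_end = min(end, zone_end)
        chunks.append((offset, chunk_end - offset))
        offset = chunk_end
    return chunks
```

For the extreme 4KB zone size mentioned above, a 10KB I/O starting at offset 2KB would be cut into three chunks, none larger than 4KB: `split_at_zone_boundaries(2048, 10240, 4096)`.
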