On 02/15/2014 10:04 AM, Dan Williams wrote:
In response to Dave's call [1] and highlighting Jeff's attend request
[2] I'd like to stoke a discussion on an emulation layer for atomic
block commands. Specifically, SNIA has laid out their position on the
command set an atomic block device may support (NVM Programming Model
[3]) and it is a good conversation piece for this effort. The goal
would be to review the proposed operations, identify the capabilities
that would be readily useful to filesystems / existing use cases, and
tear down a straw man implementation proposal.
The SNIA defined capabilities that seem the highest priority to implement are:
* ATOMIC_MULTIWRITE - discontiguous LBA ranges, power-fail atomic, no
ordering constraint relative to other I/O
* ATOMIC_WRITE - contiguous LBA range, power-fail atomic, no ordering
constraint relative to other I/O
* EXISTS - not an atomic command, but defined in the NPM. It is akin
to SEEK_{DATA|HOLE}: it tests whether an LBA is mapped or unmapped. If
the LBA is mapped, it additionally specifies whether data is present or
the LBA is only allocated.
* SCAR - again not an atomic command, but once we have metadata we can
implement a bad block list, analogous to the bad-block-list support in
md.
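
To make the EXISTS and SCAR descriptions above concrete, here is a minimal sketch of the per-LBA metadata they imply. The names `LbaState` and `BlockMetadata` are hypothetical and are not taken from the SNIA document or any kernel API. The behaviour that a later write clears a scar is an assumption made for illustration.

```python
from enum import Enum


class LbaState(Enum):
    UNMAPPED = "unmapped"    # no mapping exists for this LBA
    ALLOCATED = "allocated"  # mapped, but no data has been written
    DATA = "data"            # mapped and data is present


class BlockMetadata:
    """Hypothetical metadata store; illustrative only."""

    def __init__(self):
        self._state = {}       # lba -> LbaState; a missing key means UNMAPPED
        self._scarred = set()  # LBAs marked bad via SCAR

    def allocate(self, lba):
        # Map the LBA without writing data; leaves existing state alone.
        self._state.setdefault(lba, LbaState.ALLOCATED)

    def record_write(self, lba):
        self._state[lba] = LbaState.DATA
        # Assumption: a successful write clears the bad-block mark.
        self._scarred.discard(lba)

    def exists(self, lba):
        # EXISTS: report unmapped, allocated-only, or data-present.
        return self._state.get(lba, LbaState.UNMAPPED)

    def scar(self, lba):
        # SCAR: add the LBA to the bad block list.
        self._scarred.add(lba)

    def is_scarred(self, lba):
        return lba in self._scarred
```

A read path would consult `is_scarred()` before returning data, much as md consults its bad block list.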
My initial thought is that this functionality is better implemented as a
library that a block device driver (bio-based or request-based) can call to
emulate these features. In the case where the feature is directly
supported by the underlying hardware device the emulation layer will
stub out and pass it through. The argument for not doing this as a
device-mapper target or stacked block device driver is to ease
provisioning and make the emulation transparent. On the other hand,
the argument for doing this as a virtual block device is that
"failed to parse device metadata" is a known failure scenario for
dm/md, but not for sd, for example.
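
The "stub out and pass it through" behaviour described above can be sketched roughly as follows. This is only an illustration: `FakeDevice`, `max_atomic_bytes`, and `atomic_write` are hypothetical names, not an existing block-layer interface, and the emulation itself is left as a caller-supplied callback.

```python
class FakeDevice:
    """Stand-in for a block device; hypothetical, for illustration only."""

    def __init__(self, max_atomic_bytes=0):
        # Assumption: 0 means the hardware has no native atomic write.
        self.max_atomic_bytes = max_atomic_bytes
        self.log = []

    def native_atomic_write(self, lba, data):
        self.log.append(("native", lba, len(data)))


def atomic_write(dev, lba, data, emulate):
    """Pass through when the hardware can do it, otherwise emulate."""
    if 0 < len(data) <= dev.max_atomic_bytes:
        dev.native_atomic_write(lba, data)
        return "passthrough"
    # Request is larger than the device supports (or unsupported entirely).
    emulate(dev, lba, data)
    return "emulated"
```

With a device advertising a 16K limit, a 16K write would take the passthrough branch and a 64K write would fall back to the emulation callback.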
Hi Dan,
I'd suggest a dm device instead of a special library, mostly because the
emulated device is likely to need some kind of cleanup action after a
crash, and the dm model is best suited to cleanly provide that. It's
also a good fit for people who want to duct tape a small amount of very
fast NVM onto relatively slow devices.
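
To illustrate the kind of post-crash cleanup meant here, below is a toy model of emulating a discontiguous atomic write with a journal. It is a sketch under stated assumptions, not how any dm target works: `JournaledDevice` is a hypothetical name, the dict and list stand in for persistent media, and `crash_after` is a test hook that simulates power loss.

```python
class JournaledDevice:
    """Toy model: `blocks` and `journal` stand in for persistent media."""

    def __init__(self):
        self.blocks = {}   # lba -> bytes, the home locations
        self.journal = []  # log records that survive a crash

    def atomic_multiwrite(self, writes, crash_after=None):
        """writes: list of (lba, data) pairs.

        crash_after (hypothetical test hook) simulates power loss after
        that many in-place writes have completed.
        """
        txn = list(writes)
        # 1. Log the full payload, then a commit record.
        self.journal.append(("data", txn))
        self.journal.append(("commit",))
        # 2. Apply in place. A crash here leaves the home locations torn.
        for i, (lba, data) in enumerate(txn):
            if crash_after is not None and i >= crash_after:
                return False  # simulated power failure
            self.blocks[lba] = data
        # 3. Retire the journal once everything is in place.
        self.journal.clear()
        return True

    def recover(self):
        """Cleanup after a crash: replay committed records, drop the rest."""
        pending = None
        for rec in self.journal:
            if rec[0] == "data":
                pending = rec[1]
            elif rec[0] == "commit" and pending is not None:
                for lba, data in pending:
                    self.blocks[lba] = data
                pending = None
        self.journal.clear()
```

The `recover()` step is the part that needs a well-defined place to run at device setup time, which is the argument for a dm target over a library.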
The absolute minimum to provide something useful is a 16K discontiguous
atomic write. That won't help the filesystems much, but it will allow mysql
to turn off its doublewrite buffer. Oracle would benefit from ~64K, mostly
from a safety point of view, since they don't do double writes.
Helping the filesystems is harder: we need atomics bigger than any
individual device is likely to provide. But as Dave says elsewhere in
the thread, we can limit that for specific workloads.
I'm not sold on SCAR, since I'd expect the FTL or drive firmware to provide
that for us. What use case do you have in mind there?
-chris