> On 7 Mar 2022, at 14.55, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, 2022-03-05 at 08:33 +0100, Javier González wrote:
> [...]
>> However, there is no users of ZoneFS for ZNS devices that I am aware
>> of (maybe for SMR this is a different story). The main open-source
>> implementations out there for RocksDB that are being used in
>> production (ZenFS and xZTL) rely on either raw zone block access or
>> the generic char device in NVMe (/dev/ngXnY). This is because having
>> the capability to do zone management from applications that already
>> work with objects fits much better.
>>
>> My point is that there is space for both ZoneFS and raw zoned block
>> device. And regarding !PO2 zone sizes, my point is that this can be
>> leveraged both by btrfs and this raw zone block device.
>
> This is basically history repeating itself, though. It's precisely the
> reason why Linux acquired the raw character device: Oracle decided they
> didn't want the OS abstractions in the way of fast performing direct
> database access and raw devices was the way it had been done on UNIX,
> so they decided it should be done on Linux as well. There was some
> legacy to this as well: because Oracle already had a raw handler they
> figured it would be easy to port to Linux.
>
> The problem Oracle had with /dev/raw is that they then have to manage
> device discovery and partitioning as well. It sort of worked on UNIX
> when you didn't have too many disks and the discover order was
> deterministic. It began to fail as disks became storage networks. In
> the end, when O_DIRECT was proposed, Oracle eventually saw that using
> it on files allowed for much better managed access and the raw driver
> fell into disuse and was (finally) removed last year.
>
> What you're proposing above is to repeat the /dev/raw experiment for
> equivalent input reasons but expecting different outcomes ... Einstein
> has already ruled on that one.

Thanks for the history on the raw device. It’s good to have the perspective on history repeating itself. I believe that the raw block device is different from the raw character device, and we see tons of applications that don’t want FS semantics relying on it. But I get your point.

If we agree to get ZoneFS up to speed and use it as the general API for zoned devices, then I think we can refocus there.

As I mentioned in my last reply to Dave, my main concern at the moment is supporting arbitrary zone sizes in the kernel. If we can agree on a path towards that, we can definitely commit to focusing on ZoneFS and implementing support for it in the different places we maintain in user-space.