Re: [LSF/MM/BPF BoF] BoF for Zoned Storage

> On 7 Mar 2022, at 14.55, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> On Sat, 2022-03-05 at 08:33 +0100, Javier González wrote:
> [...]
>> However, there is no users of ZoneFS for ZNS devices that I am aware
>> of (maybe for SMR this is a different story).  The main open-source
>> implementations out there for RocksDB that are being used in
>> production (ZenFS and xZTL) rely on either raw zone block access or
>> the generic char device in NVMe (/dev/ngXnY). This is because having
>> the capability to do zone management from applications that already
>> work with objects fits much better.
>> 
>> My point is that there is space for both ZoneFS and raw zoned block
>> device. And regarding !PO2 zone sizes, my point is that this can be
>> leveraged both by btrfs and this raw zone block device.
> 
> This is basically history repeating itself, though.  It's precisely the
> reason why Linux acquired the raw character device: Oracle decided they
> didn't want the OS abstractions in the way of fast performing direct
> database access and raw devices was the way it had been done on UNIX,
> so they decided it should be done on Linux as well.  There was some
> legacy to this as well: because Oracle already had a raw handler they
> figured it would be easy to port to Linux.
> 
> The problem Oracle had with /dev/raw is that they then have to manage
> device discovery and partitioning as well.  It sort of worked on UNIX
> when you didn't have too many disks and the discover order was
> deterministic.  It began to fail as disks became storage networks.  In
> the end, when O_DIRECT was proposed, Oracle eventually saw that using
> it on files allowed for much better managed access and the raw driver
> fell into disuse and was (finally) removed last year.
> 
> What you're proposing above is to repeat the /dev/raw experiment for
> equivalent input reasons but expecting different outcomes ... Einstein
> has already ruled on that one.

Thanks for the history on the raw device. It’s good to have the perspective on history repeating itself.

I believe that the raw block device is different from the raw character device, and we see tons of applications that don’t want FS semantics relying on raw block devices. But I get your point.

If we agree to get ZoneFS up to speed and use it as the general API for zoned devices, then I think we can refocus there.

As I mentioned in my last reply to Dave, the main concern for me at the moment is supporting arbitrary zone sizes in the kernel. If we can agree on a path towards that, we can definitely commit to focusing on ZoneFS and implementing support for it in the different places we maintain in user-space.


