On Tue, 22 Sep 2020, Matthew Wilcox wrote:

> > > The NVFS indirect block tree has a fan-out of 16,
> >
> > No. The top level in the inode contains 16 blocks (11 direct and 5
> > indirect). And each indirect block can have 512 pointers (4096/8). You can
> > format the device with a larger block size and this increases the fanout
> > (the NVFS block size must be greater than or equal to the system page size).
> >
> > 2 levels can map 1 GiB (4096*512^2), 3 levels can map 512 GiB, 4 levels can
> > map 256 TiB and 5 levels can map 128 PiB.
>
> But compare to an unfragmented file ... you can map the entire thing with
> a single entry. Even if you have to use a leaf node, you can get four
> extents in a single cacheline (and that's a fairly naive leaf node layout;
> I don't know exactly what XFS uses)

But the benchmarks show that it is comparable to extent-based filesystems.

> > > Rename is another operation that has specific "operation has atomic
> > > behaviour" expectations. I haven't looked at how you've
> > > implemented that yet, but I suspect it also is extremely difficult
> > > to implement in an atomic manner using direct pmem updates to the
> > > directory structures.
> >
> > There is a small window when the renamed inode is neither in the source
> > nor in the target directory. Fsck will reclaim such an inode and add it
> > to lost+found - just like on EXT2.
>
> ... ouch. If you have to choose, it'd be better to link it to the second
> directory then unlink it from the first one. Then your fsck can detect
> it has the wrong count and fix up the count (ie link it into both
> directories rather than neither).

I admit that this is lame and I'll fix it. Rename is not so
performance-critical, so I can add a small journal for this.

> > If you think that the lack of journaling is a show-stopper, I can implement
> > it. But then, I'll have something that has the complexity of EXT4 and the
> > performance of EXT4. So there will no longer be any reason to use
> > NVFS over EXT4. Without journaling, it will be faster than EXT4 and it may
> > attract some users who want good performance and who don't care about GID
> > and UID being updated atomically, etc.
>
> Well, what's your intent with nvfs? Do you already have customers in mind
> who want to use this in production, or is this somewhere to play with and
> develop concepts that might make it into one of the longer-established
> filesystems?

I develop it just because I thought it might be interesting. So far, it
doesn't have any serious users (the physical format is still changing). I
hope that it could be usable as a general-purpose root filesystem when
Optane DIMMs become common.

Mikulas
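
P.S. The capacity figures quoted earlier in this mail (1 GiB, 512 GiB, 256 TiB, 128 PiB) can be checked with a few lines of Python. This is only an arithmetic sketch, not NVFS code: it assumes 4096-byte blocks and 8-byte block pointers (512 pointers per indirect block) as stated in the thread, and the names `level_capacity` and `human` are made up for illustration.

```python
BLOCK_SIZE = 4096    # bytes per block, as stated in the thread
POINTER_SIZE = 8     # bytes per block pointer, so 4096 / 8 = 512 pointers


def level_capacity(levels, block_size=BLOCK_SIZE, pointer_size=POINTER_SIZE):
    """Bytes reachable through one chain of `levels` indirect blocks.

    Each indirect level multiplies the reach by the fan-out
    (pointers per block); the final pointer addresses one data block.
    """
    fanout = block_size // pointer_size
    return block_size * fanout ** levels


def human(n):
    """Format a byte count with binary units, dividing only while exact."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and n % 1024 == 0 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


if __name__ == "__main__":
    for levels in range(2, 6):
        print(f"{levels} levels: {human(level_capacity(levels))}")
```

Passing a larger `block_size` shows the effect of formatting with bigger blocks: with 8192-byte blocks the fan-out becomes 1024, so two levels would map 8 GiB instead of 1 GiB under the same assumptions.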