[patch] fsblock preview

OK, vger doesn't seem to like my patch, so I'll have to give a url to it,
sorry.

http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/fsblock/2.6.27-rc5/fsb-preview.patch

I've been doing some work on fsblock again lately, so in case anybody might
find it interesting, here is a "preview" patch. Basically it compiles and
runs OK for me here, under a few stress tests. I wouldn't say it is close to
bug free, and it still needs a lot of bits and pieces polished up, such as
error handling.

I've also just stripped out the large block size support in the patch I'm
mailing out... I have been developing with ext2 without large block size
support, so those paths have rotted a bit, and besides, they still really
need a few more changes to some VM paths.

Since I last posted fsblock, there have been some big changes:

- Using a per block spinlock to protect most access now. This eliminates
  some races I had against dirtying vs cleaning, and with fsblock
  refcounting and reclaim.

- fsblock_no_cache aka "nobh" mode now works well due to the above. When
  /proc/sys/vm/fsblock_no_cache is 1, you never get fsblocks hanging around
  longer than they have to. You also would never be subject to the circular
  referencing "orphan" pages that buffer heads are subject to.

- RCU is gone. This is actually a good thing because in "nobh" mode, some
  workloads will rapidly allocate and free the structures, and that can
  be costly with RCU.

- struct fsblock has shrunk to 32 bytes on 64-bit, less than 1/3 the size
  of struct buffer_head, although absolute size doesn't matter so much now
  (because of no_cache mode). I even have an optional feature, "bdflush",
  that increases the size... although I do want to keep it within 64 bytes
  (one cacheline).

- Added an "intermediate" mode which provides a ->data pointer in struct
  fsblock_meta, which makes it trivial to transition filesystems to
  fsblock (although they would not be able to support superpage blocks).

- Added ext2 intermediate support.

- Had to modify the VM a little bit in order to close races with freeing a
  page's fsblock before it can be cleaned (or still has a chance to be
  dirtied via mmap). fsblock of course ensures that zero memory allocations
  are required in the writeout path.

- Lockless pagecache has been merged in mainline, which means the largest
  granularity of synchronisation anywhere in the fsblock core code is on a
  per-page basis (buffer uses per-inode private_lock). This is one of the
  reasons I am skeptical that keeping pagecache state in extents is better: it
  would be rather impressive if it could match the straight line speed or
  scalability of fsblock.

- However, I *have* always agreed that it makes sense to keep (some) block
  state in extents, because that is going to change much less frequently, and
  should be represented with fewer extents provided the filesystem layout is
  reasonable. So I've written a (very) basic extent cache for block mappings,
  which can be used by filesystems that don't have good in-memory block
  mapping structures themselves (like ext2, for example). No reclaim for this
  at present; I should just add a simple shrinker.

- bdflush... it's commented out so it won't build by default, but basically
  because fsblock properly keeps block dirty state in sync with page dirty
  state, I can keep a sorted structure of dirty fsblocks per device, and do
  writeout based on that rather than this fragile walking over inodes that
  pdflush does. Of course it won't work with delayed allocation, so something
  would have to be figured out for that (perhaps allocate all outstanding
  blocks before each writeout pass).

  The thing I like about bdflush is that it can easily do nice submit
  ordering of inter-file as well as file/metadata blocks for writeout. I
  don't know if it will come to anything, but at least it is not tightly
  coupled with the core fsblock stuff. It's a bit hacky at the moment ;)

- Still not using a private bdev for fsblock filesystems... I never got around
  to figuring out how to do this. This means that sometimes funny things will
  happen with the block_dev device if pages and buffers try to use it. It
  mostly works OK but is a hack that I need to fix.

- Finally, for those not listening last time: I'm doing block sizes larger
  than page size (up to 16MB IIRC, but easily expandable to much higher) with
  fsblock using exactly the same data structures, although I haven't included
  that in the patch here.
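
To make the extent cache for block mappings a bit more concrete, here is a
minimal userspace sketch of the idea. It is NOT the code from the patch:
the names (ext_map, ext_cache, ext_cache_add, ext_cache_lookup), the fixed
size array and the one-block-at-a-time insert are all assumptions made for
illustration. It only shows why contiguous mappings collapse into few
extents when the filesystem layout is reasonable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXT_CACHE_MAX 64	/* hypothetical fixed capacity, no reclaim */

/* One run of logically and physically contiguous blocks. */
struct ext_map {
	uint64_t lblk;		/* first logical (file) block */
	uint64_t pblk;		/* first physical (disk) block */
	uint64_t len;		/* number of blocks in the run */
};

/* Extents sorted by lblk, never overlapping. */
struct ext_cache {
	struct ext_map ext[EXT_CACHE_MAX];
	size_t nr;
};

/* Binary search; returns 1 and fills *pblk on a hit, 0 on a miss. */
int ext_cache_lookup(const struct ext_cache *c, uint64_t lblk, uint64_t *pblk)
{
	size_t lo = 0, hi = c->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct ext_map *e = &c->ext[mid];

		if (lblk < e->lblk) {
			hi = mid;
		} else if (lblk - e->lblk >= e->len) {
			lo = mid + 1;
		} else {
			*pblk = e->pblk + (lblk - e->lblk);
			return 1;
		}
	}
	return 0;
}

/* Cache one lblk -> pblk mapping, merging with neighbours if contiguous.
 * Returns 0 on success, -1 if the cache is full. */
int ext_cache_add(struct ext_cache *c, uint64_t lblk, uint64_t pblk)
{
	struct ext_map *prev, *next;
	uint64_t tmp;
	size_t i = 0;

	assert(c != NULL);
	if (ext_cache_lookup(c, lblk, &tmp))
		return 0;	/* already covered by an extent */

	/* i = index of the first extent that starts after lblk */
	while (i < c->nr && c->ext[i].lblk < lblk)
		i++;
	prev = i > 0 ? &c->ext[i - 1] : NULL;
	next = i < c->nr ? &c->ext[i] : NULL;

	if (prev && prev->lblk + prev->len == lblk &&
	    prev->pblk + prev->len == pblk) {
		prev->len++;	/* grow the previous extent forwards */
		if (next && prev->lblk + prev->len == next->lblk &&
		    prev->pblk + prev->len == next->pblk) {
			/* the new block bridged two extents: join them */
			prev->len += next->len;
			memmove(next, next + 1,
				(c->nr - i - 1) * sizeof(*next));
			c->nr--;
		}
		return 0;
	}
	if (next && lblk + 1 == next->lblk && pblk + 1 == next->pblk) {
		next->lblk = lblk;	/* grow the next extent backwards */
		next->pblk = pblk;
		next->len++;
		return 0;
	}
	if (c->nr == EXT_CACHE_MAX)
		return -1;
	memmove(&c->ext[i + 1], &c->ext[i], (c->nr - i) * sizeof(c->ext[0]));
	c->ext[i] = (struct ext_map){ .lblk = lblk, .pblk = pblk, .len = 1 };
	c->nr++;
	return 0;
}
```

With this sketch, adding the mappings 0->100, 2->102 and then 1->101
leaves a single extent of length 3. A real in-kernel version would also
need locking and reclaim (the shrinker mentioned above), which this leaves
out.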

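The bdflush idea above (a sorted structure of dirty blocks per device,
written out in block order) can be sketched in userspace as follows. Again
this is not taken from the patch: dirty_blk, dev_dirty, dev_mark_dirty and
dev_writeout are hypothetical names, and a sorted singly linked list stands
in for whatever structure the real code uses, purely to keep the example
short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block state needed to sit on a device's dirty list. */
struct dirty_blk {
	uint64_t blocknr;	/* block number on the device */
	struct dirty_blk *next;
	int queued;		/* already on the dirty list? */
};

/* Per-device dirty list, kept in ascending blocknr order. */
struct dev_dirty {
	struct dirty_blk *head;
};

/* Queue a block for writeout at its sorted position. Dirtying a block
 * that is already queued is a no-op, so each block appears once. */
void dev_mark_dirty(struct dev_dirty *d, struct dirty_blk *b)
{
	struct dirty_blk **pp = &d->head;

	assert(b != NULL);
	if (b->queued)
		return;
	while (*pp && (*pp)->blocknr < b->blocknr)
		pp = &(*pp)->next;
	b->next = *pp;
	*pp = b;
	b->queued = 1;
}

/* One writeout pass: dequeue up to max blocks in ascending device order
 * and record their block numbers in out[] (standing in for submitting
 * I/O). Returns the number of blocks "written". */
size_t dev_writeout(struct dev_dirty *d, uint64_t *out, size_t max)
{
	size_t n = 0;

	while (d->head && n < max) {
		struct dirty_blk *b = d->head;

		d->head = b->next;
		b->next = NULL;
		b->queued = 0;
		out[n++] = b->blocknr;
	}
	return n;
}
```

Dirtying blocks 30, 10 and 20 in that order and then running a writeout
pass yields 10, 20, 30 regardless of which file or metadata structure each
block belongs to, which is the submit ordering property described above.
Delayed allocation does not fit this sketch, since a block without a
device block number has no position in the list.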