Re: [patch] fsblock preview

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tuesday September 23, npiggin@xxxxxxx wrote:
> Thanks for the review so far, and sorry for not getting back to you earlier.
> Just starting to get back into the swing of things.

:-)  The conference season does that do you.

> 
> Definitely that is the plan, which Christoph also suggested a year
> ago. It will make all that much nicer. Coherency should not be a problem
> because for block devices we have the big-hammer coherency of flushing on
> close, and I suspect online fs tools would use ioctls or syscalls into
> the filesystem rather than raw bdev access.

You might suspect that... but you might be surprised.

  tune2fs -c 30 /dev/sda1

changes the max-mount-count (number of mounts before we force a fsck)
to 30, and does it simply by writing to the block device.  No ioctls.
This is expected to work when the filesystem is mounted.  I'm fairly
sure that works exactly because the ext3 holds on the to the
bufferhead for the superblock in the block device, so a write by
tune2fs updates the memory that ext3fs store the superblock in.  So
when ext3 journals and then write that superblock, it still has the
updates max-mount-count.

> >    As I think more on this, I'm wondering if I've missed something
> >    (entirely possible).  If each block does indeed hold a refcount on
> >    the page, then freeing idle pages which happen to have blocks
> >    attached is not ever going to happen.  This would be ok if you
> >    dropped the fsblocks as soon as they weren't in used, but your
> >    commentary seems to suggest that is an option, not a certainty. 
> >    Confused.
> 
> Yeah, it's just that we already expect PagePrivate to elevate the refcount
> everywhere... actually if it didn't, I think the VM might not reclaim those
> pages ;) Buffer heads do the same thing.
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I cannot see that.  I don't think that any code expects PagePrivate to
elevate the refcount.  I cannot see that buffer heads get a reference
on the page for each buffer_head?  There is only one get_page (a
find_get_page) in buffer.c, and it is followed shortly by a
page_cache_release.

????

>  
> > 7/ Those kmallocs in fsblock_init are a worry.  Surely is isn't hard
> >    to come up with a scheme for static allocation.
> >    And I think you should remove "if (i < 30)" and just leave it as
> >    a simple 'else'.
> 
> Oh? I don't see such a problem except I should panic if they fail I
> suppose (rather than oops strangely).

I've been programmed to raise a warning whenever I see a malloc where
the result isn't checked for NULL. :-(

> > 
> > 12/ In 'invalidate_block' you have commented out lock_block, and 
> >    put in a comment "why lock?".
> > 
> >    The reason to lock is to wait for pending writeout to complete.
> >    If the writeout was triggered by the VFS/VM, then the page will
> >    have PageWriteback set and we will already have waited for that to
> >    clear.  However if the writeout was triggered internally by the
> >    filesystem (such as for data ordering guarantees) then
> >    PageWriteback will not have been set, but the block (not the page)
> >    will be locked.
> > 
> >    At least that is how I understand buffer_heads to work.  However
> >    for fsblock you set PageWriteback whenever you write out any block
> >    on the page.  As noted in point 9 above,  I'm not convinced that is
> >    the correct thing to do.
> 
> Yeah, fsblock keeps dirty and writeout coherent, and uses only the page
> lock (rather than page and buffer lock that buffer heads use) for fsblock
> internal locking. I have not found any cases where the locking is needed,
> but I was doing it in case minix/xfs/ext2/etc required it for some reason.

I'm not sure that is a good design decision.
It means that the only way to concurrently write out two blocks that
are in the same page, is to write the whole page.

I notice that you don't even provide an interface for writing a block
asynchronously.  If the filesystem has a random collection of blocks
that it wants to write out it has to write each on synchronously, or
maybe write out the whole page for each block, which might not be what
is wanted.

> 
> But if I can have a clean break, I'll just require filesystems do their
> own locking rather than use the lock here ;) In fsblock I'm never going to
> add code with comments that look like /* I think reiserfs does this */,
> /* Does this still happen */ etc ;)

Yeah, those comments are pretty ugly.

What I think you want is a well defined public interface that provides
the sort of functionality that filesystems want, and which takes care
of all the tricky details.
Locking is certainly a tricky detail so I'm not sure that leaving it
entirely to the filesystem is a good idea.
And async block writes are something that I expect some filesystems
would want (I have one that does).


NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux