Re: [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Friday May 12, akpm@xxxxxxxx wrote:
> NeilBrown <neilb@xxxxxxx> wrote:
> >
> >  ./drivers/md/bitmap.c         |  115 ++----------------------------------------
> 
> hmm.  I hope we're not doing any of that filesystem I/O within the context
> of submit_bio() or kblockd or anything like that.  Looks OK from a quick
> scan.

No.  We do all the I/O from the context of the per-array thread.
However some IO requests cannot complete until the filesystem I/O
completes, so we need to be sure that the filesystem I/O won't block
waiting for memory, or fail with -ENOMEM.

> 
> a_ops->commit_write() already ran set_page_dirty(), so you don't need that
> in there.

Is that documented somewhere..... but yes, that seems to be right. Thanks.

> 
> I assume this always works in units of a complete page?  It's strange to do
> prepare_write() followed immediately by commit_write().  Normally
> prepare_write() will do some prereading, but it's smart enough to not do
> that if the caller is preparing to write the whole page.
> 

Yes, it is strange.  That was one of the things that made me want to
review this code and figure out how to do it "properly".

As far as I can see, much of 'address_space' is really an internal
interface to support routines used by the filesystem.  A filesystem
may choose to use address spaces, and has a fair degree of freedom
when it comes to which bits to make use of and exactly what they
mean.
About the only thing that *has* to be supported is ->writepages --
which has a fair degree of latitude in exactly what it does -- and
->writepage -- which can only be called after locking a page and
rechecking the ->mapping.

bitmap.c is currently trying to do something every different.
It uses ->readpage to get pages in the page cache (even though some
address spaces don't define ->readpage) and then holds onto those
pages without holding the page lock, and then calls ->writepage to
flush them out from time to time.
Before calling writepage it gets the pagelock, but doesn't re-check
that ->mapping is correct (there is nothing much it can do if it isn't
correct..).

I noticed this is particularly a problem with tmpfs.  When you call
writepage on a tmpfs page, the page is swizzled into the swap cache,
and ->mapping becomes NULL - not the behaviour that bitmap is
expecting.

Now I agree that tmpfs is an unusual case, and that storing a bitmap
in tmpfs doesn't make a lot of sense (though it can make some...) but
the point is that if a filesystem is allowed to move pages around like
that, then bitmap cannot hold on to pages in the page cache like it
wants to.  It simply isn't a well defined thing to do.


> We normally use PAGE_CACHE_SIZE for these things, not PAGE_SIZE.  Same diff.
> 

Yeah.... why is that?  Why have two names for exactly the same value?
How does a poor develop know when to use one and when the other?  More
particularly, how does one remember.
I would argue that the "same diff" should be no difference - not even
in spelling....

> If you have a page and you want to write the whole thing out then there's
> really no need to run prepare_write or commit_write at all.  Just
> initialise the whole page, run set_page_dirty() then write_one_page().
> 

I see that now.  But only after locking the page, and rechecking that
->mapping is correct, and if it isn't .... well, more work is involved
that bitmap is in a position to do.

> Perhaps it should check that the backing filesystem actually implements
> commit_write(), prepare_write(), readpage(), etc.  Some might not, and the
> user will get taught not to do that via an oops.
> 

Might help, but as I think you've gathered, I really want a whole
different approach to writing to the file.  One that I can justify as
being a "correct" use of interfaces, and also that I can be certain
will not block or fail on a kmalloc or similar.  Hence the bmap thing
later.


Thanks for your feedback.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux