Re: [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Neil Brown <neilb@xxxxxxx> wrote:
>
> On Friday May 12, akpm@xxxxxxxx wrote:
> > NeilBrown <neilb@xxxxxxx> wrote:
> > >
> > >  ./drivers/md/bitmap.c         |  115 ++----------------------------------------
> > 
> > hmm.  I hope we're not doing any of that filesystem I/O within the context
> > of submit_bio() or kblockd or anything like that.  Looks OK from a quick
> > scan.
> 
> No.  We do all the I/O from the context of the per-array thread.

OK.

> However some IO requests cannot complete until the filesystem I/O
> completes, so we need to be sure that the filesystem I/O won't block
> waiting for memory, or fail with -ENOMEM.

That sounds like a complex deadlock.  Suppose the bitmap writeout requres
some writeback to happen before it can get enough memory to proceed.

> bitmap.c is currently trying to do something every different.
> It uses ->readpage to get pages in the page cache (even though some
> address spaces don't define ->readpage) and then holds onto those
> pages without holding the page lock, and then calls ->writepage to
> flush them out from time to time.

That's OK.

> Before calling writepage it gets the pagelock, but doesn't re-check
> that ->mapping is correct (there is nothing much it can do if it isn't
> correct..).

We do that to find out if the page was truncated while we ewre waiting for
the page lock.

> I noticed this is particularly a problem with tmpfs.  When you call
> writepage on a tmpfs page, the page is swizzled into the swap cache,
> and ->mapping becomes NULL - not the behaviour that bitmap is
> expecting.

You shouldn't call writepage on tmpfs.  If !bdi_cap_writeback_dirty(), your
write_page() should immediately give up and "succeed".  Because this
filesystem has no permanent backing store anyway.

tmpfs doesn't implement bmap(), so your new patch fixes that problem ;)

> > We normally use PAGE_CACHE_SIZE for these things, not PAGE_SIZE.  Same diff.
> > 
> 
> Yeah.... why is that?  Why have two names for exactly the same value?

I guess at one time it was thought that we could use a different unit size
for pagecache.  Nowadays we use it by convention - it has little more than
documentary value.

> How does a poor develop know when to use one and when the other?  More
> particularly, how does one remember.

If the page is a pagecache page, use PAGE_CACHE_SIZE.  There are a few
spots where it blurs, but not many.

> > If you have a page and you want to write the whole thing out then there's
> > really no need to run prepare_write or commit_write at all.  Just
> > initialise the whole page, run set_page_dirty() then write_one_page().
> > 
> 
> I see that now.  But only after locking the page, and rechecking that
> ->mapping is correct, and if it isn't .... well, more work is involved
> that bitmap is in a position to do.

If you've bumped i_writecount to block truncate then there's no need to
check for truncation after locking the page.

i_mutex could be used similarly.


-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux