On Wed 06-08-08 09:25:13, Chris Mason wrote:
> On Tue, 2008-08-05 at 14:17 -0700, Mingming Cao wrote:
> > On Tue, 2008-08-05 at 12:17 -0400, Chris Mason wrote:
> > > On Tue, 2008-08-05 at 13:51 +0900, Hisashi Hifumi wrote:
> > > > >> >
> > > > >> > diff -Nrup linux-2.6.27-rc1.org/fs/jbd/transaction.c linux-2.6.27-rc1/fs/jbd/transaction.c
> > > > >> > --- linux-2.6.27-rc1.org/fs/jbd/transaction.c	2008-07-29 19:28:47.000000000 +0900
> > > > >> > +++ linux-2.6.27-rc1/fs/jbd/transaction.c	2008-07-29 20:40:12.000000000 +0900
> > > > >> > @@ -1764,6 +1764,12 @@ int journal_try_to_free_buffers(journal_
> > > > >> >  	 */
> > > > >> >  	if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) {
> > > > >> >  		journal_wait_for_transaction_sync_data(journal);
> > > > >> > +
> > > > >> > +		bh = head;
> > > > >> > +		do {
> > > > >> > +			while (atomic_read(&bh->b_count))
> > > > >> > +				schedule();
> > > > >> > +		} while ((bh = bh->b_this_page) != head);
> > > > >> >  		ret = try_to_free_buffers(page);
> > > > >> >  	}
> > > > >>
> > > > >> The loop is problematic. If the scheduler decides to keep running this
> > > > >> task, then we have a busy loop. If this task has a realtime policy, then
> > > > >> it might even lock up the kernel.
> > > > >>
> > > > >
> > > > >ocfs2 calls journal_try_to_free_buffers too, so looping on b_count might
> > > > >not be the best idea there either.
> > > > >
> > > > >This code gets called from releasepage, which is used in places other than
> > > > >the O_DIRECT invalidation paths; I'd be worried about performance
> > > > >problems here.
> > > > >
> > > >
> > > > try_to_release_page has a gfp_mask parameter. So when try_to_release_page
> > > > is called from a performance-sensitive path, __GFP_WAIT and __GFP_FS should
> > > > not be set in gfp_mask. The b_count check loop is inside the
> > > > (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS) check.
> > >
> > > Looks like try_to_free_pages will go into releasepage with wait & fs
> > > both set. This kind of change would make me very nervous.
> > >
> >
> > Hi Chris,
> >
> > try_to_free_pages() passes the gfp_mask it receives from its caller
> > down to try_to_release_page(). Based on the meaning of __GFP_WAIT and
> > __GFP_FS, if the upper-level caller sets these two flags, I assume the
> > upper-level caller expects to be delayed and to wait for the fs to finish?
> >
> >
> > But I agree that using a loop in journal_try_to_free_buffers() to wait
> > for the busy bh to drop its reference count is expensive...
>
> I rediscovered your old thread about trying to do this in a launder_page
> call ;)
  Yes, we thought about using launder_page() before :).

> Does it make more sense to fix do_launder_page to call into the FS on
> every page, and let the FS check for PageDirty on its own? That way
> invalidate_inode_pages2_range basically gets its own private call into
> the FS that says wait around until this page is really free.
  That would certainly work as well. But IMHO waiting for the ->writepage()
call to finish isn't really a big deal even in try_to_release_page() if
__GFP_FS (and __GFP_WAIT) is set. The only problem is that there is no
effective way to do so, so Hisashi used that "wait for b_count to drop"
loop, which looks really scary, and I don't like it either.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR