On Tue, Feb 22, 2011 at 11:00 PM, Andreas Dilger <adilger@xxxxxxxxx> wrote:
> On 2011-02-22, at 1:02 PM, Amir Goldstein wrote:
>> On Tue, Feb 22, 2011 at 7:18 PM, Andreas Dilger <adilger@xxxxxxxxx> wrote:
>>> On 2011-02-21, at 1:02 PM, Amir Goldstein wrote:
>>>> After looking at the code a bit, I find that the only critical resource
>>>> that several groups may share on a single page is the Uptodate flag,
>>>> which is used to indicate that the buddy cache for *all* these groups
>>>> is loaded and lock_page() and get_page() are used to protect it.
>>>>
>>>> There are 2 ways to eliminate this dependency:
>>>>
>>>> 1. (AKA easy lane) use a single page (or more) per block group.
>>>> this will increase the memory usage for 1K blocks fs and for 2K block fs
>>>> on 8K page system, but are these use cases really that common?
>>>
>>> I think some distros may use 1kB block filesystems for root, where there are lots of small files. I wonder if smolt would have this kind of info?
>>>
>>>> 2. (AKA hard lane) attach buffer heads to buddy page and use
>>>> buffer_uptodate() and buffer_lock() instead of PageUptodate() and lock_page()
>>>> to initialize buddy cache of groups that share the same page.
>>>>
>>>> What do you say?
>>>> Shall I take easy lane?
>>>
>>> For flex_bg filesystems, it would probably make even more sense to just load all of the bitmaps for that page, since it won't waste any more memory or cause extra disk seeks. I wonder what the memory vs. seek performance tradeoff is for 1k filesystems to load all the bitmaps even for the non-flex_bg case (i.e. would the second bitmap have been loaded anyway in most cases)?
>>
>> I'm sorry. I don't follow. I see how disk seeks can be avoided if we
>> load all bitmaps of a flex_bg,
>> but there can be no more than 2 groups on a page (4 on 8k system).
>> So what do I gain? My goal is to remove the locking protection on
>> allocations from different block groups.
>
> My point was that the locking is necessary because we may be loading multiple bitmaps into a single page at different times, making it difficult to set PageUptodate at one time. However, with flex_bg it is possible to read both bitmap blocks into the same page at one time without noticeably hurting performance, and possibly even improving performance due to reduced disk operations. Without flex_bg the benefit of reading all the bitmaps into a page at one time is less clear, because it would seek to read each bitmap.
>

I'm afraid the details of buddy cache initialization are a bit more
complex than the level of our discussion, and you may be referring to
other locks than the ones I am referring to (there are several levels
of locking).

I am trying to ditch the holding of down_read(grp->alloc_sem)
throughout the time that the buddy is loaded, because I find it to be
overkill and it interferes with snapshots COW.

In both options (i.e. easy lane or hard lane above), the page lock
(lock_page()) will be held during ext4_mb_init_cache() and
down_write(grp->alloc_sem) of the group ITSELF will be held during
ext4_mb_init_group().

In the buddy page buffers option, it is possible to read all blocks of
the same page at the same time and mark all buddy page buffers
uptodate. Other groups on the same page will be able to skip the read
from disk, go straight to the initializing phase and mark the buddy
page buffers bitmap_uptodate on completion. ext4_mb_load_buddy() would
then check for !buffer_bitmap_uptodate() instead of !PageUptodate(),
before calling ext4_mb_init_cache().

It's a bit complicated to explain without a patch and I will get to
it... I only wanted to know if I can take the easy lane, since it is a
lot less work and I was not sure the little memory saved for 1K block
fs is worth that work. The easy lane will also simplify mballoc.c and
reduce its line count, whereas the hard lane is likely to add more
lines than it removes.

Ted, can you please comment on the easy vs. hard choice?
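
Until there is a real patch, the hard-lane flow described above can be
roughly modelled in userspace. This is only a sketch, not mballoc code:
struct buddy_buf, struct buddy_page, read_page_blocks() and load_buddy()
are hypothetical stand-ins, the two bools merely imitate what
buffer_uptodate() and buffer_bitmap_uptodate() would report, and all
locking is left out. It assumes 2 groups share one page, as in the 1K
block case.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 2 groups share a buddy page (1K blocks, 4K pages). */
#define GROUPS_PER_PAGE 2

/* Hypothetical stand-in for the per-group state kept on buffer heads. */
struct buddy_buf {
    bool uptodate;        /* imitates buffer_uptodate(): block was read */
    bool bitmap_uptodate; /* imitates buffer_bitmap_uptodate(): buddy built */
};

/* Hypothetical stand-in for one shared buddy cache page. */
struct buddy_page {
    struct buddy_buf buf[GROUPS_PER_PAGE];
    int disk_reads;       /* counts simulated trips to disk */
};

/* Read every block of the page in one go and mark all buffers uptodate. */
static void read_page_blocks(struct buddy_page *pg)
{
    for (int i = 0; i < GROUPS_PER_PAGE; i++)
        pg->buf[i].uptodate = true;
    pg->disk_reads++;
}

/*
 * Models the check ext4_mb_load_buddy() would make per group instead of
 * testing a page-wide flag. Returns true if this call had to initialize
 * the group's buddy, false if it was already initialized.
 */
bool load_buddy(struct buddy_page *pg, int slot)
{
    assert(slot >= 0 && slot < GROUPS_PER_PAGE);
    struct buddy_buf *b = &pg->buf[slot];

    if (b->bitmap_uptodate)
        return false;         /* this group is ready, nothing to do */
    if (!b->uptodate)
        read_page_blocks(pg); /* first group on the page pays for the read */
    /* The buddy for this group only would be generated here. */
    b->bitmap_uptodate = true;
    return true;
}
```

Calling load_buddy() for slot 0 and then slot 1 on a zeroed page
initializes both groups with a single simulated read: the second group
skips the disk and goes straight to its own initialization, which is the
per-group independence the hard lane is after.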

Thanks,
Amir.