Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 28, 2012 at 6:04 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Interesting. The code *has* the block size (it's in "bh->b_size"), but
> it actually then uses the inode blocksize instead, and verifies the
> two against each other. It could just have used the block size
> directly (and then used the inode i_blkbits only when no buffers
> existed), avoiding that dependency entirely..

Looking more at this code, that really would be the nicest solution.

There's two cases for the whole get_block() thing:

 - filesystems. The block size will not change randomly, and
"get_block()" seriously depends on the block size.

 - the raw device. The block size *will* change, but to simplify the
problem, "get_block()" is a 1:1 mapping, so it doesn't even care about
the block size because it will always return "bh->b_blocknr = nr".

So we *could* just say that all the fs/buffer.c code should use
"inode->i_blkbits" for creating buffers (because that's the size new
buffers should always use), but use "bh->b_size" for any *existing*
buffer use.

And looking at it, it's even simple. Except for one *very* annoying
thing: several users really don't want the size of the buffer, they
really do want the *shift* of the buffer size.

In fact, that single issue seems to be the reason why
"inode->i_blkbits" is really used in fs/buffer.c.

Otherwise it would be fairly trivial to just make the pattern be just a simple

        if (!page_has_buffers(page))
                create_empty_buffers(page, 1 << inode->i_blkbits, 0);
        head = page_buffers(page);
        blocksize = head->b_size;

and just use the blocksize that way, without any other games. All
done, no silly WARN_ON() to verify against some global block-size, and
the fs/buffer.c code would be perfectly simple, and would have no
problem at all with multiple different blocksizes in different pages
(the page lock serializes the buffers and thus the blocksize at the
per-page level).

But the fact that the code wants to do things like

        block = (sector_t)page->index << (PAGE_CACHE_SHIFT - bbits);

seriously seems to be the main thing that keeps us using
'inode->i_blkbits'. Calculating bbits from bh->b_size is just costly
enough to hurt (not everywhere, but on some machines).

Very annoying.

                       Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux