Re: a major regression in recent kernels? - was: Re: Null pointer OOPS in sync_inodes_sb+0xa9/0x104

On Wed, Mar 02, 2011 at 10:31:15AM -0800, Linus Torvalds wrote:
> The whole "backing_dev_info" has been a total disaster. The thing is
> crap. It violates all the normal kernel memory management rules ("Thou
> shalt use reference counts and free only when it goes to zero") and
> the whole thing has been a constant source of "oh, that driver didn't
> set it, but we changed all the code to require it to be correct".
> 
> And the reason we set it to NULL when the device goes away is exactly
> that it's not ref-counted correctly, so we really _have_ to set it to
> NULL, because it's not going to be around.
> 
> (And the reverse of that is why all kernel data structures should use
> refcounts, and not some external lifetime notion)

Yes.  But the bdi is even worse than that, as it conflates things with
different lifetimes into a single object.  We have the "old school" bdi,
which mostly contains various bits of tuning for the VM and read-ahead
algorithms.  This one is required to stay around even when no fs is
mounted on the block device, because that is what people expect.  And
then we have the writeback context entangled into it, which only makes
sense with an active filesystem (or block device node) on it, to make
it special fun.  Even more fun is that we have one pointer from the
superblock and one from the inode, and the latter might point to lala
land if this is, say, a /dev/mem node, which has a different bdi for
the "old-school" MM usage.

I had various stages of prototypes for separating the two into:

 1) the old bdi.  Lifetime rules are: allocated and reference counted
    with the containing device.  That is gendisk for block devices,
    server context for remote devices, static at module init time for
    /dev/zero and similar.
 2) writeback context.  Only exists if a user is there, and thus
    refcounted by itself.  For non-block-device filesystem instances it's
    trivially always allocated with the superblock, and goes away with it.
    For block-device instances we need to keep a pointer to it from
    struct block_device and properly look it up on mount, or when
    opening the block device node.
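
To make the two lifetime rules concrete, here is a minimal userspace
sketch in C.  It is not kernel code and not one of the prototypes: every
name in it (device_ctx, bdi_tuning, wb_context, wb_get, wb_put, the
live_objects counter) is hypothetical and only illustrates the split.
The tuning bits are embedded in the device and share its lifetime, while
the writeback context is allocated on first use, refcounted by itself,
and pins the device for as long as it exists.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace sketch; none of these names are kernel APIs.
 * Plain int refcounts, so this is not safe for concurrent use. */

static int live_objects;        /* counts live allocations, for checking only */

/* (1) "Old" bdi: tuning state that lives and dies with the device. */
struct bdi_tuning {
	unsigned long ra_pages; /* illustrative read-ahead setting */
};

struct wb_context;

/* Stand-in for the containing device (gendisk, server context, ...). */
struct device_ctx {
	int refcount;
	struct bdi_tuning tuning; /* embedded: same lifetime as the device */
	struct wb_context *wb;    /* NULL while there is no user */
};

/* (2) Writeback context: exists only while somebody uses it. */
struct wb_context {
	int refcount;
	struct device_ctx *dev;   /* holds a device reference while alive */
};

static struct device_ctx *device_create(unsigned long ra_pages)
{
	struct device_ctx *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;
	dev->tuning.ra_pages = ra_pages;
	live_objects++;
	return dev;
}

/* Drop a device reference; free only when the count reaches zero. */
static void device_put(struct device_ctx *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0) {
		free(dev);
		live_objects--;
	}
}

/* Look up the writeback context on mount/open, allocating it for the
 * first user. */
static struct wb_context *wb_get(struct device_ctx *dev)
{
	struct wb_context *wb = dev->wb;

	if (wb) {
		wb->refcount++;
		return wb;
	}
	wb = calloc(1, sizeof(*wb));
	if (!wb)
		return NULL;
	wb->refcount = 1;
	wb->dev = dev;
	dev->refcount++;          /* the context pins its device */
	dev->wb = wb;
	live_objects++;
	return wb;
}

/* Drop a user; the last one tears the context down and unpins the device. */
static void wb_put(struct wb_context *wb)
{
	struct device_ctx *dev = wb->dev;

	assert(wb->refcount > 0);
	if (--wb->refcount > 0)
		return;
	dev->wb = NULL;
	free(wb);
	live_objects--;
	device_put(dev);
}
```

In this sketch a mount and an open of the device node would each call
wb_get() and later wb_put(); the tuning value stays readable through the
device before the first user arrives and after the last one leaves.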

I guess I need to get back to it, but I have put it off for now as the
code has reached relative stability and I really fear touching it again.

It's for sure not .38 material, though.