Re: [PATCH] fs: inode: Reduce volatile inode wraparound risk when ino_t is 64 bit

Matthew Wilcox writes:
> On Fri, Dec 20, 2019 at 07:35:38PM +0200, Amir Goldstein wrote:
> > On Fri, Dec 20, 2019 at 6:46 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > On Fri, Dec 20, 2019 at 03:41:11PM +0200, Amir Goldstein wrote:
> > > > Suggestion:
> > > > 1. Extend the kmem_cache API to let the ctor() know if it is
> > > >    initializing an object for the first time (new page) or
> > > >    recycling an object.
> > >
> > > Uh, what?  The ctor is _only_ called when new pages are allocated.
> > > Part of the contract with the slab user is that objects are returned to
> > > the slab in an initialised state.
> >
> > Right. I mixed up the ctor() with alloc_inode().
> > So is there anything stopping us from reusing an existing non-zero
> > value of i_ino in shmem_get_inode() for recycling shmem ino numbers?
>
> I think that would be an excellent solution to the problem!  At least,
> I can't think of any problems with it.

Thanks for the suggestions and feedback, Amir and Matthew :-)

The slab i_ino recycling approach works somewhat, but is unfortunately neutered quite a lot by the fact that slab recycling is per-memcg: replacing get_next_ino() with recycle_or_get_next_ino(old_ino) [0] in shmfs and a few other trivial callsites only leads to about 10% slab reuse, which doesn't really stem the bleeding of 32-bit inums on an affected workload:

    # tail -5000 /sys/kernel/debug/tracing/trace | grep -o 'recycle_or_get_next_ino:.*' | sort | uniq -c
        4454 recycle_or_get_next_ino: not recycled
         546 recycle_or_get_next_ino: recycled

Roman (who I've just added to cc) tells me that with CONFIG_MEMCG enabled we currently only get per-memcg slab reuse rather than global reuse. That contributes fairly significantly here, since the tasks contributing to the get_next_ino() thrash are spread across multiple cgroups, so objects freed back to one cgroup's cache can't be recycled from another's.

I think this is a good start, but given the current slab infrastructure, we need something of a different magnitude in order to actually solve this problem. How about something like the following?

1. Add get_next_ino_full, which uses the full width of ino_t (see the sketch after this list)
2. Use get_next_ino_full in tmpfs (et al.)
3. Add a mount option to tmpfs (et al.), say `32bit-inums`, which people can pass if they want the 32-bit inode numbers back. This would still allow people who want to make this tradeoff to use xino.
4. (If you like) Also add a CONFIG option to disable this at compile time.
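
To make (1) and (3) a bit more concrete, here's roughly what I'm imagining. Treat it as a sketch only: the real get_next_ino() does percpu batching, which a final get_next_ino_full would presumably want as well, and `full_inums' is just a stand-in name for wherever the mount option state ends up living:

static atomic64_t last_ino_full = ATOMIC64_INIT(0);

/* Like get_next_ino(), but using the full width of ino_t. */
ino_t get_next_ino_full(void)
{
	ino_t ino;

	do {
		ino = (ino_t)atomic64_inc_return(&last_ino_full);
	} while (unlikely(!ino));	/* skip 0, which means "no inode" */

	return ino;
}

Callers like shmem_get_inode() would then pick a counter based on the mount option:

	/* sbinfo->full_inums: hypothetical flag, cleared by `32bit-inums' */
	inode->i_ino = sbinfo->full_inums ?
		get_next_ino_full() : get_next_ino();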

I'd appreciate your thoughts on that approach or others you have ideas about. Thanks! :-)

0:

unsigned int recycle_or_get_next_ino(ino_t old_ino)
{
	/*
	 * get_next_ino returns unsigned int. If this fires then i_ino must
	 * be wider than 32 bits and have been changed later, so the caller
	 * shouldn't be recycling inode numbers.
	 */
	WARN_ONCE((u64)old_ino > UINT_MAX,
		  "Recyclable i_ino uses more bits than unsigned int: %llu",
		  (u64)old_ino);

	if (old_ino) {
		/* Recycled slab object: keep its previous inode number. */
		if (prandom_u32() % 100 == 0)	/* sample ~1% for tracing */
			trace_printk("recycled\n");
		return old_ino;
	} else {
		/* Fresh object: fall back to the 32-bit global counter. */
		if (prandom_u32() % 100 == 0)
			trace_printk("not recycled\n");
		return get_next_ino();
	}
}
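
The callsite change is then trivial. For example, in shmem_get_inode(), instead of unconditionally assigning get_next_ino():

	inode->i_ino = recycle_or_get_next_ino(inode->i_ino);

At least for shmem this works because the cache ctor (inode_init_once() via shmem_init_inode()) zeroes the inode when the slab page is first set up, so i_ino is 0 for a never-used object and non-zero for a recycled one.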


