Re: [RFC] inode table locking contention reduction experiment

Changwei Ge <chge@xxxxxxxxxxxxxxxxx> · Mon, 4 Nov 2019 11:51:34 +0800

Hi Amar,

On 2019/10/31 6:30 下午, Amar Tumballi wrote:

On Wed, Oct 30, 2019 at 4:32 PM Xavi Hernandez <jahernan@xxxxxxxxxx 
<mailto:jahernan@xxxxxxxxxx>> wrote:

    Hi Changwei,

    On Tue, Oct 29, 2019 at 7:56 AM Changwei Ge <chge@xxxxxxxxxxxxxxxxx
    <mailto:chge@xxxxxxxxxxxxxxxxx>> wrote:

        Hi,

        I am recently working on reducing inode_[un]ref() locking
        contention by
        getting rid of inode table lock. Just use inode lock to protect
        inode
        REF. I have already discussed a couple rounds with several
        Glusterfs
        developers via emails and Gerrit and basically get understood on
        major
        logic around.

        Currently, inode REF can be ZERO and be reused by increasing it
        to ONE.
        This is IMO why we have to burden so much work for inode table when
        REF/UNREF. It makes inode [un]ref() and inode table and
        dentries(alias)
        searching hard to run concurrently.

        So my question is in what cases, how can we find a inode whose
        REF is ZERO?

        As Glusterfs store its inode memory address into kernel/fuse,
        can we
        conclude that only fuse_ino_to_inode() can bring back a REF=0 inode?

Xavi's answer below provides some insights. and same time, assuming that 
only fuse_ino_to_inode() can bring back inode from ref=0 state (for 
now), is a good start.

    Yes, when an inode gets refs = 0, it means that gluster code is not
    using it anywhere, so it cannot be referenced again unless kernel
    sends new requests on the same inode. Once refs=0 and nlookup=0, the
    inode can be destroyed.

    Inode code is quite complex right now and I haven't had time to
    investigate this further, but I think we could simplify inode
    management significantly (specially unref) if we add a reference
    when nlookup becomes > 0, and remove a reference when
    nlookup becomes 0 again. Maybe with this approach we could avoid
    inode table lock in many cases. However we need to make sure we
    correctly handle invalidation logic to keep inode table size under
    control.

My suggestion is, don't wait for a complete solution for posting the 
patch. Let us get a chance to have a look at WorkInProgress patches, so 
we can have discussions on code itself. It would help to reach better 
solutions sooner.

Agree.

I have almost implemented my draft design for this experiment.
The immature code has been pushed to my personal Glusterfs repo[1].

Now it's a single large patch, I will split it to patches when I decide 
to push it to Gerrit for review convenience. If you prefer to push it to 
Gerrit for a early review and discussion, I can do that :-). But I am 
still doing some debug stuff.

My work includes:

1. Move inode refing and unrefing logic unrelated logic out from 
`__inode_[un]ref()` hence to reduce their arguments.
2. Add a specific ‘ref_lock’ to inode to keep ref/unref atomicity.
3. As `inode_table::active_size` is only used for debug purpose, convert 
it to atomic variable.
4. Factor out pruning inode.
5. In order to run inode search and grep run concurrently, firstly  use 
RDLOCK  and then convert it WRLOCK if necessary.
6. Inode table lock is not necessary for inode ref/unref unless we have 
to move it between table lists.

etc...

Any comments, ideas, suggestions are kindly welcomed.

Thanks,
Changwei

[1]:
https://github.com/changweige/glusterfs/commit/d7226d2458281212af19ec8c2ca3d8c8caae1330

    Regards,

    Xavi

        Thanks,
        Changwei
        _______________________________________________

        Community Meeting Calendar:

        APAC Schedule -
        Every 2nd and 4th Tuesday at 11:30 AM IST
        Bridge: https://bluejeans.com/118564314

        NA/EMEA Schedule -
        Every 1st and 3rd Tuesday at 01:00 PM EDT
        Bridge: https://bluejeans.com/118564314

        Gluster-devel mailing list
        Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx>
        https://lists.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel