Re: GFID to Path Conversion

Aravinda <avishwan@xxxxxxxxxx> · Wed, 9 Dec 2015 11:17:58 +0530



    Hi,

    
    Sharing a draft design for GFID to Path conversion. (Directory GFID to
    Path conversion is very easy in DHT v.1; this design may not work in
    the case of DHT 2.0.)

    The performance and storage space impact is yet to be analyzed.

      
    Storing the required information

    -------------------------------

    Metadata about the Parent GFID (PGFID) and Basename will reside with the
    file. The PGFID and a hash of the Basename will become part of the xattr
    key name, and the Basename will be saved as the value.

    
        Xattr Key = meta.<PGFID>.<HASH(BASENAME)>

        Xattr Value = <BASENAME>

    
    A non-cryptographic hash is suitable for this purpose.

    Number of Xattrs on a file = Number of Links
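
To make the key layout concrete, here is a minimal Python sketch of how one xattr per link could be built. The draft does not name the hash, so CRC32 is only an assumption here, and the helper names (`basename_hash`, `make_xattr`, `parent_gfid_of`) are hypothetical.

```python
import zlib
from typing import Tuple

XATTR_PREFIX = "meta"  # prefix as written in the draft


def basename_hash(basename: str) -> str:
    """Return 8 hex digits for the basename.

    CRC32 is an assumption: the draft only asks for a non-cryptographic hash.
    """
    return format(zlib.crc32(basename.encode("utf-8")), "08x")


def make_xattr(pgfid: str, basename: str) -> Tuple[str, bytes]:
    """Build (key, value) as meta.<PGFID>.<HASH(BASENAME)> = <BASENAME>."""
    key = "{}.{}.{}".format(XATTR_PREFIX, pgfid, basename_hash(basename))
    return key, basename.encode("utf-8")


def parent_gfid_of(key: str) -> str:
    """Extract the parent GFID from a key built by make_xattr."""
    # GFIDs contain hyphens but no dots, so splitting on "." is safe.
    prefix, pgfid, _name_hash = key.split(".")
    if prefix != XATTR_PREFIX:
        raise ValueError("not a GFID-to-path xattr: " + key)
    return pgfid
```

With this layout, two hardlinks in the same parent directory get different keys because their basenames hash differently, which appears to be why the hash is part of the key.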

    
    Converting GFID to Path

    -----------------------

    Example GFID: 78e8bce0-a8c9-4e67-9ffb-c4c4c7eff038

    
    1. List all xattrs of the GFID file in the brick backend
       ($BRICK_ROOT/.glusterfs/78/e8/78e8bce0-a8c9-4e67-9ffb-c4c4c7eff038).

    2. If an xattr key starts with “meta”, split it to get the parent GFID
       and collect the xattr value.

    3. Convert the parent GFID to a path by calling readlink recursively
       until the full path is resolved.

    4. Join the converted parent directory path and the xattr value (basename).
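
The four steps above could look roughly like the following Python sketch. It rests on assumptions that are not stated in this mail: that each directory's entry under .glusterfs is a symlink whose target ends in <parent-gfid>/<basename>, and that the root directory has the GFID 00000000-0000-0000-0000-000000000001. The function names are hypothetical. The xattrs are passed in as a plain dict; on Linux it could be filled with `os.listxattr` and `os.getxattr` on the backend GFID file.

```python
import os

# Assumed GFID of the volume root directory.
ROOT_GFID = "00000000-0000-0000-0000-000000000001"


def backend_gfid_path(brick_root, gfid):
    """Step 1: location of the GFID file, sharded by the first two byte pairs."""
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def resolve_dir(brick_root, gfid):
    """Step 3: follow directory symlinks upward until the root GFID is reached.

    Assumes every directory handle is a symlink ending in <parent-gfid>/<name>.
    Returns a path relative to the brick root ("" for the root itself).
    """
    parts = []
    while gfid != ROOT_GFID:
        target = os.readlink(backend_gfid_path(brick_root, gfid))
        parent, name = os.path.split(target)
        parts.append(name)
        gfid = os.path.basename(parent)
    return os.path.join(*reversed(parts)) if parts else ""


def gfid_to_paths(xattrs, resolve):
    """Steps 2 and 4: one path per meta.<PGFID>.<HASH> xattr.

    xattrs maps xattr key -> bytes value; resolve maps a parent GFID to a path.
    """
    paths = []
    for key, value in xattrs.items():
        if not key.startswith("meta."):
            continue
        _prefix, pgfid, _name_hash = key.split(".", 2)
        paths.append(os.path.join(resolve(pgfid), value.decode("utf-8")))
    return sorted(paths)
```

A file with several hardlinks yields several paths, one per xattr, so the caller gets a list instead of a single path.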

    
    Recording

    ---------

    MKNOD/CREATE/LINK/SYMLINK: Add new Xattr(PGFID, BN)

    RENAME: Remove old xattr(PGFID1, BN1), Add new xattr(PGFID2, BN2)

    UNLINK: If Link count > 1 then Remove xattr(PGFID, BN)
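
As an illustration of these recording rules, the sketch below applies them to an in-memory dict that stands in for the xattrs of one file. The handler names are hypothetical, and CRC32 is again only an assumed choice of hash.

```python
import zlib


def _key(pgfid, basename):
    """meta.<PGFID>.<HASH(BASENAME)>; CRC32 is an assumed non-crypto hash."""
    return "meta.%s.%08x" % (pgfid, zlib.crc32(basename.encode("utf-8")))


def on_create_or_link(xattrs, pgfid, basename):
    """MKNOD/CREATE/LINK/SYMLINK: add a new xattr for this (PGFID, BN) pair."""
    xattrs[_key(pgfid, basename)] = basename.encode("utf-8")


def on_rename(xattrs, old_pgfid, old_bn, new_pgfid, new_bn):
    """RENAME: remove the old xattr, then add the new one."""
    xattrs.pop(_key(old_pgfid, old_bn), None)
    on_create_or_link(xattrs, new_pgfid, new_bn)


def on_unlink(xattrs, pgfid, basename, link_count):
    """UNLINK: only touch xattrs when other links to the file remain."""
    if link_count > 1:
        xattrs.pop(_key(pgfid, basename), None)
```

When the link count is 1 the file itself goes away together with its xattrs, so nothing needs to be removed explicitly.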

    
    Heal on Lookup

    --------------

    Healing on lookup can be enabled if required. By default this option can
    stay disabled, since it may have performance implications during reads.

    
    Enabling the logging

    ---------------------

    This can be enabled using a volume set option. The option name is TBD.

    
    Rebuild Index

    -------------

    An offline activity that crawls the backend filesystem and builds all
    the required xattrs.
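
A rough Python sketch of such a crawl is below. It only computes the xattrs that would be written, grouped per inode so that hardlinks end up on the same file; it does not set anything. `get_gfid` is a hypothetical callable that returns the GFID of a backend directory (how that GFID is read is left to the caller), and CRC32 is an assumed hash.

```python
import os
import zlib
from collections import defaultdict


def rebuild_index(brick_root, get_gfid):
    """Crawl the brick and plan the xattrs for every file.

    Returns {(st_dev, st_ino): {xattr_key: basename_bytes}}.
    get_gfid(dirpath) is supplied by the caller and returns that
    directory's GFID as a string.
    """
    plan = defaultdict(dict)
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Skip the internal .glusterfs tree at the top of the brick.
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        pgfid = get_gfid(dirpath)
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = "meta.%s.%08x" % (pgfid, zlib.crc32(name.encode("utf-8")))
            plan[(st.st_dev, st.st_ino)][key] = name.encode("utf-8")
    return plan
```

Grouping by inode keeps the "number of xattrs = number of links" property, since each hardlink contributes one entry to the same file.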

    
    Comments and Suggestions Welcome.

    regards
Aravinda
    On 11/25/2015 10:08 AM, Aravinda wrote:

    
      regards
      

      Aravinda
      

      On 11/24/2015 11:25 PM, Shyam wrote:
      

      There seem to be other interested
        consumers in gluster for the same information, and I guess we
        need a good base design to address this on-disk change, so that
        it can be leveraged in the various use cases appropriately.
        

        Request a few folks to list out how they would use this feature
        and also what performance characteristics they expect around the
        same.
        

        - gluster find class of utilities
        

        - change log processors
        

        - swift on file
        

        - inotify support on gluster
        

        - Others?
        

      Debugging utilities for users/admins(Show path for GFIDs displayed
      in log files)
      

      Retrigger Sync in Geo-replication(Geo-rep reports failed GFIDs in
      logs, we can retrigger sync if path is known instead of GFID)
      

        [3] is an attempt in XFS to do the same; possibly there is a
        more recent thread around the same that discusses later
        approaches.
        

        [4] slide 13 onwards talks about how cephfs does this. (see
        cephfs inode backtraces)
        

        Aravinda, could you put up a design for the same, and how and
        where this information is added etc.? It would help to review it
        from other xlators' perspective (like existing DHT).
        

        Shyam
        

        [3] http://oss.sgi.com/archives/xfs/2014-01/msg00224.html
        

        [4]
http://events.linuxfoundation.org/sites/events/files/slides/CephFS-Vault.pdf

        
        On 10/27/2015 10:02 AM, Shyam wrote:
        

        Aravinda, List,
          

          The topic is interesting and also relevant in the case of DHT2
          where we
          

          lose the hierarchy on a single brick (unlike the older DHT)
          and so some
          

          of the thoughts here are along the same lines as what we are
          debating
          

          w.r.t DHT2 as well.
          

          Here is another option that extends the current thought, that
          I would
          

          like to put forward, that is pretty much inspired from the
          Linux kernel
          

          NFS implementation (based on my current understanding of the
          same) [1] [2].
          

          If gluster server/brick processes handed out handles, (which
          are
          

          currently just GFID (or inode #) of the file), that encode
          pGFID/GFID,
          

          then on any handle based operation, we get the pGFID/GFID for
          the object
          

          being operated on. This solves the first part of the problem
          where we
          

          are encoding the pGFID in the xattr, and here we not only do
          that but
          

          further hand out the handle with that relationship.
          

          It also helps when an object is renamed and we still allow the
          older
          

          handle to be used for operations. Not a bad thing in some
          cases, and
          

          possibly not the best thing to do in some other cases (say
          access).
          

          To further this knowledge back to a name, what you propose can
          be stored
          

          on the object itself. Thus giving us a short dentry tree
          creation
          

          ability of pGFID->name(GFID).
          

          This of course changes the gluster RPC wire protocol, as we
          need to
          

          encode/send pGFID as well in some cases (or this could be done
          by adding it to the xdata payload).
          

          Shyam
          

          [1] http://nfs.sourceforge.net/#faq_c7
          

          [2]
          https://www.kernel.org/doc/Documentation/filesystems/nfs/Exporting
          

          On 10/27/2015 03:07 AM, Aravinda wrote:
          

          Hi,
            

            We have a volume option called "build-pgfid:on" to enable
            recording
            

            parent gfid in file xattr. This simplifies the GFID to Path
            conversion.
            

            Is it possible to save base name also in xattr along with
            PGFID? It
            

            helps in converting GFID to Path easily without doing crawl.
            

            Example structure,
            

            dir1 (3c789e71-24b0-4723-92a2-7eb3c14b4114)
            

                 - f1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
            

                 - f2 (f1e7ad00-6500-4284-b21c-d02766ecc336)
            

            dir2 (6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed)
            

                 - h1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
            

            Where file f1 and h1 are hardlinks. Note the same GFID.
            

            Backend,
            

            .glusterfs
            

                  - 3c/78/3c789e71-24b0-4723-92a2-7eb3c14b4114
            

                  - 0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
            

                  - f1/e7/f1e7ad00-6500-4284-b21c-d02766ecc336
            

                  - 6c/3b/6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed
            

            Since f1 and h1 are hardlinks across directories, the file
            xattr will have two parent GFIDs. The xattr dump will be,
            

            trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=1
            

            trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=1
            

            Number shows number of hardlinks per parent GFID.
            

            If we know GFID of a file, to get path,
            

            1. Identify which brick has that file using pathinfo xattr.
            

            2. Get all parent GFIDs(using listxattr on backend gfid path
            

            .glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
            

            3. Crawl those directories to find files with same inode as
            

            .glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
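
Step 3 can be sketched in Python as below: scan only the parent directories and compare device and inode numbers with the backend GFID file. The function name and the `max_links` parameter are hypothetical, and the sketch assumes the GFID file is a hardlink of the data file, which the "same inode" wording above implies.

```python
import os


def find_names_by_inode(parent_dirs, gfid_file, max_links=None):
    """Return entries in parent_dirs that share an inode with gfid_file.

    max_links (optional) stops the crawl once that many links were found.
    """
    target = os.lstat(gfid_file)
    wanted = (target.st_dev, target.st_ino)
    found = []
    for directory in parent_dirs:
        with os.scandir(directory) as entries:
            for entry in entries:
                st = entry.stat(follow_symlinks=False)
                if (st.st_dev, st.st_ino) == wanted:
                    found.append(entry.path)
                    if max_links and len(found) >= max_links:
                        return found
    return found
```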
            

            Updating PGFID to be done when,
            

            1. CREATE/MKNOD - Add xattr
            

            2. RENAME - If moved to different directory, Update PGFID
            

            3. UNLINK - If the number of links is more than 1, reduce the
            number of links and remove the respective parent PGFID
            

            4. LINK - Add PGFID if link to different directory,
            Increment count
            

            Advantages:
            

            1. Crawling is limited to a few directories instead of full
            file system
            

            crawl.
            

            2. Break early during the crawl once the search has found as
            many hardlinks as the xattr value indicates.
            

            Disadvantages:
            

            1. Crawling is expensive if a directory has a lot of files.
            

            2. Updating PGFID when CREATE/MKNOD/RENAME/UNLINK/LINK
            

            3. This method of conversion will not work if file is
            deleted.
            

            We can improve performance of GFID to Path conversion if we
            record
            

            Basename also in file xattr.
            

            trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=f1
            

            trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=h1
            

            Note: Multiple base names delimited by zerobyte.
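
For the zero-byte delimited value, a small Python sketch of the encode, decode and update steps is below (the helper names are hypothetical). It also shows why updating the value means parsing it first when several hardlinks share a parent.

```python
def encode_basenames(names):
    """Join basenames with a zero byte, as one xattr value per parent GFID."""
    return b"\0".join(name.encode("utf-8") for name in names)


def decode_basenames(value):
    """Split an xattr value back into basenames; an empty value gives []."""
    return [part.decode("utf-8") for part in value.split(b"\0") if part]


def remove_basename(value, name):
    """Drop one basename (for UNLINK or RENAME) and re-encode the rest."""
    names = decode_basenames(value)
    if name in names:
        names.remove(name)
    return encode_basenames(names)
```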
            

            What is the additional overhead compared to storing only the PGFID?
            

            1. Space
            

            2. Number of xattrs will grow as number of hardlinks
            

            3. Max size issue for xattr value?
            

            4. The xattr must be updated even when a file is renamed within
            the same directory.
            

            5. Updating value of xattr involves parsing in case of
            multiple
            

            hardlinks.
            

            Are there any performance issues except during initial
            indexing? (Assume pgfid and basenames are populated by a
            separate script.)
            

            Comments and Suggestions Welcome.
            

          _______________________________________________
          

          Gluster-devel mailing list
          

          Gluster-devel@xxxxxxxxxxx
          

          http://www.gluster.org/mailman/listinfo/gluster-devel
          
