Re: GFID to Path Conversion

Aravinda <avishwan@xxxxxxxxxx> · Thu, 10 Dec 2015 17:28:32 +0530



    Some more analysis wrt storage space:

    "Since support was added to the Linux kernel, there is a hard limit of
    64KiB for the size of each extended attribute value, however different
    file systems impose additional constraints. For ext2/3/4 and btrfs,
    each extended attribute is limited to a file system block (e.g. 4 KiB),
    and all (including names and values) must fit together in a single
    block. In XFS the names can be up to 256 bytes in length, terminated
    by the first 0-byte, and the values can be up to 64KB of arbitrary
    binary data. ReiserFS allows attributes of arbitrary size."

    https://en.wikipedia.org/wiki/Extended_file_attributes

    
    I created a shell script that sets 100 xattrs on a file, each with a
    basename value of up to ~255 characters.

    
    # -------------------
    # Usage: bash <script> <file>   (the {1..100} brace expansion needs bash)
    file=$1
    for i in {1..100}
    do
        # Long basename with a unique numeric suffix per iteration
        f="very very very very loooooooooooooooooong file nameeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee$i"
        # The MD5 of the basename becomes the last component of the xattr key
        h=$(echo "$f" | md5sum | awk '{print $1}')
        setfattr -n "trusted.pgfid.3c3b44ab-f21f-4801-a0bc-5a337bd5047c.$h" -v "$f" "$file"
    done
    # -------------------

    
    Let me know if anybody thinks space could be an issue for storing this
    information in xattrs.
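As a rough cross-check of the space question, the arithmetic behind the experiment can be sketched as below. This is only an estimate: it counts key and value bytes for the key layout used in the script and ignores per-entry headers and alignment, which are filesystem specific. The constant and function names are hypothetical.

```python
# Rough size estimate; real on-disk overhead (entry headers, alignment)
# differs per filesystem and is not modelled here.
KEY_PREFIX = "trusted.pgfid."
GFID_LEN = 36        # textual UUID, e.g. 3c3b44ab-f21f-4801-a0bc-5a337bd5047c
MD5_HEX_LEN = 32     # hex digest used as the basename hash in the script
MAX_BASENAME = 255   # worst-case basename length assumed in the experiment


def xattr_bytes(basename_len=MAX_BASENAME):
    """Bytes for one xattr: key (prefix + PGFID + '.' + hash) plus value."""
    key_len = len(KEY_PREFIX) + GFID_LEN + 1 + MD5_HEX_LEN
    return key_len + basename_len


def total_bytes(num_links, basename_len=MAX_BASENAME):
    """Key and value bytes needed for num_links hardlinks of one file."""
    return num_links * xattr_bytes(basename_len)


per_xattr = xattr_bytes()
print("bytes per xattr          :", per_xattr)
print("bytes for 100 hardlinks  :", total_bytes(100))
print("worst-case xattrs / 4 KiB:", 4096 // per_xattr)
```

Under these assumptions each individual value stays far below the 64 KiB per-value limit; the single-block constraint quoted above for ext2/3/4 and btrfs is the one that 100 worst-case names would run into first.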

    
    Other experiments:
    ------------------
    For the POC, I created two Python scripts: one to create the index and
    another to retrieve the value (GFID to path). I used MD5 for POC purposes.

    
    https://gist.github.com/aravindavk/5307489f68cbcfb37d3d

    https://gist.github.com/aravindavk/d1d0ca9c874b7d3d8d86

    
    python pgfid_index.py <BRICK_PATH>           # Updates required xattrs for all files

    and

    python gfid_to_path.py <BRICK_PATH> <GFID>   # Returns Path for given GFID

    Note: This script uses the `user.pgfid` prefix for the xattr instead of
    `trusted.pgfid` for the POC.

    
    Once the design is finalized, I will update the storage/posix code.

    Backward compatibility:
    -----------------------
    The same interface will be used to retrieve the information. That is,

    gluster volume set test build-pgfid on
    getfattr -n glusterfs.ancestry.path -e text /mnt/testvol/.gfid/<GFID>

    
    Ref:

    https://gluster.readthedocs.org/en/latest/Troubleshooting/gfid-to-path/

    
    If any other component accesses the xattrs directly instead of using the
    getfattr interface, that component needs to be changed (for example,
    glusterfind).

    One more step will be introduced after `volume set` to build the index.
    The current implementation heals pgfid xattrs on named lookup; if we
    disable this feature, we have to provide a separate interface to heal
    (for example, getfattr -n pgfid.heal <PATH>).

    regards
    Aravinda

    On 12/09/2015 11:17 AM, Aravinda wrote:

    
    Hi,
      

      Sharing draft design for GFID to Path Conversion. (Directory GFID to
      Path is very easy in DHT v.1; this design may not work in case of
      DHT 2.0.)

      Performance and storage space impact are yet to be analyzed.

      Storing the required information
      

      -------------------------------
      

      Metadata information related to Parent GFID and Basename will reside
      with the file. PGFID and hash of Basename will become part of the
      Xattr Key name, and Basename will be saved as the Value.

          Xattr Key = meta.<PGFID>.<HASH(BASENAME)>
          Xattr Value = <BASENAME>

      A non-cryptographic hash is suitable for this purpose.

      Number of Xattrs on a file = Number of Links
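The key/value layout above could be built as in the following minimal sketch. The hash function is not specified in the design, so `zlib.crc32` is used here purely as a stand-in for "some non-crypto hash"; `make_xattr` is a hypothetical helper name. On Linux the key would also need a namespace prefix (the POC mentioned elsewhere in this thread uses `user.pgfid`, the shell experiment `trusted.pgfid`).

```python
import zlib


def make_xattr(pgfid, basename, prefix="meta"):
    """Return (key, value) for one link of a file.

    zlib.crc32 is an assumed placeholder for the unspecified non-crypto hash.
    """
    name_bytes = basename.encode("utf-8")
    name_hash = format(zlib.crc32(name_bytes) & 0xFFFFFFFF, "08x")
    key = "{0}.{1}.{2}".format(prefix, pgfid, name_hash)
    return key, name_bytes


key, value = make_xattr("3c789e71-24b0-4723-92a2-7eb3c14b4114", "f1")
print(key, value)
```

Because the basename hash is part of the key, two hardlinks in the same parent directory get two distinct xattrs, which is what makes "Number of Xattrs = Number of Links" hold.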
      

      Converting GFID to Path
      

      -----------------------
      

      Example GFID: 78e8bce0-a8c9-4e67-9ffb-c4c4c7eff038

      1. List all xattrs of the GFID file in the brick backend
         ($BRICK_ROOT/.glusterfs/78/e8/78e8bce0-a8c9-4e67-9ffb-c4c4c7eff038).
      2. If an Xattr Key starts with “meta”, split it to get the parent GFID
         and collect the xattr value.
      3. Convert the Parent GFID to a path using recursive readlink until
         the full path is obtained.
      4. Join the converted parent directory path and the xattr value (basename).
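The four steps above can be sketched roughly as follows. This is not the POC script from the gists. It assumes that the xattrs live under a `trusted.meta` namespace (hypothetical), that directory handles under `.glusterfs` are symlinks of the form `../../<aa>/<bb>/<parent-gfid>/<basename>`, and that the brick root has the fixed GFID ending in `...0001`. All function names are illustrative.

```python
import os

ROOT_GFID = "00000000-0000-0000-0000-000000000001"


def backend_path(brick_root, gfid):
    """Path of the GFID handle under .glusterfs (step 1)."""
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def split_meta_key(key, prefix="trusted.meta"):
    """Return (parent_gfid, name_hash), or None if the key is not ours (step 2)."""
    if not key.startswith(prefix + "."):
        return None
    parts = key[len(prefix) + 1:].split(".")
    return tuple(parts) if len(parts) == 2 else None


def dir_gfid_to_path(brick_root, gfid):
    """Follow directory handle symlinks up to the brick root (step 3)."""
    names = []
    while gfid != ROOT_GFID:
        # Assumed target layout: ../../<aa>/<bb>/<parent-gfid>/<basename>
        target = os.readlink(backend_path(brick_root, gfid))
        parent, name = os.path.split(target)
        names.append(name)
        gfid = os.path.basename(parent)
    return os.path.join(*reversed(names)) if names else ""


def gfid_to_paths(brick_root, gfid, prefix="trusted.meta"):
    """Return one brick-relative path per recorded link (steps 1 to 4)."""
    handle = backend_path(brick_root, gfid)
    paths = []
    for key in os.listxattr(handle):
        parsed = split_meta_key(key, prefix)
        if parsed is None:
            continue
        basename = os.getxattr(handle, key).decode("utf-8")
        parent_dir = dir_gfid_to_path(brick_root, parsed[0])
        paths.append(os.path.join(parent_dir, basename))
    return paths
```

`gfid_to_paths` needs Linux xattr support and, for `trusted.*` keys, root privileges on the brick; the helper functions can be exercised on any directory tree that mimics the handle layout.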
      

      Recording
      

      ---------
      

      MKNOD/CREATE/LINK/SYMLINK: Add new Xattr(PGFID, BN)
      

      RENAME: Remove old xattr(PGFID1, BN1), Add new xattr(PGFID2, BN2)
      

      UNLINK: If Link count > 1 then Remove xattr(PGFID, BN)
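To make the bookkeeping concrete, here is a small in-memory model of the recording rules above, with a plain dict standing in for a file's xattrs. It is only a sketch: the helper names are hypothetical and `zlib.crc32` is an assumed placeholder for the basename hash.

```python
import zlib


def xattr_key(pgfid, basename):
    """meta.<PGFID>.<HASH(BASENAME)> with crc32 as a placeholder hash."""
    digest = zlib.crc32(basename.encode("utf-8")) & 0xFFFFFFFF
    return "meta.%s.%08x" % (pgfid, digest)


def record_link(xattrs, pgfid, basename):
    """MKNOD/CREATE/LINK/SYMLINK: add a new xattr(PGFID, BN)."""
    xattrs[xattr_key(pgfid, basename)] = basename


def record_rename(xattrs, old_pgfid, old_name, new_pgfid, new_name):
    """RENAME: remove xattr(PGFID1, BN1), add xattr(PGFID2, BN2)."""
    xattrs.pop(xattr_key(old_pgfid, old_name), None)
    record_link(xattrs, new_pgfid, new_name)


def record_unlink(xattrs, pgfid, basename, link_count):
    """UNLINK: only touch the xattr while other links remain."""
    if link_count > 1:
        xattrs.pop(xattr_key(pgfid, basename), None)


xattrs = {}
record_link(xattrs, "3c789e71-24b0-4723-92a2-7eb3c14b4114", "f1")
record_link(xattrs, "6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed", "h1")
print(sorted(xattrs.values()))
```

When the last link goes away the file itself is removed along with its xattrs, which is why the UNLINK rule only acts when the link count is greater than 1.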
      

      Heal on Lookup
      

      --------------
      

      Healing on lookup can be enabled if required; by default we can
      disable this option since it may have performance implications
      during reads.
      

      Enabling the logging
      

      ---------------------
      

      This can be enabled using Volume set option. Option name TBD.
      

      Rebuild Index
      

      -------------
      

      An offline activity that crawls the backend filesystem and builds all
      the required xattrs.
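A crawl of this kind might look roughly like the sketch below. It only plans the xattrs (path, key, value) rather than writing them. It assumes a directory's own GFID can be read from its `trusted.gfid` xattr as 16 raw bytes, uses the `trusted.pgfid` prefix from the shell experiment, and again substitutes `zlib.crc32` for the unspecified hash. `plan_index` and `dir_gfid` are hypothetical names.

```python
import os
import uuid
import zlib


def dir_gfid(path):
    """Read a directory's GFID from its trusted.gfid xattr (16 raw bytes)."""
    return str(uuid.UUID(bytes=os.getxattr(path, "trusted.gfid")))


def plan_index(brick_root, get_dir_gfid=dir_gfid, prefix="trusted.pgfid"):
    """Yield (file_path, xattr_key, xattr_value) for every non-directory entry."""
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Skip gluster's internal handle directory at the brick root
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        pgfid = get_dir_gfid(dirpath)
        for name in filenames:
            raw = name.encode("utf-8")
            name_hash = format(zlib.crc32(raw) & 0xFFFFFFFF, "08x")
            key = "%s.%s.%s" % (prefix, pgfid, name_hash)
            yield os.path.join(dirpath, name), key, raw
```

Each planned entry could then be applied with `os.setxattr(path, key, value)`; reading `trusted.gfid` and writing `trusted.*` keys both require root on the brick.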
      

      Comments and Suggestions Welcome.
      

      regards
      

      Aravinda
      

      On 11/25/2015 10:08 AM, Aravinda wrote:
      

        regards
        

        Aravinda
        

        On 11/24/2015 11:25 PM, Shyam wrote:
        

        There seem to be other interested consumers in gluster for the same
        information, and I guess we need a good base design to address this
        on-disk change, so that it can be leveraged in the various use cases
        appropriately.

        Request a few folks to list out how they would use this feature and
        also what performance characteristics they expect around the same.

          - gluster find class of utilities
          

          - change log processors
          

          - swift on file
          

          - inotify support on gluster
          

          - Others?
          

        Debugging utilities for users/admins (show path for GFIDs displayed
        in log files)

        Retrigger sync in Geo-replication (Geo-rep reports failed GFIDs in
        logs; we can retrigger sync if the path is known instead of the GFID)
        

          [3] is an attempt in XFS to do the same; possibly there is a later
          thread around the same that discusses later approaches.

          [4] slide 13 onwards talks about how cephfs does this (see cephfs
          inode backtraces).

          Aravinda, could you put up a design for the same, and how and where
          this information is added etc.? It would help to review it from the
          perspective of other xlators (like existing DHT).
          

          Shyam
          

          [3] http://oss.sgi.com/archives/xfs/2014-01/msg00224.html
          

          [4] http://events.linuxfoundation.org/sites/events/files/slides/CephFS-Vault.pdf

          
          On 10/27/2015 10:02 AM, Shyam wrote:
          

          Aravinda, List,
            

            The topic is interesting and also relevant in the case of DHT2,
            where we lose the hierarchy on a single brick (unlike the older
            DHT), and so some of the thoughts here are along the same lines as
            what we are debating w.r.t DHT2 as well.

            Here is another option that extends the current thought, that I
            would like to put forward, that is pretty much inspired from the
            Linux kernel NFS implementation (based on my current understanding
            of the same) [1] [2].

            If gluster server/brick processes handed out handles (which are
            currently just the GFID (or inode #) of the file) that encode
            pGFID/GFID, then on any handle based operation, we get the
            pGFID/GFID for the object being operated on. This solves the first
            part of the problem where we are encoding the pGFID in the xattr,
            and here we not only do that but further hand out the handle with
            that relationship.
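As a purely illustrative sketch of what a handle that encodes pGFID/GFID could look like, the snippet below packs both UUIDs into a 32-byte opaque value. The layout and the function names are hypothetical; nothing here reflects an actual gluster wire format.

```python
import uuid


def encode_handle(pgfid, gfid):
    """Pack parent GFID and GFID into a 32-byte opaque handle (assumed layout)."""
    return uuid.UUID(pgfid).bytes + uuid.UUID(gfid).bytes


def decode_handle(handle):
    """Split a 32-byte handle back into (pgfid, gfid) strings."""
    if len(handle) != 32:
        raise ValueError("expected a 32-byte handle")
    return str(uuid.UUID(bytes=handle[:16])), str(uuid.UUID(bytes=handle[16:]))


handle = encode_handle("3c789e71-24b0-4723-92a2-7eb3c14b4114",
                       "0aa94a0a-62aa-4afc-9d59-eb68ad39f78c")
print(len(handle), decode_handle(handle))
```

With such a handle, any handle-based operation on the brick can recover the parent GFID without an extra xattr read, at the cost of the handle going stale with respect to the parent after a rename.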
            

            It also helps when an object is renamed and we still allow the
            older handle to be used for operations. Not a bad thing in some
            cases, and possibly not the best thing to do in some other cases
            (say access).

            To further this knowledge back to a name, what you propose can be
            stored on the object itself. Thus giving us a short dentry tree
            creation ability of pGFID->name(GFID).

            This of course changes the gluster RPC wire protocol, as we need
            to encode/send pGFID as well in some cases (or it could be done by
            adding this to the xdata payload).
            

            Shyam
            

            [1] http://nfs.sourceforge.net/#faq_c7
            

            [2] https://www.kernel.org/doc/Documentation/filesystems/nfs/Exporting
            

            On 10/27/2015 03:07 AM, Aravinda wrote:
            

            Hi,
              

              We have a volume option called "build-pgfid:on" to enable
              recording the parent gfid in the file xattr. This simplifies the
              GFID to Path conversion.

              Is it possible to save the base name also in the xattr along with
              the PGFID? It helps in converting GFID to Path easily without
              doing a crawl.
              

              Example structure,
              

              dir1 (3c789e71-24b0-4723-92a2-7eb3c14b4114)
              

                   - f1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
              

                   - f2 (f1e7ad00-6500-4284-b21c-d02766ecc336)
              

              dir2 (6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed)
              

                   - h1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
              

              Where file f1 and h1 are hardlinks. Note the same GFID.
              

              Backend,
              

              .glusterfs
              

                    - 3c/78/3c789e71-24b0-4723-92a2-7eb3c14b4114
              

                    - 0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
              

                    - f1/e7/f1e7ad00-6500-4284-b21c-d02766ecc336
              

                    - 6c/3b/6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed
              

              Since f1 and h1 are hardlinks across directories, the file xattr
              will have two parent GFIDs. The xattr dump will be,

              trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=1
              trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=1

              The number shows the number of hardlinks per parent GFID.
              

              If we know the GFID of a file, to get the path,

              1. Identify which brick has that file using the pathinfo xattr.
              2. Get all parent GFIDs (using listxattr on the backend gfid path
                 .glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c).
              3. Crawl those directories to find files with the same inode as
                 .glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c.
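Step 3 of this crawl-based approach could be sketched as below: scan one parent directory for entries that share the inode of the backend GFID file, optionally stopping once the hardlink count recorded in the `trusted.pgfid.<PGFID>` value has been reached. `find_links_in_dir` is a hypothetical helper name.

```python
import os


def find_links_in_dir(directory, backend_path, expected_links=None):
    """Return names in `directory` that are hardlinks of `backend_path`.

    expected_links is the per-parent hardlink count from the xattr value;
    when given, the scan stops as soon as that many names are found.
    """
    target = os.stat(backend_path)
    found = []
    with os.scandir(directory) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            if (st.st_ino, st.st_dev) == (target.st_ino, target.st_dev):
                found.append(entry.name)
                if expected_links is not None and len(found) >= expected_links:
                    break
    return found
```

The cost is one stat per directory entry in the worst case, which is why a directory with a very large number of files makes this approach expensive.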
              

              PGFID has to be updated on,

              1. CREATE/MKNOD - Add xattr
              2. RENAME - If moved to a different directory, update PGFID
              3. UNLINK - If the number of links is more than 1, reduce the
                 link count; remove the respective parent PGFID
              4. LINK - Add PGFID if linked to a different directory,
                 increment count
              

              Advantages:

              1. Crawling is limited to a few directories instead of a full
                 file system crawl.
              2. Break early during the crawl when the search reaches the
                 number of hardlinks given by the xattr value.

              Disadvantages:

              1. Crawling is expensive if a directory has a lot of files.
              2. PGFID has to be updated on CREATE/MKNOD/RENAME/UNLINK/LINK.
              3. This method of conversion will not work if the file is deleted.
              

              We can improve the performance of GFID to Path conversion if we
              also record the Basename in the file xattr.

              trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=f1
              trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=h1

              Note: Multiple base names are delimited by a zero byte.
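The zero-byte delimited value could be handled as in this small sketch, which also shows why every update involves parsing the existing value when a parent holds several hardlinks. The helper names are hypothetical.

```python
def decode_basenames(value):
    """Split a NUL-delimited xattr value into a list of basenames."""
    return [name.decode("utf-8") for name in value.split(b"\0") if name]


def encode_basenames(names):
    """Join basenames into a single NUL-delimited xattr value."""
    return b"\0".join(name.encode("utf-8") for name in names)


def add_basename(value, name):
    """LINK/CREATE in the same parent: parse, append, re-encode."""
    names = decode_basenames(value)
    if name not in names:
        names.append(name)
    return encode_basenames(names)


def remove_basename(value, name):
    """UNLINK in the same parent: parse, drop the name, re-encode."""
    return encode_basenames([n for n in decode_basenames(value) if n != name])


value = add_basename(add_basename(b"", "f1"), "f2")
print(decode_basenames(value))
```

A zero byte is a safe delimiter because it cannot appear in a file name, but the value has to be written as binary data rather than as text.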
              

              What is the additional overhead compared to storing only PGFID?

              1. Space
              2. Number of xattrs will grow with the number of hardlinks
              3. Max size issue for xattr value?
              4. The xattr has to be updated even when a file is renamed
                 within the same directory.
              5. Updating the value of the xattr involves parsing in case of
                 multiple hardlinks.

              Are there any performance issues other than during initial
              indexing? (Assume pgfid and basenames are populated by a separate
              script.)
              

              Comments and Suggestions Welcome.
              

            _______________________________________________
            

            Gluster-devel mailing list
            

            Gluster-devel@xxxxxxxxxxx
            

            http://www.gluster.org/mailman/listinfo/gluster-devel
            
