Re: Recovering from remove-brick where shards did not rebalance

Anthony Hoppe <anthony@xxxxxxxx> · Wed, 8 Sep 2021 23:27:03 -0700

Ok!  I'm actually poking at this now, so great timing.

The only mistake I made, I believe, was that I expanded the last shard to
64 MB.  I forgot that bit.  I'm going to try again, leaving that one as
is.  Otherwise, here is what my process has been so far.  It may be a bit
roundabout, but here it is:

1) copy main file + shards from each node to directories on recovery storage
2) separate empty and non-empty files
3) compare non-empty files (diff -q the directories) for discrepancies
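
For what it's worth, steps 2 and 3 could be scripted roughly as below. This is a sketch only: the function names and directory layout are made up for illustration, and it assumes GNU find, mv and diff. The zero-byte files it sets aside would include the '---------T' placeholders described further down the thread.

```shell
# Sketch only: function names and directory layout are hypothetical.

# split_by_size SRC NONEMPTY_DIR EMPTY_DIR
# Moves zero-byte files from SRC into EMPTY_DIR and all other regular
# files into NONEMPTY_DIR (top level of SRC only).
split_by_size() {
    mkdir -p "$2" "$3"
    find "$1" -maxdepth 1 -type f -empty -exec mv -t "$3" {} +
    find "$1" -maxdepth 1 -type f ! -empty -exec mv -t "$2" {} +
}

# compare_copies DIR_A DIR_B
# Lists files that differ or exist on one side only; the exit status is
# non-zero when any discrepancy is found.
compare_copies() {
    diff -rq "$1" "$2"
}
```

Something like `split_by_size recovery/node1 recovery/node1.full recovery/node1.empty` per node, followed by `compare_copies recovery/node1.full recovery/node2.full`, would cover both steps (the paths are placeholders). "Only in" lines are to be expected when the two nodes hold bricks of different subvolumes.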

If everything seems to check out:

4) combine empty files into one directory overwriting dupes
5) combine non-empty files into one directory overwriting dupes
6) expand all files not already 64 MB to 64 MB, except the last shard.
7) create a numerically sorted list of files
8) spot check the sorted list and add shard 0 to the top of the list if necessary.
9) cat everything together reading from sorted list.
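
Steps 7 to 9 might look something like this. It's a sketch under a few assumptions: the shards are named <id>.<number> as in the .shard directory, they all belong to one file, everything has already been padded as in step 6, and GNU find is available. The function name `assemble` is made up.

```shell
# Sketch only: "assemble" is a hypothetical helper name.

# assemble MAIN_FILE SHARD_DIR OUTPUT
# Writes MAIN_FILE (shard 0) first, then every file in SHARD_DIR ordered
# by the number after the last dot in its name.
# OUTPUT must live outside SHARD_DIR.
assemble() (
    main=$1 shard_dir=$2 out=$3
    cat -- "$main" > "$out"
    # %f is the bare file name (GNU find). awk puts the numeric suffix in
    # front so sort -n can order 1, 2, ..., 10 instead of 1, 10, 2.
    find "$shard_dir" -maxdepth 1 -type f -printf '%f\n' |
        awk -F. '{ print $NF "\t" $0 }' |
        sort -n -k1,1 |
        cut -f2- |
        while IFS= read -r name; do
            cat -- "$shard_dir/$name" >> "$out"
        done
)
```

Passing the main file explicitly avoids having to fix up the list by hand, since its name (largefile.raw or similar) doesn't sort with the numbered shards.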

Does this sound more or less like I'm going down the right path?

Thanks!

On 9/8/21 11:18 PM, Xavi Hernandez wrote:
Hi Anthony,

On Wed, Sep 8, 2021 at 6:11 PM Anthony Hoppe <anthony@xxxxxxxx> wrote:

    Hi Xavi,

    I am working with a distributed-replicated volume.  What I've been
    doing is copying the shards from each node to their own "recovery"
    directory, discarding shards that are 0 bytes, then comparing the
    remainder and combining unique shards into a common directory.  Then
    I'd build a sorted list so the shards are sorted numerically adding
    the "main file" to the top of the list and then have cat run through
    the list.  I had one pair of shards that diff told me were not
    equal, but their byte size was equivalent.  In that case, I'm not
    sure which is the "correct" shard, but I'd note that and just pick
    one with the intention of circling back if cat'ing things together
    didn't work out...which so far I haven't had any luck.

If a shard has different contents on different bricks, it probably has a
pending heal. If it's a replica 3, most likely 2 of the 3 copies will
match. In that case the matching pair should be the "good" version.
Otherwise you will need to check the stat and extended attributes of the
files from each brick to see which one is the best.
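
One way to find the two matching copies on a replica 3 is to checksum all three and look for a repeated sum. A rough sketch, with a made-up function name and assuming GNU md5sum:

```shell
# Sketch only: "majority_copy" is a hypothetical helper name.

# majority_copy FILE...
# Prints the path of one copy whose checksum is shared with at least one
# other copy. Returns 1 when no two copies match.
majority_copy() {
    md5sum -- "$@" | sort | awk '
        # After sorting, equal checksums sit on adjacent lines.
        $1 == prev { print path; found = 1; exit }
        # md5sum prints the sum, two separator characters, then the path.
        { prev = $1; path = substr($0, length($1) + 3) }
        END { exit !found }'
}
```

If no two copies match, this returns 1 and the copies have to be judged by hand, for example with stat and `getfattr -d -m . -e hex <file>` on each brick copy (getfattr comes from the attr package and needs root to show trusted.* attributes).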

    How can I identify if a shard is not full size?  I haven't checked
    every single shard, but they seem to be 64 MB in size.  Would that
    mean I need to make sure all but the last shard is 64 MB?  I suspect
    this might be my issue.

If you are using the default shard size, they should be 64 MiB (i.e. 
67108864 bytes). Any file smaller than that (including the main file, 
but not the last shard) must be expanded to this size (truncate -s 
67108864 <file>). All shards must exist (from 1 to last number). If one 
is missing you need to create it (touch <file> && truncate -s 67108864 
<file>).

    Also, is shard 0 what would appear as the actual file (so
    largefile.raw or whatever)?  It seems in my scenario these files are
    ~48 MB.  I assume that means I need to extend it to 64 MB?

Yes, shard 0 is the main file, and it also needs to be extended to 64 MiB.
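
The padding and gap-filling rules above could be applied in one pass, roughly as follows. This is a sketch: `pad_shards` and its arguments are made up, it assumes GNU truncate, and the caller has to know the number of the last shard. The optional fifth argument only exists so the function can be tried out with a small size.

```shell
# Sketch only: "pad_shards" and its arguments are hypothetical.

# pad_shards MAIN_FILE SHARD_DIR ID LAST [SIZE]
# Grows MAIN_FILE and shards ID.1 .. ID.(LAST-1) to SIZE bytes, creating
# any that are missing. Shard ID.LAST is left alone.
pad_shards() (
    main=$1 dir=$2 id=$3 last=$4 size=${5:-67108864}
    # ">SIZE" means "at least SIZE" (GNU truncate): smaller files grow,
    # files already that large are not touched.
    truncate -s ">$size" "$main"
    n=1
    while [ "$n" -lt "$last" ]; do
        # truncate creates a missing file, which yields an all-zero shard.
        truncate -s ">$size" "$dir/$id.$n"
        n=$((n + 1))
    done
    if [ ! -e "$dir/$id.$last" ]; then
        echo "warning: last shard $id.$last not found" >&2
    fi
)
```

For the example in this thread that would be something like `pad_shards largefile.raw shards 5da7d7b9-7ff3-48d2-8dcd-4939364bda1f <last>`, where the directory name and the last shard number are placeholders.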

Regards,

Xavi

    This is all great information.  Thanks!

    ~ Anthony

    ------------------------------------------------------------------------

        From: "Xavi Hernandez" <jahernan@xxxxxxxxxx>
        To: "anthony" <anthony@xxxxxxxx>
        Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
        Sent: Wednesday, September 8, 2021 1:57:51 AM
        Subject: Re: Recovering from remove-brick where shards did not
        rebalance

        Hi Anthony,

        On Tue, Sep 7, 2021 at 8:20 PM Anthony Hoppe <anthony@xxxxxxxx>
        wrote:

            I am currently playing with concatenating main file + shards
            together.  Is it safe to assume that a shard with the same
            ID and sequence number
            (5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.242 for example) is
            identical across bricks?  That is, I can copy all the shards
            into a single location overwriting and/or discarding
            duplicates, then concatenate them together in order?  Or is
            it more complex?

        Assuming it's a replicated volume, a given shard should appear
        on all bricks of the same replicated subvolume. If there were no
        pending heals, they should all have the same contents (however
        you can easily check that by running an md5sum (or similar) on
        each file).

        On distributed-replicated volumes it's possible to have the same
        shard on two different subvolumes. In this case one of the
        subvolumes contains the real file, and the other a special
        0-bytes file with mode '---------T'. You need to take the real
        file and ignore the second one.

        Shards may be smaller than the shard size. In this case you
        should extend the shard to the shard size before concatenating
        it with the rest of the shards (for example using "truncate
        -s"). The last shard may be smaller. It doesn't need to be extended.

        Once you have all the shards, you can concatenate them. Note
        that the first shard of a file (or shard 0) is not inside the
        .shard directory. You must take it from the location where the
        file is normally seen.

        Regards,

        Xavi

            ------------------------------------------------------------------------

                From: "anthony" <anthony@xxxxxxxx>
                To: "gluster-users" <gluster-users@xxxxxxxxxxx>
                Sent: Tuesday, September 7, 2021 10:18:07 AM
                Subject: Re: Recovering from remove-brick where shards
                did not rebalance

                I've been playing with re-adding the bricks and here is
                some interesting behavior.

                When I try to force add the bricks to the volume while
                it's running, I get complaints about one of the bricks
                already being a member of a volume.  If I stop the
                volume, I can then force-add the bricks.  However, the
                volume won't start without force.  Once the volume is
                force started, all of the bricks remain offline.

                I feel like I'm close...but not quite there...

                ------------------------------------------------------------------------

                    From: "anthony" <anthony@xxxxxxxx>
                    To: "Strahil Nikolov" <hunter86_bg@xxxxxxxxx>
                    Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
                    Sent: Tuesday, September 7, 2021 7:45:44 AM
                    Subject: Re: Recovering from remove-brick where
                    shards did not rebalance

                    I was contemplating these options, actually, but not
                    finding anything in my research showing someone had
                    tried either before gave me pause.

                    One thing I wasn't sure about when doing a force
                    add-brick was if gluster would wipe the existing
                    data from the added bricks.  Sounds like that may
                    not be the case?

                    With regards to concatenating the main file +
                    shards, how would I go about identifying the shards
                    that pair with the main file?  I see the shards have
                    sequence numbers, but I'm not sure how to match the
                    identifier to the main file.

                    Thanks!!

                    ------------------------------------------------------------------------

                        From: "Strahil Nikolov" <hunter86_bg@xxxxxxxxx>
                        To: "anthony" <anthony@xxxxxxxx>, "gluster-users"
                        <gluster-users@xxxxxxxxxxx>
                        Sent: Tuesday, September 7, 2021 6:02:36 AM
                        Subject: Re: Recovering from remove-brick where
                        shards did not rebalance

                        The data should be recoverable by concatenating
                        the main file with all shards. Then you can copy
                        the data back via the FUSE mount point.

                        I think that some users reported that add-brick
                        with the force option allows you to 'undo' the
                        situation and 're-add' the data, but I have
                        never tried that and I cannot guarantee that it
                        will even work.

                        The simplest way is to recover from a recent
                        backup, but sometimes this leads to data loss.

                        Best Regards,
                        Strahil Nikolov

                            On Tue, Sep 7, 2021 at 9:29, Anthony Hoppe
                            <anthony@xxxxxxxx> wrote:
                            Hello,

                            I did a bad thing and did a remove-brick on
                            a set of bricks in a distributed-replicate
                            volume where rebalancing did not
                            successfully rebalance all files.  In
                            sleuthing around the various bricks on the 3
                            node pool, it appears that a number of the
                            files within the volume may have been stored
                            as shards.  With that, I'm unsure how to
                            proceed with recovery.

                            Is it possible to re-add the removed bricks
                            somehow and then do a heal?  Or is there a
                            way to recover data from shards somehow?

                            Thanks!
                            ________

                            Community Meeting Calendar:

                            Schedule -
                            Every 2nd and 4th Tuesday at 14:30 IST /
                            09:00 UTC
                            Bridge: https://meet.google.com/cpu-eiue-hvk
                            <https://meet.google.com/cpu-eiue-hvk>
                            Gluster-users mailing list
                            Gluster-users@xxxxxxxxxxx
                            <mailto:Gluster-users@xxxxxxxxxxx>
                            https://lists.gluster.org/mailman/listinfo/gluster-users
                            <https://lists.gluster.org/mailman/listinfo/gluster-users>
