Re: Removing subvolume from dist/rep volume

Hi Dave,

Yes, files in split-brain are not migrated, because we cannot determine which copy is the good one. I am adding Ravi to look at this and see what can be done.
I am also adding Krutika, as this is a sharded volume.

The files with the "---------T" permissions are internal files and can be ignored. Ravi and Krutika, please take a look at the other files.

Regards,
Nithya


On Fri, 28 Jun 2019 at 19:56, Dave Sherohman <dave@xxxxxxxxxxxxx> wrote:
On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> There are some edge cases that may prevent a file from being migrated
> during a remove-brick. Please do the following after this:
>
>    1. Check the remove-brick status for any failures.  If there are any,
>    check the rebalance log file for errors.
>    2. Even if there are no failures, check the removed bricks to see if any
>    files have not been migrated. If there are any, please check that they are
>    valid files on the brick and copy them from the brick to the volume via
>    the mount point.

Well, looks like I hit one of those edge cases.  Probably because of
some issues around a reboot last September which left a handful of files
in a state where self-heal identified them as needing to be healed, but
incapable of actually healing them.  (Check the list archives for
"Kicking a stuck heal", posted on Sept 4, if you want more details.)

So I'm getting 9 failures on the arbiter (merlin), 8 on one data brick
(gandalf), and 3 on the other (saruman).  Looking in
/var/log/gluster/palantir-rebalance.log, I see that many errors of the
form:

migrate file failed: /.shard/291e9749-2d1b-47af-ad53-3a09ad4e64c6.229: failed to lock file on palantir-replicate-1 [Stale file handle]
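
For tallying these failures across a long rebalance log, a minimal
Python sketch follows. The regular expression is modelled only on the
single message quoted above; the function name and the assumption that
every failure line has this exact wording are illustrative, not taken
from the GlusterFS sources.

```python
import re
from collections import Counter

# Pattern based solely on the one message quoted above. Any leading log
# fields (timestamp, severity, ...) are ignored because search() is used.
MIGRATE_FAILED = re.compile(
    r"migrate file failed: (?P<path>\S+): failed to lock file on "
    r"(?P<subvol>\S+) \[(?P<reason>[^\]]+)\]"
)


def summarize_migrate_failures(lines):
    """Return (counts per (subvolume, reason), list of affected paths)."""
    counts = Counter()
    paths = []
    for line in lines:
        match = MIGRATE_FAILED.search(line)
        if match:
            counts[(match["subvol"], match["reason"])] += 1
            paths.append(match["path"])
    return counts, paths
```

Feeding it the lines of the rebalance log on each node should give one
count per replica subvolume and error reason, plus the list of shard
paths to investigate.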

Also, merlin has four errors, and gandalf has one, of the form:

Gfid mismatch detected for <gfid:be318638-e8a0-4c6d-977d-7a937aa84806>/0f500288-ff62-4f0b-9574-53f510b4159f.2898>, 9f00c0fe-58c3-457e-a2e6-f6a006d1cfc6 on palantir-client-7 and 08bb7cdc-172b-4c21-916a-2a244c095a3e on palantir-client-1.

There are no gfid mismatches recorded on saruman.  All of the gfid
mismatches are for <gfid:be318638-e8a0-4c6d-977d-7a937aa84806> and (on
saruman) appear to correspond to 0-byte files (e.g.,
.shard/0f500288-ff62-4f0b-9574-53f510b4159f.2898, in the case of the
gfid mismatch quoted above).
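
To pull the individual pieces out of a mismatch message like the one
quoted above, something along these lines may help. It is a sketch
only: the pattern is derived from that one sample line, and the field
names (parent, name, gfid_a, and so on) are made up for illustration.

```python
import re

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Derived from the single sample message above; the group names are
# hypothetical labels, not GlusterFS terminology.
GFID_MISMATCH = re.compile(
    rf"Gfid mismatch detected for <gfid:(?P<parent>{UUID})>/(?P<name>[^>]+)>, "
    rf"(?P<gfid_a>{UUID}) on (?P<client_a>\S+) and "
    rf"(?P<gfid_b>{UUID}) on (?P<client_b>[^\s.]+)"
)


def parse_gfid_mismatch(line):
    """Return a dict of the fields in a mismatch message, or None."""
    match = GFID_MISMATCH.search(line)
    return match.groupdict() if match else None
```

The "name" field is the shard file name, which can then be looked up
under .shard/ on each brick to compare sizes and extended attributes.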

For both types of errors, all affected files are in .shard/ and have
UUID-style names, so I have no idea which actual files they belong to.
File sizes are generally either 0 bytes or 4M (exactly), although one of
them has a size slightly larger than 3M.  So I'm assuming they're chunks
of larger files (which would be almost all the files on the volume -
it's primarily holding disk image files for kvm servers).

Web searches generally seem to consider gfid mismatches to be a form of
split-brain, but `gluster volume heal palantir info split-brain` shows
"Number of entries in split-brain: 0" for all bricks, including those
bricks which are reporting gfid mismatches.


Given all that, how do I proceed with cleaning up the stale handle
issues?  I would guess that this will involve somehow converting the
shard filename to a "real" filename, then shutting down the
corresponding VM and maybe doing some additional cleanup.
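
One possible way to get from a shard name to the "real" file is
sketched below, under several assumptions that should be checked
against this particular volume: that a shard is named
<base-file-gfid>.<index>, that the shard block size is 4 MiB (inferred
from the 4194304-byte shard sizes), and that each brick keeps a hard
link to a regular file at .glusterfs/<first two hex digits>/<next
two>/<gfid>. The function names are hypothetical.

```python
import os

# Assumption: inferred from the 4194304-byte shards, not read from the
# volume's configuration.
SHARD_BLOCK_SIZE = 4 * 1024 * 1024


def split_shard_name(shard_name):
    """Split '<base-gfid>.<index>' into (base_gfid, index)."""
    base_gfid, _, index = shard_name.rpartition(".")
    return base_gfid, int(index)


def gfid_backend_path(brick_root, gfid):
    """Path of the assumed .glusterfs hard link for a gfid on a brick."""
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def shard_byte_range(index, block_size=SHARD_BLOCK_SIZE):
    """Byte range [start, end) of the base file that shard 'index' would
    cover, assuming fixed-size blocks counted from offset 0."""
    start = index * block_size
    return start, start + block_size
```

If those assumptions hold, running `find <brick-root> -samefile` on the
path returned by gfid_backend_path(), on a brick that holds the base
file (which may be on a different subvolume from the shard), should
print the human-readable name of the disk image.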

And then there's the gfid mismatches.  Since they're for 0-byte files,
is it safe to just ignore them on the assumption that they only hold
metadata?  Or do I need to do some kind of split-brain resolution on
them (even though gluster says no files are in split-brain)?


Finally, a listing of /var/local/brick0/data/.shard on saruman, in case
any of the information it contains (like file sizes/permissions) might
provide clues to resolving the errors:

--- cut here ---
root@saruman:/var/local/brick0/data/.shard# ls -l
total 63996
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2864
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2868
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2879
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2898
-rw------- 2 root libvirt-qemu 4194304 May 17 14:42 291e9749-2d1b-47af-ad53-3a09ad4e64c6.229
-rw------- 2 root libvirt-qemu 4194304 Jun 24 09:10 291e9749-2d1b-47af-ad53-3a09ad4e64c6.925
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 12:54 2df12cb0-6cf4-44ae-8b0a-4a554791187e.266
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 16:30 2df12cb0-6cf4-44ae-8b0a-4a554791187e.820
-rw-r--r-- 2 root libvirt-qemu 4194304 Jun 17 20:22 323186b1-6296-4cbe-8275-b940cc9d65cf.27466
-rw-r--r-- 2 root libvirt-qemu 4194304 Jun 27 05:01 323186b1-6296-4cbe-8275-b940cc9d65cf.32575
-rw-r--r-- 2 root libvirt-qemu 3145728 Jun 11 13:23 323186b1-6296-4cbe-8275-b940cc9d65cf.3448
---------T 2 root libvirt-qemu       0 Jun 28 14:26 4cd094f4-0344-4660-98b0-83249d5bd659.22998
-rw------- 2 root libvirt-qemu 4194304 Mar 13  2018 6cdd2e5c-f49e-492b-8039-239e71577836.1302
---------T 2 root libvirt-qemu       0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.47131
---------T 2 root libvirt-qemu       0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.52615
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 08:56 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.100
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 11:29 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.106
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 28 02:35 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.137
-rw-rw-r-- 2 root libvirt-qemu 4194304 Nov  4  2018 9544617c-901c-4613-a94b-ccfad4e38af1.165
-rw-rw-r-- 2 root libvirt-qemu 4194304 Nov  4  2018 9544617c-901c-4613-a94b-ccfad4e38af1.168
-rw-rw-r-- 2 root libvirt-qemu 4194304 Nov  5  2018 9544617c-901c-4613-a94b-ccfad4e38af1.193
-rw-rw-r-- 2 root libvirt-qemu 4194304 Nov  6  2018 9544617c-901c-4613-a94b-ccfad4e38af1.3800
---------T 2 root libvirt-qemu       0 Jun 28 15:02 b48a5934-5e5b-4918-8193-6ff36f685f70.46559
-rw-rw---- 2 root libvirt-qemu       0 Oct 12  2018 c5bde2f2-3361-4d1a-9c88-28751ef74ce6.3568
-rw-r--r-- 2 root libvirt-qemu 4194304 Apr 13  2018 c953c676-152d-4826-80ff-bd307fa7f6e5.10724
-rw-r--r-- 2 root libvirt-qemu 4194304 Apr 11  2018 c953c676-152d-4826-80ff-bd307fa7f6e5.3101
--- cut here ---
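
A listing like the one above can be sorted into groups with a short
script. This is only a sketch: the four category names are invented
here, the 4194304-byte "full" size is taken from the listing rather
than from the volume options, and the "---------T" entries are treated
as internal purely because of their mode string.

```python
def classify_listing(ls_lines, full_size=4194304):
    """Group `ls -l` lines by mode and size.

    Returns a dict with the (hypothetical) keys 'internal', 'empty',
    'full' and 'partial', each mapping to a list of file names.
    """
    groups = {"internal": [], "empty": [], "full": [], "partial": []}
    for line in ls_lines:
        fields = line.split()
        # A file entry has at least 9 fields; this skips "total N" and
        # blank lines. File names containing spaces are not handled.
        if len(fields) < 9 or not fields[4].isdigit():
            continue
        mode, size, name = fields[0], int(fields[4]), fields[-1]
        if mode == "---------T":
            groups["internal"].append(name)
        elif size == 0:
            groups["empty"].append(name)
        elif size == full_size:
            groups["full"].append(name)
        else:
            groups["partial"].append(name)
    return groups
```

The 'empty', 'full' and 'partial' groups are then the files left to
examine on the brick; the 'internal' group can be set aside.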

--
Dave Sherohman
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users