> Have you documented the procedure you followed?

There was a serious error in my previous reply to you:

rsync -vvaz --progress node01:/gfsroot/gv0 /gfsroot/

That should have been 'rsync -vvazH', and the "H" is very important. Gluster uses hard links to map file UUIDs (gfids) to file names, but rsync without -H ignores hard links and copies the hard-linked data again into a new, unrelated file, which breaks gluster's coupling of data to metadata. (A small sanity check for this is sketched at the end of this mail.)

* I have now also tried copying raw data onto a three-brick replica cluster (one brick per server) in a different way (do note the hostnames in the prompts below):

[root@node01 ~]# gluster volume status gv0
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node01:/vol/gfs/gv0                   49152     0          Y       35409
Brick node02:/vol/gfs/gv0                   49152     0          Y       6814
Brick node03:/vol/gfs/gv0                   49155     0          Y       21457

[root@node01 ~]# gluster volume heal gv0 statistics heal-count
(all counts 0)

[root@node02 ~]# umount 127.0.0.1:gv0
[root@node03 ~]# umount 127.0.0.1:gv0

[root@node01 ~]# gluster volume remove-brick gv0 replica 2 node03:/vol/gfs/gv0 force
[root@node01 ~]# gluster volume remove-brick gv0 replica 1 node02:/vol/gfs/gv0 force

You see here that, from node01 and with glusterd running on all three nodes, I remove the other two nodes' bricks. This leaves volume gv0 with one single brick and imposes a quorum of 1 (thank you Strahil for this idea, albeit implemented differently here).

Now, left with a volume of only one single brick, I copy the data to it on node01:

[root@node01 ~]# rsync -vva /datasource/blah 127.0.0.1:gv0/

This is fast, almost as fast as copying from one partition to another on the same disk, because with a single brick the gluster nodes no longer have to exchange multiple system calls over the network before they can write a file. And there is no latency. Round trips of ~200 ms per system call, repeated several times per file, are what has been killing me (ADSL, and 4,000 km between my node01 and the other two), so this step eliminates that problem.

In the next step I copied the raw brick data to the other two nodes. This is where 'rsync -H' is important:

[root@node02 ~]# rsync -vvazH node01:/vol/gfs/gv0 /vol/gfs/
[root@node03 ~]# rsync -vvazH node02:/vol/gfs/gv0 /vol/gfs/

This is also fast; it copies the raw data from A to B without any chatter having to travel back and forth between every node and every other node. Hence, no latency-multiplication stonewall.

Finally, when all the raw data is in place on all three nodes:

[root@node01 www]# gluster volume add-brick gv0 replica 2 node02:/vol/gfs/gv0 force
[root@node01 www]# gluster volume add-brick gv0 replica 3 node03:/vol/gfs/gv0 force

(The checks one could run at this point are also sketched at the end of this mail.)

For comparison: copying a mail store of about 1.1 million small and very small files, ~80 GB in total, to this same gluster volume the normal way took me from the first days of January to early May. Four months! Copying about 200,000 mostly small files yesterday, ~38 GB in total, with the somewhat unorthodox method above took 12 hours from start to finish, including the transfer over ADSL.
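
P.S. For anyone wanting to verify that a raw brick copy preserved the hard links, here is a sketch of the kind of check one could run. It assumes bash, the brick path /vol/gfs/gv0 used above, and a hypothetical sample file; gluster keeps a second hard link to every data file under .glusterfs/<xx>/<yy>/<full gfid>, where xx and yy are the first four hex digits of the gfid.

  # Sketch only -- adjust BRICK and the sample file to your own layout.
  BRICK=/vol/gfs/gv0

  # Every regular data file on a brick should have at least two links:
  # its named path plus its gfid entry under .glusterfs/.  Files outside
  # .glusterfs with a link count of 1 have lost that coupling:
  find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -links 1 -print

  # For one (hypothetical) sample file, the named path and its gfid entry
  # must be the same inode:
  f="$BRICK/some/dir/somefile"
  gfid=$(getfattr -n trusted.gfid -e hex "$f" 2>/dev/null \
         | awk -F'0x' '/trusted.gfid/ {print $2}')
  stat -c '%i %n' "$f" \
    "$BRICK/.glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid:0:8}-${gfid:8:4}-${gfid:12:4}-${gfid:16:4}-${gfid:20:12}"

If the find prints nothing and the two inode numbers from stat match, the -H copy kept the gfid coupling intact.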
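
For completeness, a sketch of the checks one could run after the final add-brick steps, using only standard gluster CLI calls and the volume name gv0 from above:

  # All three bricks should show Online = Y again:
  gluster volume status gv0

  # Brick list and replica count should be back to 1 x 3 = 3:
  gluster volume info gv0

  # Entries still pending heal; ideally 0 everywhere, since the raw data
  # was copied identically to all three bricks:
  gluster volume heal gv0 statistics heal-count

  # If anything does show up, list it and let self-heal reconcile it:
  gluster volume heal gv0 info
  gluster volume heal gv0 full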