Re: Rebalance improvement design

Benjamin Turner <bennyturns@xxxxxxxxx> · Fri, 1 May 2015 02:48:20 -0400

There was a segfault on gqas001, have a look when you get a sec:
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol --xlator-option'.
Program terminated with signal 11, Segmentation fault.
#0  gf_defrag_get_entry (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, migrate_data=0x7f2707874be8) at dht-rebalance.c:2032
2032	                GF_FREE (tmp_container->parent_loc);
(gdb) bt
#0  gf_defrag_get_entry (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, migrate_data=0x7f2707874be8) at dht-rebalance.c:2032
#1  gf_defrag_process_dir (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, migrate_data=0x7f2707874be8) at dht-rebalance.c:2207
#2  0x00007f26fdae1eb8 in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
    at dht-rebalance.c:2299
#3  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc200, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
    at dht-rebalance.c:2416
#4  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc430, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
    at dht-rebalance.c:2416
#5  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc660, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
    at dht-rebalance.c:2416
#6  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc890, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
    at dht-rebalance.c:2416
#7  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbcac0, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
    at dht-rebalance.c:2416
#8  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbccf0, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
    at dht-rebalance.c:2416
#9  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbcf60, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
    at dht-rebalance.c:2416
#10 0x00007f26fdae2524 in gf_defrag_start_crawl (data="" at dht-rebalance.c:2599
#11 0x00007f2709024f62 in synctask_wrap (old_task=<value optimized out>) at syncop.c:375
#12 0x0000003648c438f0 in ?? () from /lib64/libc-2.12.so
#13 0x0000000000000000 in ?? ()

On Fri, May 1, 2015 at 12:53 AM, Benjamin Turner <bennyturns@xxxxxxxxx> wrote:
Ok, I have all my data created and I just started the rebalance.  One thing to note: in the client log I see the following spamming:
[root@gqac006 ~]# cat /var/log/glusterfs/gluster-mount-.log | wc -l
394042

[2015-05-01 00:47:55.591150] I [MSGID: 109036] [dht-common.c:6478:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /file_dstdir/gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006 with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 2141429669 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 2141429670 , Stop: 4294967295 ], 
[2015-05-01 00:47:55.596147] I [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-testvol-dht: chunk size = 0xffffffff / 19920276 = 0xd7
[2015-05-01 00:47:55.596177] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-1
[2015-05-01 00:47:55.596189] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-0
[2015-05-01 00:47:55.597081] I [MSGID: 109036] [dht-common.c:6478:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /file_dstdir/gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_005 with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 2141429670 , Stop: 4294967295 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2141429669 ], 
[2015-05-01 00:47:55.601853] I [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-testvol-dht: chunk size = 0xffffffff / 19920276 = 0xd7
[2015-05-01 00:47:55.601882] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-1
[2015-05-01 00:47:55.601895] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-0
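
For anyone reading the dht-selfheal lines above, the numbers appear to fit together as shown in this small sketch. It is inferred purely from the logged values, not from the DHT source: it assumes the 19920276 total in the "chunk size" line is split evenly between the two subvolumes, and the function names are made up for illustration.

```python
# Values copied from the dht-selfheal log lines above.
HASH_SPACE = 0xFFFFFFFF      # numerator in the "chunk size" line
TOTAL_CHUNKS = 19920276      # divisor in the "chunk size" line
SUBVOL_COUNT = 2             # testvol-replicate-0 and testvol-replicate-1


def chunk_size(hash_space, total_chunks):
    """Integer division, as implied by 0xffffffff / 19920276 = 0xd7."""
    return hash_space // total_chunks


def range_size(hash_space, total_chunks, subvol_chunks):
    """Range size for a subvolume holding `subvol_chunks` chunks."""
    return chunk_size(hash_space, total_chunks) * subvol_chunks


chunk = chunk_size(HASH_SPACE, TOTAL_CHUNKS)
# Assumption: each subvolume gets an equal share of the chunks.
per_subvol = range_size(HASH_SPACE, TOTAL_CHUNKS, TOTAL_CHUNKS // SUBVOL_COUNT)

print(hex(chunk))        # 0xd7
print(hex(per_subvol))   # 0x7fa39fa6
print(per_subvol - 1)    # 2141429669, the first "Stop" value in the layout
```

Under that assumption the first range is [0, 2141429669] and the second starts at 2141429670, which matches the "Setting layout" messages; the last range simply runs to the end of the 32-bit hash space.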

Just to confirm the patch is in, glusterfs-3.8dev-0.71.gita7f8482.el6.x86_64.  Correct?

Here is the info on the data set:

hosts in test : ['gqac006.sbu.lab.eng.bos.redhat.com', 'gqas003.sbu.lab.eng.bos.redhat.com']
top test directory(s) : ['/gluster-mount']
operation : create
files/thread : 500000
threads : 8
record size (KB, 0 = maximum) : 0
file size (KB) : 64
file size distribution : fixed
files per dir : 100
dirs per dir : 10
total threads = 16
total files = 7222600
total data = 440.833 GB
 90.28% of requested files processed, minimum is  70.00
8107.852862 sec elapsed time
890.815377 files/sec
890.815377 IOPS
55.675961 MB/sec
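
As a sanity check, the summary figures above can be re-derived from the run parameters. This is only a sketch; the variable names are mine, and it assumes "total threads = 16" means 8 threads on each of the 2 hosts.

```python
# Parameters and totals copied from the data set summary above.
hosts = 2
threads_per_host = 8
files_per_thread = 500_000
file_size_kb = 64
total_files = 7_222_600
elapsed_sec = 8107.852862

# Assumption: every thread was asked to create files_per_thread files.
requested_files = hosts * threads_per_host * files_per_thread

pct_processed = 100.0 * total_files / requested_files
files_per_sec = total_files / elapsed_sec
mb_per_sec = files_per_sec * file_size_kb / 1024
total_gb = total_files * file_size_kb / 1024 ** 2

print(f"requested files : {requested_files}")
print(f"% processed     : {pct_processed:.2f}")
print(f"files/sec       : {files_per_sec:.3f}")
print(f"MB/sec          : {mb_per_sec:.3f}")
print(f"total data (GB) : {total_gb:.3f}")
```

The results agree with the reported 90.28%, 890.8 files/sec, 55.68 MB/sec and 440.833 GB, so the summary is internally consistent.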

Here is the rebalance run after about 5 or so minutes:

[root@gqas001 ~]# gluster v rebalance testvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            32203         2.0GB        120858             0          5184          in progress            1294.00
      gqas011.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0               failed               0.00
      gqas016.sbu.lab.eng.bos.redhat.com             9364       585.2MB         53121             0             0          in progress            1294.00
      gqas013.sbu.lab.eng.bos.redhat.com                0        0Bytes         14750             0             0          in progress            1294.00
      gqas014.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0               failed               0.00
      gqas015.sbu.lab.eng.bos.redhat.com                0        0Bytes        196382             0             0          in progress            1294.00
volume rebalance: testvol: success: 

The hostnames are there if you want to poke around.  I had a problem with one of the added systems being on a different version of glusterfs, so I had to update everything to glusterfs-3.8dev-0.99.git7d7b80e.el6.x86_64, remove the bricks I had just added, and add them back.  Something may have gone wrong in that process, but I thought I did everything correctly.  I'll start fresh tomorrow.  I figured I'd let this run overnight.

-b

On Wed, Apr 29, 2015 at 9:48 PM, Benjamin Turner <bennyturns@xxxxxxxxx> wrote:
Sweet!  Here is the baseline:
[root@gqas001 ~]# gluster v rebalance testvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost          1328575        81.1GB       9402953             0             0            completed           98500.00
      gqas012.sbu.lab.eng.bos.redhat.com                0        0Bytes       8000011             0             0            completed           51982.00
      gqas003.sbu.lab.eng.bos.redhat.com                0        0Bytes       8000011             0             0            completed           51982.00
      gqas004.sbu.lab.eng.bos.redhat.com          1326290        81.0GB       9708625             0             0            completed           98500.00
      gqas013.sbu.lab.eng.bos.redhat.com                0        0Bytes       8000011             0             0            completed           51982.00
      gqas014.sbu.lab.eng.bos.redhat.com                0        0Bytes       8000011             0             0            completed           51982.00
volume rebalance: testvol: success: 

I'll have a run on the patch started tomorrow.

-b

On Wed, Apr 29, 2015 at 12:51 PM, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

> Doh my mistake, I thought it was merged.  I was just running with the
> upstream 3.7 daily.  Can I use this run as my baseline and then I can run
> next time on the patch to show the % improvement?  I'll wipe everything and
> try on the patch, any idea when it will be merged?

Yes, it would be very useful to have this run as the baseline. The patch has just been merged in master. It should be backported to 3.7 in a day or so.

Regards,

Nithya

> > > >

> > > > >

> > > > > On Wed, Apr 22, 2015 at 1:10 AM, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

> > > > >

> > > > > > That sounds great. Thanks.

> > > > > >

> > > > > > Regards,

> > > > > > Nithya

> > > > > >

> > > > > > ----- Original Message -----

> > > > > > From: "Benjamin Turner" <bennyturns@xxxxxxxxx>

> > > > > > To: "Nithya Balachandran" <nbalacha@xxxxxxxxxx>

> > > > > > > Cc: "Susant Palai" <spalai@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>

> > > > > > Sent: Wednesday, 22 April, 2015 12:14:14 AM

> > > > > > Subject: Re:  Rebalance improvement design

> > > > > >

> > > > > > > I am setting up a test env now, I'll have some feedback for you this week.

> > > > > >

> > > > > > -b

> > > > > >

> > > > > > > On Tue, Apr 21, 2015 at 11:36 AM, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

> > > > > >

> > > > > > > Hi Ben,

> > > > > > >

> > > > > > > Did you get a chance to try this out?

> > > > > > >

> > > > > > > Regards,

> > > > > > > Nithya

> > > > > > >

> > > > > > > ----- Original Message -----

> > > > > > > From: "Susant Palai" <spalai@xxxxxxxxxx>

> > > > > > > To: "Benjamin Turner" <bennyturns@xxxxxxxxx>

> > > > > > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>

> > > > > > > Sent: Monday, April 13, 2015 9:55:07 AM

> > > > > > > Subject: Re:  Rebalance improvement design

> > > > > > >

> > > > > > > Hi Ben,

> > > > > > > >   Uploaded a new patch here: http://review.gluster.org/#/c/9657/.
> > > > > > > >   We can start perf test on it. :)

> > > > > > >

> > > > > > > Susant

> > > > > > >

> > > > > > > ----- Original Message -----

> > > > > > > From: "Susant Palai" <spalai@xxxxxxxxxx>

> > > > > > > To: "Benjamin Turner" <bennyturns@xxxxxxxxx>

> > > > > > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>

> > > > > > > Sent: Thursday, 9 April, 2015 3:40:09 PM

> > > > > > > Subject: Re:  Rebalance improvement design

> > > > > > >

> > > > > > > > Thanks Ben. RPM is not available and I am planning to refresh the
> > > > > > > > patch in two days with some more regression fixes. I think we can
> > > > > > > > run the tests post that. Any larger data-set will be good (say 3 to 5 TB).

> > > > > > >

> > > > > > > Thanks,

> > > > > > > Susant

> > > > > > >

> > > > > > > ----- Original Message -----

> > > > > > > From: "Benjamin Turner" <bennyturns@xxxxxxxxx>

> > > > > > > To: "Vijay Bellur" <vbellur@xxxxxxxxxx>

> > > > > > > > Cc: "Susant Palai" <spalai@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>

> > > > > > > Sent: Thursday, 9 April, 2015 2:10:30 AM

> > > > > > > Subject: Re:  Rebalance improvement design

> > > > > > >

> > > > > > >

> > > > > > > > I have some rebalance perf regression stuff I have been working on,
> > > > > > > > is there an RPM with these patches anywhere so that I can try it on
> > > > > > > > my systems? If not I'll just build from:

> > > > > > >

> > > > > > >

> > > > > > > > git fetch git://review.gluster.org/glusterfs refs/changes/57/9657/8 && git cherry-pick FETCH_HEAD

> > > > > > >

> > > > > > >

> > > > > > >

> > > > > > > > I will have _at_least_ 10TB of storage, how many TBs of data should
> > > > > > > > I run with?

> > > > > > >

> > > > > > >

> > > > > > > -b

> > > > > > >

> > > > > > >

> > > > > > > > On Tue, Apr 7, 2015 at 9:07 AM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:

> > > > > > >

> > > > > > >

> > > > > > >

> > > > > > >

> > > > > > > On 04/07/2015 03:08 PM, Susant Palai wrote:

> > > > > > >

> > > > > > >

> > > > > > > > Here is one test performed on a 300GB data set and around 100% (1/2
> > > > > > > > the time) improvement was seen.

> > > > > > >

> > > > > > > [root@gprfs031 ~]# gluster v i

> > > > > > >

> > > > > > > Volume Name: rbperf

> > > > > > > Type: Distribute

> > > > > > > > Volume ID: 35562662-337e-4923-b862-d0bbb0748003

> > > > > > > Status: Started

> > > > > > > Number of Bricks: 4

> > > > > > > Transport-type: tcp

> > > > > > > Bricks:

> > > > > > > > Brick1: gprfs029-10ge:/bricks/gprfs029/brick1
> > > > > > > > Brick2: gprfs030-10ge:/bricks/gprfs030/brick1
> > > > > > > > Brick3: gprfs031-10ge:/bricks/gprfs031/brick1
> > > > > > > > Brick4: gprfs032-10ge:/bricks/gprfs032/brick1

> > > > > > >

> > > > > > >

> > > > > > > Added server 32 and started rebalance force.

> > > > > > >

> > > > > > > Rebalance stat for new changes:

> > > > > > > [root@gprfs031 ~]# gluster v rebalance rbperf status

> > > > > > > > Node           Rebalanced-files    size  scanned  failures  skipped     status  run time in secs
> > > > > > > > -------------  ----------------  ------  -------  --------  -------  ---------  ----------------
> > > > > > > > localhost                 74639  36.1GB   297319         0        0  completed           1743.00
> > > > > > > > 172.17.40.30              67512  33.5GB   269187         0        0  completed           1395.00
> > > > > > > > gprfs029-10ge             79095  38.8GB   284105         0        0  completed           1559.00
> > > > > > > > gprfs032-10ge                 0  0Bytes        0         0        0  completed            402.00
> > > > > > > > volume rebalance: rbperf: success:

> > > > > > >

> > > > > > > Rebalance stat for old model:

> > > > > > > [root@gprfs031 ~]# gluster v rebalance rbperf status

> > > > > > > > Node           Rebalanced-files    size  scanned  failures  skipped     status  run time in secs
> > > > > > > > -------------  ----------------  ------  -------  --------  -------  ---------  ----------------
> > > > > > > > localhost                 86493  42.0GB   634302         0        0  completed           3329.00
> > > > > > > > gprfs029-10ge             94115  46.2GB   687852         0        0  completed           3328.00
> > > > > > > > gprfs030-10ge             74314  35.9GB   651943         0        0  completed           3072.00
> > > > > > > > gprfs032-10ge                 0  0Bytes   594166         0        0  completed           1943.00
> > > > > > > > volume rebalance: rbperf: success:

> > > > > > >

> > > > > > >

> > > > > > > > This is interesting. Thanks for sharing & well done! Maybe we should
> > > > > > > > attempt a much larger data set and see how we fare there :).

> > > > > > >

> > > > > > > Regards,

> > > > > > >

> > > > > > >

> > > > > > > Vijay

> > > > > > >

> > > > > > >

> > > > > > > > _______________________________________________
> > > > > > > > Gluster-devel mailing list
> > > > > > > > Gluster-devel@xxxxxxxxxxx
> > > > > > > > http://www.gluster.org/mailman/listinfo/gluster-devel

> > > > > > >

> > > > > > > _______________________________________________

> > > > > > > Gluster-devel mailing list

> > > > > > > Gluster-devel@xxxxxxxxxxx

> > > > > > > http://www.gluster.org/mailman/listinfo/gluster-devel

> > > > > > > _______________________________________________

> > > > > > > Gluster-devel mailing list

> > > > > > > Gluster-devel@xxxxxxxxxxx

> > > > > > > http://www.gluster.org/mailman/listinfo/gluster-devel

> > > > > > >

> > > > > >

> > > > >

> > > > _______________________________________________

> > > > Gluster-devel mailing list

> > > > Gluster-devel@xxxxxxxxxxx

> > > > http://www.gluster.org/mailman/listinfo/gluster-devel

> > > >

> > > _______________________________________________

> > > Gluster-devel mailing list

> > > Gluster-devel@xxxxxxxxxxx

> > > http://www.gluster.org/mailman/listinfo/gluster-devel

> > >

> >

>

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel