There was a segfault on gqas001; have a look when you get a sec:
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol --xlator-option'.
Program terminated with signal 11, Segmentation fault.
#0 gf_defrag_get_entry (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, migrate_data=0x7f2707874be8) at dht-rebalance.c:2032
2032 GF_FREE (tmp_container->parent_loc);
(gdb) bt
#0 gf_defrag_get_entry (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, migrate_data=0x7f2707874be8) at dht-rebalance.c:2032
#1 gf_defrag_process_dir (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, migrate_data=0x7f2707874be8) at dht-rebalance.c:2207
#2 0x00007f26fdae1eb8 in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
at dht-rebalance.c:2299
#3 0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc200, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
at dht-rebalance.c:2416
#4 0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc430, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
at dht-rebalance.c:2416
#5 0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc660, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
at dht-rebalance.c:2416
#6 0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc890, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
at dht-rebalance.c:2416
#7 0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbcac0, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
at dht-rebalance.c:2416
#8 0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbccf0, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
at dht-rebalance.c:2416
#9 0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbcf60, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8)
at dht-rebalance.c:2416
#10 0x00007f26fdae2524 in gf_defrag_start_crawl (data="") at dht-rebalance.c:2599
#11 0x00007f2709024f62 in synctask_wrap (old_task=<value optimized out>) at syncop.c:375
#12 0x0000003648c438f0 in ?? () from /lib64/libc-2.12.so
#13 0x0000000000000000 in ?? ()
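
I haven't dug into the core yet, but from the faulting line alone this looks like the cleanup path freeing through a container pointer that is stale or only partially initialised. Purely as a sketch of the defensive pattern (stand-in types, plain free() in place of Gluster's GF_FREE wrapper, and not the actual dht-rebalance.c code): the usual discipline is to NULL-check before freeing and to NULL the members afterwards, so a second pass over the same container cannot touch freed memory.

#include <stdlib.h>

typedef struct loc {            /* stand-in for Gluster's loc_t          */
        char *path;
} loc_t;

typedef struct dht_container {  /* stand-in for the rebalance entry node */
        loc_t *parent_loc;
        char  *df_entry_name;
} dht_container_t;

static void
dht_container_destroy (dht_container_t *tmp_container)
{
        if (!tmp_container)
                return;

        if (tmp_container->parent_loc) {
                free (tmp_container->parent_loc->path);
                free (tmp_container->parent_loc);
                tmp_container->parent_loc = NULL;  /* blocks a double free */
        }

        free (tmp_container->df_entry_name);
        tmp_container->df_entry_name = NULL;

        free (tmp_container);
}

int
main (void)
{
        dht_container_t *c = calloc (1, sizeof (*c));
        c->parent_loc = calloc (1, sizeof (*c->parent_loc));

        dht_container_destroy (c);    /* safe */
        dht_container_destroy (NULL); /* also safe */
        return 0;
}

If the error path in gf_defrag_get_entry runs something like the above against a tmp_container that was already torn down (or never fully built), that would line up with the SIGSEGV at dht-rebalance.c:2032.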
On Fri, May 1, 2015 at 12:53 AM, Benjamin Turner <bennyturns@xxxxxxxxx> wrote:
Ok, I have all my data created and I just started the rebalance. One thing to note: in the client log I see the following spamming:

[root@gqac006 ~]# cat /var/log/glusterfs/gluster-mount-.log | wc -l
394042

[2015-05-01 00:47:55.591150] I [MSGID: 109036] [dht-common.c:6478:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /file_dstdir/gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006 with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 2141429669 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 2141429670 , Stop: 4294967295 ],
[2015-05-01 00:47:55.596147] I [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-testvol-dht: chunk size = 0xffffffff / 19920276 = 0xd7
[2015-05-01 00:47:55.596177] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-1
[2015-05-01 00:47:55.596189] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-0
[2015-05-01 00:47:55.597081] I [MSGID: 109036] [dht-common.c:6478:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /file_dstdir/gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_005 with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 2141429670 , Stop: 4294967295 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2141429669 ],
[2015-05-01 00:47:55.601853] I [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-testvol-dht: chunk size = 0xffffffff / 19920276 = 0xd7
[2015-05-01 00:47:55.601882] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-1
[2015-05-01 00:47:55.601895] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-0

Just to confirm the patch is in: glusterfs-3.8dev-0.71.gita7f8482.el6.x86_64. Correct?

Here is the info on the data set:

hosts in test : ['gqac006.sbu.lab.eng.bos.redhat.com', 'gqas003.sbu.lab.eng.bos.redhat.com']
top test directory(s) : ['/gluster-mount']
operation : create
files/thread : 500000
threads : 8
record size (KB, 0 = maximum) : 0
file size (KB) : 64
file size distribution : fixed
files per dir : 100
dirs per dir : 10
total threads = 16
total files = 7222600
total data = 440.833 GB
90.28% of requested files processed, minimum is 70.00
8107.852862 sec elapsed time
890.815377 files/sec
890.815377 IOPS
55.675961 MB/sec

Here is the rebalance run after about 5 or so minutes:

[root@gqas001 ~]# gluster v rebalance testvol status
Node                                Rebalanced-files         size      scanned  failures  skipped       status  run time in secs
---------                                -----------  -----------  -----------  --------  -------  -----------  ----------------
localhost                                      32203        2.0GB       120858         0     5184  in progress           1294.00
gqas011.sbu.lab.eng.bos.redhat.com                 0       0Bytes            0         0        0       failed              0.00
gqas016.sbu.lab.eng.bos.redhat.com              9364      585.2MB        53121         0        0  in progress           1294.00
gqas013.sbu.lab.eng.bos.redhat.com                 0       0Bytes        14750         0        0  in progress           1294.00
gqas014.sbu.lab.eng.bos.redhat.com                 0       0Bytes            0         0        0       failed              0.00
gqas015.sbu.lab.eng.bos.redhat.com                 0       0Bytes       196382         0        0  in progress           1294.00
volume rebalance: testvol: success:

The hostnames are there if you want to poke around. I had a problem with one of the added systems being on a different version of glusterfs, so I had to update everything to glusterfs-3.8dev-0.99.git7d7b80e.el6.x86_64, remove the bricks I just added, and add them back.
Something may have gone wrong in that process, but I thought I did everything correctly. I'll start fresh tomorrow. I figured I'd let this run overnight.

-b

On Wed, Apr 29, 2015 at 9:48 PM, Benjamin Turner <bennyturns@xxxxxxxxx> wrote:

Sweet! Here is the baseline:

[root@gqas001 ~]# gluster v rebalance testvol status
Node                                Rebalanced-files         size      scanned  failures  skipped     status  run time in secs
---------                                -----------  -----------  -----------  --------  -------  ---------  ----------------
localhost                                    1328575       81.1GB      9402953         0        0  completed          98500.00
gqas012.sbu.lab.eng.bos.redhat.com                 0       0Bytes      8000011         0        0  completed          51982.00
gqas003.sbu.lab.eng.bos.redhat.com                 0       0Bytes      8000011         0        0  completed          51982.00
gqas004.sbu.lab.eng.bos.redhat.com           1326290       81.0GB      9708625         0        0  completed          98500.00
gqas013.sbu.lab.eng.bos.redhat.com                 0       0Bytes      8000011         0        0  completed          51982.00
gqas014.sbu.lab.eng.bos.redhat.com                 0       0Bytes      8000011         0        0  completed          51982.00
volume rebalance: testvol: success:

I'll have a run on the patch started tomorrow.

-b

On Wed, Apr 29, 2015 at 12:51 PM, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:
Doh, my mistake - I thought it was merged. I was just running with the
upstream 3.7 daily. Can I use this run as my baseline and then run on
the patch next time to show the % improvement? I'll wipe everything and
try on the patch; any idea when it will be merged?
Yes, it would be very useful to have this run as the baseline. The patch has just been merged in master. It should be backported to 3.7 in a day or so.
Regards,
Nithya
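
One more observation on the layout-selfheal messages in Ben's client log above: the arithmetic in them is internally consistent, so they look noisy rather than wrong. The chunk size is the 32-bit hash space divided by the total weight DHT computed (19920276 in the log line), and each of the two replicate subvols then gets chunk * (weight / 2). A stand-alone sanity check of the numbers (a hypothetical sketch, not the actual dht_selfheal_layout_new_directory() code):

#include <stdio.h>

int
main (void)
{
        unsigned hash_space   = 0xffffffffu; /* full 32-bit DHT hash range */
        unsigned total_weight = 19920276;    /* taken from the log line    */
        unsigned subvols      = 2;           /* testvol-replicate-0 and -1 */

        unsigned chunk = hash_space / total_weight;        /* 215 = 0xd7 */
        unsigned range = chunk * (total_weight / subvols); /* per subvol */

        printf ("chunk size = 0x%x\n", chunk); /* log says 0xd7       */
        printf ("range size = 0x%x\n", range); /* log says 0x7fa39fa6 */

        /* The chunks don't cover the hash space exactly; judging by the
           "Stop: 4294967295" entries in the log, the leftover is absorbed
           by whichever range ends at the top of the hash space. */
        printf ("leftover   = %u\n", hash_space - range * subvols);

        return 0;
}

So the layout math itself looks fine; the real problem is the sheer volume of INFO-level logging (394042 lines in the client log).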
> > > >
> > > > >
> > > > > On Wed, Apr 22, 2015 at 1:10 AM, Nithya Balachandran
> > > > > <nbalacha@xxxxxxxxxx>
> > > > > wrote:
> > > > >
> > > > > > That sounds great. Thanks.
> > > > > >
> > > > > > Regards,
> > > > > > Nithya
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > From: "Benjamin Turner" <bennyturns@xxxxxxxxx>
> > > > > > To: "Nithya Balachandran" <nbalacha@xxxxxxxxxx>
> > > > > > Cc: "Susant Palai" <spalai@xxxxxxxxxx>, "Gluster Devel" <
> > > > > > gluster-devel@xxxxxxxxxxx>
> > > > > > Sent: Wednesday, 22 April, 2015 12:14:14 AM
> > > > > > Subject: Re: Rebalance improvement design
> > > > > >
> > > > > > I am setting up a test env now, I'll have some feedback for you
> > > > > > this week.
> > > > > >
> > > > > > -b
> > > > > >
> > > > > > On Tue, Apr 21, 2015 at 11:36 AM, Nithya Balachandran
> > > > > > <nbalacha@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > > Hi Ben,
> > > > > > >
> > > > > > > Did you get a chance to try this out?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Nithya
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > From: "Susant Palai" <spalai@xxxxxxxxxx>
> > > > > > > To: "Benjamin Turner" <bennyturns@xxxxxxxxx>
> > > > > > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > > > > > Sent: Monday, April 13, 2015 9:55:07 AM
> > > > > > > Subject: Re: Rebalance improvement design
> > > > > > >
> > > > > > > Hi Ben,
> > > > > > > Uploaded a new patch here: http://review.gluster.org/#/c/9657/.
> > > > > > > We can start perf test on it. :)
> > > > > > >
> > > > > > > Susant
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > From: "Susant Palai" <spalai@xxxxxxxxxx>
> > > > > > > To: "Benjamin Turner" <bennyturns@xxxxxxxxx>
> > > > > > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > > > > > Sent: Thursday, 9 April, 2015 3:40:09 PM
> > > > > > > Subject: Re: Rebalance improvement design
> > > > > > >
> > > > > > > Thanks Ben. The RPM is not available, and I am planning to
> > > > > > > refresh the patch in two days with some more regression fixes.
> > > > > > > I think we can run the tests after that. Any larger data set
> > > > > > > will be good (say 3 to 5 TB).
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Susant
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > From: "Benjamin Turner" <bennyturns@xxxxxxxxx>
> > > > > > > To: "Vijay Bellur" <vbellur@xxxxxxxxxx>
> > > > > > > Cc: "Susant Palai" <spalai@xxxxxxxxxx>, "Gluster Devel" <
> > > > > > > gluster-devel@xxxxxxxxxxx>
> > > > > > > Sent: Thursday, 9 April, 2015 2:10:30 AM
> > > > > > > Subject: Re: Rebalance improvement design
> > > > > > >
> > > > > > >
> > > > > > > I have some rebalance perf regression stuff I have been working
> > > > > > > on. Is there an RPM with these patches anywhere so that I can
> > > > > > > try it on my systems? If not I'll just build from:
> > > > > > >
> > > > > > >
> > > > > > > git fetch git://review.gluster.org/glusterfs refs/changes/57/9657/8 &&
> > > > > > > git cherry-pick FETCH_HEAD
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > I will have _at_least_ 10TB of storage; how many TBs of data
> > > > > > > should I run with?
> > > > > > >
> > > > > > >
> > > > > > > -b
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 7, 2015 at 9:07 AM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 04/07/2015 03:08 PM, Susant Palai wrote:
> > > > > > >
> > > > > > >
> > > > > > > Here is one test performed on a 300GB data set, and around a
> > > > > > > 100% improvement (roughly half the run time: the slowest node
> > > > > > > dropped from 3329 to 1743 seconds) was seen.
> > > > > > >
> > > > > > > [root@gprfs031 ~]# gluster v i
> > > > > > >
> > > > > > > Volume Name: rbperf
> > > > > > > Type: Distribute
> > > > > > > Volume ID: 35562662-337e-4923-b862-d0bbb0748003
> > > > > > > Status: Started
> > > > > > > Number of Bricks: 4
> > > > > > > Transport-type: tcp
> > > > > > > Bricks:
> > > > > > > Brick1: gprfs029-10ge:/bricks/gprfs029/brick1
> > > > > > > Brick2: gprfs030-10ge:/bricks/gprfs030/brick1
> > > > > > > Brick3: gprfs031-10ge:/bricks/gprfs031/brick1
> > > > > > > Brick4: gprfs032-10ge:/bricks/gprfs032/brick1
> > > > > > >
> > > > > > >
> > > > > > > Added server 32 and started rebalance force.
> > > > > > >
> > > > > > > Rebalance stat for new changes:
> > > > > > > [root@gprfs031 ~]# gluster v rebalance rbperf status
> > > > > > > Node           Rebalanced-files    size  scanned  failures  skipped     status  run time in secs
> > > > > > > ---------           -----------  ------  -------  --------  -------  ---------  ----------------
> > > > > > > localhost                 74639  36.1GB   297319         0        0  completed           1743.00
> > > > > > > 172.17.40.30              67512  33.5GB   269187         0        0  completed           1395.00
> > > > > > > gprfs029-10ge             79095  38.8GB   284105         0        0  completed           1559.00
> > > > > > > gprfs032-10ge                 0  0Bytes        0         0        0  completed            402.00
> > > > > > > volume rebalance: rbperf: success:
> > > > > > >
> > > > > > > Rebalance stat for old model:
> > > > > > > [root@gprfs031 ~]# gluster v rebalance rbperf status
> > > > > > > Node           Rebalanced-files    size  scanned  failures  skipped     status  run time in secs
> > > > > > > ---------           -----------  ------  -------  --------  -------  ---------  ----------------
> > > > > > > localhost                 86493  42.0GB   634302         0        0  completed           3329.00
> > > > > > > gprfs029-10ge             94115  46.2GB   687852         0        0  completed           3328.00
> > > > > > > gprfs030-10ge             74314  35.9GB   651943         0        0  completed           3072.00
> > > > > > > gprfs032-10ge                 0  0Bytes   594166         0        0  completed           1943.00
> > > > > > > volume rebalance: rbperf: success:
> > > > > > >
> > > > > > >
> > > > > > > This is interesting. Thanks for sharing & well done! Maybe we
> > > > > > > should
> > > > > > > attempt a much larger data set and see how we fare there :).
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > >
> > > > > > > Vijay
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Gluster-devel mailing list
> > > > > > > Gluster-devel@xxxxxxxxxxx
> > > > > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel