Re: Rebalance improvement design

Benjamin Turner <bennyturns@xxxxxxxxx> · Mon, 4 May 2015 11:28:13 -0400

I see:
#define GF_DECIDE_DEFRAG_THROTTLE_COUNT(throttle_count, conf) {         \
                                                                        \
                throttle_count = MAX ((get_nprocs() - 4), 4);                 \
                                                                        \
                if (!strcmp (conf->dthrottle, "lazy"))                  \
                        conf->defrag->rthcount = 1;                     \
                                                                        \
                if (!strcmp (conf->dthrottle, "normal"))                \
                        conf->defrag->rthcount = (throttle_count / 2);  \
                                                                        \
                if (!strcmp (conf->dthrottle, "aggressive"))            \
                        conf->defrag->rthcount = throttle_count;  \

So aggressive will give us the default of (20 + 16), normal is that divided by 2, and lazy is 1, is that correct?  If so, that is what I was looking to see.  The only other thing I can think of here is making the tunable a number, like event threads, but I like this.  IDK if I saw it documented, but if it's not we should note this in help.
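
For reference, here is a minimal sketch of what the quoted macro appears to compute, written as a plain C function so the numbers are easy to check by hand. The function name rebalance_thread_count and the nprocs parameter (standing in for get_nprocs()) are assumptions made for illustration and are not part of the patch. As quoted, the count is derived from the CPU count rather than from a fixed value.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper mirroring the quoted macro; not the patch itself. */
static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/*
 * nprocs stands in for get_nprocs(); mode is the throttle setting.
 * Returns the migrator thread count, or -1 for an unknown mode
 * (the quoted macro simply leaves rthcount untouched in that case).
 */
int rebalance_thread_count(int nprocs, const char *mode)
{
    int throttle_count;

    assert(mode != NULL);
    throttle_count = max_int(nprocs - 4, 4);

    if (strcmp(mode, "lazy") == 0)
        return 1;
    if (strcmp(mode, "normal") == 0)
        return throttle_count / 2;
    if (strcmp(mode, "aggressive") == 0)
        return throttle_count;
    return -1;
}
```

Under these assumptions, a 24-core box would get 20 threads for aggressive, 10 for normal, and 1 for lazy; a 4-core box would get 4, 2, and 1.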

Also to note: the old time was 98500.00 and the new one is 55088.00, i.e. (98500 - 55088) / 98500, which is a 44% improvement!

-b

On Mon, May 4, 2015 at 9:06 AM, Susant Palai <spalai@xxxxxxxxxx> wrote:
Ben,

    On no. of threads:

     Sent a throttle patch here: http://review.gluster.org/#/c/10526/ to limit the number of threads [not merged]. In the current model the rebalance process spawns 20 threads, and in addition to that there will be a maximum of 16 syncop threads.

    Crash:

     The crash should be fixed by this: http://review.gluster.org/#/c/10459/.

     The time a rebalance takes is a function of the number of files and their size. The more frequently files get added to the global queue [on which the migrator threads act], the faster the rebalance will be. I guess here we are mostly seeing the effect of the local crawl, as only 81GB was migrated out of 500GB.
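
     The queue model described above can be sketched as a small producer/consumer program. This is only an illustration under stated assumptions: the names (struct queue, migrator, rebalance_sketch) are hypothetical, the queue is reduced to a counter, and no file is actually copied. It is not the DHT rebalance code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the global queue; just counters under a lock. */
struct queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    int pending;     /* files queued by the crawler, not yet taken */
    int crawl_done;  /* set once the crawler has queued everything */
    int migrated;    /* files handled by migrator threads so far */
};

static void *migrator(void *arg)
{
    struct queue *q = arg;

    for (;;) {
        pthread_mutex_lock(&q->lock);
        /* Idle while the crawler has not queued anything new. */
        while (q->pending == 0 && !q->crawl_done)
            pthread_cond_wait(&q->nonempty, &q->lock);
        if (q->pending == 0) {  /* crawl finished and queue drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        q->pending--;           /* take one file off the queue */
        pthread_mutex_unlock(&q->lock);

        /* A real migrator would copy the file here, outside the lock. */

        pthread_mutex_lock(&q->lock);
        q->migrated++;
        pthread_mutex_unlock(&q->lock);
    }
}

/* Queue n_files entries and drain them with n_threads migrators.
 * Returns the number of files migrated, or -1 on setup failure. */
int rebalance_sketch(int n_files, int n_threads)
{
    struct queue q = { .pending = 0, .crawl_done = 0, .migrated = 0 };
    pthread_t *tids;
    int i, started = 0, result;

    assert(n_files >= 0 && n_threads > 0);
    tids = malloc(sizeof(*tids) * (size_t)n_threads);
    if (tids == NULL)
        return -1;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.nonempty, NULL);

    for (i = 0; i < n_threads; i++) {
        if (pthread_create(&tids[i], NULL, migrator, &q) != 0)
            break;
        started++;
    }

    if (started > 0) {
        /* The crawler: queue one file at a time and wake a migrator. */
        for (i = 0; i < n_files; i++) {
            pthread_mutex_lock(&q.lock);
            q.pending++;
            pthread_cond_signal(&q.nonempty);
            pthread_mutex_unlock(&q.lock);
        }
        pthread_mutex_lock(&q.lock);
        q.crawl_done = 1;
        pthread_cond_broadcast(&q.nonempty);
        pthread_mutex_unlock(&q.lock);
        for (i = 0; i < started; i++)
            pthread_join(tids[i], NULL);
    }

    result = (started > 0) ? q.migrated : -1;
    pthread_cond_destroy(&q.nonempty);
    pthread_mutex_destroy(&q.lock);
    free(tids);
    return result;
}
```

     In this sketch the migrators sit in pthread_cond_wait whenever the crawler is slow to queue files, which is the effect described above: adding threads does not help if the crawl is the bottleneck.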

Thanks,

Susant

----- Original Message -----
> From: "Benjamin Turner" <bennyturns@xxxxxxxxx>
> To: "Vijay Bellur" <vbellur@xxxxxxxxxx>
> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Monday, May 4, 2015 5:18:13 PM
> Subject: Re: Rebalance improvement design
>

> Thanks Vijay! I forgot to upgrade the kernel (thinp 6.6 perf bug, gah) before I
> created this data set, so it's a bit smaller:

>

> total threads = 16

> total files = 7,060,700 (64 kb files, 100 files per dir)

> total data = "" GB

> 88.26% of requested files processed, minimum is 70.00

> 10101.355737 sec elapsed time

> 698.985382 files/sec

> 698.985382 IOPS

> 43.686586 MB/sec

>

> I updated everything and ran the rebalance on
> glusterfs-3.8dev-0.107.git275f724.el6.x86_64:

>

> [root@gqas001 ~]# gluster v rebalance testvol status
> Node                                Rebalanced-files  size    scanned  failures  skipped  status     run time in secs
> ----------------------------------  ----------------  ------  -------  --------  -------  ---------  ----------------
> localhost                           1327346           81.0GB  3999140  0         0        completed  55088.00
> gqas013.sbu.lab.eng.bos.redhat.com  0                 0Bytes  1        0         0        completed  26070.00
> gqas011.sbu.lab.eng.bos.redhat.com  0                 0Bytes  0        0         0        failed     0.00
> gqas014.sbu.lab.eng.bos.redhat.com  0                 0Bytes  0        0         0        failed     0.00
> gqas016.sbu.lab.eng.bos.redhat.com  1325857           80.9GB  4000865  0         0        completed  55088.00
> gqas015.sbu.lab.eng.bos.redhat.com  0                 0Bytes  0        0         0        failed     0.00
> volume rebalance: testvol: success:

>

>

> A couple observations:

>

> I am seeing lots of threads / processes running:

>

> [root@gqas001 ~]# ps -eLf | grep glu | wc -l

> 96 <- 96 gluster threads

> [root@gqas001 ~]# ps -eLf | grep rebal | wc -l

> 36 <- 36 rebal threads.

>

> Is this tunable? Is there a use case where we would need to limit this? Just
> curious, how did we arrive at 36 rebal threads?

>

> # cat /var/log/glusterfs/testvol-rebalance.log | wc -l

> 4,577,583

> [root@gqas001 ~]# ll /var/log/glusterfs/testvol-rebalance.log -h
> -rw------- 1 root root 1.6G May 3 12:29 /var/log/glusterfs/testvol-rebalance.log

>

> :) How big is this going to get when I do the 10-20 TB? I'll keep tabs on

> this, my default test setup only has:

>

> [root@gqas001 ~]# df -h
> Filesystem                        Size  Used  Avail  Use%  Mounted on
> /dev/mapper/vg_gqas001-lv_root    50G   4.8G  42G    11%   /
> tmpfs                             24G   0     24G    0%    /dev/shm
> /dev/sda1                         477M  65M   387M   15%   /boot
> /dev/mapper/vg_gqas001-lv_home    385G  71M   366G   1%    /home
> /dev/mapper/gluster_vg-lv_bricks  9.5T  219G  9.3T   3%    /bricks

>

> Next run I want to fill up a 10TB cluster and double the # of bricks to

> simulate running out of space doubling capacity. Any other fixes or changes

> that need to go in before I try a larger data set? Before that I may run my

> performance regression suite against a system while a rebal is in progress

> and check how it affects performance. I'll turn both these cases into perf

> regression tests that I run with iozone smallfile and such, any other use

> cases I should add? Should I add hard / soft links / whatever else to the

> data set?

>

> -b

>

>

> On Sun, May 3, 2015 at 11:48 AM, Vijay Bellur < vbellur@xxxxxxxxxx > wrote:

>

>

> On 05/01/2015 10:23 AM, Benjamin Turner wrote:

>

>

> Ok, I have all my data created and I just started the rebalance. One
> thing to note: in the client log I see the following spamming:

>

> [root@gqac006 ~]# cat /var/log/glusterfs/gluster-mount-.log | wc -l

> 394042

>

> [2015-05-01 00:47:55.591150] I [MSGID: 109036]

> [dht-common.c:6478:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht:

> Setting layout of
> /file_dstdir/gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006

> with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop:

> 2141429669 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start:

> 2141429670 , Stop: 4294967295 ],

> [2015-05-01 00:47:55.596147] I

> [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-testvol-dht:

> chunk size = 0xffffffff / 19920276 = 0xd7

> [2015-05-01 00:47:55.596177] I

> [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht:

> assigning range size 0x7fa39fa6 to testvol-replicate-1

>

>

> I also noticed the same set of excessive logs in my tests. Have sent across a

> patch [1] to address this problem.

>

> -Vijay

>

> [1] http://review.gluster.org/10281

>

>

>

>

>

> _______________________________________________

> Gluster-devel mailing list

> Gluster-devel@xxxxxxxxxxx

> http://www.gluster.org/mailman/listinfo/gluster-devel

>
