Re: Question about random and rr scheduler

Craig Tierney wrote:
Amar S. Tumballi wrote:
On Mon, Mar 3, 2008 at 4:27 PM, Craig Tierney <Craig.Tierney@xxxxxxxx> wrote:

I am setting up Gluster (1.3.7) with two servers. I first
tried configuring the clients with the round-robin (rr) scheduler. When the clients write
to the filesystem for the first time, all of those first files go
to the first brick. Subsequent writes alternate between
the two bricks. When I try the random scheduler, the first file is always
created on the first brick, and subsequent writes also always go to
the first brick (never the second).


As the name suggests, the 'random' scheduler just calls random() and takes the result modulo (%) the number of clients. Hence, there is not much control over it from the user side
right now.
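A minimal sketch of the selection described above, assuming the scheduler simply reduces random() modulo the number of child subvolumes; `pick_random_child` is a hypothetical name and this is not the actual code in scheduler/random/src/random.c. Because random() is deterministic for a given srandom() seed, every choice is fixed once the seed is fixed:

```c
#define _XOPEN_SOURCE 700  /* expose random()/srandom() under strict modes */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: choose a child subvolume the way the random
 * scheduler is described, i.e. random() reduced modulo the child count. */
static int pick_random_child (int child_count)
{
        assert (child_count > 0);
        return (int) (random () % child_count);
}
```

Calling srandom(42), making three picks, then calling srandom(42) again reproduces the same three picks; two clients that end up with the same seed therefore make the same choices.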




What I want is round-robin, or a working random. However,
for round-robin to work for me, I need the initially chosen server to
be random, not always the first one.

In the long term, it wouldn't really matter because everything
would average out. However, I am creating filesystems that
will be temporary, so I need the right behavior in the short term.

Should random do what I need? Or should I look in the code
and see how to make the round-robin scheduler start with
a random index?


Just initialize the index variable in the rr scheduler to a random
number. It should not be much work.
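A sketch of that suggestion, assuming the round-robin state is just a child count and a next index; `struct rr_state`, `rr_init` and `rr_next` are hypothetical names, not the GlusterFS rr scheduler's own structures. Only the starting index is randomized; after that every child is visited in turn:

```c
#define _XOPEN_SOURCE 700  /* expose random() under strict modes */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical round-robin state (not GlusterFS's rr_buf). */
struct rr_state {
        int child_count;   /* number of child subvolumes */
        int sched_index;   /* child returned by the next rr_next() call */
};

/* Start at a random child instead of always at child 0.
 * Assumes srandom() was already called with a per-client seed. */
static void rr_init (struct rr_state *rr, int child_count)
{
        assert (child_count > 0);
        rr->child_count = child_count;
        rr->sched_index = (int) (random () % child_count);
}

/* Return the current child, then advance, wrapping at the end. */
static int rr_next (struct rr_state *rr)
{
        int chosen = rr->sched_index;

        rr->sched_index = (rr->sched_index + 1) % rr->child_count;
        return chosen;
}
```

Randomizing only the start keeps round-robin's even spread over time while avoiding every client's first file landing on child 0, provided each client has been seeded differently.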




I modified the rr scheduler to use a random index at initialization.
I like the behavior much better.


<patch deleted>


The patch I created actually wasn't working; I didn't notice until
I tested it further. The problem is that every client (32 of them) seems to
call time(NULL) at the same time (within the same second, since time() has
one-second resolution), so the random number generator is seeded with the same
value on every client. This is the same behavior I saw when trying to use the
random scheduler.
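A small sketch of that effect, assuming the first pick is random() modulo the child count; `first_pick` and `clients_on_child0` are hypothetical helpers that simulate several clients inside one process, each seeding from the same time(NULL) value:

```c
#define _XOPEN_SOURCE 700  /* expose random()/srandom() under strict modes */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for one client: seed the generator, then make
 * the first pick among child_count subvolumes as random() % count. */
static int first_pick (unsigned int seed, int child_count)
{
        assert (child_count > 0);
        srandom (seed);
        return (int) (random () % child_count);
}

/* Simulate n clients that all seed from the same time(NULL) value and
 * count how many send their first file to child 0. */
static int clients_on_child0 (int n, int child_count)
{
        unsigned int shared_seed = (unsigned int) time (NULL);
        int count = 0;

        for (int i = 0; i < n; i++)
                if (first_pick (shared_seed, child_count) == 0)
                        count++;
        return count;
}
```

Since every simulated client reseeds with the same value, clients_on_child0(32, 2) can only return 0 or 32: either all 32 first files go to child 0 or none do.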

Below is a new patch. It adds a function called
seed_random, which seeds the random number generator with data from /dev/urandom.
This makes it much more likely that each client is seeded with a different
value. If a distribution doesn't provide /dev/urandom, the
function falls back to using time(NULL). There may be a better fallback
than this, though.
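One possible alternative fallback, sketched here as an assumption and not part of the patch: if /dev/urandom cannot be read, mix the seconds and microseconds from gettimeofday(), the process ID, and a hash of the hostname. `seed_random_mixed` and `fnv1a` are hypothetical names; FNV-1a is swapped in only as a simple string hash:

```c
#define _XOPEN_SOURCE 700  /* expose random(), gethostname() etc. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* 32-bit FNV-1a hash of a string; used to fold the hostname into the seed. */
static unsigned int fnv1a (const char *s)
{
        unsigned int h = 2166136261u;

        assert (s != NULL);
        while (*s) {
                h ^= (unsigned char) *s++;
                h *= 16777619u;
        }
        return h;
}

/* Hypothetical variant of seed_random(): prefer /dev/urandom; otherwise
 * mix seconds, microseconds, PID and hostname so that clients started in
 * the same second on different hosts are unlikely to share a seed.
 * Seeds random() and returns the seed it used. */
static unsigned int seed_random_mixed (void)
{
        unsigned int seed = 0;
        FILE *fp = fopen ("/dev/urandom", "r");

        if (fp) {
                size_t got = fread (&seed, sizeof (seed), 1, fp);
                fclose (fp);
                if (got == 1) {
                        srandom (seed);
                        return seed;
                }
        }

        struct timeval tv;
        char host[256] = "";  /* zero-filled, so always NUL-terminated */

        gettimeofday (&tv, NULL);
        gethostname (host, sizeof (host) - 1);

        seed = (unsigned int) tv.tv_sec
             ^ ((unsigned int) tv.tv_usec << 11)
             ^ ((unsigned int) getpid () << 16)
             ^ fnv1a (host);
        srandom (seed);
        return seed;
}
```

The hostname term is there because, on identically booted cluster nodes, client PIDs and start times may coincide; it does not make the fallback cryptographically strong, only less likely to collide.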

I tested the patch with both the random and rr schedulers.  The first files
written by clients are distributed more evenly now.




diff -urN glusterfs-1.3.7/scheduler/rr/src/rr.c ../glusterfs-1.3.7/scheduler/rr/src/rr.c
--- glusterfs-1.3.7/scheduler/rr/src/rr.c       2007-10-05 05:57:12.000000000 +0000
+++ ../glusterfs-1.3.7/scheduler/rr/src/rr.c    2008-03-04 17:05:56.250635645 +0000
@@ -49,7 +49,11 @@
     trav_xl = trav_xl->next;
   }
   rr_buf->child_count = index;
-  rr_buf->sched_index = 0;
+
+
+  seed_random(); /* Seed random() from /dev/urandom (time fallback) */
+  rr_buf->sched_index = index ? random()%index : 0; /* Randomize the initial index; avoid % 0 with no children */
+
   rr_buf->array = calloc (index + 1, sizeof (struct rr_sched_struct));
   trav_xl = xl->children;
   index = 0;

diff -urN glusterfs-1.3.7/scheduler/random/src/random.c ../glusterfs-1.3.7/scheduler/random/src/random.c
--- glusterfs-1.3.7/scheduler/random/src/random.c       2007-10-05 05:57:12.000000000 +0000
+++ ../glusterfs-1.3.7/scheduler/random/src/random.c    2008-03-04 17:06:19.815473947 +0000
@@ -29,7 +29,7 @@
   int32_t index = 0;

   /* Set the seed for the 'random' function */
-  srandom ((uint32_t) time (NULL));
+  seed_random();

   data_t *limit = dict_get (xl->options, "random.limits.min-free-disk");
   if (limit) {

--- glusterfs-1.3.7/libglusterfs/src/common-utils.c     2007-08-27 11:28:30.000000000 +0000
+++ ../glusterfs-1.3.7/libglusterfs/src/common-utils.c  2008-03-04 16:57:44.349142971 +0000
@@ -32,6 +32,7 @@
 #include <netinet/in.h>
 #include <arpa/inet.h>
 #include <signal.h>
+#include <time.h>

 #include "logging.h"
 #include "common-utils.h"
@@ -272,3 +273,31 @@
 {

 }
+
+
+
+/* Use the random number generator, /dev/urandom, if present */
+
+void seed_random (void) {
+
+        FILE *fp;
+        int val;
+
+        fp=fopen("/dev/urandom","r");
+        if (!fp) {
+                gf_log ("rr", GF_LOG_CRITICAL,
+                        "seed_random is unable to open /dev/urandom, defaulting to time");
+                srandom(time(NULL));
+                return;
+        }
+        /* Read sizeof(val) bytes (typically 4); fall back to time if the read fails */
+        if (fread(&val,sizeof(val),1,fp) != 1) val = (int) time(NULL);
+        gf_log ("rr", GF_LOG_CRITICAL,
+                "Seeding seed_random with %d",val);
+        fclose(fp);
+        srandom(val);
+
+        return;
+}
--- glusterfs-1.3.7/libglusterfs/src/common-utils.h     2007-08-02 20:05:10.000000000 +0000
+++ ../glusterfs-1.3.7/libglusterfs/src/common-utils.h  2008-03-04 16:45:44.867605453 +0000
@@ -59,6 +59,7 @@

 #define VECTORSIZE(count) (count * (sizeof (struct iovec)))

+
 #define LOCK_INIT(x)    pthread_spin_init (x, 0)
 #define LOCK(x)         pthread_spin_lock (x)
 #define UNLOCK(x)       pthread_spin_unlock (x)
@@ -170,5 +171,8 @@
   return newptr;
 }

+void seed_random (void);
+
 #endif /* _COMMON_UTILS_H */

+

+
+

--
Craig Tierney (craig.tierney@xxxxxxxx)