Re: NSR design document

Manoj Pillai <mpillai@xxxxxxxxxx> · Wed, 14 Oct 2015 15:11:16 -0400 (EDT)

----- Original Message -----
> > "The reads will also be sent to, and processed by the current
> > leader."
> > 
> > So, at any given time, only one brick in the replica group is
> > handling read requests? For a read-only workload-phase,
> > all except one will be idle in any given term?
> 
> By default and in theory, yes.  The question is: does it matter in practice?
> If you only have one replica set, and if you haven't turned on the option
> to allow reads from non-leaders (which is not the default because it does
> imply a small reduction in consistency across failure scenarios), and if the
> client(s) bandwidth isn't already saturated, then yeah, it might be slower
> than AFR.  Then again, even that might be outweighed by gains in cache
> efficiency and avoidance of any need for locking.  In the absolute worst
> case, we can split bricks and create multiple replica sets across the same
> bricks, each with their own leader.  That would parallelize reads as much as
> AFR, while still gaining all of the other NSR advantages.
> 
> In other words, yes, in theory it could be a problem.  In practice?  No.

Or maybe: in theory, it shouldn't be a problem in practice :).

We _could_ split bricks to distribute the load more or less evenly. So
what would naturally be a replica-3/JBOD configuration (i.e. each
disk is a brick, multiple bricks per server) could be changed
to carve 3 bricks out of each disk to distribute load
(otherwise 2/3 of the disks would be idle in said read-only
workload phase, IIUC). Such carving could have its downsides, though.
E.g. 3x the number of bricks could be a problem if the workload has
operations that don't scale well with brick count. Plus, the brick
configuration guidelines would not exactly be elegant.
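
To make the arithmetic above concrete, here is a rough sketch in Python. It
is only a toy model, not NSR code: the function names, the rule that the
leader of replica set i sits on disk i % num_disks, and the assumption that
reads are spread uniformly across replica sets are all hypothetical choices
made for illustration.

```python
from collections import Counter


def read_load(num_disks, bricks_per_disk):
    """Toy model of leader-only reads across one group of disks.

    Assumptions (illustrative only, not taken from the NSR design):
    - each disk is carved into bricks_per_disk bricks;
    - replica set i is built from brick i of every disk;
    - the leader of replica set i lives on disk i % num_disks;
    - reads are spread uniformly across replica sets.

    Returns (share of reads served by each disk, total brick count).
    """
    num_sets = bricks_per_disk
    leaders = Counter(i % num_disks for i in range(num_sets))
    share = [leaders.get(disk, 0) / num_sets for disk in range(num_disks)]
    return share, num_disks * bricks_per_disk


def idle_fraction(share):
    """Fraction of disks that serve no reads at all."""
    return sum(1 for s in share if s == 0) / len(share)


# Replica-3/JBOD as it would naturally be configured: one brick per disk.
baseline_share, baseline_bricks = read_load(num_disks=3, bricks_per_disk=1)

# Same disks, each carved into 3 bricks, giving 3 replica sets.
split_share, split_bricks = read_load(num_disks=3, bricks_per_disk=3)

print(baseline_share, idle_fraction(baseline_share), baseline_bricks)
print(split_share, idle_fraction(split_share), split_bricks)
```

Under those assumptions the unsplit layout leaves two of the three disks
without reads, while the split layout gives every disk one leader at the
cost of three times as many bricks. Real leader placement depends on
election, so an even spread is not guaranteed.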

FWIW, if I look at the performance and perf regression tests
that are run at my place of work (as these tests stand today), I'd
expect AFR to significantly outperform this design on reads.

-- Manoj
