Re: [PATCH 2/2] Fix cpgbench (large message sizes)

Angus Salkeld <asalkeld@xxxxxxxxxx> · Thu, 15 Dec 2011 08:15:46 +1100

On Wed, Dec 14, 2011 at 10:22:18AM -0700, Steven Dake wrote:
> On 12/14/2011 02:05 AM, Angus Salkeld wrote:
> > On Tue, Dec 13, 2011 at 09:49:52AM -0700, Steven Dake wrote:
> >> On 12/13/2011 07:11 AM, Angus Salkeld wrote:
> >>> To allow async cpg messages of 1M we need to:
> >>> 1) increase the totem queue size by 4
> >>
> >> multiple of 4
> >>
> >>> 2) align the critical level to one large message free
> >>>
> >>
> >> Add following to commit message unless you disagree:
> >>
> >> Kind of hacky.  Don't particularly like that the max message queue is
> >> 4MB instead of 1MB.
> >>
> >> Does this mean that 1MB will never be used in the buffer because of
> >> TOTEM_Q_LEVEL_CRITICAL?
> >>
> > 
> > There are a number of reasons for doing this:
> > 1)
> > We can't let cpg_mcast_joined() fail because the user will not see it
> > and will assume it has succeeded.
> > 
> 
> Wouldn't this return ERR_TRY_AGAIN?  The only failure cpg_mcast_joined
> should have is TRY_AGAIN which is how we identify when flow control is
> locked.

Yes, but if the call makes it down to corosync to discover that another
client has gotten ahead of it then there is a problem.
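
For reference, the TRY_AGAIN handling mentioned above can be sketched as a bounded client-side retry loop. This is only an illustration: `send_result`, `send_fn`, `send_with_retry` and the fake transport are hypothetical stand-ins, not corosync API. A real client would call cpg_mcast_joined() and compare the result against CS_ERR_TRY_AGAIN, and would normally wait on the connection before retrying.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the library's result codes. */
enum send_result { SEND_OK, SEND_TRY_AGAIN, SEND_FAILED };

/* Hypothetical send hook; a real client would wrap cpg_mcast_joined() here. */
typedef enum send_result (*send_fn)(void *ctx, const void *buf, size_t len);

/*
 * Retry while the transport reports flow control, up to max_attempts calls.
 * Returns the last result and stores the number of calls in *attempts_used.
 */
static enum send_result send_with_retry(send_fn send, void *ctx,
	const void *buf, size_t len,
	unsigned int max_attempts, unsigned int *attempts_used)
{
	enum send_result res = SEND_TRY_AGAIN;
	unsigned int attempts = 0;

	assert(send != NULL);
	while (attempts < max_attempts) {
		attempts++;
		res = send(ctx, buf, len);
		if (res != SEND_TRY_AGAIN) {
			break;
		}
		/* A real client would block or poll here before trying again. */
	}
	if (attempts_used != NULL) {
		*attempts_used = attempts;
	}
	return res;
}

/* Fake transport: reports flow control for the first busy_calls calls. */
struct fake_transport { unsigned int busy_calls; };

static enum send_result fake_send(void *ctx, const void *buf, size_t len)
{
	struct fake_transport *t = ctx;

	(void)buf;
	(void)len;
	if (t->busy_calls > 0) {
		t->busy_calls--;
		return SEND_TRY_AGAIN;
	}
	return SEND_OK;
}
```

This only covers the case where the caller actually sees the return value, which is exactly what is not guaranteed once the request has already been handed down to corosync.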

> 
> > 2)
> > I am getting good performance by providing a negative feedback
> > loop between the totem q and the IPC/poll system. This relies
> > on 4 q states low/med/high/crit. With messages of size 1M you
> > now have a q of size one and go from level low to crit instantly,
> > then back to low, as messages are put on and taken off. I don't think
> > this is the best behaviour. Having a q size of 4 allows the system
> > to utilize the q better and gives us time to respond to changes in
> > the q level.
> > 
> 
> got it
> 
> > 3)
> > To effectively achieve flow control with a q of size 1 would require
> > all the clients to request the space on the q as is done in
> > totempg_groups_joined_reserve(), but probably in shared memory.
> > This would take quite a bit of re-work.
> > 
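
The reservation idea in point 3 can be illustrated with a minimal accounting sketch. Everything here (`q_budget`, `q_reserve`, `q_release`) is hypothetical and single-process; it does not reflect how totempg_groups_joined_reserve() is implemented, and a version shared between clients would additionally need atomics or locking around the counter.

```c
#include <assert.h>

/* Hypothetical accounting of queue entries; not a corosync structure. */
struct q_budget {
	unsigned int capacity;	/* total entries in the queue */
	unsigned int reserved;	/* entries currently promised to senders */
};

/* Reserve entries up front; returns 0 on success, -1 if they do not fit. */
static int q_reserve(struct q_budget *q, unsigned int entries)
{
	assert(q->reserved <= q->capacity);
	if (entries > q->capacity - q->reserved) {
		return -1;
	}
	q->reserved += entries;
	return 0;
}

/* Give entries back once the message has left the queue. */
static void q_release(struct q_budget *q, unsigned int entries)
{
	assert(entries <= q->reserved);
	q->reserved -= entries;
}
```

With a capacity of exactly one large message, a second sender is refused until the first one releases its entries, so every client would have to go through this step before sending.
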
> 
> OK well go ahead and merge this then.  I am a bit concerned about the
> extra memory use (1MB -> 4MB per ipc connection).  If it becomes a problem
> we can always change it later.

The totem q is global, when I last checked.
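
To put rough numbers on the queue sizing and the level thresholds in the patch below, here is a small sketch. It assumes a net_mtu of 1500 purely for illustration; `queue_entries`, `percent_used`, `next_level` and the `Q_*` names are stand-ins, not the identifiers used in exec/totempg.c. The thresholds are copied from the patched check_q_level(), collapsed into a form that should be equivalent because the ranges do not overlap.

```c
#include <assert.h>

/* Stand-ins for the TOTEM_Q_LEVEL_* values; illustrative names only. */
enum q_level { Q_LOW, Q_GOOD, Q_HIGH, Q_CRITICAL };

/* Queue capacity in MTU-sized entries, mirroring MESSAGE_QUEUE_MAX. */
static unsigned int queue_entries(unsigned int multiplier,
	unsigned int message_size_max, unsigned int net_mtu)
{
	assert(net_mtu > 0);
	return (multiplier * message_size_max) / net_mtu;
}

/* Same integer arithmetic as the patched percent_used calculation. */
static unsigned int percent_used(unsigned int avail, unsigned int queue_max)
{
	assert(queue_max > 0 && avail <= queue_max);
	return 100 - ((avail * 100) / queue_max);
}

/* Patched thresholds; values in the gaps keep the current level. */
static enum q_level next_level(unsigned int used, enum q_level current)
{
	if (used >= 75) {
		return Q_CRITICAL;
	} else if (used < 30) {
		return Q_LOW;
	} else if (used > 40 && used < 50) {
		return Q_GOOD;
	} else if (used > 60 && used < 70) {
		return Q_HIGH;
	}
	return current;
}
```

Under that MTU assumption a 1MB message takes about 699 entries and the 4x queue holds 2796. One queued message gives 25% used (low), two give 50% (no change), and three give 75%, which is critical with room for exactly one more large message.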

> >> Reviewed-by: Steven Dake <sdake@xxxxxxxxxx>
> >>
> >> Regards
> >> -steve
> >>
> >>> Signed-off-by: Angus Salkeld <asalkeld@xxxxxxxxxx>
> >>> ---
> >>>  exec/totempg.c                    |   13 +++++++++----
> >>>  include/corosync/engine/coroapi.h |    6 ++++--
> >>>  include/corosync/totem/totem.h    |    2 +-
> >>>  3 files changed, 14 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/exec/totempg.c b/exec/totempg.c
> >>> index 3ece489..924979c 100644
> >>> --- a/exec/totempg.c
> >>> +++ b/exec/totempg.c
> >>> @@ -1185,6 +1185,11 @@ int totempg_groups_mcast_joined (
> >>>  	return (res);
> >>>  }
> >>>  
> >>> +#ifndef HAVE_SMALL_MEMORY_FOOTPRINT
> >>> +#undef MESSAGE_QUEUE_MAX
> >>> +#define MESSAGE_QUEUE_MAX	((4 * MESSAGE_SIZE_MAX) / totempg_totem_config->net_mtu)
> >>> +#endif /* HAVE_SMALL_MEMORY_FOOTPRINT */
> >>> +
> >>>  static void check_q_level(
> >>>  	void *totempg_groups_instance)
> >>>  {
> >>> @@ -1193,15 +1198,15 @@ static void check_q_level(
> >>>  	struct totempg_group_instance *instance = (struct totempg_group_instance *)totempg_groups_instance;
> >>>  
> >>>  	old_level = instance->q_level;
> >>> -	percent_used = 100 - (totemmrp_avail () * 100 / 800); /*(1024*1024/1500)*/
> >>> +	percent_used = 100 - ((totemmrp_avail () * 100) / MESSAGE_QUEUE_MAX);
> >>>  
> >>> -	if (percent_used > 90 && instance->q_level != TOTEM_Q_LEVEL_CRITICAL) {
> >>> +	if (percent_used >= 75 && instance->q_level != TOTEM_Q_LEVEL_CRITICAL) {
> >>>  		instance->q_level = TOTEM_Q_LEVEL_CRITICAL;
> >>>  	} else if (percent_used < 30 && instance->q_level != TOTEM_Q_LEVEL_LOW) {
> >>>  		instance->q_level = TOTEM_Q_LEVEL_LOW;
> >>> -	} else if (percent_used > 40 && percent_used < 60 && instance->q_level != TOTEM_Q_LEVEL_GOOD) {
> >>> +	} else if (percent_used > 40 && percent_used < 50 && instance->q_level != TOTEM_Q_LEVEL_GOOD) {
> >>>  		instance->q_level = TOTEM_Q_LEVEL_GOOD;
> >>> -	} else if (percent_used > 70 && percent_used < 80 && instance->q_level != TOTEM_Q_LEVEL_HIGH) {
> >>> +	} else if (percent_used > 60 && percent_used < 70 && instance->q_level != TOTEM_Q_LEVEL_HIGH) {
> >>>  		instance->q_level = TOTEM_Q_LEVEL_HIGH;
> >>>  	}
> >>>  	if (totem_queue_level_changed && old_level != instance->q_level) {
> >>> diff --git a/include/corosync/engine/coroapi.h b/include/corosync/engine/coroapi.h
> >>> index 567d14f..cabcbb3 100644
> >>> --- a/include/corosync/engine/coroapi.h
> >>> +++ b/include/corosync/engine/coroapi.h
> >>> @@ -72,15 +72,17 @@ struct corosync_tpg_group {
> >>>  
> >>>  #define INTERFACE_MAX 2
> >>>  
> >>> +#ifndef MESSAGE_QUEUE_MAX
> >>>  #ifdef HAVE_SMALL_MEMORY_FOOTPRINT
> >>>  #define PROCESSOR_COUNT_MAX	16
> >>>  #define MESSAGE_SIZE_MAX	1024*64
> >>>  #define MESSAGE_QUEUE_MAX	512
> >>>  #else
> >>>  #define PROCESSOR_COUNT_MAX	384
> >>> -#define MESSAGE_SIZE_MAX	1024*1024 /* (1MB) */
> >>> -#define MESSAGE_QUEUE_MAX	MESSAGE_SIZE_MAX / totem_config->net_mtu
> >>> +#define MESSAGE_SIZE_MAX	1024*1024
> >>> +#define MESSAGE_QUEUE_MAX	((4 * MESSAGE_SIZE_MAX) / totem_config->net_mtu)
> >>>  #endif /* HAVE_SMALL_MEMORY_FOOTPRINT */
> >>> +#endif /* MESSAGE_QUEUE_MAX */
> >>>  
> >>>  #define TOTEM_AGREED	0
> >>>  #define TOTEM_SAFE	1
> >>> diff --git a/include/corosync/totem/totem.h b/include/corosync/totem/totem.h
> >>> index 2166143..3d00318 100644
> >>> --- a/include/corosync/totem/totem.h
> >>> +++ b/include/corosync/totem/totem.h
> >>> @@ -44,7 +44,7 @@
> >>>  #else
> >>>  #define PROCESSOR_COUNT_MAX	384
> >>>  #define MESSAGE_SIZE_MAX	1024*1024 /* (1MB) */
> >>> -#define MESSAGE_QUEUE_MAX	MESSAGE_SIZE_MAX / totem_config->net_mtu
> >>> +#define MESSAGE_QUEUE_MAX	((4 * MESSAGE_SIZE_MAX) / totem_config->net_mtu)
> >>>  #endif /* HAVE_SMALL_MEMORY_FOOTPRINT */
> >>>  
> >>>  #define FRAME_SIZE_MAX		10000
> > _______________________________________________
> > discuss mailing list
> > discuss@xxxxxxxxxxxx
> > http://lists.corosync.org/mailman/listinfo/discuss