Re: [PATCH 03/18] ipvs: zero percpu stats

On Sunday, 6 March 2011 at 14:18 +0200, Julian Anastasov wrote:
> 	Hello,
> 
> On Sun, 6 Mar 2011, Eric Dumazet wrote:
> 
> >>  	Zero the new percpu stats because we copy from there.
> >>
> >> Signed-off-by: Julian Anastasov <ja@xxxxxx>
> >> Signed-off-by: Simon Horman <horms@xxxxxxxxxxxx>
> >> ---
> >>  net/netfilter/ipvs/ip_vs_ctl.c |   17 +++++++++++++++++
> >>  1 files changed, 17 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> >> index a2a67ad..fd74527 100644
> >> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> >> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> >> @@ -715,8 +715,25 @@ static void ip_vs_trash_cleanup(struct net *net)
> >>  static void
> >>  ip_vs_zero_stats(struct ip_vs_stats *stats)
> >>  {
> >> +	struct ip_vs_cpu_stats *cpustats = stats->cpustats;
> >> +	int i;
> >> +
> >>  	spin_lock_bh(&stats->lock);
> >>
> >> +	for_each_possible_cpu(i) {
> >> +		struct ip_vs_cpu_stats *u = per_cpu_ptr(cpustats, i);
> >> +		unsigned int start;
> >> +
> >> +		/* Do not pretend to be writer, it is enough to
> >> +		 * sync with writers that modify the u64 counters
> >> +		 * because under stats->lock we are the only reader.
> >> +		 */
> >> +		do {
> >> +			start = u64_stats_fetch_begin(&u->syncp);
> >> +			memset(&u->ustats, 0, sizeof(u->ustats));
> >> +		} while (u64_stats_fetch_retry(&u->syncp, start));
> >
> >
> > Sorry this makes no sense to me.
> 
>  	Hm, yes, the comment is a little bit misleading.
> I fixed it below...
> 
> > This code _is_ a writer, and hardly a hot path.
> 
>  	Yes, the picture is as follows:
> 
> - in 2.6.38-rc we remove the global spin lock (stats->lock)
> from packet processing which is a hot path, adding percpu
> counters instead
> 
> - we need protection for percpu counters and for the sum
> 
> - the chain is: interrupts increment percpu counters, the
> estimation timer reads them and creates sum every 2 seconds,
> then user context can read the sum or even show the percpu
> counters, not to forget the zeroing of the sum and counters
> 
> The players in detail:
> 
> - packet processing:
>  	- softirq context, hot path
>  	- increments counters by using u64_stats_update_begin and
>  	u64_stats_update_end, does not wait for readers or zeroing
>  	- sum not touched, stats->lock usage removed in 2.6.38-rc
> 
> - 2-second estimation timer:
>  	- funcs: estimation_timer()
>  	- timer context, softirq
>  	- reads percpu counters with u64_stats_fetch_begin and
>  	u64_stats_fetch_retry to sync with counter incrementing
>  	- uses spin_lock (stats->lock) to protect the written sum
>  	which is later read by user context: provides
>  	at least u64 atomicity but additionally the relation
>  	between packets and bytes
> 
> - sum readers:
>  	- funcs: ip_vs_stats_show(), ip_vs_stats_percpu_show(),
>  	ip_vs_copy_stats(), ip_vs_genl_fill_stats()
>  	- user context, not a hot path
>  	- uses spin_lock_bh (stats->lock) for atomic reading of
>  	the sum created by estimation_timer()
> 
> - show percpu counters:
>  	- funcs: ip_vs_stats_percpu_show()
>  	- user context, not a hot path
>  	- uses u64_stats_fetch_begin_bh and u64_stats_fetch_retry_bh
>  	to synchronize with counter incrementing
>  	- still missing: should use spin_lock_bh (stats->lock)
>  	to synchronize with ip_vs_zero_stats() that modifies
>  	percpu counters.
> 
> - zero stats and percpu counters
>  	- funcs: ip_vs_zero_stats()
>  	- user context, not a hot path
>  	- uses spin_lock_bh (stats->lock) while modifying
>  	sum but also while zeroing percpu counters because
>  	we are a hidden writer which does not allow other
>  	percpu counter readers at the same time but we are
>  	still synchronized with percpu counter incrementing
>  	without delaying it
> 
> To summarize, I see 2 solutions, in order of preference:
> 
> 1. all players except packet processing should use stats->lock
> when reading/writing sum or when reading/zeroing percpu
> counters. Use u64_stats to avoid delays in incrementing.
> 
> 2. Use seqlock instead of u64_stats if we want to treat the
> percpu counters zeroing as writer. This takes us back to before
> 2.6.38-rc, where we used the global stats->lock even for counter
> incrementing, except that now we can use a percpu seqlock
> just to register the zeroing as writer.
> 
> > Why try to pretend it's a reader and confuse people ?
> >
> > Either :
> >
> > - Another writer can modify the counters at the same time, and we must
> > synchronize with them (we are a writer after all)
> 
>  	Global mutex allows only one zeroing at a time.
> But zeroing runs in parallel with incrementing, so we
> have 2 writers for a per-CPU state. This sounds like
> above solution 2 with percpu seqlock? But it adds extra
> spin_lock in hot path, even if it is percpu. It only
> saves the spin_lock_bh while reading percpu counters in
> ip_vs_stats_percpu_show(). That is why I prefer solution 1.
> 
> > - Another reader can read the counters at the same time, and we must let
> > them detect that we might have cleared half of their values.
> 
>  	Yes, zeroing can run in parallel with /proc reading,
> that is why I now try to serialize all readers with the
> stats spin lock to guarantee u64 atomicity.
> 
> > - No reader or writer can access data, no synch is needed, a pure
> > memset() is OK.
> 
>  	Packet processing can damage the counters while we
> do memset, so we need at least u64_stats_fetch_* to sync
> with incrementing.
> 

OK I now understand what you wanted to do.

The problem is that your memset() is synchronized with a concurrent writer
in one direction only. You can detect that a writer changed the counters
while you were clearing them, but a writer has no way to detect your
writes, which may be only partially committed to main memory, so it
could read a corrupted value.
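
To make the "partially committed" case concrete, here is a small
user-space model of one interleaving, assuming a 32-bit machine where a
u64 counter is stored as two 32-bit halves and the clear becomes visible
one half at a time. The names (struct split_u64, writer_view_mid_zero)
are made up for illustration; this is not kernel code and it does not
model memory barriers or the retry loop.

```c
#include <stdint.h>

/* Hypothetical model of a u64 counter on a 32-bit machine: two halves. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

static uint64_t split_read(const struct split_u64 *c)
{
	return ((uint64_t)c->hi << 32) | c->lo;
}

static void split_write(struct split_u64 *c, uint64_t v)
{
	c->lo = (uint32_t)v;
	c->hi = (uint32_t)(v >> 32);
}

/*
 * Replays one interleaving by hand and returns the value the packet
 * path would load if it reads the counter between the two halves of
 * the clear.
 */
static uint64_t writer_view_mid_zero(uint64_t before)
{
	struct split_u64 c;
	uint64_t seen;

	split_write(&c, before);
	c.lo = 0;               /* first half of the clear is visible */
	seen = split_read(&c);  /* packet path loads the counter here */
	c.hi = 0;               /* second half of the clear lands later */
	return seen;
}
```

Starting from 0x0000000500000010, the value loaded in the middle is
0x0000000500000000, which is neither the old count nor zero. Nothing on
the writer side tells it that the load was torn.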

I feel the memory barriers are wrong, and that this is not really fixable
without slowing down the hot path.

As implied in include/linux/u64_stats_sync.h, a "writer" should be
alone :)
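
As a rough illustration of the single-writer pattern, the sketch below
models the sequence counter in user space. Everything prefixed with
model_ is hypothetical and simplified: there are no memory barriers, and
the fetch side reports "retry" instead of spinning, so it only shows the
protocol, not the real u64_stats implementation.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for struct u64_stats_sync. */
struct model_syncp {
	unsigned int seq;	/* odd while the single writer is mid-update */
};

struct model_counters {
	struct model_syncp syncp;
	uint64_t packets;
	uint64_t bytes;
};

static void model_update_begin(struct model_syncp *s) { s->seq++; }
static void model_update_end(struct model_syncp *s)   { s->seq++; }

static unsigned int model_fetch_begin(const struct model_syncp *s)
{
	return s->seq;
}

/* Non-zero if a writer was active at begin or has run since. */
static int model_fetch_retry(const struct model_syncp *s, unsigned int start)
{
	return (start & 1) || s->seq != start;
}

/* The one and only writer for this block (the packet path). */
static void model_account(struct model_counters *c, uint32_t len)
{
	model_update_begin(&c->syncp);
	c->packets++;
	c->bytes += len;
	model_update_end(&c->syncp);
}

/* Reader: returns 1 if the snapshot is consistent, 0 if it must retry. */
static int model_try_snapshot(const struct model_counters *c,
			      uint64_t *packets, uint64_t *bytes)
{
	unsigned int start = model_fetch_begin(&c->syncp);

	*packets = c->packets;
	*bytes = c->bytes;
	return !model_fetch_retry(&c->syncp, start);
}
```

Only model_account() bumps the sequence, so only its writes are visible
to the retry check. A memset() from another context changes the counters
without touching the sequence, which is why it does not fit this scheme.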

Another way to handle this (and keep hot path packet processing free of
extra locking) would be to never memset() this data, but to keep a
separate "summed" value as a reference point and subtract that sum from
the current one (all of this in the slow path, so not a problem).
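
A minimal user-space sketch of that idea follows. The model_ names and
the fixed MODEL_NR_CPUS are assumptions for illustration only; in the
kernel the summing would still read each per-CPU block with
u64_stats_fetch_begin/u64_stats_fetch_retry, and the baseline would be
protected by stats->lock, both of which are left out here.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 4	/* assumption: fixed CPU count for the model */

struct model_ustats {
	uint64_t packets;
	uint64_t bytes;
};

struct model_stats {
	/* Written only by the packet path, never cleared. */
	struct model_ustats percpu[MODEL_NR_CPUS];
	/* Sum recorded at the last "zero" request. */
	struct model_ustats base;
};

/* Slow path: add up all per-CPU counters. */
static struct model_ustats model_sum(const struct model_stats *s)
{
	struct model_ustats sum = { 0, 0 };
	int i;

	for (i = 0; i < MODEL_NR_CPUS; i++) {
		sum.packets += s->percpu[i].packets;
		sum.bytes += s->percpu[i].bytes;
	}
	return sum;
}

/* "Zeroing": remember the current sum instead of clearing counters. */
static void model_zero(struct model_stats *s)
{
	s->base = model_sum(s);
}

/* What user space is shown: current sum minus the recorded baseline. */
static struct model_ustats model_show(const struct model_stats *s)
{
	struct model_ustats cur = model_sum(s);

	cur.packets -= s->base.packets;
	cur.bytes -= s->base.bytes;
	return cur;
}
```

With this layout the packet path stays the only writer of the per-CPU
counters, and "zero" touches only the baseline in the slow path.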


