Re: [PATCH] sctp: avoid refreshing heartbeat timer too often

Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx> · Thu, 31 Mar 2016 18:25:12 -0300

On Thu, Mar 31, 2016 at 11:16:52AM +0000, David Laight wrote:
> From: Marcelo Ricardo Leitner
> > Sent: 30 March 2016 13:13
> > On 30-03-2016 06:37, David Laight wrote:
> > > From: Marcelo Ricardo Leitner
> > >> Sent: 29 March 2016 14:42
> > >>
> > >> Currently, on high-rate SCTP streams, refreshing the heartbeat timer can
> > >> consume quite a lot of resources: timer updates are costly, and the
> > >> timeout contains a random factor, which a) is also costly to compute and
> > >> b) defeats the mod_timer() optimization that skips re-setting a timer to
> > >> the same value. It may even cause the timer to be slightly advanced, for
> > >> no good reason.
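To make the mod_timer() point concrete, here is a small user-space C sketch. The `sim_timer` type and `sim_mod_timer()` are hypothetical stand-ins, loosely modelled on the "already pending with the same expiry" fast path in the kernel's mod_timer(); they are not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel timer: it only records the
 * expiry and how many times that expiry was really reprogrammed. */
struct sim_timer {
	int pending;
	uint64_t expires;
	unsigned int reprograms;
};

/* Loosely modelled on mod_timer()'s fast path: a pending timer that
 * already has the requested expiry is left untouched. */
static void sim_mod_timer(struct sim_timer *t, uint64_t expires)
{
	assert(t);
	if (t->pending && t->expires == expires)
		return;			/* same value: skip the costly update */
	t->expires = expires;		/* note: may move the expiry earlier */
	t->pending = 1;
	t->reprograms++;
}
```

With a fixed interval, repeated refreshes within the same tick hit the fast path. With a fresh random factor on every refresh the requested expiry almost always differs, and a smaller random draw can move it earlier, which is the "slightly advanced" case.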
> > >
> > > Interesting thoughts:
> > > 1) Is it necessary to use a different 'random factor' until the timer actually
> > >     expires?
> > 
> > I don't fully understand you here, but we need a random factor in the
> > timer expiry. As noted by Daniel Borkmann in his commit 8f61059a96c2
> > ("net: sctp: improve timer slack calculation for transport HBs"):
> 
> When a HEARTBEAT chunk is sent, determine the new interval and use that
> interval until the timer actually expires, at which point a new interval
> is calculated. That way the random number is generated only once per heartbeat.
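A minimal sketch of this first suggestion, assuming a hypothetical `struct hb_state` (not the kernel's `struct sctp_transport`) and a caller-supplied uniform random value: the jittered interval is chosen once when a HEARTBEAT is sent, and the data path only adds that cached interval to the current time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-transport heartbeat state; all times in ticks. */
struct hb_state {
	uint64_t rto;		/* current RTO of the destination */
	uint64_t hb_interval;	/* protocol parameter HB.interval */
	uint64_t cur_interval;	/* jittered interval picked at last HEARTBEAT */
};

/* Called once per HEARTBEAT sent. 'r' is a uniform random value in
 * [0, rto), so the result is roughly HB.interval + RTO/2 .. HB.interval
 * + 3*RTO/2, i.e. RTO + HB.interval with +/- 50% RTO jitter (integer
 * division rounds RTO/2 down). */
static void hb_pick_interval(struct hb_state *s, uint64_t r)
{
	assert(s->rto > 0 && r < s->rto);
	s->cur_interval = s->hb_interval + s->rto / 2 + r;
}

/* Data path: no random number is drawn here, so two refreshes in the
 * same tick request the same expiry and a "same value" fast path in the
 * timer code can apply. */
static uint64_t hb_refresh_expiry(const struct hb_state *s, uint64_t now)
{
	return now + s->cur_interval;
}
```

For example, with `rto = 100`, `hb_interval = 1000` and `r = 30`, the cached interval is 1080 ticks, and every refresh at tick 5000 asks for expiry 6080.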
> 
> >      RFC4960, section 8.3 says:
> > 
> >        On an idle destination address that is allowed to heartbeat,
> >        it is recommended that a HEARTBEAT chunk is sent once per RTO
> >        of that destination address plus the protocol parameter
> >        'HB.interval', with jittering of +/- 50% of the RTO value,
> >        and exponential backoff of the RTO if the previous HEARTBEAT
> >        is unanswered.
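Written out, the section 8.3 recommendation and its jitter bounds give the window below; the worked numbers assume the RFC 4960 defaults RTO.Initial = 3 s and HB.interval = 30 s.

```latex
T_{\mathrm{HB}} = \mathrm{RTO} + \mathrm{HB.interval} + \delta,
  \qquad \delta \in \left[-\tfrac{1}{2}\mathrm{RTO},\ +\tfrac{1}{2}\mathrm{RTO}\right]
\;\Longrightarrow\;
T_{\mathrm{HB}} \in \left[\mathrm{HB.interval} + \tfrac{1}{2}\mathrm{RTO},\ \mathrm{HB.interval} + \tfrac{3}{2}\mathrm{RTO}\right]

\mathrm{RTO} = 3\,\mathrm{s},\ \mathrm{HB.interval} = 30\,\mathrm{s}
  \;\Longrightarrow\; T_{\mathrm{HB}} \in [31.5\,\mathrm{s},\ 34.5\,\mathrm{s}]
```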
> > 
> > Prior to his commit, it used a random factor based on jiffies.
> > 
> > This patch therefore assumes that random_A + 2 is just as random as
> > random_B, as long as it stays within the allowed range, which avoids the
> > unnecessary updates.
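One way to read "within the allowed range" is sketched below, as an illustration rather than the actual patch: the timer is left alone as long as its pending expiry still falls inside the window allowed for a heartbeat armed now. `hb_expiry_acceptable()` and `hb_needs_update()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Window allowed for a heartbeat armed at 'now' (all values in ticks):
 * [now + HB.interval + RTO/2, now + HB.interval + 3*RTO/2]. */
static bool hb_expiry_acceptable(uint64_t now, uint64_t rto,
				 uint64_t hb_interval, uint64_t expiry)
{
	uint64_t lo = now + hb_interval + rto / 2;
	uint64_t hi = now + hb_interval + rto + rto / 2;

	assert(rto > 0);
	return expiry >= lo && expiry <= hi;
}

/* Hypothetical refresh-path check: reprogram the timer only if it is
 * not pending or its current expiry has left the allowed window. */
static bool hb_needs_update(bool pending, uint64_t now, uint64_t rto,
			    uint64_t hb_interval, uint64_t expiry)
{
	return !pending || !hb_expiry_acceptable(now, rto, hb_interval, expiry);
}
```

Because the lower bound moves forward with `now` while the pending expiry stays put, the timer is reprogrammed only once the lower bound overtakes it, i.e. after up to one RTO of elapsed time depending on where in the window the previous expiry landed, rather than on every packet.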
> > 
> > > 2) It might be better to allow the heartbeat timer to expire, on expiry work
> > >     out the new interval based on when the last 'refresh' was done.
> > 
> > Cool, I thought about this too. It would introduce some extra complexity
> > that I don't think is really worth it, especially because, even with this
> > patch, we may still be doing more timer updates, but they don't trigger
> > any wakeups. With your approach we would need at least 2 wakeups: one for
> > the first timeout event, then another after rescheduling the timer for
> > the updated expiry, and maybe again, and again... Fewer timer updates but
> > more wakeups, one at every heartbeat interval even on a busy transport.
> > It seems cheaper to just update the timer, then.
> 
> One wakeup per heartbeat interval on a busy connection is probably noise.
> It is likely much less than the 1000s of timer updates that would otherwise happen.

I was thinking more of near-idle systems: the overhead of this refresh
now looked rather small, even for busy transports, compared to other
costs, so it seemed a worthwhile trade-off for reducing wakeups, imho.

But then I checked TCP, and it does what you're suggesting.
I'll rework the patch. Thanks
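David's second suggestion, which TCP already follows per the note above, can be sketched as an expiry handler that compares the time since the last activity with the interval, while the data path only records a timestamp. All names here are hypothetical; this is a user-space illustration, not the reworked patch.

```c
#include <assert.h>
#include <stdint.h>

enum hb_action { HB_SEND, HB_REARM };

struct hb_decision {
	enum hb_action action;
	uint64_t next_expiry;	/* only meaningful for HB_REARM */
};

/* Data path: instead of calling mod_timer() on every packet, just
 * remember when the transport was last active. */
static void hb_note_activity(uint64_t *last_activity, uint64_t now)
{
	*last_activity = now;
}

/* Timer callback: if the transport has really been idle for a whole
 * interval, send a HEARTBEAT; otherwise re-arm for the remaining time. */
static struct hb_decision hb_timer_expired(uint64_t now,
					   uint64_t last_activity,
					   uint64_t interval)
{
	struct hb_decision d = { HB_SEND, 0 };

	assert(now >= last_activity);
	if (now - last_activity < interval) {
		d.action = HB_REARM;
		d.next_expiry = last_activity + interval;
	}
	return d;
}
```

On a busy transport this trades the per-packet timer updates for one extra wakeup per interval, which is the trade-off discussed in the thread.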

> A further optimisation would be to restart the timer if more than (say) 80%
> of the way through the timeout period.
> 
> Similarly, the HEARTBEAT could be sent at the first wakeup if the 2nd wakeup would be almost immediate.
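The two further optimisations can be sketched the same way. `HB_RESTART_PCT` (the "(say) 80%" figure) and `HB_SLACK_TICKS` (what counts as "almost immediate") are hypothetical tuning values, not existing kernel parameters, and both helpers are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HB_RESTART_PCT	80	/* hypothetical: the "(say) 80%" figure */
#define HB_SLACK_TICKS	2	/* hypothetical: "almost immediate" */

/* Data path: restart the timer only once more than HB_RESTART_PCT
 * percent of the period since it was armed has already elapsed. */
static bool hb_should_restart(uint64_t now, uint64_t armed_at,
			      uint64_t interval)
{
	assert(now >= armed_at);
	return (now - armed_at) * 100 > interval * HB_RESTART_PCT;
}

/* Expiry path: if re-arming would fire again within HB_SLACK_TICKS,
 * send the HEARTBEAT now instead of taking a second wakeup. */
static bool hb_send_now(uint64_t now, uint64_t last_activity,
			uint64_t interval)
{
	assert(now >= last_activity);
	return last_activity + interval <= now + HB_SLACK_TICKS;
}
```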
