Re: multipath algorithm

Linux Advanced Routing and Traffic Control


On Wed, 2006-03-15 at 16:33 -0500, Jody Shumaker wrote:
>
> First attempts were with gentoo 2.6.14 kernel sources, using the
> routes-2.6.14-12.diff patch.  I had the instability problems, which I
> eventually confirmed to have to do with routing. Specifically, if I
> wrote a script to run a series of "ip route get" commands for some
> range of IPs, it was guaranteed to cause a kernel panic. The same kernel
> panic would occur if I just left it running. I also could never get
> multipath routing to use anything but the last nexthop route
> specified.
> 
> Figuring the gentoo patchset was maybe conflicting indirectly (as
> there were no errors applying the patch itself) with Julian's patch, I
> instead tried the vanilla source package in gentoo.  That gave me
> identical results.

Interesting, good to know since I will be there soon. Kernel panics are
a rather severe issue.
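
For anyone wanting to try to reproduce this, here is a rough sketch of the
kind of script described in the quote above. It is my guess at the shape of
it, not Jody's actual script. PREFIX, FIRST and LAST are made-up values
(192.0.2.0/24 is a documentation range), and it only prints the commands
unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: loop "ip route get" over a range of addresses.
# Dry run by default, so nothing touches the routing code unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical range, not taken from the original report.
PREFIX="${PREFIX:-192.0.2}"
FIRST="${FIRST:-1}"
LAST="${LAST:-5}"

route_get_range() {
    i="$FIRST"
    while [ "$i" -le "$LAST" ]; do
        if [ "$DRY_RUN" = 1 ]; then
            # Just show what would be run.
            echo "ip route get $PREFIX.$i"
        else
            ip route get "$PREFIX.$i"
        fi
        i=$((i + 1))
    done
}

route_get_range
```

Run it on a box you can afford to have panic, preferably with a serial
console attached so the trace is not lost.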

> I haven't tried anything without his patches as in the end I just
> dropped my older connection as the verizon FiOS connection proved
> reliable.

Sweet, can't wait for FiOS to be offered in my market. I am sure that
won't happen for years to come, if ever, by Verizon.

>  Everything besides the multiple nexthops was working just
> fine,  my website could be accessed via either connection and
> responses went out over the right link etc.  Just the default gateway
> selection was broken, and the eventual kernel panic.

Hmm

> I didn't try to specifically, but he did eventually respond and only
> asked a simple question which had no relevance.

Wow, maybe he is onto other things these days or short of time. Julian
was very helpful when I was trying to get things working back in
the day. Some of it got quite crazy, with route cache timing and all
kinds of things I was messing with before I got things working. Most of
it is in this list's archives. Some was off list.

>   Still i responded
> with an answer, and pointed out the symptoms that I had previously
> stated which ruled that out as a possible problem.
> http://mailman.ds9a.nl/pipermail/lartc/2006q1/017946.html

Hey, the question he asked in that is relevant, because ARP is very
much related to multipath, or multiple gateways. In my case I am
having some ARP issues due to what I believe are replies going out a
different device than the one the request came in on. To resolve that
I have made some static ARP entries for now. Maybe forever, not sure
there.

Granted, the question is not too relevant to kernel panics :)
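
A static entry can be pinned roughly like this. The gateway IP, MAC and
device below are placeholders, not my real ones, and the sketch only
prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: pin a permanent neighbour (ARP) entry for a gateway.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Hypothetical values for illustration.
GW_IP=192.0.2.1
GW_MAC=00:11:22:33:44:55
DEV=eth1

pin_gateway_arp() {
    # "nud permanent" keeps the entry until it is removed by hand.
    run ip neigh replace "$GW_IP" lladdr "$GW_MAC" nud permanent dev "$DEV"
}

pin_gateway_arp
```

The entry has to be updated by hand if the gateway hardware is ever
swapped, which is why I am not sure I want to keep it forever.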

> My response was not to the lartc list as it only repeated information
> already in the thread, namely that if i reversed the order of nexthops
> i could have the eth1 favored over ppp0 or the reverse. Shortly after
> this the thread died.

I assume you were flushing not only the route cache, but the ARP cache
as well, in between switching them? However, with your weights, it
should have been using one much more often than the other, regardless
of position or order.

If you swapped them and did not fully flush everything out, it could
explain some of the behavior you were seeing. Granted, it does not fully
explain why it would always use the second gateway and not the first. I
would assume it had to do with some cache or other.
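
One rough way to check which link is actually being picked is to tally
the "dev" field over a batch of "ip route get" results. The sample lines
below are made up for illustration; real output also has continuation
lines such as "cache", which this simply ignores.

```shell
#!/bin/sh
# Sketch only: count how often each device follows "dev" in route lookups.
tally_devs() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") count[$(i + 1)]++ }
         END { for (d in count) print d, count[d] }' | sort
}

# Hypothetical sample of "ip route get" output lines.
sample='192.0.2.1 via 10.0.0.1 dev eth1 src 10.0.0.2
192.0.2.2 via 10.0.1.1 dev ppp0 src 10.0.1.2
192.0.2.3 via 10.0.0.1 dev eth1 src 10.0.0.2'

printf '%s\n' "$sample" | tally_devs
```

With working weights the counts over a large batch of destinations
should lean toward the heavier nexthop. If one device gets every
lookup, that matches the symptom you described.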
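
What I mean by flushing in between is something along these lines. The
gateway addresses and weights are invented (I do not know your real
ones); eth1 and ppp0 are the devices from your earlier mail. It only
prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: install a weighted multipath default route, then flush caches.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Hypothetical gateways and weights.
GW1=192.0.2.1      # next hop reached via eth1
GW2=198.51.100.1   # next hop reached via ppp0
W1=4
W2=1

set_multipath() {
    run ip route replace default scope global \
        nexthop via "$GW1" dev eth1 weight "$W1" \
        nexthop via "$GW2" dev ppp0 weight "$W2"
    # Drop cached routing decisions and neighbour entries from the old setup.
    run ip route flush cache
    run ip neigh flush all
}

set_multipath
```

Swapping the two nexthop lines and running it again gives the reversed
order, with the caches flushed each time so old entries do not skew the
comparison.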

FYI, in hindsight, given my past experience and my current trial and
error woes, I am starting to think that when you are messing with
this stuff it's best to shut down all interfaces and flush out
everything, then bring everything back up from a clean, empty state
before doing comparisons.
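
As a sketch of that clean-state reset: IFACES is a placeholder list, and
a PPP link would need its pppd session restarted rather than a plain
link up. Note that taking a link down also removes the routes through
it, so those have to be added again afterwards. Dry run unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: take links down, flush caches, bring links back up.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Hypothetical interface list, space separated.
IFACES="${IFACES:-eth1}"

reset_links() {
    for dev in $IFACES; do
        run ip link set dev "$dev" down
    done
    run ip neigh flush all
    run ip route flush cache
    for dev in $IFACES; do
        run ip link set dev "$dev" up
    done
}

reset_links
```

Do this from the console, not over a connection that goes through one
of the interfaces being reset.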

Stuff gets put into cache so fast that even when you flush, by the
time the next command has run, more than likely something has made it
back into the cache.

> I checked back and the exact comment was about fedora core 4:
> http://mailman.ds9a.nl/pipermail/lartc/2006q1/018121.html
> I also thought it could be conflicts with other patches, and hence why
> I tried with a vanilla source kernel.  Though I do admit i never tried
> directly downloading a source tarball myself and using that, I was
> always starting from a gentoo emerge.

Yeah, gentoo vanilla sources should be untainted. So you definitely
covered all bases there.

> The patches stated 2.6.14-15.  Unfortunately I don't remember any
> subrelease version #'s I was using under that, but i had even reviewed
> all patches used for those and all of them were minor security updates
> that I believe did not touch the networking code.

Yes, just mentioning that because it is stating a range on Julian's main
patch page. http://www.ssi.bg/~ja/#routes-2.6
But pretty sure you were using the correct patches and correct version.


> Yeah, I can see the reasons why its not included.  It's not quite
> something that could really be done as a toggleable module as it seems
> to require modifications all over the place from what I recall looking
> at the patches.

Yeah, and most people have no clue about multipath routing etc. and are
totally happy with 1 ISP. Most broadband providers seem to have good
uptime these days.

> In the end, I'm still not sure why the patches would not work for me. 
> At this point I'm guessing it is entirely possible some of my kernel
> config options conflicted with the changes.

I was starting to be curious about that myself. Maybe try building the
kernel with no experimental options, which might be impossible depending
on what support you need in the kernel ;)

>   It's also possible my
> config for routes was invalid, but the Kernel panics lead me to think
> otherwise, especially when noone had anything to say about my config
> on the list.

Regardless of configs etc., you have two issues. One, the multipath
route did not use both gateways/links. Two, you have kernel panics. I
would be way more concerned with the kernel panics than with the
routing issues ;)

> Maybe someday I'll have 2 connections again and i'll actually feel up
> to trying to follow the kernel code and debugging the problem myself.

Well, I might be there sooner rather than later. If all goes well,
sometime this year I will get another T1 from a different provider, and
have redundant lines here. I would have done that already, as I did in
CA using SDSL, but this area only has 1 SDSL provider, Covad, and
everyone else re-sells their stuff or provides really low bandwidth
SDSL lines.

In the meantime I might have to apply the patches to resolve some of my
ARP issues. If I get kernel panics I will be upset, because I have not
seen those in years, not since the last time I was messing with all
this, and those were boot time panics.

For the record, it is possible, I have done it. And once done, it's so
great. I literally had no issues, worries, downtime, etc. for over a
year with it. It was so great that I kept the machine around as an
internal router and only recently decommissioned it. Good old LRP
install, way outdated and totally insecure. Now that my Linux router is
connected to a WAN again, I had to update. I am not using a ramdisk atm,
and am booting from and running off a HD. Totally uncool. For now ;)

-- 
Sincerely,
William L. Thomson Jr.
Obsidian-Studios, Inc.
http://www.obsidian-studios.com

_______________________________________________
LARTC mailing list
LARTC@xxxxxxxxxxxxxxx
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
