Re: Buggy UDEV?

Alan Jenkins <sourcejedi.lkml@xxxxxxxxxxxxxx> · Sun, 18 Oct 2009 23:28:17 +0100

On 10/18/09, dragondaddy@xxxxxxxxxxxx <dragondaddy@xxxxxxxxxxxx> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>
> On Tue, 13 Oct 2009 15:26:52 -0600 Alan Jenkins
> <sourcejedi.lkml@xxxxxxxxxxxxxx> wrote:
>>On 10/13/09, dragondaddy@xxxxxxxxxxxx <dragondaddy@xxxxxxxxxxxx>
>>wrote:
>>> starts.  The rules for udev worked fine for a couple of months
>>> until I put in a new hard drive to use as an NFS drive.  And then
>>> it went to crap. Needless to say the random renaming of the
>>> interfaces screws up my bridging script and is unacceptable
>>> behavior.
>>>
>>> Udev began renaming the interfaces!! What's the point of having
>>> rules file if they are not rules?  Here is the current rules file:
>>>
>>>
>>>
>>> # PCI device 0x10ec:0x8139 (8139too)
>>> SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
>>> ATTR{address}=="00:13:8f:ad:bf:06", ATTR{type}=="1",
>>> KERNEL=="eth*", NAME="eth0"
>>>
>>> # PCI device 0x1106:0x3106 (via-rhine)
>>> SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
>>> ATTR{address}=="00:40:f4:9c:ec:dd", ATTR{type}=="1",
>>> KERNEL=="eth*", NAME="eth1"
>>>
>>> # PCI device 0x1106:0x3106 (via-rhine)
>>> SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
>>> ATTR{address}=="00:e0:4c:6a:4e:f2", ATTR{type}=="1",
>>> KERNEL=="eth*", NAME="eth2"
>>>
>>>
>>>
>>> Here is dmesg | grep eth from the last boot:
>>>
>>> eth0: VIA Rhine III at 0xff0ff400, 00:40:f4:9c:ec:dd, IRQ 21.
>>> eth0: MII PHY found at address 1, status 0x786d advertising 05e1
>>> Link 4de1.
>>> eth1: RealTek RTL8139 at 0xc800, 00:13:8f:ad:bf:06, IRQ 22
>>> eth2: RTL8169sb/8110sb at 0xf8910800, 00:e0:4c:6a:4e:f2, XID
>>> 10000000 IRQ 23
>>> udev: renamed network interface eth0 to eth1
>>> udev: renamed network interface eth1_rename to eth0
>>> eth0: link up, 100Mbps, full-duplex, lpa 0x41E1
>>> eth1: link up, 100Mbps, full-duplex, lpa 0x4DE1
>>> r8169: eth2: link up
>>> eth0: no IPv6 routers present
>>> eth1: no IPv6 routers present
>>> eth2: no IPv6 routers present
>>> device eth1 entered promiscuous mode
>>> device eth2 entered promiscuous mode
>>> brdg1: port 2(eth2) entering learning state
>>> brdg1: port 1(eth1) entering learning state
>>>
>>> It's clear that the rule says that eth0 should be associated with
>>> MAC address "00:13:8f:ad:bf:06".  udev ignored this rule and
>>> associated eth0 with "00:40:f4:9c:ec:dd".  And then it decided to
>>> just ignore the rest of the rules and swap around the interfaces at
>>> random.
>>
>>Um, no?  I think your problem is happening later on.  What you've
>>shown so far makes perfect sense
>>
>>> eth0: VIA Rhine III at 0xff0ff400, 00:40:f4:9c:ec:dd, IRQ 21.
>>> eth0: MII PHY found at address 1, status 0x786d advertising 05e1
>>> Link 4de1.
>>> eth1: RealTek RTL8139 at 0xc800, 00:13:8f:ad:bf:06, IRQ 22
>>> eth2: RTL8169sb/8110sb at 0xf8910800, 00:e0:4c:6a:4e:f2, XID
>>> 10000000 IRQ 23
>>
>>These are the devices as created by the kernel.  Obviously udev has no
>>influence over these names; it is not part of the kernel.  All udev
>>can do is rename the devices once they are created...
>>
>>> udev: renamed network interface eth0 to eth1
>>> udev: renamed network interface eth1_rename to eth0
>>
>>like so.  The 8139 device started off as eth1, so udev renamed it to
>>eth0 in accordance with your rules.  And the same with eth0 -> eth1.
>>
>>Regards
>>Alan
>
> I don't believe this to be accurate.  If it was accurate then it
> would have had eth0 on the cable modem and then had a dhcp
> response.  This was not the case.  I fixed the problem by removing
> the SATA HD.  And all was right with the world again.  If adding
> and removing a hard drive changes the way network interface
> associate it's got to be a bug somewhere.
>
> Also if what you said is true then why did it not change the name
> of the interfaces for months?  It is the same rule set.  It only
> started this behavior when I added an sata hard drive.
>
> I have worked around the problem by removing the smaller IDE drive
> and just installed the terabyte sata drive ( and slackware ).  I
> made a small partition on the sata drive for the system and then
> shared the rest of it with nfs.  Remade the rule set, actually just
> copied it from a usb drive, and it now doesn't change the
> interfaces.
>
> I know I'm just a retired IVR programmer and not a kernel hacker
> but when all I do is add a hard drive and the networking system
> blows up there is a bug somewhere.  And if you cruise the net
> looking for this you will find I am far from alone.
>
> This is in reality an attempt to make it, UDEV, better.  I am now
> having another problem with UDEV with adding hard drives ( on a
> different machine).  When there is only one hard drive in the
> system it is sda with the first partition sda1. But add a few hard
> drives and it no longer is sda.  So when you reboot after adding
> the drives the boot loader can't load the system because it doesn't
> know that sda is no longer sda.  If this is the intended function
> it's rather dysfunctional. There should be consistency between
> reboots even if the hardware has had changes.  In other words the
> first hard drive should always be the first hard drive unless a
> rule, written by the user, changed it.  I suspect this is all the
> same code somewhere that is doing this. It is changing devices
> around for reasons only it knows.  In the small little world I come
> from that's a bug.
>
>
> I just took another look at the dmesg I sent along.  The part of
> the rule set that I posted that has the comments after "#" are just
> the remnants of what I cut and pasted when I made my rules set.  I
> then looked at the changes that the interfaces went through.  This
> has to be at the kernel level ( you're right it's not UDEV but the
> kernel).  The kernel changed the interfaces and UDEV tried to clean
> up after.  The same thing is going on with the hard drive problem.
> Maybe you would be so kind as to pull on the chain for whom ever is
> responsible for how devices get named in the kernel.  If this is
> the intended action they need to flush out their head gear.  It's a
> total PITA for those of us that try to actually use the system.
>
> I suspect that there were changes made to the kernel when UDEV was
> added.  It is those changes that are the root of the problem. That
> makes looking for the problem a little easier.
>
> Thank you
>
> Joseph

I don't feel obliged to respond with civility or considered
helpfulness if you rant at me off-list.  I am not paid for this.  The
people who are paid here have already pattern-matched you as being
almost certainly wrong, and an incredible waste of effort to talk to.

I have no idea what IVR stands for.  You know the saying; on the
internet, no-one knows you're a dog.  If you're an experienced
computer programmer, you should really have a better idea of what
makes a good bug report.  I assume you're actually interested in
tracking down the bug you found - <end rant><civilize
vibes="positive">

it would be useful to identify exactly what renaming had occurred.
I.e. look at "ifconfig -a" for both the "good ordering" and the "bad
ordering".  And the same for the kernel logs - showing the "bad" log
messages is much less useful than comparing the "good" and "bad"
messages.
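
A minimal sketch of such a comparison, assuming sysfs is mounted at
/sys.  The function name list_ifaces and the file names ifaces.good and
ifaces.bad are made up for illustration; "ifconfig -a" shows the same
information in a less diff-friendly form.

```shell
#!/bin/sh
# Print "name MAC" for every network interface, sorted by name.
# The directory argument is optional; it exists only so the function
# can be tried against a copy of /sys/class/net.
list_ifaces() {
    root="${1:-/sys/class/net}"
    for dev in "$root"/*; do
        # Skip anything that does not expose a hardware address.
        [ -r "$dev/address" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/address")"
    done | sort
}

# On a boot where the names are right:
#   list_ifaces > ifaces.good
# On a boot where the names are wrong:
#   list_ifaces > ifaces.bad
#   diff ifaces.good ifaces.bad
```

If the two files are identical, the names are not what changed, and the
problem lies somewhere else.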

Here's a short and generic explanation of naming -

The kernel provides no guarantees about device ordering.  Hardware can
change the order, sometimes in completely screwy and unpredictable
ways and even without user intervention.  There is nothing that the
kernel can do about this.  The kernel has no state (e.g. it doesn't
have a config file), so it can't tell what the previous order was.

Given the right rule, udev will rename an interface with a given MAC
address, to e.g. "eth0".  This is the only way to ensure a fixed
ordering.  Udev is often shipped with rules which generate _other_
rules, so that it automatically remembers the original ordering of
network devices and will preserve it.
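
To make the shape of such a rule concrete, here is a hypothetical
helper that prints one rule in the same format as the rules quoted
above.  make_net_rule is an illustrative name, not a udev tool, and
this is not how the shipped rule generator is implemented; it only
shows what a rule pinning a MAC address to a name looks like.

```shell
#!/bin/sh
# Print a udev rule that names the interface with the given MAC address.
# Usage: make_net_rule MAC NAME
make_net_rule() {
    # sysfs reports the address in lower case, so normalise the input.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", ATTR{type}=="1", KERNEL=="eth*", NAME="%s"\n' "$mac" "$2"
}

# Example: make_net_rule 00:13:8F:AD:BF:06 eth0
```

Each rule is a single line: the "==" keys select the device and the
NAME="..." assignment is the rename.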

That method is *only* appropriate for network devices - they are
special, because they do not appear in the filesystem.  For nodes
which appear in the filesystem, it is _not_ possible to rename them
and tell the kernel the new name.  If you did rename them, the names
logged in kernel messages and shown in /sys would be out of sync with
/dev, and there would be potential for great confusion.

Instead, we have symlinks, e.g. as shown by "ls -lR /dev/disk".
"mount" appears to provide shortcut syntax for LABEL and UUID
symlinks; I suppose it's a convenient abstraction.
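
As a rough sketch, assuming a Linux system with "readlink -f"
available, this lists the stable names that currently point at a given
device node.  stable_names is an illustrative name, and the second
argument exists only so it can be tried on a test directory.

```shell
#!/bin/sh
# Print every symlink under /dev/disk/by-*/ that resolves to the device.
# Usage: stable_names DEVICE [LINK_ROOT]
stable_names() {
    dev=$(readlink -f "$1") || return 1
    root="${2:-/dev/disk}"
    for link in "$root"/by-*/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
    return 0
}

# Example: stable_names /dev/sda1
# Any of the printed paths can be given to mount in place of /dev/sda1.
# mount also accepts the forms LABEL=<label> and UUID=<uuid> directly.
```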

So, given what you've shown me so far, there doesn't seem to be
anything wrong with the names of your devices.  The disk naming is
expected behavior.  What would be broken is if you have a setup which
hardcodes "sda1" instead of a more meaningful name, if you want it to
automagically continue working when you rearrange the hardware.
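
One quick way to look for that kind of hardcoding, sketched here for
/etc/fstab only (the boot loader configuration needs the same check by
hand).  hardcoded_disks is an illustrative name, and the pattern only
covers sdX and hdX style names.

```shell
#!/bin/sh
# Print fstab lines (with line numbers) whose device field is a raw
# /dev/sdX or /dev/hdX name rather than a LABEL=, UUID= or /dev/disk path.
# Usage: hardcoded_disks [FSTAB]
hardcoded_disks() {
    grep -nE '^[[:space:]]*/dev/[sh]d[a-z]+[0-9]*[[:space:]]' "${1:-/etc/fstab}"
}
```

Every line it prints is an entry that will point at the wrong disk, or
at nothing, if the kernel enumerates the drives in a different order.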

It's possible but unlikely that the network devices are being renamed
incorrectly after the messages you've shown.  This is what I was
trying to encourage you to rule out (or in).  I'm really sorry if you
didn't know how to list your network devices, but if you start by
explaining that your network devices have the wrong names, it does
tend to imply that you checked them :)

Good luck
Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html