Request for Feedback on Design Issue with Systemd and "Consistent Network Device Naming"

Hi all,

    I wonder if you can help. I'm trying to find a contact in systemd development who has been involved in the "Consistent Network Device Naming" initiative.

As an HPC compute architect I was surprised to come across some changes while testing RHEL8 that seem to originate from systemd work.

While I applaud the initiative, I think there has been a fundamental oversight of real-world use cases for network device management.

Rather than creating a more *consistent* OS environment for applications, the implementation will, in the real world, make the environment fundamentally more confusing and divergent for users. More importantly, for commercial businesses there will be a dollar impact on managing these changes in the data center, and people will be forced to invalidate commercial support by disabling the feature via the kernel boot argument net.ifnames=0.
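For reference, the opt-out admins end up applying today looks roughly like the following on RHEL-family systems (illustrative only, exact invocation may vary by distro and boot loader):

    # Disable the predictable naming scheme on every installed kernel
    grubby --update-kernel=ALL --args="net.ifnames=0"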


### PROBLEM ###

The issue is around the deprecation of support for the HWADDR= argument in the ifcfg files (RHEL; other distros are available).

This feature is used in the real world to migrate device names between physical NIC cards and ports in *order to create a more consistent environment* for application users on multi-homed servers. In HPC, one of the challenges we face is that our server farms are depreciated over 3-5 years, and during that time capacity expansions mean we don't have 100% consistent hardware, especially when it comes to NIC implementation. Dedicated on-board NICs, discrete PCIe NIC cards and server FlexLOM/riser cards, and their firmware, are constantly changing with each version iteration. This means that the systemd project can never control server hardware manufacturers':

1. PCIe implementations and lane allocations to specific slots on the motherboard.
2. Decisions on the number of "on board" chipset NICs (typically RJ45 1GbE, though I'm sure we will soon see 10GbE SFP+ becoming the norm).
3. Default "FlexLOM/riser" cards (can vary from 2 to 4 1GbE ports, or 1 to 2 10GbE SFP+ ports).
4. Port counts (RJ45 and SFP+) that NIC manufacturers put on their cards by default as their model iterations increase.
5. Firmware changes on NIC cards that can affect the order of initialization of ports on the PCIe bus for each CPU.
6. The OEM relationships server manufacturers have with NIC makers, where on-board and FlexLOM NIC chipsets change regularly with each base revision (Broadcom, Realtek, Intel, etc.).

Now in HPC one of the biggest challenges we face is to maximize performance on the increasing number of compute cores we get per socket, and to maximize efficiency and lower latency. A common approach (see attached diagrams for use cases) is to separate data flows into an ingress and egress paradigm. We can then use multi-homed servers with discrete, high-performance PCIe NICs exploiting full-bandwidth 16-lane slots going directly into a processor. Dual-socket servers then allow us to split the compute data flows into reader and writer threads and dedicate a processor, DDR RAM banks, and a NIC card to each thread type. Typically the sweet spot is a dual-socket white-box server where HPC designers in the OS space target interfaces for functional roles:

Processor 0 ->  PCIe Slot 1 (Full 16 lane) => Ingress Threads.
Processor 1 ->  PCIe Slot 4 (Full 16 lane) => Egress Threads.

Now, because of all the issues listed (1-6), we can *never* guarantee which interface device name Linux will allocate to these key NIC ports. And yet we want to create a consistent environment where the application team knows which processor and interface they need to pin their processes to. They need to know this in order to minimize NUMA memory latency and irrelevant NIC interrupts.
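For context, this is the kind of check and pinning the application teams end up doing today. The interface name and binary below are purely illustrative:

    # Which NUMA node does this NIC sit on? (-1 means unknown)
    cat /sys/class/net/ens1f0/device/numa_node

    # Pin the reader threads to that node's CPUs and local memory
    numactl --cpunodebind=0 --membind=0 ./ingress_reader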

How HPC architects try to help sysadmins and application teams with this is through post-build modifications. Here we can use the HWADDR= variable in the ifcfg-[device name] files to move a *specific* device name to these targeted NIC cards and ports. This way application teams can always associate a *specific* device name with a specific functional purpose (Feed, Backbone, Access) and know where to tie their reader and writer threads. We can also standardize that a given interface is always the "default route" interface for a specific server blueprint.
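For illustration, a post-build file of this kind looks roughly like the following (the device name and MAC address are hypothetical):

    # /etc/sysconfig/network-scripts/ifcfg-feed0
    # Bind this config (and name) to the NIC in the ingress PCIe slot
    DEVICE=feed0
    HWADDR=aa:bb:cc:dd:ee:01
    BOOTPROTO=none
    ONBOOT=yes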

It would appear that in RHEL8, due to systemd, HWADDR= is no longer supported, and we have lost this fundamentally important feature.

### REQUIREMENT ###

Sysadmins and HPC designers need a supported way to swap / move kernel-allocated device names around the physical NIC cards and ports to create consistent compute environments. The HWADDR= solution was a rather brutal but effective way of achieving this, and it would appear that it is no longer supported in systemd. A better solution would be support for the user to define unique device names for NIC card interfaces, so they can be more explicit in their naming conventions.


e.g.

Ethernet:     enofeed1.0, enofeed1.1, enoback1.0, enoback1.1, enoaccess1.0, enoaccess1.1
Infiniband:   ibofeed1.0, ibofeed1.1, iboback1.0, iboback1.1, iboaccess1.0, iboaccess1.1
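If something along the lines of systemd's per-interface .link files were the supported vehicle for this, I imagine a definition could look roughly like the sketch below (MAC address hypothetical):

    # /etc/systemd/network/10-enofeed1.0.link
    [Match]
    MACAddress=aa:bb:cc:dd:ee:01

    [Link]
    Name=enofeed1.0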

### THE FUTURE ###
The industry is moving compute *closer to the network*: NIC cards now integrate FPGAs, DDR memory banks, GPUs and many-core processors on the PCB attached to the PCIe slot. The Linux kernel needs to enable sysadmins and HPC architects to create consistent compute environments across heterogeneous server estates.

Who can I discuss these design issues with in the systemd space?

Yours Sincerely
Axel

Attachment: ConsistantNaming3.png
Description: PNG image

Attachment: ConsistantNaming2.png
Description: PNG image

Attachment: ConsistantNaming1.png
Description: PNG image

