On 7 July 2022 20:15:07 EEST, Vladimir Oltean <olteanv@xxxxxxxxx> wrote: >Hi Nikolay, > >On Thu, Jul 07, 2022 at 05:08:15PM +0300, Nikolay Aleksandrov wrote: >> On 07/07/2022 00:01, Nikolay Aleksandrov wrote: >> > On 06/07/2022 23:21, Vladimir Oltean wrote: >> >> On Wed, Jul 06, 2022 at 10:38:04PM +0300, Nikolay Aleksandrov wrote: >> [snip] >> > I already said it's ok to add hard configurable limits if they're done properly performance-wise. >> > Any distribution can choose to set some default limits after the option exists. >> > >> >> Just fyi, and to avoid duplicate efforts, I already have patches for global and per-port software >> fdb limits that I'll polish and submit soon (depending on time availability, of course). If I find >> more time I might add per-vlan limits as well to the set. They use embedded netlink attributes >> to config and dump, so we can easily extend them later (e.g. different action on limit hit, limit >> statistics etc). > >So again, to repeat myself, it's nice to have limits on FDB size, but >those won't fix the software bridges that are now out in the open and >can't have their configuration scripts changed. > >I haven't had the time to expand on this in a proper change yet, but I >was thinking more along the lines of adding an OOM handler with >register_oom_notifier() in br_fdb_init(), and on OOM, do something, like >flush the FDB from all bridges. There are going to be complications, it >will schedule switchdev, switchdev is going to allocate memory which >we're low on, the workqueues aren't created with WQ_MEM_RECLAIM, so this >isn't necessarily going to be a silver bullet either. But this is what >concerns me the most, the unconfigured bridge killing the kernel so >easily. As you can see, with an OOM handler I'm not so much trying to >impose a fixed limit on FDB size, but do something sensible such that >the bridge doesn't contribute to the kernel dying. Hi Vladimir, Sounds good to me, the fdb limits have come up multiple times in the past so I decided to finally add them and build from there, with them configured oom shouldn't be hit. These limits have never been present and people are fine (everyone deals with or leaves it), but I'll be happy to review and ack such changes. I hope you can correlate the oom and the bridge fdbs, not just blindly flushing as that can be problematic if you plan to have it enabled by default. Cheers, Nik