On 15.06.2024 16:58, Mikhail Morfikov wrote:
On 15/06/2024 2.27 pm, Andrei Borzenkov wrote:
On 15.06.2024 14:02, Mikhail Morfikov wrote:
But there are no curl PIDs in /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/cgroup.procs .
To be more specific, there are no PIDs at all in this cgroup.procs file. The curl PIDs are under:
# cat /sys/fs/cgroup/morfikownia/user/curl/pids.current
1
# cat /sys/fs/cgroup/morfikownia/user/curl/cgroup.procs
44907
And this cgroup path (morfikownia/user/curl/) is permitted in nftables, and
yet packets sometimes show up as if they had the
user.slice/user-1000.slice/user@1000.service/ path set. Why?
Because curl starts in this hierarchy and attempts a network connection before your daemon moves curl into a different cgroup. This is as good a stab in the dark as any other.
No, it's not like this. When curl attempts to access the internet, it sends
a SYN packet, which is dropped in nftables because of the wrong cgroup path.
If what you say were true, then the next (or any later) SYN packet would be
accepted, since the PID is now in the right cgroup path, which is permitted in
nftables.
But when I watch the nftables logs, I see something like this:
Jun 15 15:30:57 morfikownia kernel: * cgroup * IN= OUT=bond0 SRC=192.168.1.150 DST=212.77.98.9 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52657 DF PROTO=TCP SPT=41760 DPT=80 SEQ=3391855235 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020405B40402080A96453BC0000000000103030E) UID=1000 GID=1000
Jun 15 15:30:59 morfikownia kernel: * cgroup * IN= OUT=bond0 SRC=192.168.1.150 DST=212.77.98.9 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52658 DF PROTO=TCP SPT=41760 DPT=80 SEQ=3391855235 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020405B40402080A96453FCB000000000103030E) UID=1000 GID=1000
Jun 15 15:31:00 morfikownia kernel: * cgroup * IN= OUT=bond0 SRC=192.168.1.150 DST=212.77.98.9 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52659 DF PROTO=TCP SPT=41760 DPT=80 SEQ=3391855235 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020405B40402080A964543CB000000000103030E) UID=1000 GID=1000
Jun 15 15:31:01 morfikownia kernel: * cgroup * IN= OUT=bond0 SRC=192.168.1.150 DST=212.77.98.9 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52660 DF PROTO=TCP SPT=41760 DPT=80 SEQ=3391855235 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020405B40402080A964547CB000000000103030E) UID=1000 GID=1000
Jun 15 15:31:02 morfikownia kernel: * cgroup * IN= OUT=bond0 SRC=192.168.1.150 DST=212.77.98.9 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52661 DF PROTO=TCP SPT=41760 DPT=80 SEQ=3391855235 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020405B40402080A96454BCB000000000103030E) UID=1000 GID=1000
Jun 15 15:31:03 morfikownia kernel: * cgroup * IN= OUT=bond0 SRC=192.168.1.150 DST=212.77.98.9 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52662 DF PROTO=TCP SPT=41760 DPT=80 SEQ=3391855235 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020405B40402080A96454FCB000000000103030E) UID=1000 GID=1000
Jun 15 15:31:05 morfikownia kernel: * cgroup * IN= OUT=bond0 SRC=192.168.1.150 DST=212.77.98.9 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52663 DF PROTO=TCP SPT=41760 DPT=80 SEQ=3391855235 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020405B40402080A964557CB000000000103030E) UID=1000 GID=1000
Jun 15 15:31:09 morfikownia kernel: * cgroup * IN= OUT=bond0 SRC=192.168.1.150 DST=212.77.98.9 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52664 DF PROTO=TCP SPT=41760 DPT=80 SEQ=3391855235 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020405B40402080A9645678B000000000103030E) UID=1000 GID=1000
Jun 15 15:31:17 morfikownia kernel: * cgroup * IN= OUT=bond0 SRC=192.168.1.150 DST=212.77.98.9 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52665 DF PROTO=TCP SPT=41760 DPT=80 SEQ=3391855235 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020405B40402080A964588CB000000000103030E) UID=1000 GID=1000
Pay attention to the timestamps. All the packets come from the same curl
connection. So it begins at 15:30:57 and ends at 15:31:17 (a 20s window),
after which I pressed Ctrl+C, because it was not going to work.
So the PID is in the right cgroup path for sure before sending the SYN packets.
If only the very first SYN packet were dropped, the theory about the app
accessing the network before cgrulesengd moves the PID would make sense. But we
have 20s, the PID is in the right cgroup, and curl is sometimes able to access
the network and sometimes not. And that's weird.
Not really. nftables checks the *socket* cgroup, not the *process*
cgroup. The socket may have been created while the process was still in
the old cgroup.
I do not know whether the kernel also attempts to move all of a process's
sockets to the new cgroup. I suspect not, but that is most certainly a
question for the kernel folks.
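
To compare against what nftables logs, the process side can be read from
/proc/<pid>/cgroup; on a cgroup v2 (unified) hierarchy the relevant entry
starts with "0::". A minimal sketch follows. The helper name v2_path is made up
for illustration, and it reports only the cgroup of the process, which, as
said above, is not necessarily the cgroup recorded on a socket that the
process already holds.

```shell
# v2_path: hypothetical helper, not an existing tool.
# Reads /proc/<pid>/cgroup content on stdin and prints the cgroup v2
# path, i.e. the part after "0::" on the unified-hierarchy line.
v2_path() {
    sed -n 's/^0:://p'
}

# Usage (44907 is the curl PID quoted earlier in this thread):
#   v2_path < /proc/44907/cgroup
```

If this prints /morfikownia/user/curl while the firewall still logs the
user@1000.service path, the mismatch is on the socket side, not the process side.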
See my other response about atomically placing a process into some
pre-existing cgroup from the very beginning.
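
One commonly used way to get that effect without relying on a daemon (not
necessarily the approach from that other response) is a small wrapper that
writes its own PID into the target cgroup.procs and only then execs the
program, so every socket is created while the process is already in the target
cgroup. This is a sketch under assumptions: run_in_cgroup is a hypothetical
name, the cgroup must already exist, and the caller needs sufficient permission
to write to its cgroup.procs.

```shell
# run_in_cgroup: hypothetical wrapper, not an existing tool.
# $1 = cgroup directory, remaining arguments = command to run.
run_in_cgroup() {
    cgdir=$1
    shift
    # The inner shell joins the cgroup first; exec then replaces that
    # shell with the command, which keeps the same PID and cgroup.
    sh -c 'echo $$ > "$0/cgroup.procs" && exec "$@"' "$cgdir" "$@"
}

# Usage ($url stands for whatever is being fetched):
#   run_in_cgroup /sys/fs/cgroup/morfikownia/user/curl curl "$url"
```

Because the move happens before curl itself starts, there is no window in
which curl could open a socket from the old cgroup.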
It looks like the cgroup path isn't updated for some reason -- that's my blind
guess, because the pid is in the right place, the nftables rule works, and yet
the cgroup path "internally somewhere" is user.slice/user-1000.slice/user@1000.service/
instead of the right one, where the pid was moved. I bet there's a bug somewhere.