Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3

On 06/15, Stanislav Fomichev wrote:
On 06/15, sdf@xxxxxxxxxx wrote:
> On 06/15, Maciej Żenczykowski wrote:
> > On Wed, Jun 15, 2022 at 10:38 AM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski <maze@xxxxxxxxxx>
> > wrote:
> > > > >
> > > > > I've confirmed vanilla 5.18.0 is broken, and all it takes is
> > > > > cherrypicking that specific stable 5.18.x patch [
> > > > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour.
> > > ...
> > > > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added
> > EFAULT in 5.16
> > >
> > > There are no such sha-s in the upstream kernel.
> > > Sorry we cannot help with debugging of android kernels.
>
> > Yes, sdf@ quoted the wrong sha1, it's a clean cherrypick to an
> > internal branch of
> > 'bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return
> > value'
> > commit b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93.
>
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.16.y&id=b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93
>
> > Anyway, I think it's unrelated - or at least not the immediate root cause.
>
> > Also there's *no* Android kernels involved here.
> > This is the android net tests failing on vanilla 5.18 and passing on
> > 5.18.3.
>
> Yeah, sorry, didn't mean to send those outside :-)
>
> Attached un-android-ified testcase. Passes on bpf-next, trying to see
> what happens on vanilla 5.18. Will update once I get more data..

I've bisected the original issue to:

b44123b4a3dc ("bpf: Add cgroup helpers bpf_{get,set}_retval to get/set
syscall return value")

And I believe it's these two lines from the original patch:

  #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)		\
  	({						\
@@ -1398,10 +1398,12 @@ out:
  		u32 _ret;				\
  		_ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \
  		_cn = _flags & BPF_RET_SET_CN;		\
+		if (_ret && !IS_ERR_VALUE((long)_ret))	\
+			_ret = -EFAULT;	

_ret is a u32 and gets -1 (0xffffffff). IS_ERR_VALUE((long)_ret) returns false in
this case because the cast from u32 to long zero-extends instead of sign-extending,
so internally it ends up doing a 0xffff_ffff >= 0xffff_ffff_ffff_f001 comparison.
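
To make the comparison above concrete, here is a rough userspace sketch. It is not kernel code: fixed-width types stand in for a 64-bit kernel's long / unsigned long, MAX_ERRNO is assumed to be 4095 as in include/linux/err.h, and widen_u32(), ERR_THRESHOLD and self_check() are names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO 4095 /* assumed to match the kernel's definition */

/* Lowest unsigned 64-bit value that an IS_ERR_VALUE()-style check
 * treats as an error (hypothetical name, illustration only). */
static const uint64_t ERR_THRESHOLD = (uint64_t)-(int64_t)MAX_ERRNO;

/* Models (long)_ret for a u32 _ret on a 64-bit build: the upper
 * 32 bits are filled with zeros, not with copies of the sign bit. */
uint64_t widen_u32(uint32_t ret)
{
	return (uint64_t)(int64_t)ret;
}

/* Walks through the numbers quoted in the mail. */
void self_check(void)
{
	assert(ERR_THRESHOLD == 0xfffffffffffff001ULL);
	assert(widen_u32((uint32_t)-1) == 0x00000000ffffffffULL);
	/* 0xffff_ffff is below the threshold, so it is not seen as an error. */
	assert(!(widen_u32((uint32_t)-1) >= ERR_THRESHOLD));
}
```

The point is only that the widened value lands far below the error range, so the check reports "not an error" for what was meant to be -1.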

I'll try to see what I've changed in my unrelated patch to fix it. But I think
we should audit all these IS_ERR_VALUE((long)_ret) regardless; they don't
seem to work the way we want them to...

Ok, and my patch fixes it because I'm replacing 'u32 _ret' with 'int ret'.

So, basically, with u32 _ret we have to do IS_ERR_VALUE((long)(int)_ret).
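
A small userspace sketch of why the type matters for the two added lines. This is an illustration under assumptions, not the kernel macro: is_err_value64(), normalize_u32() and normalize_int() are hypothetical names, int64_t/uint64_t model a 64-bit long, and the uint32_t to int32_t conversion is assumed to wrap the way it does on the usual compilers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095 /* assumed to match the kernel's definition */

/* Userspace stand-in for IS_ERR_VALUE() on a 64-bit build. */
static bool is_err_value64(int64_t x)
{
	return (uint64_t)x >= (uint64_t)-(int64_t)MAX_ERRNO;
}

/* 'u32 _ret' variant: the value is zero-extended before the check,
 * so a stored -EPERM is not recognised and gets replaced. */
int normalize_u32(uint32_t ret)
{
	if (ret && !is_err_value64((int64_t)ret))
		ret = (uint32_t)-EFAULT;
	return (int)(int32_t)ret;
}

/* 'int ret' variant, equivalent to (long)(int)_ret: the value is
 * sign-extended, so a negative errno passes through untouched. */
int normalize_int(int ret)
{
	if (ret && !is_err_value64((int64_t)ret))
		ret = -EFAULT;
	return ret;
}
```

With these helpers, normalize_u32((uint32_t)-EPERM) comes back as -EFAULT while normalize_int(-EPERM) stays -EPERM; both leave 0 alone and both turn a positive non-errno value into -EFAULT.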

Sigh..
