Re: [PATCH] bpf: Unregister fentry when bpf_trampoline_unlink_prog fails to update image

Chen Zhongjin <chenzhongjin@xxxxxxxxxx> · Thu, 27 Apr 2023 09:40:50 +0800

On 2023/4/27 2:17, Yonghong Song wrote:


On 4/26/23 2:55 AM, Chen Zhongjin wrote:
In bpf_link_free, bpf trampoline will update the image and remove the
unlinked prog.

bpf_trampoline_unlink_prog is committed as 'never fail', however it 
depends
on the result of image update. It is possible to fail if memory 
allocation
fail in bpf_trampoline_update.

Could you give more details which memory allocation fail here?
bpf_tramp_image_alloc()? Are you using some error injection or
this happens in your production workload?

I guess it's an error injection because syzkaller reported this.


The error result of bpf_trampoline_update can't be passed to 
bpf_link_free
because link release callback returns void. Then it will free the prog
whether image updating is successful or not.
If the old image tries to call a freed prog, it makes kernel panic.

     BUG: unable to handle page fault for address: ffffffffc04a8d20
     #PF: supervisor instruction fetch in kernel mode
     #PF: error_code(0x0010) - not-present page
     RIP: 0010:0xffffffffc04a8d20
     Code: Unable to access opcode bytes at RIP 0xffffffffc04a8cf6.
     ...
     Call Trace:
     ? bpf_trampoline_78223_0
     bpf_traced_function
     ...

Could you explain how 'the old image tries to call a freed prog'?
IIUC, the previous bpf_link_free() should not be available to
call the bpf prog, right?

What I mean here is, if failed to update the image, the image keeps 
unchanged but the unlinked prog will be freed later.
Next time when it enter the trampoline the image will call freed prog.


Fix this when bpf_trampoline_update failed in bpf_trampoline_unlink_prog,
unregister fentry to disable the trampoline. Then other progs on the
trampoline can be unlinked safely and finally the trampoline will be
released.

Do we still leak tr->cur_image here?

No, bpf_tramp_image_put() will free everything when all progs_cnt 
decline to zero in bpf_trampoline_update(). It is a release function, 
but called 'put'.


Fixes: 88fd9e5352fe ("bpf: Refactor trampoline update code")

If the above is a refactoring patch, you should not use that
as 'Fixes' patch, you should pick one truely introduced the issue.

Signed-off-by: Chen Zhongjin <chenzhongjin@xxxxxxxxxx>
---
  kernel/bpf/trampoline.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index d0ed7d6f5eec..6daa93b30e81 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -604,7 +604,10 @@ static int __bpf_trampoline_unlink_prog(struct 
bpf_tramp_link *link, struct bpf_
      }
      hlist_del_init(&link->tramp_hlist);
      tr->progs_cnt[kind]--;
-    return bpf_trampoline_update(tr, true /* lock_direct_mutex */);
+    err =  bpf_trampoline_update(tr, true /* lock_direct_mutex */);
+    if (err && tr->cur_image)
+        unregister_fentry(tr, tr->cur_image->image);

If error happens for the all subsequent programs,
unregister_fentry() will be called multiple times. Any side effect?
It will fail with no side effect. Actually if there is no error, 
modify_fentry() will fail in update() as well. The fentry is available 
until all progs are unlinked and the broken image is freed by 
bpf_tramp_image_put().

However with an extra state to record this happens, it's possible to 
re-register the fentry with new image when the next link/unlink calls 
update(). It will generate a new image and replace/free the error one.

Overall, I think this is an extreme corner case which happens
when kernel memory is extreme tight. If this is the case, not
sure whether it is worthwhile to fix it or not.

Yes, it's a really rare case. I'm just not sure whether it needs some 
best-effort to avoid kernel panic at this point.

If you think it's not necessary. Just let it go.

Thanks for your time!

Best,
Chen
+    return err;
  }
  /* bpf_trampoline_unlink_prog() should never fail. */