Re: [PATCH v2] bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD

Thursday, 29 December 2022

On Tue, Dec 27, 2022 at 8:40 AM Paul Moore <paul(a)paul-moore.com> wrote:
...

 On December 26, 2022 10:35:49 PM Stanislav Fomichev <stfomichev(a)yandex.ru>
 wrote:
 >> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf(a)google.com> wrote:
 >> get_func_ip() */
 >>>> -                               tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
 >>>> +                               tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
 >>>> +                               valid_id:1; /* Is bpf_prog::aux::__id valid? */
 >>>>    enum bpf_prog_type      type;           /* Type of BPF program */
 >>>>    enum bpf_attach_type    expected_attach_type; /* For some prog types */
 >>>>    u32                     len;            /* Number of filter blocks */
 >>>> @@ -1688,6 +1689,12 @@ void bpf_prog_inc(struct bpf_prog *prog);
 >>>> struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
 >>>> void bpf_prog_put(struct bpf_prog *prog);
 >>>>
 >>>> +static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
 >>>> +{
 >>>> +       if (WARN(!prog->valid_id, "Attempting to use an invalid eBPF program"))
 >>>> +               return 0;
 >>>> +       return prog->aux->__id;
 >>>> +}
 >>>
 >>> I'm still missing why we need to have this WARN and have a check at all.
 >>> IIUC, we're actually too eager in resetting the id to 0, and need to
 >>> keep that stale id around at least for perf/audit.
 >>> Why not have a flag only to protect against double-idr_remove
 >>> bpf_prog_free_id and keep the rest as is?
 >>> Which places are we concerned about that used to report id=0 but now
 >>> would report stale id?
 >>
 >> What double-idr_remove are you concerned about?
 >> bpf_prog_by_id() is doing bpf_prog_inc_not_zero
 >> while __bpf_prog_put just dropped it to zero.
 >
 > (traveling, sending from an untested setup, hope it reaches everyone)
 >
 > There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
 > tries to make offloaded program disappear from the idr when the netdev
 > goes offline. So I'm assuming that '!prog->aux->id' check in bpf_prog_free_id
 > is to handle that case where we do bpf_prog_free_id much earlier than the
 > rest of the __bpf_prog_put stuff.
 >
 >> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
 >> after perf_event_bpf_event and bpf_audit_prog ?
 >> Probably can remove the obsolete do_idr_lock bool flag as
 >> separate patch?
 >
 > +1 on removing do_idr_lock separately.
 >
 >> Much simpler fix and no code churn.
 >> Both valid_id and saved_id approaches have flaws.
 >
 > Given the __bpf_prog_offload_destroy path above, we still probably need
 > some flag to indicate that the id has been already removed from the idr?

 So what do you guys want in a patch?  Is there a consensus on what you
 would merge to fix this bug/regression?

Can we try the following?

1. Remove calls to bpf_prog_free_id (and bpf_map_free_id?) from
kernel/bpf/offload.c; that should make it easier to reason about those
'!id' checks
2. Move bpf_prog_free_id (and bpf_map_free_id?) to happen after
audit/perf in kernel/bpf/syscall.c (there are comments that say "must
be called first", but I don't see why; seems like GET_FD_BY_ID would
correctly return -ENOENT; maybe Martin can chime in, CC'ed him
explicitly)
3. (optionally) Remove do_idr_lock arguments (all callers are passing 'true')
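
To make the ordering argument in (2) concrete, here is a small userspace
model, not kernel code. Every toy_* name is made up for illustration, and
the fixed-size array only stands in for the prog idr. It assumes the unload
notification reads the ID at the time it runs, and that a lookup by ID
refuses programs whose refcount has already dropped to zero (the
bpf_prog_inc_not_zero behaviour mentioned earlier in the thread).

```c
#include <stdio.h>
#include <stddef.h>

#define TOY_MAX_PROGS 8

/* Toy stand-in for a BPF program; NOT the kernel's struct bpf_prog. */
struct toy_prog {
	unsigned int id;	/* 0 means "no ID assigned" */
	int refcnt;
};

/* Toy ID table standing in for the prog idr (hypothetical). */
static struct toy_prog *toy_table[TOY_MAX_PROGS];

static unsigned int toy_reported_id;	/* ID carried by the last unload event */
static int toy_lookup_ok_during_report;	/* did a by-ID lookup succeed meanwhile? */

static void toy_alloc_id(struct toy_prog *p, unsigned int id)
{
	p->id = id;
	toy_table[id] = p;
}

static void toy_free_id(struct toy_prog *p)
{
	if (!p->id)		/* already removed, nothing to do */
		return;
	toy_table[p->id] = NULL;
	p->id = 0;
}

/* Lookup by ID; fails once the refcount has dropped to zero. */
static struct toy_prog *toy_get_by_id(unsigned int id)
{
	struct toy_prog *p = id < TOY_MAX_PROGS ? toy_table[id] : NULL;

	if (!p || p->refcnt == 0)
		return NULL;	/* models the -ENOENT case */
	p->refcnt++;
	return p;
}

/* Stand-in for the audit/perf unload notifications. */
static void toy_report_unload(const struct toy_prog *p)
{
	toy_reported_id = p->id;
	toy_lookup_ok_during_report = p->id && toy_get_by_id(p->id) != NULL;
	printf("unload event: prog id=%u\n", p->id);
}

/* Ordering being questioned: free the ID first, then report. */
static void toy_put_free_first(struct toy_prog *p)
{
	if (--p->refcnt > 0)
		return;
	toy_free_id(p);
	toy_report_unload(p);
}

/* Proposed ordering: report first, then free the ID. */
static void toy_put_report_first(struct toy_prog *p)
{
	if (--p->refcnt > 0)
		return;
	toy_report_unload(p);
	toy_free_id(p);
}

/* Run one scenario and return the ID the unload event reported. */
static unsigned int toy_demo(int report_first)
{
	struct toy_prog prog = { .id = 0, .refcnt = 1 };

	toy_alloc_id(&prog, 5);
	if (report_first)
		toy_put_report_first(&prog);
	else
		toy_put_free_first(&prog);
	return toy_reported_id;
}
```

Calling toy_demo(0) and toy_demo(1) from a main() shows the difference: in
this model only the second ordering lets the unload event see the original
ID, and the by-ID lookup still fails while the event is being generated
because the refcount is already zero. Whether the real "must be called
first" comments guard against something this model does not capture is
exactly the open question for Martin.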
