On Mon, Dec 26, 2022 at 7:35 PM Stanislav Fomichev <stfomichev(a)yandex.ru> wrote:
> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf(a)google.com> wrote:
> get_func_ip() */
> > > - tstamp_type_access:1; /* Accessed
__sk_buff->tstamp_type */
> > > + tstamp_type_access:1, /* Accessed
__sk_buff->tstamp_type */
> > > + valid_id:1; /* Is bpf_prog::aux::__id
valid? */
> > > enum bpf_prog_type type; /* Type of BPF program */
> > > enum bpf_attach_type expected_attach_type; /* For some prog
types */
> > > u32 len; /* Number of filter blocks
*/
> > > @@ -1688,6 +1689,12 @@ void bpf_prog_inc(struct bpf_prog *prog);
> > > struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog
*prog);
> > > void bpf_prog_put(struct bpf_prog *prog);
> > >
> > > +static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
> > > +{
> > > + if (WARN(!prog->valid_id, "Attempting to use an invalid
eBPF program"))
> > > + return 0;
> > > + return prog->aux->__id;
> > > +}
> >
> > I'm still missing why we need to have this WARN and have a check at all.
> > IIUC, we're actually too eager in resetting the id to 0, and need to
> > keep that stale id around at least for perf/audit.
> > Why not have a flag only to protect against double-idr_remove
> > bpf_prog_free_id and keep the rest as is?
> > Which places are we concerned about that used to report id=0 but now
> > would report stale id?
>
> What double-idr_remove are you concerned about?
> bpf_prog_by_id() is doing bpf_prog_inc_not_zero
> while __bpf_prog_put just dropped it to zero.
(traveling, sending from an untested setup, hope it reaches everyone)
There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
tries to make offloaded program disappear from the idr when the netdev
goes offline. So I'm assuming that '!prog->aux->id' check in
bpf_prog_free_id
is to handle that case where we do bpf_prog_free_id much earlier than the
rest of the __bpf_prog_put stuff.
That remove was done in
commit ad8ad79f4f60 ("bpf: offload: free program id when device disappears")
Back in 2017 there was no bpf audit and no
PERF_BPF_EVENT_PROG_LOAD/UNLOAD events.
The removal of id made sense back then to avoid showing this
'useless' orphaned offloaded prog in 'bpftool prog show',
but with addition of perf load/unload and audit it was no longer
correct to zero out ID in a prog which refcnt is still not zero.
So we should simply remove bpf_prog_free_id from __bpf_prog_offload_destroy.
There won't be any adverse effect other than bpftool prog show
will show orphaned progs.
> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
> after perf_event_bpf_event and bpf_audit_prog ?
> Probably can remove the obsolete do_idr_lock bool flag as
> separate patch?
+1 on removing do_idr_lock separately.
> Much simpler fix and no code churn.
> Both valid_id and saved_id approaches have flaws.
Given the __bpf_prog_offload_destroy path above, we still probably need
some flag to indicate that the id has been already removed from the idr?
No. ID should be valid until prog went through perf and audit unload
events. Don't know about audit, but for perf it's essential to have
valid ID otherwise perf record will not be able to properly associate events.