On Thu, Dec 29, 2022 at 6:13 PM Stanislav Fomichev <sdf(a)google.com> wrote:
On Tue, Dec 27, 2022 at 8:40 AM Paul Moore <paul(a)paul-moore.com> wrote:
>
> On December 26, 2022 10:35:49 PM Stanislav Fomichev <stfomichev(a)yandex.ru>
> wrote:
> >> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf(a)google.com>
wrote:
> >> get_func_ip() */
> >>>> - tstamp_type_access:1; /* Accessed
> >>>> __sk_buff->tstamp_type */
> >>>> + tstamp_type_access:1, /* Accessed
> >>>> __sk_buff->tstamp_type */
> >>>> + valid_id:1; /* Is
bpf_prog::aux::__id valid? */
> >>>> enum bpf_prog_type type; /* Type of BPF program
*/
> >>>> enum bpf_attach_type expected_attach_type; /* For some prog
types */
> >>>> u32 len; /* Number of filter
blocks */
> >>>> @@ -1688,6 +1689,12 @@ void bpf_prog_inc(struct bpf_prog *prog);
> >>>> struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct
bpf_prog *prog);
> >>>> void bpf_prog_put(struct bpf_prog *prog);
> >>>>
> >>>> +static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
> >>>> +{
> >>>> + if (WARN(!prog->valid_id, "Attempting to use an
invalid eBPF program"))
> >>>> + return 0;
> >>>> + return prog->aux->__id;
> >>>> +}
> >>>
> >>> I'm still missing why we need to have this WARN and have a check at
all.
> >>> IIUC, we're actually too eager in resetting the id to 0, and need
to
> >>> keep that stale id around at least for perf/audit.
> >>> Why not have a flag only to protect against double-idr_remove
> >>> bpf_prog_free_id and keep the rest as is?
> >>> Which places are we concerned about that used to report id=0 but now
> >>> would report stale id?
> >>
> >> What double-idr_remove are you concerned about?
> >> bpf_prog_by_id() is doing bpf_prog_inc_not_zero
> >> while __bpf_prog_put just dropped it to zero.
> >
> > (traveling, sending from an untested setup, hope it reaches everyone)
> >
> > There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
> > tries to make offloaded program disappear from the idr when the netdev
> > goes offline. So I'm assuming that '!prog->aux->id' check in
bpf_prog_free_id
> > is to handle that case where we do bpf_prog_free_id much earlier than the
> > rest of the __bpf_prog_put stuff.
> >
> >> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
> >> after perf_event_bpf_event and bpf_audit_prog ?
> >> Probably can remove the obsolete do_idr_lock bool flag as
> >> separate patch?
> >
> > +1 on removing do_idr_lock separately.
> >
> >> Much simpler fix and no code churn.
> >> Both valid_id and saved_id approaches have flaws.
> >
> > Given the __bpf_prog_offload_destroy path above, we still probably need
> > some flag to indicate that the id has been already removed from the idr?
>
> So what do you guys want in a patch? Is there a consensus on what you
> would merge to fix this bug/regression?
Can we try the following?
1. Remove calls to bpf_prog_free_id (and bpf_map_free_id?) from
kernel/bpf/offload.c; that should make it easier to reason about those
'!id' checks
calls? you mean a single call, right?
2. Move bpf_prog_free_id (and bpf_map_free_id?) to happen after
audit/perf in kernel/bpf/syscall.c (there are comments that say "must
be called first", but I don't see why; seems like GET_FD_BY_ID would
correctly return -ENOENT; maybe Martin can chime in, CC'ed him
explicitly)
The comment says that it should be removed from idr
before __bpf_prog_put_noref will proceed to clean up.
3. (optionally) Remove do_idr_lock arguments (all callers are passing
'true')
yes. please.