On December 26, 2022 10:35:49 PM Stanislav Fomichev <stfomichev(a)yandex.ru>
wrote:
> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev
<sdf(a)google.com> wrote:
> get_func_ip() */
>>> - tstamp_type_access:1; /* Accessed
>>> __sk_buff->tstamp_type */
>>> + tstamp_type_access:1, /* Accessed
>>> __sk_buff->tstamp_type */
>>> + valid_id:1; /* Is bpf_prog::aux::__id valid?
*/
>>> enum bpf_prog_type type; /* Type of BPF program */
>>> enum bpf_attach_type expected_attach_type; /* For some prog types */
>>> u32 len; /* Number of filter blocks */
>>> @@ -1688,6 +1689,12 @@ void bpf_prog_inc(struct bpf_prog *prog);
>>> struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
>>> void bpf_prog_put(struct bpf_prog *prog);
>>>
>>> +static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
>>> +{
>>> + if (WARN(!prog->valid_id, "Attempting to use an invalid eBPF
program"))
>>> + return 0;
>>> + return prog->aux->__id;
>>> +}
>>
>> I'm still missing why we need to have this WARN and have a check at all.
>> IIUC, we're actually too eager in resetting the id to 0, and need to
>> keep that stale id around at least for perf/audit.
>> Why not have a flag only to protect against double-idr_remove
>> bpf_prog_free_id and keep the rest as is?
>> Which places are we concerned about that used to report id=0 but now
>> would report stale id?
>
> What double-idr_remove are you concerned about?
> bpf_prog_by_id() is doing bpf_prog_inc_not_zero
> while __bpf_prog_put just dropped it to zero.
(traveling, sending from an untested setup, hope it reaches everyone)
There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
tries to make offloaded program disappear from the idr when the netdev
goes offline. So I'm assuming that '!prog->aux->id' check in
bpf_prog_free_id
is to handle that case where we do bpf_prog_free_id much earlier than the
rest of the __bpf_prog_put stuff.
> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
> after perf_event_bpf_event and bpf_audit_prog ?
> Probably can remove the obsolete do_idr_lock bool flag as
> separate patch?
+1 on removing do_idr_lock separately.
> Much simpler fix and no code churn.
> Both valid_id and saved_id approaches have flaws.
Given the __bpf_prog_offload_destroy path above, we still probably need
some flag to indicate that the id has been already removed from the idr?
So what do you guys want in a patch? Is there a consensus on what you
would merge to fix this bug/regression?
--
paul-moore.com