On 14/02/20, Oleg Nesterov wrote:
On 01/23, Richard Guy Briggs wrote:
>
> task->tgid is an error prone construct and results in duplicate maintenance.
> Start it's demise by modifying task_tgid_nr to not use it.
Well, I disagree.
Yes I agree that ->tgid should probably die. But this change itself doesn't
help, it only makes task_tgid_nr() slower. We need to convert other users
first, then consider this change along with ->tgid removal.
I thought I recently saw a statistic that there were only 7 instances of
->tgid, but a quick search comes up with >50 outside of audit!
Besides, this is not that simple and the patch doesn't look
right:
> static inline pid_t task_tgid_nr(struct task_struct *tsk)
> {
> - return tsk->tgid;
> + return pid_nr(task_tgid(tsk));
> }
And what protect task_tgid? This is racy.
The race is very unlikely, pid_nr() will likely hit pid == NULL if tsk
exits. But still it can use the freed/unmapped/reused memory.
So at the very least, I'd need
static inline pid_t task_tgid_nr(struct task_struct *tsk)
{
return pid_nr(is_alive(tsk) ? task_tgid(tsk) : NULL);
}
And it sounds like I might even need this since the status of is_alive
could change between the time I call it and the time I call task_tgid:
static inline pid_t task_tgid_nr(struct task_struct *tsk)
{
pid_t pid;
task_lock(&tsk);
pid = pid_nr(is_alive(tsk) ? task_tgid(tsk) : NULL);
task_unlock(&tsk);
return pid;
}
Is task_lock() sufficient, or do I need the heavier
read_lock(&tasklist_lock)?
And even if we add rcu_read_lock() the patch will add the semantics
change,
task_tgid_nr() can return 0 if tsk has already exited. At least this should
be documented, but you also need to audit the users.
Basically check for an inline error return of 0 signalling a failure
rather than the idle task.
Oleg.
- RGB
--
Richard Guy Briggs <rbriggs(a)redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545