On Sat, Jun 1, 2024 at 11:57 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: > > > > Yafang Shao <laoar.shao@xxxxxxxxx> writes: > > > > > Quoted from Linus [0]: > > > > > > Since user space can randomly change their names anyway, using locking > > > was always wrong for readers (for writers it probably does make sense > > > to have some lock - although practically speaking nobody cares there > > > either, but at least for a writer some kind of race could have > > > long-term mixed results > > > > Ugh. > > Ick. > > > > This code is buggy. > > > > I won't argue that Linus is wrong, about removing the > > task_lock. > > > > Unfortunately strscpy_pad does not work properly with the > > task_lock removed, and buf_size larger that TASK_COMM_LEN. > > There is a race that will allow reading past the end > > of tsk->comm, if we read while tsk->common is being > > updated. > > It appears so. Thanks for pointing it out. Additionally, other code, > such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad() > directly without the task_lock. It seems we should change that as > well. Hmm. What race do you see? If lock is removed from __get_task_comm() it probably can be removed from __set_task_comm() as well. And both are calling strscpy_pad to write and read comm. So I don't see how it would read past sizeof(comm), because 'buf' passed into __set_task_comm is NUL-terminated. So the concurrent read will find it. > > > > So __get_task_comm needs to look something like: > > > > char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk) > > { > > size_t len = buf_size; > > if (len > TASK_COMM_LEN) > > len = TASK_COMM_LEN; > > memcpy(buf, tsk->comm, len); > > buf[len -1] = '\0'; > > return buf; > > } > > Thanks for your suggestion. > > > > > What shows up in buf past the '\0' is not guaranteed in the above > > version but I would be surprised if anyone cares. > > I believe we pad it to prevent the leakage of kernel data. In this > case, since no kernel data will be leaked, the following change may be > unnecessary. It's not about leaking of kernel data, but more about not writing garbage past NUL. Because comm[] is a part of some record that is used as a key in a hash map.