Re: [PATCH] Fix "ps/vm" commands to display the memory usage for exiting tasks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023/08/24 17:29, lijiang wrote:
> On Thu, Aug 24, 2023 at 10:01 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@xxxxxxx>
> wrote:
> 
>> On 2023/08/23 14:44, Lianbo Jiang wrote:
>>> When a task is exiting, usually kernel marks its flags as 'PF_EXITING',
>>> but even so, sometimes the mm_struct has not been freed, it might still
>>> be valid. For such tasks, the "ps/vm" commands won't display the memory
>>> usage. For example:
>>>
>>>     crash> ps 47070
>>>           PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS
>> COMM
>>>         47070       1   0  ffff9ba7c4910000  UN   0.0        0        0
>> ra_ris.parse
>>>     crash> vm 47070
>>>     PID: 47070    TASK: ffff9ba7c4910000  CPU: 0    COMMAND:
>> "ra_ris.parse"
>>>            MM               PGD          RSS    TOTAL_VM
>>>            0                 0            0k       0k
>>>
>>> To be honest, this is a corner case, but it has already occurred in
>>> actual production environments. Given that, let's allow the "ps/vm"
>>> commands to try to display the memory usage for this case, but it does
>>> not guarantee that it can work well at any time, which still depends on
>>> how far the mm_struct deconstruction has proceeded.
>>
>> Agree to display it, and looks like the deconstruction is done after
>> task->mm is set to NULL, so it looks fine to me.
>>
>>
> Thank you for the comments, Kazu.
> 
> 
>> void __noreturn do_exit(long code)
>> {
>> ...
>>           exit_signals(tsk);  /* sets PF_EXITING */
>> ...
>>           exit_mm();
>>
>> static void exit_mm(void)
>> {
>>           struct mm_struct *mm = current->mm;
>> ...
>>           current->mm = NULL;  ## task->mm is set to NULL here
>> ...
>>           mmput(mm);           ## release the resources actually
>>
>>
>> On the other hand, the mm->mm_count is decremented in mmput(), is there
>> need to check it?
>>
>>
> Good question. I had the same thoughts, but finally I chose to double check
> with the mm_count. Not sure how to ensure the memory synchronization can be
> done, when the kernel is panicking.

Ok, it's fine to double check just in case, then ...

> 
> If the address of mm pointer is valid and the mm_struct members are always
> legitimate, we won't need to double check.
> 
> But anyway, this is just my thoughts, maybe it's not correct completely. If
> you do not want to have it, I can post v2 and simply remove
> the IS_EXITING(task) from get_task_mem_usage().
> 
> Thanks.
> Lianbo
> 
> 
>> void mmput(struct mm_struct *mm)
>> {
>>           might_sleep();
>>
>>           if (atomic_dec_and_test(&mm->mm_users))
>>                   __mmput(mm);
>> }
>>
>> static inline void __mmput(struct mm_struct *mm)
>> {
>> ...
>>           exit_mmap(mm);
>> ...
>>           mmdrop(mm);
>> }
>>
>> static inline void mmdrop(struct mm_struct *mm)
>> {
>> ...
>>           if (unlikely(atomic_dec_and_test(&mm->mm_count)))
>>                   __mmdrop(mm);
>> }
>>
>> Thanks,
>> Kazu
>>
>>>
>>> With the patch:
>>>     crash> ps 47070
>>>           PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS
>> COMM
>>>         47070       1   0  ffff9ba7c4910000  UN  90.8 38461228 31426444
>> ra_ris.parse
>>>     crash> vm 47070
>>>     PID: 47070    TASK: ffff9ba7c4910000  CPU: 0    COMMAND:
>> "ra_ris.parse"
>>>            MM               PGD          RSS    TOTAL_VM
>>>     ffff9bad6e873840  ffff9baee0544000  31426444k  38461228k
>>>           VMA           START       END     FLAGS FILE
>>>     ffff9bafdbe1d6c8     400000     8c5000 8000875
>> /data1/rishome/ra_cu_cn_412/sbin/ra_ris.parse
>>>     ...
>>>
>>> Reported-by: Buland Kumar Singh <bsingh@xxxxxxxxxx>
>>> Signed-off-by: Lianbo Jiang <lijiang@xxxxxxxxxx>
>>> ---
>>>    memory.c | 13 +++++++++++--
>>>    1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/memory.c b/memory.c
>>> index 5d76c5d7fe6f..7d59c0555a0e 100644
>>> --- a/memory.c
>>> +++ b/memory.c
>>> @@ -4792,10 +4792,12 @@ get_task_mem_usage(ulong task, struct
>> task_mem_usage *tm)
>>>    {
>>>        struct task_context *tc;
>>>        long rss = 0, rss_cache = 0;
>>> +     int mm_count = 0;
>>> +     ulong addr;
>>>
>>>        BZERO(tm, sizeof(struct task_mem_usage));
>>>
>>> -     if (IS_ZOMBIE(task) || IS_EXITING(task))
>>> +     if (IS_ZOMBIE(task))
>>>                return;
>>>
>>>        tc = task_to_context(task);
>>> @@ -4805,7 +4807,14 @@ get_task_mem_usage(ulong task, struct
>> task_mem_usage *tm)
>>>
>>>        tm->mm_struct_addr = tc->mm_struct;
>>>
>>> -     if (!task_mm(task, TRUE))
>>> +     if (!(addr = task_mm(task, TRUE)))
>>> +             return;
>>> +
>>> +     if (!readmem(addr + OFFSET(mm_struct_mm_count), KVADDR, &mm_count,
>>> +             sizeof(int), "mm_struct mm_count", RETURN_ON_ERROR))
>>> +             return;

tt->mm_struct is filled in task_mm(), I think we can use this:

   mm_count = INT(tt->mm_struct + OFFSET(mm_struct_mm_count));

The following code in get_task_mem_usage() also use this.

Thanks,
Kazu
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/crash-utility
Contribution Guidelines: https://github.com/crash-utility/crash/wiki




[Index of Archives]     [Fedora Development]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]     [Fedora Tools]

 

Powered by Linux