Re: [PATCH 5/5] [RFC] Replace lseek/read into pread for kcore and vmcore reading

HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@xxxxxxx> · Fri, 14 Apr 2023 07:12:49 +0000

On 2023/04/14 15:34, lijiang wrote:
> Thank you for sharing your thoughts, Kazu.
> 
> On Thu, Apr 13, 2023 at 9:06 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@xxxxxxx>
> wrote:
> 
>> On 2023/04/12 18:04, lijiang wrote:
>>>>>> On 2023/03/25 13:12, Tao Liu wrote:
>>>>>>> Previously crash use lseek/read for kcore and vmcore reading, this
>>>> involves 2
>>>>>>> syscalls. And we can replace them with pread, only 1 syscall is
>> needed
>>>> for
>>>>>>> kcore/vmcore reading, and we can have a better performance. Please
>>>> note there
>>>>>>> are plenty of places in crash using lseek/read, this patch doesn't
>>>> modify all
>>>>>>> of them, just the most commonly used kcore and diskdump vmcore
>> reading.
>>>>>>
>>>>>> This is nice as an optimization, but can we distinguish a seek error
>> and
>>>>>> read error with errno as currently we can do?
>>>>>>
>>>>>
>>>>> Sorry I didn't understand your question...
>>>>>
>>>>> The functions which I modified in this patch didn't use a global errno
>>>>> to distinguish error types, they used a returned value instead.
>>>>> Currently if read error, READ_ERROR will be returned, same as seek
>>>>> error returns SEEK_ERROR. Do you want to distinguish pread error by
>>>>> introducing a new error number as PREAD_ERROR?
>>>>
>>>> No, I would like to know which causes an error in pread(), seek or read.
>>>> In other words, is there any way to know which is wrong, the specified
>>>> offset or file/media?
>>>>
>>>>
>>> Good questions.
>>>
>>> In fact, the SEEK_ERROR is simply returned to the top-level call, but it
>>> seems that there is no corresponding processing for the type of errors at
>>> the top-level code. Only saw two cases(please correct me if I missed
>>> anything):
>>> [1] readmem()
>>> ...
>>>                   case SEEK_ERROR:
>>>                           if (PRINT_ERROR_MESSAGE)
>>>                                   error(INFO, SEEK_ERRMSG,
>>> memtype_string(memtype, 0), addr, type);
>>>                           goto readmem_error;
>>>
>>>                   case READ_ERROR:
>>>                           if (PRINT_ERROR_MESSAGE)
>>>                                   error(INFO, READ_ERRMSG,
>>> memtype_string(memtype, 0), addr, type);
>>>                           if ((pc->flags & DEVMEM) && (kt->flags &
>>> PRE_KERNEL_INIT) &&
>>>                               !(error_handle & NO_DEVMEM_SWITCH) &&
>>> devmem_is_restricted() &&
>>>                               switch_to_proc_kcore()) {
>>>                                   error_handle &= ~QUIET;
>>>                                   return(readmem(addr, memtype, bufptr,
>> size,
>>>                                           type, error_handle));
>>>                           }
>>>                           goto readmem_error;..
>>>
>>> [2] read_diskdump()
>>> ...
>>>
>>>                   if ((ret = cache_page(curpaddr)) < 0) {
>>>                           if (CRASHDEBUG(8))
>>>                                   fprintf(fp, "read_diskdump: "
>>>                                       "%s: cannot cache page: %llx\n",
>>>                                           ret == SEEK_ERROR ?
>>>                                           "SEEK_ERROR" : "READ_ERROR",
>>>                                           (ulonglong)curpaddr);
>>>                           return ret;
>>>                   }
>>> ...
>>>
>>> Given that, only recognizing the type of errors, it can not do anything.
>>> Would it really make sense to distinguish the type of errors(seek/read
>>> errors)?
>>
>> I can see some seek errors on GitHub issues [1] and I don't think that
>> it's useless at all for the first step of debugging.
>>
>> [1] https://github.com/crash-utility/crash/issues?q=is%3Aissue+seek+error
>>
>> But according to the man page and some experiments, lseek() allows an
>> offset beyond the end of the file for normal dumpfiles (a vmcore,
> 
> 
> For this case, crash tool may leave a hole(gap) in a file after writing any
> data, but usually crash tool won't change the vmcore data.

hmm, sorry, I might lack words..

I meant that the check below can return a seek error mostly like lseek() 
does, because lseek() doesn't return an error for positive values.

> 
> 
>> /proc/kcore and etc.) and just does not allow a negative offset.
>>
> So, my concern will be mostly solved by adding a negative offset check,
>> for example,
>>
>>     if (offset < 0) {
>>        if (CRASHDEBUG(8))
>>            fprintf(fp, " ... offset: %llx\n", offset);
>>        return SEEK_ERROR;
>>     }
>>
>> How about adding something like this, instead of just removing lseek?
>>
>>
> I would tend to replace the lseek()/read() with the pread() here, and print
> the errno when the pread() failed, that is close to the real error, because
> pread() may fail and set errno to any error specified for read() or lseek().

With the reason above, I meant to replace the lseek() and read() with 
the light offset check above and pread().  The check will not affect the 
performance.

And yes, we can also add the errno message.

Thanks,
Kazu

> 
> Otherwise, users have to still suffer from significant performance overhead
> for the following events:
> 
>    24.89%  crash    [kernel.vmlinux]      [k] syscall_exit_to_user_mode
>    12.62%  crash    [kernel.vmlinux]      [k] syscall_return_via_sysret
>    10.76%  crash    [kernel.vmlinux]      [k] entry_SYSCALL_64
>      3.43%  crash    [kernel.vmlinux]      [k] read_kcore
>    ...
> 
> But anyway, this needs more trade-offs, and depends on how we make good
> choices based on the debugging demands. In other words, which one is more
> important to users. Actually, the performance issues have been bothering us
> for a long time.
> 
> Thanks.
> Lianbo
> 
>>
>>> BTW: Could it be helpful to print an actual error when the pread() fails?
>>> For example:
>>>
>>> +       ssize_t prdcnt = pread(fd, bufptr, readcnt, offset);
>>> +       if (prdcnt != readcnt) {
>>> +               if (prdcnt == -1)
>>> +                       error(WARNING, "/proc/kcore: %s\n",
>>> strerror(errno));
>>>                   return READ_ERROR;
>>> +       }
>>
>> Yes, it's also sounds good to print errno for better debugging.
>> I would prefer it with CRASHDEBUG(8) same as the other error messages.
>>
>> Thanks,
>> Kazu
>>
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/crash-utility
Contribution Guidelines: https://github.com/crash-utility/crash/wiki