Re: [PATCH 5/5] [RFC] Replace lseek/read into pread for kcore and vmcore reading

lijiang <lijiang@xxxxxxxxxx> · Fri, 14 Apr 2023 14:34:59 +0800

Thank you for sharing your thoughts, Kazu.

On Thu, Apr 13, 2023 at 9:06 AM HAGIO KAZUHITO(萩尾　一仁) <k-hagio-ab@xxxxxxx> wrote:
On 2023/04/12 18:04, lijiang wrote:

>>>> On 2023/03/25 13:12, Tao Liu wrote:

>>>>> Previously crash use lseek/read for kcore and vmcore reading, this

>> involves 2

>>>>> syscalls. And we can replace them with pread, only 1 syscall is needed

>> for

>>>>> kcore/vmcore reading, and we can have a better performance. Please

>> note there

>>>>> are plenty of places in crash using lseek/read, this patch doesn't

>> modify all

>>>>> of them, just the most commonly used kcore and diskdump vmcore reading.

>>>>

>>>> This is nice as an optimization, but can we distinguish a seek error and

>>>> read error with errno as currently we can do?

>>>>

>>>

>>> Sorry I didn't understand your question...

>>>

>>> The functions which I modified in this patch didn't use a global errno

>>> to distinguish error types, they used a returned value instead.

>>> Currently if read error, READ_ERROR will be returned, same as seek

>>> error returns SEEK_ERROR. Do you want to distinguish pread error by

>>> introducing a new error number as PREAD_ERROR?

>>

>> No, I would like to know which causes an error in pread(), seek or read.

>> In other words, is there any way to know which is wrong, the specified

>> offset or file/media?

>>

>>

> Good questions.

> 

> In fact, the SEEK_ERROR is simply returned to the top-level call, but it

> seems that there is no corresponding processing for the type of errors at

> the top-level code. Only saw two cases(please correct me if I missed

> anything):

> [1] readmem()

> ...

>                  case SEEK_ERROR:

>                          if (PRINT_ERROR_MESSAGE)

>                                  error(INFO, SEEK_ERRMSG,

> memtype_string(memtype, 0), addr, type);

>                          goto readmem_error;

> 

>                  case READ_ERROR:

>                          if (PRINT_ERROR_MESSAGE)

>                                  error(INFO, READ_ERRMSG,

> memtype_string(memtype, 0), addr, type);

>                          if ((pc->flags & DEVMEM) && (kt->flags &

> PRE_KERNEL_INIT) &&

>                              !(error_handle & NO_DEVMEM_SWITCH) &&

> devmem_is_restricted() &&

>                              switch_to_proc_kcore()) {

>                                  error_handle &= ~QUIET;

>                                  return(readmem(addr, memtype, bufptr, size,

>                                          type, error_handle));

>                          }

>                          goto readmem_error;..

> 

> [2] read_diskdump()

> ...

> 

>                  if ((ret = cache_page(curpaddr)) < 0) {

>                          if (CRASHDEBUG(8))

>                                  fprintf(fp, "read_diskdump: "

>                                      "%s: cannot cache page: %llx\n",

>                                          ret == SEEK_ERROR ?

>                                          "SEEK_ERROR" : "READ_ERROR",

>                                          (ulonglong)curpaddr);

>                          return ret;

>                  }

> ...

> 

> Given that, only recognizing the type of errors, it can not do anything.

> Would it really make sense to distinguish the type of errors(seek/read

> errors)?

I can see some seek errors on GitHub issues [1] and I don't think that 

it's useless at all for the first step of debugging.

[1] https://github.com/crash-utility/crash/issues?q=is%3Aissue+seek+error

But according to the man page and some experiments, lseek() allows an 

offset beyond the end of the file for normal dumpfiles (a vmcore,

For this case, crash tool may leave a hole(gap) in a file after writing any data, but usually crash tool won't change the vmcore data.

/proc/kcore and etc.) and just does not allow a negative offset.

So, my concern will be mostly solved by adding a negative offset check, 

for example,

   if (offset < 0) {

      if (CRASHDEBUG(8))

          fprintf(fp, " ... offset: %llx\n", offset);

      return SEEK_ERROR;

   }

How about adding something like this, instead of just removing lseek?

I would tend to replace the lseek()/read() with the pread() here, and print the errno when the pread() failed, that is close to the real error, because pread() may fail and set errno to any error specified for read() or lseek().

Otherwise, users have to still suffer from significant performance overhead for the following events:

  24.89%  crash    [kernel.vmlinux]      [k] syscall_exit_to_user_mode
  12.62%  crash    [kernel.vmlinux]      [k] syscall_return_via_sysret
  10.76%  crash    [kernel.vmlinux]      [k] entry_SYSCALL_64
    3.43%  crash    [kernel.vmlinux]      [k] read_kcore
  ...

But anyway, this needs more trade-offs, and depends on how we make good choices based on the debugging demands. In other words, which one is more important to users. Actually, the performance issues have been bothering us for a long time.

Thanks.
Lianbo

> 

> BTW: Could it be helpful to print an actual error when the pread() fails?

> For example:

> 

> +       ssize_t prdcnt = pread(fd, bufptr, readcnt, offset);

> +       if (prdcnt != readcnt) {

> +               if (prdcnt == -1)

> +                       error(WARNING, "/proc/kcore: %s\n",

> strerror(errno));

>                  return READ_ERROR;

> +       }

Yes, it's also sounds good to print errno for better debugging.

I would prefer it with CRASHDEBUG(8) same as the other error messages.

Thanks,

Kazu
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/crash-utility
Contribution Guidelines: https://github.com/crash-utility/crash/wiki