Hi Lijiang and Kazu, On Fri, Apr 14, 2023 at 4:16 PM lijiang <lijiang@xxxxxxxxxx> wrote: > > On Fri, Apr 14, 2023 at 3:12 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@xxxxxxx> wrote: >> >> On 2023/04/14 15:34, lijiang wrote: >> > Thank you for sharing your thoughts, Kazu. >> > >> > On Thu, Apr 13, 2023 at 9:06 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@xxxxxxx> >> > wrote: >> > >> >> On 2023/04/12 18:04, lijiang wrote: >> >>>>>> On 2023/03/25 13:12, Tao Liu wrote: >> >>>>>>> Previously crash use lseek/read for kcore and vmcore reading, this >> >>>> involves 2 >> >>>>>>> syscalls. And we can replace them with pread, only 1 syscall is >> >> needed >> >>>> for >> >>>>>>> kcore/vmcore reading, and we can have a better performance. Please >> >>>> note there >> >>>>>>> are plenty of places in crash using lseek/read, this patch doesn't >> >>>> modify all >> >>>>>>> of them, just the most commonly used kcore and diskdump vmcore >> >> reading. >> >>>>>> >> >>>>>> This is nice as an optimization, but can we distinguish a seek error >> >> and >> >>>>>> read error with errno as currently we can do? >> >>>>>> >> >>>>> >> >>>>> Sorry I didn't understand your question... >> >>>>> >> >>>>> The functions which I modified in this patch didn't use a global errno >> >>>>> to distinguish error types, they used a returned value instead. >> >>>>> Currently if read error, READ_ERROR will be returned, same as seek >> >>>>> error returns SEEK_ERROR. Do you want to distinguish pread error by >> >>>>> introducing a new error number as PREAD_ERROR? >> >>>> >> >>>> No, I would like to know which causes an error in pread(), seek or read. >> >>>> In other words, is there any way to know which is wrong, the specified >> >>>> offset or file/media? >> >>>> >> >>>> >> >>> Good questions. >> >>> >> >>> In fact, the SEEK_ERROR is simply returned to the top-level call, but it >> >>> seems that there is no corresponding processing for the type of errors at >> >>> the top-level code. Only saw two cases(please correct me if I missed >> >>> anything): >> >>> [1] readmem() >> >>> ... >> >>> case SEEK_ERROR: >> >>> if (PRINT_ERROR_MESSAGE) >> >>> error(INFO, SEEK_ERRMSG, >> >>> memtype_string(memtype, 0), addr, type); >> >>> goto readmem_error; >> >>> >> >>> case READ_ERROR: >> >>> if (PRINT_ERROR_MESSAGE) >> >>> error(INFO, READ_ERRMSG, >> >>> memtype_string(memtype, 0), addr, type); >> >>> if ((pc->flags & DEVMEM) && (kt->flags & >> >>> PRE_KERNEL_INIT) && >> >>> !(error_handle & NO_DEVMEM_SWITCH) && >> >>> devmem_is_restricted() && >> >>> switch_to_proc_kcore()) { >> >>> error_handle &= ~QUIET; >> >>> return(readmem(addr, memtype, bufptr, >> >> size, >> >>> type, error_handle)); >> >>> } >> >>> goto readmem_error;.. >> >>> >> >>> [2] read_diskdump() >> >>> ... >> >>> >> >>> if ((ret = cache_page(curpaddr)) < 0) { >> >>> if (CRASHDEBUG(8)) >> >>> fprintf(fp, "read_diskdump: " >> >>> "%s: cannot cache page: %llx\n", >> >>> ret == SEEK_ERROR ? >> >>> "SEEK_ERROR" : "READ_ERROR", >> >>> (ulonglong)curpaddr); >> >>> return ret; >> >>> } >> >>> ... >> >>> >> >>> Given that, only recognizing the type of errors, it can not do anything. >> >>> Would it really make sense to distinguish the type of errors(seek/read >> >>> errors)? >> >> >> >> I can see some seek errors on GitHub issues [1] and I don't think that >> >> it's useless at all for the first step of debugging. >> >> >> >> [1] https://github.com/crash-utility/crash/issues?q=is%3Aissue+seek+error >> >> >> >> But according to the man page and some experiments, lseek() allows an >> >> offset beyond the end of the file for normal dumpfiles (a vmcore, >> > >> > >> > For this case, crash tool may leave a hole(gap) in a file after writing any >> > data, but usually crash tool won't change the vmcore data. >> >> hmm, sorry, I might lack words.. >> >> I meant that the check below can return a seek error mostly like lseek() >> does, because lseek() doesn't return an error for positive values. >> >> > >> > >> >> /proc/kcore and etc.) and just does not allow a negative offset. >> >> >> > So, my concern will be mostly solved by adding a negative offset check, >> >> for example, >> >> >> >> if (offset < 0) { >> >> if (CRASHDEBUG(8)) >> >> fprintf(fp, " ... offset: %llx\n", offset); >> >> return SEEK_ERROR; >> >> } >> >> >> >> How about adding something like this, instead of just removing lseek? >> >> >> >> >> > I would tend to replace the lseek()/read() with the pread() here, and print >> > the errno when the pread() failed, that is close to the real error, because >> > pread() may fail and set errno to any error specified for read() or lseek(). >> >> With the reason above, I meant to replace the lseek() and read() with >> the light offset check above and pread(). The check will not affect the >> performance. >> >> And yes, we can also add the errno message. > > > Thank you for the explanation, Kazu. This sounds good to me. > > Could you please improve and post it again as a separate patch based on the above suggestions?Tao. > (Or Kazu can help to merge this patch according to our discussion.) > Thanks for the comments and suggestions, I was busy working on other staff, sorry for the late reply. I will update and send the v2 patch next week. > BTW: for the multi thread patches, I would focus on this issue: Is it possible to make the multi thread code as a common library to serve for startup and other crash commands?(But anyway, still need to be discussed.) Yes I agree it will be better to make multi thread work as a library. However it is not as easy, I will keep investigating the possibility and update the information if any progress is made. Thanks, Tao Liu > > > Thanks. > Lianbo > >> >> > >> > Otherwise, users have to still suffer from significant performance overhead >> > for the following events: >> > >> > 24.89% crash [kernel.vmlinux] [k] syscall_exit_to_user_mode >> > 12.62% crash [kernel.vmlinux] [k] syscall_return_via_sysret >> > 10.76% crash [kernel.vmlinux] [k] entry_SYSCALL_64 >> > 3.43% crash [kernel.vmlinux] [k] read_kcore >> > ... >> > >> > But anyway, this needs more trade-offs, and depends on how we make good >> > choices based on the debugging demands. In other words, which one is more >> > important to users. Actually, the performance issues have been bothering us >> > for a long time. >> > >> > Thanks. >> > Lianbo >> > >> >> >> >>> BTW: Could it be helpful to print an actual error when the pread() fails? >> >>> For example: >> >>> >> >>> + ssize_t prdcnt = pread(fd, bufptr, readcnt, offset); >> >>> + if (prdcnt != readcnt) { >> >>> + if (prdcnt == -1) >> >>> + error(WARNING, "/proc/kcore: %s\n", >> >>> strerror(errno)); >> >>> return READ_ERROR; >> >>> + } >> >> >> >> Yes, it's also sounds good to print errno for better debugging. >> >> I would prefer it with CRASHDEBUG(8) same as the other error messages. >> >> >> >> Thanks, >> >> Kazu >> >> -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/crash-utility Contribution Guidelines: https://github.com/crash-utility/crash/wiki