On October 30, 2020 at 14:29, HAGIO KAZUHITO(萩尾 一仁) wrote:
> -----Original Message-----
>> On October 28, 2020 at 16:32, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>> Hi Julien,
>>>
>>> sorry for my delayed reply.
>>>
>>> -----Original Message-----
>>>>>>>>> A user might want to know how much space a vmcore file will take on
>>>>>>>>> the system and how much space on their disk should be available to
>>>>>>>>> save it during a crash.
>>>>>>>>>
>>>>>>>>> The option --vmcore-size does not create the vmcore file but provides
>>>>>>>>> an estimation of the size of the final vmcore file created with the
>>>>>>>>> same makedumpfile options.
>>>>>
>>>>> Interesting. Do you have any actual use case? e.g. used by kdumpctl?
>>>>> or use it in kdump initramfs?
>>>>>
>>>>
>>>> Yes, the idea would be to use this in mkdumprd to have a more accurate
>>>> estimate of the dump size (currently it cannot take compression into
>>>> account and warns about potential lack of space, considering the system
>>>> memory size as a whole).
>>>
>>> Hmm, I'm not sure how you are going to implement it in mkdumprd, but I do not
>>> recommend that you use it to determine how much disk space should be
>>> allocated for crash dump, because I think that:
>>>
>>> - It cannot estimate the dump size when a real crash occurs, e.g. if slab
>>> explodes with non-zero data, almost all memory will be captured by makedumpfile
>>
>> I agree with you, but this could be rare? If so, I'm not sure it is worth
>> thinking more about the rare situations.
>
> Cases where a dumpfile is inflated with -d 31 might be rare, but if users
> need user data, e.g. for gcore, underestimation will occur easily.
>
Yes, that's true.
>>
>>> even with -d 31, and compression ratio varies with data in memory.
>>
>> Indeed.
>>
>>> Also, in most cases, mkdumprd runs at boot time or construction phase
>>> with less memory usage, not at usual application running time. So it
>>> can underestimate the needed size easily.
>>>
>> If the administrator can monitor the estimated size periodically, maybe it
>> won't be a problem?
>
> I think most of them cannot or do not do that, and even if they could,
> when a panic is caused by an unknown problem, can you depend on that estimation?
>
This requires the user to evaluate the risk. The tool only provides a reference
value at a certain point in time, and reminds users of such risks.
>>
>>> - The system might need a full vmcore and need to change makedumpfile's
>>> dump level for an issue in the future. But many systems cannot change
>>> their disk space allocation easily. So we should prevent users from
>>> allocating only the minimum disk space for crash dump.
>>>
>>> So, the following is from mkdumprd on Fedora 32; personally I think this
>>> is good for now.
>>>
>>>     if [ $avail -lt $memtotal ]; then
>>>         echo "Warning: There might not be enough space to save a vmcore."
>>>         echo "         The size of $2 should be greater than $memtotal kilo bytes."
>>>     fi
>>>
>> Currently, some users are complaining that mkdumprd overestimates the needed size,
>> and most vmcores are significantly smaller than the size of system memory.
>>
>> Furthermore, in most cases, the system memory will not be completely exhausted, but
>> that still depends on how the memory is used in the system, for example:
>> [1] running a memory stress test
>> [2] an application that occupies a large amount of memory and never releases it.
>>
>> The above two cases may be rare.
>
> I've seen and worked on thousands of support cases; memory is exhausted
> easily and unexpectedly. Especially nowadays I often see panics caused by
> vm.panic_on_oom.
>
>> Therefore, can we find a compromise
>> between the size of vmcore and system memory so that makedumpfile can estimate the
>> size of vmcore more accurately?
>>
>> And finally, mkdumprd can use the estimated size of vmcore instead of system memory (memtotal)
>> to determine if the target disk has enough space to store vmcore.
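For reference, the Fedora 32 check quoted above compares the free space on the dump target ($avail) against MemTotal ($memtotal), both in kilobytes. Below is a minimal C sketch of the same comparison, assuming free space is read with POSIX statvfs(); the names avail_kb() and check_vmcore_space() are hypothetical and belong to neither mkdumprd nor makedumpfile. Passing MemTotal as needed_kb keeps the current conservative behaviour, while passing a makedumpfile estimate gives the smaller but riskier threshold debated in this thread.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <stdio.h>
#include <sys/statvfs.h>

/* Hypothetical helper: free space in KiB available to unprivileged
 * users on the filesystem holding `path`, or -1 on error. */
static int64_t avail_kb(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
    /* f_bavail is counted in units of f_frsize bytes. */
    return (int64_t)((uint64_t)sv.f_bavail * sv.f_frsize / 1024);
}

/* Hypothetical check mirroring `[ $avail -lt $memtotal ]`.
 * needed_kb may be MemTotal (conservative) or a size estimate.
 * Returns 1 if there is enough space, 0 if not (after warning),
 * or -1 if the free space could not be determined. */
static int check_vmcore_space(const char *path, int64_t needed_kb)
{
    int64_t avail = avail_kb(path);

    if (avail < 0)
        return -1;
    if (avail < needed_kb) {
        fprintf(stderr,
                "Warning: There might not be enough space to save a vmcore.\n"
                "         The size of %s should be greater than %lld kilo bytes.\n",
                path, (long long)needed_kb);
        return 0;
    }
    return 1;
}
```

As in mkdumprd, a shortfall only produces a warning here; the caller still decides whether to proceed.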
>
> The current mkdumprd just warns about the possibility of a lack of space;
> it doesn't fail. I think this is a good balance.
>
> Users can choose the estimated size over the whole memory size at
> their discretion. Providing a useful estimation tool for them
> might be good.
>
> But, if we do so, we should let users know the tradeoff between the
> disk space and the risk of failure. So I believe that we should
> continue to warn about the possibility of failing to capture a vmcore
> with less space than the whole memory.
>
Our understanding is consistent on this issue. Maybe we could have a document
to explain the details.

Thanks.
Lianbo

> Thanks,
> Kazu
>
>
>>
>>
>> Thanks.
>> Lianbo
>>
>>> The patch's functionality itself might be useful and I don't reject it, though.
>>>
>>>>>>>>> @@ -4643,6 +4706,8 @@ write_buffer(int fd, off_t offset, void *buf, size_t buf_size, char *file_name)
>>>>>>>>>                 }
>>>>>>>>>                 if (!write_and_check_space(fd, &fdh, sizeof(fdh), file_name))
>>>>>>>>>                         return FALSE;
>>>>>>>>> +       } else if (info->flag_vmcore_size && fd == info->fd_dumpfile) {
>>>>>>>>> +               return write_buffer_update_size_info(offset, buf, buf_size);
>>>>>
>>>>> Why do we need this function? makedumpfile actually writes zero-filled
>>>>> pages to the dumpfile with -d 0, and doesn't write them with -d 1.
>>>>> So isn't "write_bytes += buf_size" enough? For example, with -d 30,
>>>>>
>>>>
>>>> The reason I went with this method was to make an estimate of the number
>>>> of blocks actually allocated on the disk (since depending on how the
>>>> data written is scattered in the file, there might be a significant
>>>> difference between bytes written vs actual size allocated on disk). But
>>>> I realize that there is some misunderstanding on my end, since writing
>>>> zeros does cause block allocation, as opposed to not writing at some offset
>>>> (skipping it with lseek()); I would need to fix that.
>>>>
>>>> To highlight the behaviour I'm talking about:
>>>> $ dd if=/dev/zero of=./testfile bs=4096 count=1 seek=1
>>>> 1+0 records in
>>>> 1+0 records out
>>>> 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000302719 s, 13.5 MB/s
>>>> $ du -h testfile
>>>> 4.0K    testfile
>>>>
>>>> $ dd if=/dev/zero of=./testfile bs=4096 count=2
>>>> 2+0 records in
>>>> 2+0 records out
>>>> 8192 bytes (8.2 kB, 8.0 KiB) copied, 0.000373002 s, 22.0 MB/s
>>>> $ du -h testfile
>>>> 8.0K    testfile
>>>>
>>>>
>>>> So, do you think it's not worth bothering to estimate the number of
>>>> blocks allocated, and that I should only consider the number of bytes written?
>>>
>>> Yes, makedumpfile almost never creates empty (sparse) blocks,
>>> so the error would be small enough.
>>>
>>>>>>>>
>>>>>>>> I like the idea, but sometimes we use makedumpfile to generate a
>>>>>>>> dumpfile in the primary kernel as well. For example:
>>>>>>>>
>>>>>>>> $ makedumpfile -d 31 -x vmlinux /proc/kcore dumpfile
>>>>>>>>
>>>>>>>> In such use-cases it is useful to use --vmcore-size and still generate
>>>>>>>> the dumpfile (right now the default behaviour is not to generate a
>>>>>>>> dumpfile when --vmcore-size is specified). Maybe we need to think more
>>>>>>>> about supporting this use-case as well.
>>>>>>>>
>>>>>>>
>>>>>>> The thing is, if you are generating the dumpfile, you can just check the
>>>>>>> size of the file created with "du -b" or some other command.
>>>>>>
>>>>>> I agree, but I was just looking to replace the two 'makedumpfile +
>>>>>> du' steps with a single 'makedumpfile --vmcore-size' step.
>>>>>>
>>>>>>> Overall I don't mind supporting your case as well. Maybe that can depend
>>>>>>> on whether a vmcore/dumpfile filename is provided:
>>>>>>>
>>>>>>> $ makedumpfile -d 31 -x vmlinux /proc/kcore # only estimates the size
>>>>>>>
>>>>>>> $ makedumpfile -d 31 -x vmlinux /proc/kcore dumpfile # writes the
>>>>>>> dumpfile and gives the final size
>>>>>>>
>>>>>>> Any thoughts, opinions, suggestions?
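The dd/du comparison quoted above can be reproduced in C: skipping a range with lseek() leaves a hole, while writing zeros allocates blocks. This sketch assumes Linux, where st_blocks counts 512-byte units, and a filesystem that supports sparse files; dd_zero_pages() is a hypothetical helper, not part of makedumpfile.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SZ 4096

/* Hypothetical helper mimicking
 *   dd if=/dev/zero of=PATH bs=4096 seek=SKIP count=COUNT
 * SKIP pages are skipped with lseek() (never written), then COUNT
 * zero-filled pages are written. Reports the logical size (st_size)
 * and the bytes actually allocated on disk. Returns 0 or -1. */
static int dd_zero_pages(const char *path, int skip, int count,
                         off_t *logical, off_t *allocated)
{
    static const char zero[PAGE_SZ];   /* static, so zero-initialized */
    struct stat st;
    off_t off = (off_t)skip * PAGE_SZ;
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);

    if (fd < 0)
        return -1;
    if (lseek(fd, off, SEEK_SET) != off)
        goto fail;
    for (int i = 0; i < count; i++)
        if (write(fd, zero, PAGE_SZ) != PAGE_SZ)
            goto fail;
    /* Flush so that block allocation is reflected in st_blocks. */
    if (fsync(fd) != 0 || fstat(fd, &st) != 0)
        goto fail;
    close(fd);
    *logical = st.st_size;
    /* On Linux, st_blocks is counted in 512-byte units. */
    *allocated = (off_t)st.st_blocks * 512;
    return 0;
fail:
    close(fd);
    return -1;
}
```

With skip=1, count=1 the file's st_size is 8192 while typically only about 4 KiB is allocated, matching the first `du -h` above; with skip=0, count=2 both are 8 KiB. Since makedumpfile rarely leaves such holes, the two measures stay close for dumpfiles, which supports simply counting written bytes.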
>>>>>>
>>>>>> Let's wait for Kazu's opinion on the same, but I am ok with using a
>>>>>> two-step 'makedumpfile + du' approach for now (and later expand
>>>>>> --vmcore-size as we encounter more use-cases).
>>>>>>
>>>>>> @Kazuhito Hagio : What's your opinion on the above?
>>>>>
>>>>> I would prefer only estimating with the option.
>>>>>
>>>>> And if the write_bytes method above is usable, it can also be shown
>>>>> in the report messages when writing the dumpfile.
>>>>>
>>>>
>>>> Let me know your preferred approach considering my comment above and
>>>> I'll send out a v2.
>>>
>>> I'm rethinking what command options makedumpfile should have.
>>> Once we add an option to makedumpfile, we cannot change it easily,
>>> so I'd like to think carefully.
>>>
>>> The calculated size might be useful if it's printed so that it can be
>>> easily post-processed by scripts, e.g. for automated tests. If so,
>>> makedumpfile already prints its statistics with "--message-level 16",
>>> and it might be useful to also print them with an option like "--show-stats".
>>>
>>> # makedumpfile --show-stats -l -d 31 vmcore dump.ld31
>>> total_pages xxx
>>> excluded_pages yyy
>>> ...
>>> write_bytes zzz
>>>
>>> Also, if we also have a "--dry-run" option that does not actually write, it's
>>> explicit and meets Bhupesh's use case. What do you think?
>>>
>>> Thanks,
>>> Kazu
>>>
>>> _______________________________________________
>>> kexec mailing list
>>> kexec@xxxxxxxxxxxxxxxxxxx
>>> http://lists.infradead.org/mailman/listinfo/kexec
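Kazu's suggestion to simply accumulate "write_bytes += buf_size", combined with the proposed --dry-run and --show-stats options, could look roughly like the sketch below. Everything here is hypothetical: struct dump_stats, counted_write() and print_stats() are illustrative names, not makedumpfile's actual write_buffer() or info structure, and both options were only proposals at this point in the thread.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state; not makedumpfile's real `info` structure. */
struct dump_stats {
    bool dry_run;          /* proposed --dry-run: do not write */
    uint64_t write_bytes;  /* bytes written, or that would be written */
};

/* Hypothetical wrapper: counts buf_size for every buffer that would be
 * written ("write_bytes += buf_size") and only performs the pwrite()
 * when not in dry-run mode. Returns true on success. */
static bool counted_write(struct dump_stats *st, int fd, off_t offset,
                          const void *buf, size_t buf_size)
{
    if (!st->dry_run) {
        ssize_t n = pwrite(fd, buf, buf_size, offset);
        if (n < 0 || (size_t)n != buf_size)
            return false;  /* failed writes are not counted */
    }
    st->write_bytes += buf_size;
    return true;
}

/* Print one "key value" pair per line so scripts can parse it,
 * in the spirit of the proposed --show-stats output. */
static void print_stats(FILE *out, const struct dump_stats *st)
{
    fprintf(out, "write_bytes %llu\n", (unsigned long long)st->write_bytes);
}
```

In dry-run mode nothing touches the file descriptor, yet write_bytes still reflects what would have been written, and the one-pair-per-line output is easy to post-process from scripts.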