On Sat, Dec 05, 2020 at 08:34:34PM +0100, Paul Menzel wrote:
> [Cc: +Colin]

Also CCing Dimitri, whose GRUB patch this may be related to. Dimitri, see
https://marc.info/?l=linux-ext4&m=160719695914303&w=2 for the full message
I'm replying to.

> On 04.12.20 at 19:05, Theodore Y. Ts'o wrote:
> > On Fri, Dec 04, 2020 at 04:39:12PM +0100, Paul Menzel wrote:
>
> Colin, the modules in `/boot/grub/i386-pc` look funny, and can’t be loaded
> by GRUB anymore.
>
> ```
> $ ls -lt /boot/grub/i386-pc/
> insgesamt 2085
> -rw-r--r-- 1 root root 512 13. Aug 23:00 'boot.img-'$'\205\300''u'$'
> \023\211''鍓]'$'\206\371\377\211\360\350''f'$'\376\377\377\205
> \300''ur'$'\203\354\004''V'$'\377''t$'$'\030''j'$'\002''胒'
> -rw-r--r-- 1 root root 30893 13. Aug 23:00 'core.img-'$'\205\300''u'$'
> \023\211''鍓]'$'\206\371\377\211\360\350''f'$'\376\377\377\205
> \300''ur'$'\203\354\004''V'$'\377''t$'$'\030''j'$'\002''胒'
> […]
> ```

[...]

> > When was the last time the directory was OK? Do you know when it may
> > have gotten corrupted?
>
> The last reboot before. But I am really confused now though.
>
> $ ls -ld /boot/grub/i386-pc
> drwxr-xr-x 2 root root 28672 29. Nov 12:13 /boot/grub/i386-pc
>
> But the module files in there are all from August 2020.
>
> -rw-r--r-- 1 root root 2400 Aug 13 23:00 'part_gpt.mod-'$'\205\300''u'$'\023\211\351\215\223'']'$'\206\371\377\211\360\350''f'$'\376\377\377\205\300''ur'$'\203\354\004''V'$'\377''t$'$'\030''j'$'\002\350\203\222'
>
> The characters in the file name look like some character encoding. Do you
> know what that is? UTF-8? The dumped output viewed in an editor shows a
> “Asian” looking characters 胒.

It seems rather more likely to be junk from uninitialised memory: each name
is the expected file name plus the "-" backup suffix, followed by the same
run of arbitrary bytes.

> 2020-11-29 11:38:06 upgrade grub2-common:i386 2.04-9 2.04-10
> […]
> 2020-11-29 12:04:00 status installed linux-image-5.9.0-4-686-pae:i386
> 5.9.11-1
> […]
> 2020-11-29 12:13:24 configure grub-pc:i386 2.04-10 <none>
> 2020-11-29 12:13:24 status unpacked grub-pc:i386 2.04-10
> 2020-11-29 12:13:24 status half-configured grub-pc:i386 2.04-10
> [Dialog waited for my confirmation: Some GRUB warning regarding block
> devices, which I always “ignore”, that means tell GRUB to be upgraded]

You need to actually look into this and fix it properly rather than
ignoring it. It's probably related to this problem, since a successful
installation doesn't go down the RESTORE_BACKUP path, which I think is the
suspicious one here.

> 2020-11-29 12:43:21 status installed grub-pc:i386 2.04-10
> […]
>
> So, afterward I was able to reboot without any issues.

[...]

> Do you want me to re-install grub to see if it’s a problem introduced in
> Debian’s GRUB 2.04-10?

Now that I look at it more closely, some of the changes to
clean_grub_dir_real look suspicious:

+      char *srcf = grub_util_path_concat (2, di, de->d_name);
+
+      if (mode == CREATE_BACKUP)
+        {
+          char *dstf = grub_util_path_concat_ext (2, di, de->d_name, "-");
+          if (grub_util_rename (srcf, dstf) < 0)
+            grub_util_error (_("cannot backup `%s': %s"), srcf,
+                             grub_util_fd_strerror ());
+          free (dstf);
+        }
+      else if (mode == RESTORE_BACKUP)
+        {
+          char *dstf = grub_util_path_concat_ext (2, di, de->d_name);
+          dstf[strlen (dstf) - 1] = 0;
+          if (grub_util_rename (srcf, dstf) < 0)
+            grub_util_error (_("cannot restore `%s': %s"), dstf,
+                             grub_util_fd_strerror ());
+          free (dstf);
+        }
+      else
+        {
+          if (grub_util_unlink (srcf) < 0)
+            grub_util_error (_("cannot delete `%s': %s"), srcf,
+                             grub_util_fd_strerror ());
+        }
+      free (srcf);

grub_util_path_concat is a helper that joins its arguments with "/";
grub_util_path_concat_ext does likewise, except that the last argument is
appended as an extension without first appending "/".
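
To make that calling convention concrete, here is a minimal C sketch. The
names sketch_path_concat, sketch_path_concat_ext and sketch_restore_name are
hypothetical; this is a simplified model of the behaviour described above,
not GRUB's actual implementation (which also normalises slashes, among other
things). What it is meant to show is that the _ext variant consumes one more
variadic argument than its n, so a call that passes only n components leaves
va_arg fetching an argument that was never supplied.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the convention described above; NOT GRUB's code.
   Joins the first n arguments with "/".  If ext is nonzero it reads ONE
   MORE argument and appends it verbatim, with no "/" in front.  */
static char *
sketch_concat_real (size_t n, int ext, va_list ap)
{
  const char **parts = calloc (n + ext, sizeof *parts);
  size_t total = 1, i;
  char *r;

  if (parts == NULL)
    return NULL;
  /* Consumes n + ext variadic arguments.  With ext == 1 the caller must
     really have passed n + 1 of them; otherwise va_arg fetches an argument
     that does not exist (undefined behaviour, typically whatever happens
     to be in that register or stack slot).  */
  for (i = 0; i < n + ext; i++)
    {
      parts[i] = va_arg (ap, char *);
      total += strlen (parts[i]) + 1;
    }
  r = malloc (total);
  if (r != NULL)
    {
      r[0] = '\0';
      for (i = 0; i < n; i++)
        {
          if (i > 0)
            strcat (r, "/");
          strcat (r, parts[i]);
        }
      if (ext)
        strcat (r, parts[n]);   /* extension: no "/" separator */
    }
  free (parts);
  return r;
}

/* Expects exactly n path components after n.  */
static char *
sketch_path_concat (size_t n, ...)
{
  va_list ap;
  char *r;

  va_start (ap, n);
  r = sketch_concat_real (n, 0, ap);
  va_end (ap);
  return r;
}

/* Expects n path components plus one extension: n + 1 arguments.  */
static char *
sketch_path_concat_ext (size_t n, ...)
{
  va_list ap;
  char *r;

  va_start (ap, n);
  r = sketch_concat_real (n, 1, ap);
  va_end (ap);
  return r;
}

/* Models the suggested RESTORE_BACKUP fix: the directory entry already
   ends in "-", so join it with plain concat (2 components, 2 arguments)
   and then drop the trailing "-".  */
static char *
sketch_restore_name (char *dir, char *backup_name)
{
  char *dstf = sketch_path_concat (2, dir, backup_name);

  if (dstf != NULL)
    {
      size_t len = strlen (dstf);
      assert (len > 0 && dstf[len - 1] == '-');
      dstf[len - 1] = '\0';
    }
  return dstf;
}
```

For example, in this model sketch_path_concat_ext (2, "/boot/grub/i386-pc",
"part_gpt.mod", "-") builds "/boot/grub/i386-pc/part_gpt.mod-", and
sketch_restore_name on that directory and "part_gpt.mod-" gives back
"/boot/grub/i386-pc/part_gpt.mod". Calling sketch_path_concat_ext with only
two components would be the same mismatch as the RESTORE_BACKUP hunk above.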

The first argument to both of these functions is "n": grub_util_path_concat
expects n further arguments, while grub_util_path_concat_ext expects n + 1
further arguments. So, in the RESTORE_BACKUP case, shouldn't that be:

          char *dstf = grub_util_path_concat (2, di, de->d_name);

... rather than grub_util_path_concat_ext? Otherwise it seems to me that
it's going to try to append an extension argument that was never passed,
and may well add random uninitialised stuff from memory. Running
grub-install under valgrind would probably show this up (I can't get it to
do it for me so far, but most likely I just haven't set up quite the right
initial conditions).

This looks more likely to be a userspace problem than filesystem
corruption. I think this should likely be refiled as a bug against
Debian's grub2 package.

-- 
Colin Watson (he/him)                              [cjwatson@xxxxxxxxxx]