On May 22, 2018, at 8:25 AM, Lukas Czerner <lczerner@xxxxxxxxxx> wrote:
>
> On Tue, May 22, 2018 at 03:57:41PM +0530, RAJESH DASARI wrote:
>> Hi ,
>>
>> Could someone please respond to my query. Issue here is i have
>> upgraded e2fsprogs to 1.44.0 version from 1.43.9 and after that i am
>> noticing the file system corruption mentioned in this mail chain. I
>> have upgraded to 1.44.1 also but i still see the issue.
>>
>> i have downgraded to 1.43.9 version , issue is disappeared.
>>
>> Reason why i was upgrading because there seems to be some buffer
>> overrun issues in the blkid library and in the fsck program of
>> e2fsprogs. An attacker can use this to cause a denial of service and
>> this issue is fixed from 1.44.0 onwards. For this i was trying to
>> upgrade the e2fsprogs, if upgrade is not possible,i would like to back
>> port the buffer over run fix by Ted to 1.43.9 version.
>>
>> I checked the git commit log and noticed that the below commit by ted
>> will fix the buffer over run issue.
>> https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=d8e5da0a3b94f7445ab8cdd629bfc561986e7501
>>
>> @Ted,
>>
>> Could you please let me know the above commit is enough or do i have
>> to take any other changes along with this commit to fix the buffer
>> over run issues on 1.43.9 version ?
>
> I can't seem to find your original report so I have no idea at all what
> the problem is.
>
> However are you saying that when you run e2fsck v1.43.9 the file
> system is fine and when you run e2fsck v1.44.1 there is a problem ? If
> so, please show us the problem.
>
> Also I recall that Andreas asked you to git-bisect the relevant code to
> try to pin-point the problem, have you tried that ?

It is possible that the problem relates to a specific new feature that
is enabled in 1.44 that was not present in 1.43. It would be useful to
see what the filesystem feature list is from the filesystem formatted
with 1.43.9 vs.
1.44.1, like:

    # dumpe2fs -h /dev/VG_NEW/state | grep feature
    Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery dirdata sparse_super large_file huge_file uninit_bg dir_nlink
    Journal features: journal_incompat_revoke

A 1.44.2 release was also just made, and while it is unlikely to solve
your problem (I didn't see anything in the changelog that seemed
similar to what you report), it wouldn't hurt to try.

If this is easily reproducible, the most straightforward approach is to
run a git bisect to isolate the problem to a specific patch.

Check out the e2fsprogs code from Git:

    # git clone git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
    # cd e2fsprogs

Start Git bisect:

    # git bisect start

Build binaries:

    # ./configure && make

Use the built mke2fs and e2fsck to reproduce your problem. If it is hit
(which I would expect with 1.44.2), then mark this version with:

    # git bisect bad

Next, check out 1.43.9 with "git checkout v1.43.9", repeat "Build
binaries" and retest. I'd assume 1.43.9 is working properly, so then run:

    # git bisect good

After that, git bisect should automatically check out intermediate
commits for you to build and test until it isolates the problem to a
single patch.

If your problem only hits intermittently (say, on 1 in 4 reboots), then
when it is not hit you should retry enough times to be confident the
problem is not present before using "git bisect good". If it is hit
even once, you can use "git bisect bad".
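If the corruption can be triggered without a full reboot (an assumption; the report above only sees it across reboots), the good/bad marking described above can be automated with `git bisect run`, which treats exit status 0 as good, 125 as "skip this commit", and any other status from 1 to 127 as bad. The sketch below is a hypothetical `bisect-check.sh` that rebuilds each commit and runs a hypothetical reproducer script (`$HOME/reproduce.sh`, assumed to exit non-zero when the problem appears) several times, applying the "hit even once means bad" rule. The retry count of 16 is also an assumption: at a 1-in-4 hit rate, 16 clean runs leave roughly a 1% chance of missing the problem, since (3/4)^16 ≈ 0.01.

```shell
#!/bin/sh
# bisect-check.sh: hypothetical helper for `git bisect run` (not part of e2fsprogs).
# git bisect run treats exit 0 as good, 125 as skip, and 1..127 (except 125) as bad.

# retry_check N CMD [ARGS...]: run CMD up to N times, failing as soon as one run fails.
retry_check() {
    tries=$1
    shift
    i=0
    while [ "$i" -lt "$tries" ]; do
        if ! "$@"; then
            return 1    # problem hit even once: this commit is bad
        fi
        i=$((i + 1))
    done
    return 0            # not hit in N runs: treat this commit as good
}

# Only bisect when called as "bisect-check.sh bisect", so the function can be sourced and tested.
if [ "${1:-}" = "bisect" ]; then
    # Rebuild the commit under test; ask git bisect to skip commits that fail to build.
    { ./configure && make; } >/dev/null 2>&1 || exit 125
    # $HOME/reproduce.sh is hypothetical: it should exercise the freshly built
    # misc/mke2fs and e2fsck/e2fsck and exit non-zero when the corruption appears.
    retry_check 16 "$HOME/reproduce.sh"
    exit $?
fi
```

Keeping both scripts outside the e2fsprogs tree means bisect checkouts leave them alone. With v1.43.9 as the known-good point, the whole search would then be:

    # git bisect start HEAD v1.43.9
    # git bisect run /path/to/bisect-check.sh bisect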
Cheers, Andreas

>> On Fri, May 18, 2018 at 3:19 PM, RAJESH DASARI <raajeshdasari@xxxxxxxxx> wrote:
>>>
>>> On Thu, May 3, 2018 at 1:40 AM, Andreas Dilger <adilger@xxxxxxxxx> wrote:
>>>> On May 2, 2018, at 10:26 AM, RAJESH DASARI <raajeshdasari@xxxxxxxxx> wrote:
>>>>> On Tue, May 1, 2018 at 6:15 PM, Eric Sandeen <esandeen@xxxxxxxxxx> wrote:
>>>>>> On 4/30/18 1:27 PM, RAJESH DASARI wrote:
>>>>>>> Hi ,
>>>>>>>
>>>>>>> We are noticing an issue with logical volume file system is getting
>>>>>>> corrupted after restarting the machine for multiple times.
>>>>>>
>>>>>> When you say restarting, are you talking about clean reboots, or
>>>>>> power fails etc that may replay the log?
>>>>>
>>>>> It is clean reboot. no power failures.
>>>>>
>>>>>> (Also note that for a while at least on Fedora, systemd was preventing
>>>>>> the root filesystem from unmounting cleanly on reboot.)
>>>>>>
>>>>>> So, were these log-replay-inducing machine restarts or "clean" reboots?
>>>>>>
>>>>>>> This issue we have started noticing after upgrading the kernel to 4.4.121.
>>>>>>
>>>>>> What was the previous kernel that did not seem to exhibit the problem?
>>>>>
>>>>> we have upgraded from 4.4.106 to 4.4.121 and e2fsprogs from 1.43.9 to
>>>>> 1.44.0. After the upgrade this issue is noticed.
>>>>>
>>>>> Now I have downgraded the kernel to 4.4.106 and downgraded e2fsprogs
>>>>> to 1.43.9 and issue is disappeared.
>>>>
>>>> If that is the case, please try the newer kernel and e2fsprogs independently to isolate which one introduced the problem. Next, do a git-bisect on the relevant code to isolate it to a specific patch.
>>>>
>>> I tried it independently and noticed that it is the issue with the
>>> e2fsprogs version 1.44.0 . I downgraded to 1.43.9 and issue is
>>> disappeared. Is it any known issue in e2fsprogs? does the latest
>>> version of e2fsprogs contains any fixes for similar issues. Please
>>> provide your inputs.
>>>
>>>>
>>>>>> If this happens again, capturing the primary super in some way (i.e.
>>>>>> e2image, or even simply using dd to copy it) might be interesting, to see
>>>>>> exactly what the corruption is.
>>>>>>
>>>>> I tried capturing the primary super block using dd command to some
>>>>> file, but still i get the same error when i do dumpe2fs on the file.
>>>>>>
>>>>>>> while running tune2fs -c 1 /dev/VG_NEW/state to set the
>>>>>>> mmax_mounts_count we are noticing the error.
>>>>>>>
>>>>>>> tune2fs -c 1 /dev/VG_NEW/state
>>>>>>> tune2fs 1.44.0 (7-Mar-2018)
>>>>>>> tune2fs: The ext2 superblock is corrupt while trying to open /dev/VG_NEW/state
>>>>>>> Couldn't find valid filesystem superblock.
>>>>>>>
>>>>>>> lvs command output is below (there are other logical volumes and
>>>>>>> volume groups also along with state volume, I have not pasted them to
>>>>>>> minimize this post).
>>>>>>> Important thing to note here is always state volume only is getting
>>>>>>> corrupted and no file system corruption seen on other logical volumes.
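Eric's suggestion of capturing the primary superblock with dd can be sketched as follows. On ext2/3/4 the primary superblock starts at byte offset 1024 and is 1024 bytes long, and its `s_magic` field (0xEF53, stored little-endian) sits 56 bytes into it, at absolute offset 1080. The helpers `save_primary_sb` and `has_ext_magic` are hypothetical names for this sketch, not e2fsprogs tools; they copy the first 4 KiB of the device (which covers the superblock) and check the magic bytes. A valid magic only rules out the crudest damage; the rest of the saved copy is what would show exactly what the corruption is.

```shell
#!/bin/sh
# Sketch: save the start of an ext2/3/4 device (containing the primary
# superblock) and check the superblock magic. Names and paths are examples only.

# The primary superblock occupies bytes 1024..2047; s_magic is a little-endian
# 16-bit field 56 bytes into it, i.e. at absolute byte offset 1080.
SB_MAGIC_OFFSET=1080

# save_primary_sb DEVICE OUTFILE: copy the first 4 KiB, which covers the superblock.
save_primary_sb() {
    dd if="$1" of="$2" bs=4096 count=1 2>/dev/null
}

# has_ext_magic FILE: succeed if the two bytes at offset 1080 are 53 ef (0xEF53 LE).
has_ext_magic() {
    magic=$(dd if="$1" bs=1 skip="$SB_MAGIC_OFFSET" count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "53ef" ]
}
```

For example, as root, `save_primary_sb /dev/VG_NEW/state /tmp/state-sb.img` followed by `has_ext_magic /tmp/state-sb.img` (the output path is arbitrary). The other tool Eric mentions, `e2image /dev/VG_NEW/state /tmp/state.e2i`, saves the critical filesystem metadata rather than just the first few KiB.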