On Fri, Jun 27, 2014 at 3:53 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
> On Fri, Jun 27, 2014 at 1:37 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>> On Tue, Jun 17, 2014 at 12:33 AM, Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
>>> On 17 June 2014 01:29, John Stultz <john.stultz@xxxxxxxxxx> wrote:
>>>> On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
>>>>> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
>>>>>> This patch is based upon my latest mmc tree and the next branch. I tried
>>>>>> to apply it for 3.15, and I think you will be able to resolve the
>>>>>> conflict - it should be quite trivial.
>>>>>
>>>>> No worries. I just didn't want to waste time resolving it if it was
>>>>> logically dependent on some other change.
>>>>>
>>>>> I'll give it a shot and get back to you.
>>>>
>>>> So unfortunately I'm still seeing trouble..
>>>>
>>>> [ 94.202843] EXT4-fs error (device mmcblk0p5):
>>>> ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
>>>> gd; block bitmap corrupt.
>>>> [ 94.203873] Aborting journal on device mmcblk0p5-8.
>>>> [ 94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
>>>> panic forced after error
>>>> [ 94.206553]
>>>> [ 94.207420] CPU: 0 PID: 1 Comm: init Not tainted
>>>> 3.15.0-00002-g044f37a-dirty #589
>>>> [ 94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
>>>> (show_stack+0x11/0x14)
>>>> [ 94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
>>>> (dump_stack+0x59/0x7c)
>>>> [ 94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
>>>> [ 94.209724] [<c042a57f>] (panic) from [<c0135055>]
>>>> (ext4_handle_error+0x69/0x74)
>>>> [ 94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
>>>> (__ext4_grp_locked_error+0x6b/0x160)
>>>> [ 94.210747] [<c01358db>] (__ext4_grp_locked_error) from
>>>> [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
>>>> [ 94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
>>>> (ext4_mb_init_cache+0x219/0x4e0)
>>>> [ 94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
>>>> (ext4_mb_init_group+0xbb/0x13c)
>>>> [ 94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
>>>> (ext4_mb_good_group+0xf3/0xfc)
>>>> [ 94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
>>>> (ext4_mb_regular_allocator+0x153/0x2c4)
>>>> [ 94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
>>>> [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
>>>> [ 94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
>>>> (ext4_ext_map_blocks+0x965/0x10f0)
>>>> [ 94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
>>>> (ext4_map_blocks+0xff/0x374)
>>>> [ 94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
>>>> [<c0127049>] (ext4_writepages+0x2b9/0x4e8)
>>>> [ 94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
>>>> (do_writepages+0x19/0x28)
>>>> [ 94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
>>>> (__filemap_fdatawrite_range+0x3d/0x44)
>>>> [ 94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
>>>> [<c008cc3f>] (filemap_flush+0x23/0x28)
>>>> [ 94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
>>>> (ext4_rename+0x2f9/0x3e4)
>>>> [ 94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
>>>> (vfs_rename+0x183/0x45c)
>>>> [ 94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
>>>> (SyS_renameat2+0x22b/0x26c)
>>>> [ 94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
>>>> (SyS_rename+0x1f/0x24)
>>>> [ 94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
>>>> (ret_fast_syscall+0x1/0x5c)
>>>>
>>>>
>>>> That said, this mirrors the behavior when I was reverting your change
>>>> by hand on top of 3.15. While git bisect pointed to your patch and
>>>> reverting it from the commit seems to resolve the issue at that point,
>>>> there seems to be some other commit in the 3.14->3.15-rc1 interval
>>>> that is causing problems as well.
>>>>
>>>> Are there any sort of debugging options for mmc that I can use to try
>>>> to better narrow down what's going wrong?
>>>
>>> It seems like you want to debug the mmci host driver and unfortunately
>>> the debug utilities available are only dev_dbg prints. I wouldn't be
>>> surprised if the problem goes away when you enable them. :-)
>>>
>>> I have some other locally stored debug patches for mmci, but those are
>>> not re-based and I am not sure you want to deal with them as is.
>>>
>>> I guess I need to set up the QEMU environment and run the tests
>>> myself, unless we go for the revert path.
>>> How do you perform the tests, is it just a simple mounting/un-mounting
>>> that triggers the problem?
>>> Any specific things that I need to think of when running QEMU?
>>
>> FWIW, I'm hitting this problem as well. For me, it is every time I try
>> to boot. Only reverting to 3.14 makes it go away, and this series
>> doesn't fix it for me either. :(
>>
>> My only difference is that I don't run with an initrd:
>>
>> qemu-system-arm -nographic -m 1024 -M vexpress-a15 -dtb
>> rtsm_ve-cortex_a15x4.dtb -kernel ~/src/linux/arch/arm/boot/zImage
>> -drive file=$HOME/image/arm/vda.qcow2,if=sd,format=qcow2 -append
>> "root=/dev/mmcblk0p1 console=ttyAMA0"
>
> I've been continuing to try to bisect this down with
> 8d94b54d99ea968a9d188ca0e68793ebed601220 and
> e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 reverted each step. It seems
> like it pops up somewhere between 3.15-rc6 and 3.15-rc7, but the
> bisection results are really inconsistent. I suspect it actually
> shows up earlier, it's just that it's harder to trip the problem with
> the patches reverted, so I'm marking good commits that are actually bad.
>
> If you are seeing this on every bootup, it might be worth trying to do
> the bisection with the two commits above reverted to see if you can
> narrow it down any better?

And now I can't reproduce it! I think I was being tricked by
filesystem corruption that spanned some of my test boots. I'm going to
start this over and try again.

-Kees

-- 
Kees Cook
Chrome OS Security
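
For anyone repeating the bisection discussed above, here is a minimal sketch of how one step could be scripted for `git bisect run`. It is not what anyone in the thread actually ran. The two commit IDs and the qemu arguments come from the messages above; the script name, the `GOOD_MARKER` and `BOOT_TIMEOUT` variables, the `login:` pattern, and the cross-compile setup are all assumptions for illustration. The `-snapshot` flag makes qemu discard guest writes, so corruption from one test boot should not leak into the next.

```shell
#!/bin/sh
# Sketch of a `git bisect run` step. Everything except the two commit IDs and
# the qemu arguments quoted in the thread is an assumption.

REVERTS="8d94b54d99ea968a9d188ca0e68793ebed601220 e7f3d22289e4307b3071cc18b1d8ecc6598c0be4"
GOOD_MARKER="${GOOD_MARKER:-login:}"   # assumed: text the guest prints once userspace is up
BOOT_TIMEOUT="${BOOT_TIMEOUT:-180}"    # assumed: seconds to wait before killing qemu

# Translate a console log into a git-bisect exit code:
# 1 = bad (ext4 reported corruption), 0 = good, 125 = cannot tell, skip.
classify_log() {
    if grep -q 'EXT4-fs error' "$1"; then
        return 1
    elif grep -q "$GOOD_MARKER" "$1"; then
        return 0
    fi
    return 125
}

# Drop the uncommitted reverts so bisect can check out the next commit.
cleanup_tree() {
    git revert --quit 2>/dev/null
    git reset --hard -q
}

run_step() {
    log=$(mktemp)
    # Apply both reverts to the working tree only; skip this commit if they
    # do not revert cleanly.
    if ! git revert --no-commit $REVERTS; then
        cleanup_tree; return 125
    fi
    # Assumes ARCH=arm and CROSS_COMPILE are exported and .config is in place.
    if ! make -j"$(nproc)" zImage; then
        cleanup_tree; return 125
    fi
    # -snapshot discards guest writes, so the disk image stays pristine.
    timeout "$BOOT_TIMEOUT" qemu-system-arm -nographic -snapshot -m 1024 \
        -M vexpress-a15 -dtb rtsm_ve-cortex_a15x4.dtb \
        -kernel arch/arm/boot/zImage \
        -drive "file=$HOME/image/arm/vda.qcow2,if=sd,format=qcow2" \
        -append "root=/dev/mmcblk0p1 console=ttyAMA0" </dev/null >"$log" 2>&1
    cleanup_tree
    classify_log "$log"; rc=$?
    rm -f "$log"
    return "$rc"
}

# Only run when asked to, e.g. `git bisect run ./bisect-step.sh run`.
case "${1:-}" in
    run) run_step; exit $? ;;
esac
```

Returning 125 for builds or boots that cannot be judged tells git bisect to skip that commit rather than record a possibly wrong good/bad verdict, which matters when the results are as inconsistent as described above.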