https://bugzilla.kernel.org/show_bug.cgi?id=29402

           Summary: kernel panics while running ffsb scalability workloads
                    on 2.6.38-rc1 through -rc5
           Product: File System
           Version: 2.5
    Kernel Version: 2.6.38-rc5
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext4
        AssignedTo: fs_ext4@xxxxxxxxxxxxxxxxxxxx
        ReportedBy: eric.whitney@xxxxxx
        Regression: Yes

Created an attachment (id=48352)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=48352)
captured console output - spinlock bad magic: ext4lazyinit

The 2.6.38-rc5 kernel can panic while running any one of the ffsb profiles in
http://free.linux.hp.com/~enw/ext4/profiles on an ext4 filesystem on a 48 core
x86 system. These panics occur most frequently with the 48 or 192 thread
versions of those profiles. The problem has been reproduced on a 16 core x86
system using identical storage, but occurs there at lower frequency. On
average, it takes only two runs of "large_file_creates_threads-192.ffsb" to
produce a panic on the 48 core system.

The panics occur more or less equally frequently on a vanilla ext4 filesystem,
on an ext4 filesystem without a journal, and on an ext4 filesystem with a
journal mounted with mblk_io_submit.

With various debugging options enabled, including spinlock debugging, panics,
oopses, or BUGs occur in four varieties: protection violation, invalid opcode,
NULL pointer, and spinlock bad magic. Typically, the first fault triggers a
cascade of subsequent oopses, etc.

These panics can be suppressed by using -E lazy_itable_init at mkfs time. The
test system survived two series of 10 ffsb tests, each series beginning with a
single mkfs. Subsequently, the system survived a run of about 16 hours in
which a complete scalability measurement pass was made.

Repeated ffsb runs on ext3 and xfs filesystems on 2.6.38-rc* have not produced
panics. Numerous previous ffsb scalability runs on ext4 and 2.6.37 did not
produce panics.
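For reference, the filesystem configurations described above can be set up
roughly as follows. The device and mount-point names are placeholders, and the
workaround is shown as "lazy_itable_init=0" on the assumption that the intent
is to force inode table initialization at mkfs time so the kernel's
ext4lazyinit thread never runs; the report itself only says
"-E lazy_itable_init":

```shell
#!/bin/sh
# Placeholder device and mount point; substitute your own.
DEV=/dev/sdb1
MNT=/mnt/test

# Vanilla ext4 (on a 2.6.37+ kernel, mke2fs may defer inode table
# zeroing to the kernel's ext4lazyinit thread by default):
mkfs.ext4 "$DEV"
mount -t ext4 "$DEV" "$MNT"

# ext4 without a journal:
mkfs.ext4 -O ^has_journal "$DEV"
mount -t ext4 "$DEV" "$MNT"

# ext4 with a journal, mounted with mblk_io_submit:
mkfs.ext4 "$DEV"
mount -t ext4 -o mblk_io_submit "$DEV" "$MNT"

# Assumed form of the workaround: zero the inode tables during mkfs
# so there is no deferred work for the ext4lazyinit kernel thread.
mkfs.ext4 -E lazy_itable_init=0 "$DEV"
```

These commands are destructive to the target device and are only a sketch of
the configurations exercised; the ffsb invocations themselves come from the
profiles linked above.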
The panics can be produced using either HP SmartArray (backplane RAID) or
FibreChannel storage, with no material difference in the panic backtraces.

Attempted bisection of the bug in 38-rc1 was inconclusive; repeatability
degraded the further back into -rc1 I went. The last clear reproduction was in
the midst of the perf changes very early in the release (SHA id beginning with
006b20fe4c, "Merge branch 'perf/urgent' into perf/core"). Preceding that are
RCU and GFS patches, plus a small number of x86 patches.

Relatively little useful spinlock debugging information was reported in
repeated tests on the early 38 rc's; with later rc's, more information
gradually became visible (or maybe I was just getting progressively luckier).

The first attachment contains the partial backtrace that most clearly suggests
lazy_itable_init involvement. The softirq portion of this backtrace tends to
look the same across the panics I've seen.