29.10.2015, 03:35, "Neil Brown" <neilb@xxxxxxx>:
> On Wed, Oct 28 2015, Roman Gushchin wrote:
>
>> After commit 566c09c53455 ("raid5: relieve lock contention in get_active_stripe()")
>> __find_stripe() is called under conf->hash_locks + hash.
>> But handle_stripe_clean_event() calls remove_hash() under
>> conf->device_lock.
>>
>> Under some circumstances the hash chain can be circuited,
>> and we get an infinite loop with disabled interrupts and locked hash
>> lock in __find_stripe(). This leads to hard lockup on multiple CPUs
>> and following system crash.
>>
>> I was able to reproduce this behavior on raid6 over 6 ssd disks.
>> The devices_handle_discard_safely option should be set to enable trim
>> support. The following script was used:
>>
>> for i in `seq 1 32`; do
>>     dd if=/dev/zero of=large$i bs=10M count=100 &
>> done
>>
>> Signed-off-by: Roman Gushchin <klamm@xxxxxxxxxxxxxx>
>> Cc: Neil Brown <neilb@xxxxxxx>
>> Cc: Shaohua Li <shli@xxxxxxxxxx>
>> Cc: linux-raid@xxxxxxxxxxxxxxx
>> Cc: <stable@xxxxxxxxxxxxxxx> # 3.10 - 3.19
>
> Hi Roman,
>  thanks for reporting this and providing a fix.
>
> I'm a bit confused by that stable range: 3.10 - 3.19
>
> The commit you identify as introducing the bug was added in 3.13, so
> presumably 3.10, 3.11, 3.12 are not affected.

Sure, it's my mistake. The correct range is 3.13 - 3.19. Sorry.

> Also the bug is still present in mainline, so 4.0, 4.1, 4.2 are also
> affected, though the patch needs to be revised a bit for 4.1 and later.

Yes, exactly, but things are a bit more complicated in mainline.
I'll try to prepare a patch for mainline in a couple of days.

Thanks,
Roman
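
P.S. For anyone following the locking discussion: the rule at stake is that a hash chain must be modified under the same lock that lookups take. Below is a minimal userspace sketch of that rule, assuming pthread mutexes stand in for the kernel spinlocks. It is not the raid5 code and not the patch; struct stripe, hash_locks, insert_stripe(), find_stripe() and remove_stripe() are hypothetical names chosen for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model; names do not match the kernel's raid5 code. */
struct stripe {
    unsigned long sector;
    struct stripe *next;
};

#define NR_BUCKETS 4

static struct stripe *buckets[NR_BUCKETS];

/* One lock per bucket: every reader and writer of a chain takes this lock. */
static pthread_mutex_t hash_locks[NR_BUCKETS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

static unsigned int bucket_of(unsigned long sector)
{
    return (unsigned int)(sector % NR_BUCKETS);
}

/* Add a stripe at the head of its chain, under the bucket lock. */
static void insert_stripe(struct stripe *sh)
{
    unsigned int b = bucket_of(sh->sector);

    pthread_mutex_lock(&hash_locks[b]);
    sh->next = buckets[b];
    buckets[b] = sh;
    pthread_mutex_unlock(&hash_locks[b]);
}

/* Walk the chain under the bucket lock; returns NULL if not found. */
static struct stripe *find_stripe(unsigned long sector)
{
    unsigned int b = bucket_of(sector);
    struct stripe *sh;

    pthread_mutex_lock(&hash_locks[b]);
    for (sh = buckets[b]; sh != NULL; sh = sh->next)
        if (sh->sector == sector)
            break;
    pthread_mutex_unlock(&hash_locks[b]);
    return sh;
}

/*
 * Unlink a stripe under the SAME bucket lock that find_stripe() takes.
 * Unlinking under some other lock would let a concurrent walker see
 * a half-updated chain, which is the class of bug discussed above.
 */
static void remove_stripe(struct stripe *sh)
{
    unsigned int b = bucket_of(sh->sector);
    struct stripe **pp;

    pthread_mutex_lock(&hash_locks[b]);
    for (pp = &buckets[b]; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == sh) {
            *pp = sh->next;
            sh->next = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&hash_locks[b]);
}
```

The sketch only shows the invariant (lookup and removal serialized on one per-bucket lock); how the real code should order hash_locks against device_lock is a separate question that the mainline patch has to answer.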