On 10/17/07, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> On 10/17/07, BERTRAND Joël <joel.bertrand@xxxxxxxxxxx> wrote:
> > BERTRAND Joël wrote:
> > > Hello,
> > >
> > > I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each
> > > server has a partitionable raid5 array (/dev/md/d0) and I have to
> > > synchronize both raid5 volumes by raid1. Thus, I have tried to build a
> > > raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from
> > > the second server) and I obtain a BUG :
> > >
> > > Root gershwin:[/usr/scripts] > mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1 /dev/sdi1
> > > ...
> >
> > Hello,
> >
> > I have fixed iscsi-target, and I have tested it. It works now without
> > any trouble. Patches were posted on iscsi-target mailing list. When I
> > use iSCSI to access to foreign raid5 volume, it works fine. I can format
> > foreign volume, copy large files on it... But when I tried to create a
> > new raid1 volume with a local raid5 volume and a foreign raid5 volume, I
> > receive my well known Oops. You can find my dmesg after Oops :
> >
>
> Can you send your .config and your bootup dmesg?
>

I found a problem which may lead to the operations count dropping
below zero. If ops_complete_biofill() gets preempted in between the
following calls:

raid5.c:554> clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
raid5.c:555> clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);

...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL
causing the assertion. In fact, the 'pending' bit should always be
cleared first, but the other cases are protected by
spin_lock(&sh->lock). Patch attached.

--
Dan
Attachment: fix-biofill-clear.patch