Dan Williams wrote:
On 10/17/07, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
On 10/17/07, BERTRAND Joël <joel.bertrand@xxxxxxxxxxx> wrote:
BERTRAND Joël wrote:
Hello,
I run a 2.6.23 Linux kernel on two T1000 (sparc64) servers. Each
server has a partitionable raid5 array (/dev/md/d0), and I have to
keep both raid5 volumes synchronized with raid1. So I tried to build a
raid1 volume from /dev/md/d0p1 and /dev/sdi1 (exported over iSCSI from
the second server), and I get a BUG:
Root gershwin:[/usr/scripts] > mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1
/dev/sdi1
...
Hello,
I have fixed iscsi-target and tested it. It now works without
any trouble. The patches were posted on the iscsi-target mailing list. When I
use iSCSI to access a foreign raid5 volume, it works fine: I can format
the foreign volume, copy large files to it... But when I try to create a
new raid1 volume from a local raid5 volume and a foreign raid5 volume, I
get my well-known Oops. Here is my dmesg after the Oops:
Can you send your .config and your bootup dmesg?
I found a problem which may lead to the operations count dropping
below zero. If ops_complete_biofill() gets preempted in between the
following calls:
raid5.c:554> clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
raid5.c:555> clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL
causing the assertion. In fact, the 'pending' bit should always be
cleared first, but the other cases are protected by
spin_lock(&sh->lock). Patch attached.
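The ordering problem described above can be replayed deterministically in user space. The following is only a simplified sketch, not the raid5.c code: the struct, the helper names and the single-bit model are hypothetical, and it assumes that get_stripe_work() counts an operation once whenever its 'pending' bit is set and its 'ack' bit is clear.

```c
/* Hypothetical single-operation model of sh->ops; not the kernel structures. */
#define OP_BIOFILL_BIT (1UL << 0)

struct ops_model {
	unsigned long pending;	/* operations that have been requested */
	unsigned long ack;	/* operations already counted as picked up */
	int count;		/* requested operations not yet picked up */
};

/* Request the operation: set 'pending' and bump the count. */
static void model_request(struct ops_model *ops)
{
	ops->pending |= OP_BIOFILL_BIT;
	ops->count++;
}

/* Assumed behaviour: acknowledge once if pending and not yet acked. */
static void model_get_stripe_work(struct ops_model *ops)
{
	if ((ops->pending & OP_BIOFILL_BIT) && !(ops->ack & OP_BIOFILL_BIT)) {
		ops->ack |= OP_BIOFILL_BIT;
		ops->count--;
	}
}

/*
 * Replay a completion handler that is "preempted" between its two
 * clear operations by a call to model_get_stripe_work().
 * Returns the final operation count.
 */
int simulate(int clear_pending_first)
{
	struct ops_model ops = { 0, 0, 0 };

	model_request(&ops);
	model_get_stripe_work(&ops);		/* normal pickup: count 1 -> 0 */

	if (clear_pending_first) {
		ops.pending &= ~OP_BIOFILL_BIT;
		model_get_stripe_work(&ops);	/* sees nothing pending */
		ops.ack &= ~OP_BIOFILL_BIT;
	} else {
		ops.ack &= ~OP_BIOFILL_BIT;
		model_get_stripe_work(&ops);	/* re-acknowledges the same op */
		ops.pending &= ~OP_BIOFILL_BIT;
	}
	return ops.count;
}
```

In this model, simulate(0) ends with a count of -1, which corresponds to the BUG_ON condition, while simulate(1) ends with a count of 0.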
Dan,
I have modified get_stripe_work like this :
static unsigned long get_stripe_work(struct stripe_head *sh)
{
	unsigned long pending;
	int ack = 0;
	int a,b,c,d,e,f,g;

	pending = sh->ops.pending;

	test_and_ack_op(STRIPE_OP_BIOFILL, pending);
	a=ack;
	test_and_ack_op(STRIPE_OP_COMPUTE_BLK, pending);
	b=ack;
	test_and_ack_op(STRIPE_OP_PREXOR, pending);
	c=ack;
	test_and_ack_op(STRIPE_OP_BIODRAIN, pending);
	d=ack;
	test_and_ack_op(STRIPE_OP_POSTXOR, pending);
	e=ack;
	test_and_ack_op(STRIPE_OP_CHECK, pending);
	f=ack;
	if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
		ack++;
	g=ack;

	sh->ops.count -= ack;

	if (sh->ops.count<0) printk("%d %d %d %d %d %d %d\n",
			a,b,c,d,e,f,g);
	BUG_ON(sh->ops.count < 0);

	return pending;
}
and I get this on the console:
1 1 1 1 1 2
kernel BUG at drivers/md/raid5.c:390!
\|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
md7_resync(5409): Kernel bad sw trap 5 [#1]
If that can help you...
JKB