BERTRAND Joël wrote:
> BERTRAND Joël wrote:
>> BERTRAND Joël wrote:
>>> Bill Davidsen wrote:
>>>> Dan Williams wrote:
>>>>> On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
>>>>>
>>>>>> I have run some dd's (read and write in nullio) for 12 hours
>>>>>> between initiator and target without any disconnection, so the
>>>>>> iSCSI code seems to be robust. Both initiator and target are
>>>>>> alone on a single gigabit ethernet link (without any switch).
>>>>>> I'm investigating...
>>>>>
>>>>> Can you reproduce on 2.6.22?
>>>>>
>>>>> Also, I do not think this is the cause of your failure, but you
>>>>> have CONFIG_DMA_ENGINE=y in your config. Setting this to 'n' will
>>>>> compile out the unneeded checks for offload engines in
>>>>> async_memcpy and async_xor.
>>>>
>>>> Given that offload engines are far less tested code, I think this
>>>> is a very good thing to try!
>>>
>>> I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one
>>> CPU while I rebuild my raid1 array. 1% of this array has now been
>>> resynchronized without any hang.
>>>
>>> Root gershwin:[/usr/scripts] > cat /proc/mdstat
>>> Personalities : [raid1] [raid6] [raid5] [raid4]
>>> md7 : active raid1 sdi1[2] md_d0p1[0]
>>>       1464725632 blocks [2/1] [U_]
>>>       [>....................]  recovery =  1.0% (15705536/1464725632)
>>>       finish=1103.9min speed=21875K/sec
>>
>> Same result...
>>
>> connection2:0: iscsi: detected conn error (1011)
>> session2: iscsi: session recovery timed out after 120 secs
>> sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
>> sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
>> sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
>> sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
>> sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
>> sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
>> sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
>
> Sorry for this last mail. I have found another problem, but I don't
> know whether this bug comes from the iscsi-target or from raid5
> itself. The iSCSI target is disconnected because the istd1 and
> md_d0_raid5 kernel threads each use 100% of a CPU!
>
> Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
> Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   4139032k total,   218424k used,  3920608k free,    10136k buffers
> Swap:  7815536k total,        0k used,  7815536k free,    64808k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5824 root      15  -5     0    0    0 R  100  0.0  10:34.25 istd1
>  5599 root      15  -5     0    0    0 R  100  0.0   7:25.43 md_d0_raid5
>
> Regards,
>
> JKB

If you have two iSCSI sessions mirrored, then any failure along either
path will hose the setup. Also, having iSCSI and MD RAID fight over the
same resources in the kernel is a recipe for a race condition. How
about exploring MPIO and DRBD?

-Ross
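As an aside, the resync progress that Joël pasted from /proc/mdstat can be pulled out mechanically. The sketch below parses a sample copied from the thread; on a live system you would read /proc/mdstat itself. The parsing patterns are assumptions based on the mdstat format shown above, not on anything else in the thread.

```shell
#!/bin/sh
# Sketch: extract resync progress and speed from an mdstat-style block.
# The sample below is copied from the thread; replace the here-string
# with `cat /proc/mdstat` output on a real system.
mdstat_sample='md7 : active raid1 sdi1[2] md_d0p1[0]
      1464725632 blocks [2/1] [U_]
      [>....................]  recovery =  1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec'

# Pull out the completion percentage and the reported speed.
progress=$(printf '%s\n' "$mdstat_sample" | sed -n 's/.*recovery = *\([0-9.]*\)%.*/\1/p')
speed=$(printf '%s\n' "$mdstat_sample" | sed -n 's/.*speed=\([0-9]*\)K\/sec.*/\1/p')

echo "resync: ${progress}% done at ${speed} KB/s"
```

Run in a loop (e.g. under `watch`), this gives a quick way to see whether the resync is still making progress or has stalled the way the thread describes.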
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
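For reference, the 12-hour dd soak test described at the top of the thread can be scripted along these lines. This is only a sketch: the device path, block size, and pass counts are assumptions, not values taken from the thread (the defaults write to /dev/null so the script is safe to try anywhere).

```shell
#!/bin/sh
# Sketch of a dd soak test over an iSCSI link: stream data repeatedly
# and stop on the first error. Point DEV at the iSCSI block device for
# a real test; the defaults are harmless placeholders.
DEV=${DEV:-/dev/null}     # assumed placeholder; use the iSCSI device in real use
SRC=${SRC:-/dev/zero}
COUNT=${COUNT:-256}       # 256 x 1 MiB = 256 MiB per pass
PASSES=${PASSES:-3}       # the thread ran the real test for ~12 hours

i=1
while [ "$i" -le "$PASSES" ]; do
    if ! dd if="$SRC" of="$DEV" bs=1M count="$COUNT" 2>/dev/null; then
        echo "pass $i: dd failed" >&2
        exit 1
    fi
    echo "pass $i: ok"
    i=$((i + 1))
done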