RE: Please review: Slackware RAID How-To

I looked at the code.  The stride option only seems to align the tables on
a stride-size boundary.  The stride size is only used to pick the starting
point of a table, not the table size.  As far as I can tell, it does not
affect disk writes, at least not directly.

Also, the change log only talks about stride size for RAID0; there are no
comments on RAID5.  The entries are dated 1997 and 1998, so maybe RAID5
came after that.

Anyway, based on what I found in the code, I bet if you base "stride size"
on "chunk size" you will be fine.

But maybe "stride size" should be based on "stripe size" sometime in the
future!  See my comments below.

Guy

This is the code:
===========================================================================
This is the only place stride is used in the code.
From misc/mke2fs.c
        /*
         * Allocate the block and inode bitmaps, if necessary
         */
        if (fs->stride) {
                start_blk = group_blk + fs->inode_blocks_per_group;
                start_blk += ((fs->stride * group) %
                              (last_blk - start_blk));
                if (start_blk > last_blk)
                        start_blk = group_blk;
        } else
                start_blk = group_blk;

===========================================================================
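To make the effect of the excerpt above easier to see, here is a minimal
sketch that repeats the same arithmetic in a standalone function.  The
function name bitmap_start and its parameter list are hypothetical and are
not part of e2fsprogs; all values are in filesystem blocks, and the sketch
assumes last_blk is larger than group_blk + inode_blocks_per_group so the
modulus is never zero.

```c
/* Hypothetical helper (not from e2fsprogs): returns the starting block
 * that the quoted excerpt would pick for one block group. */
unsigned long bitmap_start(unsigned long group_blk,
                           unsigned long inode_blocks_per_group,
                           unsigned long last_blk,
                           unsigned long stride,
                           unsigned long group)
{
    unsigned long start_blk = group_blk;

    if (stride) {
        start_blk = group_blk + inode_blocks_per_group;
        /* Shift the start by stride * group, wrapped into the group. */
        start_blk += (stride * group) % (last_blk - start_blk);
        if (start_blk > last_blk)
            start_blk = group_blk;
    }
    return start_blk;
}
```

With made-up numbers (group_blk 32768, 512 inode blocks, last_blk 65535,
stride 32, group 1) this gives 33312, that is, 32 blocks past the end of
the inode table.  Only the starting point moves; nothing here changes the
size of any write.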
There was also some other hard-coded STRIDE_LENGTH stuff.
STRIDE_LENGTH is NOT related to stride.
I think they should be the same.
The disk I/O is done STRIDE_LENGTH blocks at a time.
If that were 1 full stripe, software RAID5, RAID0 or RAID6 might go faster.
This hard-coded 8 seems wrong to me.
I bet it matched the writer's chunk size at the time, and maybe he planned
to make it a parameter after testing.

Also, next_update_incr is computed but never used.  What if num/100 is < 1?
This line:
	next_update += num / 100;
Should be:
	next_update += next_update_incr;

From misc/mke2fs.c

#define STRIDE_LENGTH 8

        /* Allocate the zeroizing buffer if necessary */
        if (!buf) {
                buf = malloc(fs->blocksize * STRIDE_LENGTH);
                if (!buf) {
                        com_err("malloc", ENOMEM,
                                _("while allocating zeroizing buffer"));
                        exit(1);
                }
                memset(buf, 0, fs->blocksize * STRIDE_LENGTH);
        }
        /* OK, do the write loop */
        next_update = 0;
        next_update_incr = num / 100;
        if (next_update_incr < 1)
                next_update_incr = 1;
        for (j=0; j < num; j += STRIDE_LENGTH, blk += STRIDE_LENGTH) {
                count = num - j;
                if (count > STRIDE_LENGTH)
                        count = STRIDE_LENGTH;
                retval = io_channel_write_blk(fs->io, blk, count, buf);
                if (retval) {
                        if (ret_count)
                                *ret_count = count;
                        if (ret_blk)
                                *ret_blk = blk;
                        return retval;
                }
                if (progress && j > next_update) {
                        next_update += num / 100;
                        progress_update(progress, blk);
                }
        }
        return 0;

==========================================================================
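A rough way to check the num/100 < 1 question is to count how often the
progress branch of the loop above would fire with each increment.  The
sketch below is only a reading of the quoted excerpt: count_updates and
its use_incr flag are hypothetical, the write itself is left out, and
progress is assumed to be non-NULL.

```c
#define STRIDE_LENGTH 8

/* Hypothetical helper: counts how many times the progress branch fires
 * when zeroing num blocks.  use_incr != 0 selects the proposed
 * "next_update += next_update_incr" line instead of "num / 100". */
int count_updates(int num, int use_incr)
{
    int next_update = 0;
    int next_update_incr = num / 100;
    int updates = 0;
    int j;

    if (next_update_incr < 1)
        next_update_incr = 1;
    for (j = 0; j < num; j += STRIDE_LENGTH) {
        if (j > next_update) {
            next_update += use_incr ? next_update_incr : num / 100;
            updates++;
        }
    }
    return updates;
}
```

For num = 50 both variants fire on every pass after the first, because j
grows by STRIDE_LENGTH while next_update grows by at most 1.  For num of
100 or more the two increments are the same number.  So in this sketch
the change reads as a cleanup of the unused variable rather than a change
in behavior.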




-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Guy
Sent: Wednesday, May 19, 2004 11:11 PM
To: 'John Lange'
Cc: 'LinuxRaid'
Subject: RE: Please review: Slackware RAID How-To

The man page says "stripe size", not "chunk size".  Which is correct?

RAID5:
	"stripe size" = ("Number of disks in array" - 1) * "chunk size"
RAID6:
	"stripe size" = ("Number of disks in array" - 2) * "chunk size"

"Number of disks in array" does not include spares!
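As a worked version of the two formulas above, here is a small sketch.
The names data_disks and stripe_kib are hypothetical, sizes are in KiB,
and raid_disks is the number of active disks with spares left out.

```c
/* Hypothetical helper: number of data disks for RAID5 or RAID6.
 * Returns -1 for other levels or when there are too few disks. */
int data_disks(int level, int raid_disks)
{
    int parity;

    switch (level) {
    case 5: parity = 1; break;
    case 6: parity = 2; break;
    default: return -1;
    }
    if (raid_disks <= parity)
        return -1;
    return raid_disks - parity;
}

/* Data carried by one full stripe, in KiB (-1 on bad input). */
int stripe_kib(int level, int raid_disks, int chunk_kib)
{
    int d = data_disks(level, raid_disks);

    return d < 0 ? -1 : d * chunk_kib;
}
```

For the 4-disk array with 128K chunks discussed in this thread,
stripe_kib(5, 4, 128) is 384 and stripe_kib(6, 4, 128) is 256.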

It would be GREAT for performance if writes were full stripes at a time,
since no reads would be required.

I don't think it would help performance if writes were full chunks at a
time, since the target chunk would still need to be read to compute the
parity chunk.
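The difference between the two cases can be sketched with plain XOR
parity as used by RAID5.  This is an illustration only and not the md
driver's code; both function names are hypothetical, and RAID6's second
parity block is not covered.

```c
#include <stddef.h>
#include <string.h>

/* Full-stripe write: parity is built from the new data chunks alone,
 * so nothing has to be read from the disks first. */
void parity_full_stripe(unsigned char *parity,
                        const unsigned char *const *data,
                        size_t ndata, size_t chunk_len)
{
    size_t d, i;

    memset(parity, 0, chunk_len);
    for (d = 0; d < ndata; d++)
        for (i = 0; i < chunk_len; i++)
            parity[i] ^= data[d][i];
}

/* Single-chunk write: the old contents of the target chunk and the old
 * parity (passed in via parity) must both be read before the new parity
 * can be computed. */
void parity_update_chunk(unsigned char *parity,
                         const unsigned char *old_data,
                         const unsigned char *new_data,
                         size_t chunk_len)
{
    size_t i;

    for (i = 0; i < chunk_len; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}
```

Both paths end with the same parity; the second one simply needs the old
data and old parity as inputs, which is where the extra reads come from.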

Any filesystem gods out there?  Any opinions?

Guy


-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of John Lange
Sent: Wednesday, May 19, 2004 9:42 PM
To: Guy
Cc: 'LinuxRaid'
Subject: RE: Please review: Slackware RAID How-To

On Sun, 2004-05-16 at 23:51, Guy wrote:
> md2 has 4 disks with a chunk size of 128K.  Since only 3 disks are used
> for data, and the filesystem block size is 4K, the stride size should be
> 128*3/4, or 96.
> Change:
> 	mke2fs -b 4096 -R stride=32 /dev/md2
> To:
> 	mke2fs -b 4096 -R stride=96 /dev/md2
> 
> My logic:
> 	"Stripe size" is "chunk size" times "number of data disks".
> 	From example:
> 		"chunk size" = 128
> 		"number of data disks" = "nr-raid-disks" - 1 (-2 if RAID6)

I think this is incorrect. At this point I defer to the
Software-RAID-HOWTO.

http://www.tldp.org/HOWTO/Software-RAID-HOWTO-5.html#ss5.10

From that document, stride is chunksize/blocksize.  Number of disks does
not enter into it.

So with a chunk size of 128K and a block size of 4K it would be:

128K/4K = 32 for stride.

If this is indeed correct I will be sure to expand that area of the
How-To so it is more clear.
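For reference, the two candidate values in this exchange come from the
following arithmetic.  This is a sketch with hypothetical function names;
it only reproduces the numbers and does not settle which one mke2fs
expects.  Sizes are in KiB and the results are in filesystem blocks.

```c
/* Stride as chunk size over block size (the HOWTO reading). */
int stride_from_chunk(int chunk_kib, int block_kib)
{
    return chunk_kib / block_kib;
}

/* Stride as a whole data stripe over block size (the reading in the
 * quoted message); ndata is the number of data disks. */
int stride_from_stripe(int chunk_kib, int ndata, int block_kib)
{
    return chunk_kib * ndata / block_kib;
}
```

stride_from_chunk(128, 4) is 32, and stride_from_stripe(128, 3, 4) is 96.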

Thanks very much for your feedback.

Regards,

John Lange

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

