I can't speak to the SuSe issues, but I believe there is some confusion about the packages and command syntax. So hang on, we are going for a ride, step by step...

Check and repair are not packages per se; the command you are actually running is echo, and you will have that on any system. If you run this;

echo 1

you should get a 1 echoed back at you. For example;

[root@gateway]# echo 1
1

Or anything else you want;

[root@gateway]# echo check
check

Now all we are doing with this is redirecting that output with the ">>" to another location, /sys/block/md0/md/sync_action

The difference between a double >> and a single > is that >> will append to the end of the file and the single > will replace the contents of the file with the value. For example, I will create a file called foo;

[root@gateway tmp]# vi foo

In this file I add two lines of text, foo, then I write and quit with :wq

Now I will take a look at the file I just made with my vi editor...

[root@gateway tmp]# cat foo
foo
foo

Great, now I run my echo command to send another value to it. First I use the double >> to just append;

[root@gateway tmp]# echo foo2 >> foo

Now I take another look at the file;

[root@gateway tmp]# cat foo
foo
foo
foo2

So I have my first two text lines with the third line "foo2" appended. Now I do this again, but use just the single > to replace the file with a value.

[root@gateway tmp]# echo foo3 > foo

Then I look at it again;

[root@gateway tmp]# cat foo
foo3

Ahh, all the other lines are gone and now I just have foo3. So, > replaces and >> appends.

How does this affect your /sys/block/md0/md/sync_action file? As it turns out, it does not matter. Think of proc and sys (/proc and /sys) as pseudo file systems: real-time, memory-resident file systems that track the processes running on your machine and the state of your system.

So first let's go to /sys/block/ and list its contents;

[root@gateway ~]# cd /sys/block/
[root@gateway block]# ls
dm-0  dm-3  hda  md1  ram0   ram11  ram14  ram3  ram6  ram9  sdc  sdf  sdi
dm-1  dm-4  hdc  md2  ram1   ram12  ram15  ram4  ram7  sda   sdd  sdg
dm-2  dm-5  md0  md3  ram10  ram13  ram2   ram5  ram8  sdb   sde  sdh

This will be different for you, since your system will have different hardware and settings; again, it is a pseudo file system. The dm entries are my logical volumes, and you might have more or fewer sata drives (sda, sdb, ...); these were created when I booted the system. If I add another sata drive, another sdj will be created automatically for me. Depending on how many raid devices you have, they are listed here too (I have four: /boot, swap, /, and my RAID6 data, which are md0, md1, md2, md3).

So let's go into one. My swap RAID, md1, is small, so let's go to that one and test this out;

[root@gateway md1]# ls
dev  holders  md  range  removable  size  slaves  stat  uevent

Let's go deeper,

[root@gateway md1]# cd /sys/block/md1/md/
[root@gateway md]# ls
chunk_size      dev-hdc1          mismatch_cnt  rd0         suspend_lo      sync_speed
component_size  level             new_dev       rd1         sync_action     sync_speed_max
dev-hda1        metadata_version  raid_disks    suspend_hi  sync_completed  sync_speed_min

Now let's look at sync_action;

[root@gateway md]# cat sync_action
idle

That is the pseudo file that represents the current state of my RAID md1. So let's run that echo command and then check the state of the RAID;

[root@gateway md]# echo check > sync_action
[root@gateway md]# cat /proc/mdstat
Personalities : [raid1] [raid6]
md1 : active raid1 hdc1[1] hda1[0]
      104320 blocks [2/2] [UU]
      [============>........]  resync = 62.7% (65664/104320) finish=0.0min speed=65664K/sec

So it is in resync state, and if there are bad blocks they will be corrected from parity.
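(As an aside, the same md directory also has a mismatch_cnt file; you can see it in the ls listing above. After a check finishes, it reports how many blocks were found to disagree between the mirrors or parity. Assuming your kernel exposes it the same way mine does, you can read it like any other pseudo file, and a clean array reads back zero:

[root@gateway md]# cat mismatch_cnt
0

)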
Now, once it is done, let's check that sync_action file again.

[root@gateway md]# cat sync_action
idle

Remember, we used the single redirect, so we replaced the value with the text "check" with our echo command. Once it was done with the resync, my system changed the value back to "idle". What about the double ">>"? Well, it appends to the file, but it has the same overall effect...

[root@gateway md]# echo check >> sync_action
[root@gateway md]# cat /proc/mdstat
Personalities : [raid1] [raid6]
md1 : active raid1 hdc1[1] hda1[0]
      104320 blocks [2/2] [UU]
      [=========>...........]  resync = 49.0% (52096/104320) finish=0.0min speed=52096K/sec

When it is done, the value goes back to idle;

[root@gateway md]# cat sync_action
idle

So, > or >> does not matter here. And the command you need is echo.

Manipulating the pseudo files in /proc is similar. Say, for example, that for security I don't want my box to respond to pings (1 is for true and 0 is for false);

echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all

In this case you want the single >, because you want to replace the current value with 1, not the >> for append. Another pseudo file, for turning your linux box into a router;

echo 1 > /proc/sys/net/ipv4/ip_forward

As for SuSe updating your kernel, removing your original one, and breaking your box by dropping you to a limited shell on boot up... I can't help you much there. I don't have SuSe, but as I understand it, they are a good distro. In my current distro, Fedora, you can tell the update manager not to update the kernel. Fedora also keeps your old kernel by default, so if there is an issue you can select the old one from the grub boot menu and go back to it. I believe Ubuntu is similar. I bet you could configure SuSe to do the same.

I hope that clears up some confusion. Good luck.

Dan.

-----Original Message-----
From: Michael [mailto:big_green_jelly_bean@xxxxxxxxx]
Sent: Friday, July 13, 2007 11:48 AM
To: Daniel Korstad
Cc: davidsen; linux-raid
Subject: Re: Software based SATA RAID-5 expandable arrays?

RESPONSE

I had everything working, but it is evident that when I installed SuSe the first time, check and repair were not included in the package :( I did not use the ">>", I used ">", as was incorrectly stated in much of the documentation I followed. The thing that made me suspect check and repair weren't part of SuSe was that typing "check" or "repair" at the command prompt got nothing but a response stating there was no such command. In addition, man check and man repair were also missing.

BROKEN!

I did an auto update of the SuSe machine, which ended up replacing the kernel. It added the new entries to the boot choices, but the mount information was not transferred. SuSe also deleted the original kernel boot setup. When SuSe looked at the drives individually, it found that none of them was recognizable. Therefore, when I woke up this morning and rebooted the machine after the update, I received the errors and was dumped to a basic prompt with limited ability to do anything. I know I need to manually remount the drives, but it's going to be a challenge since I have not done this before.

The answer to this question is that I either have to change distros (which I am tempted to do) or fix the current distro. Please do not bother providing any solutions, for I simply have to RTFM (which I haven't had time to do).

I think I am going to set my machines up again: the first two drives with identical boot partitions, yet not mirrored.
I can then manually run a "tree" copy that would update my second drive as I grow the system, and after successful and needed updates. This would then give me a fallback after any update, by simply swapping the SATA drive cables from the first boot drive to the second. I am assuming this will work. I can then add the RAID-6 (or 5) to the setup and recopy my files (yes, I haven't deleted them, because I am not confident in my ability with Linux yet). Hopefully I can just remount these 4 drives, because they are a simple RAID 5 array.

SUSE's COMPLETE FAILURES

This frustration with SuSe, the lack of a simple reliable update utility, and the failures I experienced have discouraged me from using SuSe at all. It's got some amazing tools that keep me from constantly looking up documentation, posting to forums, or going to IRC, but the unreliable upgrade process is a deal breaker for me. It's simply too much work to manually update everything. This project had a simple goal, which was to provide an easy and cheap solution for an unlimited NAS service.

SUPPORT

In addition, SuSe's IRC help channel is among the worst I have encountered. The level of support is often very good, but the level of harassment, flames, and simple childish behavior overcomes almost any attempt at providing any level of support. I have no problem giving back to the community when I learn enough to do so, but I will not be mocked for my inability to understand a new and very in-depth system. In fact, I tend to go to the wonderful gentoo IRC for my answers. That IRC is amazing, the people patient and encouraging, and the level of knowledge is the best I have experienced.

This list, the original incident aside, has been an amazing resource. I feel highly confident asking questions about RAID here, because I know you guys are actually RUNNING the kind of systems I am attempting to build.

----- Original Message ----
From: Daniel Korstad <dan@xxxxxxxxxxx>
To: big.green.jelly.bean <big_green_jelly_bean@xxxxxxxxx>
Cc: davidsen <davidsen@xxxxxxx>; linux-raid <linux-raid@xxxxxxxxxxxxxxx>
Sent: Friday, July 13, 2007 11:22:45 AM
Subject: RE: Software based SATA RAID-5 expandable arrays?

To run it manually;

echo check >> /sys/block/md0/md/sync_action

Then you can check the status with;

cat /proc/mdstat

Or you can continually watch it if you want (kind of boring though :) );

watch cat /proc/mdstat

This will refresh every 2 sec.

In my original email I suggested using a crontab so you don't need to remember to do this every once in a while. Run (I did this as root);

crontab -e

This will allow you to edit your crontab.
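(For reference, the five time fields at the front of a crontab line are minute, hour, day of month, month, and day of week, so a schedule line reads like this;

# min  hour  day-of-month  month  day-of-week   command
  30   2     *             *      Mon           <command to run here>

which means 2:30am every Monday, the same schedule used below. The <command to run here> part is just a placeholder.)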
Now paste this command in there;

30 2 * * Mon echo check >> /sys/block/md0/md/sync_action

If you want, you can add comments. I like to comment my stuff since I have lots of entries in mine; just make sure you have '#' at the front of those lines so your system knows they are just comments and not commands it should run;

#check for bad blocks once a week (every Mon at 2:30am)
#if bad blocks are found, they are corrected from parity information

After you have put this in your crontab, write and quit with this command;

:wq

It should come back with this;

[root@gateway ~]# crontab -e
crontab: installing new crontab

Now you can look at your cron table (without editing it) with this;

crontab -l

It should return something like this, depending on whether you added comments and how you scheduled your command;

#check for bad blocks once a week (every Mon at 2:30am)
#if bad blocks are found, they are corrected from parity information
30 2 * * Mon echo check >> /sys/block/md0/md/sync_action

For more info on crontab and the syntax for times (I just did a google and grabbed the first couple of links...);

http://www.tech-geeks.org/contrib/mdrone/cron&crontab-howto.htm
http://ubuntuforums.org/showthread.php?t=102626&highlight=cron

Cheers,
Dan.

-----Original Message-----
From: Michael [mailto:big_green_jelly_bean@xxxxxxxxx]
Sent: Thursday, July 12, 2007 5:43 PM
To: Bill Davidsen; Daniel Korstad
Cc: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: Software based SATA RAID-5 expandable arrays?

SuSe uses its own version of cron, which is different from everything else I have seen, and the documentation is horrible. However, they provide a wonderful XWindows utility that helps set them up... the problem I'm having is figuring out what to run. When I try to run "/sys/block/md0/md/sync_action" at a prompt, it gives permission denied even though I am su'd or logged in as root. Very annoying.

You mention check vs. repair... which brings me to my last issue in setting up this machine. How do you send an email when check or SMART fails, or when a RAID drive fails? How do you auto repair if the check fails? These are the last things I need to do for my Linux server to work right... after I get all of this done, I will change the boot to go to the command prompt and not XWindows, and I will leave it in the corner of my room, hopefully not to be touched for as long as possible.

----- Original Message ----
From: Bill Davidsen <davidsen@xxxxxxx>
To: Daniel Korstad <dan@xxxxxxxxxxx>
Cc: Michael <big_green_jelly_bean@xxxxxxxxx>; linux-raid@xxxxxxxxxxxxxxx
Sent: Wednesday, July 11, 2007 10:21:42 AM
Subject: Re: Software based SATA RAID-5 expandable arrays?

Daniel Korstad wrote:
> You have lots of options. This will be a lengthy response and will give just some ideas for just some of the options...
>
> Just a few thoughts below interspersed with your comments.
>
> For my server, I had started out with a single drive. I later migrated to a RAID 1 mirror (after having to deal with reinstalls after drive failures, I wised up). Since I already had an OS that I wanted to keep, my RAID-1 setup was a bit more involved. I followed this migration guide to get me there;
> http://wiki.clug.org.za/wiki/RAID-1_in_a_hurry_with_grub_and_mdadm
>
> Since you are starting from scratch, it should be easier for you. Most distros will have an installer that will guide you through the process.
> When you get to hard drive partitioning, look for an advanced option or a "review and modify partition layout" option or something similar, otherwise it might just make a guess at what you want, and that would not be RAID. In this advanced partition setup you will be able to create your RAID. First you make equal-size partitions on both physical drives. For example, first carve out a 100M partition on each of the two physical OS drives, then make a RAID 1 md0 from that pair of partitions and make it your /boot. Do this again for the other partitions you want to have RAIDed. You can do this for /boot, /var, /home, /tmp, /usr. It can be nice to have that separation: if a user fills /home/foo with crap, it will not affect other parts of the OS, or if the mail spool fills up, it will not hang the OS. The only problem is determining how big to make them during the install. At a minimum, I would do three partitions: /boot, swap, and /. This means all the others (/var, /home, /tmp, /usr) live in the / partition, but this way you don't have to worry about sizing them all correctly.
>
> For the simplest setup, I would do RAID 1 for /boot (md0), swap (md1), and / (md2). (Alternatively, you could make a swap file in / and not have a swap partition; tons of options...) Do you need to RAID your swap? Well, I would RAID it or make a swap file within a RAID partition. If you don't, and your system is using swap and you lose a drive that has swap information/partitions on it, you might have issues depending on how important the information on the failed drive was. Your system might hang.

Note that RAID-10 generally performs better than mirroring, particularly when more than a few drives are involved. This can have performance implications for swap, when large i/o pushes program pages out of memory. The other side of that coin is that "recovery CDs" don't seem to know how to use RAID-10 swap, which might be an issue on some systems.

> After you go through the install and have a bootable OS that is running on mdadm RAID, I would test it to make sure grub was installed correctly to both physical drives. If grub is not installed to both drives, and you lose one drive down the road, and that one was the one with grub, you will have a system that will not boot even though it has a second drive with a copy of all the files. If this were to happen, you can recover by booting with a bootable linux CD or rescue disk and manually installing grub. For example, say you only had grub installed to hda and it failed; boot with a live linux CD and type (assuming /dev/hdd is the surviving second drive);
> grub
> device (hd0) /dev/hdd
> root (hd0,0)
> setup (hd0)
> quit
>
> You say you are using two 500G drives for the OS. You don't necessarily have to use all the space for the OS. You can make your partitions, take the left over space, and throw it into a logical volume. This logical volume would not be fault tolerant, but it would be the sum of the left over capacity from both drives. For example, say you use 100M for /boot, 200G for /, and 2G for swap. Take the rest, make a standard ext3 partition from the remaining space on both drives, and put them in a logical volume, giving you over 500G to play with for non-critical crap.
>
> Why do I use RAID6? For the extra redundancy, and because I have 10 drives in my array.
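(For scale, a back-of-the-envelope sketch, assuming equal-size drives: with N drives of capacity C, RAID 5 leaves roughly (N-1)*C usable and RAID 6 leaves (N-2)*C. With ten 500G drives that is 4.5T vs. 4.0T, a small relative cost; with only four drives it is 1.5T vs. 1.0T, which is why the RAID 6 overhead matters more on a small array, as noted below.)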
> I have been an advocate for RAID 6, especially with ever-increasing drive capacities and when the number of drives in the array is above, say, six;
> http://www.intel.com/technology/magazine/computing/RAID-6-0505.htm
>
> Other configurations will perform better for writes; know your i/o performance requirements.
> http://storageadvisors.adaptec.com/2005/10/13/raid-5-pining-for-the-fjords/
> "...for using RAID-6, the single biggest reason is based on the chance of drive errors during an array rebuild after just a single drive failure. Rebuilding the data on a failed drive requires that all the other data on the other drives be pristine and error free. If there is a single error in a single sector, then the data for the corresponding sector on the replacement drive cannot be reconstructed. Data is lost. In the drive industry, the measurement of how often this occurs is called the Bit Error Rate (BER). Simple calculations will show that the chance of data loss due to BER is much greater than all the other reasons combined. Also, PATA and SATA drives have historically had much greater BERs, i.e., more bit errors per drive, than SCSI and SAS drives, causing some vendors to recommend RAID-6 for SATA drives if they're used for mission critical data."
>
> Since you are using only four drives for your data array, the overhead for RAID6 (two drives for parity) might not be worth it. With four drives you would be just fine with a RAID5.
>
> However, I would make a cron entry for the check command to run every once in a while. Add this to your crontab...
>
> #check for bad blocks once a week (every Mon at 2:30am)
> #if bad blocks are found, they are corrected from parity information
> 30 2 * * Mon echo check >> /sys/block/md0/md/sync_action
>
> With this, you will keep hidden bad blocks to a minimum, and when a drive fails you won't likely be bitten by hidden bad block(s) during a rebuild.

I think a comment on "check" vs. "repair" is appropriate here. At the least, "see the man page" is appropriate.

> For your data array, I would make one partition of type Linux raid autodetect (FD) covering the whole drive on each physical drive. Then create your raid;
>
> mdadm --create /dev/md3 -l 5 -n 4 /dev/<your data drive1-partition> /dev/<your data drive2-partition> /dev/<your data drive3-partition> /dev/<your data drive4-partition>
>
> (The /dev/md3 can be whatever you want and will depend on how many previous raid arrays you have, so long as you use a number not currently in use.)
>
> My filesystem of choice is XFS, but you get to pick your own poison:
> mkfs.xfs -f /dev/md3
>
> Mount the device:
> mount /dev/md3 /foo
>
> I would edit your /etc/fstab to have it automounted at each startup (an example line is sketched below, after the misc comments).
>
> Dan.

Other misc comments: mirroring your boot partition on drives which the BIOS won't use is a waste of bytes. If you have more than, say, four drives fail to function, you probably have a system problem other than disk. And some BIOS versions will boot a secondary drive if the primary fails hard, but not if it has a parity or other error, which can enter a retry loop (I *must* keep trying to boot). This behavior can be seen on at least one major server platform from a big-name vendor; it's not just cheap desktops. The solution, ugly as it is, is to use the firmware "RAID" on the motherboard controller for boot, and I have several systems with low-cost small PATA drives in a mirror just for boot (after which they are spun down with hdparm settings) for this reason.

Really good notes, people should hang onto them!
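(Since the quoted text mentions editing /etc/fstab for the data array but does not show a line, here is a minimal sketch, assuming the XFS array is /dev/md3 and the mount point is /foo as in the quoted example; the mount point and options are only placeholders, adjust for your own layout:

/dev/md3    /foo    xfs    defaults    0 0

The last two fields are the dump and fsck-order flags; leaving them at 0 is fine for an XFS data volume.)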
--
bill davidsen <davidsen@xxxxxxx>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979