Re: Is partition alignment needed for RAID partitions ?

Hi Stan,
Size is incorrect in what way?  If your RAID0 chunk is 512KiB, then
3407028224 sectors is 3327176 chunks, evenly divisible, so this
partition is fully aligned.  Whether the capacity is correct is
something only you can determine.  Partition 2 is 1.587 TiB.
Would you mind showing me the calculation you used to get there?  I can see that 3407028224 / 3327176 = 1024, but I don't understand how the 512 KiB chunk size comes into play.
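For reference, the 512 KiB figures in once the chunk size is converted to sectors: with 512-byte logical sectors, one 512 KiB chunk is 1024 sectors.  A minimal shell sketch of the check, using the sector counts from the parted output below:

$ echo $(( 512 * 1024 / 512 ))    # sectors per 512 KiB chunk -> 1024
$ echo $(( 3407028224 % 1024 ))   # partition 2 size modulo chunk -> 0, evenly divisible
$ echo $(( 500000768 % 1024 ))    # partition 2 start sector modulo chunk -> 0, chunk-aligned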
I'm not intending to be a jerk, but this is a technical mailing list.
Understood - here is the complete layout:

/dev/sda - 250 gig disk
/dev/sdb - 2TB disk
/dev/sdc - 2TB disk
/dev/sdd - 256 gig iSCSI target on QNAP NAS (block allocated, not thin provisioned)
/dev/sde - 2TB iSCSI target on QNAP NAS (block allocated, not thin provisioned)
Show your partition table for sdc.  Even if the partitions on it are not
aligned, reads shouldn't be adversely affected by it.  Show

$ mdadm --detail
# parted /dev/sdb unit s print
Model: ATA WDC WD20EARX-008 (scsi)
Disk /dev/sdb: 3907029168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start       End          Size         File system  Name Flags
 1      2048s       500000767s   499998720s raid
 2      500000768s  3907028991s  3407028224s raid

# parted /dev/sdc unit s print
Model: ATA WDC WD20EARX-008 (scsi)
Disk /dev/sdc: 3907029168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start       End          Size         File system  Name Flags
 1      2048s       500000767s   499998720s raid
 2      500000768s  3907028991s  3407028224s raid


# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Dec 30 12:33:43 2013
     Raid Level : raid1
     Array Size : 249868096 (238.29 GiB 255.86 GB)
  Used Dev Size : 249868096 (238.29 GiB 255.86 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Tue Dec 31 01:01:42 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : srv01:0  (local to host srv01)
           UUID : 45d71ef8:9a1115cb:8ed0c4d9:95d56df4
         Events : 25

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1

# mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Mon Dec 30 12:33:56 2013
     Raid Level : raid0
     Array Size : 3407027200 (3249.19 GiB 3488.80 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon Dec 30 12:33:56 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : srv01:1  (local to host srv01)
           UUID : abfdcb5e:804fa119:9c4a8d88:fa2f08a7
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       18        0      active sync   /dev/sdb2
       1       8       34        1      active sync   /dev/sdc2


for the RAID0 array.  md itself, especially in the RAID0 personality, is
simply not going to be the -cause- of low performance.  The problem lies
somewhere else.  Given the track record of Western Digital's Green
series of drives, I'm leaning toward that as the cause.  Post the output from

$ smartctl -A /dev/sdb
$ smartctl -A /dev/sdc
# smartctl -A /dev/sdb
smartctl 6.2 2013-04-20 r3812 [i686-linux-3.11.0-14-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   217   186   021    Pre-fail  Always       -       4141
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       102
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8263
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       102
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       88
193 Load_Cycle_Count        0x0032   155   155   000    Old_age   Always       -       135985
194 Temperature_Celsius     0x0022   121   108   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

# smartctl -A /dev/sdc
smartctl 6.2 2013-04-20 r3812 [i686-linux-3.11.0-14-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   217   186   021    Pre-fail  Always       -       4141
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       100
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8263
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       100
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       86
193 Load_Cycle_Count        0x0032   156   156   000    Old_age   Always       -       134976
194 Temperature_Celsius     0x0022   122   109   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

I would have expected the RAID0 device to easily get
up to the 60 MB/s mark?
As the source disk of a bulk file copy over NFS/CIFS?  As a point of
reference, I have a workstation that maxes out at 50 MB/s over FTP and only
24 MB/s over CIFS to/from a server.  Both hosts have far in excess of
100 MB/s of disk throughput.  The 50 MB/s limitation is due to the cheap
Realtek mobo NIC, and the 24 MB/s is a Samba limit.  I've spent dozens of
hours attempting to tweak Samba for greater throughput, but it simply isn't
capable of it on that machine.

Your throughput issues are with your network, not your RAID.  Learn and
use FIO to see what your RAID/disks can do.  For now, a really simple
test is to time a cat of a large file piped to /dev/null, then divide the
file size by the elapsed time.  Or simply do a large read with dd.  This
will be much more informative than "moving data to a NAS", where your
throughput is network-limited, not disk-limited.
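For example, something along these lines (a sketch only; the file path is
illustrative, not taken from this thread):

# sync && echo 3 > /proc/sys/vm/drop_caches       # drop the page cache so the read really hits the disks
# time cat /mnt/array/somelargefile > /dev/null   # file size / elapsed seconds = MB/s
# dd if=/dev/md1 of=/dev/null bs=1M count=4096 iflag=direct   # or a direct sequential read off the RAID0 array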

The system is using a server-grade NIC.  I will run a dd/network test
shortly, after the copy is done.  (I am shifting all the data back to the
NAS, in case I mucked up the partitions :) )  I do recall that this
system was able to fill a gigabit pipe...
Now that you've made it clear the first scenario was over iSCSI, the same as
the second scenario, and not NFS/CIFS, I doubt the TCP stack is the
problem.  Assume the network is fine for now and concentrate on the disk
drives in the host.  That seems the most likely cause of the problem
at this point.
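One way to do that (a sketch; the block count is arbitrary) is to read each
member disk directly and compare the two, since a single slow or erroring
drive will drag the whole RAID0 down to its speed:

# dd if=/dev/sdb of=/dev/null bs=1M count=2048 iflag=direct
# dd if=/dev/sdc of=/dev/null bs=1M count=2048 iflag=direct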

BTW, you didn't state the throughput of the RAID1 device on sdb/sdc.
The RAID0 device is on the same disks, yes?  RAID0 was 15 MB/s.  What
was the RAID1?

ATM, the data is still moving back to the NAS (from the RAID1 device).  According to iostat, this is reading at just over 30000 kB/s (all of my numbers are from iostat -x).

Also, there is no other disk usage on the system.  All the data is currently on the NAS (except the system "stuff" for a quiet firewall).

I just spotted another thing: the two drives are on the same SATA controller, per rescan-scsi-bus:

Scanning for device 3 0 0 0 ...
OLD: Host: scsi3 Channel: 00 Id: 00 Lun: 00
      Vendor: ATA      Model: WDC WD20EARX-008 Rev: 51.0
      Type:   Direct-Access                    ANSI SCSI revision: 05
Scanning for device 3 0 1 0 ...
OLD: Host: scsi3 Channel: 00 Id: 01 Lun: 00
      Vendor: ATA      Model: WDC WD20EARX-008 Rev: 51.0
      Type:   Direct-Access                    ANSI SCSI revision: 05

Would it be better to move these apart?  I remember IDE used to have this issue (two drives sharing one channel), but I also recall that SATA "fixed" that.
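If it helps, which controller each disk hangs off of can be checked without
opening the case; a sketch:

# ls -l /sys/block/sdb /sys/block/sdc   # the symlink targets show the PCI device and ATA host for each disk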

Thanks again,

Pieter
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



