Re: LVM on raid10,f2 performance issues

Hmm, 

Why is the command

 blockdev --setra 65536 /dev/md0

really needed? I think the kernel should set a reasonable default here.

What is the logic? In the following I try to discuss what would be
reasonable for a kernel patch to achieve.

The command sets the readahead to 32 MiB. Is that really wanted?
I understand that it is important for our benchmarks to give good
results, but is it useful in real operation? Or can a smaller value
solve the problem?
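
For reference, blockdev's --setra argument is given in 512-byte sectors:

  65536 sectors * 512 bytes/sector = 33554432 bytes = 32 MiB

whereas the LVM default of 256 sectors mentioned below is only 128 KiB.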

Reading 32 MiB takes about 300-500 ms, and this would be done for every
read, even for small reads. That is a lot. For database operations it
would limit the array to about 2-3 transactions per second, while a
normal 7200 rpm drive on its own can do roughly 100 tps, so such
transactions would be slowed down by a factor of 30 to 50...
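
(Roughly: 1 / 0.3-0.5 s gives 2-3 large reads per second, versus the
~100 small random reads per second a single 7200 rpm drive can do at
around 10 ms per seek - hence the factor of 30-50.)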

Maybe a blockdev parameter of 16384 - that is, 8 MiB - would be
sufficient? This would limit the time spent on each transaction to
about 100 ms.
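
That is, something like

  blockdev --setra 16384 /dev/md0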

And the value could depend on the relevant parameters, say the number of
drives and the chunk size. Maybe the trick is to read a full stripe set,
that is, the number of drives times the chunk size. For a 4-drive array
with a chunk size of 256 KiB this would be 1 MiB, or a --setra parameter
of 2048.
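
To make that arithmetic concrete, an untested shell sketch (the chunk
size and drive count are just the example values above, not read from
the array):

  chunk_kib=256                            # chunk size in KiB
  drives=4                                 # drives in the array
  sectors_per_chunk=$(( chunk_kib * 2 ))   # 1 KiB = 2 x 512-byte sectors
  setra=$(( sectors_per_chunk * drives ))  # one full stripe set
  echo "blockdev --setra $setra /dev/md0"  # -> blockdev --setra 2048 /dev/md0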

Or maybe the trick is to read several stripe sets at the same time.
For raid5 and raid6 reads the parity chunks need not be read, so it
would be a waste to read the full stripe set.
I am not fully sure what is going on. Maybe somebody can enlighten me.

Or maybe readahead is not really the parameter that needs to be set
correctly - perhaps something else needs to be fixed instead, maybe some
piece of logic.

best regards
keld

On Sun, Jan 18, 2009 at 08:24:42PM -0500, thomas62186218@xxxxxxx wrote:
> Hi everyone,
> 
> I too was seeing miserable read performance with LVM2 volumes on top of 
> md RAID 10s on my Ubuntu 8.04 64-bit machine. My RAID 10 has 12 x 
> 300GB 15K SAS drives on a 4-port LSI PCIe SAS controller.
> 
> I use:
> blockdev --setra 65536 /dev/md0
> 
> And this dramatically increased my RAID 10 read performance.
> 
> You MUST do the same for your LVM2 volumes for them to see a comparable 
> performance boost.
> 
> blockdev --setra 65536 /dev/mapper/raid10-testvol
> 
> Otherwise, your LVM volume will default to a read-ahead value of 256, 
> which stinks. I increased my read performance by 3.5x with this one 
> change! See below:
> 
> root@b410:~# dd if=/dev/raid10twelve256k/testvol of=/dev/null bs=1M 
> count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 50.8923 s, 206 MB/s
> 
> root@b410:~# blockdev --setra 65536 /dev/mapper/raid10twelve256k-testvol
> 
> root@b410:~# dd if=/dev/raid10twelve256k/testvol of=/dev/null bs=1M 
> count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 14.4057 s, 728 MB/s
> 
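> To see which value is actually in effect at each layer, blockdev's 
> --getra reports the current readahead in the same 512-byte sector 
> units that --setra takes:
> 
> blockdev --getra /dev/md0
> blockdev --getra /dev/mapper/raid10twelve256k-testvol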
> 
> Enjoy!
> -Thomas
> 
> -----Original Message-----
> From: Michal Soltys <soltys@xxxxxxxx>
> To: Holger Mauermann <mauermann@xxxxxxxxx>
> Cc: Keld Jørn Simonsen <keld@xxxxxxxx>; linux-raid@xxxxxxxxxxxxxxx
> Sent: Wed, 3 Dec 2008 1:43 am
> Subject: Re: LVM on raid10,f2 performance issues
> 
> Holger Mauermann wrote: 
> 
> >Keld Jørn Simonsen wrote: 
> >
> >>How is it if you use the raid10,f2 without lvm? 
> >>What are the numbers? 
> >
> >After a fresh installation LVM performance is now somewhat better. I 
> >don't know what was wrong before. However, it is still not as fast as 
> >the raid10... 
> >
> >dd on raw devices 
> >----------------- 
> >
> >raid10,f2: 
> >  read : 409 MB/s 
> >  write: 212 MB/s 
> >
> >raid10,f2 + lvm: 
> >  read : 249 MB/s 
> >  write: 158 MB/s 
> >
> >sda:  sdb:  sdc:  sdd: 
> >---------------------- 
> >YYYY  ....  ....  XXXX 
> >....  ....  ....  .... 
> >XXXX  YYYY  ....  .... 
> >....  ....  ....  .... 
> 
> Regarding the layout from your first mail - this is how it's supposed to
> be. LVM's header took 3*64KB (you can control that with --metadatasize,
> and verify with e.g. pvs -o+pe_start), and then the first 4MB extent
> (controlled with --physicalextentsize) of the first logical volume
> started - on sdd and continued on sda. Mirrored data was set "far" from
> that, and shifted one disk to the right - as expected from raid10,f2. 
> 
> As for performance, hmmm. Overall, there are a few things to consider
> when doing lvm on top of the raid: 
> 
> - stripe vs. extent alignment 
> - stride vs. stripe vs. extent size 
> - the filesystem's awareness that there is also a raid layer below 
> - lvm's readahead (iirc, only the uppermost layer matters - it functions
>   as a hint for the filesystem) 
> 
> But those points are particularly important for raid with parities. Here
> everything is aligned already, and there is no parity to worry about. 
> 
> But the last point can be relevant - and you did test with a filesystem
> after all. Try setting the readahead with blockdev or lvchange (the
> latter will be permanent across lv activations). E.g. 
> 
> #lvchange -r 2048 /dev/mapper... 
> 
> and compare to the raw raid10: 
> 
> #blockdev --setra 2048 /dev/md... 
> 
> If you did your tests with ext2/3, also try to create it with the
> -E stride=,stripe-width= options in both cases. Similarly with
> sunit/swidth if you used xfs. 
> 
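> As a rough sketch only (assuming, for illustration, 256 KiB chunks,
> 4 KiB filesystem blocks and a 4-disk far-2 array, i.e. 2 data disks'
> worth of capacity; /dev/vg0/lv0 is a placeholder): 
> 
>   # stride = chunk / block = 256 KiB / 4 KiB = 64
>   # stripe-width = stride * data disks = 64 * 2 = 128
>   mkfs.ext3 -E stride=64,stripe-width=128 /dev/vg0/lv0
> 
>   # roughly the same for xfs, via su/sw:
>   mkfs.xfs -d su=256k,sw=2 /dev/vg0/lv0
> 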
> You might also create the volume group with a larger extent - such as
> 512MB (as 4MB granularity is often overkill). Performance-wise it
> shouldn't matter in this case, though. 
> 
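> For example (vg0 is a placeholder name, /dev/md0 the array as in the
> earlier commands): 
> 
>   vgcreate -s 512M vg0 /dev/md0
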
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
