Thanks -- yes, the workload this system will see is a lot of small I/O requests. On the system I have logs from (I am trying to gather logs from all systems to fine-tune the sizes), the requests to the drive subsystem are in the ~64-128 KiB range and very random in nature, so a 4 MiB or even 8 MiB PE shouldn't be much of a problem (assuming the workloads on the other boxes are similar). This is why I am planning on adding as many spindles as I can and growing that way, rather than optimizing for streaming I/O.

I understand the limits on outstanding commands, IOPS (read/write), and SCSI command queue depth. The latter (command queue depth) is the biggest item that would push me toward more, smaller PVs -- instead of a RAID-6 (13+2+1) for each PV, something like a RAID-6 (5+2+1). That would lower storage efficiency, but it would increase the number of PVs and the aggregate command queue depth (as well as increasing write IOPS, though this is mainly a read-request array with few writes).

I am mostly looking for examples of builds that have broached the 100+ TB range under Linux in the real world, and what kinds of gotchas I'm in for. My current arrays, which are going to be merged into this, average ~20-30 TiB each; since they all serve similar (and somewhat overlapping) functions, merging them is in order.

Steve

-----Original Message-----
From: linux-lvm-bounces@redhat.com [mailto:linux-lvm-bounces@redhat.com] On Behalf Of Marek Podmaka
Sent: Tuesday, December 23, 2008 04:28
To: LVM general discussion and development
Subject: Re: LVM2 robustness w/ large (>100TB) name spaces?

Hello,

Tuesday, December 23, 2008, 1:15:28, Steve Costaras wrote:

> - What are the limits on PE/LE's per logical volume (>200,000,000? A
> problem?) (I will be attaching multiple external chassis like above to
> several HBA's and will be using LVM striping to increase performance. So a
> small PE size (4MB-8MB) would be best to aid in the distribution of
> requests across the physical subsystems.)

I think 4-8 MB for the PE size is too small when you will be using such big (and probably advanced) arrays. LVM striping with a stripe size in the hundreds of kB would kill any array, because when you request, for example, 512 kB from one array and the next 512 kB from another array, they can't handle it efficiently. You won't see the benefit of reading from all 16 spindles -- every time it will just load 512 kB from one physical disk. The array's detection of sequential reads might also not work well in this case.

In HP-UX LVM with enterprise arrays like HP EVA or HP XP we use 32-64 MB PEs and enable distribution -- that means "stripe" size = PE size:

LE1 = PV1_1
LE2 = PV2_1
LE3 = PV1_2
LE4 = PV2_2

and so on. Using this, you request for example 32 MB from one array at a time. Given the cache sizes of the arrays and their readahead, you should get much better performance, because those 32 MB will be fetched partially from all 16 drives. Also, we don't use distribution across 2 arrays, just different paths to one array (different HBA, different SAN switch and different array FC controller). We use 2 arrays only for mirroring data to the other datacentre for clusters.

The main reason we use that PE distribution is that HP-UX does not have load-balancing multipath built in. But even when you do have it, using more PVs is better because of the architectural limits of arrays (number of outstanding requests per virtual drive, SCSI queue depth on the server and on the array, cache memory limits per virtual drive, etc.)
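If you do something similar under Linux LVM2, the setup would look roughly like the sketch below. Treat it as an illustration only -- the device names (mpatha/mpathb/sdc), VG/LV names, sizes and stripe count are placeholders, not a tested config:

  # Multipath devices used as PVs (placeholder names)
  pvcreate /dev/mapper/mpatha /dev/mapper/mpathb

  # Choose the larger extent size at VG creation time (-s = --physicalextentsize)
  vgcreate -s 32M bigvg /dev/mapper/mpatha /dev/mapper/mpathb

  # Striped LV across both PVs; the stripe size (-I) must be a power of two
  # and, with lvm2-format metadata, can go up to the PE size, so
  # "stripe size = PE size" is possible
  lvcreate -i 2 -I 32M -L 10T -n biglv bigvg

  # Sanity checks: how the segments were laid out, and the per-device
  # SCSI queue depth (sdc is a placeholder)
  lvs --segments -o +devices bigvg
  cat /sys/block/sdc/device/queue_depth

The same idea scales to however many PVs you present (-i = number of stripes); the main thing is keeping -I and -s consistent with each other.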
--
bYE, Marki
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/