Sorry again for the delayed response... it takes me a while to read through all these and process them. :) I do appreciate all the feedback though!

On Sun, Feb 27, 2011 at 8:55 AM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:

> Yes, this is pretty much exactly what I mentioned. ~5GB/s aggregate.
> But we've still not received an accurate detailed description from Matt
> regarding his actual performance needs. He's not posted iostat numbers
> from his current filer, or any similar metrics.

Accurate metrics are hard to determine. I did run iostat for 24 hours on a few servers, but I don't think the results give an accurate picture of what we really need.

Here are the details on what we have now:

We currently have 10 servers, each with an NFS share. Each server mounts every other NFS share; mountpoints are consistently named on every server (and a server's local storage is a symlink named like its mountpoint on other machines). One server has a huge directory of symbolic links that acts as the "database" or "index" to all the files spread across all 10 servers.

We spent some time a while ago creating a semi-smart distribution of the files. In short, we basically round-robin'ed files in such a way as to parallelize bulk reads across many servers.

The current system works, but is (as others have suggested) not particularly scalable. When we add new servers, I have to re-distribute those files across the new servers. On top of that, these storage servers are dual-purposed: they are also used as analysis servers, running batch computation jobs that use this data. The folks who run the analysis programs look at the machine load to determine how many analysis jobs to run. So when all machines are running analysis jobs, the machine load is a combination of both the CPU load from the analysis programs AND the I/O load from serving files. In other words, if these machines were strictly compute servers, they would in general show a lower load, and thus would run even more programs.

Having said all that, I picked a few of the 10 NFS/compute servers and ran iostat for 24 hours, reporting stats every 1 minute (FYI, this is actually what Dell asks you to do if you inquire about their storage solutions). The results from all machines were (as expected) virtually the same: constant, continuous reads at about 3--4 MB/s.

You might take that info and say: 4 MB/s times 10 machines is only 40 MB/s... that's nothing, not even the full bandwidth of a single gigabit ethernet connection. But there are several problems: (1) the number of analysis jobs is currently artificially limited; (2) the file distribution is smart enough that NFS load is balanced across all 10 machines; and (3) there are currently about 15 machines doing analysis jobs (10 are dual-purposed as I already mentioned), but this number is expected to grow to 40 or 50 within the year.

Given all that, I have simplified the requirements as follows: I want "something" that is capable of keeping the gigabit connections of those 50 analysis machines saturated at all times.

There have been several suggestions along the lines of smart job scheduling and the like. The thing is, these analysis jobs are custom: they are constantly being modified, new ones created, and old ones retired. That means the access patterns are somewhat dynamic and will certainly change over time. Our current "smart" file distribution is just based on the general case of maybe 50% of the analysis programs' access patterns.
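(If it helps to make that concrete, the placement logic is conceptually something like the little sketch below. This is not our actual script; the server count, mountpoint names, and index path are just placeholders. The only point of the modulo assignment is that a sequential bulk read ends up touching all 10 servers at once.)

    # Toy sketch of the round-robin placement idea: assign file i to server
    # i % N so consecutive files in a bulk read are spread across all servers,
    # then build the symlink "index" pointing each name at its real location.
    import os

    SERVERS = ["/mnt/stor%02d" % n for n in range(1, 11)]   # made-up mountpoint names
    INDEX_DIR = "/data/index"                               # made-up index directory

    def distribute(files):
        for i, path in enumerate(sorted(files)):
            name = os.path.basename(path)
            target = os.path.join(SERVERS[i % len(SERVERS)], name)
            link = os.path.join(INDEX_DIR, name)
            # (the real script would also copy/move the data to 'target' first)
            if not os.path.islink(link):
                os.symlink(target, link)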
But next week someone could come up with a new analysis program that makes our current file distribution "stupid". The point is, current access patterns are somewhat meaningless, because they are all but guaranteed to change. So what do we do? For business reasons, any surplus manpower needs to be focused on these analysis jobs; we don't have the resources to constantly adjust job scheduling and file distribution. So I think we are truly trying to solve the most general case here, which is that all 50 gigabit-connected servers will be continuously requesting data in an arbitrary fashion.

This is definitely a solvable problem, and there are multiple options; I'm in the learning stage right now, so hopefully I can make a good decision about which solution is best for our particular case. I solicited the list because I had the impression that there were at least a few people who have built and/or administer systems like this. And clearly there are people with exactly this experience, given the feedback I've received! So I've learned a lot, which is exactly what I wanted in the first place.

> http://www.ibm.com/common/ssi/fcgi-bin/ssialias?infotype=SA&subtype=WH&appname=STGE_XB_XB_USEN&htmlfid=XBW03010USEN&attachment=XBW03010USEN.PDF
>
> Your numbers are wrong, by a factor of 2. He should research GPFS and
> give it serious consideration. It may be exactly what he needs.

I'll definitely look over that.

> I don't believe his desire is to actually DIY the compute and/or storage
> nodes. If it is, for a production system of this size/caliber, *I*
> wouldn't DIY in this case, and I'm the king of DIY hardware. Actually,
> I'm TheHardwareFreak. ;) I guess you've missed the RHS of my email
> addy. :) I was given that nickname, flattering or not, about 15 years
> ago. Obviously it stuck. It's been my vanity domain for quite a few years.

I'm now leaning towards a purchased solution, mainly because it seems like a DIY solution would cost a lot more in terms of my time. Expensive though they are, one of the nicer things about the vendor solutions is that they seem to provide somewhat of a "set it and forget it" experience. Of course, a system like this needs routine maintenance and such, but the vendors claim their solutions simplify that. Maybe that's just marketspeak! :) Although I think there's some truth to it: I've been a Linux/DIY enthusiast/hobbyist for years now, and my experience is that the DIY/FOSS stuff always takes more individual effort. It's fun to do at home, but can be costly from a business perspective...

> Why don't you ask Matt, as I have, for an actual, accurate description
> of his workload. What we've been given isn't an accurate description.
> If it was, his current production systems would be so overwhelmed he'd
> already be writing checks for new gear. I've seen no iostat or other
> metrics, which are standard fair when asking for this kind of advice.

Hopefully my description above sheds a little more light on what we need. Ignoring smarter job scheduling and such, I want to solve the worst-case scenario, which is 50 servers all requesting enough data to saturate their gigabit network connections.
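For reference, here's the back-of-the-envelope math behind that (assuming roughly 125 MB/s of usable payload per saturated gigabit link, which is an approximation, not a measurement; it's also where the ~6 GB/s aggregate figure comes from):

    # Rough aggregate-bandwidth math, assuming ~125 MB/s per saturated GbE link.
    clients = 50
    per_client_mb_s = 125.0
    print(clients * per_client_mb_s)   # 6250 MB/s, i.e. roughly 6 GB/s aggregate

    # versus what iostat shows on the current setup:
    print(10 * 4.0)                    # 10 servers * ~4 MB/s = 40 MB/s today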
> I'm still not convinced of that. Simply stating "I have 50 compute
> nodes each w/one GbE port, so I need 6GB/s of bandwidth" isn't actual
> application workload data. From what Matt did describe of how the
> application behaves, simply time shifting the data access will likely
> solve all of his problems, cheaply. He might even be able to get by
> with his current filer. We simply need more information. I do anyway.
> I'd hope you would as well.

Hopefully I described well enough why our current application workload metrics aren't sufficient. We haven't time-shifted data access, but we have somewhat space-shifted it, given the round-robin "smart" file distribution I described above. But it's only "smart" for today's usage; tomorrow's usage will almost certainly be different. 50 Gbps, or roughly 6 GB/s aggregate, is the requirement.

> I don't recall Matt saying he needed a solution based entirely on FOSS.
> If he did I missed it. If he can accomplish his goals with all FOSS
> that's always a plus in my book. However, I'm not averse to closed
> source when it's a better fit for a requirement.

Nope, doesn't have to be entirely FOSS.

-Matt