On Sun, Mar 31, 2013 at 8:56 AM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
> On 3/27/2013 5:18 PM, Mark Knecht wrote:
<SNIP>
>> Is there a way for me to measure, say over a whole day or some fixed
>> time, what the workload really looks like?
>
> That's not the way to go about this.
>

OK

>> The machine is a basic Gentoo desktop machine running KDE. The only
>> workload where I really care about performance is that I run a bunch
>> of Virtualbox Win 7 & Win XP VMs where I need the performance to be
>> as good as I can reasonably get. The problem I have is these VMs are
>> either 1 huge file (40-50GB in a single file) or many 2GB files. I
>> haven't a clue how Windows & Virtualbox is accessing what it sees as a
>> virtual drive and then underlying that how the vbox drivers are using
>> the system to get to the RAID.
>
> So you have a bunch of Windows VM guests that write to large sparse
> files residing on what, EXT4? NTFS block size is 4KB so that's your
> smallest IO.
>

Currently EXT3, based on my starting point 2 years ago and never having
changed. I'm open to EXT4 if this discussion shows me it warrants the
work. Would rather not deal with anything more exotic right now.

>> It would be interesting to set some program running, probably on a
>> weekend or sometime when performance isn't so critical, and see what
>> sort of data gets collected, assuming there's a program that does that
>> sort of thing.
>
> Again, that's not the way to approach this. What would be informative
> to know is what applications you're running in these Windows VMs. The
> application dictates the write pattern. You don't need a "collector" to
> tell you that. You just need to know the application(s). If you're
> just running productivity apps (web/mail/pdf/etc) inside these VMs then
> there's nothing to optimize WRT RAID stripe parameters as you have no
> sustained write IO. So what are the Windows apps?

Currently 3 VMs, but only 2 matter for performance.
The one that doesn't matter is a VMWare Player VM used for things like
watching Netflix & Hulu. Nothing much more than that. 1 CPU core
dedicated. CPU usage is generally low. I haven't paid much attention to
disk usage for this VM but will check it out.

Performance VMs:

1) This first VM primarily runs TradeStation, a rules-based trading
platform for trading stocks & futures. I generally run it with 2-4 CPU
cores, and it almost never uses much computational power. The big deal
in this VM is stock data caching, with years or even decades of data for
each stock or futures contract. Currently this cache appears to be
sitting in a single file which is about 3GB in size. This data streams
into the VM over the net when the markets are open (pretty much 24/7)
and the cache grows. Depending on the type of market and chart, the data
might be as fine grained as each individual trade taking place that day,
or it might only be updated once every bar (1 minute bar, 5 minute bar,
daily bar, etc.). TradeStation reads the cache as it needs data. I have
no idea what the access looks like in real time, but generally I expect
that it's accessing the data in date order. Whether the data is sorted
or not in this cache file I have no idea.

2) This second VM is more computational in nature. It primarily runs two
apps for long periods of time, although I don't think either app is all
that disk intensive. Both apps read market data once from disk, cache it
in memory and then compute for hours to days depending on what I'm
asking them to do. I will say I don't see much from the disk activity
lights when either of these programs is running.

- Adaptrade Builder - a genetic optimization program that attempts to
generate TradeStation EasyLanguage trading strategies. I believe that
once it has the market data in memory it's using memory and disk to
store interesting strategies for me to look at later. The output of the
program is generally a single file ranging in size from 1MB to maybe
50MB.
- TradingSolutions - a neural network program that attempts to generate
neural network models for trading markets. Each instance of this program
(I typically run 2-3 instances) generally has access to one file sized
25MB-200MB plus a lot (50-100) of small files under 20K in size. I have
no idea how often any of these files are read or written. The program
runs for hours doing its work.

I suppose there are other things that happen in the VMs. I run Excel a
lot, but it's not a lot of data.

Hopefully that gives you enough info to suggest a direction.

Thanks,
Mark