On Fri, 2010-07-16 at 23:04 -0400, Robert Myers wrote:
> On Fri, Jul 16, 2010 at 10:35 PM, JD <jd1008@xxxxxxxxx> wrote:
> >
> > So, what would you say is/are the class/classes of problems that would
> > benefit greatly from a high flops gpu, but without the sort of bus
> > bandwidth you would like to see?
>
> Almost any problem that is embarrassingly parallel or nearly so is
> potentially a candidate for low-bandwidth computing. Ray tracing is the
> primo example. Almost any linear problem is potentially embarrassingly
> parallel, and, if you don't want to go through the work of exposing the
> embarrassingly parallel nature of the problem, there are the tricks that
> make the linpack benchmark so popular for selling "supercomputers" that
> have absurdly small bisection bandwidths.
>
> My question, though, is, if that's the kind of problem you have, why not
> do it on a distributed platform and teach students how to use distributed
> resources? If you're Pixar, I understand why you'd want a well-organized
> farm of GPU's, but if you just want to replicate what LLNL (Lawrence
> Livermore) was doing, say, ten years ago, are you doing your students any
> favor by giving them a GPGPU instead of the challenge of doing real
> distributed computing? Conceivably, watts per flop (power consumption)
> makes GPGPU's the hands-down winner over distributed computing for
> problems that are embarrassingly parallel or nearly so. Inevitably,
> though, people will want clusters of GPGPU's, so you'll wind up doing
> distributed computing anyway.
>
> If you rewrite your applications for the one-off architectures typical of
> GPU's, so that you have to do it all over again when the next generation
> comes out, have you really done yourself any favors?
>
> I don't claim that there are simple or obvious answers, but it's just too
> easy for people to be blown away by huge flops numbers. What I'm afraid
> of has already started to happen as far as I'm concerned, which is that
> all problems will be jammed into the low-bandwidth mold, whether it's
> appropriate or not.
>
> Robert.

But unfortunately, Robert, networks are inherently low bandwidth. To
achieve full throughput you still need parallelism in the networks
themselves.

I think from your description you are discussing the fluid models, which
are generally decomposed into finite series over limited regions for
computation, with established boundary conditions. This partitioning
permits the distributed processing approach, but it suffers at the
boundaries, where the boundary-crossing phenomena are either discounted,
or simulated, or, I suppose, passed between regions by some coding
algorithm to permit recursive reduction over some number of iterations.

What your field, and most others related to fluid dynamics, such as plasma
studies, explosion studies and so on, needs is full wide-band memory
access across thousands of processors (millions, perhaps?). I don't
pretend to understand all the implications of the various computational
requirements of your field, or those of neuroprocessors, which is another
area where massive parallelism requires deep memory access; my own problem
area is limited to data spaces of only a few gigabytes, and serial
processing is generally capable of dealing with it, although not in real
time.

There are a number of neat processing ideas that apply to specific classes
of parallel problems, such as CAPP, SIMD and MIMD arrays, and neural
networks.
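To make the partition-and-exchange idea above concrete, here is a minimal
sketch of my own (not anything from Robert's actual codes): a 1-D
diffusion stencil split across MPI ranks, with one ghost cell traded with
each neighbour every time step. The sizes, step count, and the stencil
itself are placeholders, purely illustrative.

    /* 1-D domain decomposition with ghost-cell (halo) exchange.
     * Build with an MPI C compiler, e.g.: mpicc halo.c -o halo
     */
    #include <mpi.h>
    #include <string.h>

    #define LOCAL_N 1024          /* interior points owned by each rank */
    #define STEPS   100           /* number of time steps               */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        double u[LOCAL_N + 2];    /* u[0] and u[LOCAL_N+1] are ghost cells */
        double unew[LOCAL_N + 2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int left  = (rank == 0)          ? MPI_PROC_NULL : rank - 1;
        int right = (rank == nprocs - 1) ? MPI_PROC_NULL : rank + 1;

        /* crude initial condition: a hot spot on rank 0 */
        memset(u, 0, sizeof u);
        if (rank == 0)
            u[1] = 100.0;

        for (int step = 0; step < STEPS; step++) {
            /* exchange boundary values with the neighbours; this per-step
             * communication is exactly where the partitioning suffers */
            MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  0,
                         &u[LOCAL_N + 1], 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[LOCAL_N],     1, MPI_DOUBLE, right, 1,
                         &u[0],           1, MPI_DOUBLE, left,  1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            /* purely local stencil update on the interior points */
            for (int i = 1; i <= LOCAL_N; i++)
                unew[i] = u[i] + 0.25 * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
            memcpy(&u[1], &unew[1], LOCAL_N * sizeof(double));
        }

        MPI_Finalize();
        return 0;
    }

The interior update is entirely local, so the per-step cost is dominated
by the two boundary exchanges; that is where the subdomains have to talk
to one another, and it is why interconnect (bisection) bandwidth starts to
matter once the subdomains get small relative to their boundaries.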
Whether these solutions will match your problem likely depends a great
deal on your view of the problem. As in most endeavors in life, our vision
of the solution is, as you say of the others here, limited by our view of
the problem.

Very few people outside the realm of computational analysis ever deal with
the choices of algorithms, architecture, real throughput, processing
limitations, bandwidth limitations, data access and distribution problems,
and so on. Fewer still deal with massive quantities of data. Search
engines deal with some of the issues; people like Google deal with all
kinds of distributed problems across data spaces that dwarf the efforts of
even most sciences.

Some algorithms, as you point out, have limitations of the Multiple
Instruction, Multiple Data sort, which place great demands on memory
bandwidth and processor speed, as well as on interprocess communication.
But saying that a particular architecture is unfit for an application
means that you have to understand both the application and the
architecture. Both are difficult today, as the architectures are changing
about every 11 months, or maybe less right now.

Computation via interferometry, for example, is one of those old (new)
fields where a known capability is only now becoming possible to explore.
Optical computing, 3D shading, broadcast 3D, and other immersion
technologies add new requirements to the GPGPU's being discussed. Motion
coupled with 3D means that the shaders and other elements need greater
processing power and greater throughput; their architecture is undergoing
severe redesign. Even microcontrollers are expanding their reach via
multiple processors (through things like the Propeller chip, for example).

I am a test applications consultant. My trade forces me to continuously
update my skills and try to keep up with the multitude of new
architectures. I have almost no free time, as researching new
architectures, designing new algorithms, understanding the application of
algorithms to new tests, and hardware design requirements eat up many
hours every day. Fortunately I am, and always have been, a NERD and proud
of it. I work at it.

Since you have a deep knowledge of your requirements, perhaps you should
put some time into thinking of a design methodology other than those I
have mentioned or those that you know, in an attempt to improve the
science. I am sure many others would be very appreciative, and quite
likely supportive.

Regards,
Les H