On Thu, 2014-04-03 at 09:40 -0600, Greg Woods wrote:
> On Wed, 2014-04-02 at 21:34 +0000, Bill Oliver wrote:
> >
> > 1) Get a pixel and a small area around it (say, the surrounding 100 pixels).
> >
> > 2) Apply a contrast enhancement method called "histogram equalization" to that group of pixels. This changes the value of the pixel in question. Let's say this process involves 500 high-level instructions.
> >
> > 3) Move to the next pixel and do the same thing.
> >
> > If you have a 12-megapixel image (say, 11,760,000 pixels), that's 5,880,000,000 instructions. That 500-instruction block is impossible to parallelize well. However, each pixel is independent, so you can easily parallelize the work across pixels. I remember implementing this back in the '80s on a MicroVAX GPX II. It took about 3 hours to do a 512x512 greyscale image by brute force. Then Henry Fuchs et al. developed the Pixel-Planes machine, and Austin et al. implemented it on that; it took about 4 seconds. Even today, on my laptop with an i7, a brute-force contrast-limited adaptive histogram equalization on a 10-megapixel image takes a "go get a cup of coffee" amount of time. There are, of course, shortcuts such as the Pizer-Cromartie algorithm, but they introduce interpolation artifacts.
>
> Sure, that's a good example of something that could benefit from parallel processing. But you still have to be careful. For instance, suppose two different processors are working on adjacent pixels in parallel. At some point, one of the pixels will be modified first. Depending on exactly when that happens, it can affect the final value of the second pixel, because the calculation for the second pixel may fetch either the original or the modified value of the first. If done blindly, without any locking, this creates a race condition that makes the final values of the two pixels indeterminate.
> That is to say, run the same code multiple times and you might not get exactly the same image out of it.
>
> There are certainly tools that make it easy to distribute the same calculation over all the pixels of the image to multiple processors. But those tools will not magically provide the locking you would need to prevent one of the neighboring pixels from being modified while the value of one pixel is being calculated. And the locking code can be tricky, which is why it can't be completely automated. For instance, it would be easy to lock all the neighbor pixels while calculating the value of one pixel, but that would delay the calculation of the neighboring pixels, thus losing some of the benefit of parallelization.
>
> In practice, that example would probably be handled by using a temporary array to hold the output image, so that the input image is never modified until the calculation is completely finished, and then copying the result back into place. But if your code doesn't already work that way, it would have to be modified to do so. That is another trivial example that illustrates the point.
>
> There are, in fact, tools that can analyze code and point out constructs that might prevent parallelization, or suggest places that could benefit from parallelization, but using those tools is not 100% automatic, and the ones used here are (I believe) proprietary and not cheap. I don't know whether there are similar open source tools, but in any case these tools only provide suggestions; they don't change the code for you. It is unlikely that any significant calculation could be parallelized with zero work.
>
> For our users, the investment of time in parallelizing code usually pays off, because it helps their big simulations run much faster, saving them time in the long run.
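A minimal sketch of the temporary-array approach from the quoted discussion, applied to a plain (not contrast-limited) local histogram equalization in the spirit of Bill's steps 1 to 3. Everything in it is an illustrative assumption rather than anything the posters wrote: it is pure Python, the image is a list of rows of integer grey levels, the function names (`equalize_pixel`, `equalize_rows`, `local_equalize`) are hypothetical, and each pixel's new value is its rank within its square window, scaled to the grey-level range. Because workers only read the input and each one writes its own rows of a separate output array, no locking is needed and the result does not depend on the number of workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helper names; this is plain local histogram equalization, not CLAHE.


def equalize_pixel(img, r, c, radius, levels):
    """Equalize one pixel against its (2*radius+1)^2 window, clipped at the border.

    The new value is the window's cumulative histogram at the centre pixel's
    intensity (fraction of window pixels <= it), scaled to 0..levels-1.
    Only `img` is read; nothing is written here.
    """
    h, w = len(img), len(img[0])
    value = img[r][c]
    below = total = 0
    for rr in range(max(0, r - radius), min(h, r + radius + 1)):
        for cc in range(max(0, c - radius), min(w, c + radius + 1)):
            total += 1
            if img[rr][cc] <= value:
                below += 1
    return (levels - 1) * below // total


def equalize_rows(img, out, rows, radius, levels):
    """Fill the given rows of `out`; each worker owns a disjoint set of rows."""
    for r in rows:
        for c in range(len(img[0])):
            out[r][c] = equalize_pixel(img, r, c, radius, levels)


def local_equalize(img, radius=1, levels=256, workers=4):
    """Equalize every pixel, reading only from `img` and writing to a new array.

    No worker ever writes to `img`, so a neighbour cannot change while a pixel
    is being computed. Writing results back into `img` in place would bring
    back the race condition Greg describes.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]  # the temporary output image
    # Interleave rows across workers: worker i gets rows i, i+workers, ...
    chunks = [range(i, h, workers) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(equalize_rows, img, out, rows, radius, levels)
                   for rows in chunks]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out


if __name__ == "__main__":
    image = [[0, 0, 0],
             [0, 255, 0],
             [0, 0, 0]]
    print(local_equalize(image, radius=1, workers=2))
    # prints [[191, 212, 191], [212, 255, 212], [191, 212, 191]]
```

Threads are used here only to show the structure; in CPython the global interpreter lock keeps them from giving a real CPU speedup, so a serious version would use processes, compiled code, or a GPU.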
>
> We actually have a User Services section that employs people who specialize in helping users write and modify code to maximize performance on our massively parallel supercomputing system.
>
> --Greg

Rendering (AFAIK) doesn't really require the tight interconnection and coordination of processes that generally characterizes HPC. You can parallelize trivially by treating each frame as a single job: the work per frame is small compared to the whole job, and each frame is independent of the others, so you'll get near-perfect speedup that way.

We usually call this type of load (lots of relatively small, independent tasks) "high-throughput computing" (HTC). One HTC tool is HTCondor (the current Fedora 20 RPM is condor-8.1.1-0.3.fc20.x86_64.rpm; more info at http://research.cs.wisc.edu/htcondor/). HTCondor manages the submission of large numbers of independent tasks to distributed computers and the collection of results. It also manages the load on machines that are used interactively, harvesting idle cycles without interfering with interactive use.

-- 
Matthew Saltzman
Clemson University Math Sciences
mjs AT clemson DOT edu
-- 
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org