On Sat, Nov 10, 2012 at 9:45 AM, Jann Horn <jannhorn@xxxxxxxxxxxxxx> wrote:
> I don't think this statement on your website makes much sense:
>
> A user can download Wisecracker™ on a GPU cluster virtual machine
> provided by Amazon EC2® and reverse an MD5 cryptographic hash for
> a 6 character password in about 20 minutes if using 1 virtual
> machine or in about 3 minutes if using 2 or more.
>
> What does "2 or more" mean here? If I use two machines in parallel, that's
> more than six times as fast as only using one machine? Seems weird to me.

Hi,

I have updated the white paper to clarify that statement. The 3 minutes is not the time taken to generate hashes for all the possibilities; it is the average time taken to hit upon the first successful solution. Once a solution is found, the software sends a stop signal to halt further computation.

This behavior is part of the MD5 example, though; the framework API is more generic and lets the user design the problem however they want. The framework's advantage is the communication between systems and the distribution of tasks.

Wisecracker uses the concept of "tasks" to distribute the workload across processors (CPUs, GPUs). The algorithm is a "divide and conquer" approach, similar in spirit to bucket sort and quick sort. So if you want to reverse an MD5 sum into a 6-character string drawn from all printable ASCII characters (94 of them), the search space is 94^6, which is approximately 690 billion combinations.

Wisecracker internally handles these tasks by index: each task is just an index, and each index maps to the 6-character string it represents among the 94^6 combinations. (You can refer to the md5.cl OpenCL file to see how this is done.)

When more than one system is used, the tasks get distributed per system based on each system's OpenCL capabilities, namely its compute units and work group size. Amazon's GPU VMs have 2 GPUs each.
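To make the index-based task handling concrete, here is a small Python sketch of how an index could map to its candidate string. The function name and the exact character ordering are my own illustration; md5.cl may order the digits differently.

```python
# Printable ASCII character set (94 characters), as used by the MD5 example.
CHARSET = ''.join(chr(c) for c in range(33, 127))  # '!' through '~'

def index_to_string(index, length=6):
    """Map a task index in [0, 94**length) to its candidate string.

    Illustrative sketch: the index is treated as a number in base 94,
    and each base-94 digit selects one character of the candidate.
    """
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(CHARSET))
        chars.append(CHARSET[digit])
    return ''.join(chars)

# The full search space for 6 characters:
total = 94 ** 6  # 689,869,781,056 -- roughly 690 billion
```

Each worker only needs a start index and a count to know exactly which candidate strings it is responsible for, which is what makes the distribution cheap to coordinate.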
When a single system is used, the tasks in the range [1, 690 billion] are distributed between its 2 GPUs in blocks sized by the product of compute units and work group size. So if a GPU has 32 compute units and a work group size of 2048, it gets task blocks of 32 x 2048 tasks to work on. Each GPU keeps computing successive task blocks until it finds the solution.

When you use 2 systems, the task range is split between them as [1, 345 billion] and [345 billion, 690 billion]. Once each system gets its task range, it distributes the work among its GPUs in the same fashion, as task blocks sized by compute units and work group size.

Let's say you want to recover the string 'z@bD1g', and its index happens to fall in the [345 billion, 690 billion] range. On 1 Amazon GPU VM, the program has to work through the [1, 345 billion] range before it even reaches the [345 billion, 690 billion] range. If you distribute the work across 2 VMs, you hit upon the solution faster, because the second system starts at 345 billion and neither VM necessarily has to compute all 345 billion values in its range. You are spared the needless computation of the full [1, 345 billion] range that the single-system run performs. Hence 2 VMs can give a far larger drop in runtime than the linear speedup you might expect, because of the way the work is distributed in the MD5 example. On average, across different target strings, the runtime drops drastically because the search buckets are smaller and start at different points.

The task distribution is a divide-and-conquer algorithm, and it does have a worst case: about 10 minutes for a 6-character string, if the target is the 345 billionth or 690 billionth possibility. But more often than not you will see an average runtime of about 3 minutes to find the solution string.

I shall correct the ambiguous message on the website.

Thanks,
Vikas
Selective Intellect LLC
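P.S. In case it helps, here is a rough Python sketch of the two-level distribution described above: a per-system split of the index range, then per-GPU task blocks sized by compute units x work group size. The function names and exact boundary handling are illustrative assumptions, not Wisecracker's actual API.

```python
def split_range(total, num_systems):
    # Divide the index range [0, total) into contiguous, nearly equal
    # per-system sub-ranges (illustrative; not Wisecracker's actual API).
    per_system = total // num_systems
    ranges = []
    start = 0
    for i in range(num_systems):
        end = total if i == num_systems - 1 else start + per_system
        ranges.append((start, end))
        start = end
    return ranges

def task_blocks(start, end, compute_units, work_group_size):
    # Yield successive task blocks of compute_units * work_group_size
    # indices each; each GPU consumes its sub-range block by block.
    block = compute_units * work_group_size
    while start < end:
        yield (start, min(start + block, end))
        start += block

# Two systems splitting the full 94**6 index space in half:
systems = split_range(94 ** 6, 2)
# A GPU with 32 compute units and a work group size of 2048 then takes
# blocks of 32 * 2048 = 65536 indices at a time from its system's range.
```

Because the second system's range starts at the halfway index, a target in the upper half is reached without first exhausting the lower half, which is why the observed speedup can exceed the machine count.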