Re: Need C optimization help

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sunday 16 May 2010 2:03:27 am Tim Prince wrote:
> On 5/15/2010 12:59 PM, Mark Rose wrote:
> > The tree vectorizer spits out this note about the loop I'm particularly
> > interested in:
> >
> > for (int k = 1; k<  n_topics; k++)
> >      cdf[k] = cdf[k-1] + (dcount[k+d_offset] + alpha)*(wcount[k+w_offset
> > + beta)/(tcount[k] + wbeta);
> > note: not vectorized: data ref analysis failed D.8597_92 = *D.8596_91;
> >
> > I've done hours of googling, playing with restrict keywords, splitting
> > the cdf[k-1] addition into another loop, and nothing will help. The error
> > message itself is very poor, as I can't find a decent explanation
> > anywhere online as to what it means nor how to fix it.
> 
> It's a bit much to ask for the message to give a complete discussion of
> the concept of vectorization.  If you would spend a few minutes studying
> this code, it should be clear that you have a lot of performance robbing
> stuff here, even without seeing the background.  The compiler can't
> vectorize, because each value of cdf[k] depends on the preceding one
> cdf[k-1].  If the compiler has aliasing concerns not addressed by
> restrict which prevent direct use of the preceding value in register,
> but requires moving it around in the memory hierarchy, you might
> consider using a local temporary, should you know that such concerns can
> be ignored.
> There's a pretty good chance that code which violates typed aliasing, if
> this does so, will disable optimizations, if the compiler doesn't reject
> it.  It's obscure enough that you should maybe fix it so you can
> concentrate on the issues you may be interested in.

Hi Tim,

I have a basic understanding of vectorization, and I realize the cdf[k] = 
cdf[k-1] + x bit is not vectorizable. I've tried splitting that part into a 
second loop, which should leave the rest of the equation vectorizable, but it 
doesn't happen. :/ Beyond the cdf[k] = cdf[k-1] part, there is no dependencies 
between steps. In the attached version of the function, I've split the loop.

I'm not entirely familiar with typed aliasing. The pointers always point to 
the same type, except when the arrays are allocated, where they are cast to 
void*. Could this be the source of the troubles?

Thank you greatly for you help.

-Mark
// Perfrom the LDA operation
void tsTopicLDA(TSTopics * topics, TSParams * params, TSCorpus * corpus)
{
	// Initialize the pseudo-random number generator (PRNG)
	dsfmt_t prng;
	dsfmt_init_gen_rand(&prng, params->seed);

	// Initialize the solver
	randomInit(topics, corpus->tokens, &prng);

	// Stores the estimated CDF
	double * restrict cdf;
	tsMemAlloc((void **)&cdf, sizeof(double)*params->topics);

	// Dereference all of the pointers (most are array pointers anyway so this can be done)
	const unsigned int n_tokens = corpus->n_tokens;
	const unsigned int n_topics = topics->n_topics;
	const double alpha = params->alpha;
	const double beta  = params->beta;
	const double wbeta = (double)corpus->n_words*params->beta;

	unsigned int * restrict wrd_tokens = &corpus->tokens[0];
	unsigned int * restrict doc_tokens = &corpus->tokens[n_tokens];
	unsigned int * restrict wcount = topics->wcount;
	unsigned int * restrict dcount = topics->dcount;
	unsigned int * restrict tcount = topics->tcount;
	unsigned int * restrict assign = topics->assign;

	// Run the Gibbs Sampler for the given burn-in duration
	for (int i = 0; i < params->iter; i++)
	{
		// Sample each word
		for (int j = 0; j < corpus->n_tokens; j++)
		{
			unsigned int w = wrd_tokens[j];
			unsigned int d = doc_tokens[j];
			unsigned int t = assign[j];

			unsigned int w_offset = w*n_topics;
			unsigned int d_offset = d*n_topics;

			// Decrement the counters for the current word to remove its influence
			--wcount[t + w_offset];
			--dcount[t + d_offset];
			--tcount[t];

			// Infer the CDF from the current state
			// NOTE: Splitting into two loops like this, with the addition in the second loop, should allow
			// the first loop to be vectorized (I assume). It doesn't happen though. Writing the value of
			// cdf[k] twice is about 10% slower. The code compiles fine with -fstrict-aliasing, and none
			// of the pointers in this function have overlapping memory (or shouldn't at least).
			for (int k = 0; k < n_topics; k++)
				cdf[k] = (dcount[k+d_offset] + alpha)*(wcount[k+w_offset] + beta)/(tcount[k] + wbeta);
			for (int k = 1; k < n_topics; k++)
				cdf[k] += cdf[k-1];

			// Choose the most likely topic assignment based on the CDF
			double u = cdf[n_topics-1]*dsfmt_genrand_close_open(&prng);
			int new_topic = -1;
			for (int k = 0; k < n_topics; k++)
				if (cdf[k] > u)
				{
					new_topic = k;
					break;
				}
			assign[j] = new_topic;

			// Increment the appropriate counts
			++dcount[new_topic + d_offset];
			++wcount[new_topic + w_offset];
			++tcount[new_topic];
		}
	}

	// Free unused resources
	tsMemFree(cdf);
}

[Index of Archives]     [Linux C Programming]     [Linux Kernel]     [eCos]     [Fedora Development]     [Fedora Announce]     [Autoconf]     [The DWARVES Debugging Tools]     [Yosemite Campsites]     [Yosemite News]     [Linux GCC]

  Powered by Linux