I accidentally forgot to save the reduced file before attaching it. Here is the proper attachment.

-Mark

On Saturday 15 May 2010 3:59:30 pm Mark Rose wrote:
> Hi All,
>
> I have a line that gets called approximately 2 trillion times according to
> valgrind, and I'd love any suggestions for speeding it up. I have taken
> over the project from someone else and my C abilities are only
> intermediate. I've attached a trimmed-down version of the function.
>
> Here are some notes about the code:
> * tsMemAlloc and tsMemFree are basically wrappers around malloc/free.
> * dsfmt_genrand_close_open() generates a number in the range [0,1)
>
> And notes about the data:
> * n_topics is 35 (loaded from a configuration file; changeable)
> * n_tokens is about 7M (and increases as the data set grows)
> * params->iter is 500
> * topics->wcount size is about 190k items
> * topics->dcount size is about 275k items
> * topics->tcount size is 35 items
>
> I'm compiling on an Opteron, with options: -O3 -std=gnu99 -march=native
> -ffast-math -ftree-loop-distribution -ftree-loop-linear
> -floop-interchange -floop-block -ftree-vectorizer-verbose=8
>
> I'm using GCC 4.4.3.
>
> The tree vectorizer spits out this note about the loop I'm particularly
> interested in:
>
> for (int k = 1; k < n_topics; k++)
>     cdf[k] = cdf[k-1] + (dcount[k+d_offset] + alpha)*(wcount[k+w_offset]
>         + beta)/(tcount[k] + wbeta);
> note: not vectorized: data ref analysis failed D.8597_92 = *D.8596_91;
>
> I've done hours of googling, played with the restrict keyword, tried
> splitting the cdf[k-1] addition into another loop, and nothing helps.
> The error message itself is very poor, and I can't find a decent
> explanation anywhere online of what it means or how to fix it.
>
> Thanks for whatever insight you can give!
>
> -Mark
>
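For reference, the "splitting the cdf[k-1] addition into another loop" experiment mentioned above looks roughly like this. This is a sketch only; term is a hypothetical scratch buffer (it would need to be allocated alongside cdf) and is not part of the attached code:

    /* Compute the per-topic terms in a dependence-free loop (each iteration
     * is independent, so it is at least a candidate for vectorization)... */
    for (int k = 0; k < n_topics; k++)
        term[k] = (dcount[k + d_offset] + alpha)
                * (wcount[k + w_offset] + beta)
                / (tcount[k] + wbeta);

    /* ...then accumulate the inherently serial prefix sum separately.
     * The loop-carried dependence on cdf[k-1] is confined to this loop. */
    cdf[0] = term[0];
    for (int k = 1; k < n_topics; k++)
        cdf[k] = cdf[k-1] + term[k];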
// Perform the LDA operation
void tsTopicLDA(TSTopics * topics, TSParams * params, TSCorpus * corpus)
{
    // Initialize the pseudo-random number generator (PRNG)
    dsfmt_t prng;
    dsfmt_init_gen_rand(&prng, params->seed);

    // Initialize the solver
    randomInit(topics, corpus->tokens, &prng);

    // Stores the estimated CDF
    double * cdf;
    tsMemAlloc((void **)&cdf, sizeof(double)*params->topics);

    // Dereference all of the pointers (most are array pointers anyway so this can be done)
    const unsigned int n_tokens = corpus->n_tokens;
    const unsigned int n_topics = topics->n_topics;
    const double alpha = params->alpha;
    const double beta  = params->beta;
    const double wbeta = (double)corpus->n_words*params->beta;

    unsigned int * wrd_tokens = &corpus->tokens[0];
    unsigned int * doc_tokens = &corpus->tokens[n_tokens];

    unsigned int * restrict wcount = topics->wcount;
    unsigned int * restrict dcount = topics->dcount;
    unsigned int * restrict tcount = topics->tcount;
    unsigned int * restrict assign = topics->assign;

    // Run the Gibbs Sampler for the given burn-in duration
    for (int i = 0; i < params->iter; i++) {
        // Sample each word
        for (int j = 0; j < corpus->n_tokens; j++) {
            const unsigned int w = wrd_tokens[j];
            const unsigned int d = doc_tokens[j];
            const unsigned int t = assign[j];
            const unsigned int w_offset = w*n_topics;
            const unsigned int d_offset = d*n_topics;
            double * restrict rcdf = cdf;

            // Decrement the counters for the current word to remove its influence
            --wcount[t + w_offset];
            --dcount[t + d_offset];
            --tcount[t];

            // Infer the CDF from the current state
            cdf[0] = (dcount[d_offset] + alpha)*(wcount[w_offset] + beta)/(tcount[0] + wbeta);
            for (int k = 1; k < n_topics; k++)
                cdf[k] = cdf[k-1] + (dcount[k+d_offset] + alpha)*(wcount[k+w_offset] + beta)/(tcount[k] + wbeta);

            // Choose the most likely topic assignment based on the CDF
            double u = cdf[n_topics-1]*dsfmt_genrand_close_open(&prng);
            int new_topic = -1;
            for (int k = 0; k < n_topics; k++)
                if (rcdf[k] > u) {
                    new_topic = k;
                    break;
                }
            assign[j] = new_topic;

            // Increment the appropriate counts
            ++dcount[new_topic + d_offset];
            ++wcount[new_topic + w_offset];
            ++tcount[new_topic];
        }
    }

    // Free unused resources
    tsMemFree(cdf);
}
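An aside on the topic-selection scan, separate from the vectorization question: because cdf[] is nondecreasing, the linear search for the first entry greater than u could be replaced by a binary search; with n_topics = 35 that is about 6 comparisons instead of an average of ~18. A minimal sketch, assuming the same cdf layout as above (sample_from_cdf is a hypothetical helper, not part of the attached code):

    /* Return the smallest index k in [0, n) with cdf[k] > u.
     * Requires cdf[] to be nondecreasing and u < cdf[n-1], which holds
     * here because dsfmt_genrand_close_open() returns a value in [0,1). */
    static int sample_from_cdf(const double * cdf, int n, double u)
    {
        int lo = 0, hi = n - 1;           /* the answer always lies in [lo, hi] */
        while (lo < hi) {
            int mid = lo + (hi - lo)/2;
            if (cdf[mid] > u)
                hi = mid;                 /* cdf[mid] already exceeds u */
            else
                lo = mid + 1;             /* answer lies strictly right of mid */
        }
        return lo;
    }

In the sampler this would replace the scan over rcdf with: assign[j] = sample_from_cdf(cdf, n_topics, u);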