I accidentally forgot to save the reduced file before attaching it. Here is the proper attachment.

-Mark

On Saturday 15 May 2010 3:59:30 pm Mark Rose wrote:
> Hi All,
>
> I have a line that gets called approximately 2 trillion times according to
> valgrind, and I'd love any suggestions for speeding it up. I have taken
> over the project from someone else and my C abilities are only
> intermediate. I've attached a trimmed-down version of the function.
>
> Here are some notes about the code:
> * tsMemAlloc and tsMemFree are basically wrappers around malloc/free.
> * dsfmt_genrand_close_open() generates a number in the range [0,1)
>
> And notes about the data:
> * n_topics is 35 (loaded from a configuration file; changeable)
> * n_tokens is about 7M (and increases as the data set grows)
> * params->iter is 500
> * topics->wcount size is about 190k items
> * topics->dcount size is about 275k items
> * topics->tcount size is 35 items
>
> I'm compiling on an Opteron, with options: -O3 -std=gnu99 -march=native
> -ffast-math -ftree-loop-distribution -ftree-loop-linear
> -floop-interchange -floop-block -ftree-vectorizer-verbose=8
>
> I'm using GCC 4.4.3.
>
> The tree vectorizer spits out this note about the loop I'm particularly
> interested in:
>
> for (int k = 1; k < n_topics; k++)
>     cdf[k] = cdf[k-1] + (dcount[k+d_offset] + alpha)*(wcount[k+w_offset]
>         + beta)/(tcount[k] + wbeta);
> note: not vectorized: data ref analysis failed D.8597_92 = *D.8596_91;
>
> I've done hours of googling, played with the restrict keyword, tried
> splitting the cdf[k-1] addition into another loop, and nothing helps.
> The error message itself is very poor, and I can't find a decent
> explanation anywhere online of what it means or how to fix it.
>
> Thanks for whatever insight you can give!
>
> -Mark
>
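For reference, the "splitting the cdf[k-1] addition into another loop" experiment mentioned above looks roughly like this. This is a sketch only; term is a hypothetical scratch buffer (it would need to be allocated alongside cdf) and is not part of the attached code:

    /* Compute the per-topic terms in a dependence-free loop (each iteration
     * is independent, so it is at least a candidate for vectorization)... */
    for (int k = 0; k < n_topics; k++)
        term[k] = (dcount[k + d_offset] + alpha)
                * (wcount[k + w_offset] + beta)
                / (tcount[k] + wbeta);

    /* ...then accumulate the inherently serial prefix sum separately.
     * The loop-carried dependence on cdf[k-1] is confined to this loop. */
    cdf[0] = term[0];
    for (int k = 1; k < n_topics; k++)
        cdf[k] = cdf[k-1] + term[k];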
// Perform the LDA operation
void tsTopicLDA(TSTopics * topics, TSParams * params, TSCorpus * corpus)
{
    // Initialize the pseudo-random number generator (PRNG)
    dsfmt_t prng;
    dsfmt_init_gen_rand(&prng, params->seed);

    // Initialize the solver
    randomInit(topics, corpus->tokens, &prng);

    // Stores the estimated CDF
    double * cdf;
    tsMemAlloc((void **)&cdf, sizeof(double)*params->topics);

    // Dereference all of the pointers (most are array pointers anyway so this can be done)
    const unsigned int n_tokens = corpus->n_tokens;
    const unsigned int n_topics = topics->n_topics;
    const double alpha = params->alpha;
    const double beta  = params->beta;
    const double wbeta = (double)corpus->n_words*params->beta;

    unsigned int * wrd_tokens = &corpus->tokens[0];
    unsigned int * doc_tokens = &corpus->tokens[n_tokens];

    unsigned int * restrict wcount = topics->wcount;
    unsigned int * restrict dcount = topics->dcount;
    unsigned int * restrict tcount = topics->tcount;
    unsigned int * restrict assign = topics->assign;

    // Run the Gibbs Sampler for the given burn-in duration
    for (int i = 0; i < params->iter; i++) {
        // Sample each word
        for (int j = 0; j < corpus->n_tokens; j++) {
            const unsigned int w = wrd_tokens[j];
            const unsigned int d = doc_tokens[j];
            const unsigned int t = assign[j];
            const unsigned int w_offset = w*n_topics;
            const unsigned int d_offset = d*n_topics;
            double * restrict rcdf = cdf;

            // Decrement the counters for the current word to remove its influence
            --wcount[t + w_offset];
            --dcount[t + d_offset];
            --tcount[t];

            // Infer the CDF from the current state
            cdf[0] = (dcount[d_offset] + alpha)*(wcount[w_offset] + beta)/(tcount[0] + wbeta);
            for (int k = 1; k < n_topics; k++)
                cdf[k] = cdf[k-1] + (dcount[k+d_offset] + alpha)*(wcount[k+w_offset] + beta)/(tcount[k] + wbeta);

            // Choose the most likely topic assignment based on the CDF
            double u = cdf[n_topics-1]*dsfmt_genrand_close_open(&prng);
            int new_topic = -1;
            for (int k = 0; k < n_topics; k++)
                if (rcdf[k] > u) {
                    new_topic = k;
                    break;
                }
            assign[j] = new_topic;

            // Increment the appropriate counts
            ++dcount[new_topic + d_offset];
            ++wcount[new_topic + w_offset];
            ++tcount[new_topic];
        }
    }

    // Free unused resources
    tsMemFree(cdf);
}
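An aside on the topic-selection scan, separate from the vectorization question: because cdf[] is nondecreasing, the linear search for the first entry greater than u could be replaced by a binary search; with n_topics = 35 that is about 6 comparisons instead of an average of ~18. A minimal sketch, assuming the same cdf layout as above (sample_from_cdf is a hypothetical helper, not part of the attached code):

    /* Return the smallest index k in [0, n) with cdf[k] > u.
     * Requires cdf[] to be nondecreasing and u < cdf[n-1], which holds
     * here because dsfmt_genrand_close_open() returns a value in [0,1). */
    static int sample_from_cdf(const double * cdf, int n, double u)
    {
        int lo = 0, hi = n - 1;           /* the answer always lies in [lo, hi] */
        while (lo < hi) {
            int mid = lo + (hi - lo)/2;
            if (cdf[mid] > u)
                hi = mid;                 /* cdf[mid] already exceeds u */
            else
                lo = mid + 1;             /* answer lies strictly right of mid */
        }
        return lo;
    }

In the sampler this would replace the scan over rcdf with: assign[j] = sample_from_cdf(cdf, n_topics, u);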