Hi Brian,

Again, sorry for the late response; I was doing more testing. I followed
your suggestion, but I think it needs to be changed to the following,
since i, j and k are pointers (Fabi's suggestion was the same). However,
the results don't change.

> i = args->i;
> j = args->j;
> k = args->k;

i = &(args->i);
j = &(args->j);
k = &(args->k);

To check whether the increased accesses come from the extra pointer
accesses, I told my simulator the address of the pointer and asked it
not to allocate it in the caches. After that, the number of L1 accesses
is similar to the run without the pointer access. I understand this is
not a solution. I am now looking for other ways to achieve this without
tweaking the simulator (especially Ian's suggestion to use the section
attribute) and will post if I succeed.

Since you asked about the 2D arrays, below is how I allocate them. I am
not sure it is the best way.

int **m1, **m2, **mr;
...
m1 = (int**)malloc(mat_size*sizeof(int*));
for (i=0; i < mat_size; i++)
{
    m1[i] = malloc(mat_size*sizeof(int));
}

m2 and mr are allocated in a similar manner.

Thanks a lot for your help.

regards,
Isuru

--- On Wed, 2/2/11, Brian Budge <brian.budge@xxxxxxxxx> wrote:

> From: Brian Budge <brian.budge@xxxxxxxxx>
> Subject: Re: Allocate a variable in a known physical location
> To: "isuru herath" <isuru81@xxxxxxxxx>
> Cc: gcc-help@xxxxxxxxxxx
> Date: Wednesday, February 2, 2011, 1:58 PM
> Hi Isuru -
>
> From looking at this last email, it makes sense why you'd need double
> dereference (with no optimization). Regardless of how you are holding
> "p" (as aliases integer for example), you have to dereference it to get
> "i", which is itself a pointer that needs to be dereferenced when you
> increment or decrement the value at that address.
>
> The other thing you could do, which likely will work:
>
> int *i, *j, *k;
> struct thread_args *args = get_filled_args_mem();
> i = args->i;
> j = args->j;
> k = args->k;
>
> now use *i, *j, and *k as your counters.
> I'm pretty sure that even without optimization, each access should
> only require a single dereference.
>
> If this doesn't work, you may need to hack around the lack of
> optimization.
>
> How are you allocating your 2D array? This could be the other cause
> of too many loads.
>
>   Brian
>
> On Wed, Feb 2, 2011 at 1:39 PM, isuru herath <isuru81@xxxxxxxxx> wrote:
> > Hi Brian,
> >
> > Thanks for the mail and sorry for the late response. I was trying to
> > store the pointer to the structure in a register variable, as shown
> > below.
> >
> > register unsigned int address = (unsigned int)p; // p is the pointer to the structure
> >
> > Then later in the code, when I need to access i, rather than doing it
> > like p->i, I do it like ((struct thread_args *)address)->i.
> >
> > I got the same statistics. Now I will try your method. If I understood
> > correctly, the following is your suggestion.
> >
> > int *i = get_my_memory(sizeof(int));
> >
> > Later in the code, rather than using i++, use (*i)++. I will try that
> > and let you know how it goes.
> >
> > Thanks and regards,
> > Isuru
> >
> >
> > --- On Tue, 2/1/11, Brian Budge <brian.budge@xxxxxxxxx> wrote:
> >
> >> From: Brian Budge <brian.budge@xxxxxxxxx>
> >> Subject: Re: Allocate a variable in a known physical location
> >> To: "isuru herath" <isuru81@xxxxxxxxx>
> >> Cc: gcc-help@xxxxxxxxxxx
> >> Date: Tuesday, February 1, 2011, 11:16 AM
> >> Ah, I see, you canNOT use any optimization. I think I misunderstood
> >> earlier.
> >>
> >> Perhaps something like this:
> >>
> >> int *i, *j, *k;
> >> fill_in_counters(&i, &j, &k); // calls mmap, and assigns the first three ints-worth of the memory to i, j, k
> >>
> >> Then use *i, *j, and *k.
> >>
> >> Just to make sure. You want each use of these to produce exactly one
> >> load - not zero or two?
> >>
> >> If I look at the below, I'd expect each time through the inner loop to
> >> produce 8 accesses to counters, an access to m_size, and an access to
> >> each of mr, m1, and m2. That's 12 loads * 256 * 256 * 128. If you
> >> add the minor loop accesses, this is probably what you're talking
> >> about with "100893832". How are mr, m1, and m2 defined? Are they
> >> type **, or type *[]? Because if you're allocating separate buffers,
> >> this will increase your accesses by three in each loop (12 above goes
> >> to 15).
> >>
> >> Unsure where the rest is coming from. You might need to dump the
> >> assembly for this.
> >>
> >>   Brian
> >>
> >> On Tue, Feb 1, 2011 at 10:09 AM, isuru herath <isuru81@xxxxxxxxx> wrote:
> >> > Hi Brian,
> >> >
> >> > Thanks for the quick reply. Following is the initial code. t_id is the
> >> > thread id and n_t is the number of threads. m_size is 256.
> >> >
> >> > for (i= (t_id*(m_size/n_t)); i < ((m_size/n_t) + (m_size/n_t)*t_id); i++)
> >> > {
> >> >     for (j=0; j < m_size; j++)
> >> >     {
> >> >         for (k=0; k < m_size; k++)
> >> >         {
> >> >             mr[i][j] += m1[i][k] * m2[k][j];
> >> >         }
> >> >     }
> >> > }
> >> > For this code I got 201557258 L1 accesses for processor 0. I only used 2
> >> > threads.
> >> >
> >> > I wanted to allocate i, j, k, n_t, t_id and m_size in a separate area of
> >> > memory. Therefore I created a structure as follows.
> >> >
> >> > struct thread_args
> >> > {
> >> >     int i;
> >> >     int j;
> >> >     int k;
> >> >     int t_id;
> >> >     int m_size;
> >> >     int n_t;
> >> > };
> >> >
> >> > Then I allocate space for this structure from this area of memory. To do
> >> > this, I pre-allocated a large area of memory and later I allocate space for
> >> > this structure from it.
> >> >
> >> > struct thread_args* p = (struct thread_args*)get_my_memory(sizeof(struct thread_args));
> >> >
> >> > So I changed my program,
> >> >
> >> > for (p->i= (p->t_id*(p->m_size/p->n_t)); p->i < ((p->m_size/p->n_t) + (p->m_size/p->n_t)*p->t_id); p->i++)
> >> > {
> >> >     for (p->j=0; p->j < p->m_size; p->j++)
> >> >     {
> >> >         for (p->k=0; p->k < p->m_size; p->k++)
> >> >         {
> >> >             mr[p->i][p->j] += m1[p->i][p->k] * m2[p->k][p->j];
> >> >         }
> >> >     }
> >> > }
> >> >
> >> > Then I checked the statistics. I got 100893832 accesses in the area I am
> >> > interested in, but my total L1 cache accesses have increased to 302450960.
> >> > I believe the increase from 201557258 in the earlier case to 302450960 in
> >> > the current case results from the additional pointer access that occurs for
> >> > every i, j, k access. Also, the sum of 100893832 and 201557258 is roughly
> >> > equal to 302450960. I also followed the suggestion by Fabi; still the
> >> > numbers are the same, and I realized that even though I used *pi in my code,
> >> > it might access pi first and then access the address pointed to by pi next.
> >> > I cannot use any optimization (-O2 or -O3).
> >> >
> >> > All I need to do is allocate i, j, k in the area of memory I am
> >> > interested in. So do you think this is impossible, or is there a workaround
> >> > for this?
> >> >
> >> > Any help/advice is greatly appreciated.
> >> >
> >> > regards,
> >> > Isuru
> >> >
> >> > --- On Tue, 2/1/11, Brian Budge <brian.budge@xxxxxxxxx> wrote:
> >> >
> >> >> From: Brian Budge <brian.budge@xxxxxxxxx>
> >> >> Subject: Re: Allocate a variable in a known physical location
> >> >> To: "isuru herath" <isuru81@xxxxxxxxx>
> >> >> Cc: gcc-help@xxxxxxxxxxx
> >> >> Date: Tuesday, February 1, 2011, 9:40 AM
> >> >> Maybe the full code of the for loop, as well as the number of
> >> >> iterations would help us help you.
> >> >>
> >> >>   Brian
> >> >>
> >> >> On Tue, Feb 1, 2011 at 9:06 AM, isuru herath <isuru81@xxxxxxxxx> wrote:
> >> >> > Hi Brian,
> >> >> >
> >> >> > Well, this is related to my research. I am studying cache behavior. I am
> >> >> > interested in allocating certain variables in a known physical address
> >> >> > range. The way I do this is to put them in a structure and then allocate
> >> >> > space for this structure in the address space I am interested in. Later in
> >> >> > the code I access these variables via a pointer to that structure. This
> >> >> > introduces another cache access (the access to the pointer). So I am
> >> >> > looking for another way to allocate these variables so that it doesn't
> >> >> > introduce another access.
> >> >> >
> >> >> > regards,
> >> >> > Isuru
> >> >> >
> >> >> > --- On Tue, 2/1/11, Brian Budge <brian.budge@xxxxxxxxx> wrote:
> >> >> >
> >> >> >> From: Brian Budge <brian.budge@xxxxxxxxx>
> >> >> >> Subject: Re: Allocate a variable in a known physical location
> >> >> >> To: "isuru herath" <isuru81@xxxxxxxxx>
> >> >> >> Cc: gcc-help@xxxxxxxxxxx, Cenedese@xxxxxxxx
> >> >> >> Date: Tuesday, February 1, 2011, 8:43 AM
> >> >> >> So you are counting the number of dereferences/loads?
> >> >> >>
> >> >> >> What optimization level are you using? Depending on your code, you
> >> >> >> may also need to specify that these addresses cannot alias one
> >> >> >> another, as the potentially aliasing variables may require more loads,
> >> >> >> depending on how you use the pointers.
> >> >> >>
> >> >> >> Is this for an experiment, or for real usable code?
> >> >> >>
> >> >> >>   Brian
> >> >> >>
> >> >> >> On Tue, Feb 1, 2011 at 7:53 AM, isuru herath <isuru81@xxxxxxxxx> wrote:
> >> >> >> > Hi Fabi,
> >> >> >> >
> >> >> >> > Thanks for the reply. I tried that, but still the numbers don't change.
> >> >> >> > Let me describe the scenario.
> >> >> >> >
> >> >> >> > With my code without any modification I got 201557258 accesses. I needed
> >> >> >> > to allocate those i and j variables in a separate area of memory. To do
> >> >> >> > that I followed the method described earlier (using a structure).
> >> >> >> > Therefore I got accesses in that separate area. I got 100893832 accesses
> >> >> >> > in that area, but my total accesses increased to 302450960. I thought
> >> >> >> > this is because every time I access variable i or j, I have to access
> >> >> >> > pointer p first. Now I tried Fabi's suggestion; the code is shown below.
> >> >> >> >
> >> >> >> > int* p_i = &(p->i);
> >> >> >> > int* p_j = &(p->j);
> >> >> >> > int* p_k = &(p->k);
> >> >> >> >
> >> >> >> > for (*p_k=0; *p_k < *p_mat_size; (*p_k)++)
> >> >> >> > ...
> >> >> >> > ...
> >> >> >> >
> >> >> >> > Still I got total accesses of 302450960. Could somebody help me to
> >> >> >> > understand this?
> >> >> >> >
> >> >> >> > Any help/advice is greatly appreciated.
> >> >> >> >
> >> >> >> > regards,
> >> >> >> > Isuru
> >> >> >> >
> >> >> >> >> Once you have p->i, you can also do int* pi=&(p->i);
> >> >> >> >> So *pi=1 will only be one access.
> >> >> >> >
> >> >> >> >> bye Fabi
> >> >> >> >
> >> >> >> >
> >> >> >> > --- On Tue, 2/1/11, isuru herath <isuru81@xxxxxxxxx> wrote:
> >> >> >> >
> >> >> >> >> From: isuru herath <isuru81@xxxxxxxxx>
> >> >> >> >> Subject: Re: Allocate a variable in a known physical location
> >> >> >> >> To: gcc-help@xxxxxxxxxxx
> >> >> >> >> Cc: david@xxxxxxxxxxxxxxx
> >> >> >> >> Date: Tuesday, February 1, 2011, 3:07 AM
> >> >> >> >> Hi David,
> >> >> >> >>
> >> >> >> >> Thanks a lot for the reply. The address 0x10001000 is a physical address
> >> >> >> >> and not a virtual address. I thought we can only do this type casting
> >> >> >> >> with virtual addresses. Anyway, I tried the method you suggested and I
> >> >> >> >> got a segmentation fault.
> >> >> >> >>
> >> >> >> >> I use mmap to map those physical addresses to virtual addresses, because
> >> >> >> >> the OS (Linux) is unaware of this other piece of memory, which uses the
> >> >> >> >> physical address range 0x10001000 to 0x10101000.
> >> >> >> >>
> >> >> >> >> In my example, when I use my method to access i via pointer p (p->i), it
> >> >> >> >> first accesses p and then accesses i. But this introduces an unnecessary
> >> >> >> >> access to p.
> >> >> >> >> Therefore I was wondering how to allocate i in the above
> >> >> >> >> physical region. (Please note that I can't use any
> >> >> >> >> optimization -O2, -O3)
> >> >> >> >>
> >> >> >> >> I was looking into the section attribute, but still couldn't figure out
> >> >> >> >> how to use it; also, I am not sure it is the correct way to do this.
> >> >> >> >>
> >> >> >> >> Any help/suggestion is greatly appreciated.
> >> >> >> >>
> >> >> >> >> regards,
> >> >> >> >> Isuru
> >> >> >> >>
> >> >> >> >> > I don't know what OS you are using, or what you want to do with mmap.
> >> >> >> >> > But if you have struct that you want to access at a particular address,
> >> >> >> >> > the easiest way is with a bit of typecasting:
> >> >> >> >>
> >> >> >> >> > struct my *p = (struct my*) 0x10001000;
> >> >> >> >>
> >> >> >> >> > Then when you access p->j, for example, the generated code will use the
> >> >> >> >> > absolute address 0x10001004 (for 32-bit ints).
> >> >> >> >>
> >> >> >> >> > mvh.,
> >> >> >> >>
> >> >> >> >> > David
> >> >> >> >>
> >> >> >> >> --- On Mon, 1/31/11, isuru herath <isuru81@xxxxxxxxx> wrote:
> >> >> >> >>
> >> >> >> >> > From: isuru herath <isuru81@xxxxxxxxx>
> >> >> >> >> > Subject: Re: Allocate a variable in a known physical location
> >> >> >> >> > To: "Ian Lance Taylor" <iant@xxxxxxxxxx>
> >> >> >> >> > Cc: gcc-help@xxxxxxxxxxx
> >> >> >> >> > Date: Monday, January 31, 2011, 1:01 PM
> >> >> >> >> > Hi Ian,
> >> >> >> >> >
> >> >> >> >> > Thanks a lot for your quick response and I am sorry for not explaining
> >> >> >> >> > the problem correctly.
> >> >> >> >> >
> >> >> >> >> > I have a separate piece of memory for which I have given physical address
> >> >> >> >> > range 0x10001000 to 0x10101000. I want to allocate variables in this
> >> >> >> >> > address range. To achieve this I create a structure with the variables I
> >> >> >> >> > need to allocate there. For example, if I need to allocate i and j in the
> >> >> >> >> > above address range, I define a structure like the following.
> >> >> >> >> >
> >> >> >> >> > struct my
> >> >> >> >> > {
> >> >> >> >> >     int i;
> >> >> >> >> >     int j;
> >> >> >> >> > };
> >> >> >> >> >
> >> >> >> >> > and then allocate memory for the structure using mmap like below (bear
> >> >> >> >> > with me if the syntax is wrong).
> >> >> >> >> >
> >> >> >> >> > struct my *p = mmap(........);
> >> >> >> >> >
> >> >> >> >> > Whenever I need to access i, j in my code I access them via pointer p
> >> >> >> >> > like the following.
> >> >> >> >> >
> >> >> >> >> > p->i or p->j
> >> >> >> >> >
> >> >> >> >> > All I need is to allocate i and j in the above address range. Due to
> >> >> >> >> > my lack of knowledge of compilers and gcc, this is how I did it. The
> >> >> >> >> > drawback of this is that to access i, it has to access p first. This
> >> >> >> >> > introduces an unnecessary access into my statistics. Therefore, if I
> >> >> >> >> > could allocate i and j without using the above method, I thought my
> >> >> >> >> > problem would be solved.
> >> >> >> >> >
> >> >> >> >> > As you mentioned in your reply, can I use the section attribute to
> >> >> >> >> > achieve this, or do you have any other suggestion?
> >> >> >> >> >
> >> >> >> >> > Any help/advice is greatly appreciated.
> >> >> >> >> >
> >> >> >> >> > regards,
> >> >> >> >> > Isuru
> >> >> >> >> >
> >> >> >> >> > --- On Mon, 1/31/11, Ian Lance Taylor <iant@xxxxxxxxxx> wrote:
> >> >> >> >> >
> >> >> >> >> > > From: Ian Lance Taylor <iant@xxxxxxxxxx>
> >> >> >> >> > > Subject: Re: Allocate a variable in a known physical location
> >> >> >> >> > > To: "isuru herath" <isuru81@xxxxxxxxx>
> >> >> >> >> > > Cc: gcc-help@xxxxxxxxxxx
> >> >> >> >> > > Date: Monday, January 31, 2011, 11:21 AM
> >> >> >> >> > > isuru herath <isuru81@xxxxxxxxx> writes:
> >> >> >> >> > >
> >> >> >> >> > > > I need to allocate a variable in a known physical location, let's say I need
> >> >> >> >> > > > to allocate void *p in location 0x10001000. I was using mmap to do this,
> >> >> >> >> > > > but in that manner I can only allocate p[0], p[1]...p[n] in that physical
> >> >> >> >> > > > address range. Therefore when I access p[i], accesses to p results in
> >> >> >> >> > > > outside {0x10001000, 0x10001000+offset} and p[i] results as an access in
> >> >> >> >> > > > the range I am interested in.
> >> >> >> >> > >
> >> >> >> >> > > I don't understand the last sentence there.
> >> >> >> >> > >
> >> >> >> >> > > > I was wondering is there a way for me to force
> >> >> >> >> > > > to allocate variable p in that address range or I am looking for something
> >> >> >> >> > > > totally unrealistic. Because of the nature of my research I can use any
> >> >> >> >> > > > optimization(-O2, O3).
> >> >> >> >> > >
> >> >> >> >> > > If you don't want to use mmap, the simplest way to put a variable at a
> >> >> >> >> > > specific location is to put it in a specific section using __attribute__
> >> >> >> >> > > ((section ("..."))) and then put that section at a specific address
> >> >> >> >> > > using a linker script.
> >> >> >> >> > >
> >> >> >> >> > > Ian
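
For reference, a minimal sketch of two ideas discussed in this thread:
reaching the loop counters through pointers to the structure fields (the
&(args->i) form), and allocating each matrix as one contiguous block
instead of as int ** rows, so that an element access does not need an
extra load of a row pointer. The helper names alloc_matrix and
multiply_rows and the flat row-major indexing are assumptions made for
illustration; they are not from the original code. Without optimization,
gcc typically keeps the local pointers i, j and k in memory too, so this
alone may not reduce the access count, which would be consistent with
the results reported above.

```c
#include <assert.h>
#include <stdlib.h>

/* Counters and parameters grouped as in the thread, so the whole
 * structure can be placed in a chosen memory region. */
struct thread_args {
    int i, j, k;
    int t_id, m_size, n_t;
};

/* Hypothetical helper: one zeroed, contiguous n*n block instead of
 * n separately allocated rows. Returns NULL on failure. */
int *alloc_matrix(int n)
{
    return calloc((size_t)n * (size_t)n, sizeof(int));
}

/* Multiply the rows owned by thread p->t_id. The counters are reached
 * through pointers to the fields, taken once before the loops. */
void multiply_rows(struct thread_args *p, int *mr,
                   const int *m1, const int *m2)
{
    assert(p->n_t > 0);

    int *i = &p->i, *j = &p->j, *k = &p->k;
    const int n = p->m_size;
    const int rows = n / p->n_t;   /* rows handled per thread */

    for (*i = p->t_id * rows; *i < rows + rows * p->t_id; (*i)++)
        for (*j = 0; *j < n; (*j)++)
            for (*k = 0; *k < n; (*k)++)
                mr[(*i) * n + (*j)] +=
                    m1[(*i) * n + (*k)] * m2[(*k) * n + (*j)];
}
```

In the setup described above, the struct thread_args instance would come
from get_my_memory() rather than the stack; the matrices can still come
from the normal heap.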