Hi Brian,

Thanks for the mail and sorry for the late response. I was trying to store
the pointer to the structure in a register variable, as shown below.

register unsigned int address = (unsigned int)p; // p is the pointer to the structure

Then later in the code, when I need to access i, rather than doing it like
p->i, I do it like ((struct thread_args *)address)->i. I got the same
statistics.

Now I will try your method. If I understood correctly, the following is your
suggestion.

int *i = get_my_memory(sizeof(int));

Later in the code, rather than using i++, I should use (*i)++. I will try
that and let you know how it goes.

Thanks and regards,
Isuru

--- On Tue, 2/1/11, Brian Budge <brian.budge@xxxxxxxxx> wrote:

> From: Brian Budge <brian.budge@xxxxxxxxx>
> Subject: Re: Allocate a variable in a known physical location
> To: "isuru herath" <isuru81@xxxxxxxxx>
> Cc: gcc-help@xxxxxxxxxxx
> Date: Tuesday, February 1, 2011, 11:16 AM
> Ah, I see, you canNOT use any optimization.  I think I misunderstood earlier.
>
> Perhaps something like this:
>
> int *i, *j, *k;
> fill_in_counters(&i, &j, &k); // calls mmap, and assigns the first
>                               // three ints-worth of the memory to i, j, k
>
> Then use *i, *j, and *k.
>
> Just to make sure.  You want each use of these to produce exactly one
> load - not zero or two?
>
> If I look at the below, I'd expect each time through the inner loop to
> produce 8 accesses to counters, an access to m_size, and an access to
> each of mr, m1, and m2.  That's 12 loads * 256 * 256 * 128.  If you
> add the minor loop accesses, this is probably what you're talking
> about with "100893832".  How are mr, m1, and m2 defined?  Are they
> type **, or type *[]?  Because if you're allocating separate buffers,
> this will increase your accesses by three in each loop (12 above goes
> to 15).
>
> Unsure where the rest is coming from.  You might need to dump the
> assembly for this.
>
> Brian
>
> On Tue, Feb 1, 2011 at 10:09 AM, isuru herath <isuru81@xxxxxxxxx> wrote:
> > Hi Brian,
> >
> > Thanks for the quick reply. Following is the initial code. t_id is the
> > thread id and n_t is the number of threads. m_size is 256.
> >
> > for (i= (t_id*(m_size/n_t)); i < ((m_size/n_t) + (m_size/n_t)*t_id); i++)
> > {
> >     for (j=0; j < m_size; j++)
> >     {
> >         for (k=0; k < m_size; k++)
> >         {
> >             mr[i][j] += m1[i][k] * m2[k][j];
> >         }
> >     }
> > }
> >
> > For this code I got 201557258 L1 accesses for processor 0. I only used 2
> > threads.
> >
> > I wanted to allocate i, j, k, n_t, t_id and m_size in a separate area of
> > memory. Therefore I created a structure as follows.
> >
> > struct thread_args
> > {
> >     int i;
> >     int j;
> >     int k;
> >     int t_id;
> >     int m_size;
> >     int n_t;
> > };
> >
> > Then I allocate space for this structure from this area of memory. To do
> > this, I pre-allocated a large area of memory and later I allocate space
> > for this structure from it.
> >
> > struct thread_args* p = (struct thread_args*)get_my_memory(sizeof(struct thread_args));
> >
> > So I changed my program,
> >
> > for (p->i = (p->t_id*(p->m_size/p->n_t));
> >      p->i < ((p->m_size/p->n_t) + (p->m_size/p->n_t)*p->t_id);
> >      p->i++)
> > {
> >     for (p->j=0; p->j < p->m_size; p->j++)
> >     {
> >         for (p->k=0; p->k < p->m_size; p->k++)
> >         {
> >             mr[p->i][p->j] += m1[p->i][p->k] * m2[p->k][p->j];
> >         }
> >     }
> > }
> >
> > Then I checked the statistics. I got 100893832 accesses in the area I am
> > interested in, but my total L1 cache accesses have increased to 302450960.
> > I believe the increase from 201557258 in the earlier case to 302450960 in
> > the current case results from the additional pointer access that occurs
> > for every access to i, j, k. Also, the sum of 100893832 and 201557258 is
> > roughly equal to 302450960.
> > I also followed the suggestion by Fabi, but the numbers are still the
> > same, and I realized that even though I used *pi in my code, it might
> > access pi first and then access the address pointed to by pi. I cannot
> > use any optimization (-O2 or -O3).
> >
> > All I need to do is to allocate i, j, k in the area of memory I am
> > interested in. So do you think this is impossible, or is there a
> > workaround for this?
> >
> > Any help/advice is greatly appreciated.
> >
> > regards,
> > Isuru
> >
> > --- On Tue, 2/1/11, Brian Budge <brian.budge@xxxxxxxxx> wrote:
> >
> >> From: Brian Budge <brian.budge@xxxxxxxxx>
> >> Subject: Re: Allocate a variable in a known physical location
> >> To: "isuru herath" <isuru81@xxxxxxxxx>
> >> Cc: gcc-help@xxxxxxxxxxx
> >> Date: Tuesday, February 1, 2011, 9:40 AM
> >> Maybe the full code of the for loop, as well as the number of
> >> iterations would help us help you.
> >>
> >> Brian
> >>
> >> On Tue, Feb 1, 2011 at 9:06 AM, isuru herath <isuru81@xxxxxxxxx> wrote:
> >> > Hi Brian,
> >> >
> >> > Well, this is related to my research. I am studying cache behavior.
> >> > I am interested in allocating certain variables in a known physical
> >> > address range. The way I do this is to put them in a structure and
> >> > then allocate space for this structure in the address space I am
> >> > interested in. Later in the code I access these variables via a
> >> > pointer to that structure. This introduces another cache access
> >> > (which is the access to the pointer). So I am looking for another
> >> > way to allocate these variables so that it doesn't introduce
> >> > another access.
> >> >
> >> > regards,
> >> > Isuru
> >> >
> >> > --- On Tue, 2/1/11, Brian Budge <brian.budge@xxxxxxxxx> wrote:
> >> >
> >> >> From: Brian Budge <brian.budge@xxxxxxxxx>
> >> >> Subject: Re: Allocate a variable in a known physical location
> >> >> To: "isuru herath" <isuru81@xxxxxxxxx>
> >> >> Cc: gcc-help@xxxxxxxxxxx, Cenedese@xxxxxxxx
> >> >> Date: Tuesday, February 1, 2011, 8:43 AM
> >> >> So you are counting the number of dereferences/loads?
> >> >>
> >> >> What optimization level are you using?  Depending on your code, you
> >> >> may also need to specify that these addresses cannot alias one
> >> >> another, as the potentially aliasing variables may require more loads,
> >> >> depending on how you use the pointers.
> >> >>
> >> >> Is this for an experiment, or for real usable code?
> >> >>
> >> >> Brian
> >> >>
> >> >> On Tue, Feb 1, 2011 at 7:53 AM, isuru herath <isuru81@xxxxxxxxx> wrote:
> >> >> > Hi Fabi,
> >> >> >
> >> >> > Thanks for the reply. I tried that, but the numbers still don't
> >> >> > change. Let me describe the scenario.
> >> >> >
> >> >> > With my code without any modification I got 201557258 accesses. I
> >> >> > needed to allocate those i and j variables in a separate area of
> >> >> > memory. To do that I followed the method described earlier (using a
> >> >> > structure). Therefore I got accesses in that separate area. I got
> >> >> > 100893832 accesses in that area, but my total accesses increased to
> >> >> > 302450960. I thought this is because every time I access variable i
> >> >> > or j, I have to access pointer p first. Now I tried Fabi's
> >> >> > suggestion. The code is shown below.
> >> >> >
> >> >> > int* p_i = &(p->i);
> >> >> > int* p_j = &(p->j);
> >> >> > int* p_k = &(p->k);
> >> >> >
> >> >> > for (*p_k=0; *p_k < *p_mat_size; (*p_k)++)
> >> >> > ...
> >> >> > ...
> >> >> >
> >> >> > Still I got the total accesses as 302450960. Could somebody help
> >> >> > me to understand this?
> >> >> >
> >> >> > Any help/advice is greatly appreciated.
> >> >> >
> >> >> > regards,
> >> >> > Isuru
> >> >> >
> >> >> >> Once you have p->i, you can also do int* pi=&(p->i);
> >> >> >> So *pi=1 will only be one access.
> >> >> >
> >> >> >> bye Fabi
> >> >> >
> >> >> >
> >> >> > --- On Tue, 2/1/11, isuru herath <isuru81@xxxxxxxxx> wrote:
> >> >> >
> >> >> >> From: isuru herath <isuru81@xxxxxxxxx>
> >> >> >> Subject: Re: Allocate a variable in a known physical location
> >> >> >> To: gcc-help@xxxxxxxxxxx
> >> >> >> Cc: david@xxxxxxxxxxxxxxx
> >> >> >> Date: Tuesday, February 1, 2011, 3:07 AM
> >> >> >> Hi David,
> >> >> >>
> >> >> >> Thanks a lot for the reply. The address 0x10001000 is a physical
> >> >> >> address and not a virtual address. I thought we can only do this
> >> >> >> type casting with virtual addresses. Anyway, I tried the method you
> >> >> >> suggested and I got a segmentation fault.
> >> >> >>
> >> >> >> I use mmap to map those physical addresses to virtual addresses,
> >> >> >> because the OS (Linux) is unaware of this other piece of memory,
> >> >> >> which uses physical address range 0x10001000 to 0x10101000.
> >> >> >>
> >> >> >> In my example, when I use my method to access i via pointer p
> >> >> >> (p->i), it first accesses p and then accesses i. But this
> >> >> >> introduces an unnecessary access to p.
> >> >> >> Therefore I was wondering how to allocate i in the above
> >> >> >> physical region. (Please note that I can't use any
> >> >> >> optimization: -O2, -O3.)
> >> >> >>
> >> >> >> I was looking into the section attribute, but still couldn't
> >> >> >> figure out how to use it; also, I am not sure it is the correct
> >> >> >> way to do this.
> >> >> >>
> >> >> >> Any help/suggestion is greatly appreciated.
> >> >> >>
> >> >> >> regards,
> >> >> >> Isuru
> >> >> >>
> >> >> >> > I don't know what OS you are using, or what you want to do with mmap.
> >> >> >> > But if you have struct that you want to access at a particular address,
> >> >> >> > the easiest way is with a bit of typecasting:
> >> >> >>
> >> >> >> > struct my *p = (struct my*) 0x10001000;
> >> >> >>
> >> >> >> > Then when you access p->j, for example, the generated code will use the
> >> >> >> > absolute address 0x10001004 (for 32-bit ints).
> >> >> >>
> >> >> >> > mvh.,
> >> >> >>
> >> >> >> > David
> >> >> >>
> >> >> >> --- On Mon, 1/31/11, isuru herath <isuru81@xxxxxxxxx> wrote:
> >> >> >>
> >> >> >> > From: isuru herath <isuru81@xxxxxxxxx>
> >> >> >> > Subject: Re: Allocate a variable in a known physical location
> >> >> >> > To: "Ian Lance Taylor" <iant@xxxxxxxxxx>
> >> >> >> > Cc: gcc-help@xxxxxxxxxxx
> >> >> >> > Date: Monday, January 31, 2011, 1:01 PM
> >> >> >> > Hi Ian,
> >> >> >> >
> >> >> >> > Thanks a lot for your quick response and I am sorry for not
> >> >> >> > explaining the problem correctly.
> >> >> >> >
> >> >> >> > I have a separate piece of memory to which I have given the
> >> >> >> > physical address range 0x10001000 to 0x10101000.
> >> >> >> > I want to allocate variables in this address range. To achieve
> >> >> >> > this I create a structure with the variables I need to allocate
> >> >> >> > there. For example, if I need to allocate i and j in the above
> >> >> >> > address range, I define a structure like the following.
> >> >> >> >
> >> >> >> > struct my
> >> >> >> > {
> >> >> >> >     int i;
> >> >> >> >     int j;
> >> >> >> > };
> >> >> >> >
> >> >> >> > and then allocate memory for the structure using mmap like below
> >> >> >> > (bear with me if the syntax is wrong).
> >> >> >> >
> >> >> >> > struct my *p = mmap(........);
> >> >> >> >
> >> >> >> > Whenever I need to access i, j in my code I access them via
> >> >> >> > pointer p like the following.
> >> >> >> >
> >> >> >> > p->i or p->j
> >> >> >> >
> >> >> >> > All I need is to allocate i and j in the above address range. Due
> >> >> >> > to my lack of knowledge of compilers and gcc, this is how I did
> >> >> >> > it. The drawback of this is that to access i, it has to access p
> >> >> >> > first. This introduces an unnecessary access to my statistics.
> >> >> >> > Therefore, if I could allocate i and j without using the above
> >> >> >> > method, I thought my problem would be solved.
> >> >> >> >
> >> >> >> > As you mentioned in your reply, can I use the section attribute
> >> >> >> > to achieve this, or do you have any other suggestion?
> >> >> >> >
> >> >> >> > Any help/advice is greatly appreciated.
> >> >> >> >
> >> >> >> > regards,
> >> >> >> > Isuru
> >> >> >> >
> >> >> >> > --- On Mon, 1/31/11, Ian Lance Taylor <iant@xxxxxxxxxx> wrote:
> >> >> >> >
> >> >> >> > > From: Ian Lance Taylor <iant@xxxxxxxxxx>
> >> >> >> > > Subject: Re: Allocate a variable in a known physical location
> >> >> >> > > To: "isuru herath" <isuru81@xxxxxxxxx>
> >> >> >> > > Cc: gcc-help@xxxxxxxxxxx
> >> >> >> > > Date: Monday, January 31, 2011, 11:21 AM
> >> >> >> > > isuru herath <isuru81@xxxxxxxxx> writes:
> >> >> >> > >
> >> >> >> > > > I need to allocate a variable in a known physical location, let's say I need
> >> >> >> > > > to allocate void *p in location 0x10001000. I was using mmap to do this,
> >> >> >> > > > but in that manner I can only allocate p[0], p[1]...p[n] in that physical
> >> >> >> > > > address range. Therefore when I access p[i], accesses to p results in
> >> >> >> > > > outside {0x10001000, 0x10001000+offset} and p[i] results as an access in
> >> >> >> > > > the range I am interested in.
> >> >> >> > >
> >> >> >> > > I don't understand the last sentence there.
> >> >> >> > >
> >> >> >> > > > I was wondering is there a way for me to force
> >> >> >> > > > to allocate variable p in that address range or I am looking for something
> >> >> >> > > > totally unrealistic. Because of the nature of my research I can use any
> >> >> >> > > > optimization(-O2, O3).
> >> >> >> > >
> >> >> >> > > If you don't want to use mmap, the simplest way to put a variable at a
> >> >> >> > > specific location is to put it in a specific section using __attribute__
> >> >> >> > > ((section ("..."))) and then put that section at a specific address
> >> >> >> > > using a linker script.
> >> >> >> > >
> >> >> >> > > Ian
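
A note on the numbers quoted in this thread: Brian's estimate of 12 loads per
inner-loop iteration over 256 * 256 * 128 iterations can be checked with a few
lines of C. This is only a rough sketch of that arithmetic, not a measurement.
The names estimated_inner_loads and print_estimate are hypothetical (they do
not appear in the thread), and the sketch assumes each of the 2 threads runs
m_size / n_t outer iterations, as the posted loop bounds suggest.

```c
#include <stdio.h>

/* Hypothetical helper (not from the thread): estimated number of loads
 * issued by the innermost loop for ONE thread, given an assumed number
 * of loads per inner-loop iteration. */
long long estimated_inner_loads(int m_size, int n_t, int loads_per_iter)
{
    long long outer = m_size / n_t;   /* rows of mr handled by one thread */
    return outer * m_size * m_size * loads_per_iter;
}

/* Prints the estimate next to the count reported in the thread
 * (100893832 accesses in the region of interest). */
void print_estimate(void)
{
    long long est = estimated_inner_loads(256, 2, 12);
    long long reported = 100893832LL;

    printf("estimated: %lld\n", est);
    printf("reported:  %lld\n", reported);
    printf("left over: %lld\n", reported - est);
}
```

Calling print_estimate() from main should show an estimate of 100663296
against the reported 100893832, leaving 230536 accesses unexplained by the
innermost loop alone. That is consistent with Brian's remark that the outer
loops add a smaller number of accesses, but it does not say which instructions
they come from; the assembly dump he suggests would be needed for that.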