On Sun, Nov 18, 2012 at 10:04 PM, Hei Chan <structurechart@xxxxxxxxx> wrote:
> Hmm...I think I must be missing something.
>
> Let's say a[0] is at 0x4 (so not an address divisible by 8), and let's say
> only the first 4 characters are on the cache line.
>
> Then, long* p = (long*)a will give 0x4 in any situation as the address of an
> array can't be changed.  So it shouldn't be a problem, no?  Or do you mean
> when I try to de-reference p to read the value p is pointing to, then there
> will be a problem as only the first 4 bytes are in the cache line?

Yes.

Ian

> From: Ian Lance Taylor <iant@xxxxxxxxxx>
> To: Hei Chan <structurechart@xxxxxxxxx>
> Cc: "gcc-help@xxxxxxxxxxx" <gcc-help@xxxxxxxxxxx>
> Sent: Sunday, November 18, 2012 9:46 PM
> Subject: Re: __sync_fetch
>
> On Sun, Nov 18, 2012 at 9:38 PM, Hei Chan <structurechart@xxxxxxxxx> wrote:
>>
>> I am not sure why the casting in your example will cause any issue as I
>> thought without __pack__, the variable a will be aligned by gcc, no?  And
>> you are trying to get the address of &a[0] (which is fixed) and then cast to
>> long*, which shouldn't have anything to do with alignment....or do you mean
>> long p = (long)(*a)?
>
> In my example, a is a char array.  A char array is not required to be
> aligned to an 8-byte boundary.  It's true that GCC will align the
> variable a, but it may align it to an odd address.  So if I cast the
> array a to long*, I may get a long* that is not aligned to an 8-byte
> boundary.
>
> Ian
>
>> ----- Original Message -----
>> From: Ian Lance Taylor <iant@xxxxxxxxxx>
>> To: Hei Chan <structurechart@xxxxxxxxx>
>> Cc:
>> Sent: Sunday, November 18, 2012 9:14 PM
>> Subject: Re: __sync_fetch
>>
>> On Sun, Nov 18, 2012 at 9:03 PM, Hei Chan <structurechart@xxxxxxxxx> wrote:
>>> You mean:
>>> 1. the variable I am trying to read doesn't belong to a packed struct;
>>> and
>>
>> Yes.
>>
>>> 2. I am not casting to something not 8 bytes long on a 64-bit machine?  I
>>> thought casting would happen after the variable is stored in a register?
>>
>> I'm sorry, I don't know what you mean.
>>
>> By casting I mean something like
>>     char a[16];
>>     long *p = (long *) a;
>> Nothing here makes a aligned.
>>
>> If you want to continue this discussion, please use the mailing list,
>> not private mail to me.  Thanks.
>>
>> Ian
>>
>>> From: Ian Lance Taylor <iant@xxxxxxxxxx>
>>> To: Hei Chan <structurechart@xxxxxxxxx>
>>> Sent: Sunday, November 18, 2012 7:26 PM
>>> Subject: Re: __sync_fetch
>>>
>>> On Sun, Nov 18, 2012 at 7:02 PM, Hei Chan <structurechart@xxxxxxxxx> wrote:
>>>> So the case you mentioned about an unaligned value crossing a cache line
>>>> shouldn't happen, right?
>>>
>>> Unless you are doing something unusual involving casts or packed
>>> structs, that is correct.
>>>
>>> Ian
>>>
>>>> Just want to check (so that I can shave another few hundred nanoseconds
>>>> off my code).
>>>>
>>>> Thanks in advance.
>>>>
>>>> ----- Original Message -----
>>>> From: Ian Lance Taylor <iant@xxxxxxxxxx>
>>>> To: Hei Chan <structurechart@xxxxxxxxx>
>>>> Cc: "gcc-help@xxxxxxxxxxx" <gcc-help@xxxxxxxxxxx>
>>>> Sent: Sunday, November 18, 2012 6:57 PM
>>>> Subject: Re: __sync_fetch
>>>>
>>>> On Sun, Nov 18, 2012 at 11:31 AM, Hei Chan <structurechart@xxxxxxxxx> wrote:
>>>>> I just spoke with my coworker about this.  We just wonder whether the C++
>>>>> standard/GCC guarantees all the variables will be aligned if we don't
>>>>> request unaligned layout (e.g. __packed__).
>>>>
>>>> Yes.
>>>>
>>>> Ian
>>>>
>>>>> ----- Original Message -----
>>>>> From: Ian Lance Taylor <iant@xxxxxxxxxx>
>>>>> To: Hei Chan <structurechart@xxxxxxxxx>
>>>>> Cc: "gcc-help@xxxxxxxxxxx" <gcc-help@xxxxxxxxxxx>
>>>>> Sent: Sunday, November 18, 2012 12:18 AM
>>>>> Subject: Re: __sync_fetch
>>>>>
>>>>> On Sun, Nov 18, 2012 at 12:10 AM, Hei Chan <structurechart@xxxxxxxxx> wrote:
>>>>>>
>>>>>> How about on a 64-bit Intel processor, I use __sync_fetch_and_*() to
>>>>>> write to a long variable, but never use any __sync_*() to read?  Under
>>>>>> what situation would I read something invalid?
>>>>>
>>>>> On a 64-bit Intel processor, if the 64-bit value is at an aligned
>>>>> address, then to the best of my knowledge that will always be fine.  If
>>>>> the 64-bit value is misaligned and crosses a cache line, then if you
>>>>> are unlucky I believe that a write can occur in between reading the
>>>>> two different cache lines, causing you to read a value that was never
>>>>> written.
>>>>>
>>>>> I feel compelled to add that attempting to reason about this sort of
>>>>> thing generally means that you are making a mistake.  Unless you are
>>>>> writing very low-level code, such as the implementation of a mutex, it's
>>>>> best to avoid trying to think this way.
>>>>>
>>>>> Ian
>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: Ian Lance Taylor <iant@xxxxxxxxxx>
>>>>>> To: Hei Chan <structurechart@xxxxxxxxx>
>>>>>> Cc: "gcc-help@xxxxxxxxxxx" <gcc-help@xxxxxxxxxxx>
>>>>>> Sent: Sunday, November 18, 2012 12:07 AM
>>>>>> Subject: Re: __sync_fetch
>>>>>>
>>>>>> On Sat, Nov 17, 2012 at 11:04 PM, Hei Chan <structurechart@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> After searching more for info, it seems like even though on a
>>>>>>> 64-bit machine, reading a long (i.e. 8 bytes) is one operation, it
>>>>>>> might not give the "correct" value:
>>>>>>> http://gcc.gnu.org/ml/gcc/2008-03/msg00201.html
>>>>>>>
>>>>>>> And so, we have to use __sync_fetch_and_add(&x, 0) to read?
>>>>>>>
>>>>>>> Could someone elaborate on a situation in which reading a long
>>>>>>> variable won't get the correct value given that all writes in the
>>>>>>> application use __sync_fetch_*()?
>>>>>>
>>>>>> If you always use __sync_fetch_and_add(&x, 0) to read a value, and you
>>>>>> always use __sync_fetch_and_add to write the value also with some
>>>>>> appropriate delta, then all the accesses to that variable should be
>>>>>> atomic with sequential consistency.  That should be true on any
>>>>>> processor that implements __sync_fetch_and_add in the appropriate
>>>>>> size.
>>>>>>
>>>>>> Ian
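
For anyone reading this thread later, here is a minimal sketch in C of the two points discussed above: a char array cast to long* is only as aligned as its address happens to be, and a read can be expressed as __sync_fetch_and_add(&x, 0) so that it pairs with __sync_fetch_and_add writes.  It assumes GCC (or a compiler that provides the same __sync builtins and the aligned attribute).  The helper names is_aligned, atomic_add_long, atomic_read_long and demo are made up for this illustration and do not come from the thread or from GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of align bytes.
   align must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: atomically add delta and return the previous value. */
static long atomic_add_long(long *x, long delta)
{
    assert(is_aligned(x, sizeof *x));   /* refuse misaligned targets */
    return __sync_fetch_and_add(x, delta);
}

/* Hypothetical helper: atomic read written as "add zero". */
static long atomic_read_long(long *x)
{
    assert(is_aligned(x, sizeof *x));
    return __sync_fetch_and_add(x, 0);
}

/* Returns 0 if every check below behaves as described in the thread. */
int demo(void)
{
    /* Assumption: force 8-byte alignment so the offsets are predictable.
       Without the attribute, a plain char array has no such guarantee. */
    static char a[16] __attribute__((aligned(8)));
    long counter = 0;                   /* an ordinary long is aligned */

    if (!is_aligned(a, 8))
        return 1;                       /* a itself is aligned here */
    if (is_aligned(a + 4, 8))
        return 2;                       /* (long *)(a + 4) would be misaligned */
    if (!is_aligned(&counter, sizeof counter))
        return 3;

    atomic_add_long(&counter, 5);       /* write with a delta */
    atomic_add_long(&counter, -2);
    return atomic_read_long(&counter) == 3 ? 0 : 4;
}
```

Calling demo() from main should return 0 with GCC on x86-64.  The asserts are only a debugging aid: they catch the "unusual cast" case early instead of letting a misaligned pointer reach the atomic builtin.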