Re: atomic operation in 32 bit but no in 64!?

Rene Herman <rene.herman@xxxxxxxxxxxx> · Fri, 29 Feb 2008 18:16:25 +0100

On 29-02-08 01:34, Peter Teoh wrote:

On Sat, Feb 9, 2008 at 8:10 AM, Rene Herman <rene.herman@xxxxxxxxxxxx> wrote:
On 09-02-08 00:22, Diego Woitasen wrote:

 >       I was reading the code of include/linux/fs.h and saw a comment
 >       before i_size_read() that says:
 >
 >       /*
 >        * NOTE: in a 32bit arch with a preemptable kernel and an UP
 >        * compile the i_size_read/write must be atomic with respect to
 >        * the local cpu (unlike with preempt disabled), but they don't
 >        * need to be atomic with respect to other cpus like in true SMP
 >        * (so they need either to either locally disable irq around the
 >        * read or for example on x86 they can be still implemented as a
 >        * cmpxchg8b without the need of the lock prefix). For SMP
 >        * compiles and 64bit archs it makes no difference if preempt is
 >        * enabled or not.
 >        */
 >
 >        I don't understand why this funcion shouldn't be atomic in a 64
 >        bit arch or why it isn't locked. Where is the race condition
 >        prevented?

 In the CPU itself. inode->i_size is a 64-bit integer, and access to it is
 atomic on 64-bit. On a 32-bit arch though, a 64-bit load will be split in
 two 32-bit ones where you can get an incoherent value if you're interrupted
 between getting the low and the high 32 bits.

I understand what you are saying, as i_size is loff_t, and loff_t is
defined as "long long".   But the fact is that execution can always be
interrupted between any two of the thousands of assembly instructions
in the kernel, so long as the interrupt handler ensures that all the
registers it modified have been restored to their original values upon
returning.   So I don't quite understand why it cannot be interrupted
between the processing of the upper and lower 32-bit halves.

It's not about registers. i_size_read() is designed to be callable without 
the i_sem held, meaning it needs to guard against i_size changing out from 
under it.

Say process A wants to know inode->i_size. On a 32-bit arch it's going to be 
split in two 32-bit loads such as (Intel pseudo-syntax):

	mov	eax, [inode->i_size]
	mov	edx, [inode->i_size + 4]

Now imagine process A being preempted just between these two loads by 
process B and process B changing inode->i_size. When process A resumes it 
gets the _new_ upper 32 bits while it already had the _old_ lower 32 bits, 
making for a combined 64-bit value which is complete nonsense.
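
To put numbers on that, here is a small userspace sketch of the same interleaving. It is not kernel code: struct split64, split64_store() and torn_read() are names made up for this illustration, and the "preemption" is simulated by calling the writer between the two half loads.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit i_size as a 32-bit CPU sees it:
 * two separate 32-bit words in memory. */
struct split64 {
	uint32_t lo;
	uint32_t hi;
};

/* Store a 64-bit value as two 32-bit halves. */
static void split64_store(struct split64 *s, uint64_t v)
{
	s->lo = (uint32_t)v;
	s->hi = (uint32_t)(v >> 32);
}

/* Simulate process A being preempted between its two loads while
 * process B writes new_size in the gap. */
static uint64_t torn_read(struct split64 *s, uint64_t new_size)
{
	uint32_t lo = s->lo;		/* mov eax, [i_size]      (old low half)  */
	split64_store(s, new_size);	/* process B runs here                    */
	uint32_t hi = s->hi;		/* mov edx, [i_size + 4]  (new high half) */

	return ((uint64_t)hi << 32) | lo;
}
```

If the size starts at 0x00000000FFFFFFFF and process B grows the file by one byte to 0x0000000100000000, torn_read() returns 0x00000001FFFFFFFF: almost twice the real size, and a value the file never had.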

Now, mind you, exactly how much point there is to any specific code path in 
checking i_size without grabbing i_sem is open for discussion -- even if 
with the locking you get a _coherent_ value, it may still be an _outdated_ 
value if you're preempted exactly after this sequence, but that's a 
higher-level issue. A bit of googling seems to imply stat() wants it 
non-locked. You'd have to ask a VFS person for a more detailed answer as to 
the why at that higher level.

Perhaps Andrew feels chatty...

But the core issue is just that you need to get a coherent value.
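
One common way to get a coherent value without taking a lock is a sequence counter: the writer bumps a counter before and after the update, and the reader retries if the counter was odd or changed underneath it. As far as I know that is what i_size_read() does on 32-bit SMP. The following is only a userspace sketch of the idea using C11 atomics, not the kernel's implementation; struct sized, size_write() and size_read() are hypothetical names, and writers are assumed to be serialised by some outer lock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit size guarded by a sequence counter. */
struct sized {
	atomic_uint seq;	/* odd while a write is in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Writer side. Assumption: only one writer runs at a time. */
static void size_write(struct sized *s, uint64_t v)
{
	unsigned seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	/* Mark the update as in progress (counter becomes odd). */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&s->hi, (uint32_t)(v >> 32), memory_order_relaxed);

	/* Publish the update (counter becomes even again). */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: lockless, retries until both halves belong together. */
static uint64_t size_read(struct sized *s)
{
	unsigned before, after;
	uint32_t lo, hi;

	do {
		before = atomic_load_explicit(&s->seq, memory_order_acquire);
		lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((before & 1) || before != after);

	return ((uint64_t)hi << 32) | lo;
}
```

Note that this only buys coherence: the reader never sees a mix of two values, but the value it returns can still be stale by the time it is used.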

Rene.
