On 09-02-08 01:34, Peter Teoh wrote:

> On Sat, Feb 9, 2008 at 8:10 AM, Rene Herman <rene.herman@xxxxxxxxxxxx> wrote:
>> On 09-02-08 00:22, Diego Woitasen wrote:
>>
>>> I was reading the code of include/linux/fs.h and saw a comment
>>> before i_size_read() that says:
>>>
>>> /*
>>>  * NOTE: in a 32bit arch with a preemptable kernel and an UP
>>>  * compile the i_size_read/write must be atomic with respect to
>>>  * the local cpu (unlike with preempt disabled), but they don't
>>>  * need to be atomic with respect to other cpus like in true SMP
>>>  * (so they need either to either locally disable irq around the
>>>  * read or for example on x86 they can be still implemented as a
>>>  * cmpxchg8b without the need of the lock prefix). For SMP
>>>  * compiles and 64bit archs it makes no difference if preempt is
>>>  * enabled or not.
>>>  */
>>>
>>> I don't understand why this function shouldn't be atomic in a 64
>>> bit arch or why it isn't locked. Where is the race condition
>>> prevented?
>>
>> In the CPU. inode->i_size is a 64-bit integer, and access to it is
>> atomic on 64-bit. On a 32-bit arch though, a 64-bit load will be split in
>> two 32-bit ones where you can get an incoherent value if you're interrupted
>> between getting the low and the high 32 bits.
>
> I understand what you are saying, as i_size is loff_t, and loff_t is
> defined as "long long". But the fact is this: of the thousands of
> assembly instructions in the kernel, execution can always be interrupted
> between any two, so long as the interrupt handler ensures that all the
> registers it modified have been restored to their original values upon
> returning. So I don't quite understand why it cannot be interrupted
> between the upper and lower 32-bit halves of the load.

It's not about registers. i_size_read() is designed so that it can be called
without i_sem held, meaning it needs to guard against i_size changing out
from under it.

Say process A wants to know inode->i_size. On a 32-bit arch the read is going
to be split into two 32-bit loads such as (Intel pseudo-syntax):

	mov eax, [inode->i_size]
	mov edx, [inode->i_size + 4]

Now imagine process A being preempted just between these two loads by
process B and process B changing inode->i_size. When process A resumes it
gets the _new_ upper 32 bits while it already had the _old_ lower 32 bits,
making for a combined 64-bit value which is complete nonsense.
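
To make that interleaving concrete, here is a small userspace sketch that replays it deterministically. It is only an illustration, not kernel code: struct split_size, split_store() and torn_read() are made-up names, and the "preemption" is simulated by calling the writer between the two 32-bit loads.

```c
#include <stdint.h>

/* A 64-bit size stored as two 32-bit words, the way a 32-bit CPU loads it. */
struct split_size {
    uint32_t lo;   /* bits 0..31  */
    uint32_t hi;   /* bits 32..63 */
};

/* Store a 64-bit value as two separate 32-bit words. */
static void split_store(struct split_size *s, uint64_t v)
{
    s->lo = (uint32_t)v;
    s->hi = (uint32_t)(v >> 32);
}

/*
 * Replay the race: the reader loads the low word, a writer then replaces
 * the whole value (standing in for process B), and the reader resumes
 * with the high word.
 */
static uint64_t torn_read(struct split_size *s, uint64_t new_size)
{
    uint32_t lo = s->lo;           /* mov eax, [inode->i_size]     */
    split_store(s, new_size);      /* "process B" runs here        */
    uint32_t hi = s->hi;           /* mov edx, [inode->i_size + 4] */

    return ((uint64_t)hi << 32) | lo;
}
```

If the stored size is 0x00000000FFFFFFFF and the writer changes it to 0x0000000100000000, torn_read() returns 0x00000001FFFFFFFF: old low word, new high word, a size the file never had.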

Now, mind you, exactly how much point there is to any specific code path in
checking i_size without grabbing i_sem is open for discussion. Even if the
locking gets you a _coherent_ value, it may still be an _outdated_ value if
you're preempted right after this sequence, but that's a higher-level issue.
A bit of googling seems to imply stat() wants it non-locked. You'd have to
ask a VFS person for a more detailed answer as to the why at that higher
level.

Perhaps Andrew feels chatty...
But the core issue is just that you need to get a coherent value.
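
One way to get a coherent value without locking the read side is a sequence counter, which as far as I know is the approach i_size_read() takes on 32-bit SMP builds. Below is a rough userspace sketch of the idea using C11 atomics. It is not the kernel's implementation: struct sized_obj and the size_*() helpers are hypothetical names, and writers are assumed to be serialized by the caller.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for an inode's size field. */
struct sized_obj {
    atomic_uint seq;       /* even: stable, odd: update in progress */
    _Atomic uint32_t lo;   /* bits 0..31 of the size  */
    _Atomic uint32_t hi;   /* bits 32..63 of the size */
};

static void size_init(struct sized_obj *s, uint64_t v)
{
    atomic_init(&s->seq, 0);
    atomic_init(&s->lo, (uint32_t)v);
    atomic_init(&s->hi, (uint32_t)(v >> 32));
}

/* Writers are assumed to be serialized by the caller (e.g. by a lock). */
static void size_write(struct sized_obj *s, uint64_t v)
{
    atomic_fetch_add(&s->seq, 1);               /* odd: readers must retry */
    atomic_store(&s->lo, (uint32_t)v);
    atomic_store(&s->hi, (uint32_t)(v >> 32));
    atomic_fetch_add(&s->seq, 1);               /* even again: stable */
}

/* One attempt: returns 1 and fills *out only if no write overlapped. */
static int size_try_read(struct sized_obj *s, uint64_t *out)
{
    unsigned int start = atomic_load(&s->seq);
    uint32_t lo = atomic_load(&s->lo);
    uint32_t hi = atomic_load(&s->hi);

    if ((start & 1u) || atomic_load(&s->seq) != start)
        return 0;                               /* possibly torn, discard */
    *out = ((uint64_t)hi << 32) | lo;
    return 1;
}

/* Lock-free read: retry until a snapshot is known to be coherent. */
static uint64_t size_read(struct sized_obj *s)
{
    uint64_t v;

    while (!size_try_read(s, &v))
        ;
    return v;
}
```

The reader never blocks the writer; it just throws away any snapshot taken while the counter was odd or changed underneath it. The value can still be outdated by the time the caller uses it, but it is never a mix of two different sizes.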
Rene.