Tetsuo Handa wrote:
> Hello,
> Erik Mouw <J.A.K.Mouw@xxxxxxxxxxxxxx> wrote:
> > > I just want to know whether an SMP Linux system is fault tolerant
> > > (tolerant of CPU crashes).
> > No, but you can make it fault tolerant with a failover system; see the
> > Linux High Availability project.
> I see.
> > Another thing to look at is CPU hotplugging, which is supported on some
> > architectures (PowerPC, IIRC).
> Sounds fun!
> And it would be wonderful if they could solve the "crash with locks held"
> problem. (But I hope CPU crashes don't happen frequently.)
> Thank you very much.
The crash-with-locks problem is not unique to SMP systems. It is
something that happens all the time in distributed systems: what if a
client holds a lock and then crashes? The lock stays held and no other
client can acquire it. What many distributed systems do is use a lease
period: after the period expires, the client loses the lock unless it
asks for another lease. However, there is no guarantee that the shared
resource is left in a consistent state.
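
To make the lease idea concrete, here is a minimal sketch of the lock
server's bookkeeping; the struct and function names are purely
illustrative, and the wall-clock time source is just an assumption:

#include <stdbool.h>
#include <time.h>

struct lease {
        time_t expires_at;      /* wall-clock time the lease runs out */
        int    holder_id;       /* -1 when the lock is unheld */
};

/* A lease is only honoured while it has not expired; a crashed client
 * simply stops renewing, and the lock frees itself after a while. */
static bool lease_is_valid(const struct lease *l)
{
        return l->holder_id != -1 && time(NULL) < l->expires_at;
}

/* Grant or renew: succeeds if the lock is unheld, expired, or already
 * held by this client. */
static bool lease_acquire(struct lease *l, int client_id, int lease_secs)
{
        if (lease_is_valid(l) && l->holder_id != client_id)
                return false;   /* someone else still holds a valid lease */
        l->holder_id = client_id;
        l->expires_at = time(NULL) + lease_secs;
        return true;
}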
What about SMP? What if read-copy-update (RCU) were used to update the
shared resource? The idea is that we make a copy of the resource we want
to modify. During this period we would need a lock; reader-writer locks
are good for this (you might have to modify them so that they spin
rather than sleep). Then we update the copy, and to "commit" the change
we just switch the pointer for the shared resource to point to the copy
we modified and deallocate the previous one. This does not solve the
problem, but it makes the window of vulnerability much smaller.
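
A minimal sketch of that copy-then-switch-pointer scheme, using the
kernel's RCU primitives for the pointer handling. struct shared_data,
read_value() and update_value() are made-up names for illustration; only
the RCU, spinlock and slab calls are real kernel APIs:

#include <linux/errno.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct shared_data {
        int value;
        /* ... whatever the shared resource holds ... */
};

static struct shared_data __rcu *global_ptr;    /* the shared resource */
static DEFINE_SPINLOCK(update_lock);            /* serializes writers */

/* Readers just follow the pointer inside an RCU read-side section;
 * they never block and never see a half-updated object. */
int read_value(void)
{
        struct shared_data *p;
        int val;

        rcu_read_lock();
        p = rcu_dereference(global_ptr);
        val = p ? p->value : -1;
        rcu_read_unlock();
        return val;
}

/* Writer: copy, modify the copy, then "commit" by switching the pointer.
 * Pre-existing readers keep using the old copy until they are done,
 * after which the old copy can be freed. */
int update_value(int new_value)
{
        struct shared_data *newp, *oldp;

        newp = kzalloc(sizeof(*newp), GFP_KERNEL);
        if (!newp)
                return -ENOMEM;

        spin_lock(&update_lock);
        oldp = rcu_dereference_protected(global_ptr,
                                         lockdep_is_held(&update_lock));
        if (oldp)
                *newp = *oldp;                  /* copy current contents */
        newp->value = new_value;                /* modify the copy */
        rcu_assign_pointer(global_ptr, newp);   /* commit: switch pointer */
        spin_unlock(&update_lock);

        synchronize_rcu();      /* wait for readers still using the old copy */
        kfree(oldp);            /* now nobody can reference it any more */
        return 0;
}

The only place a crash can still leave things half-done is inside the
short spin-locked writer section, which is the smaller window of
vulnerability mentioned above.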
If we add leases to this, then leases coupled with checkpointing could
help us: maybe on lease expiry we could "roll back" to the previous
consistent state.
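
Roughly what I have in mind, as a purely hypothetical sketch (nothing
like this exists in the kernel, and all the names are invented): the
writer checkpoints the data and takes a time-limited lease before
modifying it, and whoever later finds the lease expired rolls the data
back to the checkpoint before taking over:

#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/spinlock.h>
#include <linux/string.h>

#define LEASE_LENGTH    (5 * HZ)        /* hypothetical lease period */

struct leased_data {
        spinlock_t lock;                /* assumed still usable after a crash */
        unsigned long lease_expiry;     /* jiffies value when the lease ends */
        int owner_cpu;                  /* -1 means no writer owns it */
        char data[64];                  /* the shared resource */
        char checkpoint[64];            /* last known-consistent copy */
};

/* Start an update.  If the previous owner's lease has expired, assume it
 * crashed mid-update and restore the checkpoint before taking over. */
static int lease_begin_update(struct leased_data *d, int cpu)
{
        spin_lock(&d->lock);
        if (d->owner_cpu != -1) {
                if (!time_after(jiffies, d->lease_expiry)) {
                        spin_unlock(&d->lock);
                        return -EBUSY;  /* a valid lease is still held */
                }
                /* owner vanished: roll back to the consistent state */
                memcpy(d->data, d->checkpoint, sizeof(d->data));
        }
        /* checkpoint the consistent state before we start modifying it */
        memcpy(d->checkpoint, d->data, sizeof(d->checkpoint));
        d->owner_cpu = cpu;
        d->lease_expiry = jiffies + LEASE_LENGTH;
        spin_unlock(&d->lock);
        return 0;
}

/* Finish an update: the data is consistent again, release the lease. */
static void lease_end_update(struct leased_data *d)
{
        spin_lock(&d->lock);
        d->owner_cpu = -1;
        spin_unlock(&d->lock);
}

Of course this only pushes the problem around: it assumes the crashed
CPU was not holding d->lock itself, and that rolling back is acceptable
for the data in question.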
What do you say, guys? Just my thoughts... nothing concrete.
-rahul
--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive: http://mail.nl.linux.org/kernelnewbies/
FAQ: http://kernelnewbies.org/faq/