Re: Is SMP system tolerant against CPU crashes?

Tetsuo Handa wrote:

> Hello,
>
> "Erik Mouw <J.A.K.Mouw@xxxxxxxxxxxxxx>" wrote:
> > > I just want to know whether SMP Linux system is fault tolerant
> > > (tolerant against CPU crashes).
> > No, but you can make it fault tolerant with a failover system, see the
> > linux high availability project.
> I see.
>
> > Another thing to look at is CPU hotplugging, which is supported on some
> > architectures (PowerPC, IIRC).
> Sounds fun!
> And it would be wonderful if they can solve "crash with locks held" problem.
> (But CPU crash doesn't happen frequently, I do hope.)
>
> Thank you very much.

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/


The "crash with locks held" problem is not unique to SMP systems. It comes up all the time in distributed systems: what if a client holds a lock and then crashes? The lock stays held and no other client can access the resource it protects. What many distributed systems do is grant the lock for a lease period. When the lease expires, the client loses the lock unless it has renewed the lease. However, there is no guarantee that the shared resource is in a consistent state at that point, because the client may have crashed halfway through an update.
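To make the lease idea concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: the names (lease_lock, lease_acquire, lease_release), the integer client ids, and the logical clock passed in as `now`. It also assumes the lock manager handles one request at a time, so it does no locking of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LEASE_NO_OWNER (-1)

/* Hypothetical lease-based lock, for illustration only. */
struct lease_lock {
    int      owner;      /* client id of the holder, or LEASE_NO_OWNER */
    uint64_t expires_at; /* time at which the current lease runs out */
};

static void lease_init(struct lease_lock *l)
{
    assert(l != NULL);
    l->owner = LEASE_NO_OWNER;
    l->expires_at = 0;
}

/* Acquire or renew the lock for `client` at time `now`.
 * Succeeds if the lock is free, if `client` already holds it,
 * or if the previous holder's lease has expired. */
static bool lease_acquire(struct lease_lock *l, int client,
                          uint64_t now, uint64_t duration)
{
    assert(l != NULL);
    bool held_by_other = l->owner != LEASE_NO_OWNER &&
                         l->owner != client &&
                         now < l->expires_at;
    if (held_by_other)
        return false;
    l->owner = client;
    l->expires_at = now + duration;
    return true;
}

/* Release the lock early; only the current holder may do so. */
static bool lease_release(struct lease_lock *l, int client)
{
    assert(l != NULL);
    if (l->owner != client)
        return false;
    l->owner = LEASE_NO_OWNER;
    return true;
}
```

For example, if client 1 acquires at time 0 with a 10-tick lease, client 2 is refused at time 5. If client 1 crashes and never renews, client 2 gets the lock from time 10 onwards. Note that nothing here says anything about the state client 1 left the resource in.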

What about SMPs? What if something like read-copy-update (RCU) were used for the update? That is, we make a copy of the resource we want to modify. While making the copy we would need a lock; reader-writer locks are good for this. You might have to modify these locks so that they spin rather than sleep. Then we update the copy, and to "commit" the change we just switch the pointer for the shared resource to point to the copy we just modified, and deallocate the previous version once no reader is still using it. This does not solve the problem, but it makes the window of vulnerability much smaller: a CPU that crashes while modifying its private copy leaves the published version untouched.
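A rough userspace sketch of the copy-then-switch-pointer idea, using C11 atomics rather than the kernel's RCU API. The names (struct resource, resource_read, resource_update) are made up for the example, and it assumes only one writer runs at a time (for instance because writers are serialised by a lock, as suggested above).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shared resource with two fields that must change together. */
struct resource {
    int a;
    int b;
};

/* Pointer to the currently published version (NULL until first update). */
static _Atomic(struct resource *) shared;

/* Reader: a single atomic load. It sees either the old version or the
 * new one, never a half-updated mix. */
static const struct resource *resource_read(void)
{
    return atomic_load(&shared);
}

/* Writer: copy the current version, modify the private copy, then publish
 * it with one atomic pointer swap. The previous version is handed back
 * through `old_out`; the caller may free it only once no reader can still
 * be using it. Assumes a single writer at a time. */
static bool resource_update(int delta_a, int delta_b,
                            struct resource **old_out)
{
    assert(old_out != NULL);
    struct resource *old = atomic_load(&shared);
    struct resource *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return false;
    if (old != NULL)
        *copy = *old;
    else
        *copy = (struct resource){0, 0};
    /* A crash anywhere up to here leaves `shared` untouched. */
    copy->a += delta_a;
    copy->b += delta_b;
    *old_out = atomic_exchange(&shared, copy); /* the "commit" */
    return true;
}
```

The only step visible to readers is the final atomic_exchange(), so that is the whole window in which the published state changes. Deciding when the old version is safe to free is the hard part; in the kernel, RCU handles it by waiting for a grace period (e.g. synchronize_rcu()), which this sketch leaves to the caller.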

If we add leases to this, then leases coupled with checkpointing could help us. Maybe on lease expiry we could "roll back" to the previous consistent state.
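Just to show what leases plus a checkpoint might look like, here is a small single-threaded sketch. All of it is hypothetical: the struct and function names, the `now` logical clock, and the rule that the checkpoint is restored when another client takes over an expired lease.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_OWNER (-1)

/* Example state; in this sketch "consistent" means a + b stays constant. */
struct state {
    int a;
    int b;
};

/* Hypothetical resource guarded by a lease, with a checkpoint. */
struct guarded {
    struct state live;       /* what the lease holder modifies in place */
    struct state checkpoint; /* last state known to be consistent */
    int          owner;      /* client id of the holder, or NO_OWNER */
    uint64_t     expires_at; /* time at which the current lease runs out */
};

static void guarded_init(struct guarded *g, struct state initial)
{
    assert(g != NULL);
    g->live = initial;
    g->checkpoint = initial;
    g->owner = NO_OWNER;
    g->expires_at = 0;
}

/* Acquire or renew the lease. If a different client's lease has expired,
 * assume that client crashed and roll `live` back to the checkpoint. */
static bool guarded_acquire(struct guarded *g, int client,
                            uint64_t now, uint64_t duration)
{
    assert(g != NULL);
    if (g->owner != NO_OWNER && g->owner != client) {
        if (now < g->expires_at)
            return false;        /* still validly held by someone else */
        g->live = g->checkpoint; /* discard the partial update */
    }
    g->owner = client;
    g->expires_at = now + duration;
    return true;
}

/* Commit: the holder declares `live` consistent and gives up the lease.
 * Refused if the caller is not the holder or its lease has run out. */
static bool guarded_commit(struct guarded *g, int client, uint64_t now)
{
    assert(g != NULL);
    if (g->owner != client || now >= g->expires_at)
        return false;
    g->checkpoint = g->live;
    g->owner = NO_OWNER;
    return true;
}
```

So if client 1 takes 30 out of `a` and crashes before adding it to `b`, the next client to acquire after the lease expires starts again from the checkpoint. This does not deal with a holder that is merely slow rather than dead and keeps writing after its lease has expired; that case would need some extra protection.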
What do you say, guys? Just my thoughts, nothing concrete.
-rahul

