Re: [Patch 07/12] Chunk: retry initial CLD session open

On 04/18/2010 12:41 AM, Pete Zaitcev wrote:
> This was an error in the conversion to ncld. In the cldc code, we
> kick the state machine and the natural retries do the rest. Any
> failures occur there. But in ncld the original kick can fail too.
>
> Five retries give the CLD server time to reboot. If it's down, then
> clients refuse to start. This may be a bad idea, or maybe not.
> We may yet change the retries to be infinite, but for now it's
> better if builds terminate somehow in case of unexpected problems.
>
> Signed-off-by: Pete Zaitcev <zaitcev@xxxxxxxxxx>
>
> ---
>   server/cldu.c |   12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> commit 44cdb98d2cceb2f4e081db2ee38ec60f1c1a8d8d
> Author: Master <zaitcev@xxxxxxxxxxxxxxxxxx>
> Date:   Sat Apr 17 19:50:06 2010 -0600
>
>      Retry the initial connection to the CLD server.
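For readers following along, the retry described in the patch amounts to a bounded loop around the initial session open. This is an illustrative sketch only, not the code from the patch (its hunk is not quoted here). The function pointer type `session_open_fn`, the helper `open_session_with_retries`, and the `sim_server` stub are hypothetical names, and the fixed delay between attempts is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical signature for one attempt at opening the CLD session.
 * The real ncld call used in server/cldu.c is not shown in this thread.
 * Returns 0 on success, a nonzero error code on failure. */
typedef int (*session_open_fn)(void *ctx);

/* Call try_open up to max_tries times, sleeping delay_sec seconds
 * between failed attempts. Returns 0 on the first success, or the
 * error code of the last attempt if all of them failed, so that the
 * caller can refuse to start. */
static int open_session_with_retries(session_open_fn try_open, void *ctx,
				     int max_tries, unsigned int delay_sec)
{
	int err = -1;

	assert(max_tries > 0);
	for (int attempt = 1; attempt <= max_tries; attempt++) {
		err = try_open(ctx);
		if (err == 0)
			return 0;
		fprintf(stderr, "CLD session open failed (%d/%d): error %d\n",
			attempt, max_tries, err);
		if (attempt < max_tries)
			sleep(delay_sec);	/* give the server time to come back */
	}
	return err;
}

/* Simulated server for illustration: fails until it has been called
 * up_after times, then succeeds. */
struct sim_server {
	int calls;
	int up_after;
};

static int sim_open(void *ctx)
{
	struct sim_server *s = ctx;

	s->calls++;
	return s->calls >= s->up_after ? 0 : 1;
}
```

A caller wanting the "five tries, then refuse to start" behaviour would pass `max_tries = 5` and exit on a nonzero return. Note that the retry count here applies to a single server only.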

In the short term, this is acceptable.

In the medium term, this is a protocol detail that should be handled somewhere in libcldc. We want all applications to behave the same way, including the method by which they attempt to contact a master.

Because there could be multiple CLD servers, you cannot think of retries in the context of a single server. This is crucial WRT work on the #replica branch, but it is also somewhat relevant to #master, because we might have multiple servers listed in SRV records as fallbacks from which to choose.

You don't want each application implementing this logic, because we want to enforce some level of predictability in master-seeking behavior, and in making decisions about when contact attempts for -all- servers should cease, as opposed to contact attempts for a -single- server. You don't want it to take 30 minutes to try all servers in a cluster, retrying a number of times on server A, then moving on to server B, etc.
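A minimal sketch of the policy argued for here, assuming the client already has a list of candidate servers (for example, targets taken from SRV records). Each round tries every server once, and the decision to give up is made for the whole list rather than per server. `struct cld_server`, `find_cld_server`, and the `sim_cluster` stub are hypothetical names; this is not an existing libcldc API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical server entry, e.g. one target from an SRV lookup. */
struct cld_server {
	const char *host;
	unsigned short port;
};

/* Hypothetical per-attempt open function; returns 0 on success. */
typedef int (*server_open_fn)(const struct cld_server *srv, void *ctx);

/* Round-robin over all servers: one attempt per server per round, with a
 * pause only between full rounds. The give-up limit (max_rounds) applies
 * to the whole list, not to any single server, so a dead server A cannot
 * delay the first attempt on B and C. Returns the index of the server
 * that answered, or -1 if every round failed. */
static int find_cld_server(const struct cld_server *srv, size_t nsrv,
			   server_open_fn try_open, void *ctx,
			   int max_rounds, unsigned int round_delay_sec)
{
	assert(nsrv > 0 && max_rounds > 0);
	for (int round = 1; round <= max_rounds; round++) {
		for (size_t i = 0; i < nsrv; i++) {
			if (try_open(&srv[i], ctx) == 0)
				return (int)i;
			fprintf(stderr, "CLD %s:%u unreachable (round %d/%d)\n",
				srv[i].host, (unsigned)srv[i].port,
				round, max_rounds);
		}
		if (round < max_rounds)
			sleep(round_delay_sec);
	}
	return -1;
}

/* Simulated cluster for illustration: only the server named up_host
 * answers, and only once the total call count reaches up_after. */
struct sim_cluster {
	const char *up_host;
	int calls;
	int up_after;
};

static int sim_cluster_open(const struct cld_server *srv, void *ctx)
{
	struct sim_cluster *c = ctx;

	c->calls++;
	return (strcmp(srv->host, c->up_host) == 0 &&
		c->calls >= c->up_after) ? 0 : 1;
}
```

With this ordering, every server is contacted once within the first round instead of only after server A's retries are exhausted, and the overall cutoff stays a single, predictable knob for all applications.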

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe hail-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
