On Mon, Feb 19, 2007 at 06:49:05AM -0800, Siman Hew wrote:
> Hello,
>
> When I try to check the services in a cluster, I am
> using "cman_tool services", what I get is something
> like this:
> hostname:/sbin#cman_tool services
> Service          Name              GID LID State  Code
> Fence Domain:    "default"           0   2 join   S-1,1,3
> []
>
> DLM Lock Space:  "Magma"             3   4 run
> [1 2]
>
> User:            "usrm::manager"     2   3 run    S-10,200,0
> [1 2]
>
> Some fields are quite obvious, like Service, Name, but
> some are not, like GID, LID, Code and square bracket
> under service(I guess it is node list).
> Is there anywhere explain what these fields mean?

I thought I'd explained these in an email before, but I can't find it.
The non-obvious information is not explained because it doesn't have much
meaning if you're not looking at the code, and you don't need to use it
unless you're debugging the code.  But, in case someone does want to
start digging through the code:

- The numbers in [] are the nodeids of the nodes in that group
- GID is the global id of the group, sg->global_id
- LID is the local id of the group, sg->local_id
- State is the SGST_ value in sg->state
  none=SGST_NONE
  join=SGST_JOIN
  run=SGST_RUN
  recover=SGST_RECOVER, the number after "recover" is sg->recover_state
  update=SGST_UEVENT
- Code is a combination of information
  First letter   S=SGFL_SEVENT, U=SGFL_UEVENT, N=SGFL_NEED_RECOVERY
  First number   sg->sevent->se_state or sg->uevent.ue_state
  Second number  sg->sevent->se_flags or sg->uevent.ue_flags
  Third number   sg->sevent->se_reply_count or sg->uevent.ue_nodeid

Now, on to your specific problem.  By the looks of it I'd say that your
machine is trying to fence someone.  /var/log/messages will usually have
some clear information about what's wrong.  Source code debugging using
the info above is probably the wrong place to start.

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
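
For anyone who wants to label the Code column while reading the source, here is a rough sketch of the breakdown described above. It is not part of cman_tool; parse_code and both lookup tables are made-up names for illustration. The "letter, dash, three comma-separated numbers" layout is inferred only from the two samples in this thread, and the numbers are kept as text because the base they are printed in is not stated here.

```python
import re

# First letter of the Code column, as listed in the reply above.
FLAG_NAMES = {"S": "SGFL_SEVENT", "U": "SGFL_UEVENT", "N": "SGFL_NEED_RECOVERY"}

# Which struct members the three numbers correspond to, per the reply above.
# Nothing is stated for "N", so generic labels are used in that case.
FIELD_NAMES = {
    "S": ("sg->sevent->se_state", "sg->sevent->se_flags",
          "sg->sevent->se_reply_count"),
    "U": ("sg->uevent.ue_state", "sg->uevent.ue_flags",
          "sg->uevent.ue_nodeid"),
}

# Assumed layout, e.g. "S-1,1,3". Hex digits are accepted because the
# number base is not stated; values are returned as text, not converted.
CODE_RE = re.compile(r"([SUN])-([0-9A-Fa-f]+),([0-9A-Fa-f]+),([0-9A-Fa-f]+)")


def parse_code(code):
    """Return the labelled parts of a Code value, or None if it does not match."""
    m = CODE_RE.fullmatch(code.strip())
    if m is None:
        return None
    letter = m.group(1)
    names = FIELD_NAMES.get(letter, ("first", "second", "third"))
    return {"flag": FLAG_NAMES[letter], "fields": dict(zip(names, m.groups()[1:]))}


if __name__ == "__main__":
    print(parse_code("S-1,1,3"))
```

This only attaches names to the numbers; what a given se_state or se_flags value means still has to be looked up in the code.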