On Tue, 2008-03-04 at 11:49 +0100, Rudolf Gabler wrote:
> Hi Lon,
>
> Sorry to bother you directly, but although I subscribed to this list I
> cannot post to it (I tried several times).

Ok. I've CC'd the list.

> My problem: we are running 3 shared-root gfs cluster (2 on x86_64, 1 ita64)
> and after the last upgrade we are faced with the following messages (example
> from cluster 1):
>
> Mar 4 10:33:12 bldsrv3 ccsd[1664]: Invalid descriptor specified (-111).
> Mar 4 10:33:12 bldsrv3 ccsd[1664]: Someone may be attempting something
> evil.
> Mar 4 10:33:12 bldsrv3 ccsd[1664]: Error while processing get: Invalid
> request descriptor
> Mar 4 10:33:12 bldsrv3 ccsd[1664]: Invalid descriptor specified (-111).
> Mar 4 10:33:12 bldsrv3 ccsd[1664]: Someone may be attempting something
> evil.
> Mar 4 10:33:12 bldsrv3 ccsd[1664]: Error while processing get: Invalid
> request descriptor
> Mar 4 10:33:12 bldsrv3 ccsd[1664]: Invalid descriptor specified (-21).

I've seen this before - I'll try to dig up what I know.

> As far as I understand this, the problem occurs because a connection to the
> ccsd fails ("ccs_test connect" in one of the /usr/share/cluster scripts)
> because of to many open connections (more than 30?).

That could be, and it would make sense. You might have found the source
of the problem.

> ccs_test connect several times, I get .i.e 5 time a descriptor and then 6
> times a "connection refused". The descriptor numbers starts at number zero,
> incrementing and the thing I don't understand is the huge ccsd activity.
> After a fresh boot the descriptor number counts to around 1 Million after
> one day running. Is this intended (normal behavior)? Maybe its related to a
> cman upgrade.

The ccs descriptors are non-decreasing. They're not "file descriptors",
and they increment by a huge number each time. Don't worry about what
the value is ;)

The max # of open descriptors is fixed @ compile-time.
I think there are a couple things we should do:

* Increase the limit (as you noted, the max is 30).
* Make the scripts calling ccs_test retry when an error is received.

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
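To illustrate the second suggestion above (retrying when ccs_test reports an error), here is a rough sketch of a retry wrapper. It is written in Python for brevity rather than in the shell that the /usr/share/cluster scripts use, and it is not taken from any existing script. The name run_with_retry, the attempt count, the delay and the linear back-off are all hypothetical. It also assumes that "ccs_test connect" exits non-zero when ccsd refuses the connection, which should be verified before relying on it.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=5, delay=1.0, runner=subprocess.run):
    """Run cmd until it exits with status 0 or the attempts are used up.

    Hypothetical helper: returns the result of the last run, so the
    caller can inspect returncode, stdout and stderr.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    result = None
    for attempt in range(1, attempts + 1):
        # Assumption: a refused connection shows up as a non-zero exit status.
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            # Linear back-off, giving other callers time to release
            # their descriptors before the next try.
            time.sleep(delay * attempt)
    return result
```

A caller would invoke it as run_with_retry(["ccs_test", "connect"]) and check returncode on the result. Retrying only helps if descriptor slots are freed in the meantime, so it would complement raising the limit rather than replace it.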