Hi,
I'm not sure where to look for my issue. I hope someone can point me in
the right direction.
I have been working on a bespoke server package for more than twenty
years. It was originally developed on Solaris (Unix), was ported to
Windows, and has been running on Linux for the last five years. This
system is in live production under heavy usage every day. The servers
are all written in C++ and use a version of encoded ONC RPC (without
bind) to communicate server to server, with Java for the client
displays.
For about six months I have been experiencing a weird issue with the
sockets on my test system. My dev environment is CentOS 7.7 running in
VirtualBox 6 on a Windows 10 machine. The VM has a bridged network
interface to my LAN using a static IP. Our servers talk on that
VirtualBox interface to other servers, possibly on other hosts, via my
real network. All works well until I do a massive reliability and soak
test of one of our servers. I send a series of large data messages every
15 seconds or so to one of our servers (say Y), expecting that I might
see a lockup and bugs to fix etc. in server Y. But instead of server Y
failing, what I see is that the well-known port our system uses for name
lookup requests (i.e. 2323) blocks, and I then see timeouts on that
socket (this is a different server, say Z). All the other servers (A..Y)
get timeouts communicating with Z from then on. I don't see this effect
on other OSs with similar tests.
If I systemctl stop our service and then restart it, the servers A-Y
start but continue to fail with timeouts to Z. A reboot makes no
difference. I have changed the well-known port to 23232 and it still
fails. I have run the servers under systemctl as a new user and it still
fails. As a mad idea, I changed the interface so the servers talk on the
VirtualBox internal network, and the system returned to operation. Also,
if I run the servers manually on the command line as my user account,
they work.
It kind of looks like a firewall/anti-virus/trojan block rule on our
well-known port 2323 or on the Z server. As far as I can see, the CentOS
firewall is not running. The Norton firewall on my PC does not seem to
have any rules or warnings about my VirtualBox IPs or ports. Our servers
don't cache any IP data.
The first time this happened I was too busy to look at it and just
restored the VM from a backup. It then happened a second time a month
ago; I spent a day looking at the issue, found nothing, and restored
from backup again, putting it down to the CentOS security update I had
done earlier that day. It happened for a third time on Friday (24th).
This time I had done no updates since the last restore, so I can be sure
it's not a CentOS update. I checked again and could find nothing wrong,
did all the updates, and still nothing worked. I investigated the
firewall and all the interfaces, and they look fine. I need the system
to work on the external bridged network interface and I can't think of
anything else to do. The socket error messages are just Timeout; there
is nothing in dmesg or the journal that suggests anything.
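Since the only symptom reported is a bare Timeout, one way to narrow it
down is to check whether a connection attempt to Z is actively refused
or silently dropped. The following is a minimal sketch of such a probe,
not part of the server package. It assumes the name lookup service
listens on TCP (the transport is not stated above), and probe() is a
hypothetical helper name.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Classify a single TCP connect attempt to (host, port).

    Returns "open", "refused", "timeout", or "error: <errno name>".
    Assumption: the service under test accepts TCP connections.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing accepted the connection.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. ENETUNREACH or EHOSTUNREACH.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
```

Calling probe() with the address of the host running Z and port 2323
(or 23232), once over the bridged interface and once over the internal
network, would show whether the two paths behave differently. A
"refused" result usually means the host is reachable but nothing is
accepting connections on that port, while a "timeout" usually means the
connection request is being dropped somewhere or the listener is not
accepting new connections.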
I am now at a complete loss as to what is happening and why.
Regards
David Finch
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos