Weird problem with my app and blocked ports

david allan finch <david.allan@xxxxxxxxx> · Mon, 27 Apr 2020 11:52:15 +0100

Hi,

I'm not sure where to look for my issue. I hope someone can point me in
the right direction.

I have been working on a bespoke server package for more than twenty
years. It was originally developed on Solaris (Unix), was ported to
Windows, and has been running on Linux for the last five years. This
system is in live production under heavy usage every day. The servers
are all written in C++ and use a version of encoded ONC RPC (without
bind) to communicate server to server, with Java for the client
displays.

For about six months I have been experiencing a weird issue with the
sockets on my test system. My dev environment is CentOS 7.7 running on
VirtualBox 6 on a Windows 10 machine. The VM has a bridged network
interface to my LAN using a static IP. Our servers talk over that
interface to other servers, possibly on other hosts, via my real
network. All works well until I do a massive reliability and soak test
of one of our servers. I send a series of large data messages every 15
seconds or so to one of our servers (say Y), expecting that I might see
a lockup and bugs to fix in server Y. But instead of server Y failing,
what I see is that the well-known port our system uses for name lookup
requests (i.e. 2323) blocks, and I then see timeouts on that socket
(this belongs to a different server, say Z). All the other servers
(A..Y) get timeouts communicating with Z from then on. I don't see this
effect on other OSs with similar tests.
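
A timeout and a refused connection point to different causes: a refusal
means the packet reached a host where nothing is listening, while a
timeout suggests packets are being dropped or never answered. A minimal
sketch of a probe that separates the two, assuming the name lookup on
port 2323 runs over TCP (the post does not say which transport is used).
The function name probe_tcp and the address in the usage note are
hypothetical placeholders, not part of the system described here.

```python
import socket


def probe_tcp(host, port, timeout=3.0, source_ip=None):
    """Classify one TCP connect attempt as 'open', 'refused' or 'timeout'.

    source_ip (optional) binds the client side to one local address, so
    the bridged and the internal interface can be tested separately.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        if source_ip is not None:
            # Port 0 lets the kernel pick any free source port.
            sock.bind((source_ip, 0))
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening there.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return "error: " + str(exc)
    finally:
        sock.close()
```

For example, probe_tcp("192.0.2.10", 2323) could be run from the host
of one of the servers A..Y against the address of Z (192.0.2.10 is only
a documentation placeholder), once with source_ip set to the bridged
address and once with the internal one.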

If I systemctl stop our service and then restart it, the servers A-Y
start but continue to fail with timeouts to Z. A reboot does the same. I
have changed the well-known port to 23232 and it still fails. I have run
the servers under systemctl as a new user and it still fails. As a mad
idea, I changed the interface so the servers talk on the VirtualBox
internal network, and the system returned to operation. Also, if I run
the servers manually on the command line as my user account, they work.
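
One way to compare the systemctl-started and the manually started cases
would be to check which local address the lookup port is actually
listening on in each. The sketch below parses the text of /proc/net/tcp
on Linux. It assumes IPv4, a TCP listener, and a little-endian machine
such as x86; parse_listeners is a hypothetical helper name.

```python
import socket
import struct

LISTEN_STATE = "0A"  # TCP state code for LISTEN in /proc/net/tcp


def parse_listeners(proc_net_tcp_text, port):
    """Return the local IPv4 addresses that have a LISTEN socket on port."""
    addresses = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        local, state = fields[1], fields[3]
        hex_ip, hex_port = local.split(":")
        if state != LISTEN_STATE or int(hex_port, 16) != port:
            continue
        # The address is a 32-bit hex value in host byte order;
        # "<I" assumes a little-endian host (an assumption, see above).
        ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
        addresses.append(ip)
    return addresses
```

Reading the file with open("/proc/net/tcp").read() and passing the text
in with port 2323 gives a list such as ["0.0.0.0"] for a listener on
all interfaces, or an empty list if nothing is listening on that port.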

It kind of looks like a firewall/anti-virus/trojan block rule on our
well-known port 2323 or on server Z. As far as I can see the CentOS
firewall is not running. The Norton firewall on my PC does not seem to
have any rules or warnings about my VirtualBox IPs or ports. Our servers
don't cache any IP data.

The first time this happened I was too busy to look at it and just
restored the VM from a backup. It then happened a second time a month
ago; I spent a day looking at the issue, found nothing, and restored
from backup again, putting it down to the CentOS security update I had
done earlier that day. It happened for a third time on Friday (24th).
This time I had done no updates since the last restore, so I can be sure
it's not a CentOS update. I checked again and could find nothing wrong,
did all the updates, and still nothing worked. I investigated all the
firewall settings and interfaces, and the internal one works. I need the
system to work on the external bridged network interface, and I can't
think of anything else to do. The socket error messages are just
Timeout; there is nothing in dmesg or the journal that suggests
anything.

I am now at a complete loss as to what is happening and why.

Regards
David Finch

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos