TCP Server (TLS) ReceiveNetworkData issue

tatanas
Enthusiast
Posts: 275
Joined: Wed Nov 06, 2019 10:28 am
Location: France

Post by tatanas »

Hello,

I'm encountering a really strange issue on the server side of my TCP client/server software.
The client is deployed on about 1,200 Windows 11 machines; they connect to the server as soon as they start up and disconnect when they shut down.

Until now, the server was running on Debian, but I wanted to migrate it to Windows (11 LTSC).
After several weeks of modifications, optimizations, and debugging, I believe the code is functional, BUT I'm facing a “bug” whose source I can’t identify.

Randomly, the server seems to “lose access to the receiving sockets.”
More specifically, the ReceiveNetworkData() function starts returning -1 for about ten seconds, then it begins receiving data normally again.

The more clients are connected, the more frequently the issue occurs. It can happen one minute after the server restarts, or twenty minutes later.
For example, it ran for three days straight when fewer than 10 clients were connected. Earlier today, with a little over 200 clients, the problem occurred every 1–10 minutes (randomly).

Since the server runs on a VM (Proxmox), I migrated it to a physical machine, but the issue appears in exactly the same way.
I'm using secure TLS connections, and I'm wondering if the problem might be related to that library.

I'm going to try migrating the server to Windows 10 and Windows Server just in case...

If you have any ideas or tests I could run, feel free to let me know.
Thanks.
Windows 11 Pro x64
PureBasic 6.30 x64
STARGÅTE
Addict
Posts: 2308
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Post by STARGÅTE »

Without any code, it's difficult to answer.
How do you call ReceiveNetworkData()? Immediately after NetworkServerEvent()/NetworkClientEvent()? In a thread?
Is the connection still active when you call ReceiveNetworkData(), or do you receive a #PB_NetworkEvent_Disconnect first?
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more
Typeface - Sprite-based font include/module

Post by tatanas »

Let me explain in more detail. The server runs the NetworkServerEvent() main loop in a thread, handling #PB_NetworkEvent_Connect, #PB_NetworkEvent_Data, and #PB_NetworkEvent_Disconnect. ReceiveNetworkData() is indeed called inside the #PB_NetworkEvent_Data case. All connections are still active when the problem arises. I only noticed the ReceiveNetworkData() error because I had implemented a heartbeat function: because of the error, no ping packets are received from the clients for several seconds, so the server concludes they are no longer connected (ghost connections) and forcibly closes them.

EDIT: When I say no packets are received, that's not entirely true: they are received, BUT they are not read by ReceiveNetworkData(). When this function starts working again, it reads all the earlier packets that were waiting in the socket buffer.

Here is the code:

Code:

Procedure NetworkEventThread(*val)
	
	Protected ServerEvent, ClientID, IpClient, Index_Client, IpClientString$
	Protected Index_Client_Array
	Protected i
	Protected BuffLength = 32768
	Protected *BufferReceive
	Protected Received_Packet.s, Packet_Incomplet.s, Infos_Completes$
	Protected SocketHandle, octets_restants, bytesRcv
	Protected nb_paquet, count, Paquet$
	Protected Cumul_Taille_Paquets = 0
	Protected RcvTimeout = 5000
	Protected ThreadID_temp, UID_temp.s, AdresseIP_temp.s
	Protected NewList ThreadIDList_temp.i()
	Protected ind
	Protected indexArray
	Protected *sb
	Protected MutexSendDataClient.i, MutexClient.i
	Protected erreur.i
	
	CreateThread(@CheckClientTimeOut(), 0)
	
	Repeat
		
		ServerEvent = NetworkServerEvent(0)
		
		Select ServerEvent

			Case #PB_NetworkEvent_None ;--- #PB_NetworkEvent_None				
				Delay(1)
				

			Case #PB_NetworkEvent_Connect ;--- #PB_NetworkEvent_Connect

				ClientID = EventClient()
				IpClient = GetClientIP(ClientID)
				IpClientString$ = IPString(IpClient)
				Index_Client = -1
							
				LockMutex(Mutex)
				For i = 0 To #Max_Client_Online - 1
					If ClientArray_Online(i)\ConnectionID = 0 ; this slot is free
						Index_Client = i
						Break ; stop at the first free slot
					EndIf
				Next
				UnlockMutex(Mutex)
			
				If Index_Client <> -1
					LockMutex(Mutex)
					ClientArray_Online(Index_Client)\ConnectionID 	 	  = ClientID
					ClientArray_Online(Index_Client)\SocketHandle		  = ConnectionID(ClientID)
					ClientArray_Online(Index_Client)\AdresseIP	 		  = IpClientString$
					ClientArray_Online(Index_Client)\MutexSendDataClient = CreateMutex()
					ClientArray_Online(Index_Client)\MutexClient	 		  = CreateMutex()
					MutexSendDataClient = ClientArray_Online(Index_Client)\MutexSendDataClient
					MutexClient 		  = ClientArray_Online(Index_Client)\MutexClient
					UnlockMutex(Mutex)

					; create the ClientID/indexArray mapping in our Map for faster access to the array data
					LockMutex(Mutex_OnlineClient_Map)
					If AddMapElement(OnlineClient_Map(), Str(ClientID))
						OnlineClient_Map()\ConnectionID 			= ClientID
						OnlineClient_Map()\SocketHandle 			= ConnectionID(ClientID)
						OnlineClient_Map()\indexArray 			= Index_Client
						OnlineClient_Map()\AdresseIp 				= IpClientString$
						OnlineClient_Map()\MutexSendDataClient = MutexSendDataClient
						OnlineClient_Map()\LastPingTimeStamp 	= ElapsedMilliseconds()
						OnlineClient_Map()\MutexClient 			= MutexClient
					EndIf
					UnlockMutex(Mutex_OnlineClient_Map)
					
					LockMutex(Compteur_Clients_Online_Mutex)
					Compteur_Clients_Online = Compteur_Clients_Online + 1 ; increment the online client counter
					If Pic_Max_Clients_Online < Compteur_Clients_Online
						Pic_Max_Clients_Online = Compteur_Clients_Online
					EndIf
					UnlockMutex(Compteur_Clients_Online_Mutex)

					LockMutex(MutexCloseServer)
					If Not CloseServer
						PostEvent(#Gui_Client_ConnectEvent, ClientID, IpClient)
						PostEvent(#StatusBar_Update_Event)
					EndIf
					UnlockMutex(MutexCloseServer)
					
				Else
					Debug "no free slot left in the array !!!"
				EndIf
				
				
			Case #PB_NetworkEvent_Data ;--- #PB_NetworkEvent_Data
				
				ClientID = EventClient()
				
				Cumul_Taille_Paquets = 0
				Received_Packet = ""
				Paquet$ = ""
				erreur = 0
				
				*BufferReceive = ReceiveNetworkDataEx(ClientID, BuffLength, RcvTimeout, 0, @erreur)
				If *BufferReceive
					Received_Packet = PeekS(*BufferReceive, MemorySize(*BufferReceive), #PB_UTF8 | #PB_ByteLength)
					FreeMemory(*BufferReceive)
				EndIf
						
				If Received_Packet <> ""
					
					Index_Client_Array = -1
					
					LockMutex(Mutex_OnlineClient_Map)
					If FindMapElement(OnlineClient_Map(), Str(ClientID)) ; look up the client's index in our array
						Index_Client_Array = OnlineClient_Map()\indexArray
					EndIf
					UnlockMutex(Mutex_OnlineClient_Map)

					If Index_Client_Array <> -1

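						*sb = StringAppend(*sb, Received_Packet) ; NOTE: *sb is a single buffer shared by every client, so partial data from different clients can get mixed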
						If FindString(Received_Packet, #PacketEND)
							
							Infos_Completes$ = SBToString(*sb)
							
							; several packets may be glued together, for example: #PacketExecution + data + #PacketEND + #PacketExecution + data + #PacketEND
							; count how many potential packets were extracted from the buffer
							nb_paquet = CountString(Infos_Completes$, #PacketEND) ;Received_Packet
							
							For count = 1 To nb_paquet
								Paquet$ = StringField(Infos_Completes$, count, #PacketEND) ;Received_Packet
								Paquet$ = Paquet$ + #PacketEND
								
								LockMutex(Mutex)
								For ind = 0 To 9
									; look for an empty slot to store the info of the message-processing threads,
									; so they can be terminated if needed BEFORE the client's variables are cleaned up on disconnect
									If ArraySize(ClientArray_Online(Index_Client_Array)\ArrayThreadClient()) = -1	
										; ClearStructure uninitializes the array, so reinitialize it
										Dim ClientArray_Online(Index_Client_Array)\ArrayThreadClient.ClientInfoThreadStruct(9)
									EndIf
									If ClientArray_Online(Index_Client_Array)\ArrayThreadClient(ind)\ThreadID = 0
										ClientArray_Online(Index_Client_Array)\ArrayThreadClient(ind)\ReceivedPacket = Paquet$
										ClientArray_Online(Index_Client_Array)\ArrayThreadClient(ind)\ThreadID = CreateThread(@ProcessRequest(), Index_Client_Array)
										ClientArray_Online(Index_Client_Array)\ArrayThreadClient(ind)\ThreadHandle = CallFunctionFast(*GetThreadId, ThreadID(ClientArray_Online(Index_Client_Array)\ArrayThreadClient(ind)\ThreadID))
										Break
									Else
; 										Debug "slot " + ind + " not free for " + ClientArray_Online(Index_Client_Array)\AdresseIP
									EndIf
								Next
								UnlockMutex(Mutex)
								
								; running total of the size of the readable packets
								Cumul_Taille_Paquets = Cumul_Taille_Paquets + Len(Paquet$)
							Next
							
							; if anything remains after the end marker, put it back in the buffer (*sb)
							If Len(Infos_Completes$) > Cumul_Taille_Paquets
								Infos_Completes$ = Mid(Infos_Completes$, Cumul_Taille_Paquets + 1)
								FreeMemory(*sb)
								*sb = #Null
								*sb = StringAppend(*sb, Infos_Completes$)
							Else ; otherwise empty it
								Infos_Completes$ = ""
								FreeMemory(*sb)
								*sb = #Null
							EndIf
						EndIf
						
					Else
						Debug "Client index not found in the array"
					EndIf
					
				EndIf
				
				
			Case #PB_NetworkEvent_Disconnect ;--- #PB_NetworkEvent_Disconnect
				ClientID = EventClient()
				CleanConnection(ClientID, "#PB_NetworkEvent_Disconnect")

		EndSelect
		
		LockMutex(MutexCloseServer)
		If CloseServer
			UnlockMutex(MutexCloseServer) ; release the lock before leaving the loop
			Break
		EndIf
		UnlockMutex(MutexCloseServer)
		
	ForEver
	
EndProcedure

Post by STARGÅTE »

Usually, ReceiveNetworkData() only returns -1 when you call it without first checking whether there is data.

When NetworkClientEvent() or NetworkServerEvent() returns a #PB_NetworkEvent_Data event, the data has already been received, and ReceiveNetworkData() just copies it from the internal network buffer to the specified buffer.

How do you handle the packet limit of 65536 bytes for TCP transmissions?
How do you separate two packets that are received during one NetworkClientEvent() call?
How do you concatenate packets that are split during transmission into two #PB_NetworkEvent_Data events?

Edit: What is ReceiveNetworkDataEx()?

Post by tatanas »

ReceiveNetworkDataEx() is by idle: viewtopic.php?t=86576

Normally I don't have packets larger than 65536 bytes.
STARGÅTE wrote: How do you separate two packets that are received during one NetworkClientEvent() call?
I don't understand. If two packets arrive during one call, aren't they both stored in the network buffer?
I use a prefix and a suffix to delimit each packet; you can see #PacketEND in the code.
And I would say that packet management has nothing to do with the problem. I could remove all the code after the ReceiveNetworkData() call and the problem would still be there.
mk-soft
Always Here
Posts: 6595
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Post by mk-soft »

ReceiveNetworkData only retrieves the data from the receive buffer. According to the TCP/IP protocol, the data is entered into the receive buffer in the correct order.
You must separate the data and determine its length yourself. (ISO layer model layers 5 to 7 are your responsibility.)
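
A minimal sketch of that separation step, assuming a text protocol in which every packet ends with a marker, as in the code posted above. The marker value "<END>", the map RxBuffer() and the procedure ExtractPackets() are hypothetical names chosen for illustration. The pending text is kept per client (keyed by ClientID), so a fragment from one client is never concatenated with data from another, and a marker that is split across two receives is still found.

```purebasic
EnableExplicit

#PacketEND = "<END>" ; hypothetical marker; the thread does not show the real value

Global NewMap RxBuffer.s() ; hypothetical: one pending-text buffer per client

; Appends Chunk$ to the client's buffer, moves every complete packet
; (marker included) into Packets() and keeps the incomplete tail for later.
; Returns the number of complete packets extracted by this call.
Procedure.i ExtractPackets(ClientID.i, Chunk$, List Packets.s())
  Protected Key$ = Str(ClientID)
  Protected Pending$, Position.i, Count.i

  Pending$ = RxBuffer(Key$) + Chunk$
  Position = FindString(Pending$, #PacketEND)
  While Position
    AddElement(Packets())
    Packets() = Left(Pending$, Position + Len(#PacketEND) - 1)
    Pending$  = Mid(Pending$, Position + Len(#PacketEND))
    Count + 1
    Position = FindString(Pending$, #PacketEND)
  Wend
  RxBuffer(Key$) = Pending$ ; incomplete remainder, possibly empty
  ProcedureReturn Count
EndProcedure

; Usage: two clients interleaved, one packet split across two receives
NewList Packets.s()
ExtractPackets(1, "PING<END>HEL", Packets()) ; "HEL" stays buffered for client 1
ExtractPackets(2, "DATA", Packets())         ; nothing complete yet for client 2
ExtractPackets(1, "LO<END>", Packets())      ; completes client 1's second packet
ForEach Packets()
  Debug Packets()
Next
```

In a real server the entry would be removed on disconnect, for example with DeleteMapElement(RxBuffer(), Str(ClientID)). The sketch assumes the map is only touched from the network thread; otherwise it would need a mutex.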
My Projects EventDesigner V3 / ThreadToGUI / OOP-BaseClass / Windows: Module ActiveScript
PB v3.30 / v5.75 - OS Mac Mini - VM Window Pro / Linux Ubuntu
Downloads on my OneDrive
Michael Vogel
Addict
Posts: 2860
Joined: Thu Feb 09, 2006 11:27 pm

Post by Michael Vogel »

What about using a network analyzer on the server side? It could mean a lot of traffic, but it would show whether this is a Windows problem (network card, IP stack, etc.) or a PureBasic one (including your code).

Post by STARGÅTE »

tatanas wrote: Tue Feb 03, 2026 10:30 am ReceiveNetworkDataEx() by idle : viewtopic.php?t=86576
OK, so is the returned value (-1) coming from this procedure, or from within this procedure? Then you should contact idle directly.
Thorium
Addict
Posts: 1314
Joined: Sat Aug 15, 2009 6:59 pm

Post by Thorium »

I think Windows drops the connections to enforce the limit of 20 concurrent TCP connections.
I believe only the server editions of Windows have no such limit.

Funny that you want to switch to Windows 11. What's the reason behind the switch?

I am a longtime Windows-only user and also run Windows servers, but Windows 11 made me reconsider; given how bad Windows has become, I feel pretty much forced to switch to Linux.
HeX0R
Addict
Posts: 1254
Joined: Mon Sep 20, 2004 7:12 am
Location: Hell

Post by HeX0R »

I don't think it has anything to do with that.
It's more likely a socket being in blocking mode for a while.
But idle's ReceiveNetworkDataEx() should report the reason for this through its Debug output. What happened to all the Debug commands in the NetworkErrorContinue() procedure?
Did you remove them? Or did you never run your server with the debugger on?
ReceiveNetworkDataEx() also returns more detailed reasons for a -1 result (look at *error.Integer=0).

Post by tatanas »

I ran a few tests by removing parts of the code to simplify things as much as possible.
I eventually noticed that I hadn’t added any delay after creating the “ProcessRequest()” thread. From experience, I know that creating threads too quickly can cause issues. So I added a Delay(5), and since then I haven’t had any problems with ReceiveNetworkData() during the 5 hours the server has been running.
I’ll confirm tomorrow.

Post by mk-soft »

You should collect the data in the thread from the server and then pass the data to a thread for processing.
Thus, you avoid conflicts with ReceiveNetworkData from different threads.
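
One way to sketch this hand-off, under assumptions: the network thread is the only code that calls ReceiveNetworkData(), and it passes each complete packet to a single worker thread through a mutex-protected list plus a semaphore, instead of creating one thread per packet. All names here (WorkItem, WorkQueue(), EnqueuePacket(), WorkerThread()) are hypothetical, and the empty packet used as a stop signal is only a convention of this example. Because strings are shared between threads, the program has to be compiled with the thread-safe option enabled.

```purebasic
EnableExplicit

Structure WorkItem ; hypothetical job record
  ClientID.i
  Packet.s
EndStructure

Global NewList WorkQueue.WorkItem()
Global QueueMutex.i     = CreateMutex()
Global QueueSemaphore.i = CreateSemaphore() ; counts the queued jobs
Global ProcessedCount.i

; Called from the network thread only: store the packet and wake the worker.
Procedure EnqueuePacket(ClientID.i, Packet$)
  LockMutex(QueueMutex)
  LastElement(WorkQueue()) ; append at the end to keep FIFO order
  AddElement(WorkQueue())
  WorkQueue()\ClientID = ClientID
  WorkQueue()\Packet   = Packet$
  UnlockMutex(QueueMutex)
  SignalSemaphore(QueueSemaphore)
EndProcedure

; Worker thread: never touches a socket, only the queue.
Procedure WorkerThread(*Unused)
  Protected Job.WorkItem
  Repeat
    WaitSemaphore(QueueSemaphore) ; sleeps until a job is available
    LockMutex(QueueMutex)
    FirstElement(WorkQueue())
    Job\ClientID = WorkQueue()\ClientID
    Job\Packet   = WorkQueue()\Packet
    DeleteElement(WorkQueue())
    UnlockMutex(QueueMutex)
    If Job\Packet = "" : Break : EndIf ; empty packet = stop signal (example convention)
    ; ... parse and answer Job\Packet here, outside the lock ...
    ProcessedCount + 1
  ForEver
EndProcedure

; Usage: the main code plays the role of the network thread
Define Worker.i = CreateThread(@WorkerThread(), 0)
EnqueuePacket(1, "PING<END>")
EnqueuePacket(2, "DATA<END>")
EnqueuePacket(0, "") ; ask the worker to stop
WaitThread(Worker)
Debug ProcessedCount
```

A single worker keeps packets from one client in order; several workers could wait on the same semaphore if ordering between packets does not matter.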

Post by tatanas »

Unfortunately, the bug is still present. The problem comes from somewhere else...

ReceiveNetworkData() is only used in the main thread, so I don’t have any concurrent access.
HeX0R wrote: And ReceiveNetworkDataEx() also returns more detailed reasons for a -1 result (look at *error.Integer=0)
*error returns #PB_Network_Error_Fatal, so I put a Debug in the ReceiveNetworkDataEx() procedure:

Code:

Procedure ReceiveNetworkDataEx(clientId,len,timeout=15000,mutex=0,*error.Integer=0) 
   
   Protected result,recived,recvTimeout
   
   If len > 0 
      Protected *buffer = AllocateMemory(len)
      If *buffer 
         
         recvTimeout=ElapsedMilliseconds()+timeout   
         
         Repeat
            If result > 0
               *buffer = ReAllocateMemory(*buffer, recived + len) 
            EndIf 
            If *buffer 
               If mutex 
                  Repeat 
                     If TryLockMutex(mutex)
                        Result = ReceiveNetworkData(clientId,*buffer+recived, len) 
                        If result < 0 
                           If NetworkErrorContinue(clientId,result) 
                              Delay(1)
                           Else 
                              UnlockMutex(mutex)
                              FreeMemory(*buffer)
                              If *error 
                                 *error\i = #PB_Network_Error_Fatal
                              EndIf   
                              ProcedureReturn 0
                           EndIf 
                        EndIf   
                        UnlockMutex(mutex) 
                        Break 
                     Else 
                        Delay(10)
                     EndIf   
                  Until  ElapsedMilliseconds() > recvTimeout  
               Else       
                  Result = ReceiveNetworkData(clientId,*buffer+recived, len)
                  If result < 0 
                     If NetworkErrorContinue(clientId,result) 
                        Delay(1)
                        Continue 
                     Else 
                        Debug ">>>>>>>>>>>>>>> ReceiveNetworkData returns :" + Str(Result) + "<<<<<<<<<<<<<<<<<<<<<"
                        FreeMemory(*buffer)
                        If *error 
                           *error\i = #PB_Network_Error_Fatal
                        EndIf   
                        ProcedureReturn 0
                     EndIf 
                  EndIf   
               EndIf   
               
               If result > 0 
                  recived+result  
                  recvTimeout = ElapsedMilliseconds() + timeout
               ElseIf result = 0 
                  FreeMemory(*buffer)
                  If *error 
                     *error\i = #PB_Network_Error_Dropped 
                  EndIf   
                  ProcedureReturn 0
               EndIf   
            Else 
               If *error 
                  *error\i = #PB_Network_Error_Memory 
               EndIf   
               ProcedureReturn 0
            EndIf   
            
            If ElapsedMilliseconds() > recvTimeout    
               FreeMemory(*buffer)
               If *error 
                  *error\i = #PB_Network_Error_timeout 
               EndIf   
               ProcedureReturn 0
            EndIf 
            
         Until result <> len   
         
         ProcedureReturn *buffer
         
      EndIf 
   EndIf 
   
EndProcedure   

Post by Michael Vogel »

I still recommend using a network analyzer to see what traffic reaches the server. If you cannot install an analyzer on the server (which is understandable), you may be able to configure a monitor port on the switch the server is connected to. I would apply some filtering to reduce the number of packets and the memory needed (IP addresses, frame size cut to 64 bytes, TCP protocol, TCP ports), and perhaps add the ICMP protocol to get more information under high traffic (source quench), etc.

What about simulating 100 or more clients against the server (or better, against one that is not in the production network) to see whether the problem can be reproduced easily?

Post by tatanas »

This time, I think I’ve found where the problem comes from. I’m using a heartbeat system to detect ghost connections. Clients regularly send a “PING” packet to let the server know they’re still alive. If the delay between two pings is exceeded, the server forces the client to disconnect. Here is the procedure I’m using:

Code:

Procedure CheckClientTimeOut(*val)
	Protected NewList ClientToClose.OnlineClientStruct()
	Protected NewMap Copy_OnlineClient_Map.OnlineClientStruct()
	Protected start.q, Difference.q
	
	Repeat
		
		start = ElapsedMilliseconds()
		Repeat
			LockMutex(MutexCloseServer)
			If CloseServer
				UnlockMutex(MutexCloseServer) ; do not leave both loops with the mutex still locked
				Break 2
			EndIf
			UnlockMutex(MutexCloseServer)
			Delay(100)
		Until (ElapsedMilliseconds() - start) / 1000 >= #CheckFrequency
		
		LockMutex(Mutex_OnlineClient_Map)
		CopyMap(OnlineClient_Map(), Copy_OnlineClient_Map())
		UnlockMutex(Mutex_OnlineClient_Map)	
		Debug "[CheckClientTimeOut()] - Verification"
		
		ForEach Copy_OnlineClient_Map()
			Difference = (ElapsedMilliseconds() - Copy_OnlineClient_Map()\LastPingTimeStamp) / 1000
				
			If Difference > #PingTimeOut	
				AddElement(ClientToClose())
				ClientToClose()\ConnectionID = Copy_OnlineClient_Map()\ConnectionID
				ClientToClose()\SocketHandle = Copy_OnlineClient_Map()\SocketHandle
				ClientToClose()\AdresseIp	  = Copy_OnlineClient_Map()\AdresseIp
				ClientToClose()\MutexClient  = Copy_OnlineClient_Map()\MutexClient
			EndIf
		Next

		If ListSize(ClientToClose()) >= 1
			ForEach ClientToClose()
				If closesocket_(ClientToClose()\SocketHandle) = 0 ; forced disconnect
					Debug "[CheckClientTimeOut()] - Force CloseSocket_() : " + ClientToClose()\AdresseIp
				Else
					Debug "[CheckClientTimeOut()] - Socket already closed : " + ClientToClose()\ConnectionID
				EndIf
				EndIf
				CleanConnection(ClientToClose()\ConnectionID, "CheckClientTimeOut")
			Next
		EndIf
		
		ClearList(ClientToClose())
		
	ForEver	
EndProcedure

To avoid a crash caused by an invalid “Connection” parameter in CloseNetworkConnection(), I had preferred to use the Windows API function closesocket_(), assuming CloseNetworkConnection() was just a wrapper around it. But it turns out that since I replaced closesocket_() with CloseNetworkConnection() in this procedure, I haven’t observed any more issues with ReceiveNetworkData().
I’ll keep monitoring it…

You’re right, Michael: a network analyzer on the server would be a good idea if the problem comes back. I imagine Wireshark would do the job.


EDIT : The server has been running for 24 hours without issue. Seems the problem was triggered by closesocket_()
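
A minimal sketch of the changed approach, assuming the heartbeat check keeps its own registry of clients: stale clients are collected under a mutex and then closed with CloseNetworkConnection() on their PureBasic connection ID rather than with closesocket_() on the raw handle. The names Clients(), ClientsMutex, CollectTimedOut() and CloseTimedOut() and the 30-second value are hypothetical. Why closesocket_() caused the stalls is not established in this thread; one unconfirmed reading is that closing the raw handle leaves the library's own connection record out of sync with the operating system.

```purebasic
EnableExplicit

#PingTimeOut = 30 ; seconds, hypothetical value

Structure ClientInfo ; hypothetical, reduced to what the check needs
  ConnectionID.i
  LastPingTimeStamp.q
EndStructure

Global NewMap Clients.ClientInfo() ; key = Str(ConnectionID)
Global ClientsMutex.i = CreateMutex()

; Fills Stale() with the connection IDs whose last ping is older than
; #PingTimeOut seconds at time Now (in milliseconds). Returns how many.
Procedure.i CollectTimedOut(Now.q, List Stale.i())
  ClearList(Stale())
  LockMutex(ClientsMutex)
  ForEach Clients()
    If Now - Clients()\LastPingTimeStamp > #PingTimeOut * 1000
      AddElement(Stale())
      Stale() = Clients()\ConnectionID
    EndIf
  Next
  UnlockMutex(ClientsMutex)
  ProcedureReturn ListSize(Stale())
EndProcedure

; Closes each stale client through PureBasic's own function. A client is
; closed only if it is still registered, which guards against passing an
; ID that another code path has already cleaned up.
Procedure CloseTimedOut(List Stale.i())
  Protected StillRegistered.i
  ForEach Stale()
    LockMutex(ClientsMutex)
    StillRegistered = Bool(FindMapElement(Clients(), Str(Stale())) <> 0)
    If StillRegistered
      DeleteMapElement(Clients())
    EndIf
    UnlockMutex(ClientsMutex)
    If StillRegistered
      CloseNetworkConnection(Stale())
    EndIf
  Next
EndProcedure
```

The heartbeat thread would call CollectTimedOut(ElapsedMilliseconds(), Stale()) followed by CloseTimedOut(Stale()). The registry check only helps if the #PB_NetworkEvent_Disconnect handler removes its client from Clients() under the same mutex.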