An early draft of this blog post was inadvertently posted to Reddit; it was not appropriate for that site. However, since this occurred, we have taken the opportunity to revise the post for a wider audience and to clarify misunderstandings and correct errors from that draft version.
Joost van Dongen of Ronimo-Games blogged about how his team uses throttling to reduce network errors for the Awesomenauts online multiplayer game. See http://joostdevblog.blogspot.nl/2014/07/using-throttling-to-reduce-network.html
First, InterWorking Labs has a team of network protocol engineers and developers. We are not game developers! We would not presume to tell anyone how to design a game. It seems that we may have a “clash of contexts”, in that game developers and network protocol engineers start with different assumptions and contexts.
Imagine if you were to meet someone who told you that after observing stars at night and the seasons, he had decided that the sun, not the earth, was the center of the universe, and he was working hard on a mathematical proof to demonstrate this idea. You might then point out to him that this was not a new idea; in fact, he might want to take a look at the work of N. Copernicus. By reviewing Copernicus’ work, he could gain new insights, cover new ground, and re-use and re-purpose previously published work. While the mathematical proof for the sun at the center of the universe would be an impressive accomplishment, moving science forward to the next step and covering new ground would be better.
Our approach to reading Joost’s blog was to simply point out that many of the techniques described in the post have been earnestly debated, tried, tested, and vetted in organizations like the IETF in the 1980s and 1990s.
Why is that important? Most intractable network problems are found in the design phase (ref. Capers Jones). Thus, an idea for a new network protocol that solves a specific problem faced by many other engineers can be put forward in a structured way in a protocol design document. The IETF is a standards body that provides the structure and mechanics for this kind of activity. The new idea can be critiqued, implemented, tested and refined over a period of time until it is ready for widespread adoption. This assures that the design is as well-thought-out as possible, reducing errors or problems in implementation and interoperability. There are about 700 network protocols in the IETF (not all have been officially blessed as fully approved, yet they still enjoy widespread implementation). Have the game developers reviewed them all to find if any are appropriate to their needs?
Why is it important to use a protocol or implementation originating in a standards body? To enjoy the benefits of “network effects” – literally and figuratively. If many people are using the same standard, then you get more bugs reported and fixed (on-going software maintenance), more how-to books, more training classes, active forums and message boards to share and solve problems, and in general more momentum for updates and improvements. These are many of the same benefits accrued from using well-designed open source software.
De facto standard
For example, there seems to be a de facto standard from Jenkins Software called RakNet – an open source, C++ networking engine for game developers. There’s both an open source and a commercial version. A de facto standard may or may not have the thorough vetting and community review that a de jure standard has. Still, it would be a good starting place.
The alternative – to do your own design and write your own code – is typically more risky and more expensive. But if there is absolutely no existing, vetted protocol that meets your needs, then of course you must create your own.
Scoping the Problem
The question should also be asked: if there is a problem, what is its scope, how much of it must be solved, and how does one determine the best solution?
The problems for a network game developer are primarily network drop, latency, jitter, and finite bandwidth. The developer needs to know what values users are likely to experience, the “envelope” of limits these values can take that still make the game playable, and a way to replicate the limits to compare proposed solutions.
Characterizing Network Performance
To understand what users are likely to experience, we’ve come across several “network characterization” studies, a few of which seem worth bringing to the attention of network game developers.
“Characterizing Residential Broadband Networks” – Dischinger et al. http://broadband.mpi-sws.org/residential/07_imc_bb.pdf
The above contains useful statistics on residential networks in the U.S. It is interesting to note that packet loss was measured at below 1% for more than 95% of the DSL and cable paths observed in that study.
Characterization of Wireless LANs is also important since so many mobile devices use such network links. Here is one typical study (it is interesting to note that packet loss on Wireless LANs is generally also less than 1%):
“TCP and UDP Performance over a Wireless LAN” – G. Xylomenos and G. Polyzos.
Interested to know whether you can run your game on a 3G or 4G mobile device? Well, the study cited below found packet loss on 3G and 4G networks to average (again!) under 1%. But if you attempt to use UDP to control re-transmission for games on such devices, you may find yourself thwarted by buffered re-transmissions at the physical/MAC layer that you have no control over. As a result, jitter is quite large and throughput varies widely over time.
“Characterizing 4G and 3G Networks: Supporting Mobility with Multi-Path TCP” – Chen et al.
Once you have selected network characterization numbers you hope to achieve, the scientific method can be employed by plugging those numbers into a good network emulator and then running tests to compare protocols when they encounter the same network impairments. You would no longer have to rely on the opinions of “some guys on the Internet.”
You’d also need to use a network emulator to measure the actual impairment limits at which you have to declare to a user “Game over, man!” Or more accurately, that their connectivity doesn’t meet the minimums you’ve established for them to play your game with any satisfaction.
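As a rough illustration of what "plugging numbers into an emulator" means, here is a toy Python sketch that applies the same loss, delay, and jitter values to a stream of packets in a repeatable way. It is not a substitute for a real network emulator, which impairs actual traffic on the wire. The names Impairment, emulate, and summarize, and all of the example values, are our own assumptions for illustration and do not come from any of the studies cited above.

```python
import random
from dataclasses import dataclass


@dataclass
class Impairment:
    """Hypothetical impairment profile; the values used are illustrative only."""
    loss_rate: float      # fraction of packets dropped, 0.0 to 1.0
    base_delay_ms: float  # fixed one-way latency
    jitter_ms: float      # maximum extra random delay added per packet


def emulate(send_times_ms, impairment, seed=0):
    """Return (send_time, arrival_time) pairs for the packets that survive.

    A fixed seed makes the run repeatable, so two candidate protocols
    can be compared against exactly the same pattern of drops and delays.
    """
    rng = random.Random(seed)
    delivered = []
    for sent_at in send_times_ms:
        if rng.random() < impairment.loss_rate:
            continue  # packet dropped
        delay = impairment.base_delay_ms + rng.uniform(0.0, impairment.jitter_ms)
        delivered.append((sent_at, sent_at + delay))
    return delivered


def summarize(send_times_ms, delivered):
    """Report the observed loss fraction and the delay range."""
    delays = [arrived - sent for sent, arrived in delivered]
    return {
        "loss": 1.0 - len(delivered) / len(send_times_ms),
        "min_delay_ms": min(delays) if delays else None,
        "max_delay_ms": max(delays) if delays else None,
    }


if __name__ == "__main__":
    sends = [i * 10.0 for i in range(1000)]    # one packet every 10 ms
    profile = Impairment(loss_rate=0.01, base_delay_ms=50.0, jitter_ms=20.0)
    print(summarize(sends, emulate(sends, profile)))
```

Sweeping loss_rate, base_delay_ms, and jitter_ms upward until the game stops being playable is one way to locate the edges of the envelope discussed earlier.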
Some areas of interest from Joost’s blog post are described below.
Throttling, rate limitation, bandwidth limitation
“The basic idea of throttling is that if you detect that an internet connection cannot handle as much as you are sending, then you start sending less. This can happen on the side of the game, or on the side of the connection itself (by the modem or router, for example). If we keep sending more than the connection can handle, then either we lose a lot of packets or, even worse, the internet connection is lost altogether, causing a network error. Throttling is intended to keep this from happening.”
“Now that we know how to throttle we get to the much more important question: when to throttle. How can we know whether we need to throttle? It is not possible to just ask an internet connection how much bandwidth it can handle. Nor do you get notifications when the connection starts dropping packets because it is too much. We therefore can never know for sure whether throttling is needed and have to deduce this somehow.” (see footnote)
If your game application can use TCP as its underlying protocol, each side of the TCP connection will automatically detect a congested network situation and will start backing off. Each side will exercise its congestion avoidance algorithms. See RFC 5681.
If you use TCP, you do not need to throttle and you do not need notifications; the underlying network protocol will do all of this for you.
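To give a feel for what the congestion avoidance algorithms in RFC 5681 do on your behalf, here is a heavily simplified Python sketch of the window update: grow quickly while below a threshold (slow start), grow by one segment per round trip above it (congestion avoidance), and cut back when loss is detected. RFC 5681 works in bytes and per acknowledgment and distinguishes timeouts from duplicate ACKs. This sketch counts whole segments per round trip and treats all loss alike, and the function name next_cwnd is our own.

```python
def next_cwnd(cwnd, ssthresh, loss_detected):
    """One round trip of a simplified TCP-style congestion window update.

    cwnd and ssthresh are measured in segments. Returns (cwnd, ssthresh).
    This is a teaching sketch, not a faithful RFC 5681 implementation.
    """
    if loss_detected:
        # Multiplicative decrease: halve the window, but keep at least 2 segments.
        ssthresh = max(cwnd / 2.0, 2.0)
        return ssthresh, ssthresh
    if cwnd < ssthresh:
        # Slow start: the window doubles every round trip, capped at ssthresh.
        return min(cwnd * 2.0, ssthresh), ssthresh
    # Congestion avoidance: additive increase of one segment per round trip.
    return cwnd + 1.0, ssthresh


if __name__ == "__main__":
    cwnd, ssthresh = 1.0, 16.0
    for rtt in range(12):
        loss = (rtt == 8)  # pretend a loss is detected in round trip 8
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, loss)
        print(rtt, cwnd, ssthresh)
```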
If you must use UDP but need to implement congestion avoidance, or congestion control, then consider using Datagram Congestion Control Protocol (DCCP) which is described in RFC 4340. Other alternatives include:
- UDT – UDP-based Data Transfer Protocol described at http://tools.ietf.org/html/draft-gg-udt-03
- Reliable UDP described at http://www.ietf.org/proceedings/44/I-D/draft-ietf-sigtran-reliable-udp-00.txt
- Stream Control Transmission Protocol (SCTP) described at https://tools.ietf.org/html/rfc4960
- Realtime Control Protocol (RTCP) and Realtime Protocol (RTP) described at https://www.ietf.org/rfc/rfc3550.txt
We present this list of protocols as suggestions to consider; the protocols may not be appropriate for a particular platform, application or gaming environment. However, there are many ideas embodied in these protocols worthy of consideration.
Many engineers responded to our draft post to tell us that it was ridiculous to use TCP for any type of online, multiuser, fast response time (RTT 30 ms) game. While online, multiuser video conferences are not online games, they do share similar requirements. Note that Webex uses TCP if it cannot get through using UDP. It appears, but we are not entirely certain, that gotomeeting.com uses TCP entirely for its real time communications. Likewise, if your game play mode is similar to that of World of Warcraft, you may also be able to use TCP.
“Ping” is a utility that tests reachability. It is not a measurement; there is no “standard ping”.
“Our initial approach to this was to throttle if the ping was too high. The idea is that if a connection cannot handle the packets it needs to send, then latency will increase and we can detect this. This works fine for connections that normally have low ping: if the standard ping is 50ms and suddenly it rises to 300ms, then it is extremely likely that we are sending too much and need to throttle to keep the connection from being lost altogether.”
Detection of congestion by observing ping times generally does not work because of what may be incorrect assumptions about how networks react to congestion. Pinging for both the network engineer and the network game developer means sending back a uniquely tagged response message to a uniquely tagged request message so that the round-trip-time (RTT) can be estimated, along with detection of possible packet loss and packet reordering.
Unless you are flooding a network with such echo requests and responses, the RTT should not change much even when congestion is large. The reader may wish to perform controlled experiments with network emulators or background traffic to see what really happens. Packet loss due to buffer overflows (tail drops or random early detection drops), rather than delay, should increase when congestion increases. TCP and DCCP use mechanisms that ramp up the rate at which they transmit (exponentially, at first) and quickly slow it down (i.e. “back off”) when they detect congestion through lost acknowledgments.
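The tagged echo exchange described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping: the class name EchoTracker, its methods, and the use of integer sequence numbers as tags are our own assumptions, and times are passed in explicitly (in milliseconds) so the logic can be exercised without a network.

```python
class EchoTracker:
    """Tracks tagged echo requests to estimate RTT, loss, and reordering."""

    def __init__(self):
        self.pending = {}       # tag -> time the request was sent (ms)
        self.highest_seen = -1  # highest tag answered so far
        self.rtts = []          # RTT samples in ms, in order of arrival
        self.reordered = 0      # replies that arrived after a later tag

    def on_send(self, tag, now_ms):
        self.pending[tag] = now_ms

    def on_reply(self, tag, now_ms):
        """Record a reply; return its RTT, or None if the tag is unknown or a duplicate."""
        sent_at = self.pending.pop(tag, None)
        if sent_at is None:
            return None
        if tag < self.highest_seen:
            self.reordered += 1
        self.highest_seen = max(self.highest_seen, tag)
        rtt = now_ms - sent_at
        self.rtts.append(rtt)
        return rtt

    def lost(self, now_ms, timeout_ms):
        """Tags still unanswered after timeout_ms are presumed lost."""
        return sorted(tag for tag, sent_at in self.pending.items()
                      if now_ms - sent_at > timeout_ms)


if __name__ == "__main__":
    tracker = EchoTracker()
    for tag, sent_at in [(0, 0), (1, 100), (2, 200), (3, 300)]:
        tracker.on_send(tag, sent_at)
    tracker.on_reply(0, 50)
    tracker.on_reply(2, 260)
    tracker.on_reply(1, 300)   # arrives after tag 2: counted as reordered
    print(tracker.rtts, tracker.reordered, tracker.lost(1500, 1000))
```

With samples like these in hand, one can check under controlled conditions whether RTT or loss is the better signal for a given link.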
You can learn the distribution of bandwidths observed for DSL, ADSL, and cable connection types (the most common connections used in residential broadband) by reviewing the ITU G.1050 report. In this report, you can learn the typical performance of each connection type.
Reliability, acknowledgments, packet loss
“Awesomenauts uses UDP packets and we have our own manual reliability system, since various parts of the game require various degrees of reliability. This means that we send and receive our own acknowledgements and thus know exactly how many packets are lost.”
Many existing protocols offer various forms of reliability, acknowledgments, and packet loss detection. Is there something unique in this application that is not addressed by existing protocols?
Re-inventing congestion avoidance
“Ronimo programmer Maarten came up with a nice approach to solve this problem. If a player has high packet loss, then the game enables throttling and starts sending less. Then it measures the packet loss again. If packet loss decreased significantly, then we keep throttling. If packet loss remains roughly the same, then we stop throttling and start sending at maximum sending rate again.”
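For readers who want to experiment with the scheme quoted above, here is one possible reading of it as a small Python state machine. We have not seen the Awesomenauts code, so this is our interpretation only: the class name LossThrottle, the 10% "high loss" threshold, and the rule that loss must fall by at least half to count as "decreased significantly" are all hypothetical values chosen for illustration.

```python
def loss_fraction(sent, acked):
    """Fraction of packets that were never acknowledged."""
    return (sent - acked) / sent if sent else 0.0


class LossThrottle:
    """Decides whether to throttle, based on successive packet loss measurements."""

    def __init__(self, high_loss=0.10, min_improvement=0.5):
        self.high_loss = high_loss              # hypothetical threshold
        self.min_improvement = min_improvement  # hypothetical: loss must drop by 50%
        self.throttling = False
        self.baseline_loss = None               # loss measured just before throttling began

    def update(self, measured_loss):
        """Feed in one loss measurement; returns True if we should throttle."""
        if not self.throttling:
            if measured_loss >= self.high_loss:
                # High loss: start sending less and remember how bad it was.
                self.throttling = True
                self.baseline_loss = measured_loss
        else:
            target = self.baseline_loss * (1.0 - self.min_improvement)
            if measured_loss > target:
                # Throttling did not help much: return to the maximum sending rate.
                self.throttling = False
                self.baseline_loss = None
        return self.throttling


if __name__ == "__main__":
    throttle = LossThrottle()
    for sent, acked in [(100, 98), (100, 80), (100, 95), (100, 81)]:
        print(throttle.update(loss_fraction(sent, acked)))
```

A sketch like this can be run under identical emulated impairments alongside a standard congestion control algorithm to compare how each behaves.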
Maarten appears to have re-invented existing protocols, such as TCP Congestion Control and Avoidance Algorithms. See RFC 5681. Besides TCP, there are several other protocols that perform congestion detection or congestion avoidance.
Part of the problem is “bufferbloat”, which Awesomenauts players on residential broadband networks may encounter. Excess buffering of packets can cause high latency, packet delay variation (also known as jitter), and, ironically, can reduce overall network throughput – the opposite of the intent of those increasing buffer sizes in routers.
When a router device is configured to use excessively large buffers, even very high-speed networks can become practically unusable for many interactive applications like voice, chat, and gaming.
The game developers have not mentioned “bufferbloat”, probably because they do not know about it and do not realize its impact. Unfortunately, this is an industry problem that is slowly being addressed.
With online gamers a fast-growing market segment (New York Times, 31 Aug 2014), there’s a definite possibility of a “tragedy of the commons.” The Internet is a shared resource. What will be the effect of multiple home-brewed protocols on other traffic? Could there be unintended consequences that are as yet unknown? One of the values of the standards-setting process is to thoroughly vet these types of issues to keep the Internet a shared resource for all.
Footnote: Joost’s Dev Blog