
Understanding Network Latency (and why it matters)

Jun 16, 2021 | Greg Butler

Audience & Level: Technical & Intermediate

Intro
The cost of Internet network latency may not be apparent until an app goes live (or is updated) and users farther from the deployment incur response-time penalties that closer users do not.

What is “Network Latency”?
Electricity, light, and radio waves move about as fast as anything on the planet can travel. But life isn’t always just “Layer 1”; in fact, it seldom is.

For example, when a browser sends or receives data, the bits in flight travel at (near) light speed. However, interim “pitstops” such as routing, filtering/inspection, and other processing delay the actual drive time. Of more recent relevance, VPNs have become common to meet the demands of flex/hybrid work and online privacy, and their impact on latency is noteworthy.

In addition, as the distance between points A and B increases, so does the number of pitstops, which is why users 50 miles from the app enjoy better response times than those 3,000 miles away. Latency also affects service-to-service communication, such as invoking a third-party API, and certainly data-store synchronization, where single-digit milliseconds matter.

Network Bandwidth is Different
If network latency is analogous to the number of stops a driver must make, bandwidth is analogous to the number of lanes on a road. When the road reaches capacity, traffic slows, but due to “saturation”, not latency (red lights and stop signs are analogous to latency). Both latency and bandwidth impact travel time, but differently.

Measuring Network Latency
Network latency is measured as the time a request and its reply take to complete the round trip, known as “round trip time” (RTT). Granted, some services are mostly “one way”, such as media streams, and if occasional packet loss can be tolerated, UDP may be used in lieu of TCP; HTTP services, however, are TCP-based and bidirectional.

A typical RTT for a small request (e.g., a ping) over a link with 100ms of network latency looks conceptually like this:

  • 00.000: ping request sent from client
  • 00.050: ping request received by server
  • 00.050: ping reply sent from server
  • 00.100: ping reply received by client

50ms in each direction totals 100ms RTT (side note: the same test over a VPN whose endpoint is ~75 miles away adds another ~30ms to the RTT).
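
To make the timeline concrete, here is a minimal sketch of measuring RTT from Python. The host and port are placeholder assumptions, and because a real ping uses ICMP (which requires raw-socket privileges), the sketch times a plain TCP connect instead, which is itself one round trip:

  # A minimal sketch of measuring RTT from Python. Assumptions: "example.com"
  # and port 443 are placeholder targets. A real ping uses ICMP, which needs
  # raw-socket privileges, so timing a TCP connect (itself one round trip:
  # SYN out, SYN/ACK back) serves as a stand-in here.
  import socket
  import time

  HOST, PORT = "example.com", 443

  for attempt in range(3):
      start = time.perf_counter()
      with socket.create_connection((HOST, PORT), timeout=5):
          rtt_ms = (time.perf_counter() - start) * 1000
      # Note: the first attempt also pays for DNS resolution.
      print(f"attempt {attempt + 1}: ~{rtt_ms:.1f} ms")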

The cost of network latency can accumulate quickly, perhaps adding perceptible response-time degradation to a given action. But with the requirements of scaled/distributed systems, and knowing physics always wins, architects have made strides in designing for latency. Examples include:

  • Connection management/pooling.
    Opening a connection necessarily results in a round trip before any actual HTTP requests are sent (not to mention several more trips to establish TLS). Therefore, fewer new connections mean fewer round trips; see the first sketch after this list, which also illustrates the keep-alive point below.

  • Related, tuning keep-alives (KAs).
    The extent to which connections should remain open depends on context, but finding the “when to close” sweet spot is not terribly complicated once an application and its usage are profiled.

  • Reducing round trips in general
    The fewer round trips for a given action, the better, especially for serial requests; see the batching sketch after this list. (Developers may also want to review the network statistics in web debuggers and browser tools in detail; we’ll reserve the heavyweight champ of analysis, Wireshark, for later…)

  • Payload
    While a single 1MB response incurs a much smaller latency penalty than 100 separate 10K responses, paying attention to payload sizes is still important: a connection stays busy longer for a larger payload than a smaller one, which may force other workloads onto new connections. Compression, JSON’s compactness, and “lazy requests” such as just-in-time paging/scrolling certainly help; see the compression sketch after this list.

  • XHR/”background requests”
    Network latency isn’t “reduced” but at least we’re not making the user wait (as long). Nevertheless, architects & developers should remain mindful of connections and payloads.

  • Proximity deployment
    The cloud makes it possible to deploy virtually anywhere and therefore to address latency’s impact, whether for end users, API integration, or data replication.

  • Proximity routing.
    Related, and perhaps the best illustration of designing for network latency: maintaining geographically distributed deployments and routing users (or workloads) to the nearest one reduces latency (CDNs have been doing this for a while).
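
To illustrate the first two items (connection management and keep-alives), below is a rough sketch using only Python's standard library. example.com is a placeholder host, the server is assumed to honor HTTP keep-alive, and the absolute numbers will vary with your own network latency:

  # A sketch comparing per-request connections with a single reused
  # keep-alive connection. "example.com" is a placeholder host, assumed to
  # honor HTTP keep-alive; each new HTTPS connection pays for a TCP
  # handshake plus several TLS round trips.
  import http.client
  import time

  HOST = "example.com"
  REQUESTS = 5

  def timed(label, fn):
      start = time.perf_counter()
      fn()
      print(f"{label}: {(time.perf_counter() - start) * 1000:.0f} ms")

  def new_connection_each_time():
      for _ in range(REQUESTS):
          conn = http.client.HTTPSConnection(HOST, timeout=10)
          conn.request("HEAD", "/")
          conn.getresponse().read()
          conn.close()

  def reused_keepalive_connection():
      conn = http.client.HTTPSConnection(HOST, timeout=10)
      for _ in range(REQUESTS):
          conn.request("HEAD", "/")
          conn.getresponse().read()
      conn.close()

  timed(f"{REQUESTS} requests, new connection per request", new_connection_each_time)
  timed(f"{REQUESTS} requests, one keep-alive connection", reused_keepalive_connection)

In practice, an HTTP client library’s session or connection-pool object does this reuse for you; the point is simply that every avoided handshake is one or more round trips saved.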
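
The batching idea behind “reducing round trips” looks roughly like the next sketch. The endpoints are entirely hypothetical (api.example.com and its ids parameter are invented), so the two functions are shown only for their shape and are not called:

  # Hypothetical sketch of trading N serial round trips for one batched
  # request. The URLs and the "ids" query parameter are made-up placeholders;
  # the functions exist only to show the two shapes of the interaction.
  import urllib.request

  ITEM_IDS = [1, 2, 3, 4, 5]

  def fetch_serially():
      # One request per item: N round trips, paid one after another.
      return [
          urllib.request.urlopen(f"https://api.example.com/items/{item_id}").read()
          for item_id in ITEM_IDS
      ]

  def fetch_batched():
      # One request for all items: a single round trip for the same data.
      ids = ",".join(str(item_id) for item_id in ITEM_IDS)
      return urllib.request.urlopen(f"https://api.example.com/items?ids={ids}").read()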
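
And for the payload item, a quick sketch of how much compression can shave off a JSON body; the sample records are invented, and real savings depend on the data:

  # A sketch of how compression shrinks a JSON payload, and therefore how
  # long a connection stays busy transferring it. The records are invented.
  import gzip
  import json

  records = [{"id": i, "name": f"user-{i}", "active": i % 2 == 0} for i in range(1000)]
  raw = json.dumps(records).encode("utf-8")
  compressed = gzip.compress(raw)

  print(f"uncompressed JSON: {len(raw):,} bytes")
  print(f"gzip-compressed:   {len(compressed):,} bytes "
        f"({100 * len(compressed) / len(raw):.0f}% of the original)")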

Test
Of course, what’s practical and attainable varies by organization and app specifics, but testing under various latency delays may reveal perceptible response-time differences as RTT increases. While browser dev tools may be capable of adding latency, it is often more practical to emulate latency at the server, especially for multi-user/virtual-user performance testing. Browser tools also aren’t applicable for testing non-interactive functionality (e.g., an external API or data replication). Fortunately, server latency emulation is simple; Dial in Latency with netem explains how.
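
(As a rough, application-level illustration of the idea before reaching for netem, the sketch below injects an artificial delay into a throwaway local HTTP server and times a request against it; the port and the 100ms figure are arbitrary assumptions.)

  # A rough sketch of emulating latency at the server by injecting an
  # artificial delay. The port and the 100ms delay are arbitrary; tools
  # like netem emulate latency at the network layer instead of in code.
  import threading
  import time
  import urllib.request
  from http.server import BaseHTTPRequestHandler, HTTPServer

  EMULATED_DELAY_S = 0.100  # pretend the network adds ~100ms

  class DelayedHandler(BaseHTTPRequestHandler):
      def do_GET(self):
          time.sleep(EMULATED_DELAY_S)        # the emulated latency
          body = b"ok"
          self.send_response(200)
          self.send_header("Content-Length", str(len(body)))
          self.end_headers()
          self.wfile.write(body)

      def log_message(self, *args):           # keep the test output quiet
          pass

  server = HTTPServer(("127.0.0.1", 8080), DelayedHandler)
  threading.Thread(target=server.serve_forever, daemon=True).start()

  start = time.perf_counter()
  urllib.request.urlopen("http://127.0.0.1:8080/").read()
  print(f"response time with emulated delay: {(time.perf_counter() - start) * 1000:.0f} ms")

  server.shutdown()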