Since I’m going to be writing a whole bunch of articles about CDN speeds, I wanna consolidate my testing methodology into a single article. I’m going to be linking back to this article a lot.
Questions to be answered in this article:
- What numbers are measured for CDN download speed? (it is not just one number)
- Why were those numbers selected?
- How are the numbers measured?
- Where is the measurement done? Cloud server or home computer?
Contents:
- 1 The three important measurements to make
- 2 How to measure the 3 measurements: my open sourced Python script (check github)
- 3 Why I chose not to test on a normal web server like DigitalOcean
- 4 Why the CDN speed tests need to be done almost at the same time
- 5 Summary (TL;DR)
- 6 Related Essays
The three important measurements to make
At the end of the day, a CDN is really just a caching layer. A massively distributed caching layer. That’s it. That’s all it is.
And if you’re familiar at all with how caching layers work, there are 3 all-important metrics to measure:
- Cold cache latency — item is not in cache at all
- Hot cache latency — item is in fastest response layer of cache
- Warm cache latency — item is at a slower response layer of cache.
To illustrate this in hardware terms: a server responding from RAM gives the fastest possible latency. That’s hot cache. Getting the item from disk (SSD) is the second fastest. That’s warm cache, and you can add more layers (HDD). Finally, cold cache is when the server doesn’t have the item at all.
With regards to a CDN, how do we measure hot, cold, and warm cache?
Cold cache latency: simple. You just fetch a file that the CDN has never seen before. This is going to take the longest time. On CloudFront, for example, I’m seeing a drop from 1.2 seconds in the cold cache to 0.232 seconds in the hot cache.
Hot cache latency: also simple. You fetch the same file 10 times in a row and take the average.
Warm cache latency: trickier. Much trickier. We must wait some amount of time for the file to leave the hot cache. For now I am sticking with 30 minutes, but I may adjust this as I learn how CDNs move items from hot cache to warm cache. The warm cache result is also much more ambiguous, because it has to be measured at a different time than the cold and hot cache tests.
You can see that the cold cache is much slower than the hot cache by running this curl command more than once: `curl -w "@curl-format.txt" -o tmp -s "https://d3va53q3li7xt1.cloudfront.net/wp-content/uploads/2021/05/shoeb-1024x576.png"`. This example uses CloudFront. There is a massive drop from 1+ seconds on the first (cold) request to 0.1-0.2 seconds on the following (hot) requests.
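The three measurements described above can be sketched as one small routine. This is a minimal sketch under stated assumptions, not the script linked in the next section: `measure_cache_tiers` and its parameters are hypothetical names, and `fetch` stands for any function that downloads a URL in full and returns the elapsed seconds.

```python
import statistics
import time


def measure_cache_tiers(fetch, url, hot_runs=10, warm_wait_s=30 * 60, sleep=time.sleep):
    """Return cold, hot, and warm latency in seconds for one URL.

    fetch(url) must download the whole file and return the elapsed seconds.
    The URL has to be one the CDN has not served before, otherwise the first
    request is not a true cold cache request.
    """
    # Cold: the very first request, the CDN has to go back to the origin.
    cold = fetch(url)
    # Hot: fetch the same file several times in a row and average.
    hot = statistics.mean([fetch(url) for _ in range(hot_runs)])
    # Warm: wait, hoping the file leaves the fastest cache layer, then fetch once.
    sleep(warm_wait_s)
    warm = fetch(url)
    return {"cold": cold, "hot": hot, "warm": warm}
```

Passing `sleep` in as a parameter is only a convenience of this sketch: it lets you try the flow with a short wait before committing to the full 30 minutes.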
How to measure the 3 measurements: my open sourced Python script (check github)
I used a Python script to automate the measurements. This keeps the measurements consistent between CDNs and removes human error.
It is open sourced here: https://github.com/speedtestdemon/speed-tests/blob/master/test.py
There is a massive drop in CDN download time from cold cache to hot cache. CloudFront gave me 1.09 seconds for cold cache, then 0.112 seconds for hot cache, then 0.226 seconds for warm cache.
Here is the full Python output. The cool thing is that it prints out the headers it got for the cold cache and for the warm cache. The headers are a sanity check that the cold cache and warm cache tests are really testing what they claim to test.
    -------------------------------------------------------------
    Testing "Cold cache speed"
    -------------------------------------------------------------
    Got headers:
    HTTP/2 200
    content-type: image/png
    content-length: 719983
    date: Sun, 20 Jun 2021 21:47:06 GMT
    last-modified: Mon, 07 Jun 2021 00:16:21 GMT
    etag: "52ae2ff2354d4a68e680b77b4da58985"
    accept-ranges: bytes
    server: AmazonS3
    x-cache: Miss from cloudfront
    via: 1.1 c39432c353feb02b03735f3850e19107.cloudfront.net (CloudFront)
    x-amz-cf-pop: IAH50-C1
    x-amz-cf-id: NgkCqcrwb3K65LeGu7uhebFNODrNI9s8wVeHZ93lq2XKrE3q9PMm-A==
    time_namelookup: 0.06327399999999999691
    time_connect: 0.01762099999999999778
    time_appconnect: 0.05910300000000001663
    time_pretransfer: 0.00010199999999999099
    time_redirect: 0.00000000000000000000
    time_starttransfer: 0.95064099999999995827
    time to download: 0.00008300000000005525
    time_total: 1.09082400000000001583
    -------------------------------------------------------------
    Testing "Hot cache speed"
    -------------------------------------------------------------
    10 requests done. Average:
    time_namelookup: 0.00190460000000000000
    time_connect: 0.02373660000000000006
    time_appconnect: 0.06306589999999999419
    time_pretransfer: 0.00019590000000000024
    time_redirect: 0.00000000000000000000
    time_starttransfer: 0.02317680000000000434
    time to download: 0.00007860000000000089
    time_total: 0.11215840000000001919
    -------------------------------------------------------------
    Testing "Warm cache speed"
    -------------------------------------------------------------
    Sleeping for 0.5 hr to move cache from hot to warm
    Got headers:
    HTTP/2 200
    content-type: image/png
    content-length: 719983
    last-modified: Mon, 07 Jun 2021 00:16:21 GMT
    accept-ranges: bytes
    server: AmazonS3
    date: Mon, 21 Jun 2021 00:09:46 GMT
    etag: "52ae2ff2354d4a68e680b77b4da58985"
    x-cache: Hit from cloudfront
    via: 1.1 9b59bfec44582f64d3d8dac9fb7d27b7.cloudfront.net (CloudFront)
    x-amz-cf-pop: DFW50-C1
    x-amz-cf-id: hzwVoHfaHen2TR3cCNRsnwniXMc3_BaOWk7oa2DiQaWkioXqwSGRrg==
    time_namelookup: 0.11359600000000000253
    time_connect: 0.01640300000000000091
    time_appconnect: 0.07292699999999999183
    time_pretransfer: 0.00030900000000000372
    time_redirect: 0.00000000000000000000
    time_starttransfer: 0.02220300000000000051
    time to download: 0.00011099999999999999
    time_total: 0.22554899999999999949
Please note that metrics like `time_namelookup` in this output do not have the same meaning as curl’s `time_namelookup`. curl reports cumulative times, so each value is always larger than the one before it. However, I want to look at how long each stage took on its own, so curl’s cumulative timestamps were not helpful to me.
If you’re curious about what each of those metrics means, such as `time_appconnect` (which is a complete misnomer and doesn’t make sense), read my cheat sheet on curl performance metrics, which explains each curl metric.
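The conversion from curl's cumulative timers to per-stage durations is just a chain of subtractions. The sketch below is one way to do it and is not copied from the script: `per_stage_times` and `CUMULATIVE_KEYS` are hypothetical names. It leaves out `time_redirect`, which curl reports as a duration of its own rather than as a point on the same timeline.

```python
# curl's cumulative timers, in the order the stages happen.
CUMULATIVE_KEYS = [
    "time_namelookup",
    "time_connect",
    "time_appconnect",
    "time_pretransfer",
    "time_starttransfer",
    "time_total",
]


def per_stage_times(cumulative):
    """Convert curl's cumulative timestamps (seconds) into per-stage durations."""
    stages = {}
    previous = 0.0
    for key in CUMULATIVE_KEYS:
        value = cumulative[key]
        if value == 0:
            # curl reports 0 for a stage that never happened,
            # for example time_appconnect on a plain HTTP request.
            stages[key] = 0.0
            continue
        stages[key] = value - previous
        previous = value
    # The last difference is the body download: time_total - time_starttransfer.
    stages["time to download"] = stages.pop("time_total")
    # Keep the overall total as-is so it can be compared with the sum of stages.
    stages["time_total"] = cumulative["time_total"]
    return stages
```

A quick sanity check on the result: all the per-stage values except `time_total` should add up to `time_total`.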
Common pitfalls of speed testing CDNs
These are some mistakes I made while speed testing CDNs. They are easy mistakes to make, so I decided to write about them.
Mistake #1: cold cache tests should be done where the CDN does not have the file cached.
Why it’s an easy mistake to make: most people think files are cached for only 24 hours. That’s not necessarily true. Some CDNs like Jetpack CDN evidently cache a file for more than 1 day (based on my tests). Additionally, most people’s curl calls do not print the response headers, which is where the warning sign that the file could be cached shows up.
How to fix: check the headers returned. You should see the keyword “MISS” somewhere in there (the exact casing varies; CloudFront prints “Miss from cloudfront”). If you see a “HIT”, that’s a warning sign. This is also why the Python script prints out the headers for the cold cache test and the warm cache test.
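The header check can also be automated. The sketch below is a hypothetical helper (`cache_status` is not part of the script) and it assumes the CDN reports its cache status in one of two headers: `x-cache`, which is what CloudFront sends in the output above, or `cf-cache-status`, which Cloudflare uses. Other CDNs may use other header names, so treat the list as a starting point.

```python
def cache_status(headers):
    """Classify a response as 'hit', 'miss', or 'unknown' from its headers.

    headers: dict mapping header name to value.
    """
    # Assumption: the cache status lives in one of these headers.
    candidates = ("x-cache", "cf-cache-status")
    # Header names are case-insensitive, and the casing of HIT/MISS varies.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    for name in candidates:
        value = lowered.get(name, "")
        if "miss" in value:
            return "miss"
        if "hit" in value:
            return "hit"
    return "unknown"
```

For a cold cache test you would expect `"miss"`, and anything else is a reason to distrust the measurement.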
Mistake #2: you need to actually download the file.
Why it’s an easy mistake to make: for some reason, it is extremely easy to accidentally make curl skip the download of the actual file. There are at least 3 ways of doing this.
- If you set the “NOBODY” option to curl via pycurl, it does not download the file. I made this mistake in my python script.
- If you set the “-O /dev/null” flag via curl in the command line, it does not download the file. You would think it would just download the file and dump it to /dev/null. No. In my tests, curl was “smarter” than that and just skipped the download altogether.
- If you set the “-I” flag via curl in the command line, it does not download the file.
Why this is an important mistake to avoid: CloudFront does not cache the file if you do not download the file! And since CloudFront is an industry-standard CDN, it is likely other CDNs have the same behavior.
How you can detect this mistake: look at the download time (time_total – time_starttransfer). If it is in the hundreds of microseconds, that’s a problem. The speed of light is only about 0.186 miles per microsecond. If you got only 200 microseconds, then that is about 37 miles…roundtrip. Or under 19 miles one way. It is highly unlikely there’s a datacenter that close to where you are. It means that there was no network transfer.
How to fix: in pycurl, write the downloaded contents to a BytesIO or StringIO variable. To see how it’s done, see my Python CDN speed test script. Or if you’re using curl from the command line, avoid the “-O /dev/null” and “-I” flags and make sure the download time is at least several milliseconds.
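Here is a minimal sketch of the pycurl fix, assuming pycurl is installed. It is an illustration, not the code from the script: `download_with_pycurl` and `body_was_downloaded` are hypothetical names. The important part is that the body is written into a `BytesIO` buffer, so you can confirm afterwards that bytes actually arrived.

```python
import io


def download_with_pycurl(url):
    """Download url with pycurl, keeping the body so its size can be checked."""
    # Imported here so the checker below can be used without pycurl installed.
    import pycurl

    buffer = io.BytesIO()
    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.WRITEDATA, buffer)  # write the body here, do NOT set NOBODY
        c.perform()
        result = {
            "status": c.getinfo(pycurl.RESPONSE_CODE),
            "time_starttransfer": c.getinfo(pycurl.STARTTRANSFER_TIME),
            "time_total": c.getinfo(pycurl.TOTAL_TIME),
        }
    finally:
        c.close()
    result["bytes"] = len(buffer.getvalue())
    return result


def body_was_downloaded(result, expected_bytes=None):
    """True if the response was a 200 and a body of the expected size arrived."""
    if result["status"] != 200 or result["bytes"] == 0:
        return False
    return expected_bytes is None or result["bytes"] == expected_bytes
```

For the test image used in this article, `expected_bytes` would be 719983, the `content-length` shown in the headers.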
Mistake #3: The input URL is invalid.
Why it’s an easy mistake to make: your curl speed test doesn’t report any 404 error. So you get back the speed test results without realizing there’s an error.
How to catch the mistake: one red flag you wanna look at is the “download” time. If it’s less than 1 millisecond, that’s way too fast. Do the math. The speed of light is about 0.186 miles per microsecond. There are 1000 microseconds in a millisecond. If you’re getting, let’s say, 200 microseconds, that’s only about 37 miles…roundtrip. One way, it’s under 19 miles. So 200 microseconds is way too fast, and indicates there’s probably no network transfer (i.e. no download) happening at all.
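The speed-of-light check is easy to write down as code. This sketch uses about 0.186 miles per microsecond (roughly 186,282 miles per second in a vacuum), and the function names and the 1 millisecond threshold are illustrative choices, not something taken from the script.

```python
SPEED_OF_LIGHT_MILES_PER_US = 0.186  # about 186,282 miles per second, in a vacuum


def max_one_way_miles(download_seconds):
    """Upper bound on how far away the server could be, given the download time."""
    microseconds = download_seconds * 1_000_000
    roundtrip_miles = microseconds * SPEED_OF_LIGHT_MILES_PER_US
    return roundtrip_miles / 2


def download_time_is_suspicious(download_seconds, threshold_seconds=0.001):
    """Flag download times under the threshold (1 ms by default) as too fast."""
    return download_seconds < threshold_seconds
```

Real signals in fiber travel slower than light in a vacuum, so the true distance bound is even smaller than this function reports.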
Of course, the other thing you could do is actually download the file…use wget. You’ll see an error message pretty easily.
How to fix the mistake: use the right URL.
Mistake #4: The input URL is HTTP not HTTPS.
Why it’s an easy mistake to make: people are careless or perhaps not technical enough. This is actually an important thing to get right, as I’m noticing the SSL exchange normally takes 50 milliseconds. That is a big chunk, given that a fast curl time is only around 120 milliseconds (from a home computer, not a cloud provider).
How to fix the mistake: be consistent in your testing. Use HTTPS for all or HTTP for all. And you should probably use HTTPS since that’s the standard (everybody uses HTTPS).
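A small guard can catch a stray HTTP URL before a test run starts. This is a sketch using only the standard library; `urls_with_wrong_scheme` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def urls_with_wrong_scheme(urls, required="https"):
    """Return the URLs whose scheme is not the required one."""
    return [u for u in urls if urlparse(u).scheme.lower() != required]
```

If the returned list is not empty, fix those URLs before comparing any numbers.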
Why I chose not to test on a normal web server like DigitalOcean
Turns out it’s not a good idea to use DigitalOcean to run these CDN speed tests.
I initially thought it would be a good idea to run these CDN speed tests on a standardized server location like DigitalOcean. Since I travel a lot, I can’t guarantee my internet is always going to be the same quality, or the location could be drastically different, such as different countries.
So that’s what I did. I tried it out…and noticed DigitalOcean’s internet speed is fast — REALLY fast. Like, way faster than my home internet speed. I was a little shocked. I didn’t think download speeds could ever get that fast. Makes me wonder what I need to do to make my home internet as good as that, because I really thought I had the best possible internet that money could buy (I basically use my computer 100 hours a week, OK. It’s important to me).
If you wanna see an example of the drastic difference in CDN speed…well here’s one. It’s using the CloudFront example that we’re using in the “Free CDN” blog series. When I ran my speed test python script with: `python3 test.py https://d3va53q3li7xt1.cloudfront.net/wp-content/uploads/2021/05/shoeb-1024x576.png`, I got back a download time of a blazing fast 0.0225 seconds, or 23 milliseconds. Holy cow. And then doing the same thing on my home internet, I only get 0.2315 seconds, or 232 milliseconds. What the hell.
Out of curiosity, I ran an internet speed test on the DigitalOcean server. Like how the hell is it that much faster. That’s gotta get anybody’s interest up. And would you believe the numbers???
    # speedtest-cli
    Retrieving speedtest.net configuration...
    Testing from DigitalOcean (126.96.36.199)...
    Retrieving speedtest.net server list...
    Selecting best server based on ping...
    Hosted by NewMedia Express (Singapore) [13.17 km]: 2.16 ms
    Testing download speed................................................................................
    Download: 1740.52 Mbit/s
    Testing upload speed......................................................................................................
    Upload: 1420.29 Mbit/s
LOL holy crap. Download speeds of 1740.52 Mbit/s and upload speeds of 1420.29 Mbit/s? How much money did DigitalOcean pay for that? I wanna buy it.
That is literally 10x as fast as my internet, which tops out at 160 Mbit/s (using fast.com).
So there are several reasons why I decided not to run these speed tests on DigitalOcean after all:
- It’s hard to tell the difference between good or bad CDNs.
- It’s not a real-world usage of how CDNs actually get used. Almost nobody is going to have internet speeds that fast, where it’s nearly 10x faster than the most premium internet you can get.
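To put the bandwidth gap in numbers, here is a back-of-the-envelope calculation for the 719983-byte test image from the output earlier. It is an idealized lower bound that ignores latency, TLS, and TCP ramp-up, so real downloads take longer. `ideal_transfer_seconds` is a hypothetical helper name.

```python
def ideal_transfer_seconds(size_bytes, mbit_per_s):
    """Time to push size_bytes through a link of the given speed, ignoring latency."""
    bits = size_bytes * 8
    return bits / (mbit_per_s * 1_000_000)


FILE_BYTES = 719983  # content-length of the test image

home = ideal_transfer_seconds(FILE_BYTES, 160)        # my home internet
cloud = ideal_transfer_seconds(FILE_BYTES, 1740.52)   # the DigitalOcean server

print(f"home:  {home * 1000:.1f} ms")
print(f"cloud: {cloud * 1000:.1f} ms")
print(f"ratio: {home / cloud:.1f}x")
```

That works out to roughly 36 ms at home versus roughly 3 ms on the cloud server just for the raw transfer. On a link that fast, the transfer itself nearly disappears from the total.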
BTW: relevant article that also talks about the distorted internet speeds of cloud providers: how accurate are CDNPerf’s numbers? Answer: not very accurate. TODO: I will link to this essay (rant) once I’ve written it!
Why the CDN speed tests need to be done almost at the same time
This is probably obvious to most internet users but because I’m trying to measure CDNs as methodically as possible, I might as well explain it briefly.
Internet speeds usually vary greatly depending on the time of the day. For example, during the evenings, there is typically massive internet congestion as people get off work and get leisure time. Video streaming is particularly notorious for hogging bandwidth. For example, did you know Netflix alone has been reported to take up more than a third of downstream internet traffic during peak evening hours in North America? Google it. And that’s just Netflix. Imagine if you added YouTube too. The evenings just get really congested with internet traffic.
And since we are focused on CDNs as the variable, we should make internet conditions an invariant. This means running the CDN speed test for 2 CDNs at basically the same time (let’s say within 1 minute is fine).
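One way to keep internet conditions roughly equal is to alternate between the CDNs inside each round, instead of finishing one CDN before starting the next. The sketch below illustrates that idea under stated assumptions: `compare_cdns` is a hypothetical function, `fetch` is any function that downloads a URL and returns the elapsed seconds, and the 60 second limit mirrors the "within 1 minute" rule of thumb above.

```python
import time


def compare_cdns(fetch, urls_by_cdn, rounds=10, max_gap_s=60, clock=time.monotonic):
    """Average download time per CDN, fetching the CDNs back to back in each round."""
    samples = {cdn: [] for cdn in urls_by_cdn}
    for _ in range(rounds):
        round_start = clock()
        # Hit every CDN once per round so they all see similar network conditions.
        for cdn, url in urls_by_cdn.items():
            samples[cdn].append(fetch(url))
        if clock() - round_start > max_gap_s:
            raise RuntimeError("round took too long; results are not comparable")
    return {cdn: sum(times) / len(times) for cdn, times in samples.items()}
```

This only makes sense for hot cache numbers. The cold and warm cache tests still need their own careful handling, as described earlier.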
Summary (TL;DR)
- I measure three CDN download speeds: cold, hot, and warm cache.
- The measurement is automated with a python script (it’s open sourced, see it here: https://github.com/speedtestdemon/speed-tests/blob/master/test.py)
- CDN speed measurements on 2 or more CDNs are to be conducted at nearly the same time (within about a minute of each other).
- CDN speed measurement is conducted on a home computer with home internet, not on the cloud where internet speeds are 10x home internet speeds (easily).
Questions? You can ask me on social media https://twitter.com/SpeedTestDemon
Related Essays
- Is a premium CDN like CloudFront worth it, instead of a free CDN?
- Cheat Sheet on Curl Performance Metrics: how to benchmark server latency with curl
- Jetpack CDN vs Amazon Web Services CloudFront: Jetpack surprisingly faster!
- Cloudinary vs CloudFront: Cloudinary crushes Amazon Web Services!
- Python CDN Speed Test Script: I use this to automate the collection of metrics about CDNs.