To start off, most of these tools open many TCP connections at once to maximize the amount of data that can be downloaded at one time. Other tools open only a single connection, and some may not use TCP at all, opting instead for UDP packets. Some tools look for the maximum speed that can be obtained, while others look at the sustained throughput that can be used for long downloads. The difference between maximum and sustained throughput is a touchy subject when you get people from the cable and phone companies in the same room. Cable companies like to use "power boost" to market the higher speeds provided by DOCSIS 3.0, even though these speeds may not be available for large transfers, while the DSL folks like to claim higher sustained bandwidth that is available to longer connections. As a consumer you should already realize that those cute little speedometers on the popular testing sites are an oversimplification at best.
http://reviews.cnet.com/internet-speed-test/
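To make the maximum-versus-sustained distinction concrete, here is a minimal Python sketch. It is not how any particular speed test works: the function name `throughput_summary`, the `boost_window` parameter, and the sample numbers are all made up for illustration. It assumes one throughput sample per second, treats "maximum" as the best average over a short window, and treats "sustained" as the average over the whole transfer.

```python
def throughput_summary(samples_mbps, boost_window=3):
    """Return (peak, sustained) throughput in Mbit/s.

    samples_mbps: one throughput sample per second (hypothetical input format).
    peak:         highest mean over any `boost_window` consecutive samples.
    sustained:    mean over the entire transfer.
    """
    if len(samples_mbps) < boost_window:
        raise ValueError("need at least boost_window samples")

    # Mean of every run of `boost_window` consecutive samples.
    window_means = [
        sum(samples_mbps[i:i + boost_window]) / boost_window
        for i in range(len(samples_mbps) - boost_window + 1)
    ]
    peak = max(window_means)
    sustained = sum(samples_mbps) / len(samples_mbps)
    return peak, sustained


# Fabricated example: a 5 second burst at 20 Mbit/s, then 25 seconds at 8 Mbit/s.
samples = [20.0] * 5 + [8.0] * 25
peak, sustained = throughput_summary(samples)
print(f"peak: {peak} Mbit/s, sustained: {sustained} Mbit/s")
```

With these fabricated samples, a tool that reports the peak would show twice the speed of a tool that reports the sustained average, even though both measured the same connection.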
How fast is highway 101?
When you are driving your car and look down at your speedometer, you are looking at your speed relative to the road you are driving on. Although anyone who has taken a college physics class will disagree, most of us have a pretty good grasp of how fast our cars move and of the implications of the extremes of vehicular velocity. Now take what you know about the speed that you travel in your car and tell me: how fast is the street in front of your house? How fast is the closest highway? You might be able to tell me what the speed limit is, but that is something else altogether. You start to understand the complexities when you realize that you can talk about:
- How fast the average car travels - measured over an hour, day, week, or year?
- How many cars travel down the road per given period of time?
- How fast could your car travel down the road?
What I am getting at here is that measuring Internet performance is not as simple as glancing down at your speedometer, and the people designing these "speed test" tools need to stop kidding people into thinking that it is. I would love to think that we could educate the Internet about these things, but given that I have been in school for a long time and devoted a large part of it to understanding Internet measurement, it is probably not likely. It may be a bit crass, but George Carlin put it nicely when he said, "think of how stupid the average person is, and realize half of them are stupider than that." I am not calling the average Internet user stupid, but consider the following.
Network Nutrition
I am an engineer and I understand computers pretty well. I attended a top-ranked graduate school, and I have worked for companies like IBM and Google. I also worked as a chef (really, I was not just a line cook) for just shy of a decade, which is to say that I know food pretty well too. As I am getting older I am realizing that I cannot eat anything that I want to anymore, and as a result I am forced to look at food nutrition labels more often these days. These labels are great, if you are a nutritionist!
http://en.wikipedia.org/wiki/Nutrition_facts_label
They give me detailed metrics about the caloric performance of that food, and that should empower me to make good choices about what I eat. Laughing yet? You see, even though those labels have lots of detailed information, I would need to be a nutritionist to know how those metrics apply to my diet, activity level, and dietary goals. I need help interpreting how this applies to me. I need an application that takes basic information about my body and activities and tells me whether I am eating the right things.
https://market.android.com/details?id=com.fooducate.nutritionapp
The idea of having a standardized set of metrics is a necessary step, and would be helpful to networking experts (PDF). For consumers and non-experts we are going to need to go a step further. How do we help consumers interpret network performance metrics?
Score like in algebra class not soccer
Either you have kids in school or remember what it was like to be in school. For those of us who worked hard, report card day was a wonderful day validating all the hours of studying, but the rest devoted a large portion of that day to devising arguments about how unfair the grading system is. Imagine that you are looking at the report card and it says that little Johnny got 1,000,000 points! Maybe it breaks it down further and tells you that he got 800,000 points in physical education and 200,000 points in math. Hey, a million points is pretty good, right? What if you later found out that the top score was a billion points and that the average was somewhere around 100 million? To make sense of things we need perspective, and we think that it isn't any different for network connections. Just like most teachers make use of distributions to grade their classes, we can make use of distributions to grade the quality of network connections. Moving on from the lazy days on the playground, we are thinking about the way that colleges compute a grade point average (GPA) for a student.
As a student you get a grade for each class, and then those grades are used in a weighted average to produce an approximation of how you compare to your peers. Each class is a sample point for a particular subject, and some are more important than others. The systems employed by most universities are not perfect, but they do pretty well. Instead of classes we will turn our attention to applications.
There are a lot of different applications that we all use on the Internet every day, and as you have probably noticed, some require a better network connection than others. For instance, you can get by checking your email in situations where trying to watch videos online would be unbearable. Just like it does not make sense to have one test for the variety of classes one might take in college, we need to consider the different classes of applications individually. For browsing web pages we might need to look at how fast the pages load (response times), but for streaming applications like video we need to look at how long the video takes to start playing (join time), how much time is spent waiting for more of the video to load (buffering ratio), and how often the player has to wait for more of the video to load (buffering rate). You can imagine that if we can measure these things from lots of people we might get a distribution that looks something like this.
Not to scale. Not even real data. Good range is approximate, and your results may vary.
If this were a distribution of test results for a population (it is not; it is just fabricated) and your test result was 16, then it would mean that around 50% of users had worse results than you. Keep in mind that the population is important: just as it would be unfair to compare scores from MIT and a community college, it is not fair to compare test results from New York to those from South Dakota. That is not to say that you shouldn't be able to make that comparison or be aware of the difference, because people in South Dakota deserve fast Internet access too. We combine these metrics to get scores (0.0 - 1.0) for each application, and based on how important each application is to a user, we can combine these per-application sub-scores to get a single score.
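As a rough illustration of the grading idea, here is a minimal Python sketch. It is an assumption about how such scoring could work, not a description of the actual system: the function names `percentile_score` and `overall_score`, the choice to use "fraction of the population with a worse result" as the 0.0 - 1.0 sub-score, the weights, and all the numbers are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_score(result, population, higher_is_better=True):
    """Fraction (0.0 - 1.0) of the population with a strictly worse result."""
    ordered = sorted(population)
    if not ordered:
        raise ValueError("population must not be empty")
    if higher_is_better:
        worse = bisect_left(ordered, result)  # values strictly below result
    else:
        worse = len(ordered) - bisect_right(ordered, result)  # strictly above
    return worse / len(ordered)


def overall_score(sub_scores, weights):
    """GPA-style weighted average of per-application sub-scores."""
    total_weight = sum(weights[app] for app in sub_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(sub_scores[app] * weights[app] for app in sub_scores)
    return weighted / total_weight


# Fabricated populations, in seconds (lower is better for both metrics).
page_load_times = [1.0, 2.0, 3.0, 4.0]
video_join_times = [2.0, 4.0, 6.0, 8.0]

sub_scores = {
    "web": percentile_score(2.5, page_load_times, higher_is_better=False),
    "video": percentile_score(3.0, video_join_times, higher_is_better=False),
}

# Hypothetical user who cares three times as much about video as about web.
weights = {"web": 1.0, "video": 3.0}
score = overall_score(sub_scores, weights)
print(sub_scores, score)
```

Swapping in a different population (say, one state instead of another) changes the sub-scores without changing the user's raw measurements, which is exactly why the choice of population matters.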
A lot of people balk at the idea of a single score that still means something, and this is a valid concern. Just like a nutritionist might balk at the oversimplification of our nutrition smartphone app, people who understand the complexities of network measurement can be troubled by the idea of one number. For people who understand that different applications have different needs we may be able to provide a set of scores, and those with a deep grasp of the material can drill down to all the gory details. We also imagine that when comparing across populations, even just 4 sub-scores for each of 50 states would be difficult to visualize.
How do we make this work?
At a very high level we need to collect large numbers of tests, and the tests need to come from a representative sample of Internet users. Once we are on our way to building a large pool of data, we can use that data to analyze a user's test results against a chosen population. The more people use the system, the better the results will get. Over the next couple of weeks I will be posting on how we design our system to scale to large numbers of users, how we are getting a more representative sample than any previous tool, and how we will translate all that data into scores that are so simple that my grandparents could appreciate the difference between cable, DSL, and a T1 line. In the next post I will talk about what flocks of fowl, feathers, eggs, and ostriches have to do with the platform.
My first attempt at vector graphics, and I am not sure if the ostrich is creepy to everyone or just me.
In the interim I encourage you to try out the tool yourself by visiting net-score.org, and if you develop any kind of blog, website, or cloud application, check out info.net-score.org.