Tuesday, April 2, 2013

Designing tests that scale

It has been a long time since I have posted anything here, but I feel like sharing a bit of what we have been up to.  We have been working on improving the quality of our tests, and increasing what we can measure while reducing the amount of data that we need to transfer to do the measurements.  Part of this work has been developing a custom web server written in Go that uses a highly efficient packet capture mechanism (golibpcap).  All this is wrapped up in something we call the net-score diagnostic test server (nsdt-server).

Without the nsdt-server we are limited to only looking at timings exposed at the application level (HTTP) as this is all the browser gives us.  This data is great for looking at end-to-end performance that applications can expect, because it is performance at the application.  It encompasses everything between the server and the client, giving a complete measurement: how long did an object take to get from server to client?  Simple and useful.  But what it does not tell us is why.

A lot can happen in the 30 ms between when the server sends the data and when it actually is received by the client.  The browser tells us that an object has finished downloading only after it has arrived in full - every last bit.  Before that last bit arrives, packets may have been lost, delivered out of order, or corrupted and retransmitted.  This can mean that instead of one round trip to the server to get our object, it may have taken several extra round trips to make up for mishaps along the way.  Lucky for us, TCP takes care of all the messy work, but knowing what lengths TCP had to go to in order to get a complete copy of the data can tell us a lot about possible inefficiencies in the connection.

To learn about how TCP works hard so that applications don't have to, you can use tools that capture streams of packets, like Wireshark.  This is great if you want to see one connection (your own), but we needed something that was going to scale to global proportions.

Part of our nsdt-server is a RESTful interface for starting packet traces.  Without going into too much detail here, our server can manage many traces at once, and when a trace completes it sends the results to a central control server to be processed.  Even with a highly efficient server we are going to need hundreds of these spread all over the world, so we also developed a decentralized load balancer to send clients to a server that has a light load.  This decentralized load balancing server also keeps track of what servers are online and allows new servers to join without any human interaction.
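To make the load balancing idea a little more concrete, here is a minimal sketch in Python (not the language of the real server, and not its actual code).  It only shows the bookkeeping one balancer node might do: servers report their load, silent servers drop out, and clients are sent to the least busy one.  The names `ServerRegistry`, `report`, `pick`, and the 30 second heartbeat timeout are all assumptions made for illustration, and the sketch does not model how several balancers would share this state.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ServerRegistry:
    """Tracks which test servers are online and how busy each one is (hypothetical)."""
    heartbeat_timeout: float = 30.0  # assumed: seconds of silence before a server counts as offline
    _servers: dict = field(default_factory=dict)  # name -> (active_traces, last_seen)

    def report(self, name, active_traces, now=None):
        """Called by a server to join the pool or to refresh its load."""
        now = time.time() if now is None else now
        self._servers[name] = (active_traces, now)

    def online(self, now=None):
        """Return {name: active_traces} for servers heard from recently."""
        now = time.time() if now is None else now
        return {name: load for name, (load, seen) in self._servers.items()
                if now - seen <= self.heartbeat_timeout}

    def pick(self, now=None):
        """Return the online server with the fewest active traces, or None."""
        candidates = self.online(now)
        if not candidates:
            return None
        return min(candidates, key=candidates.get)
```

A new server joins simply by calling `report` for the first time, which is one way to get the "no human interaction" property described above.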

I keep using the word global because even in our very quiet beginning we have data coming in from six continents!  Which is great because we are trying to build a globally representative picture of broadband access, but we still need more data.  Here is an interactive sample of some of the data we have collected so far.  You can see that western Europe and the eastern United States have the best coverage, but with more reach we hope to illuminate some of the darker corners of the global broadband market (like our tests from Fiji and Bermuda).  Here is a bigger interactive view of the map below.

net-score test locations

The hope at this point is that you are interested in getting involved.  If you have a blog or website no matter how small or obscure (actually the more obscure the better) you can help by putting a small snippet of our code in your website/blog.  We have widgets that can be installed into Google sites and Blogger, and JavaScript that can be pasted into other sites.  We would love for you to install a visible version of the tool (like in the top right corner of this page), but if you want something low-key we have a version that runs completely in the background too.  Detailed instructions can be found here, and if they are not clear or you need help we are glad to help.

Cross posted on my personal blog.

Monday, January 30, 2012

What is a net-score?

In order to understand what net-score is we first need to talk about what net-score isn't.  There are many broadband speed tests out there of varying degrees of accuracy, but play around with them for a while and many users will come to the conclusion that one or all of these tools must be broken.  If you visit an Ookla powered testing site (like Speakeasy's speed test) you will see numbers that are quite high, but if you were to run a test on CNET's Bandwidth Meter you might see numbers that are quite different.  Compare those numbers to the speeds that you see transferring a file or doing any downloading and you might be disappointed.  CNET has made a recent change in the way that they are measuring bandwidth that brings their results closer to the ones that you would see via an Ookla test, but for the vast majority of Internet users these numbers don't always correlate with better performance.

To start off, most of these tools open many TCP connections at once to maximize the amount of data that can be downloaded at one time, other tools will only open a single connection, and some tools may not even use TCP at all, opting instead to use UDP packets.  Then some tools look for the maximum speed that can be obtained, but others look at the sustained throughput that can be used for long downloads.  Differences in simply looking at the maximum versus sustained throughput are a touchy subject when you get people from the cable and phone companies in the same room.  Cable companies like to use "power boost" for marketing higher speeds provided by DOCSIS 3.0 even though these speeds may not be available for large transfers, while the DSL folks like to claim higher sustained bandwidth that is available to longer connections.  As a consumer you should already realize that those cute little speedometers on the popular testing sites are an oversimplification at best.
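The gap between maximum and sustained throughput is easy to see with a few lines of Python.  This is only a sketch under assumed inputs: `byte_counts` is a hypothetical list of bytes received in each fixed-length sampling interval, and the function names are made up for this example.

```python
def throughput_mbps(byte_counts, interval_s):
    """Convert per-interval byte counts into megabits per second."""
    return [b * 8 / interval_s / 1e6 for b in byte_counts]


def peak_and_sustained(byte_counts, interval_s):
    """Return (peak, sustained) throughput in Mbps.

    peak      = the fastest single interval
    sustained = total bits divided by total elapsed time
    """
    rates = throughput_mbps(byte_counts, interval_s)
    peak = max(rates)
    total_time = len(byte_counts) * interval_s
    sustained = sum(byte_counts) * 8 / total_time / 1e6
    return peak, sustained


# Made-up example: a two second burst followed by a slower steady rate.
samples = [2_500_000, 2_500_000, 1_250_000, 1_250_000, 1_250_000, 1_250_000]
peak, sustained = peak_and_sustained(samples, interval_s=1.0)
print(f"peak {peak:.1f} Mbps, sustained {sustained:.1f} Mbps")
```

With these made-up numbers a tool that reports the peak would show 20 Mbps while one that reports the sustained rate would show about 13.3 Mbps, for the very same connection.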

http://reviews.cnet.com/internet-speed-test/

How fast is highway 101?
When you are driving in your car and look down at your speedometer you are looking at your speed relative to the road you are driving on.  Although anyone who has taken a college physics class will disagree, most of us have a pretty good grasp of how fast our cars move, and the implications of the extremes of vehicular velocity.  Now take what you know about the speed that you travel in your car and tell me how fast is the street in front of your house?  Now tell me how fast is the closest highway?  You might be able to tell me what the speed limit is, but that is something else altogether.  You start to understand the complexities when you realize that you can talk about:

  • How fast the average car travels - measured over an hour, day, week, or year?
  • How many cars travel down the road per given period of time?
  • How fast could your car travel down the road?
What I am getting at here is that measuring Internet performance is not as simple as glancing down at your speedometer, and the people designing these "speed test" tools need to stop kidding people into thinking that it is.  I would love to think that we could educate the Internet about these things, but given that I have been in school for a long time and devoted a large part of it to understanding Internet measurement it is probably not likely.  It may be a bit crass, but George Carlin put it nicely when he said, "think of how stupid the average person is, and realize half of them are stupider than that."  I am not calling the average Internet user stupid, but consider the following.

Network Nutrition
I am an engineer and I understand computers pretty well.  I attended a top ranked graduate school, and I have worked for companies like IBM and Google.  I also worked as a chef (really, I was not just a line cook) for just shy of a decade, which is to say that I know food pretty well too.  As I am getting older I am realizing that I cannot eat anything that I want to anymore, and as a result I am forced to look at food nutrition labels more often these days.  These labels are great, if you are a nutritionist!

http://en.wikipedia.org/wiki/Nutrition_facts_label

They give me detailed metrics about the caloric performance of that food, and that should empower me to make good choices about what I eat.  Laughing yet?  You see even though those labels have lots of detailed information, I would need to be a nutritionist to know how those metrics apply to my diet, activity level, and dietary goals.  I need help interpreting how this applies to me.  I need an application that I can give basic information about my body and activities and it would tell me if I was eating the right things.


https://market.android.com/details?id=com.fooducate.nutritionapp

The idea of having a standardized set of metrics is a necessary step, and would be helpful to networking experts (PDF).  For consumers and non-experts we are going to need to go a step further.  How do we help consumers interpret network performance metrics?

Score like in algebra class not soccer
Either you have kids in school or remember what it was like to be in school.  For those of us that worked hard, report card day was a wonderful day validating all the hours of studying, but the rest devoted a large portion of that day to devising arguments about how unfair the grading system is.  Imagine that you are looking at the report card and it says that little Johnny got 1,000,000 points!  Maybe it breaks it down further and tells you that he got 800,000 points in physical education and 200,000 points in math.  Hey, a million points is pretty good, right?  What if you later found out that the top score was a billion points and that the average was somewhere around 100 million?  To make sense of things we need perspective, and we think that it isn't any different for network connections.  Just like most teachers make use of distributions to grade their classes we can make use of distributions to grade the quality of network connections.  Moving on from the lazy days on the playground, we are thinking about the way that colleges compute a grade point average (GPA) for a student.

As a student you get grades for each class and then those grades are used in a weighted average to produce an approximation of how you compare to your peers.  Each class is a sample point for a particular subject and some are more important than others.  The systems employed by most universities are not perfect, but they do pretty well.  Instead of classes we will turn our attention to applications.

There are a lot of different applications that we all use on the Internet every day, and as you have probably noticed some require a better network connection than others.  For instance you can get by checking your email in situations where trying to watch videos online would be unbearable.  Just like it does not make sense to have one test for the variety of classes one might take in college, we need to consider the different classes of applications individually.  For browsing web pages we might need to look at how fast the pages load (response times), but for streaming applications like video we need to look at how long the video takes to start playing (join time), how much time is spent waiting for more of the video to load (buffering ratio), and how often the player has to wait for more of the video to load (buffering rate).  You can imagine that if we can measure these things from lots of people we might get a distribution that looks something like this.
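As a rough illustration of the three streaming metrics, here is a small Python sketch.  The `Session` record and its field names are hypothetical, and the exact denominators (buffering ratio as a fraction of time since playback started, buffering rate as stalls per minute) are assumptions made for this example rather than fixed definitions.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One playback session; all times are in seconds (hypothetical record)."""
    requested_at: float  # when the user pressed play
    started_at: float    # when the first frame was shown
    ended_at: float      # when the session ended
    stalls: list         # (start, end) pairs, one per rebuffering event


def streaming_metrics(s):
    """Compute join time, buffering ratio, and buffering rate for a session."""
    join_time = s.started_at - s.requested_at
    watch_time = s.ended_at - s.started_at
    stalled = sum(end - start for start, end in s.stalls)
    if watch_time <= 0:
        return {"join_time_s": join_time,
                "buffering_ratio": 0.0,
                "buffering_rate_per_min": 0.0}
    return {
        "join_time_s": join_time,
        # share of the session spent waiting for more video
        "buffering_ratio": stalled / watch_time,
        # how often the player had to stop, per minute of session
        "buffering_rate_per_min": len(s.stalls) / (watch_time / 60),
    }


# Made-up session: 2 s to start, then two 3 s stalls in a 2 minute session.
session = Session(0.0, 2.0, 122.0, [(30.0, 33.0), (60.0, 63.0)])
print(streaming_metrics(session))
```

Collecting these three numbers from many sessions is what would give a distribution for each metric.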

Not to scale.  Not even real data.
Good range is approximate, and your
results may vary.  

If this were a distribution of test results for a population (it is not - it is just fabricated) and your test result was 16, then it would mean that around 50% of users had worse results than you.  Things to keep in mind are that the population is important; just like it would be unfair to compare scores from MIT and a community college it is not fair to compare the test results from New York to South Dakota.  That is not to say that you shouldn't be able to, and shouldn't be aware of the difference, because people in South Dakota deserve fast Internet access too.  We combine these metrics to get scores (0.0 - 1.0) for each application, and based on how important each application is to a user we can combine these per-application sub-scores to get a single score.
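To show how the pieces could fit together, here is a minimal Python sketch of the grading idea.  It is not the actual net-score formula: the function names, the made-up population, and the weights are all assumptions.  It scores a result by the fraction of the population that did worse, then combines per-application sub-scores with a GPA-style weighted average.

```python
from bisect import bisect_left, bisect_right


def percentile_score(value, population, lower_is_better=True):
    """Fraction of the population (0.0 - 1.0) with a worse result than value."""
    ordered = sorted(population)
    if not ordered:
        return 0.0
    if lower_is_better:
        # worse means strictly larger, e.g. a slower page load
        worse = len(ordered) - bisect_right(ordered, value)
    else:
        # worse means strictly smaller, e.g. lower throughput
        worse = bisect_left(ordered, value)
    return worse / len(ordered)


def combined_score(sub_scores, weights):
    """Weighted average of per-application sub-scores, like a GPA."""
    total = sum(weights[app] for app in sub_scores)
    return sum(sub_scores[app] * weights[app] for app in sub_scores) / total


# Made-up population of page response times; a result of 16 beats half of them.
population = [10, 12, 14, 16, 18, 20, 22, 24]
web = percentile_score(16, population)

# Assumed weights: this user cares three times more about web than video.
overall = combined_score({"web": web, "video": 0.8}, {"web": 3, "video": 1})
print(web, overall)
```

Changing the `population` argument is how the same test result would be graded against New York, South Dakota, or everyone at once.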

A lot of people balk at the idea of a single score that still means something, and this is a valid concern.  Just like a nutritionist might balk at the oversimplification of our nutrition smartphone app, people who understand the complexities of network measurement can be troubled by the idea of one number.  For people who understand that different applications have different needs we may be able to provide a set of scores, and those with a deep grasp of the material can drill down to all the gory details.  We also imagine that when comparing across populations, 50 states with even just 4 sub-scores each would be difficult to visualize.

How do we make this work?
At a very high level we need to collect large numbers of tests, and the tests need to come from a representative sample of Internet users.  Once we are on our way to building a large pool of data we can use that data to provide analysis of a user's test results with a chosen population.  The more people use the system the better the results will get.  Over the next couple of weeks I will be posting on how we design our system to scale to large numbers of users, how we are getting a more representative sample than any previous tool, and how we will translate all that data into scores that are so simple that my grandparents could appreciate the difference between cable, DSL, and a T1 line.  In the next post I will talk about what flocks of fowl, feathers, eggs, and ostriches have to do with the platform.

My first attempt at vector graphics, and I am not sure
if the ostrich is creepy to everyone or just me.

In the interim I encourage you to try out the tool yourself by visiting net-score.org, and if you develop any kind of blog, website or cloud application check out info.net-score.org.

Tuesday, January 24, 2012

net-score - ranking the world’s broadband connections

There is a huge gap between the way we describe the performance characteristics of a network connection and how most people understand network performance.  We propose that given a user's network application usage patterns we can score network connections based on the average user experience for all of those applications -- resulting in a ranked list where the top network connection maximizes the expected user experience.  In an effort to rank network connections in a way that is easily accessible to the average user we have learned a few things about network measurement and designing scalable testing platforms.  The work being discussed is in an early stage, but there is a lot of interest in this project.  The initial progress is confidence inspiring and we have a lot to be optimistic about.


If we can help consumers make informed choices about their ISP,
will it result in better connectivity for all of us?


To learn more about the project please have a look at net-score.org, and for more detailed information including ways to get involved see info.net-score.org.