You’ve probably already heard of algorithm analysis. Normally, within this topic students tend to study things like order of growth, Big Oh notation, Theta notation, asymptotic growth, worst case analysis and so forth. This article is the first part of a two-part series that will cover all of these topics incrementally. At the end of it, you’ll have a deep understanding of why we analyze algorithms the way we currently do and where do these different notations come into play.
This first part is dedicated to the construction of our analysis model. We’ll start by asking ourselves how exactly should we compare algorithms. Then, we will define our parameters and learn how to calculate them. Lastly, we will finetune our model.
We use algorithms and data structures in order to create efficient solutions for problems that we face. It’s obvious that there are better ways to do things. So, when facing a problem, there are algorithms that are better than others. But how exactly do we tell that one algorithm is better, or even if it is the best? And even if we have this answer, can we generalize it? Can we create a method that we’ll be able to use in order to compare all algorithms out there?
We start with the idea that there are algorithms that are better than others for certain situations. This is simple and intuitive. When searching for your keys to go to work, you don’t look in places you haven’t been. It’s more efficient to visit places you know you visited. But how does this translate to the context of machines and algorithms? How can we tell that an algorithm is better than another one?
Each situation has its own unique characteristics and requirements. An algorithm that is good by a set of standards might not be as good when analyzed through another one. Still, our goal here is to develop a method that will work with most algorithms.
What we can do is abstract the many different factors that might go into considering an algorithm good or not and stick to a few that seem relevant enough to any algorithm. Those factors are time and memory. This makes a lot of sense, since solutions that are faster or that consume more memory translate to less cost. From here on out, we’ll focus on analyzing the running time.
This was our first step. We have parameters that can help us in comparing two algorithms. But this isn’t enough. We don’t know how to determine these parameters. What factors determine the running time of an algorithm? How can we calculate it? When saying that one car is faster than another, we cap specify a multitude of things that determine that fact. Maybe the engine is more modern, maybe its design is more aerodynamic, or it might have a larger cylinder capacity. These are concrete reasons that make a car faster. And with that, we gain deeper insight on the problem. But what exactly are those reasons for algorithms?
In the real world, there are, again, many factors that may come into play. In a system, the hardware, software, operating system and many more could give their own contribution to the running time of an algorithm (or memory consumption). Just like when we defined our parameters, we need to abstract this multitude of things and focus on what really matters, which is the size of the input data. This also makes a lot of sense, since a problem’s size and complexity grows as its input grows. If we’re designing an algorithm that performs a search on an array, it will take a lot more to time to find a result within an array with 10 million elements than it would within another one with only 10.
We have taken our second step. So far, our model analyzes algorithms by their running time, which depends on the size of the input. We can say that the running time T is a function of n. Therefore, T(n) is an algorithm’s running time for an input of size n.
Great. But given a particular algorithm, how can we calculate its running time?
An immediate response could be that we can just run the algorithm for a number of different input sizes while timing it. Then, we can plot the pairs of values and analyze how the algorithm grows. This would be a valid method. Nevertheless, this can get really tiresome and it’s not broad enough. When running this experiment, we are still very dependent on the specifications of the system in charge. We fall back into the same problem before our second step.
What we need is a mathematical method that will enable us to analyze algorithms without having to relentlessly test them. Is such a model possible?
Return to the roots
In the 20th century, computer scientists were not sure if such accurate models were possible. Then, legendary computer scientist Donald Knuth — a.k.a. The Father of Algorithm Analysis — stated that they are indeed possible. By describing algorithms as a series of instructions and operations, we can assign each of these an individual contribution to the running time (cost), and then count the amount of time they occur (frequency). By multiplying both, we get the cost of a certain operation. By summing all costs, we can calculate an algorithm’s running time. It’s important realize that we are not totally free from the specifications of the system, as a particular operations have their cost defined by it. But what we can do is represent the value with an arbitrary constant (Cn), since we don’t want to know its exact value. Let’s see an example.
In this very simple piece of C++ code, we’re looping through an array and counting how many of its elements are equal to zero. As simple as it may be, we have a few different operations that are happening. There is assignment, inequality comparisons, equality comparisons, increment and array access. When we count them, we get the following frequencies:
Now, let’s calculate the running time.
For clarity’s sake, the new constants — such as c7, c8 and c9 — are the result of the addition of the other constants that arise from the calculations. I just combined them into other constants in order to make it easier to read/write.
We can now see that the algorithm’s running time grows linearly according to its input.
So far, we have built a valid model that can analyze an algorithm’s running time. By itself, this method is very rich. It gives us the ability to conclude things in advance, without the cost of actually implementing and live testing these algorithms. We’re safe from having algorithms underperforming when things really matter.
Even though our model is very powerful, there are still some problems with it. They’re not really noticeable based on the last example that we worked on, but as an algorithm becomes bigger, the calculations tend to get complicated and we start to see that some operations don’t make a meaningful impact on the total running time of the algorithm. Let’s look at another example.
In this piece of C++ code we’re looping through an array and trying to find how many triplets have their sum equal to zero. So, we’re checking every single triplet and comparing their sum to zero. We can already see that things become a bit more complicated. We’re still dealing with the same set of operations, but they happen more frequently and with different conditions. If we were to count just how many times each of these operations happen, things would become really tiresome and complex.
To simplify this, we can focus on the main operation that is being executed within the algorithm. We’ll only count this operation’s frequency and through it we will determine an approximation for the running time. But why exactly would we do this? Wouldn’t this compromise the accuracy of our models and thus make this whole endeavor pointless?
Not exactly. When analyzing and choosing algorithms, we’re usually trying to solve important problems. More specifically, we’re trying to solve very big instances of that problem (versions of that problem with large amounts of data). In these occasions, picking the right algorithm truly makes a difference. But also, as the problem grows, we can see that our approximation is just as good as the in-depth calculation.
The image above represents two functions that describe the running time of an algorithm, with the red one being f(x)=x³+x²+3x (the in-depth analysis), and the blue one being just g(x)=x³ (the approximation). We can see that with small values, the approximation might not be good enough. But as the input size grows, the functions become indistinguishable. As matter of fact, if we we’re to calculate f(1000) and g(1000), we would get an accuracy of 99.899%.
If you understand a bit of calculus, this is equivalent to saying that:
This is the thing with algorithm analysis: we’re interested in the algorithm’s behavior with really large inputs, as with small inputs the choice of the algorithm doesn’t really matter, since we can compensate that with other things (computing power, network speed), or it might not even need to due to the simplicity.
With our method so far, the resulting function has its highest order term defined by the operation that happens the most frequently. This term is responsible for the largest part of the function’s value at a specific point. Similarly, the operation with the highest frequency is responsible for the largest part of the algorithm’s running time. Because of this, we can just use the highest order term and forget about the rest.
Let’s return to our last piece of code. The operation that happens the most is the if block that checks whether the triplet sum is equal to zero. If we calculate its frequency, we can see that we’re just trying to count how many ways we can pick three different elements from the array. We can use the combination formula for that:
Again, if you know a bit of calculus, you can check this by solving the following integral:
If we ignore the lower order terms, we get that our approximation is T(n) = n³/6.
It’s also important to realize that even though we are using approximations, there might be cases where you’ll need an exact formula. But they aren’t the majority.
We’ve built an algorithm analysis method from scratch. We began asking ourselves how could we compare algorithms and how can we determine these parameters. Then, saw how we could improve our model through some justified simplifications. More specifically, we saw that we can get a good enough approximation for an algorithm’s running time by focusing on its operation with the highest frequency.
Therefore, our mathematical model deals with approximations. We do so because calculations can get really complex and those cases are best to be left to experts. When analyzing most algorithms, these approximations will suffice.
This concludes the first part of our study of algorithm analysis. In the next part, we will use the method we’ve built in order to categorize algorithms with similar running times and to introduce some important mathematical notations in algorithm analysis (Big Oh, Theta, Omega).