Analysis Of Algorithms

In the previous post, Introduction of Algorithms, we saw the importance of algorithms and why they are required in building any solution, however simple. But we did not see how algorithms are analysed. So in this post we will learn how to analyse a given algorithm and the ways to analyse one.

Table of contents:

  1. Why analysis of algorithms?
  2. Asymptotic analysis.
  3. Asymptotic notations.
  4. Analysis of loops (Rule of Sum and Rule of Product).
  5. Analysis of recursive functions using the Induction method.
  6. Amortized analysis using a potential function.

Why analysis of algorithms?

When we are buying a car or a bike, the first question we ask is “Kitna deti hai?” (What mileage does it give?), right! 😛 In the world of computing, to describe the efficiency of an algorithm, the first questions we ask are “What’s the time complexity?” and “What’s the space complexity?”. They tell us which algorithm is a good fit for a given requirement. Some people may say that nowadays we have machines with plenty of RAM and fast secondary memory which can perform any given task quickly, so why do we really need to care about the efficiency of algorithms?

Let me give you a relevant example. Assume that you are given a Ferrari and asked to compete with Michael Schumacher. You, being a novice driver, only know how to drive, but not how to best utilize the car in a racing competition. Michael Schumacher wins because he knows how to exploit the Ferrari’s features; by the time you finish one lap, he would have finished ten. In a similar way, if you run an algorithm on a machine with a high-end configuration, the expectation is also very high: it should use less memory and compute the task in less time. Hence, to evaluate algorithms, we use time and space complexity.

So can we just run two algorithms and declare the one which runs faster on a given input the winner?

No! For a given input, the first algorithm may compute the task faster than the second. But the input can be changed in such a way that the second algorithm becomes faster than the first. Let’s take linear search and binary search for finding a number in a list as an example.

The linear search algorithm searches from the beginning of the list and goes till the end, whereas binary search (which requires a sorted list) starts from the middle and halves the search range at each step. So if the number we are given is at the beginning of the list, then obviously linear search will find it faster than binary search; if it is in the middle, binary search will find it faster. So we cannot conclude anything just from the running time of an algorithm on one input.
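
To make this concrete, here is a minimal sketch of the two searches (illustrative code; binary search assumes the list is a sorted array):

int linearSearch(int[] list, int key) {
	for (int i = 0; i < list.length; i++) {		// scan from the beginning
		if (list[i] == key) return i;
	}
	return -1;					// not found
}

int binarySearch(int[] sortedList, int key) {
	int low = 0, high = sortedList.length - 1;
	while (low <= high) {
		int mid = low + (high - low) / 2;	// start from the middle
		if (sortedList[mid] == key) return mid;
		if (sortedList[mid] < key) low = mid + 1;
		else high = mid - 1;
	}
	return -1;					// not found
}

If the key sits at index 0, linearSearch returns after one comparison while binarySearch still performs several halving steps; if the key sits at the middle, binarySearch returns immediately.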

Asymptotic analysis

So we use asymptotic analysis to analyse the efficiency of an algorithm. Here we don’t just blindly compare running times; we check how the time and space taken vary as the input grows. It helps us compute the rate of growth of the time taken with respect to the input. Hence, for a given algorithm with input size N, we need to compute the time complexity with respect to N. Let’s see how we compute it for the program below.

private int totalSum(int[] list) {		        //Cost		No of Times
	int sum = 0;					            //1			1
	for (int i = 0; i < list.length; i++) {		//2			n
		sum = sum + list[i];			        //2		    n
	}	
	return sum;					                //1			1
}

We consider a hypothetical machine which takes 1 time unit for each assignment, arithmetic and logical operation, and we find the expression which gives us the rate of growth of time with respect to the input.

As the inline comments in the above program show, initializing the variable sum to 0 is a single assignment operation, so we consider that the machine takes 1 time unit to execute that line; similarly, i is initialized to 0 in 1 time unit. Inside the for loop, sum is assigned a value n times, and the cost of that line is 2 because each time we add a list item to sum and then assign the result back to sum. Along with that, i is incremented n times and compared with the length of the list, which costs another 2 per iteration. That makes the total time consumed by the for loop (1 + n(2 + 2)). The return statement also takes 1 time unit, as it is a single operation. So the total time taken by the method totalSum is

Time complexity T(n) = [1] + [1 + n(2 + 2)] + [1]

T(n) = 3 + 4(n)

If we rename the constants 3 and 4 to C and C1 respectively, then T(n) = C + C1(n), since we are only interested in the rate of growth of time with respect to n.

As we can see, it turns out to be a linear function, so the rate of growth of time is linear with respect to the input.

If we take a method that calculates the sum of a given matrix, then the running time turns out to be a quadratic function, as seen below:

	private int matrixSum(int[][] matrix) {				// Cost		//No of Times
		int sum = 0;									//  1		//  1
		for (int i = 0; i < matrix.length; i++) {		//	2		//	n
			for (int j = 0; j < matrix[0].length; j++) {//	2		//	n
				sum = sum + matrix[i][j];				//	2		//	n
			}
		}
		return sum;										//	1		//	1
	}

T(n) = [1] + [1 + n(2 + {1 + n(2 + 2)})] + [1]

T(n) = 3 + [n(3 + 4(n))]

T(n) = 3 + 3(n) + 4(n^2)

T(n) = 4(n^2) + 3(n) + 3

T(n) = a(n^2) + b(n) + c

If we rename the constants 4, 3 and 3 to C, C1 and C2 respectively, then T(n) = C(n^2) + C1(n) + C2, since we are only interested in the rate of growth of time with respect to n. So the running time of the matrixSum method is a quadratic function.

If we plot these equations on a graph, we can see that the rate of growth of the matrixSum method increases drastically compared to the rate of growth of the totalSum method as the input grows.

But for a given algorithm, the rate of growth of time varies based on the input, right? Consider linear search: it behaves worst if the key being searched is at the end of the list, and best if the key is at the beginning. So it is hard to pin down a single running time for an algorithm. To analyse algorithms, we therefore state the upper bound and the lower bound within which the running time can vary for any given input.

So we categorize these expressions into sets where the rate of growth of time is similar. We use asymptotic notations such as Big O, Omega and Theta to express the analysis; for example, the two expressions above fall into O(n) and O(n^2). Let’s see what these asymptotic notations are all about.

What are asymptotic notations?

Asymptotic notation is a way to classify the running times of algorithms into generic classes or sets. When we analyse time complexity, we analyse it for very large input, i.e. as n tends to ∞.

As we saw in the expression for the sum of matrix elements, T(n) = a(n^2) + b(n) + c, when n tends to ∞ the term [b(n) + c] becomes insignificant compared to a(n^2) (for n = 1000, 4(n^2) is 4,000,000 while 3(n) + 3 is only 3,003), so we do not consider it in the expression.

Big O asymptotic notation

For a positive function g(n), O(g(n)) is defined as the set of all functions f(n) for which there exist constants c and n’, such that f(n) <= c * g(n) for all n >= n’. In simple words, it gives us the upper bound on the rate of growth of an algorithm. Consider the equation above, f(n) = 4(n^2) + 3(n) + 3:

since 4(n) <= 4(n^2) and 3 <= 3(n^2) for all n >= 1, we can replace these terms in the above equation:

f(n) <= 4(n^2) + 3(n^2) + 3(n^2)

f(n) <= 10(n^2)

f(n) <= c*g(n) for n >= 1, where c = 10 and g(n) = n^2

This says that, beyond n = 1, our function grows no faster than a constant multiple of n^2, so we represent it as O(n^2).

Omega (Ω) asymptotic Notation

For a positive function g(n), Ω(g(n)) is defined as the set of all functions f(n) for which there exist constants c and n’, such that c*g(n) <= f(n) for all n >= n’.

It gives us the lower bound on the rate of growth of an algorithm. Consider the equation below:

f(n) = 4(n^2) + 3(n) + 3.

logically, f(n) >= 4(n^2) for all n >= 0, so

f(n) >= c*g(n) for n >= 0, where c = 4 and g(n) = n^2

This says that, beyond n = 0, our function grows at least as fast as n^2, so we represent it as Ω(n^2).

Theta (θ) asymptotic notation

For a positive function g(n), θ(g(n)) is defined as the set of all functions f(n) for which there exist constants c1, c2 and n’, such that c1*g(n) <= f(n) <= c2*g(n) for all n >= n’. It gives us both the upper and the lower bound on the rate of growth of an algorithm. Consider the equation below:

f(n) = 4(n^2) + 3(n) + 3.

for c1 = 4, c2 = 10 and n’ = 1

g(n) = n^2, and f(n) = θ(n^2)

Theta notation describes an algorithm best by giving a tight bound: it says that the rate of growth of g(n) is as close as possible to the rate of growth of f(n).

Analysis of loops

We used asymptotic analysis above to calculate the time taken by each assignment, arithmetic and logical operation. These operations are considered constant time, i.e. O(1), in comparison to loops or recursive calls which execute N times, where N can be very large. So if a function has one loop which executes N times computing some task, it is considered to have O(N) time complexity. For example, consider a function with one loop which finds the sum of a given array:

int findSum(int[] array) {
	int sum = 0;
	for (int i = 0; i < array.length; i++) {	//O(N)
		sum = sum + array[i];			//O(1)
	}
	return sum;
}

Here the time complexity of the function findSum is O(N), as the for loop is executed N times and the assignment to sum inside it takes O(1).

Rule of product

If we have nested loops like below, then the time complexities of the loops are multiplied.

int findMatrixSum(int[][] matrix) {
	int sum = 0;
	for (int i = 0; i < matrix.length; i++) {		//O(N)
		for (int j = 0; j < matrix[0].length; j++) {	//O(N)
			sum = sum + matrix[i][j];		//O(1)
		}
	}
	return sum;
}

The time complexity of the function findMatrixSum is O(N^2): there are two nested loops, and the inner loop executes N times for each of the N iterations of the outer loop, which makes it N * N.

Rule of Sum

But if two loops are executed one after the other, then their time complexities are summed.

for (int i = 1; i <= m; i++) {	//O(m)
	// Operations
}
for (int i = 1; i <= n; i++) {	//O(n)
	// Operations
}

Here the time complexity is O(m) + O(n), i.e. O(m + n).

Analysis of recursive functions.

Many a time we write recursive functions to perform some operations, so how do we calculate the time complexity of a recursive function?
There are broadly three ways to calculate it:

  1. Induction method.
  2. Master theorem.
  3. Recurrence tree method.

But in this post let’s discuss only the Induction method and understand it thoroughly. You can find references for the Master theorem and the recurrence tree method in this post.

Induction method

In one of my previous posts, while explaining sorting algorithms, I analysed time complexity using the induction method.

We find the running time as a function of the input size: we first find how the given input is divided in each recursion and formulate an expression for it. Consider merge sort itself, where each recursive call divides the input into two halves and then merges them in linear time. So the total time complexity of merge sort is

T(n) = T(n/2) + T(n/2) + n
T(n) = 2T(n/2) + n.

You can visualize it from the code below.
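
Below is a minimal merge sort sketch (illustrative code; the merge helper is our own), with the terms of the recurrence marked in the comments: each call makes two recursive calls on halves of the array (the two T(n/2) terms) and then merges the sorted halves in linear time (the + n term).

void mergeSort(int[] array, int low, int high) {
	if (low >= high) {
		return;					// base case: one element, T(1)
	}
	int mid = low + (high - low) / 2;
	mergeSort(array, low, mid);			// T(n/2)
	mergeSort(array, mid + 1, high);		// T(n/2)
	merge(array, low, mid, high);			// linear merge, the + n term
}

void merge(int[] array, int low, int mid, int high) {
	int[] merged = new int[high - low + 1];
	int i = low, j = mid + 1, k = 0;
	while (i <= mid && j <= high) {			// pick the smaller head element
		merged[k++] = (array[i] <= array[j]) ? array[i++] : array[j++];
	}
	while (i <= mid) merged[k++] = array[i++];	// copy leftovers, if any
	while (j <= high) merged[k++] = array[j++];
	System.arraycopy(merged, 0, array, low, merged.length);
}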

To solve the equation using the induction method, we repeatedly substitute n with how it is divided in each recursion; in each recursion of merge sort, n is divided by 2, so we substitute n/2:

T(n) = T(n/2) + T(n/2) + n
T(n) = 2T(n/2) + n
T(n) = n + 2[n/2 + 2T(n/4)]
T(n) = n + n + 4T(n/4)
T(n) = 2n + 4[n/4 + 2T(n/8)]
T(n) = 2n + n + 8T(n/8)
T(n) = 3n + 8T(n/8)

after k substitutions, T(n) = kn + 2^kT(n/2^k)
when 2^k reaches n, n = 2^k
k = log2(n)
so, T(n) = nlog2(n) + nT(1)
T(n) = nlog2(n) + n
T(n) = θ(nlogn).
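
As a quick sanity check (an illustrative snippet, not part of the derivation above), we can evaluate the recurrence directly with T(1) = 1 and compare it against the closed form nlog2(n) + n for powers of two:

public class RecurrenceCheck {
	static long t(long n) {
		if (n <= 1) return 1;			// T(1) = 1
		return 2 * t(n / 2) + n;		// T(n) = 2T(n/2) + n
	}

	public static void main(String[] args) {
		for (long n = 2; n <= 1024; n *= 2) {
			long log2 = 63 - Long.numberOfLeadingZeros(n);	// exact log2 for powers of two
			long closedForm = n * log2 + n;			// nlog2(n) + n
			System.out.println("n = " + n + ": T(n) = " + t(n) + ", nlog2(n) + n = " + closedForm);
		}
	}
}

Both columns print identical values, which confirms the closed form we derived by induction.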

So, personally, I believe the induction method of analysing recursive methods is far better and easier than the Master theorem and the recurrence tree method.

Amortized analysis using potential function

Before going in depth into amortized analysis, I would like to give you a simple analogy. Suppose you want to manage your finances for the month and keep aside some money based on an estimate for an expense such as buying groceries. Say over the last four months you spent 5000, 6000, 5000 and 15000 rupees, and you want to allocate money for next month assuming the worst case, i.e. 15000 rupees, so that you never run short. But in reality your monthly grocery expense will not always be 15000 rupees; you spent that much only once, and such a month is very rare, so budgeting for it every month does not give you a tight bound. This is where amortized analysis comes in handy: it calculates a tighter upper bound. Let’s now understand amortized analysis with respect to analysing algorithms.

Amortized analysis is another technique for analysing an algorithm’s running time. We often use a binary search tree, AVL tree, stack, queue, hash map or other data structure to implement algorithms efficiently, and the running time of the algorithm depends on the running time of each operation on the data structure multiplied by the number of such operations. While calculating the running time of an operation, we tend to choose the worst time taken to perform it. Let’s take the operation of adding elements into a balanced binary search tree (AVL tree) and into a hash map.

In an AVL tree we know that adding an element takes on the order of log(n) in the worst case, which is the upper bound, and multiplying it by M operations gives us M x log(n) time. In the case of a hash map, insertion takes constant time, i.e. O(1), but only until the hash map is nearly full; once it is full, we need to re-hash the entire set of elements, which takes O(n) time. So multiplying by M operations would give us M x N, which we might take as the upper bound. But that is a very pessimistic way of analysing the operations: re-hashing happens very rarely, and basing the analysis on it is a bad idea as it does not give a tight upper bound. Hence amortized analysis is used to give a much tighter bound on the running time of the operations, one that better depicts the actual performance: it gives us the average cost of each operation in the worst case.

There are 3 ways of performing amortized analysis, namely

  1. Aggregate analysis
  2. Accounting or Banker’s method
  3. Potential function

In this post we will focus only on amortized analysis using Potential function.

We define a potential function which assigns a credit to the data structure as a whole, i.e. we define a potential on the current state of the data structure: a potential function Φ maps a state D to a potential Φ(D). For the ith operation, we map the resulting state Di to a real value Φ(Di) and take the difference between the current and the previous state’s potential.

Let C’i be the amortized cost and Ci the actual cost of the ith operation.

Φ : D -> R+
C’i = Ci + Φ(Di) – Φ(Di-1), so for each operation:
C’1 = C1 + Φ(D1) – Φ(D0)
C’2 = C2 + Φ(D2) – Φ(D1)
C’3 = C3 + Φ(D3) – Φ(D2)
.
.
C’n = Cn + Φ(Dn) – Φ(Dn-1)
when we add all the above equations, the intermediate potentials cancel out (the sum telescopes) and we get
∑C’i = ∑Ci + Φ(Dn) – Φ(D0)

i.e. Total amortized cost = Total actual cost + Potential difference

And if the potential difference Φ(Dn) – Φ(D0) is non-negative, then

∑Ci <= ∑C’i, i.e. the total amortized cost gives us an upper bound on the overall cost of the whole sequence of operations.

So what I am trying to say with the above proof is that while analysing an algorithm, we first need to define the potential function, then for each operation calculate the amortized cost and sum them up; the total amortized cost will be slightly greater than (or equal to) the total actual cost, which gives us a tight upper bound.
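
As a worked example (an illustrative sketch, not from any particular library), consider appending elements into a dynamic array that doubles its capacity when full, much like a hash map that re-hashes when it fills up. Using the classic potential function Φ(D) = 2*size – capacity, the snippet below computes C’i = Ci + Φ(Di) – Φ(Di-1) for each append. The actual cost occasionally spikes to size + 1 on a resize, but the amortized cost never exceeds 3:

public class DynamicArrayAmortized {
	public static void main(String[] args) {
		int size = 0, capacity = 0;
		int prevPhi = 2 * size - capacity;			// Φ(D0) = 0
		for (int i = 1; i <= 16; i++) {
			int actualCost = 1;				// writing the new element
			if (size == capacity) {				// full: resize before appending
				actualCost += size;			// copying all existing elements
				capacity = Math.max(1, 2 * capacity);	// double the capacity
			}
			size++;
			int phi = 2 * size - capacity;			// Φ(Di)
			int amortizedCost = actualCost + phi - prevPhi;	// C'i = Ci + Φ(Di) - Φ(Di-1)
			System.out.println("op " + i + ": actual = " + actualCost + ", amortized = " + amortizedCost);
			prevPhi = phi;
		}
	}
}

Every operation prints an amortized cost of at most 3, a small constant, even for the appends whose actual cost is linear in the current size; that constant is the tighter bound amortized analysis promises.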

Let me know in the comment section if you would like me to explain how to define the potential function for specific algorithms and walk through their amortized analysis.
