Algorithm Partition(a, m, p)
// Within a[m..p-1] the elements are rearranged so that if initially v = a[m],
// then after completion a[q] = v for some q between m and p-1, a[k] <= v for
// m <= k < q, and a[k] >= v for q < k < p. q is returned. a[p] = infinity.
{
    v := a[m]; i := m; j := p;
    repeat
    {
        repeat
            i := i + 1;
        until (a[i] >= v);
        repeat
            j := j - 1;
        until (a[j] <= v);
        if (i < j) then Interchange(a, i, j);
    } until (i >= j);
    a[m] := a[j]; a[j] := v;
    return j;
}
Algorithm Interchange(a, i, j)
// Exchange a[i] with a[j].
{
    p := a[i]; a[i] := a[j]; a[j] := p;
}
Algorithm QuickSort(p, q)
// Sorts the elements a[p], ..., a[q], which reside in the global array a[1:n],
// into ascending order; a[n+1] is considered to be defined and must be
// >= all the elements in a[1:n].
{
    if (p < q) then   // if there is more than one element
    {
        j := Partition(a, p, q + 1);   // j is the position of the partitioning element
        QuickSort(p, j - 1);
        QuickSort(j + 1, q);
    }
}
Left part: A[0] A[1] ... A[K-1]
Pivot element: A[K]
Right part: A[K+1] A[K+2] ... A[N-1]
Step 2: Conquer: Sort the left part of the array, A[0] A[1] ... A[K-1], recursively. Sort the right part of the array, A[K+1] A[K+2] ... A[N-1], recursively.

To simplify the design, assume that a very large value is stored at the end of the array; this is achieved by storing ∞ in a[n]. Apart from a, low and high, the other variables used are:
• i : the initial value of index i is low, i.e., i = low.
• j : the initial value of index j is one more than high, i.e., j = high + 1.
• pivot : a[low] is treated as the pivot element.

The general procedure to partition the array is:
• Keep incrementing the index i as long as pivot >= a[i]. This is achieved using the statement: do i := i + 1 while (pivot >= a[i]);
• Once the above condition fails, keep decrementing the index j as long as pivot <= a[j]. This is achieved using the statement: do j := j - 1 while (pivot <= a[j]);
• Once the above condition fails, if i is less than j, exchange a[i] with a[j] and repeat the whole process as long as i <= j.

Performance of Quick sort

The running time of quick sort depends on whether the partition is balanced or unbalanced, which in turn depends on which elements of the array are used for partitioning. A very good partition splits the array into two equal-sized parts. A bad partition, on the other hand, splits the array into two parts of very different sizes. The worst partition puts only one element in one part and all the other elements in the other part. If the partitioning is balanced, quick sort runs asymptotically as fast as merge sort; if the partitioning is unbalanced, quick sort runs asymptotically as slowly as insertion sort.

Best case

The best thing that could happen in quick sort is that each partitioning stage divides the array exactly in half. In other words, the best case is for the pivot to be the median of the keys in A[p .. r] every time the procedure Partition is called, so that Partition always splits the array into two equal-sized parts.
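The partitioning scheme just described can be sketched in runnable form. The following is an illustrative Python version (the function names are ours; instead of the ∞ sentinel in a[n] it simply bounds-checks the forward scan of i):

```python
def partition(a, low, high):
    """Partition a[low..high] around the pivot a[low]: scan i rightwards
    past elements <= pivot, scan j leftwards past elements > pivot, swap
    out-of-place pairs, then drop the pivot into its final slot a[j]."""
    pivot = a[low]
    i, j = low, high + 1
    while True:
        i += 1
        while i <= high and a[i] <= pivot:   # i advances while pivot >= a[i]
            i += 1
        j -= 1
        while a[j] > pivot:                  # j retreats while a[j] > pivot
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]              # exchange the out-of-place pair
    a[low], a[j] = a[j], a[low]              # pivot reaches its sorted position
    return j

def quicksort(a, low, high):
    """Sort a[low..high] in place: partition, then recurse on both parts."""
    if low < high:
        k = partition(a, low, high)
        quicksort(a, low, k - 1)    # left part: elements <= pivot
        quicksort(a, k + 1, high)   # right part: elements >= pivot
```

Calling `quicksort(data, 0, len(data) - 1)` sorts the list `data` in place.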
If the procedure Partition produces two regions of size n/2, the recurrence relation is then:
T(n) = T(n/2) + T(n/2) + Θ(n) = 2T(n/2) + Θ(n)

and from case 2 of the Master theorem, T(n) = Θ(n lg n).
Worst case

Let T(n) be the worst-case time for QUICKSORT on an input of size n. We have the recurrence

T(n) = max_{1 ≤ q ≤ n-1} (T(q) + T(n-q)) + Θ(n)          ... (1)

where q runs from 1 to n-1, since the partition produces two regions, each having size at least 1. Now we guess that T(n) ≤ cn² for some constant c. Substituting our guess into equation (1), we get

T(n) ≤ max_{1 ≤ q ≤ n-1} (cq² + c(n-q)²) + Θ(n)
     = c · max_{1 ≤ q ≤ n-1} (q² + (n-q)²) + Θ(n)

Since the second derivative of the expression q² + (n-q)² with respect to q is positive, the expression achieves its maximum over the range 1 ≤ q ≤ n-1 at one of the endpoints. This gives the bound

max_{1 ≤ q ≤ n-1} (q² + (n-q)²) ≤ 1² + (n-1)² = n² - 2(n-1).

Continuing with our bounding of T(n), we get

T(n) ≤ c[n² - 2(n-1)] + Θ(n) = cn² - 2c(n-1) + Θ(n).

Since we can pick the constant c large enough that the 2c(n-1) term dominates the Θ(n) term, we have

T(n) ≤ cn².

Thus the worst-case running time of quick sort is T(n) = Θ(n²).
Average-case analysis

If the split induced by RANDOMIZED_PARTITION puts a constant fraction of the elements on each side of the partition, then the recurrence tree has depth Θ(lg n) and Θ(n) work is performed at each of these Θ(lg n) levels. This is an intuitive argument for why the average-case running time of RANDOMIZED_QUICKSORT is Θ(n lg n).

Let T(n) denote the average time required to sort an array of n elements. A call to RANDOMIZED_QUICKSORT with a 1-element array takes constant time, so we have T(1) = Θ(1). After the split, RANDOMIZED_QUICKSORT calls itself to sort two subarrays. The average time to sort the array A[1 .. q] is T(q) and the average time to sort the array A[q+1 .. n] is T(n-q). We have

T(n) = (1/n) [T(1) + T(n-1) + Σ_{q=1}^{n-1} (T(q) + T(n-q))] + Θ(n)          ... (2)

We know from the worst-case analysis that T(1) = Θ(1) and T(n-1) = O(n²). Hence

T(n) = (1/n) (Θ(1) + O(n²)) + (1/n) Σ_{q=1}^{n-1} (T(q) + T(n-q)) + Θ(n)
     = (1/n) Σ_{q=1}^{n-1} (T(q) + T(n-q)) + Θ(n)          ... (3)
     = (2/n) Σ_{k=1}^{n-1} T(k) + Θ(n)

since the (1/n)(Θ(1) + O(n²)) term is absorbed into Θ(n), and each T(k) appears twice in the sum, once as T(q) and once as T(n-q).

Solve the above recurrence using the substitution method. Assume inductively that T(n) ≤ a n lg n + b for some constants a > 0 and b > 0, picked large enough that a·1·lg 1 + b = b ≥ T(1). Then for n > 1 we have

T(n) ≤ (2/n) Σ_{k=1}^{n-1} (a k lg k + b) + Θ(n)
     = (2a/n) Σ_{k=1}^{n-1} k lg k + (2b/n)(n-1) + Θ(n)          ... (4)

At this point we claim that

Σ_{k=1}^{n-1} k lg k ≤ (1/2) n² lg n - (1/8) n².

Substituting this claim into equation (4), we get

T(n) ≤ (2a/n) [(1/2) n² lg n - (1/8) n²] + (2b/n)(n-1) + Θ(n)
     ≤ a n lg n - (a/4) n + 2b + Θ(n)
     = a n lg n + b + (Θ(n) + b - (a/4) n).

In the above expression, Θ(n) + b and (a/4)n are both linear in n, and we certainly can choose a large enough that the (a/4)n term dominates Θ(n) + b. We conclude that T(n) ≤ a n lg n + b, so QUICKSORT's average running time is O(n lg n).
Conclusion: Quick sort is an in-place sorting algorithm whose worst-case running time is Θ(n²) and whose expected running time is Θ(n lg n), where the constants hidden in Θ(n lg n) are small.
5.4 BINARY SEARCH

Suppose we are given a number of integers stored in an array A, and we want to locate a specific target integer K in this array. If we do not have any information on how the integers are organized in the array, we have to sequentially examine each element of the array. This is known as linear search and has a time complexity of O(n) in the worst case. However, if the elements of the array are ordered, say in ascending order, and we wish to find the position of the target integer K in the array, we need not make a sequential search over the complete array. We can make a faster search using the binary search method. The basic idea is to start with an examination of the middle element of the array. This leads to 3 possible situations:
• If it matches the target K, the search terminates successfully by printing out the index of the element in the array.
• If K < A[middle], further search is limited to the elements to the left of A[middle].
• If K > A[middle], further search is limited to the elements to the right of A[middle].
If all elements are exhausted and the target is not found in the array, the method returns a special value such as -1. Here is one version of the binary search function:
Algorithm: BinarySearch (int A[ ], int n, int K)
{
    L = 0; R = n - 1;
    while (L <= R)
    {
        Mid = (L + R)/2;
        if (K == A[Mid]) return Mid;
        else if (K > A[Mid]) L = Mid + 1;
        else R = Mid - 1;
    }
    return -1;
}

Analysis of binary search

Best case: The best case occurs when the item to be searched for is present in the middle of the array, so the total number of comparisons required is 1. Therefore, the time complexity of binary search in the best case is given by Tbest(n) = Ω(1).

Worst case: This case occurs when the key to be searched for is either at the first position or at the last position in the array. In such situations the maximum number of element comparisons is required, and the time complexity is given by

t(n) = 1                  if n = 1
t(n) = t(n/2) + 1         otherwise

This recurrence relation can be solved using repeated substitution as shown below:

t(n) = t(n/2) + 1
     = t(n/2²) + 1 + 1        (replacing n by n/2)
     = t(n/2³) + 1 + 2        (replacing n by n/2 again)
     ...

In general, t(n) = i + t(n/2^i). Finally, to reach the initial condition t(1), let 2^i = n; then t(n) = i + t(1), where t(1) = 0, so t(n) = i. From n = 2^i, taking logarithms on both sides, i · log 2 = log n, so i = log₂ n. So the worst-case time complexity is given by T(n) ∈ Θ(log₂ n).

Advantages of binary search
• Simple technique
• Very efficient searching technique

Disadvantages of binary search
• The array should be sorted.
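The pseudocode above translates directly into Python; the following is an illustrative version (0-based indexing, integer floor division for the midpoint):

```python
def binary_search(a, key):
    """Return the index of key in the sorted list a, or -1 if absent."""
    left, right = 0, len(a) - 1
    while left <= right:
        mid = (left + right) // 2
        if a[mid] == key:
            return mid          # found: report the position
        elif key > a[mid]:
            left = mid + 1      # key can only lie in the right half
        else:
            right = mid - 1     # key can only lie in the left half
    return -1                   # search space exhausted
```

For example, on the array from the unit-end exercise, `binary_search([3, 14, 27, 31, 39, 42, 55, 70, 74, 81, 85, 93, 98], 31)` returns 3.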
Check your progress
1. What is sorting?
2. Write an algorithm for merge sort and explain its working.
3. Write an algorithm for quick sort and calculate its best-case, worst-case and average-case time complexity.
4. Explain the working of binary search with an example. Write the algorithm and give its time complexity.

5.5
SUMMARY

Merge sort is a divide and conquer algorithm. It works by dividing an array into two halves, sorting them recursively, and then merging the two sorted halves to get the original array sorted. The algorithm's time efficiency is the same in all cases, i.e. Θ(n log n). Quick sort is a divide and conquer algorithm that works by partitioning its input's elements according to their value relative to some preselected element. Quick sort is noted for its superior efficiency among n log n algorithms for sorting randomly ordered arrays, but also for its quadratic worst-case efficiency. Binary search is an O(log n) algorithm for searching in sorted arrays. It is a typical example of an application of the divide and conquer technique because it needs to solve just one problem of half the size on each of its iterations.

ANSWERS TO CHECK YOUR PROGRESS
1. 1.1
2. 1.2
3. 1.3
4. 1.4

UNIT-END EXERCISES AND ANSWERS
10. a) What is the largest number of key comparisons made by binary search in searching for a key in the following array?
{ 3, 14, 27, 31, 39, 42, 55, 70, 74, 81, 85, 93, 98 }
b) List all the keys of this array that will require the largest number of key comparisons when searched for by binary search.
11. Apply quick sort to the list A N A L Y S I S in alphabetical order.
12. Apply the merge sort algorithm to sort A L G O R I T H M in alphabetical order. Is merge sort a stable algorithm?

Answers: SEE
1. 1.4
2. 1.3
3. 1.2

5.6
SUGGESTED READINGS
1. Introduction to The Design and Analysis of Algorithms by Anany Levitin
2. Analysis and Design of Algorithms with C/C++, 3rd edition, by Prof. Nandagopalan
MODULE-3, UNIT 3:
GREEDY TECHNIQUE

Structure
4.0 Objectives
1.1 Introduction
1.1.1 Concept of greedy method
1.2 Optimization Problems
1.3 Summary
1.4 Key words
1.5 Answers to check your progress
1.6 Unit-end exercises and answers
1.7 Suggested readings
6.0 OBJECTIVES
At the end of this unit you will be able to
• Find how to apply the greedy technique
• Identify whether a problem can be solved using the greedy technique
• Know how to find single source shortest paths
• Construct a Huffman tree and generate Huffman codes

6.1
INTRODUCTION

Greedy algorithms are simple and straightforward. They are short-sighted in their approach in the sense that they take decisions on the basis of information at hand, without worrying about the effect these decisions may have in the future. They are easy to invent, easy to implement and most of the time quite efficient. However, many problems cannot be solved correctly by the greedy approach. Greedy algorithms are used to solve optimization problems.
1.1.1 Concept of Greedy method

A greedy algorithm works by making the decision that seems most promising at any moment; it never reconsiders this decision, whatever situation may arise later. As an example, consider the problem of "Making Change". Coins available are:
• dollars (100 cents)
• quarters (25 cents)
• dimes (10 cents)
• nickels (5 cents)
• pennies (1 cent)

Problem: Make change for a given amount using the smallest possible number of coins.

Informal algorithm:
• Start with nothing.
• At every stage, without passing the given amount, add the largest coin available to the coins already chosen.
Formal algorithm: make change for n units using the least possible number of coins.

MAKE-CHANGE (n)
{
    C ← {100, 25, 10, 5, 1}   // constants
    S ← {}                    // set that holds the solution
    sum ← 0
    while sum ≠ n do
    {
        x ← the largest item in C such that sum + x ≤ n
        if no such item then return "No solution"
        S ← S ∪ {a coin of value x}
        sum ← sum + x
    }
    return S
}

Example: Make change for 2.89 (289 cents). Here n = 289 and the solution contains 2 dollars, 3 quarters, 1 dime and 4 pennies. The algorithm is greedy because at every stage it chooses the largest coin without worrying about the consequences. Moreover, it never changes its mind: once a coin has been included in the solution set, it remains there.
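The MAKE-CHANGE procedure can be sketched in Python as follows (an illustrative version; it returns the coins as a list, since a plain set cannot hold repeated coin values):

```python
def make_change(n, coins=(100, 25, 10, 5, 1)):
    """Greedy change-making: repeatedly add the largest coin that does
    not pass the amount n. Returns the coins used, or None if stuck."""
    solution, total = [], 0
    while total != n:
        fitting = [c for c in coins if total + c <= n]
        if not fitting:
            return None          # no such item: the greedy choice fails
        x = max(fitting)         # largest coin that still fits
        solution.append(x)
        total += x
    return solution
```

For instance, `make_change(289)` yields two dollars, three quarters, one dime and four pennies, matching the example above.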
1.1.2 Characteristics and Features of Problems Solved by Greedy Algorithms

To construct the solution in an optimal way, the algorithm maintains two sets: one contains the chosen items and the other contains the rejected items. A greedy algorithm consists of four functions:
1. A function that checks whether a chosen set of items provides a solution.
2. A function that checks the feasibility of a set.
3. The selection function, which tells which of the candidates is the most promising.
4. An objective function, which does not appear explicitly, but gives the value of a solution.
Structure of a Greedy Algorithm
• Initially the set of chosen items (the solution set) is empty.
• At each step:
  o An item is selected using the selection function.
  o IF the set would no longer be feasible, reject the item under consideration (it is never considered again).
  o ELSE IF the set is still feasible, THEN add the current item.
1.1.3 Definition of feasibility

A feasible set (of candidates) is promising if it can be extended to produce not merely a solution, but an optimal solution to the problem. In particular, the empty set is always promising (because an optimal solution always exists). Unlike dynamic programming, which solves the subproblems bottom-up, a greedy strategy usually progresses in a top-down fashion, making one greedy choice after another, reducing each problem to a smaller one.
Greedy-Choice Property

The "greedy-choice property" and "optimal substructure" are the two ingredients in a problem that lend it to a greedy strategy. The greedy-choice property says that a globally optimal solution can be arrived at by making a locally optimal choice.
6.2 OPTIMIZATION PROBLEMS
1.2.1 Huffman Codes

Huffman code is a technique for compressing data. Huffman's greedy algorithm looks at the frequency of occurrence of each character and represents each character as a binary string in an optimal way.

Example: Suppose we have data consisting of 100,000 characters that we want to compress. The characters in the data occur with the following frequencies:

Character:  a       b       c       d       e      f
Frequency:  45,000  13,000  12,000  16,000  9,000  5,000
Consider the problem of designing a "binary character code" in which each character is represented by a unique binary string.

Fixed-length code

A fixed-length code needs 3 bits to represent the six characters:

Character:          a       b       c       d       e      f
Frequency:          45,000  13,000  12,000  16,000  9,000  5,000
Fixed-length code:  000     001     010     011     100    101

This method requires 300,000 bits to code the entire file. How do we get 300,000?
• The total number of characters is 45,000 + 13,000 + 12,000 + 16,000 + 9,000 + 5,000 = 100,000.
• Each character is assigned a 3-bit codeword => 3 × 100,000 = 300,000 bits.
Conclusion: The fixed-length code requires 300,000 bits while the variable-length code requires 224,000 bits, a saving of approximately 25%.
Prefix codes

In a prefix code, no codeword is a prefix of another codeword. The reason prefix codes are desirable is that they simplify encoding (compression) and decoding.

Can we do better? A variable-length code can do better by giving frequent characters short codewords and infrequent characters long codewords:

Character:             a       b       c       d       e      f
Frequency:             45,000  13,000  12,000  16,000  9,000  5,000
Variable-length code:  0       101     100     111     1101   1100
Character 'a' occurs 45,000 times; each 'a' is assigned a 1-bit codeword: 1 × 45,000 = 45,000 bits. Characters b, c, d occur 13,000 + 12,000 + 16,000 = 41,000 times; each is assigned a 3-bit codeword: 3 × 41,000 = 123,000 bits. Characters e, f occur 9,000 + 5,000 = 14,000 times; each is assigned a 4-bit codeword: 4 × 14,000 = 56,000 bits. This implies that the total number of bits is 45,000 + 123,000 + 56,000 = 224,000.

Encoding: concatenate the codewords representing each character of the file.
For example, with the code T = 10, E = 00, A = 010, S = 011, N = 110:

String  Encoding
TEA     10 00 010
SEA     011 00 010
TEN     10 00 110

Example: From the variable-length code table above, we code the 3-character file "abc" as:

a → 0, b → 101, c → 100   =>   0·101·100 = 0101100
Decoding

Since no codeword is a prefix of another, the codeword that begins an encoded file is unambiguous. To decode (translate back to the original characters), identify the initial codeword, remove it from the encoded file, and repeat. For example, using the variable-length codeword table, the string 001011101 parses uniquely as 0·0·101·1101, which decodes to "aabe".

The "decoding process" can be represented as a binary tree whose leaves are the characters. We interpret the binary codeword for a character as the path from the root to that character, where 0 means "go to the left child" and 1 means "go to the right child". Note that an optimal code for a file is always represented by a full binary tree.
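The greedy construction of the Huffman tree — repeatedly merging the two least-frequent subtrees, labeling left branches 0 and right branches 1 — can be sketched with Python's heapq module. The exact codewords may differ from the table above when frequencies tie, but the total encoded length is the same:

```python
import heapq

def huffman_codes(freq):
    """freq: {character: frequency}. Returns {character: codeword}.
    Each heap entry is (subtree frequency, tie-breaker, codes built so far)."""
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least-frequent subtrees
        f2, i, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}          # left edge: 0
        merged.update({ch: "1" + code for ch, code in right.items()})   # right edge: 1
        heapq.heappush(heap, (f1 + f2, i, merged))
    return heap[0][2]
```

For the frequencies above (in thousands), the resulting code gives 'a' a 1-bit codeword and a total cost of 224, i.e. 224,000 bits for the 100,000-character file.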
1.2.2 Dijkstra's Algorithm (single-source shortest path algorithm)

Dijkstra's algorithm solves the single-source shortest-path problem when all edges have non-negative weights. It is a greedy algorithm. The algorithm starts at the source vertex s and grows a tree T that ultimately spans all vertices reachable from s. Vertices are added to T in order of distance: first s, then the vertex closest to s, then the next closest, and so on. The following implementation assumes that the graph G is represented by adjacency lists.

Problem statement: find the shortest distance from a single source to all the other vertices.

Algorithm: DIJKSTRA (G, w, s)
1. INITIALIZE-SINGLE-SOURCE (G, s)
2. S ← { }   // S will ultimately contain the vertices with final shortest-path weights from s
3. Initialize the priority queue Q, i.e., Q ← V[G]
4. while priority queue Q is not empty do
5.     u ← EXTRACT-MIN(Q)   // pull out a new vertex
6.     S ← S ∪ {u}
       // perform relaxation for each vertex v adjacent to u
7.     for each vertex v in Adj[u] do
8.         RELAX(u, v, w)
Example: step-by-step operation of Dijkstra's algorithm.

Step 1. Given the initial graph G = (V, E), all nodes have infinite cost except the source node s, which has cost 0.
Step 2. First we choose the node closest to the source node s. We initialize d[s] to 0, add s to S, and relax all nodes adjacent to s, updating the predecessor of every node whose distance was updated.

Step 3. Choose the closest node, x, and relax all nodes adjacent to x, updating the predecessors of nodes u, v and y.

Step 4. Now node y is the closest node, so add it to S and relax node v, adjusting its predecessor.
Step 5. Now node u is closest; choose it and adjust its neighbour node v.

Step 6. Finally, add node v. The predecessor list now defines the shortest path from each node to the source node s.
Analysis: Like Prim's algorithm, Dijkstra's algorithm runs in O(E lg V) time.
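An illustrative Python version of Dijkstra's algorithm using a binary heap. Instead of the explicit EXTRACT-MIN over a decrease-key queue, it pushes improved estimates and lazily skips stale entries, a common practical substitute:

```python
import heapq

def dijkstra(graph, s):
    """graph: {u: [(v, weight), ...]} adjacency lists, non-negative weights.
    Returns {vertex: shortest distance from s} for every reachable vertex."""
    dist = {s: 0}
    pq = [(0, s)]                       # (distance estimate, vertex)
    finished = set()                    # the set S of the pseudocode
    while pq:
        d, u = heapq.heappop(pq)        # EXTRACT-MIN
        if u in finished:
            continue                    # stale entry: u already finalized
        finished.add(u)
        for v, w in graph.get(u, []):   # relax every edge (u, v)
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```

The adjacency lists in the test below form a small illustrative instance, not the graph from the (omitted) diagrams above.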
Check your progress
1. Explain the greedy method of problem solving with an example.
2. Write a note on Huffman coding.
3. Write an algorithm for the single-source shortest path problem using the greedy technique and explain it with an example.

1.3 SUMMARY

The greedy technique suggests constructing a solution to an optimization problem through a sequence of steps, each expanding a partially constructed solution obtained so far, until a complete solution to the problem is reached. On each step, the choice made must be feasible, locally optimal, and irrevocable. Dijkstra's algorithm solves the single-source shortest path problem of finding shortest paths from a given vertex (the source) to all the other vertices of a weighted graph or digraph. Huffman code is an optimal prefix-free variable-length encoding scheme that assigns bit strings to characters based on their frequencies in a given text. This is accomplished by a greedy construction of a binary tree whose edges are labeled with 0's and 1's.

1.4
KEYWORDS
1. Greedy technique – a method (approach) of problem solving.
2. Huffman's tree – a binary tree generated by the Huffman algorithm, having each left child edge labeled 0 and each right child edge labeled 1.

1.5 ANSWERS TO CHECK YOUR PROGRESS
1. 1.1
2. 1.2
3. 1.2

1.6 UNIT-END EXERCISES AND ANSWERS
1. a. Compare fixed-length encoding with variable-length encoding.
   b. Prove that variable-length encoding is better than fixed-length encoding.
   c. Can Huffman encoding be used for data compression? Defend your answer with an example.
2. Discuss how Dijkstra's algorithm belongs to the greedy technique, with an example.
Answers: SEE 1. 1.3 2. 1.3

1.7
SUGGESTED READINGS
1. Introduction to The Design and Analysis of Algorithms by Anany Levitin
2. Analysis and Design of Algorithms with C/C++, 3rd edition, by Prof. Nandagopalan
3. Analysis and Design of Algorithms by Padma Reddy
MODULE-3, UNIT 3:
APPLICATIONS OF GREEDY METHOD

Structure
5.0 Objectives
1.1 Introduction
1.2 Container loading problem
1.3 0/1 Knapsack problem
1.4 Minimum cost spanning tree algorithms
1.6 Summary
1.7 Key words
1.8 Answers to check your progress
1.9 Unit-end exercises and answers
1.10 Suggested readings
7.0 OBJECTIVES
At the end of this unit you will be able to
• Solve container loading and knapsack problems
• Find minimum cost spanning trees using Prim's and Kruskal's algorithms
• Identify the difference between a spanning tree of a graph and a minimum spanning tree

7.1
INTRODUCTION
• The greedy method is the most straightforward design technique.
• As the name suggests, greedy algorithms are short-sighted in their approach, taking decisions on the basis of the information immediately at hand, without worrying about the effect these decisions may have in the future.

DEFINITION:
• A problem with n inputs will have some constraints; any subset of the inputs that satisfies these constraints is called a feasible solution.
• A feasible solution that either maximizes or minimizes a given objective function is called an optimal solution.
7.2 CONTAINER LOADING PROBLEM
The container loading problem is very similar to the knapsack problem, and also to another interesting problem called the packing problem. The container loading problem is stated as follows: we have equal-size containers to be loaded onto a cargo, and in turn the cargo is to be loaded onto a ship. Each container has a weight wi, and the cargo has a maximum capacity of C units. The objective of this problem is to load the ship with the maximum number of containers. Let xi be a variable taking values 0 or 1; a 1 indicates that container i is to be loaded, and a 0 means it should not be. Formally, we can define the problem as:

Maximize Σ_{i=1}^{n} xi
subject to the constraint Σ_{i=1}^{n} wi·xi ≤ C, with xi ∈ {0, 1}.

Greedy strategy: In this problem there is, fortunately, no profit to be considered in the constraints. Since the objective is to load the maximum number of containers, the greedy strategy we use is: include the containers from lowest to highest weight (i.e., in ascending order of weights), so that we can pack more containers.

Example: Consider a container loading instance with n = 7, {w1, w2, w3, w4, w5, w6, w7} = {90, 190, 40, 80, 140, 40, 10} and C = 300.

Sol: When the containers are arranged in ascending order of their weights, we get
weights: {10, 40, 40, 80, 90, 140, 190}
loaded:  { 1,  1,  1,  1,  1,   0,   0}
Therefore, in the original order, the solution is {1, 0, 1, 1, 0, 1, 1}.
Total number of containers = 5; total weight = 260.
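The greedy strategy can be sketched directly in Python (an illustrative version; it returns the 0/1 vector in the original container order):

```python
def container_loading(weights, capacity):
    """Load containers in ascending order of weight while capacity lasts.
    Returns (x, count): x[i] = 1 if container i is loaded, else 0."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    x, remaining = [0] * len(weights), capacity
    for i in order:
        if weights[i] > remaining:
            break                 # every later container is at least as heavy
        x[i] = 1
        remaining -= weights[i]
    return x, sum(x)
```

On the example instance, `container_loading([90, 190, 40, 80, 140, 40, 10], 300)` loads 5 containers with total weight 260.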
7.3 KNAPSACK PROBLEM

Statement: A thief robbing a store can carry a maximum weight of W in his knapsack. There are n items; the ith item weighs wi and is worth vi dollars. Which items should the thief take?
There are two versions of the problem.

Fractional knapsack problem: The setup is the same, but the thief can take fractions of items, meaning that the items can be broken into smaller pieces, so the thief may decide to carry only a fraction xi of item i, where 0 ≤ xi ≤ 1.
• Exhibits the greedy-choice property, so a greedy algorithm exists.
• Exhibits the optimal substructure property.

0-1 knapsack problem: The setup is the same, but the items may not be broken into smaller pieces, so the thief may decide either to take an item or to leave it (a binary choice), but may not take a fraction of an item.
• Does not exhibit the greedy-choice property, so no greedy algorithm exists.
• Exhibits the optimal substructure property, so a dynamic programming algorithm exists.
1.3.1 0/1 Knapsack Problem using Dynamic Programming
• We are given n objects and a knapsack (bag) with capacity M; object i has a weight wi, where i varies from 1 to n.
• The problem is to fill the bag with the help of the n objects so that the resulting profit is maximum. Formally, the problem can be stated as: maximize Σ xi·pi subject to Σ xi·wi ≤ M, where xi is the fraction of object i taken and lies between 0 and 1.
• There are many ways to solve this problem, giving many feasible solutions, among which we have to find the optimal solution. The greedy algorithm below generates only one solution, which is both feasible and optimal.
• First, find the profit/weight ratio of each object and sort the objects in descending order of this ratio.
• Select the object with the highest p/w ratio and check whether its weight is less than the remaining capacity of the bag. If so, place the whole object and decrement the capacity of the bag by the weight of the object placed.
• Repeat the above step until the capacity of the bag becomes less than the weight of the selected object; in this case place a fraction of that object and stop.
Note that this greedy procedure solves the fractional version of the problem; the 0/1 version, where fractions are not allowed, is handled by the dynamic programming formulation that follows.
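The greedy procedure just outlined is, in effect, the fractional-knapsack greedy. An illustrative Python sketch (items sorted by descending profit/weight ratio, with at most one fractional item):

```python
def fractional_knapsack(weights, profits, capacity):
    """Returns (total profit, fractions x in the original item order)."""
    n = len(weights)
    # Sort item indices by profit/weight ratio, highest first.
    order = sorted(range(n), key=lambda i: profits[i] / weights[i], reverse=True)
    x, total, remaining = [0.0] * n, 0.0, capacity
    for i in order:
        if weights[i] <= remaining:
            x[i] = 1.0                      # whole object fits: take it all
            remaining -= weights[i]
            total += profits[i]
        else:
            x[i] = remaining / weights[i]   # take only a fraction of this object
            total += profits[i] * x[i]
            break
    return total, x
```

On the illustrative instance M = 20, p = (25, 24, 15), w = (18, 15, 10) — our own instance, not one from the text — the greedy takes item 2 whole and half of item 3, for a profit of 31.5.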
The most common formulation of the problem is the 0-1 knapsack problem, which restricts the number xi of copies of each kind of item to zero or one. Mathematically, the 0-1 knapsack problem can be formulated as:

maximize Σ_{i=1}^{n} vi·xi
subject to Σ_{i=1}^{n} wi·xi ≤ W, xi ∈ {0, 1}

The bounded knapsack problem restricts the number xi of copies of each kind of item to a maximum integer value ci. Mathematically, the bounded knapsack problem can be formulated as:

maximize Σ_{i=1}^{n} vi·xi
subject to Σ_{i=1}^{n} wi·xi ≤ W, xi ∈ {0, 1, ..., ci}
Algorithm: Knapsack(n, m, w, p, v)
// Input: n – number of objects, m – capacity of the knapsack,
//        w – weights of the objects, p – profits of the objects
// Output: v – table whose entry v[n, m] is the optimal profit
for i ← 0 to n do
    for j ← 0 to m do
        if (i = 0 or j = 0) then
            v[i, j] ← 0
        else if (w[i] > j) then
            v[i, j] ← v[i-1, j]
        else
            v[i, j] ← max(v[i-1, j], v[i-1, j-w[i]] + p[i])
        end if
    end for
end for

Algorithm: ObjectsSelected(n, m, w, v, x)
// Input: n – number of objects, m – capacity of the knapsack,
//        w – weights of the objects, v – the table computed above
// Output: x – indicates which objects are selected
for i ← 1 to n do
    x[i] ← 0
end for
i ← n; j ← m
while (i ≠ 0 and j ≠ 0) do
    if (v[i, j] ≠ v[i-1, j]) then
        x[i] ← 1
        j ← j - w[i]
    end if
    i ← i - 1
end while
for i ← 1 to n do
    if (x[i] = 1) then
        write "object i selected"
    end if
end for

Example: Given some items, pack the knapsack to get the maximum total value. Each item has a weight and a value, and the total weight that we can carry is no more than some fixed number; here the maximum capacity is M = 5.

Item  Weight  Value
1     2       12
2     1       10
3     3       20
4     2       15
Sol: The optimal profit obtained is 37, by selecting objects 1, 2 and 4.
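The two pseudocode procedures above (table construction plus traceback) can be sketched together in Python; this illustrative version returns the optimal profit and the 1-based indices of the selected objects:

```python
def knapsack(weights, profits, m):
    """0/1 knapsack by dynamic programming; m is the capacity."""
    n = len(weights)
    v = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if weights[i - 1] > j:          # object i cannot fit in capacity j
                v[i][j] = v[i - 1][j]
            else:                           # skip object i, or take it
                v[i][j] = max(v[i - 1][j],
                              v[i - 1][j - weights[i - 1]] + profits[i - 1])
    # Trace back through the table to recover the selected objects.
    selected, j = [], m
    for i in range(n, 0, -1):
        if v[i][j] != v[i - 1][j]:          # object i was taken
            selected.append(i)
            j -= weights[i - 1]
    return v[n][m], sorted(selected)
```

On the example instance above, `knapsack([2, 1, 3, 2], [12, 10, 20, 15], 5)` yields profit 37 with objects 1, 2 and 4.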
1.4 MINIMUM COST SPANNING TREE ALGORITHMS

A spanning tree of a graph is any tree that includes every vertex of the graph. A little more formally, a spanning tree of a graph G is a subgraph of G that is a tree and contains all the vertices of G. An edge of a spanning tree is called a branch; an edge in the graph that is not in the spanning tree is called a chord. We construct spanning trees whenever we want to find a simple, cheap and yet efficient way to connect a set of terminals (computers, cities, factories, etc.).

Minimum spanning tree

A minimum spanning tree (MST) of a weighted graph G is a spanning tree of G whose edge-weight sum is minimum. In other words, an MST is a tree formed from a subset of the edges of a given undirected graph, with two properties:
• it spans the graph, i.e., it includes every vertex of the graph;
• it is minimum, i.e., the total weight of all its edges is as low as possible.

Let G = (V, E) be a connected, undirected graph, where V is the set of vertices (nodes) and E is the set of edges. Each edge has a given non-negative length.
1. PRIM'S ALGORITHM

This algorithm was first proposed by Jarník, but is typically attributed to Prim. It starts from an arbitrary vertex (the root) and at each stage adds a new branch (edge) to the tree already constructed; the algorithm halts when all the vertices in the graph have been reached. This strategy is greedy in the sense that at each step the partial spanning tree is augmented with an edge that is the smallest among all possible adjacent edges.

Example: start from an arbitrary vertex (the root); at each stage, add a new branch (edge) to the tree already constructed, halting when all the vertices in the graph have been reached.
Algorithm Prim(E, cost, n, t)
// E is the set of edges in G; cost[1:n, 1:n] holds the edge costs;
// n is the number of vertices; t[1:n-1, 1:2] returns the tree edges;
// the cost of the minimum spanning tree is returned.
{
    Let (k, l) be an edge of minimum cost in E;
    mincost := cost[k, l];
    t[1, 1] := k; t[1, 2] := l;
    for i := 1 to n do   // initialize near[]
        if (cost[i, l] < cost[i, k]) then near[i] := l;
        else near[i] := k;
    near[k] := near[l] := 0;
    for i := 2 to n - 1 do
    {
        // Find the next edge to add to the tree.
        Let j be an index such that near[j] ≠ 0 and cost[j, near[j]] is minimum;
        t[i, 1] := j; t[i, 2] := near[j];
        mincost := mincost + cost[j, near[j]];
        near[j] := 0;
        for k := 1 to n do   // update near[]
            if ((near[k] ≠ 0) and (cost[k, near[k]] > cost[k, j])) then
                near[k] := j;
    }
    return mincost;
}

• Prim's algorithm starts with a tree that includes only a minimum-cost edge of G.
• Then edges are added to the tree one by one: the next edge (i, j) to be added is such that i is a vertex already in the tree, j is a vertex not yet included, and the cost cost[i, j] is minimum among all such edges.
• The working of Prim's algorithm is illustrated by the following diagrams.
Steps 1–6: (diagrams showing the partial spanning tree after each edge is added)
Analysis: The algorithm spends most of its time finding the smallest edge, so its running time basically depends on how we search for this edge. Straightforward method: find the smallest edge by searching the adjacency lists of the vertices in V. In this case, each iteration costs O(m) time, yielding a total running time of O(mn).
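A heap-based Python sketch of Prim's algorithm (an illustrative variant: instead of the near[] array of the pseudocode it keeps all crossing edges in a priority queue and discards those whose far endpoint is already in the tree):

```python
import heapq

def prim(n, edges):
    """Vertices are 0..n-1; edges is a list of (u, v, cost) for an
    undirected graph. Returns (mincost, list of tree edges)."""
    adj = {u: [] for u in range(n)}
    for u, v, c in edges:
        adj[u].append((c, u, v))
        adj[v].append((c, v, u))
    in_tree = [False] * n
    in_tree[0] = True                     # grow the tree from vertex 0
    heap = adj[0][:]
    heapq.heapify(heap)
    mincost, tree = 0, []
    while heap and len(tree) < n - 1:
        c, u, v = heapq.heappop(heap)     # cheapest edge leaving the tree
        if in_tree[v]:
            continue                      # both endpoints already in the tree
        in_tree[v] = True
        mincost += c
        tree.append((u, v))
        for e in adj[v]:                  # new crossing edges from v
            if not in_tree[e[2]]:
                heapq.heappush(heap, e)
    return mincost, tree
```

The test graph below is a small illustrative instance, not the one from the omitted diagrams.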
2. KRUSKAL’S ALGORITHM:
In Kruskal's algorithm the selection function chooses edges in increasing order of length, without worrying too much about their connection to previously chosen edges, except that an edge is never chosen if it would form a cycle. The result is a forest of trees that grows until all the trees in the forest (all the components) merge into a single tree.
• In this algorithm, a minimum-cost spanning tree T is built edge by edge.
• Edges are considered for inclusion in T in increasing order of their cost.
• An edge is included in T if it does not form a cycle with the edges already in T.
• To find the minimum cost spanning tree, edges are inserted into the tree in increasing order of their cost.
Algorithm Kruskal(E, cost, n, t)
// E is the set of edges in G; G has n vertices; cost[u, v] is the cost of
// edge (u, v); t is the set of edges in the minimum-cost spanning tree;
// the final cost is returned.
{
    Construct a heap out of the edge costs;
    for i := 1 to n do parent[i] := -1;   // each vertex is in its own set
    i := 0; mincost := 0.0;
    while ((i < n - 1) and (heap not empty)) do
    {
        Delete a minimum-cost edge (u, v) from the heap;
        j := Find(u); k := Find(v);
        if (j ≠ k) then
        {
            i := i + 1;
            t[i, 1] := u; t[i, 2] := v;
            mincost := mincost + cost[u, v];
            Union(j, k);
        }
    }
    if (i ≠ n - 1) then write("No spanning tree");
    else return mincost;
}

Analysis
• The worst-case time complexity of this minimum cost spanning tree algorithm is O(E log E), where E is the edge set of G.
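An illustrative Python version of Kruskal's algorithm, with a small union-find (disjoint set) structure standing in for the Find/Union calls of the pseudocode:

```python
def kruskal(n, edges):
    """Vertices are 0..n-1; edges is a list of (cost, u, v).
    Returns (mincost, tree edges), or None if G has no spanning tree."""
    parent = list(range(n))               # each vertex starts in its own set

    def find(x):                          # set representative, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mincost, tree = 0, []
    for c, u, v in sorted(edges):         # edges in increasing order of cost
        ru, rv = find(u), find(v)
        if ru != rv:                      # different trees: edge closes no cycle
            parent[ru] = rv               # union the two trees
            mincost += c
            tree.append((u, v))
    if len(tree) != n - 1:
        return None                       # the graph was not connected
    return mincost, tree
```

Sorting the edge list plays the role of the heap in the pseudocode; both give the edges in increasing order of cost.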
Example: step-by-step operation of Kruskal's algorithm.

Step 1. In the graph, the edge (g, h) is shortest. Either vertex g or vertex h could be the representative; let's choose vertex g arbitrarily.
Step 2. The edge (c, i) creates the second tree. Choose vertex c as representative for second tree.
Step 3. Edge (g, f) is the next shortest edge. Add this edge and choose vertex g as representative.
Step 4. Edge (a, b) creates a third tree.
Step 5. Add edge (c, f) and merge two trees. Vertex c is chosen as the representative.
Step 6. Edge (g, i) is the next cheapest, but if we added this edge a cycle would be created, since g and i are already in the same tree (vertex c is the representative of both).
Step 7. Instead, add edge (c, d).
Step 8. Edge (h, i) would make a cycle, so it is skipped.
Step 9. Instead of adding edge (h, i) add edge (a, h).
Step 10. Again, adding edge (b, c) would create a cycle. Add edge (d, e) instead to complete the spanning tree. In this spanning tree all the trees are joined and vertex c is the sole representative.
Check your progress
1. Explain the container loading problem with an example.
2. Discuss the 0/1 knapsack problem with an example.
3. Write Prim's algorithm for minimum spanning tree generation. Explain it with an example.
4. Write Kruskal's algorithm for minimum spanning tree generation. Explain it with an example.

SUMMARY:
The container loading problem is to load as many containers as possible onto the ship, subject to specified constraints. The knapsack problem is to select those of the n objects that yield the most profit, such that the sum of the selected objects' weights does not exceed the knapsack capacity. Prim's and Kruskal's are two algorithms to generate minimum spanning trees, so that one can traverse the whole graph with less cost.

1.13 KEYWORDS
1. Minimum spanning tree – it is a tree consisting of all the vertices of the graph with very few edges, such that the total weight of the spanning tree is minimum.

1.14
ANSWERS TO CHECK YOUR PROGRESS
1. 1.2
2. 1.3
3. 1.4 (1)
4. 1.4 (2)

1.7 UNIT-END EXERCISES AND ANSWERS
1. List out the differences between Prim's algorithm and Kruskal's algorithm.
2. Compare the container loading problem with the knapsack problem.
Answers: SEE
1. 1.4 (1 & 2)
2. 1.2 & 1.3

1.12
SUGGESTED READINGS
1. Introduction to The Design and Analysis of Algorithms by Anany Levitin
2. Analysis and Design of Algorithms with C/C++, 3rd edition, by Prof. Nandagopalan
3. Analysis and Design of Algorithms by Padma Reddy
4. Even, Shimon, "Graph Algorithms", Computer Science Press.
MODULE 4, UNIT 1
INTRODUCTION TO GRAPHS

Structure
1.0 Objectives
1.1 Graphs as data structures
1.2 Graph representation
• Adjacency matrix
• Adjacency list
1.3 Depth First Search (DFS) traversal
1.4 Summary
1.5 Keywords
1.6 Answers to check your progress
1.7 Unit-end exercises and answers
1.8 Suggested readings
1.0 OBJECTIVES

At the end of this unit you will be able to:
• Represent a graph in a computer using an adjacency matrix or an adjacency list.
• Identify which method of representing a graph is better, and when.
• Traverse a graph using the DFS traversal and analyze its time complexity.

1.1 GRAPHS AS DATA STRUCTURES

1.1.1 Introduction to graphs
Graphs are a widely used structure in computer science and in many computer applications. Note that we say structure here rather than data structure: graphs are meant to store and analyze metadata, the connections present in data. For instance, consider the cities in your country. The road network that connects them can be represented as a graph and then analyzed. We can examine whether one city can be reached from another, or find the shortest route between two cities.

First of all, we introduce some definitions on graphs. Next, we show how graphs are represented inside a computer. Then you can turn to basic graph algorithms.

There are two important sets of objects which specify a graph and its structure. The first set is V, which is called the vertex set. In the road network example, the cities are the vertices.
Each vertex can be drawn as a circle with vertex's number inside.
vertices

The next important set is E, which is called the edge set. E is a subset of V × V. Simply speaking, each edge connects two vertices, including the case when a vertex is connected to itself (such an edge is called a loop). All graphs are divided into two big groups: directed and undirected graphs. The difference is that edges in directed graphs, called arcs, have a direction. These kinds of graphs have much in common with each other, but significant differences are also present. We will indicate which kind of graph is considered in each particular algorithm description. An edge can be drawn as a line; if the graph is directed, each line has an arrow.
undirected graph
directed graph
Now, we present some basic graph definitions.

A sequence of vertices, such that there is an edge from each vertex to the next in the sequence, is called a path. The first vertex in the path is called the start vertex; the last vertex in the path is called the end vertex. If the start and end vertices are the same, the path is called a cycle. A path is called simple if it includes every vertex only once. A cycle is called simple if it includes every vertex, except the start (end) one, only once. Let's see examples of a path and a cycle.
path (simple)
cycle (simple)
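The path and cycle definitions above can be stated as small predicate functions. This is an illustrative sketch for undirected graphs; the function names are our own, not from the text.

```python
def is_path(seq, edges):
    """True if each consecutive pair of vertices in seq is joined by an edge."""
    und = {frozenset(e) for e in edges}        # undirected edge set
    return all(frozenset((a, b)) in und for a, b in zip(seq, seq[1:]))

def is_simple_path(seq, edges):
    """A simple path includes every vertex only once."""
    return is_path(seq, edges) and len(set(seq)) == len(seq)

def is_cycle(seq, edges):
    """A cycle is a path whose start and end vertices are the same."""
    return is_path(seq, edges) and len(seq) > 1 and seq[0] == seq[-1]
```

For edges [(1, 2), (2, 3), (3, 1)], the sequence [1, 2, 3] is a simple path and [1, 2, 3, 1] is a cycle.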
The last definition we give here is that of a weighted graph. A graph is called weighted if every edge is associated with a real number, called the edge weight. For instance, in the road network example, the weight of each road may be its length, or the minimal time needed to drive along it.
weighted graph
1.2 Graph representation

There are several possible ways to represent a graph inside the computer. We will discuss two of them: the adjacency matrix and the adjacency list.

a) Adjacency matrix
Each cell aij of an adjacency matrix contains 1 if there is an edge between the ith and jth vertices, and 0 otherwise. Before discussing the advantages and disadvantages of this kind of representation, let us see an example.
The graph presented in the example is undirected, which means its adjacency matrix is symmetric. Indeed, in an undirected graph, if there is an edge (2, 5) then there is also an edge (5, 2). This is also the reason why there are two cells for every edge in the sample. Loops, if they are allowed in a graph, correspond to the diagonal elements of the adjacency matrix.

Advantages. The adjacency matrix is very convenient to work with. Adding (removing) an edge can be done in O(1) time, and the same time is required to check if there is an edge between two vertices. It is also very simple to program, and in all our graph tutorials we work with this kind of representation.

Disadvantages.
• The adjacency matrix consumes a huge amount of memory for storing big graphs. All graphs can be divided into two categories, sparse and dense. Sparse graphs contain few edges (the number of edges is much less than the square of the number of vertices, E << V²); dense graphs contain a number of edges comparable with the square of the number of vertices. The adjacency matrix is optimal for dense graphs, but for sparse ones it is superfluous.
• The next drawback of the adjacency matrix is that in many algorithms you need to know the edges adjacent to the current vertex. To draw out such information from the adjacency matrix you have to scan the corresponding row, which results in O(V) complexity. For algorithms like DFS, or those based on it, use of the adjacency matrix results in an overall complexity of O(V²), while it can be reduced to O(V + E) when using an adjacency list.
• The last disadvantage we want to draw your attention to is that the adjacency matrix requires huge effort for adding/removing a vertex. If the graph is used for analysis only, this does not matter, but if you want to construct a fully dynamic structure, use of an adjacency matrix makes it quite slow for big graphs.

To sum up, the adjacency matrix is a good solution for dense graphs, which implies having a constant number of vertices.

b) Adjacency list
This kind of graph representation is one of the alternatives to the adjacency matrix. It requires less memory and, in particular situations, can even outperform the adjacency matrix. For every vertex, the adjacency list stores a list of the vertices adjacent to it. Let us see an example.
1: 4
2: 4 5
3: 5
4: 2 5
Graph
Adjacency list
Advantages. The adjacency list allows us to store a graph in a more compact form than the adjacency matrix, although the difference decreases as the graph becomes denser. Another advantage is that the adjacency list allows us to get the list of adjacent vertices in O(1) time, which is a big advantage for some algorithms.

Disadvantages.
• Adding/removing an edge to/from an adjacency list is not as easy as for an adjacency matrix. It requires, on average, O(E / V) time, which may result in cubic complexity for dense graphs when adding all edges.
• Checking if there is an edge between two vertices can be done in O(E / V) when the list of adjacent vertices is unordered, or O(log₂(E / V)) when it is sorted. This operation stays quite cheap.
• The adjacency list does not allow an efficient implementation if the number of vertices changes dynamically. Adding a new vertex can be done in O(V), but removal results in O(E) complexity.
Conclusion: The adjacency list is a good solution for sparse graphs and lets us change the number of vertices more efficiently than an adjacency matrix. But there are still better solutions for storing fully dynamic graphs.
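To make the comparison concrete, here is a sketch of both representations for one small undirected graph; the vertex labels and edges are illustrative.

```python
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
n = 4

# Adjacency matrix: matrix[i][j] == 1 iff there is an edge (i+1, j+1)
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u - 1][v - 1] = 1
    matrix[v - 1][u - 1] = 1     # undirected: the matrix is symmetric

# Adjacency list: for every vertex, the list of vertices adjacent to it
adj = {v: [] for v in range(1, n + 1)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

# Edge test is O(1) with the matrix but scans a list with adj;
# enumerating the neighbors of v is O(V) with the matrix, O(deg(v)) with adj.
```

This mirrors the trade-off above: the matrix favors edge queries on dense graphs, the list favors neighbor enumeration on sparse ones.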
1.3 Algorithms associated with graphs and their time complexities

1.3.1 Depth-first search (DFS) for undirected graphs
Depth-first search, or DFS, is a way to traverse the graph. By itself it only visits the vertices of the graph, but there are hundreds of graph algorithms which are based on DFS. Therefore, understanding the principles of depth-first search is quite important for moving ahead into graph theory. The principle of the algorithm is quite simple: go forward (in depth) while there is such a possibility, otherwise backtrack.
Algorithm

Algorithm DFSTraversal(G)
//Implements a depth-first search traversal of a given graph
//Input: Graph G = <V, E>
//Output: Graph G with its vertices marked with consecutive integers in the order
//they have been first encountered by the DFS traversal
mark each vertex in V with 0 as a mark of being "unvisited"
count ← 0
for each vertex v in V do
    if v is marked with 0
        dfs(v)
//end DFSTraversal

Routine dfs(v)
//visits recursively all the unvisited vertices connected to vertex v and assigns
//them the numbers in the order they are encountered via global variable count
count ← count + 1
mark v with count
for each vertex w in V adjacent to v do
    if w is marked with 0
        dfs(w)
//end dfs
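The pseudocode translates almost line for line into Python. This sketch numbers the vertices with a shared count, as in the algorithm; the sample graph in the usage note is illustrative.

```python
def dfs_traversal(vertices, adj):
    """Mark vertices with consecutive integers in DFS first-visit order.

    vertices: iterable of vertex labels.
    adj: dict mapping each vertex to the list of vertices adjacent to it.
    Returns a dict vertex -> visit number (1, 2, ...).
    """
    mark = {v: 0 for v in vertices}  # 0 marks a vertex as "unvisited"
    count = 0

    def dfs(v):
        nonlocal count
        count += 1
        mark[v] = count              # number v in order of first encounter
        for w in adj[v]:
            if mark[w] == 0:
                dfs(w)

    for v in vertices:
        if mark[v] == 0:             # restart DFS in every component
            dfs(v)
    return mark
```

For the graph with edges (1,4), (2,4), (2,5), (3,5), (4,5), scanning vertices from 1, the visit order is 1, 4, 2, 5, 3.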
In DFS, each vertex has three possible colors representing its state:
white: the vertex is unvisited;
gray: the vertex is in progress;
black: DFS has finished processing the vertex.
NB: For most algorithms the boolean classification unvisited/visited is quite enough, but we show the general case here.
Initially all vertices are white (unvisited). DFS starts at an arbitrary vertex and runs as follows:
1. Mark vertex u as gray (visited).
2. For each edge (u, v), where v is white, run depth-first search from v recursively.
3. Mark vertex u as black and backtrack to the parent.

Example: Traverse the graph shown below using DFS. Start from the vertex numbered 1.
Source graph.
Mark vertex 1 as gray.
There is an edge (1, 4) and vertex 4 is unvisited. Go there.
Mark the vertex 4 as gray.
There is an edge (4, 2) and vertex 2 is unvisited. Go there.
Mark the vertex 2 as gray.
There is an edge (2, 5) and vertex 5 is unvisited. Go there.
Mark the vertex 5 as gray.
There is an edge (5, 3) and vertex 3 is unvisited. Go there.
Mark the vertex 3 as gray.
There is no way to go from vertex 3. Mark it as black and backtrack to vertex 5.

There is an edge (5, 4), but vertex 4 is gray.

There is no way to go from vertex 5. Mark it as black and backtrack to vertex 2.
There are no more edges adjacent to vertex 2. Mark it as black and backtrack to vertex 4.

There is an edge (4, 5), but vertex 5 is black.

There are no more edges adjacent to vertex 4. Mark it as black and backtrack to vertex 1.
There are no more edges adjacent to vertex 1. Mark it as black. DFS is over.
As you can see from the example, DFS does not go through all edges. The vertices and edges which depth-first search has visited form a tree. This tree contains all vertices of the graph (if it is connected) and is called the graph's spanning tree. The tree exactly corresponds to the recursive calls of DFS. If a graph is disconnected, DFS won't visit all of its vertices; for details, see the connected components algorithm.

Complexity analysis
Assume the graph is connected. Depth-first search visits every vertex in the graph and checks every edge once. Therefore, DFS complexity is O(V + E). As mentioned before, if an adjacency matrix is used for the graph representation, then the edges adjacent to a vertex cannot be found efficiently, which results in O(V²) complexity.
Check your progress
1. Write an algorithm for DFS traversal and analyze its complexity.
2. What are the different ways of representing a graph? Explain them with an example.
3. What are the advantages and disadvantages of the adjacency matrix and adjacency list methods of representing a graph?

1.4 SUMMARY:
A graph can be represented in two ways, i.e., the adjacency matrix and the adjacency list methods.
The adjacency matrix is a good solution for dense graphs and the adjacency list is good for sparse graphs. Depth-first search (DFS) is an algorithm for traversing or searching a tree, tree structure, or graph. One starts at the root (selecting some node as the root in the graph case) and explores as far as possible along each branch before backtracking.

1.5 KEYWORDS
Graph: a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges.
Digraph: a graph with directions on its edges.

1.6
ANSWERS TO CHECK YOUR PROGRESS
1. 1.3.1
2. 1.2
3. 1.2

1.7 UNIT-END EXERCISES AND ANSWERS
1. Apply a DFS traversal for a graph having the adjacency matrix
Matrix 1
2. a) Write the equivalent graph for the above matrix (i.e., Matrix 1).
   b) Represent Matrix 1 as an adjacency list.
3. Write a note on path, weighted graph, cycle, loop.
Answers: SEE
1. 1.3.1
2. 1.2
3. 1.1

1.8
SUGGESTED READINGS
1. Introduction to The Design and Analysis of Algorithms by Anany Levitin
2. Analysis and Design of Algorithms with C/C++, 3rd edition, by Prof. Nandagopalan
3. Analysis and Design of Algorithms by Padma Reddy
4. Even, Shimon, "Graph Algorithms", Computer Science Press.
5. Data Structures, Algorithms and Applications in C++, 2nd edition, by Sartaj Sahni.