GRAPH THEORY

For articles on related subjects see ALGORITHMS, ANALYSIS OF; COMPUTATIONAL COMPLEXITY; DATA STRUCTURES; DISCRETE MATHEMATICS; NP-COMPLETE PROBLEMS; and TREE.

A graph is a set of points (commonly called vertices or nodes) in space that are interconnected by a set of lines (called edges). For a graph G, the edge set is denoted by E and the vertex set by V, so that G = (V,E). Common nomenclature denotes the number of Vertices |V| by n and the number of edges |E| by m. Fig. 1 shows a graph G with V = {v1, v2, v3, v4, v5}, E = {e1, e2, e3, e4, e5, e6, e7}, n = 5, and m = 7. If, within E, each edge is specified by its pair of endpoints (e.g. for the example of Fig. 1, e1 is replaced by (v1, v2) etc), the figure can be dispensed with.

If, as in most applications, the values of both n and m are finite, G is said to be a finite graph. The degree of a vertex v (denoted by d(v)) is the number of edges that have v as an endpoint. An elementary theorem (with an easy inductive proof) is that within a finite graph there are always an even number of vertices with odd degree. For example, in the graph of Fig. 1 there are two vertices (v1 and v2) of odd degree (both have degree 3). A self-loop is an edge (vi, vj) where vi = vj. Two edges (vi, vj) and (vr, vs) are parallel edges if vi = vr and vj = vs. A simple graph is a graph without self-loops and without parallel edges. A multigraph contains parallel edges but no self-loops. A path between vertices v1 and vs is a sequence of vertices (v1, v2, v3..., vs) such that . If vi = vs, the path is a circuit (or cycle). If no vertex appears more than once on a path, then the path is a simple path (similarly, a simple circuit passes through any vertex at most once). A component of a graph is defined by stating that a path exists between any pair of vertices if and only if the two vertices belong to the same component of the graph. A graph consisting of a single component is said to be a connected graph. A tree is a connected graph containing no circuits. For any tree T with n vertices and m edges, m=n-1 and there exists precisely one path between any pair of vertices of T.

In many applications it is natural to associate a direction with each edge of the graph. The graph is then said to be a directed graph or digraph. In specifying any edge of a digraph by its end-points, then (by convention) the edge is understood to be directed from the first vertex towards the second. The indegree, d-(v), of any vertex v is the number of edges directed towards v. Similarly, the outdegree, d+(v), of v is the number of edges directed from v. A digraph G is strongly connected if, for any pair of vertices u and v of G, there exists a path from u to v and a path from v to u.

Any subgraph of a graph G can be obtained by removing vertices and edges from G. It is understood that the removal of an edge leaves its endpoints in place, whereas the removal of a vertex necessitates the removal of any edges with that vertex as an endpoint. An articulation point of a connected graph is any vertex whose removal produces a subgraph with two or more components. Any graph with no articulation point is said to be 2-connected. In a 2-connected graph there are at least two vertex-disjoint paths between any pair of vertices. Two paths are vertex disjoint if (apart from the ends of the path) they do not share a vertex. A 2-connected component (sometimes called a block) of a graph G is a maximal subgraph G'of G (maximal in the sense that no additional edges or vertices of G can be added to G') such that there are at least two vertex-disjoint paths between every pair of vertices of G'.

Many applications require a number to be associated with each edge of a graph. Such a graph with associated edge weights is said to be a weighted graph. For any edge (u,v), w(u,v) denotes the edge weight, which is also sometimes called the length of (u,v).

In a complete graph, there is an edge between every pair of vertices. The complete graph with n vertices is denoted by Kn. Fig. 2 shows K3 and K4. In a regular undirected graph, every vertex has the same degree; if this is k, the graph is k-regular. Note that Kn is (n-1)-regular.

If, for a graph G, it is possible to partition the vertex set v into two disjoint subsets, V1 and V2 , such that every edge of G connects a vertex in V1 to a vertex in V2, then G is a bipartite graph. If there is an edge between every vertex of V1 and every edge of V2, then G is said to be a complete bipartite graph, which is denoted by Ki,j where |V1| = i and |V2| = j. Fig. 3 shows two representations of K3,3. In this figure, dots distinguish vertices from edge crossings. The graphs of this figure are said to be isomorphic. Two graphs are isomorphic if there is a one-to-one correspondence between the vertices of one and

the vertices of the other such that the number of edges between any two vertices of one is equal to the number of edges between the corresponding vertices in the other.

Computations involving graphs require that the graph be represented somehow within computer storage. The choice of data structure may have important implications for the complexity of the algorithm. A natural form of representation is provided by a so-called adjacency matrix. An adjacency matrix A is an n X n matrix where A(i,j)=1 if and is 0 otherwise. Such a representation requires O(n2) storage space and consequently requires O(n2) time to initialize. Adjacency matrices are very useful for algorithmic questions concerning paths in graphs. For example, it is not difficult to show that there is a path of length r (i.e. having r edges) from vs to vt if and only if Ar(s,t)=1, where Ar is the r-th matrix product. Another common data structure for graphs is the so-called adjacency list representation. In this representation, for every is a pointer to a list of vertices adjacent to v. Fig. 4 shows the adjacency list representation of the graph of Fig. 1. The adjacency list representation of a graph requires O(n + m) space and thus O(n + m) time to initialize. From the point of view of complexity considerations, this is usually an improvement compared with adjacency matrices.

Graph Algorithms

Many graph algorithms are structured by systematically searching (or traversing) the graph subjected to the algorithm. Consider the following technique for traversing a (connected) graph. Initially mark all vertices as being "unvisited." Now start the search at some arbitrarily chosen vertex and, when visiting any vertex v, proceed as follows. If v has not been previously visited, mark v as "visited." Next, visit an "unvisited" vertex in the adjacency list for v. If no such vertex exists, return to the vertex visited just before v was visited for the first time. The visit terminates when all vertices adjacent to the initial vertex have been visited and the search has returned to the initial vertex.

Such a search is called a depth first search (DFS). A DFS of a graph G has certain useful properties: (a) the set of edges by which vertices are visited for the first time form a tree (called the DFS tree and rooted at the initially visited vertex), (b) edges of G not belonging to the DFS tree connect two nodes, one of which is a descendent of the other in the DFS tree, (c) DFS can be achieved in O(n + m) time. Another frequently employed traversal method is breadth first search (BFS). In a BFS, some arbitrary vertex is visited first, and this vertex is placed on an initially empty queue (first-in, first-out data structure), Q. At any point in the traversal, visit next some vertex v' that is "unvisited" and is in the adjacency list of the vertex v at the head of the queue (if v has not been visited before, add v to the queue). If no such vertex as v' exists, remove v from the queue and repeat the process. The traversal stops when the queue becomes empty. A BFS of G has the following properties: (a) the edges by which vertices are first visited form a BFS tree rooted at the initially visited vertex, (b) those edges of G not in the BFS tree connect vertices, neither of which is an ancestor of the other in the BFS tree, (c) BFS can be achieved in O(n + m) time.

Particular types of traversals of a graph such as those just described have individual properties that make them appropriate for particular algorithmic application. For example, the properties of DFS leads to a classic O(n + m) algorithm to find the blocks of a graph. Similarly, application of DFS to digraphs provides an O(n + m) algorithm to find the strongly connected components of a digraph. In a BFS of an undirected graph, the depth of a vertex v in the BFS tree rooted at v' is precisely the shortest distance from v to v' in the original graph. This leads to an O(n + m) algorithm for finding distances between a single vertex and all the other vertices in a graph (which is the so-called single source shortest paths problem). Such a search, however, is not useful when shortest paths in weighted graphs are required. In such graphs, distances are measured as the sum of the edge weights (rather than in numbers of edges) on a path, and there are several classic algorithms for the single source shortest paths problem (such as Dijkstra's algorithm, which operates in O(n2) time with simple data structures that can be improved to O(m+nlogn), using sophisticated structures). There now follows a catalog of commonly occuring graph problems with some indication of their algorithmic complexities. Algorithms for these problems can be found in the references or in the primary sources cited therein. Many problems concerning graphs are intractable, being NP-complete (q.v.). (Garey and Johnson, 1979).

An Eulerian circuit of a graph is a circuit that contains every edge of the graph precisely once. Of course, not every graph contains an Eulerian circuit. In fact, a necessary and sufficient condition for a connected, undirected graph to contain an Eulerian circuit is that every vertex of the graph is of even degree. If the graph is a directed connected graph, it contains an Eulerian circuit (such a circuit traces each edge in the sense that it is directed) if and only if for each vertex v, d+(v)=d-(v). Eulerian circuits can be found in O(m) time. Eulerian circuits, for the special class of graphs that contain them, are solutions of the Chinese Postman problem. Given a weighted graph or digraph, the Chinese Postman problem is to find a (not necessarily simple) circuit of shortest length (the length is given by , where w(e) is the weight of e and r(e) is the number of occurrences of e in the circuit) that traverses each edge of the graph at least once. Every connected undirected graph contains a solution to the Chinese Postman problem, whereas a connected digraph has a solution if and only if the digraph is strongly connected. There are low-order polynomial-time algorithms for the Chinese Postman problem for both undirected and directed connected graphs.

A Hamiltonian circuit of a connected graph is a circuit that passes through each vertex precisely once. Such a circuit can be defined for both directed and undirected graphs (of course, in the case of digraphs, edges are traversed in the same sense as they are directed). Not every graph contains a Hamiltonian circuit and (unlike the case for Eulerian circuits) there seems to be no polynomial time test for whether such a circuit exists for a given graph. In fact, the problem of determining whether a graph contains a Hamiltonian circuit is a classic NP-complete problem. If the graph is weighted, the problem of finding a shortest Hamiltonian circuit is one variation of the well-known Traveling Salesman problem. Another variant is to find the shortest circuit that passes through each vertex at least once. In this second form, a solution to the problem always exists, whereas for the former specification the graph of course has to contain a Hamiltonian circuit. In both these forms, the Traveling Salesman problem is NP-complete.

A (transport) network is a connected digraph in which one vertex x (called the source) has d+(x)>0 and one vertex y (the sink) has d-(y)>0. A flow of the network associates a weight f(e) with each edge e such that, for all vertices v other than x or y, . Clearly, a network is a model for the flow of material leaving a single departure point (the source) and arriving at a single point (the sink). The equation ensures a conservation of flow at all other vertices. It is usual to associate (apart from f(e)) another parameter (called the capacity of e and denoted by c(e)) with each edge, which is the maximum value that f(e) can attain. Thus, for all e. For a network N, F(N) denotes the value of the flow that is defined to be the net flow leaving the source: . A standard problem in network theory is to find a set of f(e) such that F(N) is a maximum. This is called the maximum flow problem. The problem of finding a maximum flow has a low-order, polynomial-time solution. The value of such a flow is given by the classic theorem of Ford and Fulkerson (called the max-flow, min-cut theorem), which states that F(N) has a maximum value equal to the minimum capacity of all cuts of the network. A cut of the network is a minimal set of edges whose removal from the network separates the network into two components, one containing x and the other containing y. The capacity of the cut is the sum of the capacities of those edges in the cut that are directed from the component containing x and directed towards the component containing y. Many other problems can be posed for networks. For example, if yet another parameter (called the cost of e and denoted by a(e)) is associated with each edge, the minimum cost flow problem is to find (for a given value F(N)) a set of f(e) such that the sum is minimized. Again, there are low-order, polynomial-time algorithms to solve this problem.

Planar graphs are an important subclass of graphs. A graph is planar if it can be arranged on a planar surface (the arrangement is called an embedding) so that, at most, one vertex occupies or, at most, one edge passes through any point of the plane. There exist algorithms that can test whether a graph is planar (such algorithms normally generate an embedding) in O(n) time.

Scheduling and time-tabling problems, among others, generate problems equivalent to coloring graphs. A vertex coloring of a graph is the assignment of a color to each vertex of the graph in such a way that no two adjacent vertices have the same color (vertices are adjacent if there is an edge connecting them). Normally, the interest is to find a vertex coloring that employs a minimum number of colors, this number is called the vertex-chromatic index. Similarly, an edge coloring is an assignment of colors to the edges in such a way that no two edges sharing an endpoint have the same color. The minimum number of colors required to do this is the edge-chromatic index. Problems involving the coloring of graphs are notoriously intractable. For example, the problems of determining whether the vertex- or the edge-chromatic index are less than some fixed constant are both NP-complete.

A matching of a graph is a subset of its edges such that no two elements of the subset have a common endpoint. The question of determining matchings arises in many guises. Common problems concern finding maximum cardinality matchings (such a matching has a maximum number of elements) and maximum weight matchings (which occur for weighted graphs; the sum of the edge weights of such a matching is maximized). Efficient (polynomial-time) algorithms are known for both of these problems.

A spanning tree of a connected graph G is a connected circuitless subgraph of G that contains every vertex of G. The connector problem, a classic problem of graph theory, is to find a spanning tree of a weighted connected graph such that the sum of the edge weights of the tree is a minimum. Such a solution is also called a minimum weight spanning tree. Spanning trees can be found in O(m + n) time (e.g. a depth-first-search tree). Prim's or Kruskal's algorithms provide classic solutions to the connector problem at low-order polynomial-time cost.

The vertex-connectivity Kv(G) of a connected graph G is the minimum number of vertices that have to be removed from G in order to produce a subgraph of two or more components. If a graph has Kv(G)=k, there are k vertex disjoint paths between every pair of vertices. The edge-connectivity Ke(G) is similarly defined. Also, if Ke(G)=k, there are k edge disjoint paths between every pair of vertices. There exist polynomial-time solutions to the problems of finding Kv(G) and Ke(G) for an arbitrary graph G.

Given two graphs G1 and G2, many applications of graphs require one of the following two problems to be solved. (a) Is G1 isomorphic to G2? (This is called the graph isomorphism problem.) (b) Is G1 isomorphic to a subgraph of G2? (This is the subgraph isomorphism problem.) Both of these problems are notoriously costly to solve, and no polynimial-time solution is known for either if G1 and G2 are arbitrary graphs.

Given the intractability of many problems in graph theory, it is natural that this area has given rise to the development of many approximation algorithms. An approximation algorithm is an algorithm that runs in polynomial-time but that provides an approximation (within known bounds) to the problem in hand. A classic example is provided by Christofides' algorithm, which finds a solution to the Traveling Salesman problem that guarantees that the solution found is no more than a factor of 3/2 longer than an optimum solution. While this approximation may not seem to be very tight, no approximation algorithm is presently known that provides a better guarantee. Another example provides an approximation for the edge-chromatic index of a graph and is the polynomial-time algorithm implicit in the proof of Vizing's theorem. This algorithm gives an edge-coloring, using no more than one more color than is necessary, a very good approximation. Unless there are polynomial-time solutions for the NP-complete problems (an unlikely possibility), it has been proved that there can be no polynomial-time solution that gives a vertex coloring of an arbitrary graph that guarantees to use less than twice the minimum number of colors required.

Bibliography ALAN M. GIBBONS