Version 2.2.11

Handouts:

Examples

Graph drawings on this page were made using the dot and neato programs of the open source Graphviz package from AT&T.

An undirected graph

An undirected graph

A directed graph

A directed graph

Man dressing activities, for topological sort

Man dressing activities, for topological sort

Graph search: the graph to be searched

Graph search: the graph to be searched

Graph search: the search tree

Graph search: the search tree

Weighted graph no. 1

Weighted graph no. 1

Part A, Pages 407–422

Highlights:

Definition. A (directed or undirected) graph consists of a set of vertices, V, and a set of (directed or undirected) edges, E; each edge connects two vertices.

Conventionally, the vertices are represented as dots or nodes, and the edges as lines connecting them.

Some examples of graph representations of information:

Some typical graph problems:

In an undirected graph, there may be no more than one edge connecting a given pair of vertices. In an undirected graph, there may be no more than two edges connecting a given pair, one edge in each direction.

Something graph-like with more edges is often called a multigraph, but the term is vague.

Each edge connects two different vertices.

Drake allows graphs to have self loops, i.e., an edge between a vertex and itself, but these are normally not considered graphs. They may be called pseudographs.

15.1 Basic Graph Terms

Graphs may contain data in their vertices or in their edges or both. For example, in a city-highway graph, cities (vertices) have names, and highway segments (edges) have distances. Or the vertices and edges may be "bare".

Complexity

The complexity (running time) of graph algorithms depends on the number of vertices v and the number of edges e.

Trees

Trees are a special case of graphs.

15.2 Representations

Focusing on the bare "dots and lines" kind of graph, there are two main types of representation: adjacency lists and adjacency matrices.

In both representations we assign each node an identifying number which is used as an array index to retrieve the node.

A graph can also be represented as a pair of sets. Since a graph is a set of vertices with a set of edges, it would be straightforward to use any of the implementations of the set data type to store the two sets. However, other forms of representation are typically more efficient.

Adjacency List Representation

We use a one-dimensional array representing the vertices, and each vertex has a adjacency list (Drake calls it neighbor list) — a list (or set, array, bit vector) of its neighbors (or their node numbers). For a directed graph, the neighbors are those on the receiving end of an edge from the vertex in question.

The space required is Θ(v + e), and the time to check whether an edge exists between two particular vertices is Θ(e).

Adjacency Matrix Representation

An adjacency matrix (Drake: neighbor matrix) is a two-dimensional array of boolean values. Call it "edges" (Though maybe it's better to call it "graph" or "adjacent"). Edges[i][j] is true if there is an edge between vertices i and j (in an undirected graph), or an edge from i to j (in a directed graph).

The space required is Θ(v2), and the time to check whether an edge exists between i and j, is Θ(1).

Comparison

Adjacency matrix: faster test for "does this edge exist?"

Adjacency matrices are okay for dense graphs with a static set of vertices.

Sparsely connected graphs (where e is much less than v2) are better with the adjacency list representation, since a matrix wastes storage for the edges that are not there (Θ(v2) compared to Θ(v + e)).

Dynamic (changing) graphs: inserting or deleting a vertex is awkward (at best!) with a matrix representation, although inserting or deleting an edge is not difficult.

For small graphs, either a contiguous or linked implementation should do well. For large graphs with the adjacency list representation, we should replace the adjacency lists by an efficient implementation of sets, such as balanced search trees or hash tables.

Variations

  1. A directed graph might use a variation of the adjacency list representation, using two adjacency lists for each vertex: one for the neighbors by an edge to the neighbor, and one for the neighbors by an edge from the neighbor.
  2. Data associated with vertices can be stored in a supplementary structure, such as an array or map.
  3. Data associated with edges. If there is a number associated with each edge, the number is called the weight or cost of the edge; the graph is called a weighted graph or a network. These numbers often represent the "cost" of traversing an edge, for example, the distance of a road segment, or the price of an airline ticket.

The Graph Class

Figures 15–11 and 15–12, pp. 414–416.

Exercises, pp. 412–413.

Graph traversal, commonly presented in data structures courses and texts, is boring. The object seems to be just to visit as many nodes as possible, either all of the nodes, or at least all those connected to some starting point. If it's just to visit every node, we can easily iterate through the array of vertices (in the adjacency list representation) or through the rows, representing vertices, in the adjacency matrix representation. There is no need for anything as sophisticated as depth- or breadth-first.

Graph search is a much more interesting problem. Here we are given a starting vertex and a desired destination vertex (goal), and try to find a path connecting the start to the goal. When we reach the goal, we stop, because there is no point in visiting all those other vertices.

These are brute force searching algorithms: they search the graph exhaustively, without using any "intelligence" to eliminate unlikely paths.

Generalized DFS/BFS Procedure

Here's the general procedure for DFS or BFS. It uses a generalized COLLECTION of pending nodes, with operations ADD and REMOVE. The COLLECTION might be a stack or a queue or something else. In case the graph is cyclic, we need to avoid going around in circles, so we maintain a set of nodes already visited.

Version 1

Returns true if there is a path from start to goal; otherwise returns false

def search (graph g, vertex start, vertex goal):

  pending = make empty COLLECTION of vertices 
  visited = make empty set of vertices
  node = start 

  ADD node to pending

  while pending is not empty:
      node = REMOVE a vertex from pending
      if node is equal to goal:
          return true 
      else if node is not in visited: 
          add node to visited 
          for each n in neighbors of node: 
              ADD n to pending 
          endfor
      endif
  endwhile

  (assert: pending is empty) 
  return false
Version 2

Returns a path from start to goal, if there is one; otherwise returns FAILURE.

def search (graph g, vertex start, vertex goal):

  pending = make empty COLLECTION of (vertex, path) pairs 
  visited = make empty set of vertices
  node = start 
  path = [start] 

  ADD (node, path) to pending

  while pending is not empty:
      (node, path) = REMOVE an element from pending 
      if node is equal to goal: 
          return path 
      else if node is not in visited: 
          add node to visited
          for each n in neighbors of node: 
              ADD (n, extend path to n) to pending 
          endfor 
      endif
  endwhile

  (assert: pending is empty)
  return FAILURE

Examples

Comparison

  1. If there are many paths to the goal, then DFS may find it sooner than BFS.
  2. BFS stays as close to the origin as possible; therefore it finds the shortest path, whereas DFS might find a longer one.
  3. In "branchy" graphs, BFS requires more memory than DFS.
  4. Both of these are exhaustive, "brute force" search techniques, and the search space they explore is large. They are not feasible for large-scale problems. To deal with large problems, we need a way to focus our search on "most promising" paths.

Java Implementations


Part B, Pages 422–437

Highlights:

15.4 Topological sorting

Graphs, Relations, and Partial Orders

Intuitively, present the graph data type as a partial order and a sequence A → B → … → N as a total order.

(Skim over the details of partial and total order)

Topological Sort Algorithm

The algorithm for topological sort will also tell us if the graph is cyclic, i.e., it does not represent a partial order.

Examples: apply topsort to

Efficiency:

Another application of topological sorting is in Kruskal's algorithm, which we will see later (?), to detect when adding an edge would create a cycle in the graph.

The GNU tsort program

15.5 Shortest paths

Dijkstra's algorithm

Preliminaries

Pseudocode of Dijkstra's Algorithm

Inputs: a weighted graph g, a designated source vertex s.

Outputs: a list of minimum cost paths from s to each other vertex in g.

Procedure:

1.  Create result as an array of length v
    of distances (costs) from s to
    each vertex.  Initialize each element as
    result[s] = 0; result[v] = infinite, if v != s.

    This array is our initial estimate of mincost.
    When finished, it will be an exact representation of mincost.

2.  Create visited, a set of vertices that are visited.
    This can be done as a bitmap set or an array
    of boolean with one element per vertex.
    Initialize each element as visited[v] = false.
    (Implicit initialization in Drake's code.)

3.  Repeat v times:

    a.  x = least cost unvisited vertex,
        i.e., the vertex x such that visited[x] = false and
        for every vertex y, result[x] <= result[y] or visited[y].
        There may be more than one such vertex; take any one of them.

    b.  visited[x] = true

    c.  for EACH vertex y,
            result[y] = min(result[y],
            result[x] + directcost(x, y))

Comments:

Example: apply Dijkstra's algorithm to the weighted graph example …

Floyd-Warshall algorithm

15.6 Minimum Spanning Trees

Definitions:

Prim's algorithm

  1. Start with any vertex. Connect it to the nearest distinct vertex.
  2. Find the unconnected vertex that is closest to a connected vertex, then connect those two vertices. Repeat until all vertices have been connected.

This is called a greedy algorithm, because its action at each step involves choosing what looks best from the restricted viewpoint of that step. Yet it leads to an overall optimal outcome!

Example: ...

Kruskal's algorithm

(Compare with Drake's exposition)

Inputs: a weighted graph G

Output: a minimum spanning tree for G.

Procedure:

  1. Initialize result as an unconnected graph (all the vertices of G, none of the edges).
  2. Collect the edges into a sorted list or a priority queue, ordered by edge cost, E.
  3. Repeat:
    1. d = remove lowest-cost edge from E.
    2. If d does not join edges that are already connected (equivalently, if it does not create a cycle), then add d to the result.
    until there are v − 1 edges in result
  4. Return result

Remarks:

14.2 Disjoint Set Clusters

To Do

(Future improvements)


  1. Revision log:
    • Version 2.2.1, 2012 Nov 7. Corrected maximum number of edges.
    • Version 2.2, 2010 Nov 8. Converted to markdown, ...
    • Version 2.1, 2009 Nov 12. Better formatting; added Java depth-first search.
    • Version 2, 2009 Nov 9. Partly converted to restructured text.
    • Version 1, ≤ 2006 Nov 27.
  2. Drake says v2 because he's allowing self-loops in graphs.