Something about programming

Depth-First Search (DFS) and Breadth-First Search (BFS) algorithms implementation

Previous tutorial: Graphs

In this tutorial we'll implement depth-first search and breadth-first search algorithms on graphs in Python.

Throughout tutorial we'll use the same graph from previous tutorial:

graph = { "a": ["b", "c"], "b": ["a", "d", "f"], "c": ["a", "d", "e"], "d": ["b", "c"], "e": ["c", "f", "i"], "f": ["b", "e", "g", "i"], "g": ["f", "h","i"], "h": ["g", "j"], "i": ["e", "f", "g", "j"], "j": ["i"]}

Simply saying, we'll write implementation of finding path on graph. There are many ways to do it and later we'll learn other algorithms. But now we'll start with depth-first search. For simplicity we'll work only with undirected graphs - so we can go in both directions between any two nodes.

Depth-First Search Algorithm

Depth-first search algorithm visits nodes and mark visited. The distinction of depth-first search from other path-finding algorithms is how it chooses next node to explore. Depth-first search goes as deep as possible in path. For our example with start node a and end node j depth-first search will start with a, then it will choose b, then d. At this point depth-first search we'll see that node b is already visited, so it will choose next node adjacent to d - c. a and d are visited, so last adjacent to c will be chosen. And so on: e -> f -> g -> h -> j.

So, depth-first search will return ["a", "b", "d", "c", "e", "f", "g", "h", "j"] path. As you can see depth-first search algorithm doesn't return the shortest path, but it's very simple, so we'll start from it.

Depth-First Search Algorithm implementation in Python

Let's see the code of depth-first search algorithm first:

def dfs(graph, start, end, path=[], visited=[]): if start in visited: return path path += [start] visited += [start] if start == end: return path for edge in graph[start]: if edge not in visited: return dfs(graph,edge,end,path,visited)

As you can see it's recursive. Function dfs is called on each explored node of graph. We pass graph itself, start node (on each call this parameter is changed), end, path and visited. At the end dfs will return the path.

First, we check if start node of this call is in visited nodes. We don't move to visited nodes, so we just return current path and next adjacent node is explored. If the node is not visited yet, we add it to visited and add it to the path.

Then we check if the path is found. In this case we return the path. And in the end we check all adjacent nodes of current node (start) and exploring all of them.

Breadth-First Search Algorithm

Breadth-first search algorithm has completely different exploring strategy than dfs. It checks all nodes on current level first. Let's check implementation:

def bfs(graph, start, end): visited = [start] queue = [start] hierarchy = {start: None} while len(queue) > 0: current = queue.pop(0) if current == end: path = [end] child = end while start not in path: path = [hierarchy[child]] + path child = hierarchy[child] return path for node in graph[current]: if node not in visited: visited += [node] queue += [node] hierarchy[node] = current

It's not recursive. Function bfs accepts three arguments: graph itself, start node and end node. In the beginning we create structures to control visited nodes, queue of nodes that are needed to be explored and hierarchy of nodes to backtrace the path from end to start.

We add in the queue start node and run the loop that checks the length of the queue - it works while there are elements in the queue. Then we take first element from the queue and save it in the current. Pay attention that we add elements to the end and take them from the beginning - that allows us to get all element from one level before start next one.

There is another loop inside that deals with all adjacent nodes of current one. If it's not in visited list we add it to visited, add it to queue and set the parent of this node.

The most important part - when current node is the end node - path exists. In this case we start to build the path between start and end. We don't know the path yet so we use backtracing - we go back from end to start using hierarchy dictionary as it contains information about parent of each node. And then we return path.

On our graph it will behave like this: It adds to path a and it doesn't have parent. Then it will add to queue b and c. Next iteration of while starts. Now current node is b and we remove it from the queue. We add to queue it's adjacent (not visited yet) nodes: b and f. Next iteration - we take c from queue...

Conclusion

Here is output for both algorithms on our graph from node a to j:

['a', 'b', 'd', 'c', 'e', 'f', 'g', 'h', 'j'] ['a', 'b', 'f', 'i', 'j']

First is DFS, second is BFS. Breadth-first search always finds the shortest path. But don't think depth-first search doesn't has it's applications - we'll see it soon.

Exercises

1. Depth-first search implementation has serious flaw. Try to create the graph that doesn't have path between start and end nodes. Also check how dfs behaves when it meets dead-ends.

2. Same for breadth-first search algorithm. Check what happens if there is no path from start to end.

Comments:

No comments yet