Exploring Beam Search – a heuristic search algorithm


In 1977, Turing Prize winner Dr. Raj Reddy of Carnegie Mellon University coined the term “beam search”. Ray search is a best-first search optimization that reduces memory footprint. The best-first search is a graph search that ranks all partial solutions (states) based on a heuristic. However, only a predetermined number of the best partial solutions are retained as candidates in the beam search. As a result, it is a greedy algorithm.


beam search builds its search tree using breadth-first search. It generates all descendants of the states at the current level at each level of the tree and sorts them in ascending order of heuristic cost. However, it only stores a limited number of the best states at each level (called the beamwidth). After that, it will only be extended to those states. As more states are clipped, the beam becomes wider. No states are truncated when the ray width is infinite, and ray search is identical to breadth-first search.

Besides the jet width limitations the amount of memory required for the search. Finally, ray search sacrifices completeness (the guarantee that an algorithm ends up with a solution if one exists) since it can truncate a goal state. Consequently, ray search is inefficient (i.e. there is no guarantee that it will find the best solution).

Beam Search has different parts:

  • A problem to find out
  • A set of rules to follow when pruning,
  • And a memory that can only hold so much.

The problem is the problem that needs to be solved. It’s usually displayed as a chart and has a series of nodes, each of which is a goal. The heuristic rules consist of rules specific to the problem domain and eliminate unhelpful nodes in memory.

The “beam” is stored in memory. When the memory is full and a node needs to be added to the bar, the most expensive node is removed so that the memory limit is not reached.


Ray search is commonly used to maintain steerability in large systems where memory is insufficient to store the entire search tree.

  • For example, it is used in many machine translation systems. (The current state of the art mainly uses neural machine translation methods.)
  • Each part is processed to find the best translation and many different ways to translate the words appear.

In addition, the best translations are retained based on their sentence structures, while the rest are discarded. The translator then evaluates the translations based on a predefined criterion and selects the translation that best achieves the objectives. The Harpy Speech Recognition System, CMU 1976, was the first to use beam search.


The combination of ray search with depth search led to Beam Stack Search and deep beam search. Combining ray search with bounded discrepancy search resulted in ray search using bounded discrepancy backtracking (BULB). The resulting search algorithms are like ray search, which finds reasonable but likely suboptimal solutions, then goes back and keeps looking for better solutions until it finds the best one.

Furthermore, local ray search often ends at local maxima, so a standard solution is to randomly choose the following states, with the probability depending on how it heuristically evaluated them. The name for this type of search is “stochastic ray search”. In addition, flexible beam search and recovery beam search are two other types.


Comments are closed.