Given an image as a bitmap, essentially a grid of pixel values, how do you find out if there is a circle in it? Or a square? A line? In recent times deep learning methods, especially convolutional neural networks have been really successful for tasks like recognizing objects and shapes in images. That is a good choice for detecting if an image contains, say, a dog, or a car. But for simpler geometrical shapes, image processing techniques are often sufficient. In this post I want to explain one of those techniques, called the Hough Transform, that I think is particularly elegant.
Other than a point, a line is the simplest of two-dimensional shapes. Naturally, finding whether an image contains points is not a very interesting problem. So here we will deal with the problem of finding lines in an image. It turns out that finding other, more complicated shapes is more computationally intensive and also less effective, so the case of lines is both more suitable for illustration and more widely used in practice.
[Say somewhere that what we will be finding is the so-called “analytic representation” of the shape, something that mathematically describes it. For a line in 2d space, that is a linear equation in two variables. For a circle, it is a combination of two values for the center and another for radius]
Here we will deal with binary images, or images whose pixels are either black or white, without any in-between values or different color channels. Obviously if you want to perform line detection on grayscale or color images, appropriate processing steps must be applied to convert it to a binary image. Also, we will consider black pixels to be ‘foreground’ and white pixels to be ‘background’ so that the correct lines are detected. Also, we should note that here we will extract what is called an analytic representation of shapes, a set of parameters that mathematically describe a shape. That is in contrast to approaches which try to find the set of pixels in a given image which are part of the shape.
To get started, consider an image where there are just two points. There is only one possible line that passes through two points, so our technique will have to find that line. Here is an illustration of the points and the line we need.
In 2-D space, a point is a pair of numbers, the x-coordinate and the y-coordinate. A line is analytically described by the (aptly named) linear equation in x and y. For simplicity, let’s assume that the lines are represented by an equation of the form . Take , for example. An infinite number of points fall on the line, and whether a point falls on a line can be checked by finding out if the co-ordinates of the point satisfies the equation of the line. Here, the point falls on the line because . But because there are an infinite number of possible lines, we can’t go through all the lines and check whether any one contains both of our points. This is where the Hough Transform comes to our rescue. Because the equation of a line is of the form , there are two numbers that represent a line. Two values. Does that ring a bell? Why, that’s just what a point is. The and the of a line form a point in a 2-D space different than our original x-y space, call it the parameter space. Just as a line in our x-y space is a point in the parameter space, a point in the x-y space is a line in the parameter space. If we have a point , then substituting it into an equation of a line gives us , which can be rearranged as , giving us a line in the parameter space with a slope of and a y-intercept of . It is because of this space-changing that the Hough Transform is a transform.
The two points that we have in our image therefore make two lines in the parameter space. If the two lines are not parallel, we can solve the two equations to find the point where they intersect. And the where they intersect, well, that’s a point in the parameter space, thus, a line in the x-y space. And there we have our line! If you are thinking, “Well, I guess that makes sense but why exactly?”, then hold on. Let’s have a look at the equations once again. Call our points and . The equations of the lines they form in the parameter space are and respectively. The point in line space where they intersect is such a pair that satisfies both of the equations. But saying that a point satisfies both equations is exactly the same (algebraically) as saying that two points and satisfy the equation . Both of these statements mean that the equations and hold.
All fine and dandy, except we’ve got a few problems. First, our line equation of the form cannot represent vertical lines, because the in the equation is the slope of the line and vertical lines have no finite slope. Second, whe we have a lot of points (in x-y space), that means there will be a lot of lines in the parameter space. A lot of lines means a lot of computation, because we have to check each line-pair to find if they intersect. So to bring the problem of finding lines from the mathematical realm to the computational realm, we make a few modifications to our technique. To get rid of the vertical line problem, we use a different method of representing lines - the polar equation. A polar equation is an equation of the form
Here, is the perpendicular distance (the shortest distance) of the line from the origin and is the angle that the line connecting our line and the origin makes with the x-axis. Here is an illustration:
Notice that now, a point in x-y space will no longer be a line in parameter space. It is now a sum of two sinusoids. The sum can be expressed as a single sinusoid (with different ‘origin’ and ‘scale’):
where and . So instead of having to find intersections of lines in parameter space, we now have to find intersections of sinusoidal curves, which is not nearly as straightforward as solving linear equations. Thankfully, we will now discuss a method that solves this problem without having to solve equations.
For the problem of having to find lots of intersections between lines or curves, we do something called quantization - we consider our parameter space to be discrete, allowin only a fixed number of possible values for each parameter. This allows us create an accumulator, a 2D array of points in the parameter space, each cell of which keeps a count of how many parameter curves pass through it. For every point we have in our x-y space, we create a parameter curve in the parameter space and increment the value of each cell in the accumulator through which the parameter curve passes. So every point in the x-y space corresponds to a ‘1’ added to many cells in the accumulator. In the end the values in the accumulator tell us how many points in the x-y space pass through the line that is represented by that cell’s point in parameter space. If a certain cell’s value crosses a fixed threshold, we take its co-ordinates (parameters) and use them to determine a line in x-y space. That determined line is the line we have just detected. So, there we have a line detector!
An interactive demonstration
Here is a demonstration of the Hough Transform in operation. On the left there is a grid representing x-y space that you can click on to add or remove points. On the right is a representation of the parameter space for polar representation of lines. If you add a point in the left grid, the corresponding parameter-space curve will show up on the right. The points formed in parameter space by the curves will accumulate on top of one another and grow darker, as more points on the same line in x-y space are added. Once there are eight points overlapping in the right grid, that is considered enough to detect a line in the x-y space and the line appears on the left grid which passes through the points you have drawn. The left grid goes from -20 to +20 in both axes, and the origin is in the center. It runs from 0 to 2 times PI horizontal and 0 to 30 vertical, and the origin is at the left bottom.
Finding other shapes
So that’s how we find lines in an image but how about other shapes? Let’s consider the circle. There are three parameters that define a circle - the radius, and the x and y co-ordinates of the center. Thus, the parameter space is three-dimensional here instead of two-dimensional. Our accumulator array is likewise going to be a three-dimensional array. What would a point in x-y space be in the circle parameter space? Let’s see. The equation of a circle is of the form:
where are the co-ordinates of the center and is the radius. When we fix the x and y co-ordinates of a point we have in our x-y space, say , and treat the parameters , and as variables, what we have is an equation for a 3-D shape called a double cone, two regular cones joined at their apexes, extending infinitely to both sides.
Written with the usual letters used for 3-D cartesian co-ordinates,
To ‘plot’ the quantized 3D double-cone in the accumulator, we increment values in the cells that correspond to points that lie on the double cone. Then, just like we did with lines, we find the cells with a large number of double cones passing through them and use their co-ordinates as parameters for our detected circles in x-y space. Note that points don’t have to form a full circle to be detected by this method - even relatively short arcs will be detected, because points in the same arc satisfy the same circle equation.
Similarly, for shapes that need more parameters to describe them, we need parameter spaces of higher and higher dimensions and working with the accumulator soon becomes unweildy. Squares, like circles, need three parameters, rectangles and ellipses need four. So use of the Hough Transform in practical applications is usually confined to finding lines.