How do I compare two images to see if they are equal?
Introduction
Often people ask how can they compare images (probably meaning the contents of images) to see whether they are equal, hoping for a simple
function or operator that returns true or a value between 0 and 1 where 0 means different, 1 means equal and intermediate values can be
considered different degrees of similarity.
Many applications would benefit of this kind of high-level comparison, for example, for detection of image copyright violations,
face identification in crowds for security purposes, semantic image searching, visual navigation for robots, etc.
Unfortunately
comparison of images contents is not a simple task, and in most cases, a ill-defined one.
For example, consider the images below. Most of those depict the same animal,
so those could
be considered equal for some purposes (e.g. while searching the web for mammals behind fences),
or completely different for other purposes (e.g. looking for squirrels facing up or large mammals behind fences).
 Squirrel, saved (using Gimp) as a jpg with 85% quality.
|
 Squirrel, saved (using Gimp) as a gif. The colors are quantized.
|
 Squirrel, cropped and scaled (using Gimp) and saved as a jpg.
|
 Squirrel, saved (using Gimp) as a jpg with 5% quality.
|
 The same squirrel, other position.
|
 A different animal behind a green fence.
|
Most people think about the contents of the image without realizing how complex its processing by the human visual system and brain is.
Do an experiment, show the first image to different people and ask "what's in this image?" -- probably most will describe the animal in the image
(since it can be perceived as the subject in it), but the description
itself will probably vary in
completeness (number of relevant objects perceived in the image) and
detail (amount of relevant information about each object).
If not even humans can come up with a simple, concise description of what's in an image (so we
could compare the descriptions of the contents), how can we expect
a computer to do that with a simple algorithm?
The problem is that computers cannot "see" an image and "compare" it to other to determine if they are equal, where "equal" refers
to the images' contents. Granted, advances in image processing and computer vision may one day make this task feasible under controlled
conditions, but not now, and definitely not with simple algorithms. Consider how many variables we have to deal with: each object on the images have a
shape, size, orientation, color (maybe more than one!), texture, etc.; the same object may appear different in images, e.g. it can appear in a different
position, scale, orientation or illumination conditions or combinations of those; it can be partially ocluded; should we consider these variations as
irrelevant for the comparison or are they important? Should we assign different "weights" for the objects for similarity comparison
so similarity measures in some are more important than in others?
Small objects can be missing in one of the images; should we ignore them?
Is "background" an object and should it be considered in the comparison, and what, exactly, is the background?
Comparing images in regard to their contents
If you want to compare images semantically, so some images on the example above will be considered equal (or very similar),
you will face a hard task that cannot be solved with simple algorithms.
You will possibly have to process the image to extract a set of features that can be mapped
to semantic objects even if those features are different in regard to some aspects (e.g. position, scale) and compare
the objects' features instead of the pixel or image features.
You can see that image contents comparison is not a simple task, but it can be made simpler if we decide to
reduce the problem to a simpler one -- instead of asking "are the object in these images equal?" we could ask
"are some of the regions in this image similar, in some aspect, to regions in the other image?".
Now we can deal with the problem with some image processing (and artificial intelligence) techniques.
Some steps that could solve the problem are:
- If needed, pre-process the image (e.g. to enhance contrast, filter noise, etc.).
- Do an image segmentation, process in which the image is converted to regions which contains pixels that are similar to pixels in the same region and
different from pixels to other regions. This can be done using region-growing, mathematical morphology, clustering or classification algorithms.
There are many algorithms to do that, just google for "image segmentation" and other keywords to get more information.
- With the regions, create descriptors for them. Descriptors are calculated from the region and can include shape, area, perimeter, number of holes, general
color of the region, texture, orientation, position, etc.
- If needed, do a re-segmentation of the image, process in which regions are merged if they can be considered as belonging to the same
object. Note that this step may require some high-level knowledge of the objects and the task in general, seldom being
fully automatic and often being task-dependent.
- If needed, filter the regions that seem relevant to the task in hand, eliminating small regions or regions which are deemed unrelated to the
task (again this may require some knowledge
about the task).
- Store the image's regions' descriptor for further processing. Repeat those steps for other images.
- Use the descriptors for comparison of the contents of the images, using some of many algorithms for pattern matching, classification, clustering,
artificial intelligence and data mining in general.
Note that the steps above are as generic as possible, and does not ensure success for a particular task. Note also that there are many
possible algorithms that can be used in some steps, and each of those have many variations and parameters, therefore blind application of a
off-the-shelf algorithm will probably lead to failure -- it is very important to understand what you are doing and why so you can
expect useful, consistent results.
Note also that the process used to extract semantic objects in the human brain is very, very complex and flexible, being able to
map not only the semantic objects, but their generic behaviour on the scenes and even their categories as well.
It is very easy for us to point to the images that have a "squirrel facing up" or "small rodent hanging in a fence", but those
tasks cannot be done with only image processing algorithms.
A very simple algorithm to compare images for possible similarity without considering the high-level contents
In some applications a very superficial similarity measure between images can be useful.
For example, consider the task of locating images approximately similar in content to one you have, but without
any guarantee whatsoever that they represent or contain the same object -- something similar to a rough, passing similarity.
An obvious application would be locate all images in a
disk that can be similar to a chosen one.
There are several possible ways to do that, but all of them require the extraction of some feature from the image to comparison with
the features from other images and the calculation of a similarity measure. Both the feature extraction and similarity calculation
algorithms can be as simple or complex as we want.
As an example, let's choose a simple feature extraction method and similarity
measure, both shown below:
 Regions for similarity calculation.
|
The features for our test will be 25 RGB triples,
corresponding to the average of the RGB values on the 25 regions marked in the figure on the left.
The image will be normalized to 300x300 pixels. No texture or variance feature will be stored, only the color averages.
Each region has 30x30 pixels.
Each image will be represented, then, a 25x3 feature vector.
To calculate the similarity measure between two images A and B
we will take each of the 25 regions, calculate the Euclidean distance between the regions
and accumulate.
The distance from A to A will be, by definition, zero. The upper bound (maximum possible distance between
two images, using this similarity measure method) is
calculated as 25*(Math.sqrt( (255-0)*(255-0)+ (255-0)*(255-0)+ (255-0)*(255-0) ))
or a little bit over 11041.
|
This method was chosen because it is simple to understand and implement and can be easily modified by the reader.
It combines color (spectral) information with spatial (position/distribution) information, and is expected to be more
robust (i.e. tolerant to differences) than comparing pixel by pixel or the average of the whole image, but, again,
it is very simple and cannot be expected for perform well in any circumstances, being shown only
as an example.
To test the feature extraction and similarity measure we will use a set of
24 test images, shown below. Some of those images have similar objects on
them (wall, trees, sky) but we are not considering meaning on the images,
just patches of similar colors. Images are in different sizes, click on the icons to get the
full images.
The 16 images on the first two rows are photos of
objects, while the last row is of images from the first two rows distorted in scale, color, position, etc.
The application
NaiveSimilarityFinder, show below,
has a GUI which allows the user to select a reference image A and then get all images in the same directory,
normalize them to the same size (300x300), extract the features, calculate the distance from the features of A
and show them in order of less distant to most distant.

NaiveSimilarityFinder.java
1
2
3
4
5
6
7 package howto.compare;
8
9 import java.awt.BorderLayout;
10 import java.awt.Color;
11 import java.awt.Container;
12 import java.awt.Font;
13 import java.awt.GridLayout;
14 import java.awt.image.RenderedImage;
15 import java.awt.image.renderable.ParameterBlock;
16 import java.io.File;
17 import java.io.IOException;
18
19 import javax.imageio.ImageIO;
20 import javax.media.jai.InterpolationNearest;
21 import javax.media.jai.JAI;
22 import javax.media.jai.iterator.RandomIter;
23 import javax.media.jai.iterator.RandomIterFactory;
24 import javax.swing.JFileChooser;
25 import javax.swing.JFrame;
26 import javax.swing.JLabel;
27 import javax.swing.JOptionPane;
28 import javax.swing.JPanel;
29 import javax.swing.JScrollPane;
30
31 import com.sun.media.jai.widget.DisplayJAI;
32
33
34
35
36 public class NaiveSimilarityFinder extends JFrame
37 {
38
39
40 private Color[][] signature;
41
42 private static final int baseSize = 300;
43
44 private static final String basePath =
45 "/home/rafael/Pesquisa/ImageSimilarity";
46
47
48
49
50 public NaiveSimilarityFinder(File reference) throws IOException
51 {
52
53 super("Naive Similarity Finder");
54 Container cp = getContentPane();
55 cp.setLayout(new BorderLayout());
56
57 RenderedImage ref = rescale(ImageIO.read(reference));
58 cp.add(new DisplayJAI(ref), BorderLayout.WEST);
59
60 signature = calcSignature(ref);
61
62
63 File[] others = getOtherImageFiles(reference);
64 JPanel otherPanel = new JPanel(new GridLayout(others.length, 2));
65 cp.add(new JScrollPane(otherPanel), BorderLayout.CENTER);
66
67
68 RenderedImage[] rothers = new RenderedImage[others.length];
69 double[] distances = new double[others.length];
70 for (int o = 0; o < others.length; o++)
71 {
72 rothers[o] = rescale(ImageIO.read(others[o]));
73 distances[o] = calcDistance(rothers[o]);
74 }
75
76 for (int p1 = 0; p1 < others.length - 1; p1++)
77 for (int p2 = p1 + 1; p2 < others.length; p2++)
78 {
79 if (distances[p1] > distances[p2])
80 {
81 double tempDist = distances[p1];
82 distances[p1] = distances[p2];
83 distances[p2] = tempDist;
84 RenderedImage tempR = rothers[p1];
85 rothers[p1] = rothers[p2];
86 rothers[p2] = tempR;
87 File tempF = others[p1];
88 others[p1] = others[p2];
89 others[p2] = tempF;
90 }
91 }
92
93 for (int o = 0; o < others.length; o++)
94 {
95 otherPanel.add(new DisplayJAI(rothers[o]));
96 JLabel ldist = new JLabel("<html>" + others[o].getName() + "<br>"
97 + String.format("% 13.3f", distances[o]) + "</html>");
98 ldist.setFont(new Font(Font.MONOSPACED, Font.PLAIN, 36));
99 System.out.printf("<td class=\"simpletable legend\"> "+
100 "<img src=\"MiscResources/ImageSimilarity/icons/miniicon_%s\" "+
101 "alt=\"Similarity result\"><br>% 13.3f</td>\n", others[o].getName(),distances[o]);
102 otherPanel.add(ldist);
103 }
104
105 pack();
106 setVisible(true);
107 setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
108 }
109
110
111
112
113
114 private RenderedImage rescale(RenderedImage i)
115 {
116 float scaleW = ((float) baseSize) / i.getWidth();
117 float scaleH = ((float) baseSize) / i.getHeight();
118
119 ParameterBlock pb = new ParameterBlock();
120 pb.addSource(i);
121 pb.add(scaleW);
122 pb.add(scaleH);
123 pb.add(0.0F);
124 pb.add(0.0F);
125 pb.add(new InterpolationNearest());
126
127 return JAI.create("scale", pb);
128 }
129
130
131
132
133 private Color[][] calcSignature(RenderedImage i)
134 {
135
136 Color[][] sig = new Color[5][5];
137
138
139 float[] prop = new float[]
140 {1f / 10f, 3f / 10f, 5f / 10f, 7f / 10f, 9f / 10f};
141 for (int x = 0; x < 5; x++)
142 for (int y = 0; y < 5; y++)
143 sig[x][y] = averageAround(i, prop[x], prop[y]);
144 return sig;
145 }
146
147
148
149
150
151
152 private Color averageAround(RenderedImage i, double px, double py)
153 {
154
155 RandomIter iterator = RandomIterFactory.create(i, null);
156
157 double[] pixel = new double[3];
158 double[] accum = new double[3];
159
160 int sampleSize = 15;
161 int numPixels = 0;
162
163 for (double x = px * baseSize - sampleSize; x < px * baseSize + sampleSize; x++)
164 {
165 for (double y = py * baseSize - sampleSize; y < py * baseSize + sampleSize; y++)
166 {
167 iterator.getPixel((int) x, (int) y, pixel);
168 accum[0] += pixel[0];
169 accum[1] += pixel[1];
170 accum[2] += pixel[2];
171 numPixels++;
172 }
173 }
174
175 accum[0] /= numPixels;
176 accum[1] /= numPixels;
177 accum[2] /= numPixels;
178 return new Color((int) accum[0], (int) accum[1], (int) accum[2]);
179 }
180
181
182
183
184
185
186 private double calcDistance(RenderedImage other)
187 {
188
189 Color[][] sigOther = calcSignature(other);
190
191
192
193 double dist = 0;
194 for (int x = 0; x < 5; x++)
195 for (int y = 0; y < 5; y++)
196 {
197 int r1 = signature[x][y].getRed();
198 int g1 = signature[x][y].getGreen();
199 int b1 = signature[x][y].getBlue();
200 int r2 = sigOther[x][y].getRed();
201 int g2 = sigOther[x][y].getGreen();
202 int b2 = sigOther[x][y].getBlue();
203 double tempDist = Math.sqrt((r1 - r2) * (r1 - r2) + (g1 - g2)
204 * (g1 - g2) + (b1 - b2) * (b1 - b2));
205 dist += tempDist;
206 }
207 return dist;
208 }
209
210
211
212
213
214 private File[] getOtherImageFiles(File reference)
215 {
216 File dir = new File(reference.getParent());
217
218 File[] others = dir.listFiles(new JPEGImageFileFilter());
219 return others;
220 }
221
222
223
224
225
226 public static void main(String[] args) throws IOException
227 {
228 JFileChooser fc = new JFileChooser(basePath);
229 fc.setFileFilter(new JPEGImageFileFilter());
230 int res = fc.showOpenDialog(null);
231
232 if (res == JFileChooser.APPROVE_OPTION)
233 {
234 File file = fc.getSelectedFile();
235 new NaiveSimilarityFinder(file);
236 }
237
238 else
239 {
240 JOptionPane.showMessageDialog(null,
241 "You must select one image to be the reference.", "Aborting...",
242 JOptionPane.WARNING_MESSAGE);
243 }
244 }
245
246 }
The application NaiveSimilarityFinder needs an auxiliary class, JPEGImageFileFilter, shown below.

JPEGImageFileFilter.java
1
2
3
4
5
6
7 package howto.compare;
8
9 import java.io.File;
10
11 import javax.swing.filechooser.FileFilter;
12
13
14
15
16
17 public class JPEGImageFileFilter extends FileFilter implements java.io.FileFilter
18 {
19 public boolean accept(File f)
20 {
21 if (f.getName().toLowerCase().endsWith(".jpeg")) return true;
22 if (f.getName().toLowerCase().endsWith(".jpg")) return true;
23 return false;
24 }
25 public String getDescription()
26 {
27 return "JPEG files";
28 }
29
30 }
Some test runs of the application with that image data set are shown below. The first image is a row is the reference
image, and the others image in the row are best seven matches are shown (except for the match for the same image), then the worst two matches.
We can see that some results are expected (e.g. the dissimilarity measure between two equal imagens being zero).
The algorithm performed quite acceptably,
assigning a relatively low measure to the distorted versions when the originals were used as the references, as long as
the distortion does not makes the versions very different and the colors are not changed much.
This application could
successfully help organize a large collection of digital photographies when there are many repeated shots or when the
photographer took many shots of the same subject with small adjustments on the camera (e.g. photos of groups, to make sure no one has blinked).
It should be noted that the application could be modified to calculate the distance from each image to
all the others, which could be used then to create a dendogram through a
hierarchical clustering algorithm. The dendogram could then be used to separate the images in groups of
images which are similar considering the features and measure used.
Many improvements could be done in this simple algorithm: filtering of the regions to reduce influence of noise in the average value,
different weights for different regions (e.g. central regions could be considered more important than edges),
distance measure in other color space than RGB,
etc.
References
This list is far from complete:
- Shape Analysis and Classification: Theory and Practice, by Luciano da Fontoura Costa and Roberto Marcondes Cesar Jr., CRC, ISBN 0849334934, 2000.
- Toward Category-Level Object Recognition, edited by Jean Ponce, Martial Hebert, Cordelia Schmid and Andrew Zisserman, Springer (Lecture Notes in Computer Science 4170), ISBN 3-540-68794-7, 2006.
- Feature Extraction and Image Processing, by Mark S. Nixon and Alberto S. Aguado, Newnes, ISBN 0-7506-5078-8, 2002.
- Image Recognition and Classification: Algorithms, Systems, and Applications, edited by Bahram Javidi, Marcel Dekker, ISBN 0-8247-0783-4, 2002.
- Practical Algorithms for Image Analysis -- Description, Examples and Code, by Michael Seul, Lawrence O'Gorman and Michael J. Sammon, Cambridge University Press, ISBN 0-521-66065-3, 2000.
- Proceedings of the International Conference on Image and Video Retrieval (CIVR) (may be available from Springer, free of charge in some university libraries).
- Search your local university library or Google or even better,
Google Scholar
for references on Content-based image retrieval, Content-based video retrieval or for feature extraction.
For related information, see
A Brief Tutorial on Supervised Image Classification and
null.