Sunday, 15 September 2013

Histogram Similarity

Histogram Comparision

Introduction

In this section we will look at techniques for histogram comparison .

Histogram Comparison

In many application histogram is used as object model .Thus to compare an unknown object with a known object we can compute the similarity between their histograms.

The histogram of image shows the frequency distribution of different pixel intensities.

The value of histogram for pixel level/intensity $\mathcal{g}$ shows the total count of pixels in the image with pixel intensity $\mathcal{g}$.Let this be denoted by $h_a(g)$

\[ h_a(g) = \sum_{i,j \in x} \delta(g-A(i,j)) \] \subsection{Minkowski norm} The histogram can be viewed as vectors and similariy is calculated using Minkowski norm $(L_p)$.

The manhattan distance $p=1$ and euclidean distance $p=2$ are special cases of minkowski norm.

\begin{eqnarray*} d_{L_1}(h_a,h_b) = \sum_{g \in G} | h_a(g) - h_b(g) |

d_{L_2}(h_a,h_b) = \sum_{g \in G} | h_a(g) - h_b(g) |^2 \end{eqnarray*} A high value indicates a mismatch while low value indicates a better match.0 indicates a perfect mismatch while total mismatch is 1 for normalized histogram. \subsection{Histogram Intersection} For 2 histograms $h_a$ and $h_b$ the histogram intersection is defined as \begin{eqnarray*} d_I(h_a,h_b) = \sum_{g \in G} min(h_a(g),h_b(g)) \end{eqnarray*} A high score indicates a good match while low score indicates bad match.if both histograms are normalized then 1 is a perfect match while 0 indicates total mismatch indicating no regions of histogram overlap. \subsection{Correlation} For 2 histograms $h_a$ and $h_b$ the histogram correlation is defined as ,where histograms are simple considered as discrete 1-D signals. \begin{eqnarray*} d_C(h_a,h_b) = \frac{\sum_{g \in G} (h_a(g)-\bar{h_a})(h_b(g)-\bar{h_b})}{\sqrt{ \sum_{g \in G} (h_a(g)-\bar{h_a})^2 \sum_{g \in G} (h_b(g)-\bar{h_b})}}

where \bar{h_k} = \frac{1}{N} \sum_j H_k(j)

N is the total number of bins \end{eqnarray*} If the histogram is normalized ,then \[ \sum_j H_k(j) = 1 ,\bar{h_k} = \frac{1}{N} \] The value of 1 represents a perfect match ,-1 indicates a ideal mismatch while 0 indidates no correlation.

For correlation a high score represents a better match than a low score. \subsection{Chi-Square} For 2 histograms $h_a$ and $h_b$ the histogram similarity using Ch-Squared method is defined as \[ d(h_a,h_b) = \sum_{g\in G} \frac{\left(H_a(g)-H_b(g)\right)^2}{H_a(g)} \] A low score represents a better match than a high score.0 is a perfect match while mismatch is unbounded.

The chi-square test is testing the null hypothesis, which states that there is no significant difference between the expected and observed result \subsection{Bhattacharya Distance} In statistics, the Bhattacharyya distance measures the similarity of two discrete or continuous probability distributions. It is closely related to the Bhattacharyya coefficient which is a measure of the amount of overlap between two statistical samples or populations \[ d(h_a,h_b) = \sqrt{1 - \frac{1}{\sqrt{\bar{h_a} \bar{h_b} N^2}} \sum_{g\in G} \sqrt{h_a(g) \cdot h_b(g)}} \] For Bhattacharyya matching , low scores indicate good matches and high scores indicate bad matches. A value of 0 is perfect match while 1 indicates a perfect mismatch.

multi-channel images

For histogram with independ channels,values of similarity scores for each channel can be added seperately.

Implementation

The opencv libraries are used to perform histogram related operations.All the histogram related functions are encapsulate in the class named histogram.

The code is present in files histogram.cpp and histogram.hpp which are wrapper classes for calling opencv histogram functions and performing comparison operations etc.

Let us consider the Hue histogram for following images and below are the similarity score using the bhattachrya distance method,chi-square,intersection and correlation methods.

Consider the sets of image

1-2 0.5912 12.221 0.2798 0.2342
1-3 0.9841 3209.8 0.0027 -0.0775
1-4 0.7662 288.80 0.1975 0.0814


We can see that images 1 and 2 will have similar histograms ,for correlation and intersection for image pair 1-2 will be greater than 1-3 .Indicating the similarity of histograms 1-2 incomparison to 2-3.

Chi square mismatch is unbounded as can be seen by high value.A large value can be observed in 2-3 compared to 1-2

The bhattachrya distance is close to 0 for better match,and nearly 1 for mismatch, for 1-3 value is almost 1 indicating a ideal mismatch.

In case of image pair 1-2 and 1-4 are similar,but coefficients indicate that histogram of 1-2 is more similar than 1-4.
Histogram Comparison
The similarity measures are crude at best,however it might be useful in case of discriminative task ,however for identification task it seems like a poor measure.

Code

For code refer to site http://code.google.com/p/m19404/source/browse/OpenVisionLibrary/ImgProc/ Histogram.cpp,Histogram.hpp,test_histogram.cpp files.