I use Cython (CPU) to brute force 400,000 pHash's of images. It takes somewhere ...

tanujjain · on Oct 8, 2019

Only realized now that the 100-200ms time you refer to is for a single search and not for 400,000 searches. The package already achieves this brute-force speed. In fact, the package also implements bktree, which, depending upon the distance threshold passed, could drastically reduce the search time. Moreover, the search through bktree is also parallelized in the package(each image's hash gets searched through the tree independently after the tree is constructed). On one of the example dataset containing 10k images, with a distance threshold of 10 (for 64-bit hashes), the retrieval time per image obtained was < 50 ms.

jonatron · on Oct 11, 2019

The 100-200ms time I referred to was indeed a single search. The difference is, it's on a single core. Cython definitely makes the hamming distance function faster.

tanujjain · on Oct 7, 2019

That sounds good! Would be great if you can share the code, or even better, make a PR to the repo.

jonatron · on Oct 7, 2019

The implementation is trivial, for speed we use:

cdef extern int __builtin_popcountll(unsigned long long) nogil

dist = __builtin_popcountll(key ^ phash)

It would only take a couple of minutes to fill out the rest.

mkl · on Oct 7, 2019

How long did the generating take?

jonatron · on Oct 7, 2019

We store pHash'es in a database, but just quickly checking on my laptop, between 1-2ms to generate a single image pHash. IO could be significant for many images.