
I use Cython (CPU) to brute-force search 400,000 image pHashes. A single search takes somewhere between 100 and 200 ms.


I only just realized that the 100-200 ms you refer to is for a single search, not for 400,000 searches. The package already achieves this brute-force speed. It also implements a BK-tree, which, depending on the distance threshold passed, can drastically reduce the search time. The BK-tree search is also parallelized in the package (each image's hash is searched through the tree independently once the tree is constructed). On one of the example datasets, containing 10k images, with a distance threshold of 10 (for 64-bit hashes), the retrieval time per image was < 50 ms.
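
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of how a BK-tree prunes a Hamming-distance search. It is not the package's implementation: the `BKTree` class, its method names, and the `hamming` helper are hypothetical and only illustrate the technique.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    """Illustrative BK-tree; each node is a (value, {edge_distance: child}) pair."""

    def __init__(self, distance=hamming):
        self.distance = distance
        self.root = None

    def add(self, value):
        if self.root is None:
            self.root = (value, {})
            return
        node = self.root
        while True:
            d = self.distance(value, node[0])
            child = node[1].get(d)
            if child is None:
                # No child at this distance yet, so attach the new value here.
                node[1][d] = (value, {})
                return
            node = child

    def search(self, query, threshold):
        """Return (value, distance) pairs with distance <= threshold."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = self.distance(query, value)
            if d <= threshold:
                results.append((value, d))
            # Triangle inequality: only subtrees whose edge distance lies in
            # [d - threshold, d + threshold] can contain a match.
            for edge, child in children.items():
                if d - threshold <= edge <= d + threshold:
                    stack.append(child)
        return results
```

The smaller the threshold, the more subtrees get skipped, which is why the speedup depends on the distance threshold passed.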


The 100-200 ms I referred to was indeed for a single search. The difference is that it runs on a single core. Cython definitely makes the Hamming distance function faster.


That sounds good! It would be great if you could share the code or, even better, make a PR to the repo.


The implementation is trivial. For speed we use:

cdef extern int __builtin_popcountll(unsigned long long) nogil

dist = __builtin_popcountll(key ^ phash)

It would only take a couple of minutes to fill out the rest.
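
As a rough idea of what "the rest" could look like, here is a pure-Python sketch of the brute-force loop. This is an assumption about the overall shape, not the actual Cython code: `brute_force_search` and `max_distance` are hypothetical names, and `bin(...).count("1")` stands in for the popcount builtin.

```python
def hamming(key: int, phash: int) -> int:
    # XOR leaves a 1 bit wherever the two hashes differ; count those bits.
    return bin(key ^ phash).count("1")


def brute_force_search(key, hashes, max_distance):
    """Scan every stored hash; return (index, distance) for each close match."""
    matches = []
    for index, phash in enumerate(hashes):
        dist = hamming(key, phash)
        if dist <= max_distance:
            matches.append((index, dist))
    return matches
```

A Cython version would presumably type `key` and `phash` as `unsigned long long` and run the loop without the GIL, which is where the speedup over plain Python would come from.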


How long did generating the hashes take?


We store the pHashes in a database, but from a quick check on my laptop, generating a pHash for a single image takes 1-2 ms. I/O could be significant for many images.



