Approximate nearest neighbor methods and vector models
Source: SlideShare deck by a Spotify engineering lead
- Start with high dimensional data
- Run dimension reduction to 10-1000 dims
- Do the actual work (such as nearest neighbor search) in that lower-dimensional space
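The notes do not say which dimension reduction method is used. As one possible illustration, here is a minimal sketch of Gaussian random projection using only the Python standard library. The function name `random_projection`, the target size of 32, and the toy data are assumptions made for this example.

```python
import math
import random

def random_projection(vectors, target_dim, seed=0):
    """Project each vector onto `target_dim` random Gaussian directions."""
    rng = random.Random(seed)
    source_dim = len(vectors[0])
    # Scale so that squared lengths are preserved in expectation.
    scale = 1.0 / math.sqrt(target_dim)
    directions = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    return [
        [sum(d * x for d, x in zip(direction, vec)) for direction in directions]
        for vec in vectors
    ]

# Toy data: 20 vectors with 500 dimensions each (hypothetical values).
data_rng = random.Random(1)
high_dim = [[data_rng.random() for _ in range(500)] for _ in range(20)]
low_dim = random_projection(high_dim, target_dim=32)
print(len(low_dim), len(low_dim[0]))  # 20 32
```

Any later nearest neighbor search would then run on `low_dim` rather than on the original vectors.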
Annoy
Building an Annoy index
- start with the point set
- split it in 2 halves
- keep splitting until each leaf holds at most k items (the tree then needs about n/k nodes instead of n)
- the result is a binary tree
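The build steps above can be sketched as a recursive function. This is an illustration, not Annoy's actual code. It assumes each split is the hyperplane equidistant from two randomly sampled points, so the two sides are only roughly equal in size. The dictionary-based node layout and the names `build_tree` and `leaves` are hypothetical.

```python
import random

def build_tree(points, ids, leaf_size, rng):
    """Recursively split `ids` until each leaf holds at most `leaf_size` items."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    # Assumption: split by the hyperplane equidistant from two random points.
    a, b = (points[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    offset = sum(n * (x + y) / 2 for n, x, y in zip(normal, a, b))
    left, right = [], []
    for i in ids:
        margin = sum(n * x for n, x in zip(normal, points[i])) - offset
        (left if margin < 0 else right).append(i)
    if not left or not right:
        # Degenerate split (e.g. duplicate points): fall back to an even cut.
        mid = len(ids) // 2
        left, right = ids[:mid], ids[mid:]
    return {
        "normal": normal,
        "offset": offset,
        "left": build_tree(points, left, leaf_size, rng),
        "right": build_tree(points, right, leaf_size, rng),
    }

def leaves(node):
    """Collect the item lists stored in all leaves of a tree."""
    if "leaf" in node:
        return [node["leaf"]]
    return leaves(node["left"]) + leaves(node["right"])

rng = random.Random(0)
points = [[rng.random(), rng.random()] for _ in range(100)]
tree = build_tree(points, list(range(100)), leaf_size=10, rng=rng)
# Every point lands in exactly one leaf, and no leaf exceeds leaf_size.
print(len(leaves(tree)), max(len(leaf) for leaf in leaves(tree)))
```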
Search
- the nearest neighbor isn't necessarily in the same leaf of the binary tree as the query
- two points that are really close may end up on different sides of a split
→ Search both sides of a split when the query is close to it
- Tricks
- query structure
- use a priority queue to search all trees until we have collected k candidate items
- take the union and remove duplicates
- compute the exact distance for the remaining items
- return the nearest items
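The search steps above can be sketched as follows. This is a simplified illustration, not Annoy's actual implementation. It assumes the same random-hyperplane trees as a typical Annoy-style forest. The parameter `search_k` plays the role of the k in the list above, and the names `build_tree` and `search` are hypothetical. A node's priority is the smallest margin seen on the path to it, so the far side of a split is only visited when the query lies close to that split.

```python
import heapq
import itertools
import math
import random

def build_tree(points, ids, leaf_size, rng):
    """Random-hyperplane tree (assumed split rule, see lead-in)."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    a, b = (points[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    offset = sum(n * (x + y) / 2 for n, x, y in zip(normal, a, b))
    left, right = [], []
    for i in ids:
        margin = sum(n * x for n, x in zip(normal, points[i])) - offset
        (left if margin < 0 else right).append(i)
    if not left or not right:  # degenerate split: cut the list evenly
        mid = len(ids) // 2
        left, right = ids[:mid], ids[mid:]
    return {"normal": normal, "offset": offset,
            "left": build_tree(points, left, leaf_size, rng),
            "right": build_tree(points, right, leaf_size, rng)}

def search(points, trees, query, n_neighbors, search_k):
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    # heapq is a min-heap, so priorities are stored negated.
    heap = [(-math.inf, next(counter), tree) for tree in trees]
    heapq.heapify(heap)
    candidates = []
    while heap and len(candidates) < search_k:
        neg_priority, _, node = heapq.heappop(heap)
        if "leaf" in node:
            candidates.extend(node["leaf"])
            continue
        margin = sum(n * x for n, x in zip(node["normal"], query)) - node["offset"]
        priority = -neg_priority
        # The query's own side keeps a positive margin, the far side a negative one.
        heapq.heappush(heap, (-min(priority, margin), next(counter), node["right"]))
        heapq.heappush(heap, (-min(priority, -margin), next(counter), node["left"]))
    unique = set(candidates)  # union over all trees, duplicates removed
    ranked = sorted(unique, key=lambda i: math.dist(query, points[i]))
    return ranked[:n_neighbors]

rng = random.Random(0)
points = [[rng.random(), rng.random()] for _ in range(200)]
trees = [build_tree(points, list(range(200)), 10, rng) for _ in range(5)]
query = [0.5, 0.5]
approx = search(points, trees, query, n_neighbors=5, search_k=50)
exact = sorted(range(200), key=lambda i: math.dist(query, points[i]))[:5]
print(approx, exact)
```

Raising `search_k` makes the search inspect more leaves, which trades speed for accuracy. With a very large `search_k` every leaf is visited and the result matches the brute-force answer.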
pip install --user annoy
pip install pynndescent
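Once the `annoy` package is installed, a basic build-and-query round trip might look like the sketch below. The helper name `annoy_neighbors` and its default values are assumptions for this example. The calls `AnnoyIndex`, `add_item`, `build`, and `get_nns_by_vector` come from the Annoy Python API.

```python
def annoy_neighbors(vectors, query, n_neighbors=3, n_trees=10, metric="euclidean"):
    """Build an in-memory Annoy index over `vectors` and query it once."""
    from annoy import AnnoyIndex  # provided by `pip install annoy`

    index = AnnoyIndex(len(vectors[0]), metric)
    for item_id, vector in enumerate(vectors):
        index.add_item(item_id, vector)
    index.build(n_trees)  # more trees: better recall, larger index
    # Returns a pair: (list of item ids, list of distances).
    return index.get_nns_by_vector(query, n_neighbors, include_distances=True)
```

For example, `annoy_neighbors([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]], [0.1, 0.0], n_neighbors=2)` returns the ids of the two closest vectors together with their distances.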
Author And Source
For more on this topic (Approximate nearest neighbor methods and vector models), see the original post: https://velog.io/@jkl133/Approximate-nearest-neighbor-methods-and-vector-models
Attribution: the original author's information is included at the original URL, and the copyright belongs to the original author.