Abusing vector search for texts, maps, and chess

lbrandy · on May 10, 2023

I agree entirely with the premise here save one subtle bit at the start. I think there is grave danger in reducing "vector database" to "vector search" as equivalent domains and/or pieces of software. I would argue that for "vector databases" there are a lot more "database" problems than "vector" problems to be solved.

I fear there's going to be a lot of home-rolled "vector search" infra that accidentally wanders into an ocean of database problems.

fzliu · on May 10, 2023

Totally agree. It takes _a lot_ to go from Hnswlib to a full-fledged vector database.

Here's an architecture diagram for a production-ready vector database https://milvus.io/docs/architecture_overview.md. Not exactly something you can build in a month.

ShamelessC · on May 10, 2023

> I would argue that for "vector databases" there are a lot more "database" problems than "vector" problems to be solved.

Why the need for new technologies then? Databases are well studied. Vector search is relatively easy to implement. Sure, there are some new insights to be gained by respecting a hybrid approach - but they are clearly overvalued.

Machine learning is supposed to make things easier. If you implement vector search across your company's data, there's no reason an LLM couldn't simply do the various SQL-style operations on chunks of that data retrieved via KNN. I'm not aware of this approach being used in practice, but I still think the obvious direction we are heading in is being able to talk to computers in plain English, not SQL or some other relational algebra framework.

ashvardanian · on May 10, 2023

Exactly!

It's much easier to start from a database and add vector search as one of the features than to go backwards. We have spent 7.5 years on the DBMS part, while the vector search can literally be added in a week...

And that's why every major modern database is now integrating such solutions :)

betacat · on May 10, 2023

So many projects forget the MS in DBMS.

PaulHoule · on May 10, 2023

Chess might be a step too far. Whether a position is a checkmate or not is an exact thing. You could have two positions that are close in vector space, yet the position of one piece makes the difference between win, loss, and draw, which is the only difference that really matters.

seanhunter · on May 10, 2023

Yes, there is also another factor in a chess position that is not included in this encoding: who is next to move.
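To make the side-to-move point concrete, here is a minimal sketch. It assumes a one-hot encoding with one bit per (square, piece type) pair, which may not match the article's exact scheme, and `TURN_BIT` and `encode_fen` are hypothetical names introduced only for illustration. Without a turn bit, two positions that differ only in who moves next are indistinguishable.

```python
PIECES = "PNBRQKpnbrqk"
TURN_BIT = 64 * 12  # hypothetical extra bit, set when Black is to move


def encode_fen(fen, with_turn=True):
    """Encode a FEN string as an integer bit vector (assumed one-hot layout)."""
    placement, side = fen.split()[:2]
    bits, square = 0, 0  # squares are counted from a8 to h1, as in FEN
    for ch in placement:
        if ch == "/":
            continue
        if ch.isdigit():
            square += int(ch)  # run of empty squares
        else:
            bits |= 1 << (square * 12 + PIECES.index(ch))
            square += 1
    if with_turn and side == "b":
        bits |= 1 << TURN_BIT
    return bits


def hamming(a, b):
    """Number of differing bits between two encodings."""
    return bin(a ^ b).count("1")


# Illustrative king-and-pawn position, identical except for the side to move.
white_to_move = "8/8/8/4k3/8/4K3/4P3/8 w - - 0 1"
black_to_move = "8/8/8/4k3/8/4K3/4P3/8 b - - 0 1"

print(hamming(encode_fen(white_to_move, False), encode_fen(black_to_move, False)))  # 0
print(hamming(encode_fen(white_to_move), encode_fen(black_to_move)))  # 1
```

A single bit still carries very little weight in the distance, so whether that is enough would depend on the metric used on top of it.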

That being said, the basic idea is really interesting. I'd love to see a fully open-source competitor to Chessbase. For people who don't know, Chessbase is a subscription service which gives you a Windows-only chess analysis platform and game database. It allows you to do advanced searches (e.g. I want to find games by 2500+ rated players in the Caro-Kann where black has a pawn on c5 and a bishop on e7 or whatever), which advanced players use to do "prep" (deep positional analysis, typically of opening positions) that they save in reams of files prior to memorization. It's probably not an exaggeration to say nearly all strong players and serious improvers subscribe to it.

Chessbase made themselves persona non grata in the open-source world by apparently ripping off parts of Stockfish and selling it under the names "Fat Fritz" and "Houdini"[1], and even were that not the case, it would be great to have Mac and Linux open-source options.

[1] https://stockfishchess.org/blog/2022/public-court-hearing-so...

emptybits · on May 10, 2023

Chessbase and the Stockfish developers came to an agreement after that public court hearing.[1] I don't think it changes the landscape at all and it doesn't help macOS or Linux users, so I agree with all you said.

X is to Chessbase for data as Lichess is to Chess.com for play. Solve for X.

I'll open my wallet for a community/open equivalent. But €500 and Chessbase's mildly bewildering selection and tiers of subscription services are too rich for this amateur hack.[2]

[1] https://www.chess.com/news/view/chessbase-stockfish-reach-se...

[2] https://shop.chessbase.com/en/products/chessbase_17_premium_...

rdlw · on May 10, 2023

Yeah, also

> Alternatively, you can design a custom scheme to weigh pieces differently, assuming pawns’ positions affect the game less than those of queens.

...no? A pawn being blocked vs. passed can completely change who's winning, and if the queen is hanging it doesn't really matter where. The chess section is strange.

It's interesting to think of positions that can be reached in a few moves being close to each other in search space, but that seems like it would just become standard BFS.

bazzargh · on May 10, 2023

The Hamming distance in the article is about twice the number of moves needed to get from one position to the other. Mate in 4 is important?

A better criticism might be: this metric defines positions as close even if one can only be reached from the other by reversing moves. There's some discussion in the HNSW paper https://arxiv.org/pdf/1603.09320.pdf of working with non-symmetric metrics, but I haven't read further.
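A rough sketch of both points, using the boards posted further down this thread. The encoding is an assumption (one bit per square and piece type, 768 bits in total) and may differ from the article's; `encode` is a hypothetical helper that reads only the piece-placement field of a FEN string.

```python
PIECES = "PNBRQKpnbrqk"


def encode(placement):
    """Encode a FEN piece-placement field as a 768-bit integer (assumed layout)."""
    bits, square = 0, 0  # squares are counted from a8 to h1, as in FEN
    for ch in placement:
        if ch == "/":
            continue
        if ch.isdigit():
            square += int(ch)  # run of empty squares
        else:
            bits |= 1 << (square * 12 + PIECES.index(ch))
            square += 1
    return bits


def hamming(a, b):
    """Number of differing bits between two encodings."""
    return bin(a ^ b).count("1")


start = encode("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR")
after_e4 = encode("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR")
# 1. e4 e5 2. Nf3 Nf6: four half-moves, none of them a capture.
after_four_plies = encode("rnbqkb1r/pppp1ppp/5n2/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R")
# 1. e4 d5 2. exd5: three half-moves, the last one a capture.
after_capture = encode("rnbqkbnr/ppp1pppp/8/3P4/8/8/PPPP1PPP/RNBQKBNR")

print(hamming(start, after_e4))          # 2: one bit cleared on e2, one set on e4
print(hamming(start, after_four_plies))  # 8: two bits per quiet half-move
print(hamming(start, after_capture))     # 3: the captured pawn is not replaced
print(hamming(after_e4, start))          # 2: symmetric, though the pawn cannot go back
```

Under this encoding, "about twice" holds exactly for quiet moves and becomes an upper bound once captures are involved. The last line is the symmetry problem: the distance is the same in both directions even though only one direction is playable.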

ashvardanian · on May 10, 2023

Sure, it was meant as a toy example. I often see multi-stage search systems work best, and having multiple, successively more complex metrics may be a good idea. Same as with text hashing.
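As a sketch of what such a multi-stage setup could look like for text (not the article's implementation): a cheap Hamming distance over small hashed fingerprints builds a shortlist, and a costlier set-based metric reranks only that shortlist. The function names, the 64-bit fingerprint size, and the choice of trigrams with Jaccard distance are all assumptions made for illustration.

```python
import zlib


def trigrams(text):
    """Set of lowercase character trigrams in the text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def signature(text, bits=64):
    """Cheap stage-1 fingerprint: set one bit per hashed trigram."""
    sig = 0
    for gram in trigrams(text):
        sig |= 1 << (zlib.crc32(gram.encode()) % bits)
    return sig


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def jaccard_distance(a, b):
    """Costlier stage-2 metric computed on the full trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return 1.0 - len(ta & tb) / len(union) if union else 0.0


def search(query, corpus, shortlist=3, top=1):
    q_sig = signature(query)
    # Stage 1: rank everything by the cheap metric and keep a shortlist.
    candidates = sorted(corpus, key=lambda doc: hamming(q_sig, signature(doc)))
    candidates = candidates[:shortlist]
    # Stage 2: rerank only the shortlist with the more expensive metric.
    return sorted(candidates, key=lambda doc: jaccard_distance(query, doc))[:top]


corpus = [
    "vector search for chess positions",
    "vector databases are mostly database problems",
    "hamming distance between board encodings",
    "stockfish and leela are chess engines",
]
print(search("vector search in chess", corpus))
```

In a real system the first stage would be served by an approximate index rather than a full sort, and the fingerprints would be computed once at insertion time; the sort here only keeps the sketch short.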

adastra22 · on May 10, 2023

Is HN allowing emoji now? ♟

vages · on May 10, 2023

It’s part of Unicode 1.1 and has later been styled as an emoji: https://emojipedia.org/chess-pawn/

29athrowaway · on May 10, 2023

    8 ♜ ♞ ♝ ♛ ♚ ♝ ♞ ♜
    7 ♟ ♟ ♟ ♟ ♟ ♟ ♟ ♟
    6 
    5 
    4 
    3 
    2 ♙ ♙ ♙ ♙ ♙ ♙ ♙ ♙
    1 ♖ ♘ ♗ ♕ ♔ ♗ ♘ ♖

seanhunter · on May 10, 2023

    8 ♜ ♞ ♝ ♛ ♚ ♝ ♞ ♜
    7 ♟ ♟ ♟ ♟ ♟ ♟ ♟ ♟
    6 
    5 
    4         ♙
    3 
    2 ♙ ♙ ♙ ♙   ♙ ♙ ♙
    1 ♖ ♘ ♗ ♕ ♔ ♗ ♘ ♖
      a b c d e f g h

Your move.

m1117 · on May 10, 2023

    8 ♜ ♞ ♝ ♛ ♚ ♝ ♞ ♜
    7 ♟ ♟ ♟ ♟   ♟ ♟ ♟
    6 
    5         ♟
    4         ♙
    3 
    2 ♙ ♙ ♙ ♙   ♙ ♙ ♙
    1 ♖ ♘ ♗ ♕ ♔ ♗ ♘ ♖
      a b c d e f g h

ashvardanian · on May 10, 2023

    8 ♜ ♞ ♝ ♛ ♚ ♝ ♞ ♜
    7 ♟ ♟ ♟ ♟   ♟ ♟ ♟
    6 
    5         ♟
    4         ♙
    3           ♘
    2 ♙ ♙ ♙ ♙   ♙ ♙ ♙
    1 ♖ ♘ ♗ ♕ ♔ ♗   ♖
      a b c d e f g h

udkl · on May 10, 2023

    8 ♜ ♞ ♝ ♛ ♚ ♝   ♜
    7 ♟ ♟ ♟ ♟   ♟ ♟ ♟
    6           ♞
    5         ♟
    4         ♙
    3           ♘
    2 ♙ ♙ ♙ ♙   ♙ ♙ ♙
    1 ♖ ♘ ♗ ♕ ♔ ♗   ♖
      a b c d e f g h

seanhunter · on May 11, 2023

Stafford Gambit Time!

29athrowaway · on May 10, 2023

Nuclear tesuji

dang · on May 10, 2023

We try to allow the Unicode ranges that are closer to text and disallow the ones that are closer to candy.

teaearlgraycold · on May 10, 2023

Why doesn’t HN allow emoji?

Imnimo · on May 10, 2023

Perhaps you could vectorize chess positions by passing them to Leela or another NN-based engine and using its internal activations as the embedding.