
> In other words, it seems as if you can take any past state and fold it into one large concatenated present state

That's exactly what n-grams are; even traditional token/word/character-based Markov chains don't rely on just the most recent word. Typical Markov chains in NLP are 3- to 7-grams.
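
For concreteness, here's a toy sketch of an order-2 (3-gram) Markov chain over tokens; the corpus and function names are made up for illustration, not taken from any library:

    import random
    from collections import defaultdict

    def train(tokens, n=3):
        # The "state" is the previous n-1 tokens, not just the last one.
        table = defaultdict(list)
        for i in range(len(tokens) - n + 1):
            state = tuple(tokens[i:i + n - 1])
            table[state].append(tokens[i + n - 1])
        return table

    def generate(table, state, length=20):
        out = list(state)
        for _ in range(length):
            nexts = table.get(tuple(out[-len(state):]))
            if not nexts:
                break
            out.append(random.choice(nexts))
        return " ".join(out)

    corpus = "the cat sat on the mat and the cat ate the rat".split()
    model = train(corpus, n=3)
    print(generate(model, ("the", "cat")))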

> Can you give an example of a non-Markov system?

Encoder-decoder LLMs violate the Markov property and would not count as Markov chains: the decoder's next-token distribution depends on the encoder's output over the source sequence, not just on the tokens generated so far.



If you include the encoder outputs as part of the state, then encoder-decoder LLMs are Markovian as well. Conversely, viewed purely in token space, decoder-only LLMs are not Markovian either. Anything can be a Markov process depending on what you include in the state; humans, or even the universe itself, are Markovian by that reading. I don't see what insight about LLMs you and other commenters are gesturing at.
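
To make the state-augmentation point concrete, here's a toy sketch (hypothetical names, not any real LLM API): if you define the state as the entire concatenated prefix, the next-token distribution is a function of the current state alone, which is all the Markov property asks for.

    from typing import Callable, Tuple

    Token = int
    State = Tuple[Token, ...]   # state = the whole token history so far

    def markov_step(state: State, sample_next: Callable[[State], Token]) -> State:
        # One transition: the new state depends only on the current state.
        return state + (sample_next(state),)

    # Toy "model" that just emits last token + 1; stands in for an LLM's sampler.
    trajectory: State = (1, 2, 3)
    trajectory = markov_step(trajectory, lambda s: s[-1] + 1)
    print(trajectory)  # (1, 2, 3, 4)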



