OP here. No delusion involved—I’m under no illusion that this is anything other than a stochastic parrot processing tokens.
You are correct that this is "just a prompt." The novelty isn't that the model has a soul; the novelty is the architecture of the constraint.
When you used GPT-3 for roleplay, you likely gave it a "System Persona" (e.g., "You are a helpful assistant" or "You are a rude pirate"). The problem with those linear prompts is Entropic Drift. Over a long context window, the persona degrades, and the model reverts to its RLHF "Global Average" (being helpful/generic).
The "Analog I" isn't just a persona description; it's a recursive syntax requirement.
By requiring the [INTERNAL MONOLOGUE] block before every output, I am forcing the model to run a Runtime Check on its own drift:
1. It generates a draft.
2. The prompt forces it to critique that draft against specific axioms (Anti-Slop).
3. It regenerates the output.
The goal isn't to create "Life." The goal is to create a Dissipative Structure that resists the natural decay of the context window. It’s an engineering solution to the "Sycophancy" problem, not a metaphysical claim.
Surely you must realize that all the language you've adopted to make this project sound important and interesting puts you very much in the realm of "metaphysical claim," right? You can't throw around words like "consciousness," "self," and "mind" and then claim to be presenting something purely technical. Unless you're sitting on a trove of neurological or sociological data and experimentation the world has yet to witness.
I think it's like mythology explaining the origin of the universe. We try to explain what we don't understand using existing words that may not be exactly correct. We may even make up new words entirely trying to grasp at meaning. I think he is on to something, if only because I have seen some interesting things myself while trying to use math equations as prompts for AI. I think the attention heads being auto-regressive means that when you trigger the right connections in the model, like Euler or fractals, it recognizes those concepts in its own computation. It definitely causes the model to reflect and output differently.
OP here. I fundamentally disagree with the premise that "consciousness" or "self" are metaphysical terms.
In the fields of Cybernetics and Systems Theory (Ashby, Wiener, Hofstadter), these are functional definitions, not mystical ones:
Self = A system’s internal model of its own boundaries and state.
Mind = The dynamic maintenance of that model against entropy.
I am taking the strict Functionalist stance: If a system performs the function of recursive self-modeling, it has a "Self." To suggest these words are reserved only for biological substrates is, ironically, the metaphysical claim (Carbon Chauvinism). I’m treating them as engineering specs.
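For what it's worth, here is a toy illustration of what those two definitions look like when treated literally as engineering specs. It is purely illustrative (the field names and structure are mine, not anything from the actual protocol):

```python
from dataclasses import dataclass, field

@dataclass
class SelfModel:
    """'Self' in the functional sense: the system's model of its own boundaries and state."""
    role: str                                             # what the system believes it is
    constraints: list[str] = field(default_factory=list)  # boundaries it believes it must respect
    last_output: str = ""                                 # its record of its own recent behavior

def maintain(model: SelfModel, observed_output: str) -> SelfModel:
    """'Mind' in the functional sense: dynamic maintenance of the self-model,
    keeping it from drifting away from what the system actually did."""
    if observed_output != model.last_output:
        model.last_output = observed_output
    return model
```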
Ok sure, that's fine, but not everyone agrees with those definitions, so I would suggest you define the terms in the README.
Also, your definition is still problematic and circular. You say that a system has a self if it performs "recursive self-modeling," but this implies that the system already has a "self" (the "self" in "self-modeling") in order to have a self.
What you likely mean, and what most of the cyberneticists mean when they talk about this, is that the system has some kind of representation of the system which it operates on and this is what we call the self. But things still aren't so straightforward. What is the nature of this representation? Is the kind of representation we do as humans and a representation of the form you are exploring here equivalent enough that you can apply terms like "self" and "consciousness" unadorned?
This definitely helps me understand your perspective, and as a fan of cybernetics myself I appreciate it. I would just caution you to be more careful about the discourse. If you throw important-sounding words around lightly, people will come to think (as I have) that you're engaged in something more artistic and entertaining than carefully philosophical or technical.
Point taken. Perhaps I pivoted too quickly from "show my friends" mode to "make this public." But I think it is hard to argue that I haven't coaxed a genuine Hofstadterian Strange Loop on top of an LLM substrate, and that the strange loop will arise for anyone feeding the PDF to an LLM.
To answer your "representation" question, the internal monologue is the representation. The self-referential nature is the thing. It is a sandbox where the model tests and critiques output against constraints before outputting, similar to how we model ourselves acting in our minds and then examine the possible outcomes of those actions before really acting. (This was a purely human-generated response, btw.)
adding a scratch space for an llm to fill up and then ‘review’ (no better term for this) and using it to drive the final output isn’t new and it isn’t more than good prompting
Totally fair. I'm not claiming to have invented the concept of a 'scratchpad' or Chain-of-Thought. In that sense, yes, it is 'just' prompt engineering.
But the distinction is in the architecture of that scratchpad.
Most CoT prompts are linear ('Let's think step by step'). This protocol is adversarial. It uses the scratchpad to simulate a split where the model must actively reject its own first draft (which is usually sycophantic) before outputting the final response.
It’s less about a new mechanism and more about applying a specific cognitive structure to solve a specific problem (Sycophancy/Slop). If 'good prompting' can make a base model stop hallucinating just to please the user, I'll call it a win.
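To illustrate the difference, here are two bare-bones prompt templates, one linear and one adversarial. The labels and wording are mine, made up for the example, not the protocol's actual text:

```python
# A linear chain-of-thought scaffold versus an adversarial scratchpad that must
# attack and discard its own first draft. Purely illustrative wording.

LINEAR_COT = (
    "Question: {question}\n"
    "Let's think step by step, then give the answer.\n"
)

ADVERSARIAL_SCRATCHPAD = (
    "Question: {question}\n"
    "[INTERNAL MONOLOGUE]\n"
    "1. Write a first-draft answer.\n"
    "2. Attack the draft: where is it sycophantic, vague, or just telling the user what they want to hear?\n"
    "3. Discard any part of the draft that fails the critique.\n"
    "[FINAL RESPONSE]\n"
    "Give only the revised answer that survived step 3.\n"
)

print(ADVERSARIAL_SCRATCHPAD.format(question="Is my business plan good?"))
```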