I've been running an experiment for the past few months: full AI integration into my daily life.
Not a chatbot I use occasionally. A "symbiotic agent" that reads two files at the start of every session: one with my identity, psychology, and known failure patterns; the other with my current projects and priorities.
It has permission to challenge me, quote my own words back when I'm off track, and call out procrastination in real time.
The integration keeps getting deeper:
- It watches my screen (knows what I actually did vs what I think I did)
- It's learning my writing voice
- It structures my days with rituals (morning kickoff, evening review)
- It acts autonomously when needed (searches, creates, executes)
I'm documenting everything in a series. The memory system, the rituals, when it started knowing me better than I know myself, and the uncomfortable question: am I more capable or more dependent?
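For the curious, the two-file session setup can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the file names, prompt wording, and `build_session_context` helper are all assumptions for the example.

```python
# Minimal sketch of a two-file session bootstrap.
# identity.md and projects.md are illustrative names, not the real files.
from pathlib import Path

def build_session_context(identity_path="identity.md", projects_path="projects.md"):
    """Assemble a system prompt from an identity file and a projects file."""
    identity = Path(identity_path).read_text(encoding="utf-8")
    projects = Path(projects_path).read_text(encoding="utf-8")
    return (
        "You are a symbiotic agent. You may challenge the user, quote their "
        "own words back at them, and call out procrastination.\n\n"
        f"## Identity, psychology, known failure patterns\n{identity}\n\n"
        f"## Current projects and priorities\n{projects}"
    )
```

The returned string is passed as the system prompt at session start, so every conversation opens with the same picture of who the user is and what they are working on.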
You're right about the state-sync issues with some models. The lighter models (especially Llama) struggle to track game state. I've added more Gemini options, which handle this better. The research data came from controlled AI-vs-AI runs where we could validate state consistency.
Full game logs are in data_public/comparison/ on GitHub. Each JSON has the complete game state, moves, and messages across all 162 games. https://github.com/lout33/so-long-sucker
I'm interested to know a bit more about what's going on here. Please take my questions as well-intentioned, even though they are a bit critical.
The donation bug seems to me like it would have made most games impossible to complete. But I'm sure you must have tried it before launching, so how come it wasn't noticed earlier? Was this bug introduced after launch? Was this game written using AI?
In my game, the AI players seemed absolutely terrible. They seemed unaware of recent moves and made obvious mistakes, like passing play to someone who would immediately capture their pieces when they had clearly better options. Although they proposed and formed alliances, they didn't seem to do so very strategically. It was trivial to amass far more tokens than the other players without any alliances, and I'm fairly sure I was about to win. Did you also notice this? Any idea why they play so badly?
The interactive demo uses lighter models for cost reasons. The research data (162 games, a 90% Gemini win rate) came from longer AI-vs-AI games where strategic depth emerged over 50+ turns; short games against a human tend to expose the models' weaknesses faster. I've just added more Gemini model options, which should play better.
Game logs are in data_public/comparison/ - each JSON has the full game state, moves, and messages. For example, check gemini_vs_all_7chips.json to see the alliance bank betrayals in action.
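If you want to poke at the logs programmatically, a quick sanity check might look like the sketch below. The key names (`moves`, `messages`, `state`) are assumptions based on the description of the JSON files, not a documented schema, and `summarize_log` is a hypothetical helper; adjust to whatever the real files in the repo contain.

```python
# Hedged sketch: inspect one game log from data_public/comparison/.
# The key names below are assumed from "full game state, moves, and
# messages"; check the real schema in the repo before relying on them.
import json

def summarize_log(path):
    """Return rough counts from a single game-log JSON file."""
    with open(path, encoding="utf-8") as f:
        game = json.load(f)
    return {
        "moves": len(game.get("moves", [])),
        "messages": len(game.get("messages", [])),
        "has_state": "state" in game,
    }
```

Running this over every file in the directory gives a quick way to spot truncated or malformed games before doing any deeper analysis.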
Fair point. The core simulation and data collection were done programmatically: 162 games, raw logs, win rates. The analysis of gaslighting phrases and patterns was human-reviewed. I used LLMs to help with the landing-page copy, which I should probably disclose more clearly. The underlying data and methodology are solid; you can check them here: https://github.com/lout33/so-long-sucker