I mentioned a potential OpenAI insider in https://x.com/peterjliu/status/2024901585806225723; that was from five minutes of investigation. There are probably more. And then there are a lot of other companies.
Post author here: To clarify, this is not a post from Polymarket.
This is talking about using Compound AI (product I'm working on) to query Polymarket data, including finding insiders, just as a fun example analysis you could do.
Often you need a well-calibrated probability of a future event to feed into some other analysis, and Polymarket is pretty great for that. An example is how much insurance (hedge) to buy for some disastrous event.
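As a toy illustration of that hedging use case (all numbers are hypothetical, and this ignores transaction costs and the market's own bias):

```python
# Toy hedge sizing from a market-implied probability. Numbers are made up.
def fair_premium(p_event: float, payout: float) -> float:
    """Expected payout of one insurance contract, i.e. the break-even premium."""
    return p_event * payout

p = 0.12          # probability of the disastrous event, read off the market
payout = 100_000  # amount one contract pays out if the event happens

premium = fair_premium(p, payout)
print(premium)    # 12000.0 -- paying much more than this means the hedge is overpriced
```

The market price does the hard part: it is the calibrated probability estimate you would otherwise have to produce yourself.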
If I'm an insider with 100% confidence, I'll take all offers at a certain price as long as I can afford it. Similar story for lower levels of confidence (but still inside info). There won't necessarily be any left for you to copy at a viable price.
Because there's always some uncertainty, and there are capital limits. But the level of uncertainty about the outcome is itself inside info, and it's compounded with your own uncertainty about the insider as a copy trader. So the insider will empty out only certain price levels, and your certainty is strictly less than theirs, meaning you have even fewer viable levels to buy.
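The emptied-price-levels point can be sketched with a toy order book (the numbers and the simple "buy while the price is below your confidence" rule are my own illustration, not how any real participant trades):

```python
# Toy model: an insider with confidence c buys every ask priced below c.
# A copy trader's confidence is strictly lower (insider might be wrong,
# plus doubt about whether they really are an insider). Numbers made up.

def viable_levels(asks, confidence):
    """Price levels still profitable for someone with this confidence."""
    return [price for price in asks if price < confidence]

asks = [0.40, 0.55, 0.70, 0.85, 0.95]   # resting offers on "YES"

insider_conf = 0.90                      # strong inside info
copier_conf = 0.70                       # discounted twice over

print(viable_levels(asks, insider_conf)) # [0.4, 0.55, 0.7, 0.85]
print(viable_levels(asks, copier_conf))  # [0.4, 0.55]
# Once the insider lifts everything below 0.90, every level the copier
# could profitably buy is already gone.
```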
Therefore, the Polymarket betting odds will reflect the truth, even if that info is a secret nobody but the insider knows. And if that's the case, then even an outsider could use the odds as a source of info, which would ensure that market efficiency (which is about the flow of information) is high.
If you believe Polymarket is a serious source of truth, consider that somebody manipulated "Will Jesus Christ return before 2027?" because there was a secondary market on whether that market would rise above 5%. That defeats the whole idea that betting odds reflect the truth. And even pre-manipulation, I don't think a 2% chance that Jesus will return was reflective of the truth.
The issue comes from situations where insiders can alter the outcome to help their own bets. A simple example is a bet on how long a press conference will be: it's a ridiculous market when the person giving the press conference can bet and fleece it.
Will country X invade another before or after day Y? A large enough market changes the answer, because the agent can change the decision. And we see this kind of thing in many interesting questions.
These are not secret divinations though, the participants know this and price it in or otherwise allow it to determine which markets they participate in.
That someone with inside information will e.g. make 500% while those late to the party e.g. only get 10%? (of course your example is not very realistic to begin with)
Has there ever been any documented circumstance where significant inside information became public and known thanks to a trade? Most often, the trade is made at the last minute, and the information gets subsequently revealed anyway. And it's impossible to tell whether somebody is an inside trader, a wealthy gambling addict making a stupid decision, or hypothetically a foreign agent pretending to be an inside trader to make people believe in a particular outcome.
It's impossible to know anything for certain; almost everything is probabilistic.
Also, I'm not sure how to interpret your criteria, because timing matters; I don't think saying "it gets revealed in the end" is very meaningful.
Anyway, on Polymarket specifically, sure, military strikes are a common one. Seems like a useful signal to go hide in the basement. Outside Polymarket, there were insider trades in 2008 that I'm sure were useful.
No vigilant insider is making a series of "single market predictions with high accuracy" on the same account. They would make unlinkable bets on fresh accounts.
I have a similar but opposite experience. Since around 2015 I've mostly been working with people who primarily use Emacs. In 2014 I was the only weird one; on the next team there were about 3-5 of us, then a dozen, then a team of a few dozen where only two people were using Vim. On my current team, most of the devs are also Emacs users. However, a lot of them use Emacs with Evil mode, so I guess they can be considered vimmers.
Also, I don't remember the last time when I worked with anyone who writes code and uses Windows.
Anecdotal experiences can lead to a warped understanding of reality; in mine, Windows and non-emacs users are niche.
Don't y'all have a #emacs slack channel or equivalent at your company? I work for a medium-sized tech company, and I feel like we have a single-digit number of Emacs users. The channel is mostly dead except for a few tips and tricks and the occasional person asking how we each install it on our MacBooks.
Anecdotally a lot of managers use Emacs, though that may be an age thing.
(I use emacs for Real Work, unless that Real Work involves a JVM. Still do all the git stuff in emacs/magit, though)
Reddit was an interesting case here. They knew that they had particularly good AI training data, and they were able to hold it hostage from the Google crawler, which was an awfully high risk play given how important Google search results are to Reddit ads, but they likely knew that Reddit search results were also really important to Google. I would love to be able to watch those negotiations on each side; what a crazy high stakes negotiation that must've been.
Say what you will, but there are a lot of good answers on Reddit to real questions people have. There's a whole thing where people say "oh, Google search results are bad, but if you append the word 'REDDIT' to your search, you'll get the right answer." You can see that most of these agents rely pretty heavily on stuff they find on Reddit.
Of course, that's also a big reason why Google search results suggest putting glue on pizza.
This is an underrated comment. Yes, it's a big advantage and probably a measurable pain point for Anthropic and OpenAI. In fact, you could just survey 1% of the robots.txt files out there and get a reasonable picture. Maybe a fun project for an HN'er.
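A minimal sketch of that survey idea, using Python's stdlib robots.txt parser (the sample file content is illustrative; a real survey would fetch robots.txt from a random sample of domains and tally the results):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt of the kind such a survey would encounter:
# Google's crawler allowed, an AI-training crawler blocked.
sample = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def crawler_allowed(robots_txt: str, agent: str, url: str = "/") -> bool:
    """Would this crawler be allowed to fetch the given URL?"""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)

print(crawler_allowed(sample, "Googlebot"))  # True
print(crawler_allowed(sample, "GPTBot"))     # False
# Counting these two booleans across a 1% sample of domains would give
# a reasonable picture of Google's crawling advantage.
```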
This is right on. I work for a company with somewhat of a data moat and AI aspirations. We spend a lot of time blocking everyone's bots except Google's. We have people whose entire job it is to make it faster for Google to access our data. We exist because Google accesses our data. We can't not let them have it.
We've (ex-Google DeepMind researchers) been researching how to increase the reliability of agents, and we realized it's pretty non-trivial, but there are a lot of techniques to improve it. The most important thing is doing rigorous evals that are representative of what your users do in your product. Often this is not the same as academic benchmarks. We made our own benchmarks to measure progress.
> The most important thing is doing rigorous evals that are representative of what your users do in your product. Often this is not the same as academic benchmarks.
OMFG thank you for saying this. As a core contributor to RA.Aid, optimizing it for SWE-bench seems like it would actively go against perf on real-world tasks. RA.Aid came about in the first place as a pragmatic programming tool (I created it while making another software startup, Fictie.) It works well because it was literally made and tested by making other software, and these days it mostly creates its own code.
Do you have any tips or suggestions on how to do more formalized evals, but on tasks that resemble real world tasks?
I would start by making the examples yourself initially, assuming you have a good sense for what that real-world task is. If you can't articulate what a good task is and what a good output is, it's not ready for outsourcing to crowd workers.
And before going to crowd workers (maybe you can skip them entirely), try LLMs.
> I would start by making the examples yourself initially
What I'm doing right now is this:
1) I have X problem to solve using the coding agent.
2) I ask the agent to do X
3) I use my own brain: did the agent do it correctly?
If the agent did not do it correctly, I then ask: should the agent have been able to solve this? If so, I try to improve the agent so it's able to do that.
The hardest part about automating this is #3 above -- each evaluation is one-off, and it would be hard even to formalize the evaluation.
SWE-bench, for example, uses unit tests for this, and the agent is blind to them -- so the agent has to make a red test (which it has never seen) go green.
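That loop can be formalized as a tiny harness: each case pairs a task prompt with a hidden checker function, SWE-bench-style. Everything here is a sketch; `run_agent` is a placeholder you'd replace with a call to the real coding agent.

```python
# Minimal eval harness: tasks with hidden checkers, pass rate as the metric.
from typing import Callable

def run_agent(prompt: str) -> str:
    # Placeholder agent: a real implementation would invoke the coding agent
    # and return the code it produced.
    if "reverse" in prompt:
        return "def solve(s): return s[::-1]"
    return "def solve(x): return x"

def check_reverse(code: str) -> bool:
    # The "unit test" the agent never sees: run its code in a scratch
    # namespace and check the behavior.
    ns: dict = {}
    exec(code, ns)
    return ns["solve"]("abc") == "cba"

TASKS: list[tuple[str, Callable[[str], bool]]] = [
    ("Write solve(s) that reverses a string", check_reverse),
]

def pass_rate() -> float:
    passed = sum(check(run_agent(prompt)) for prompt, check in TASKS)
    return passed / len(TASKS)

print(pass_rate())  # 1.0 with the placeholder agent
```

Growing `TASKS` from real problems you've already judged by hand (step 3 above) turns the one-off evaluations into a regression suite you can rerun after every change to the agent.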