We ran three experiments since v1 that changed the model substantially.
We tried to detect suspended GitHub accounts from behavioral signals (merge rate, network centrality, TF-IDF on PR titles, LLM classification with ~31K Gemini calls). Best individual AUC was 0.619 on a 1.9% base rate. The merged-PR population is too homogeneous. Accounts that pass code review look like everyone else. The interesting finding: the suspension rate among contributors with merged PRs is under 2%. The review process is a better filter than the discourse around AI slop suggests.
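For readers who want the mechanics, here is a minimal sketch of the per-signal evaluation: ROC AUC of each behavioral feature against the suspension label on a ~1.9% base rate. The column names and synthetic data below are placeholders, not the real pipeline.

```python
# Sketch: per-feature AUC against a low-base-rate suspension label.
# Column names and data are illustrative, not the actual dataset.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "merge_rate": rng.uniform(0, 1, 5_000),
    "network_centrality": rng.uniform(0, 1, 5_000),
    "suspended": rng.binomial(1, 0.019, 5_000),  # ~1.9% positive class
})

for feature in ("merge_rate", "network_centrality"):
    auc = roc_auc_score(df["suspended"], df[feature])
    print(f"{feature}: AUC={auc:.3f}")  # random features hover around 0.5
```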
That led us to question the scoring model. The graph score (bipartite construction, personalized ranking, language normalization, the whole pipeline from v1) actively hurts predictions for the contributors who actually need scoring: unknown people with a handful of merged PRs. Merge rate alone outperforms merge rate plus graph at every tier we tested. The new default model is merged / (merged + closed). We also pulled account age out of the score into a separate advisory after DeLong tests showed it adds nothing once you condition on merge rate.
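A minimal sketch of that default, with account age kept as a separate advisory rather than folded into the number (field names here are illustrative, not the tool's exact schema):

```python
# Sketch: merge-rate score plus a separate account-age advisory.
from datetime import datetime, timezone

def merge_rate_score(merged: int, closed_unmerged: int) -> float | None:
    """merged / (merged + closed); None when there is no decision history."""
    decided = merged + closed_unmerged
    return merged / decided if decided else None

def account_age_advisory(created_at: datetime, min_days: int = 90) -> str:
    """Advisory flag reported alongside the score, not mixed into it."""
    age_days = (datetime.now(timezone.utc) - created_at).days
    return "new account" if age_days < min_days else "established account"

print(merge_rate_score(merged=7, closed_unmerged=3))                    # 0.7
print(account_age_advisory(datetime(2025, 1, 1, tzinfo=timezone.utc)))
```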
The post has the full data, including the tables.
Next we're working on content scoring (does this PR fit this repo's conventions?) and cold-start tooling (helping new contributors understand project expectations before they submit). Contributor reputation is one input to review triage. The PR itself carries more signal.
I interviewed these guys for an article on the use of seaweed in yarn and fabric. And I bought the 3D knit seaweed sweater. Great team, with a lot of heart and good intentions.
I'm also a hand knitter, and I don't really see any conflict between what they're doing and hand knitting. The grist of the yarn you use as a hand knitter is generally much thicker than what these machines commonly use. Commercial 3D knitting machines can do all of the stretchy, thin, and light stuff that the modern wardrobe is built around.
As folks note, this technology was really pioneered by Shima Seiki's work in Japan decades ago. What OC and the similar Brooklyn-based Tailored Industry are really innovating on is the business model and the connection to the production process. Folks like this are really serious about not producing all of the waste that comes with most fashion production processes, and it shows up at several levels of the stack.
For the HN crowd, TI's platform gives you more of a sense of why this sort of tech is really like the cloud for knitwear: https://tailoredindustry.com/platform
Really a fascinating part of the global fashion production world, and one we would all benefit from seeing grow.
I have a small sweater line I'm looking at doing in China right now, but I have a long-standing fascination with the Shima Seiki machines. If you were doing a short fashion run, would TI be appropriate? How does one get their preferred yarn over to someone like TI?
They definitely are a powerful option for smaller scale runs. Very much optimized to have the unit economics and turnaround time work for smaller brands.
I don't really know the answer around supplying your own yarn. I'd assume that's the abnormal case, but just a guess.
From what I've seen in the data, acceptance rates across all major OSS projects are down since the arrival of coding agents.
And when I talk to maintainers, most of them are talking about some version of fast and easy pocket vetoes (leaving the PRs to rot) or even just banning on the first offense.
It's been building for a bit, but I think the crisis point is solidly here. And things like OpenClaw turn up the dials. I'm sure more tools and changes to practices will be coming.
Maybe it'll be solved soon if we train yet another neural network on scanning GitHub activity, but also by adding other forges like Codeberg, GitLab, self-hosted Forgejo, etc., so non-GitHub users aren't locked out.
Yeah, scanning non-GitHub forges is on the roadmap and really should be done. I expect there would be value in covering all of the current GitHub competitors. And I think the forecasted wave of new GH competitors (likely launched by AI companies) will become relevant in the near future.
Neoteny AI | Founding Researcher | REMOTE (US East Coast/Europe) / ONSITE (London, NYC)
Neoteny AI is building the sovereign intelligence layer for code. We are a team of builders from Meta, AWS, and NYU who believe that the future of coding intelligence is not a generalist chatbot in the cloud, but a specialized, repository-aware model that lives inside the enterprise perimeter.
We are looking for a founding researcher to lead our core research agenda. You will work at the intersection of representation learning, program synthesis, and efficient inference.
You will figure out how to teach models the latent physics of a codebase: the implicit architectural rules, dependency patterns, and style constraints that generalist models miss. Your work will range from designing new data engines to experimenting with novel architectures that break the memory wall of current transformers.
We are looking for someone with deep ML expertise (PhD or equivalent) who is code-fluent and comfortable writing kernels, not just training scripts.
Jeff, the author, here. We built a tool that scores PR authors by mining their contribution graph from the GitHub API. Every input is a merge/reject decision a human maintainer already made. It doesn't look at PR content or try to detect AI usage. It just answers: has this person gotten code accepted into projects before, and how relevant is that history to your project?
The scoring is graph-based (bipartite user-repo graph, personalized ranking, 180-day recency decay). Scores are context-specific, so the same person can score differently against different repos. The post walks through how Guillermo Rauch scores MEDIUM against his own company's Next.js repo because he has zero merged PRs there, and how v2 rescues that with merge rate and account age.
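A rough sketch of the idea, using a personalized PageRank over the bipartite graph with a 180-day half-life on edge weights. The details below (networkx, the tuple format, the half-life decay form) are illustrative assumptions, not the exact production pipeline.

```python
# Sketch: bipartite user-repo graph, recency-weighted edges, personalized
# PageRank seeded on the target repo so scores are context-specific.
from datetime import datetime, timezone
import networkx as nx

HALF_LIFE_DAYS = 180

def recency_weight(merged_at: datetime) -> float:
    age_days = (datetime.now(timezone.utc) - merged_at).days
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def author_score(merged_prs, target_repo: str, author: str) -> float:
    """merged_prs: iterable of (author, repo, merged_at) tuples."""
    g = nx.Graph()
    for user, repo, merged_at in merged_prs:
        w = recency_weight(merged_at)
        if g.has_edge(user, repo):
            g[user][repo]["weight"] += w
        else:
            g.add_edge(user, repo, weight=w)
    if target_repo not in g or author not in g:
        return 0.0
    # Personalize the random walk on the repo being contributed to, so the
    # same author scores differently against different repos.
    ranks = nx.pagerank(g, personalization={target_repo: 1.0}, weight="weight")
    return ranks.get(author, 0.0)
```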
We validated on 5,129 PRs across 49 repos. Three features survived statistical testing, four didn't. The most surprising failure: text similarity between PR descriptions and project READMEs predicted lower merge rates. We published all of it, including the failures.
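For reference, the text-similarity feature that failed was roughly of this shape: TF-IDF cosine similarity between a PR description and the project README. This is an illustrative reconstruction, not the exact feature code.

```python
# Sketch: TF-IDF cosine similarity between PR description and README text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def readme_similarity(pr_description: str, readme: str) -> float:
    vectors = TfidfVectorizer(stop_words="english").fit_transform(
        [pr_description, readme]
    )
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

print(readme_similarity("Fix flaky retry logic in the HTTP client",
                        "A fast HTTP client library with retries and pooling"))
```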
I think this is really a key problem to solve, but I couldn't convince myself that it was the right solution. So, I put up my alternative proposal, Good Egg: https://github.com/2ndSetAI/good-egg
Key differences:
- Based on commit history, with nuance around relatedness of projects, types of projects, age, etc.
- Requires no ongoing work. Just add it to your GH Actions CI.
- Agent ready with an MCP interface, Python lib, and CLI
I'm one of the many people Soumith hired into Meta and onto PyTorch. I had the privilege of working on PyTorch with him and lots of the folks on this post.
As his longtime colleague, the one thing I would want people to know about him and this decision is that Soumith has always viewed PyTorch as a community project. He consistently celebrated the contributions of his co-creators Adam and Sam, and he extended the same view towards Yangqing and the Caffe2 crew that we merged into PyTorch. At the very beginning, by Soumith's highly intentional design, PyTorch was aimed at being truly developed by and for the AI research community, and for many years that was the key way in which we grew the framework, the FB PT team, and the wider community. At every single stage of PT's lifecycle, he always ensured that our conception of PT and its community grew to include and celebrate the new people and organizations growing what was possible with PT. He's an incredible talent magnet, and thus more and more smart people kept dedicating their blood, sweat, and tears to making PT bigger and better for more people.
I've worked with some very well known and highly compensated leaders in tech, but *no one* has done the job he has done ameliorating the bus-factor problem with his baby. PT has a unique level of broad support that few other open source technologies can match. In a world of unbounded AI salaries, people who want to move AI research methods forward still freely give their time and attention to PyTorch and its ecosystem. It's the great lever of this era of AI that is moving the world, *due in large part* to the strength of the community he fostered and can now let continue without his direct involvement.
His departure is the end of an era, but it's also operationally a true non-event. PyTorch is going strong and can afford to let one of its creators retire from stewardship. This is precisely what success looks like in open source software.
He deserves our congratulations and our thanks. Enjoy your PT retirement, man.
Also worked with Soumith. The man is a legend, moves mountains and completely changed the course of my career because he liked something I wrote. No arrogance, no politics, just an extremely down to earth and chill guy who elevates everyone around him.
Burnin builds hard environments that break AI agents. We design continuous-time evaluation and training arenas for multi-agent coordination, coding agents, and computer use. In these environments, agents face ticking clocks, cascading failures, and rules nobody wrote down. Small team from Meta FAIR, Pinecone, and AWS. Seed round closing this summer.
We're looking for a founding research scientist to own the research agenda: invariant learning (extracting implicit codebase rules for real-time grading), continuous-time evaluation design (why do LLM agents collapse under temporal pressure?), and new environment design (what is the field failing to measure?).
You have deep experience in at least two of: multi-agent systems, RL, program analysis, or evaluation methodology. PhD preferred; equivalent output without one is fine. You write production code, not just experiment scripts. You think about where models break, not what they can do.
Apply here: https://wellfound.com/l/2Carrr