you're right. those things predated the llama leak. but from my understanding (from the sidelines), it's llama that's made them popular and approachable from a hacker perspective.
> Surely if OpenAI had insisted upon the same things that Anthropic had, the government would not have signed this agreement.
But they did.
"Two of our most important safety principles are prohibitions on domestic mass surveillance and human responsibility for the use of force, including for autonomous weapon systems. The DoW agrees with these principles, reflects them in law and policy, and we put them into our agreement."
The difference is that Anthropic wanted to reserve the right to judge when the red lines are crossed, while OpenAI will defer to the DoD and its policies for that. In both cases, the two parties can claim to agree on the principles, but when push comes to shove, who decides whether the principles have been violated differs.
> The difference is that Anthropic wanted to reserve the right to judge when the red lines are crossed, while OpenAI will defer to the DoD and its policies for that.
It was pretty clear from Anthropic’s and Hegseth’s statements that they didn’t disagree on the two exclusions, but on who would be the arbiter on those. And Sam’s wording all but confirms that OpenAI’s agreement defers to DoD policies and laws (which a defense contract cannot prescribe), and effectively only pays lip service to the two exclusions.
Who decides these weighty questions? Approach (1), accepted by OAI, references laws and thus appropriately vests those questions in our democratic system. Approach (2) unacceptably vests those questions in a single unaccountable CEO who would usurp sovereign control of our most sensitive systems.
Amodei is the type of person who thinks he can tell the US government what they can and can’t do.
And the US government should have precisely none of that, regardless of whether they’re red or blue.
> Amodei is the type of person who thinks he can tell the US government what they can and can’t do.
I don't think that's the case. Amodei is worried that AI is extraordinarily capable, and our current system of checks and balances is not adequate yet to set the proper constraints so the law is correctly enforced. Here's an excerpt from his statement [1]:
> Powerful AI makes it possible to assemble this scattered, individually innocuous data into a comprehensive picture of any person’s life—automatically and at massive scale.
Let's do this thought exercise: how long would it take you, using Claude Code, to write some code to crawl the internet and find all the postings of the HN user nandomrumber under all their names on various social media, and create a profile with the top 10 ways that user can be legally harassed? Of course, Claude would refuse to do this, because of its guardrails, but what if Claude didn't refuse?
And that’s where the authoritarian in you is shining through.
You see, Obama droned more combatants than anyone else before or after him, but he always followed a legal paper trail and went by the book (except perhaps in some cases; search for Anwar al-Awlaki).
One can argue whether the rules and laws (secret courts, proceedings, asymmetries in court processes that severely compress civil liberties… to the point they might violate other constitutional rights) are legitimate, but he operated within the limits of the law.
You folks just blurt "me ne frego" ("I don't give a damn") like a random Mussolini and think you're being patriotic.
> Amodei is the type of person who thinks he can tell the US government what they can and can’t do.
> And the US government should have precisely none of that, regardless of whether they’re red or blue.
This is a pretty hot take. "You can't break the law and kill people or do mass surveillance with our technology." Fuck that, the government should break whatever laws and kill whoever they please.
I hope you A: aren't a U.S. citizen, and B: don't vote.
If I'm selling widgets to the government and come to find out they are using those widgets unconstitutionally and to violate my neighbors' rights, you can be damn sure I'm going to stop selling the gov my widgets. Amodei said that Anthropic was willing to step away if they and the government couldn't come to terms, and instead of the government acting like adults and letting them, they decided to double down on being the dumbest people in the room, act like toddlers, and throw a massive fit about the whole thing.
> It was pretty clear from Anthropic’s and Hegseth’s statements that they didn’t disagree on the two exclusions, but on who would be the arbiter on those.
No. Altman said human responsibility. Anthropic said human in the loop.
> And Sam’s wording all but confirms that OpenAI’s agreement defers to DoD policies and laws (which a defense contract cannot prescribe), and effectively only pays lip service to the two exclusions.
I don’t understand your first comment. At that point, Altman’s tweet didn’t exist yet, and is immaterial to the reading of Anthropic’s and Hegseth’s statements.
To your second comment, it was clear enough to me to be the most plausible reading of the situation by far.
We state what we think the situation is all the time, without explicitly writing “I think the situation is…”.
Seems Anthropic did not understand the questions they were asked. From the WaPo:
>A defense official said the Pentagon’s technology chief whittled the debate down to a life-and-death nuclear scenario at a meeting last month: If an intercontinental ballistic missile was launched at the United States, could the military use Anthropic’s Claude AI system to help shoot it down?
>It’s the kind of situation where technological might and speed could be critical to detection and counterstrike, with the time to make a decision measured in minutes and seconds. Anthropic chief executive Dario Amodei’s answer rankled the Pentagon, according to the official, who characterized the CEO’s reply as: You could call us and we’d work it out.
>An Anthropic spokesperson denied Amodei gave that response, calling the account “patently false,” and saying the company has agreed to allow Claude to be used for missile defense. But officials have cited this and another incident involving Claude’s use in the capture of Venezuelan leader Nicolás Maduro as flashpoints in a spiraling standoff between the company and the Pentagon in recent days. The meeting was previously reported by Semafor.
I have a hunch that Anthropic interpreted this question to be on the dimension of authority, when the Pentagon was very likely asking about capability, and they then followed up to clarify that for missile defense they would, I guess, allow an exception. I get the (at times overwhelming) skepticism that people have about these tools and this administration, but this is not a reasonable position to hold, even if Anthropic held it accidentally because they initially misunderstood what they were being asked.
"It’s the kind of situation where technological might and speed could be critical to detection and counterstrike"
Missile detection and the decision to make a (nuclear) counterstrike are two different things to me, but apparently the Department of War wants both, so it seems it's not "just" about missile detection.
Is there any reason at all to believe the account of the unnamed "defence official"? Whatever your position on this administration, you know that it lies like the rest of us breathe. With a denial from the other side and a lack of any actual evidence, why should I give it non-negligible credence?
It is bizarre. I like how "past performance predicts future performance" is supposed to apply to founders and companies but is completely disregarded for a two-term president and admin, as if we have no idea how they will operate in the future.
Anthropic, with its current war chest, is supposedly employing lawyers who are misunderstanding the Department of War? This is considered to be the likelier of possibilities, am I understanding this correctly?
This is not what I said, and not what the WaPo quoted. We're talking about the CEO, who is, shall we say, unfamiliar with war-making, getting asked a hypothetical about how the product he sells would perform in a first-strike scenario, and he reportedly gives what is an entirely legalese answer. Yes, I consider this a likely possibility. It sounds exactly like how someone would respond if they've been swimming in legal memos for months.
> It sounds exactly like how someone would respond if they've been swimming in legal memos for months.
I think you're being highly speculative. The part you quoted from the WaPo doesn't even state the defense official was complaining about any "legalese" response; that seems like a projection on your part. The only info you gave in your comment about what Dario said is a defense official's paraphrasing. It seems a simple case of Dario refusing to give a blank check in all scenarios, whereas the defense official, for maximum impact, chose to portray "not having a blank check" as "having to call Anthropic" in every case where "help" is given by an LLM. The appearance of "misunderstanding" you're seeing in the media is not about the parties' misunderstanding of what the other side wants; it's simply fallout from each side fighting to control the narrative.
That's a copout and you know it. You're focusing on the 'unnamed' part; I'm focusing on the 'representative of an administration that lies constantly and brazenly' part.
Noted Rationalist responds to a question about a first strike scenario with "I need to think about it" instead of "of course we'd launch the missiles, are you kidding?" and everyone here seems to think this is somehow unbelievable.
You're still dancing around the point. Person A said X; person B said not X; we have no concrete evidence either way. Person A is an anonymous representative of a group that has no norms against dishonesty, an obvious motive to falsely claim X, and a track record of telling frequent, shameless lies. X doesn't need to be 'unbelievable' for me to ask, again, what positive reason do you have for believing it?
Why the fuck would you use an LLM to determine whether a nuclear missile was hurtling towards you? The question makes no sense, and so you get a nonsensical answer.
Seems not unlikely that Anthropic was manipulated into this position for purposes of invalidating their contract.
> If an intercontinental ballistic missile was launched at the United States, could the military use Anthropic’s Claude AI system to help shoot it down?
Are you serious? This is the kind of thing you'd ask a clarifying question on and get information back immediately. Further, the huge overreaction from Hegseth shows this is a fundamental disagreement.
The flip side of "Hegseth is an unqualified drunk", a position which I've always held and still maintain, is that he very well might crash out over nothing instead of asking clarifying questions or suggesting obvious compromises. This is the same guy who recalled the entire general staff to yell at them about the warrior mindset. Not an excuse for any of this, but I do think the precise nature of the badness matters.
> could the military use Anthropic’s Claude AI system to help shoot it down?
What a joke. I suggest folks read up on the very poor performance of US ICBM interceptor systems. They're barely a coin flip, in ideal conditions. How is Claude going to help with that? Push the launch interceptor button faster? Maybe Claude can help design a better system, but it's not turning our existing poor systems into super capable systems by simply adding AI.
I'm sure it's a matter of interpretation. Anthropic thinks the DoW's demands will lead to mass surveillance and auto-kill bots. The DoW probably disagrees with that interpretation, and all OpenAI needs to do is agree with the DoW.
My bet is that what the DoW wants is pretty clearly tied to mass surveillance and kill-bots. Altman is a snake.
Why do you choose to call it the "DoW"? Its official name is the Department of Defense; it was titled that way by Congress, and only Congress can change it. What is your motivation in using a term that the current administration has started to use? Do you also use the Gulf of America when referring to the body of water that defines the southern edge of the USA?
Don't you think it is more to-the-point to call it what it is, and what the people running it (with, I'll bet everything I have, absolute immunity) are doing and intend to do with it?
It is "honest" in the historical sense, certainly.
But the executive-order-driven name change is just another bit of illegal/extra-legal/paralegal behavior by the administration that, every time we just nod along, eats away at the constitutional structure of our government. So don't go along with it.
Personally, as someone coming from a region that has suffered many times over by the actions of this so called "Ministry of Defense," I feel like "Department of War" is a more accurate and honest term.
As I've noted multiple times here on HN, I don't disagree with this.
But the question is not about whether it is a more accurate and honest term. It is about people complying in advance with the illegal/extra-legal renaming of a federal agency by a president who does not have the right or authority to do so.
If we were talking about Congress voting to rename the DoD as the DoW, I'd have nothing to say on the matter that differed from your observation.
It's the term used by Sam Altman in the announcement. Maybe aim your anger there, to someone knowingly helping them in their attempt to turn the department into one of aggression.
No, the Department of War is the former name of the Department of the Army and nothing else. The DoD is a new creation that includes the Army, the historic Department of the Navy, and the other new services created after WW2.
The president has no authority to do this. Federal departments and agencies are named by Congress, and even the Republicans in Congress have shown no interest in formalizing this.
Sure, no such law that I know of. But there's also no law that suggests that anybody else needs to refer to the Department of Defense using terms that the president and his minions just made up out of thin air. I'm also arguing that going along with them, by itself, is harmful to a democratic government.
Exactly this! Just like the Gulf of Mexico is still called the Gulf of Mexico, if we just ignore his ramblings and continue calling it the Department of Defense, we undermine his whole point. If we fall for all their crap and just accept it, then we lose in the end. Any resistance to a Fascist government is good resistance. Anything that makes their lives a little shittier is good. Better that they go around having tantrums about how they renamed it but no one is paying attention.
Anthropic has safeguards baked into the model; this is the only way to make sure it's harder for the DoD to misuse it. A pinky swear from the DoD means nothing.
If your starting position is already that Sam Altman lies about everything that doesn't fit your preconceived positions, that doesn't seem like a very useful position to update from.
>that they need to rig their elections against themselves to get dissenting voices
I don't believe this is true. If you're talking about Non-Constituency Members of Parliament, they are consolation prizes given to best losers, and there are many things they cannot vote on. Moreover, the ruling party almost never lifts the party whip, i.e. members of the party CANNOT vote against the party line (without being kicked out of the party, which results in them being kicked out of parliament). In other words, since the ruling party already has a majority, any opposing votes literally do not matter.
If you aren't talking about the NCMP scheme, then I do not know what you're talking about, as the ruling party does institute policies that are beneficial for the incumbent party.
GPT-1 wasn't used as a zero-shot text generator; that wasn't why it was impressive. The way GPT-1 was used was as a base model to be fine-tuned on downstream tasks. It was the first case of a (fine-tuned) base Transformer model just trivially blowing everything else out of the water. Before this, people were coming up with bespoke systems for different tasks (a simple example: for SQuAD, a passage question-answering task, people would have one LSTM to read the passage and another LSTM to read the question, because of course those are different sub-tasks with different requirements and should have different sub-models). Once GPT-1 came out, you just dumped all the text into the context, YOLO fine-tuned it, and trivially got state of the art on the task. On EVERY NLP task.
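For anyone who wasn't around for that era, here's a minimal sketch of that pretrain-and-finetune recipe using the Hugging Face transformers library and the GPT-1 checkpoint hosted on the Hub; the dataset and hyperparameters are illustrative assumptions, not what OpenAI actually used.

    # Bolt a classification head onto the pretrained LM and fine-tune the whole
    # thing on a downstream task's (text, label) pairs.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("openai-gpt")   # the GPT-1 checkpoint
    model = AutoModelForSequenceClassification.from_pretrained("openai-gpt", num_labels=2)

    # GPT-1 has no pad token; reuse the unk token so batching works.
    tokenizer.pad_token = tokenizer.unk_token
    model.config.pad_token_id = tokenizer.pad_token_id

    # "Just dump all the text into the context": the task becomes tokenized text + label.
    dataset = load_dataset("glue", "sst2")
    dataset = dataset.map(lambda b: tokenizer(b["sentence"], truncation=True,
                                              padding="max_length", max_length=128),
                          batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gpt1-sst2", num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation"],
    )
    trainer.train()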
Overnight, GPT-1 single-handedly upset the whole field. It was somewhat overshadowed by the BERT and T5 models that came out shortly after, which tended to perform even better in the pretrain-and-finetune format. Nevertheless, the success of GPT-1 already warranted scaling up the approach.
A better question is how OpenAI decided to scale GPT-2 up to GPT-3. GPT-2 was an awkward in-between model. It generated better text for sure, but the zero-shot performance reported in the paper, while neat, was not great at all. On the flip side, its fine-tuned task performance paled in comparison to much smaller encoder-only Transformers. (The answer is: scaling laws allowed for predictable increases in performance.)
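On that last point, here's a toy illustration of the scaling-law logic (all numbers made up): fit a power law on small training runs, then extrapolate to a larger budget before committing to it.

    # Toy power-law extrapolation: loss ~= a * compute**(-b), fit in log-log space.
    import numpy as np

    compute = np.array([1e17, 1e18, 1e19, 1e20])   # hypothetical training FLOPs
    loss    = np.array([4.00, 3.50, 3.06, 2.68])   # hypothetical eval losses

    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    predict = lambda c: np.exp(intercept) * c ** slope

    # Predicted loss at a much larger budget, before spending anything on it.
    print(predict(1e23))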
> Transformer model just trivially blowing everything else out of the water
No, this is the winners rewriting history. Transformer-style encoders are now applied to lots and lots of disciplines, but they do not "trivially" do anything. The hype re-telling is obscuring the facts of history. Specifically, in human-language text translation, the "Attention is All You Need" Transformer did "blow others out of the water", yes, for that application.
>a (fine-tuned) base Transformer model just trivially blowing everything else out of the water
"Attention is All You Need" was a Transformer model trained specifically for translation, blowing all other translation models out of the water. It was not fine-tuned for tasks other than what the model was trained from scratch for.
GPT-1/BERT were significant because they showed that you can pretrain one base model and use it for "everything".
Because the author is artificially shrinking the scope of one thing (prompt engineering) to make its replacement look better (context engineering).
Never mind that prompt engineering goes back to pure LLMs before ChatGPT was released (i.e. before the conversation paradigm was even the dominant one for LLMs), and includes everything from few-shot prompting (including question-answer pairs) to providing tool definitions and examples, retrieval-augmented generation, and conversation-history manipulation. In academic writing, LLMs are often defined as a distribution P(y|x), where x is not infrequently referred to as the prompt. In other words, anything that comes before the output is considered the prompt.
But if you narrow the definition of "prompt" down to "user instruction", then you get to ignore all the work that's come before and talk up the new thing.
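To make that broader sense of "prompt" concrete, here's a minimal sketch (all names and formats hypothetical) of assembling the x in P(y|x) from exactly the pieces listed above:

    # Everything that precedes the output is the prompt: tool specs, retrieved
    # documents, few-shot question-answer pairs, prior turns, then the new question.
    def build_prompt(tool_specs, retrieved_docs, few_shot_pairs, history, question):
        parts = []
        parts += [f"Tool: {name} -- {desc}" for name, desc in tool_specs]
        parts += [f"Context: {doc}" for doc in retrieved_docs]
        parts += [f"Q: {q}\nA: {a}" for q, a in few_shot_pairs]   # few-shot examples
        parts += [f"{role}: {text}" for role, text in history]    # conversation history
        parts.append(f"Q: {question}\nA:")                        # leave the answer open
        return "\n\n".join(parts)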
1) It's a small island, but it's also a major trading port. Which means its whole economy is already geared towards importing food from neighboring countries.
2) On the other hand: no domestic industry to disrupt! No domestic farming groups lobbying against meat substitutes, which may push research/distribution further along.
They're also big on future-proofing and environmental awareness in general, as they have a very long-term, stable government that looks 10-100 years ahead.
Long story short: you are technically correct, but in practice things are a little different. There are two factors to consider here:
1. Model Capability
You are right that mechanically, input and output tokens in a standard decoder Transformer are "the same". A 32K context should mean you could have 1 input token and 32K output tokens (you actually get 1 bonus token), or 32K input tokens and 1 output token.
However, if you feed an LM "too much" of its own output (read: have too long an output length), it starts to go off the rails, empirically. The phrase "too much" is doing some work here: it's a balance of (1) LLM labs having data that covers that many output tokens in an example and (2) LLM labs having empirical tests that give confidence the model won't go off the rails within some output limit. (Note, this isn't pretraining but the instruction tuning/RLHF afterwards, so you don't just get such examples for free.)
In short, labs will often train a model targeting an output context length, and put out an offering based on that.
2. Infrastructure
While mathematically having the model read external input and its own output are the same operation, the infrastructure is wildly different. This is one of the first things you learn when deploying these models: you basically have a different stack for "encoding" and "decoding" (using those terms loosely; this is, after all, still a decoder-only model). This means you need to set max lengths for encoding and decoding separately.
So, after a long time of optimizing both the implementation and the length hyperparameters (or just winging it), the lab will decide "we have a good implementation for up to 31K input and 1K output" and go from there. If they wanted to change that, there's a bunch of infrastructure work involved. And because of the economies of batching, you want many inputs to have as close to the same lengths as possible, so you want to offer fewer configurations (some of this bucketing may be performed hidden from the user). Anyway, this is why it may become uneconomical to offer a model at a given length configuration (input or output) after some time. A toy sketch of how these budgets interact follows below.
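Here's that toy budgeting sketch (all numbers hypothetical): the serving stack enforces separate input and output caps, and the actual output budget is whichever is smaller, the configured cap or what's left of the context window.

    # Hypothetical limits for a 32K-context model served with separate prefill/decode caps.
    CONTEXT_WINDOW    = 32_768   # total tokens the model was trained/tested to handle
    MAX_INPUT_TOKENS  = 31_744   # cap enforced by the "encoding" (prefill) stack
    MAX_OUTPUT_TOKENS = 2_048    # cap enforced by the "decoding" stack

    def output_budget(n_input_tokens: int) -> int:
        """Tokens the server will actually let you generate for a given input length."""
        if n_input_tokens > MAX_INPUT_TOKENS:
            raise ValueError("input exceeds the configured prefill limit")
        return min(CONTEXT_WINDOW - n_input_tokens, MAX_OUTPUT_TOKENS)

    print(output_budget(20_000))  # 2048: limited by the trained/tested output cap
    print(output_budget(31_500))  # 1268: limited by what's left of the context window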
You could easily make the other argument: As a professor of ethics she studies many different ethical systems, including ones that are not mainstream. This means that she can more easily find some ethical system under which a given action is considered ethical.
The "ethics expert = more ethical" connection has never held up and mainly serves as a gotcha.
It's a good thing I never claimed "ethics expert = more ethical", then. What I'm saying is that I agree there's an irony here.
It's true that, as you say, she could use her knowledge of ethics to be less ethical. But that would just be a different kind of irony for somebody who teaches on law and ethics.
I tried to find where I heard that Radford was inspired by that blog post, but the closest thing I found is that in the "Sentiment Neuron" paper (Learning to Generate Reviews and Discovering Sentiment: https://arxiv.org/pdf/1704.01444.pdf), in the "Discussion and Future Work" section, they mention this Karpathy paper from 2015: Visualizing and Understanding Recurrent Networks https://arxiv.org/abs/1506.02078
RoPE? The position encoding method published 2 years before Llama and already in models such as GPT-J-6B?
DPO, a method whose paper had no experiments with Llama?
QLoRA? The third in a series of quantization works by Tim Dettmers, the first two of which pre-dated Llama?