you're right. those things predated the llama leak. but from my understanding (from the sidelines), it's llama that's made them popular and approachable from a hacker perspective.
> Surely if OpenAI had insisted upon the same things that Anthropic had, the government would not have signed this agreement.
But they did.
"Two of our most important safety principles are prohibitions on domestic mass surveillance and human responsibility for the use of force, including for autonomous weapon systems. The DoW agrees with these principles, reflects them in law and policy, and we put them into our agreement."
The difference is that Anthropic wanted to reserve the right to judge when the red lines are crossed, while OpenAI will defer to the DoD and its policies for that. In both cases, the two parties can claim to agree on the principles, but when push comes to shove, who decides whether the principles have been violated differs.
> The difference is that Anthropic wanted to reserve the right to judge when the red lines are crossed, while OpenAI will defer to the DoD and its policies for that.
It was pretty clear from Anthropic’s and Hegseth’s statements that they didn’t disagree on the two exclusions, but on who would be the arbiter on those. And Sam’s wording all but confirms that OpenAI’s agreement defers to DoD policies and laws (which a defense contract cannot prescribe), and effectively only pays lip service to the two exclusions.
Who decides these weighty questions? Approach (1), accepted by OAI, references laws and thus appropriately vests those questions in our democratic system. Approach (2) unacceptably vests those questions in a single unaccountable CEO who would usurp sovereign control of our most sensitive systems.
Amodei is the type of person who thinks he can tell the US government what they can and can’t do.
And the US government should have precisely none of that, regardless of whether they’re red or blue.
> Amodei is the type of person who thinks he can tell the US government what they can and can’t do.
I don't think that's the case. Amodei is worried that AI is extraordinarily capable, and our current system of checks and balances is not adequate yet to set the proper constraints so the law is correctly enforced. Here's an excerpt from his statement [1]:
> Powerful AI makes it possible to assemble this scattered, individually innocuous data into a comprehensive picture of any person’s life—automatically and at massive scale.
Let's do this thought exercise: how long would it take you, using Claude Code, to write some code to crawl the internet and find all the postings of the HN user nandomrumber under all their names on various social media, and create a profile with the top 10 ways that user can be legally harassed? Of course, Claude would refuse to do this, because of its guardrails, but what if Claude didn't refuse?
And that’s where the authoritarian in you is shining through.
You see, Obama droned more combatants than anyone else before or after him, but he always followed a legal paper trail and went by the book (except perhaps in some cases; search for Anwar al-Awlaki).
One can argue whether the rules and laws (secret courts, proceedings, asymmetries in court processes that severely compress civil liberties… to the point they might violate other constitutional rights) are legitimate, but he operated within the limits of the law.
You folks just blurt "me ne frego" ("I don't give a damn") like a random Mussolini and think you're being patriotic.
> Amodei is the type of person who thinks he can tell the US government what they can and can’t do.
> And the US government should have precisely none of that, regardless of whether they’re red or blue.
This is a pretty hot take. "You can't break the law and kill people or do mass surveillance with our technology." Fuck that, the government should break whatever laws and kill whoever they please.
I hope you A: aren't a U.S. citizen, and B: don't vote.
If I'm selling widgets to the government and come to find out they are using those widgets unconstitutionally and to violate my neighbors' rights, you can be damn sure I'm going to stop selling the gov my widgets. Amodei said that Anthropic was willing to step away if they and the government couldn't come to terms, and instead of the government acting like adults and letting them, they decided to double down on being the dumbest people in the room, act like toddlers, and throw a massive fit about the whole thing.
> It was pretty clear from Anthropic’s and Hegseth’s statements that they didn’t disagree on the two exclusions, but on who would be the arbiter on those.
No. Altman said human responsibility. Anthropic said human in the loop.
> And Sam’s wording all but confirms that OpenAI’s agreement defers to DoD policies and laws (which a defense contract cannot prescribe), and effectively only pays lip service to the two exclusions.
I don’t understand your first comment. At that point, Altman’s tweet didn’t exist yet, and is immaterial to the reading of Anthropic’s and Hegseth’s statements.
To your second comment, it was clear enough to me to be the most plausible reading of the situation by far.
We state what we think the situation is all the time, without explicitly writing “I think the situation is…”.
Seems Anthropic did not understand the questions they were asked. From the WaPo:
>A defense official said the Pentagon’s technology chief whittled the debate down to a life-and-death nuclear scenario at a meeting last month: If an intercontinental ballistic missile was launched at the United States, could the military use Anthropic’s Claude AI system to help shoot it down?
>It’s the kind of situation where technological might and speed could be critical to detection and counterstrike, with the time to make a decision measured in minutes and seconds. Anthropic chief executive Dario Amodei’s answer rankled the Pentagon, according to the official, who characterized the CEO’s reply as: You could call us and we’d work it out.
>An Anthropic spokesperson denied Amodei gave that response, calling the account “patently false,” and saying the company has agreed to allow Claude to be used for missile defense. But officials have cited this and another incident involving Claude’s use in the capture of Venezuelan leader Nicolás Maduro as flashpoints in a spiraling standoff between the company and the Pentagon in recent days. The meeting was previously reported by Semafor.
I have a hunch that Anthropic interpreted this question to be on the dimension of authority, when the Pentagon was very likely asking about capability, and they then followed up to clarify that for missile defense they would, I guess, allow an exception. I get the (at times overwhelming) skepticism that people have about these tools and this administration, but this is not a reasonable position to hold, even if Anthropic held it accidentally because they initially misunderstood what they were being asked.
"It’s the kind of situation where technological might and speed could be critical to detection and counterstrike"
Missile detection and the decision to make a (nuclear) counterstrike are two different things to me, but apparently the Department of War wants both, so it seems it's not "just" about missile detection.
Is there any reason at all to believe the account of the unnamed "defence official"? Whatever your position on this administration, you know that it lies like the rest of us breathe. With a denial from the other side and a lack of any actual evidence, why should I give it non-negligible credence?
It is bizarre. I like how "past performance predicts future performance" is supposed to apply to founders and companies but is completely disregarded for a two-term president and admin, as if we have no idea how they will operate in the future.
Anthropic, with its current war chest, is supposedly employing lawyers who are misunderstanding the Department of War? This is considered to be the likelier of possibilities, am I understanding this correctly?
This is not what I said, and not what the WaPo quoted. We're talking about the CEO, who is, shall we say, unfamiliar with war-making, getting asked a hypothetical about how the product he sells would perform in a first-strike scenario, and he reportedly gives what is an entirely legalese answer. Yes, I consider this a likely possibility. It sounds exactly like how someone would respond if they've been swimming in legal memos for months.
> It sounds exactly like how someone would respond if they've been swimming in legal memos for months.
I think you're being highly speculative. The part you quoted from the WaPo doesn't even state the defense official was complaining about any "legalese" response; that seems like a projection on your part. The only info you gave in your comment about what Dario said is a defense official's paraphrasing. It seems a simple case of Dario refusing to give a blank check in all scenarios, whereas the defense official, for maximum impact, chose to portray "not having a blank check" as "having to call Anthropic" in every case where "help" is given by an LLM. The appearance of "misunderstanding" you're seeing in the media is not about the parties' misunderstanding of what the other side wants; it's simply fallout from each side fighting to control the narrative.
That's a copout and you know it. You're focusing on the 'unnamed' part; I'm focusing on the 'representative of an administration that lies constantly and brazenly' part.
Noted Rationalist responds to a question about a first strike scenario with "I need to think about it" instead of "of course we'd launch the missiles, are you kidding?" and everyone here seems to think this is somehow unbelievable.
You're still dancing around the point. Person A said X; person B said not X; we have no concrete evidence either way. Person A is an anonymous representative of a group that has no norms against dishonesty, an obvious motive to falsely claim X, and a track record of telling frequent, shameless lies. X doesn't need to be 'unbelievable' for me to ask, again, what positive reason do you have for believing it?
Why the fuck would you use an LLM to determine whether a nuclear missile was hurtling towards you? The question makes no sense, and so you get a nonsensical answer.
Seems not unlikely that Anthropic was manipulated into this position for purposes of invalidating their contract.
> If an intercontinental ballistic missile was launched at the United States, could the military use Anthropic’s Claude AI system to help shoot it down?
Are you serious? This is the kind of thing you'd ask a clarifying question on and get information back immediately. Further, the huge overreaction from Hegseth shows this is a fundamental disagreement.
The flip side of "Hegseth is an unqualified drunk", a position which I've always held and still maintain, is that he very well might crash out over nothing instead of asking clarifying questions or suggesting obvious compromises. This is the same guy who recalled the entire general staff to yell at them about the warrior mindset. Not an excuse for any of this, but I do think the precise nature of the badness matters.
> could the military use Anthropic’s Claude AI system to help shoot it down?
What a joke. I suggest folks read up on the very poor performance of US ICBM interceptor systems. They're barely a coin flip, in ideal conditions. How is Claude going to help with that? Push the launch interceptor button faster? Maybe Claude can help design a better system, but it's not turning our existing poor systems into super capable systems by simply adding AI.
I'm sure it's a matter of interpretation. Anthropic thinks the DoW's demands will lead to mass surveillance and auto-kill bots. The DoW probably disagrees with that interpretation, and all OpenAI needs to do is agree with the DoW.
My bet is that what the DoW wants is pretty clearly tied to mass surveillance and kill-bots. Altman is a snake.
Why do you choose to call it the "DoW"? Its official name is the Department of Defense; it was titled that way by Congress, and only Congress can change it. What is your motivation in using a term that the current administration has started to use? Do you also use the Gulf of America when referring to the body of water that defines the southern edge of the USA?
Don't you think it is more to-the-point to call it what it is, and what the people running it (with, I'll bet everything I have, absolute immunity) are doing and intend to do with it?
It is "honest" in the historical sense, certainly.
But the executive-order-driven name change is just another bit of illegal/extra-legal/paralegal behavior by the administration that, every time we just nod along, eats away at the constitutional structure of our government. So don't go along with it.
Personally, as someone coming from a region that has suffered many times over by the actions of this so called "Ministry of Defense," I feel like "Department of War" is a more accurate and honest term.
As I've noted multiple times here on HN, I don't disagree with this.
But the question is not about whether it is a more accurate and honest term. It is about people complying in advance with the illegal/extra-legal renaming of a federal agency by a president who does not have the right or authority to do so.
If we were talking about Congress voting to rename the DoD as the DoW, I'd have nothing to say on the matter that differed from your observation.
It's the term used by Sam Altman in the announcement. Maybe aim your anger there, to someone knowingly helping them in their attempt to turn the department into one of aggression.
No, the Department of War is the former name of the Department of the Army and nothing else. The DoD is a new creation that includes the Army, the historic Department of the Navy, and the other new services created after WW2.
The president has no authority to do this. Federal departments and agencies are named by Congress, and even the Republicans in Congress have shown no interest in formalizing this.
Sure, no such law that I know of. But there's also no law that suggests that anybody else needs to refer to the Department of Defense using terms that the president and his minions just made up out of thin air. I'm also arguing that going along with them, by itself, is harmful to a democratic government.
Exactly this! Just like the Gulf of Mexico is still called the Gulf of Mexico, if we just ignore his ramblings and continue calling it the Department of Defense, we undermine his whole point. If we fall for all their crap and just accept it, then we lose in the end. Any resistance to a Fascist government is good resistance. Anything that makes their lives a little shittier is good. Better that they go around having tantrums about how they renamed it but no one is paying attention.
Anthropic has safeguards baked into the model; this is the only way to make sure it's harder for the DoD to misuse it. A pinky swear from the DoD means nothing.
If your starting position is already that Sam Altman lies about everything that doesn't fit your preconceived positions, that doesn't seem like a very useful position to update from.
>that they need to rig their elections against themselves to get dissenting voices
I don't believe this is true. If you're talking about Non-Constituency Members of Parliament, they are consolation prizes given to best losers, and there are many things they cannot vote on. Moreover, the ruling party almost never lifts the party whip, i.e. members of the party CANNOT vote against the party line (without being kicked out of the party, which results in them being kicked out of parliament). In other words, since the ruling party already has a majority, any opposing votes literally do not matter.
If you aren't talking about the NCMP scheme, then I do not know what you're talking about, as the ruling party does institute policies that are beneficial for the incumbent party.
GPT-1 wasn't used as a zero-shot text generator; that wasn't why it was impressive. The way GPT-1 was used was as a base model to be fine-tuned on downstream tasks. It was the first case of a (fine-tuned) base Transformer model just trivially blowing everything else out of the water. Before this, people were coming up with bespoke systems for different tasks (a simple example: for SQuAD, a passage question-answering task, people would have one LSTM to read the passage and another LSTM to read the question, because of course those are different sub-tasks with different requirements and should have different sub-models). Once GPT-1 came out, you just dumped all the text into the context, YOLO fine-tuned it, and trivially got state of the art on the task. On EVERY NLP task.
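For anyone who wasn't around for that era, here's a minimal sketch of that pretrain-and-finetune recipe using the Hugging Face transformers library and the GPT-1 checkpoint hosted on the Hub; the dataset and hyperparameters are illustrative assumptions, not what OpenAI actually used.

    # Bolt a classification head onto the pretrained LM and fine-tune the whole
    # thing on a downstream task's (text, label) pairs.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("openai-gpt")   # the GPT-1 checkpoint
    model = AutoModelForSequenceClassification.from_pretrained("openai-gpt", num_labels=2)

    # GPT-1 has no pad token; reuse the unk token so batching works.
    tokenizer.pad_token = tokenizer.unk_token
    model.config.pad_token_id = tokenizer.pad_token_id

    # "Just dump all the text into the context": the task becomes tokenized text + label.
    dataset = load_dataset("glue", "sst2")
    dataset = dataset.map(lambda b: tokenizer(b["sentence"], truncation=True,
                                              padding="max_length", max_length=128),
                          batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gpt1-sst2", num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation"],
    )
    trainer.train()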
Overnight, GPT-1 single-handedly upset the whole field. It was somewhat overshadowed by the BERT and T5 models that came out shortly after, which tended to perform even better in the pretrain-and-finetune format. Nevertheless, the success of GPT-1 already warranted scaling up the approach.
A better question is how OpenAI decided to scale GPT-2 up to GPT-3. GPT-2 was an awkward in-between model. It generated better text for sure, but the zero-shot performance reported in the paper, while neat, was not great at all. On the flip side, its fine-tuned task performance paled in comparison to much smaller encoder-only Transformers. (The answer is: scaling laws allowed for predictable increases in performance.)
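On that last point, here's a toy illustration of the scaling-law logic (all numbers made up): fit a power law on small training runs, then extrapolate to a larger budget before committing to it.

    # Toy power-law extrapolation: loss ~= a * compute**(-b), fit in log-log space.
    import numpy as np

    compute = np.array([1e17, 1e18, 1e19, 1e20])   # hypothetical training FLOPs
    loss    = np.array([4.00, 3.50, 3.06, 2.68])   # hypothetical eval losses

    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    predict = lambda c: np.exp(intercept) * c ** slope

    # Predicted loss at a much larger budget, before spending anything on it.
    print(predict(1e23))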
> Transformer model just trivially blowing everything else out of the water
No, this is the winners rewriting history. Transformer-style encoders are now applied to lots and lots of disciplines, but they do not "trivially" do anything. The hype re-telling is obscuring the facts of history. Specifically, in human-language text translation, the "Attention is All You Need" Transformer did "blow others out of the water", yes, for that application.
>a (fine-tuned) base Transformer model just trivially blowing everything else out of the water
"Attention is All You Need" was a Transformer model trained specifically for translation, blowing all other translation models out of the water. It was not fine-tuned for tasks other than what the model was trained from scratch for.
GPT-1/BERT were significant because they showed that you can pretrain one base model and use it for "everything".
Because the author is artificially shrinking the scope of one thing (prompt engineering) to make its replacement look better (context engineering).
Never mind that prompt engineering goes back to pure LLMs before ChatGPT was released (i.e. before the conversation paradigm was even the dominant one for LLMs), and includes everything from few-shot prompting (including question-answer pairs) to providing tool definitions and examples, retrieval-augmented generation, and conversation-history manipulation. In academic writing, LLMs are often defined as a distribution P(y|x), where x is not infrequently referred to as the prompt. In other words, anything that comes before the output is considered the prompt.
But if you narrow the definition of "prompt" down to "user instruction", then you get to ignore all the work that's come before and talk up the new thing.
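To make that broader sense of "prompt" concrete, here's a minimal sketch (all names and formats hypothetical) of assembling the x in P(y|x) from exactly the pieces listed above:

    # Everything that precedes the output is the prompt: tool specs, retrieved
    # documents, few-shot question-answer pairs, prior turns, then the new question.
    def build_prompt(tool_specs, retrieved_docs, few_shot_pairs, history, question):
        parts = []
        parts += [f"Tool: {name} -- {desc}" for name, desc in tool_specs]
        parts += [f"Context: {doc}" for doc in retrieved_docs]
        parts += [f"Q: {q}\nA: {a}" for q, a in few_shot_pairs]   # few-shot examples
        parts += [f"{role}: {text}" for role, text in history]    # conversation history
        parts.append(f"Q: {question}\nA:")                        # leave the answer open
        return "\n\n".join(parts)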
1) It's a small island, but it's also a major trading port. Which means its whole economy is already geared towards importing food from neighboring countries.
2) On the other hand: no domestic industry to disrupt! No domestic farming groups lobbying against meat substitutes, which may push research/distribution further along.
They're also big on future-proofing and environmental awareness in general, as they have a very long-term, stable government that looks 10-100 years ahead.
Long story short: you are technically correct, but in practice things are a little different. There are two factors to consider here:
1. Model Capability
You are right that mechanically, input and output tokens in a standard decoder Transformer are "the same". A 32K context should mean you could have 1 input token and 32K output tokens (you actually get 1 bonus token), or 32K input tokens and 1 output token.
However, if you feed an LM "too much" of its own output (read: have too long an output length), it starts to go off the rails, empirically. The phrase "too much" is doing some work here: it's a balance of (1) LLM labs having data that covers that many output tokens in an example and (2) LLM labs having empirical tests that give confidence the model won't go off the rails within some output limit. (Note, this isn't pretraining but the instruction tuning/RLHF afterwards, so you don't just get such examples for free.)
In short, labs will often train a model targeting an output context length, and put out an offering based on that.
2. Infrastructure
While mathematically having the model read external input and its own output are the same operation, the infrastructure is wildly different. This is one of the first things you learn when deploying these models: you basically have a different stack for "encoding" and "decoding" (using those terms loosely; this is, after all, still a decoder-only model). This means you need to set max lengths for encoding and decoding separately.
So, after a long time of optimizing both the implementation and the length hyperparameters (or just winging it), the lab will decide "we have a good implementation for up to 31K input and 1K output" and go from there. If they wanted to change that, there's a bunch of infrastructure work involved. And because of the economies of batching, you want many inputs to have as close to the same lengths as possible, so you want to offer fewer configurations (some of this bucketing may be performed hidden from the user). Anyway, this is why it may become uneconomical to offer a model at a given length configuration (input or output) after some time. A toy sketch of how these budgets interact follows below.
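Here's that toy budgeting sketch (all numbers hypothetical): the serving stack enforces separate input and output caps, and the actual output budget is whichever is smaller, the configured cap or what's left of the context window.

    # Hypothetical limits for a 32K-context model served with separate prefill/decode caps.
    CONTEXT_WINDOW    = 32_768   # total tokens the model was trained/tested to handle
    MAX_INPUT_TOKENS  = 31_744   # cap enforced by the "encoding" (prefill) stack
    MAX_OUTPUT_TOKENS = 2_048    # cap enforced by the "decoding" stack

    def output_budget(n_input_tokens: int) -> int:
        """Tokens the server will actually let you generate for a given input length."""
        if n_input_tokens > MAX_INPUT_TOKENS:
            raise ValueError("input exceeds the configured prefill limit")
        return min(CONTEXT_WINDOW - n_input_tokens, MAX_OUTPUT_TOKENS)

    print(output_budget(20_000))  # 2048: limited by the trained/tested output cap
    print(output_budget(31_500))  # 1268: limited by what's left of the context window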
You could easily make the other argument: As a professor of ethics she studies many different ethical systems, including ones that are not mainstream. This means that she can more easily find some ethical system under which a given action is considered ethical.
The "ethics expert = more ethical" connection has never held up and mainly serves as a gotcha.
It's a good thing I never claimed "ethics expert = more ethical", then. What I'm saying is that I agree there's an irony here.
It's true that, as you say, she could use her knowledge of ethics to be less ethical. But that would just be a different kind of irony for somebody who teaches on law and ethics.
I tried to find where I heard that Radford was inspired by that blog post, but the closest thing I found is that in the "Sentiment Neuron" paper (Learning to Generate Reviews and Discovering Sentiment: https://arxiv.org/pdf/1704.01444.pdf), in the "Discussion and Future Work" section, they mention this Karpathy paper from 2015: Visualizing and Understanding Recurrent Networks https://arxiv.org/abs/1506.02078
RoPE? The position encoding method published 2 years before Llama and already in models such as GPT-J-6B?
DPO, a method whose paper had no experiments with Llama?
QLoRA? The third in a series of quantization works by Tim Dettmers, the first two of which pre-dated Llama?