Hacker News | jmogly's comments

It is kind of a fundamental risk of IMDS: guest VMs often need some metadata about themselves, and the host has it. A hardened, network-gapped service running host-side is acceptable, possibly the best solution. The real issue is when your IMDS is fat and vulnerable, which this article kind of alludes to.

There’s also the fact that Azure’s implementation doesn’t require auth, so it’s very vulnerable to SSRF.


You could imagine hosting the metadata service somewhere else. After all, there is nothing a node knows about a VM that the fabric doesn’t. And things like certificates come from somewhere anyway; they are not on the node, so that service is just a cache.

Hosting IMDS on the host side is pretty much the only reasonable way to provide stability guarantees. It should still work even if the network is having issues.

That being said, IMDS on AWS is a dead simple key-value storage. A competent developer should be able to write it in a memory-safe language in a way that can't be easily exploited.
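To give a sense of how small that surface can be, here is a hypothetical sketch of such a key-value core (made-up paths and values for illustration, not AWS's actual layout or implementation):

```python
# Hypothetical per-VM metadata store resolved by URL path, the way
# IMDS exposes keys like /latest/meta-data/instance-id.
METADATA = {
    "latest": {
        "meta-data": {
            "instance-id": "i-0123456789abcdef0",
            "placement": {"availability-zone": "us-east-1a"},
        }
    }
}

def lookup(path: str):
    """Resolve a slash-separated path; dicts answer with a key listing."""
    node = METADATA
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            return None  # unknown key: the server would answer 404
        node = node[part]
    if isinstance(node, dict):
        return "\n".join(sorted(node))  # directory-style listing
    return node

print(lookup("/latest/meta-data/instance-id"))  # i-0123456789abcdef0
```

The whole service is essentially this lookup behind a plain HTTP listener, which is why the attack surface is so small.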


“No, there is another”—Yoda, The Empire Strikes Back :)

What you describe carries the risk that secrets end up in crash dumps and get exfiltrated.

Imagine an attacker who owns the host to some extent and can trigger such a dump. The data then lands on disk first, then gets stored somewhere else.

You probably need per-tenant/per-VM encryption in your cache, since you can never protect against someone with elevated privileges crashing or dumping your process, memory-safe or not.

Then someone can try to DoS you, etc.

Finally, it’s not good practice to mix tenants’ secrets in hostile multi-tenancy environments, so you probably need a cache per VM, in separate processes…

IMHO, an alternative is to keep the VM's private data inside the VMs, not on the host.

Then the real wtf is the unsecured HTTP endpoint, an open invitation for “explorations” of the host (or the SoC when they get there) on Azure.

An eBPF+signing agent helps authenticate legitimate requests but does nothing against attacks on the server itself; say, sending broken requests hoping to hit a bug. It does not matter whether those are signed or not.

This is a path to own the host, an unnecessary risk with too many moving parts.

Many VM escapes abuse a device driver, and I trust the kernel guys who write them a lot more than the people who write hostable web servers running inproc on the host.

Removing these was a subject of intense discussions (and pushback from the owning teams), but without leaking any secrets I can tell you that a lot of people didn’t like the idea of a customer-facing web server on the nodes.


Of course, putting the metadata service into its own separate system is better. That's how Amazon does it with the modern AWS. A separate Nitro card handles all the networking and management.

But if you're within the classic hypervisor model, then it doesn't really matter that much. The attack surface of a simple plain HTTP key-value storage is negligible compared to all other privileged code that needs to run on the host.

Sure, each tenant needs its own instance of the metadata service, and it should be bound to listen on the tenant-specific interface. AWS also used to set the max TTL on these interfaces to 1, so the packets would be dropped by any router.
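That TTL trick is easy to illustrate directly. A guest-side sketch (the real IMDS applies this on its own reply path, not in guest code):

```python
import socket

# Sketch of the TTL=1 defense: packets from this socket expire at the
# first router hop, so even if a metadata response somehow escaped the
# host's link, no router would forward it further.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, 1)
ttl = s.getsockopt(socket.IPPROTO_IP, socket.IP_TTL)
s.close()
print(ttl)  # 1
```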


Ah yes, great point. Awesome article by the way — thought provoking, shocking, really crazy stuff. Hopefully some good comes of it, godspeed.

Mainly for getting managed-identity access tokens for Azure APIs. In AWS you can call it to get temporary credentials for the EC2 instance’s attached IAM role. In both cases, you use IMDS to get tokens/creds for identity/access management.

Client libraries usually abstract away the need to call IMDS directly by calling it for you.
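The Azure call in question is a documented link-local HTTP GET whose only gate is a required `Metadata: true` header. A sketch that builds (but does not send) such a request:

```python
import urllib.parse
import urllib.request

# Documented Azure IMDS managed-identity endpoint: link-local, with no
# auth beyond a required 'Metadata: true' header.
IMDS = "http://169.254.169.254/metadata/identity/oauth2/token"

def build_token_request(resource: str,
                        api_version: str = "2018-02-01") -> urllib.request.Request:
    """Build (but do not send) the managed-identity token request."""
    query = urllib.parse.urlencode({"api-version": api_version,
                                    "resource": resource})
    # The header stops naive SSRF vectors that can't set arbitrary
    # headers, but it is no substitute for real authentication.
    return urllib.request.Request(f"{IMDS}?{query}",
                                  headers={"Metadata": "true"})

req = build_token_request("https://management.azure.com/")
```

Client libraries do exactly this (plus the actual send and token caching) under the hood.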


Thank you, and everyone else who responded. So then this type of service seems to be used by other cloud providers (AWS). What makes this Azure service so much more insecure than its AWS equivalent?

Thanks again!

[edited phrasing]


Having it run on the host (!), with the metadata for all guest VMs stored and managed by the same memory/service (!!), with no clear security boundary (!!!).

It's like storing all your nuke launch codes in one vault, right in the middle of the National Mall in Washington, DC. Things are okay, until they are not.


Lovely explanation :)

This is insane. When you say Azure OpenAI, do you mean GitHub Copilot, Microsoft Copilot, hitting OpenAI’s API, or some OpenAI LLM hosted as an Azure offering that you hit through Azure? This is some real wild west crap!

The latter, their arrangement with OpenAI enabled this.


Correct.

I have noticed a similar bug in Copilot: a chat session with questions that I had no recollection of asking. I wonder if it's related. I brushed it off since the question was generic.

I would guess that Copilot uses Azure OpenAI.

In my small sample size of a bit over 100 accidentally leaked messages, many/most of them are programming-related questions.

It's easy to brush it off as just LLM hallucinations, but Azure OpenAI actually shows me how many input tokens were billed and how many were checked by the content filter. For these leaked responses, I was billed for only 8 input tokens, yet the content filter (correctly) checked >40,000 characters of input (which was my actual prompt's size).


Haha, you can already see wheel reinventors in this thread starting to spin their reinvention wheels. Nice stuff, I run my agents in containers.

OP never mentioned letting the agent run as him or use his secrets. All of the issues you mention can be solved by giving the agent its own set of secrets or using basic file permissions, which are table stakes.

Back to the MCP debate: in a world where most web APIs have a schema endpoint, their own authentication and authorization mechanisms, and in many instances easy-to-install clients in the form of CLIs… why do we need a new protocol, a new server, a new whatever? KISS.
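The "own set of secrets plus basic file permissions" part really is table stakes. A sketch (hypothetical file and token, for illustration only):

```python
import os
import stat
import tempfile

# Hypothetical setup: the agent gets its own credential file, readable
# only by its (ideally separate) user, instead of borrowing yours.
fd, path = tempfile.mkstemp(prefix="agent-key-")
os.write(fd, b"agent-scoped-token\n")  # made-up token, not a real secret
os.close(fd)
os.chmod(path, 0o600)  # owner read/write only

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # 0o600
os.unlink(path)
```

Pair that with a dedicated OS user for the agent and revocable, narrowly scoped tokens, and most of the objections above go away.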


    > OP never mentioned letting the agent run as him or use his secrets
That is implicit with a CLI because it is being invoked in the user session, unless the session itself has been sandboxed first. Then for the CLI to access a protected resource, it would of course need API keys or access tokens. Sure, a user could set up a sandbox and could provision agent-specific keys, but everyone could always enable 2FA, pick strong passwords, use authenticators, etc., and every org would have perfect security.

That's not reality.


ChatGPT is a great name though — you “chat” with the “GPT”, so it’s self-informing (even if you don’t know what a GPT is), and it’s 4 syllables that roll off the tongue well together.

RSS has no vowels, conveys no information, and looks like an alphabet-soup term you might see at the doctor’s office or on an HR onboarding form at a corpo.


Randos are just calling it "Chat" now.

"I'll ask Chat about x!"


In Japan it's now known colloquially as 「チャッピー」 ("Chappy" or "Chappie"). High praise that it has received such a shortened and personified version so quickly.


It’s the new ”I looked it up on wiki”.


I've heard 'just ai it' from high schoolers.


Early in my career I would build something I thought was useful, deploy it, and meet with people within the company to get them to start using it. A lot of effort for something that would have a positive impact. My manager would schedule a meeting with me, and with a look of panic open with, “Why didn’t you tell me about this?” or “Why did you do this?”. I understand now that before you start something, you need to decide who you are going to give credit to, and that person needs to be made aware that they will get credit for the project. Ideally your boss’s boss’s boss. Corporate cachet only exists insofar as leadership allows it to exist; you gotta play the game. Pawns don’t get to take the glory for themselves.


Were you doing it on your own time? From your described “a lot of effort,” I assume it was not but please correct me if I’m wrong.

If you’re being paid for your time by someone else, it’s fair to notify them how you plan to use a significant chunk of that money before you do it. Unless of course you were employed to _not_ do that.

I am not suggesting explaining a day or two of work. But it sounds like you’re talking weeks.


It would be like if I was expected to deliver A by the end of the quarter and instead I delivered A + B, where the value gained from B was more than A. Your manager (and hopefully people higher up the org) had better know about B, or they will attack it as a threat.

Also, I’m not being paid for my time, I’m being paid to do a job. “Trading your time for money” is one of the most self-defeating views on work you can have. It reduces you from a worker with agency to a detached prostitute, and is harmful to both the employer and employee.


Not sure I understand what you are trying to say; good communication is definitely important, if only to serve oneself.


Like it, a lot. I think the future of software is going to be unimaginably dynamic. Maybe apps will not have statically defined feature sets; they will adjust themselves around what the user wants and the data they have access to. I’m not entirely sure what that looks like yet, but things like this are a step in that direction.


> I think the future of software is going to be unimaginably dynamic.

>...I’m not entirely sure what that looks like yet, but things like this are a step in that direction.

This made me stop and think for a moment about what this would look like as well. I'm having trouble finding it, but I think there was a post by Joe Armstrong (of Erlang) that talked about globally addressable functions (global as in across system boundaries, not as in a global variable)?


Not sure if I've read such an article, but it would be a reasonable next step from the globally addressable processes of the BEAM VM.

As I understand it, Unison tries to do something like that, but I might be wrong.

https://www.unison-lang.org/


I would say it varies from 0x to a modest 2x. It can help you write good code quickly, but I only spent about 20-30% of my time writing code anyway before AI. It definitely makes debugging and research tasks much easier as well. I would confidently say my job as a senior dev has gotten a lot easier and less stressful as a result of these tools.

One other thing I have seen, however, is the 0x case, where you have given too much control to the LLM, it codes both you and itself into Pan’s Labyrinth, and you end up having to take a weed whacker to the whole project or start from scratch.


Ok, if you're a senior dev, have you 'caught' it yet?

Ask it a question about something you know well, and it'll give you garbage code that it's obviously copied from an answer on SO from 10 years ago.

When you ask it for research, it's still giving you garbage out of date information it copied from SO 10 years ago, you just don't know it's garbage.


That's why you don't use LLMs as a knowledge source without giving them tools.

"Agents use tools in a loop to achieve a goal."

If you don't give it any tools, you get hallucinations and half-truths.

But give one a tool to do, say, web searches, and it's going to be a lot smarter. That's where 90% of the innovation in "AI" today is coming from. The raw models aren't getting that much smarter anymore, but the scaffolding and frameworks around them are.

Tools are the main reason Claude Code is as good as it is compared to the competition.
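That "tools in a loop" shape is simple enough to sketch. Here `fake_model` is a stand-in policy and `web_search` a stub tool, both made up for illustration; only the loop shape is the point:

```python
def web_search(query: str) -> str:
    # Stub tool; a real agent would call a search API here.
    return f"results for {query!r}"

TOOLS = {"web_search": web_search}

def fake_model(history):
    # Stand-in for an LLM: search once, then answer. A real model
    # would decide this from the conversation so far.
    if not any(step[0] == "tool" for step in history):
        return ("call", "web_search", "current IMDS best practices")
    return ("final", "Summarized answer from the search results.")

def run_agent(goal: str) -> str:
    """Use tools in a loop until the model emits a final answer."""
    history = [("user", goal)]
    while True:
        action = fake_model(history)
        if action[0] == "final":
            return action[1]
        _, name, arg = action
        history.append(("tool", TOOLS[name](arg)))

print(run_agent("What are IMDS best practices?"))
```

Swap `fake_model` for a real model call and `TOOLS` for real integrations, and you have the skeleton most agent frameworks are built around.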


  > The raw models aren't getting that much smarter anymore, but the scaffolding and frameworks around them are.
Yes, that is my understanding as well, though it gets me thinking: if that is true, then what real value is the LLM on the server compared to doing that locally + tools?


You still can't beat an acre of specialized compute with any kind of home hardware. That's pretty much the power of cloud LLMs.

For a tool-use loop, local models are getting to "OK" levels; when they get to "pretty good", most of my own stuff can run locally, since it's basically just coordinating tool calls.


Of course, step one is always to think critically and evaluate for bad information. For research, I mainly use it for things that are testable/verifiable; for example, I used it for a tricky proxy chain setup. I did try to use it to learn a language a few months ago, which I think was counterproductive for the reasons you mentioned.


How can you critically assess something in a field you're not already an expert on?

That Python you just got might look good, but it could be rewritten from 50 lines to 5; it's written in 2010 style, not using modern libraries, not using modern syntax.

And it is 50 to 5. That is the scale we're talking about in a good 75% of AI-produced code unless you challenge it constantly: not using modern syntax to reduce boilerplate, over-guarding against impossible state, ridiculous amounts of error handling. It is basically a junior dev on steroids.

Most of the time you have no idea that most of that code is totally unnecessary unless you're already an expert in that language AND the libraries it's using. And you're rarely an expert in both, or you wouldn't even be asking; it would have been quicker to write the code than to write the prompt for the AI.


I use web search (DDG) and I don’t think I have ever tried more than one query in the vast majority of cases. Why? Because I know where the answer is; I’m using the search engine as an index to where I can find it. Like “csv python” to find that page in the docs.


I’ve been using the following pattern since GPT-3; the only other thing I have changed was adding another parameter for a schema, for structured output. People really love to overcomplicate things.

    from typing import Protocol

    class AI(Protocol):
        # schema is the later addition, for structured output
        def call_llm(self, prompt: str, schema: dict | None = None) -> str: ...

