Hacker News | dceddia's comments

This looks nice! I was curious about being allowed to use a Claude Pro/Max subscription vs an API key, since there's been so much buzz about that lately, so I went looking for a solid answer.

Thankfully the official Agent SDK Quickstart guide says that you can: https://platform.claude.com/docs/en/agent-sdk/quickstart

In particular, this bit:

"After installing Claude Code onto your machine, run claude in your terminal and follow the prompts to authenticate. The SDK will use this authentication automatically."


But their docs also say:

> Unless previously approved, Anthropic does not allow third party developers to offer claude.ai login or rate limits for their products, including agents built on the Claude Agent SDK. Please use the API key authentication methods described in this document instead.

Which I have interpreted to mean that you can’t use your Claude Code subscription with the Agent SDK, only API tokens.

I really wish Anthropic would make it clear (and allow us to use our subscriptions with other tools).


Didn't Thariq make it clear three weeks ago when they shut down 3rd party tool access and the OpenCode users were upset?

> Third-party harnesses using Claude subscriptions create problems for users and are prohibited by our Terms of Service.

https://xcancel.com/trq212/status/2009689809875591565


i think that's conflating two things (am not an expert). opencode exploited unauthorized use/api access, but obviously whatever is using the claude code sdk is kosher because it's literally anthropic's blessed way to do this

thariq did a good intro here https://www.youtube.com/watch?v=TqC1qOfiVcQ


OP here. Yes! This was a big motivation for me to build this. I was nervous Anthropic was gonna shut down my account for using Clawdbot.

This project uses the Agent SDK, so it should be kosher with regard to the terms of service. I couldn't figure out how to get the SDK running inside the containers to properly use the authenticated session from the host machine, so I went with a hacky way of injecting the oauth token into the container environment. It should still be above board TOS-wise, but it's the one security flaw that I know about (a malicious person in a WhatsApp group with you could prompt-inject the agent into sharing the oauth key).

If anyone can help out with getting the authenticated session to work properly with the agents running in containers it would be much appreciated.
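For reference, here's a minimal sketch of the workaround described above. Everything here is an assumption: the Keychain service name, the `CLAUDE_CODE_OAUTH_TOKEN` variable, and the image name may all differ in a real setup, so check what your host actually stores before relying on this.

```shell
# Hedged sketch: read the Claude Code token on the macOS host and pass it
# into a Linux container as an environment variable. Service name, env var,
# and image name are assumptions.
run_agent_container() {
  local token
  # macOS: Claude Code keeps credentials in the login Keychain
  token=$(security find-generic-password -s "Claude Code-credentials" -w 2>/dev/null) || {
    echo "could not read token from Keychain" >&2
    return 1
  }
  docker run --rm -e CLAUDE_CODE_OAUTH_TOKEN="$token" my-agent-image "$@"
}
```

The obvious downside is the one mentioned above: the token is visible to anything running inside the container.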


I went down this rabbit hole a bit recently trying to use claude inside fence[0] and it seems that on macOS, claude stores this token inside Keychain. I'm not sure there's a way to expose that to a container... my guess would be no, especially since it seems the container is Linux, and also because keeping the Keychain out of reach of containers seems like it would be paramount. But someone might know better!

0: https://github.com/Use-Tusk/fence


> I went down this rabbit hole a bit recently trying to use claude inside fence[0]

Did you get it working in the end? I assume you didn't share your setup/config anywhere?


Yeah, forgot when I wrote this comment that the thing about Keychain was about passing that auth token into a Docker container, which I gave up on (the Tauri desktop app needs to compile Rust and link against other stuff, the architecture inside the container is different, blah blah)

More or less what it says in the README:

    fence -t code -- claude --dangerously-skip-permissions
Or wrap it in a shell function:

    # cat prompt.md | ralph
    function ralph() {
      fence -t code -- \
        claude --verbose --dangerously-skip-permissions --output-format stream-json -p "$@" \
        | jq -r 'select(.type == "assistant") | .message.content[]? | select(.type? == "text") | .text'
    }

True. There’s a Claude Code setting though, apiKeyHelper, which points at a script that fetches the token for Claude Code. I imagine you could use that, but I haven’t quite figured out how to wire it up.
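For what it's worth, a sketch of how that wiring might look. apiKeyHelper just needs an executable that prints a key to stdout; the Keychain service name and paths below are assumptions (and the Keychain lookup is macOS-only).

```shell
# Hedged sketch: create a helper script that prints the token, then point
# apiKeyHelper at it in settings.json. Keychain service name is an assumption.
claude_dir="${CLAUDE_DIR:-$HOME/.claude}"
mkdir -p "$claude_dir"
cat > "$claude_dir/get-token.sh" <<'EOF'
#!/bin/sh
# macOS-only; service name may differ -- check Keychain Access
security find-generic-password -s "Claude Code-credentials" -w
EOF
chmod +x "$claude_dir/get-token.sh"
# then in ~/.claude/settings.json:
#   { "apiKeyHelper": "~/.claude/get-token.sh" }
```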

Can you do everything via the SDK that you can via regular API calls? Does caching etc. all work? Can you get reasoning, responses, tool call info, ...?

Wow, thanks for posting that, news to me! In this case I don’t understand why there was a whole brouhaha with OpenClaw and the like - I guess they were invoking it without the official SDK? Because this makes it seem like if you have the sub you can build any agentic thing you like and still use your subscription, as long as you can install and login to Claude code on the machine running it.

Tons of chatter on Twitter making it sound like you'll get permabanned for doing this but... 1) how would they know if my requests are originating from Claude Code vs. OpenClaw? 2) how are we violating... anything? I'm working within my usage limits...

$70 or whatever to check if there's milk... just use your Claude Max subscription.


> how would they know if my requests are originating from Claude Code vs. OpenClaw

How wouldn't they know? Claude Code is proprietary; they can put whatever telemetry they want in there.

> how are we violating... anything? I'm working within my usage limits...

It's well known that Claude Code is heavily discounted compared to market API rates. The best interpretation is that it's a kind of marketing for their API. If you're not using Claude Code for what it's intended for, you're violating at least the spirit of that deal.


The Claude Code client adds system prompts and makes a bunch of calls to analytics/telemetry endpoints so it's certainly feasible for them to tell, if they inspect the content of the requests and do any correlation between those services.

And apparently it's violating the terms of service. Is it fair and above board for them to ban people? idk. It feels pretty blatantly like control for the sake of control, or for the sake of lock-in, or like those analytics/telemetry contain something awfully juicy, because they're already getting the entire prompt. It's their service to run as they wish, but it's not a pro-customer move, and I think it's priming people to jump ship if another model takes the lead.


Hate to ask the obvious question but.. how does Claude check for milk?

Was there a brouhaha with OpenClaw or was that with OpenCode?

It was with OpenCode, but a LOT of the commentariat is insisting that running OpenClaw with subscription creds instead of an API key is against the TOS and will get you banhammered.

I think you’re right and it was OpenCode. The semantic collisions are going to become more of a problem in the coming Cambrian explosion of software.

Interesting about the level of detail. I’ve noticed that myself but I haven’t done much to address it yet.

I can imagine some ideas (ask it for more detail, ask it to make a smaller plan and add detail to that) but I’m curious if you have any experience improving those plans.


I’m trying to solve this myself by implementing a whole planner workflow at https://github.com/solatis/claude-config

Effectively it tries to resolve all ambiguities by making all decisions explicit: if something can’t be resolved from documentation or the source, the user is asked.

It also tries to capture all “invisible knowledge” by documenting everything, so that all these decisions and business context are captured in the codebase again.

Which - in theory - should make long term coding using LLMs more sane.

The downside is that it takes 30min - 60min to write a plan, but it’s much less likely to make silly choices.


Have you tried the compound engineering plugin? [^1]

My workflow with it is usually brainstorm -> lfg (planning) -> clear context -> lfg (giving it the produced plan to work on) -> compound if it didn’t on its own.

[^1]: https://github.com/EveryInc/compound-engineering-plugin


That’s super interesting, I’ll take a look to see if I can learn something from it, as I’m not familiar with the concept of compound engineering.

Seems like a lot of it aligns with what I’m doing, though.


> The downside is that it takes 30min - 60min to write a plan

Oof, you weren't kidding. I've got your skills running on a particularly difficult problem and it's been going for over three hours (I keep telling it to increase the number of reviews until it's satisfied).


Yeah, I’m working on some improvements in this area that should make things faster. But yeah, I’ve frequently had 1h-2h planning sessions as well, depending on the complexity of the task.

I have had good success with the plans generated by https://github.com/obra/superpowers I also really like the Socratic method it uses to create the plans.

I iterate around issues. I have a skill to launch a new tmux window for a worktree, with Claude in one pane and Codex in another, each with instructions on which issue to work on. Claude has instructions to create a plan, while Codex has instructions to understand the background information needed for the issue. By the time they're both done, I can feed Claude's plan into Codex, and Codex is ready to analyze it. Then Codex feeds the plan back to Claude, and they ping pong like that a couple of times; after several iterations there's enough refinement that things usually work.

Then Claude clears context and executes the plan. Then Codex reviews the commit, and since it still has all the original context, it knows what we'd been planning and what the research said about the infrastructure, and it does a really good job reviewing. Again they ping pong back and forth a couple of times, and the end product is pretty decent.

Codex's strength is that it really goes in-depth (I usually run it at high reasoning effort). But Codex has zero EQ or communication skills, so it works really well as a pedantic reviewer. Claude is much more pleasant to interact with; there's just no comparison. That's why I like planning with Claude much more, because we can iterate.

I am just a hobbyist though. I do this to run my Ansible/Terraform infrastructure for a good-sized homelab with 10 hosts, so we actually touch real hardware a lot and there are always some gotchas to deal with. But together this is a pretty fun way to work. I like automating stuff, so it really scratches that itch.
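A rough sketch of what the launcher side of a skill like that could look like. The worktree path, window name, and prompts are all made up, and it assumes a tmux server is already running and both CLIs are on PATH.

```shell
# Hedged sketch: new worktree plus a tmux window with Claude and Codex
# side by side. Issue number, branch name, and prompts are placeholders.
start_issue() {
  local issue="$1" wt="../wt-issue-$1"
  git worktree add "$wt" -b "issue-$issue"
  tmux new-window -n "issue-$issue" -c "$wt"
  tmux send-keys "claude 'Create a plan for issue $issue'" C-m
  tmux split-window -h -c "$wt"
  tmux send-keys "codex 'Research the background needed for issue $issue'" C-m
}
```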

I concur, also with no benchmarks to share, but I had the experience of rewriting a video editor timeline to use WebGL instead of the 2D canvas I started with and it got much faster. Like being able to draw 10k+ rectangles at 60fps became easy, where with 2D canvas it was stumbling.

I don't know any project which uses the 2D canvas. It's horribly inefficient except for the most trivial use-cases (basically demos). Any serious web graphics uses WebGL and shaders.

Hard disagree. Canvas 2D is fully GPU-accelerated in modern browsers and can easily handle thousands of draw calls at 60fps, more than enough for most practical applications. For data visualization, interactive tools, drawing apps, and UI rendering, it's a robust and performant choice. WebGL is often overkill unless you're dealing with extreme datasets or 3D scenes. With its simpler API and faster startup, Canvas 2D is perfectly suited for the vast majority of 2D use cases. Labeling it as 'horribly inefficient' is simply wrong ._.

With their tagline being “video for developers”, isn’t this their whole thing? It seems like another service would be a better fit if having a management UI is a requirement.

So I’m probably in a similar spot - I mostly prompt-and-check, unless it’s a throwaway script or something, and even then I give it a quick glance.

One thing that stands out in your steps, and that I’ve noticed myself: yeah, by prompt 10, it starts to suck. If it ever hits “compaction”, that’s past the point of no return.

I still find myself slipping into this trap sometimes because I’m just in the flow of getting good results (until it nosedives), but the better strategy is to do a small unit of work per session. It keeps the context small and that keeps the model smarter.

“Ralph” is one way to do this. (decent intro here: https://www.aihero.dev/getting-started-with-ralph)

Another way is “Write out what we did to PROGRESS.md” - then start new session - then “Read @PROGRESS.md and do X”

Just playing around with ways to split up the work into smaller tasks basically, and crucially, not doing all of those small tasks in one long chat.


I will check out Ralph (thank you for that link!).

> Another way is “Write out what we did to PROGRESS.md” - then start new session - then “Read @PROGRESS.md and do X”

I agree on small context and if I hit "compacting" I've normally gone too far. I'm a huge fan of `/clear`-ing regularly or `/compact <Here is what you should remember for the next task we will work on>` and I've also tried "TODO.md"-style tracking.

I'm conflicted on TODO.md-style tracking because in practice I've had an agent work through every item on the list, confidently telling me steps are done, only to find that's not the case when I check its work. Both a TODO.md that I created and one I had the agent create suffer from this. Also, getting it to update the TODO.md has been frustrating; even when I add "Make sure to mark tasks as complete in TODO.md as you finish them" to CLAUDE.md, or add the same message to the end of all my prompts, it won't always update it.

I've been interested in trying out beads to see if it works better than a markdown TODO file, but I haven't played with it yet.

But overall I agree with you, smaller chunks are key to success.


I hate TODO.mds too. If I ever have to use one, I'll keep track of it manually and split the work myself into chunks of the size I believe CC/codex can handle. TODO.md is a recipe for failure because you'll quickly have more code than you can review and no way to trust that it was executed well.

That looks so ridiculous that it has me wondering how hard of a technical change it would’ve been to change that drag target, and if they just punted on it.


Is it possible for Ghostty to figure out how much memory its child processes (or tabs) are using? If so maybe it would help to surface this number on or near the tab itself, similar to how Chrome started doing this if you hover over a tab. It seems like many of these stem from people misinterpreting the memory number in Activity Monitor, and maybe having memory numbers on the tabs would help avoid that.
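As a rough proof of concept, summing the resident set size of a process's direct children is doable from the shell (a real implementation in Ghostty would presumably use platform APIs instead; `rss` here is the standard `ps` keyword, reported in kilobytes on most systems):

```shell
# Sum the RSS (in KB) of a process's direct children, as reported by ps.
child_mem_kb() {
  local pid="$1" total=0 rss child
  for child in $(pgrep -P "$pid" 2>/dev/null); do
    rss=$(ps -o rss= -p "$child" 2>/dev/null)
    [ -n "$rss" ] || continue  # child may have exited between pgrep and ps
    total=$((total + rss))
  done
  echo "$total"
}
# usage: child_mem_kb <terminal pid>
```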


In many cases today “gif” is a misnomer anyway, and mp4 is a better choice. Not always; not everywhere supports actual video.

But one case I see often: if you’re making a website with an animated gif that’s actually a .gif file, try it as an mp4 - smaller, smoother, proper colors, and it can still autoplay fine.
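The conversion is a one-liner with ffmpeg. A sketch, with placeholder filenames; the scale filter forces even dimensions, which the widely compatible yuv420p pixel format requires:

```shell
# Convert an animated GIF to an MP4 that can autoplay like a GIF.
gif2mp4() {
  local in="$1" out="${2:-${1%.gif}.mp4}"
  ffmpeg -i "$in" \
    -movflags +faststart \
    -pix_fmt yuv420p \
    -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" \
    "$out"
}
# embed with: <video autoplay loop muted playsinline src="out.mp4"></video>
```

Note that autoplay on mobile browsers generally requires the muted and playsinline attributes.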


I had kinda suspected this just based on my own experience of paper vs screen, but hadn’t run across any research.

After seeing your comment I went looking! I found this interesting: https://phys.org/news/2024-02-screens-paper-effective-absorb...


That was one of the studies that I saw too.

There are some others about learning more from writing with pen on paper compared to a tablet, or taking notes digitally by typing.

I am a digital note taker at heart but can't deny using a notebook still has better outcomes sometimes.


The situation on Windows got remarkably better and cheaper recently-ish with the addition of Azure code signing. Instead of hundreds or thousands for a cert it’s $10/month, if you meet the requirements (I think the business must have existed for some number of years first, and some other things).

If you go this route I highly recommend this article, because navigating through Azure to actually set it up is like getting through a maze. https://melatonin.dev/blog/code-signing-on-windows-with-azur...


Thanks for the link; I see it's only available in basically the US, Canada, and the EU though.


That's not easier and cheaper than before. That's how it's always been; only now you can buy the cert through Azure.

For an individual the Apple code signing process is a lot easier and more accessible since I couldn't buy a code signing certificate for Windows without being registered as a business.


> That's how it's always been only now you can buy the cert through Azure.

Where can you get an EV cert for $120/year? Last time I checked, all the places were more expensive and then you also had to deal with a hardware token.

Lest we talk past each other: it's true that it used to be sufficient to buy a non-EV cert for around the same money, where it didn't require a hardware token, and that was good enough... but they changed the rules in 2023.


> it’s $10/month

So $120 a year, but no, it's only Apple with a "tAx"


Millions of Windows power users are accustomed to bypassing SmartScreen.

A macOS app distributed without a trusted signature will reach a far smaller audience, even of the proportionately smaller macOS user base, and that's largely due to deliberate design decisions by Apple in recent releases.


As you said, you need to have a proper legal entity for about 2 years before this becomes an option.

My low-stakes conspiracy theory is that MS is deliberately making this process awful to encourage submission of apps to the Microsoft Store since you only have to pay a one-time $100 fee there for code-signing. The downside is of course that you can only distribute via the MS store.

