I think about a third of the reason I get lead positions is because I'm willing to be an 'accountability sink', or the much more colorful description: a sin-eater. You just gotta be careful about what decisions you're willing to own. There's a long list of decisions I won't be held responsible for and that sometimes creates... problems.
Some of that is on me, but a lot of it gets taken for granted. I'm not a scapegoat, I'm a facilitator, and being able to say "I believe in this idea enough that if it blows up, you can tell people to come yell at me instead of at you" unblocks a lot of design and triage meetings.
There are a lot of webcrawlers whose chief feature is turning a website into markdown. I don't quite understand what they do for me that's useful, since I can just do something like `markdownify(my_html)` or whatever. All this to say: I wouldn't find this useful, but clearly people do think it's a useful feature as part of an LLM pipeline.
You don't want the footer or navigation in the output. Ideally you want the main content of the page, if it exists. How do you assign header level if they're only differentiated by CSS left-margin in a variety of units? How do you interpret documents that render properly but are hardly correct HTML?
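For illustration, here's a stdlib-only sketch of the "strip the nav and footer first" step, using Python's `html.parser`. A real converter has to handle far more (the heading-level and broken-HTML problems above); this only shows why the job is more than `markdownify(my_html)`:

```python
from html.parser import HTMLParser

class MainTextExtractor(HTMLParser):
    """Collects text, skipping anything inside <nav>, <footer>, <script>, <style>."""
    SKIP = {"nav", "footer", "script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # how many skipped elements we're currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_main_text(html_src: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html_src)
    return "\n".join(parser.chunks)

sample = ("<nav>Home | About</nav>"
          "<main><h1>Title</h1><p>Body text.</p></main>"
          "<footer>© 2024</footer>")
print(extract_main_text(sample))
# prints:
# Title
# Body text.
```

Note this happily swallows the "renders properly but is hardly correct HTML" case, because `HTMLParser` is lenient; it just can't tell you that a CSS-indented `<div>` was really a heading.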
- High-impact AI projects - we work on companies' hardest problems to get genAI into production (in the same quarter you might build an AI phone agent for one customer and automate a complex workflow for another)
- Culture of a startup, but substantive problems to solve that impact millions of users
- Curious, humble, banter-y in-person team comprised of multi-time founders
Why we may not be the best fit:
- Excellence isn't really, really important to you (this is less of a 'move fast and break things' role, though we respect that ethos!)
- Predictability is important to you - we work across customers, tech stacks, industries, etc.
The point is that you don’t need a framework for that; the APIs are already similar enough that it should be obvious how to abstract over them using whatever approach is natural in your programming language of choice.
I have a consumer app that swaps between the 5 bigs and wholeheartedly agree, except, God help you if you're doing Gemini. I somewhat regret hacking it into the same concepts as everyone else.
I should have built stronger separation boundaries with more general abstractions. It works fine, I haven't had any critical bugs / mistakes, but it's really nasty once you get to the actual JSON you'll send.
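To make the "nasty once you get to the actual JSON" point concrete, here's a minimal sketch of translating one internal message list into the two payload styles. The shapes are my reading of the public docs (Gemini wants `contents` with `parts`, renames `assistant` to `model`, and pulls the system prompt out into `systemInstruction`), so treat the field names as assumptions, not gospel:

```python
def to_openai(messages):
    # OpenAI-style: flat list of {"role", "content"} dicts
    return {"messages": [{"role": m["role"], "content": m["text"]}
                         for m in messages]}

def to_gemini(messages):
    # Gemini-style (as I understand the docs): "contents" with "parts",
    # "assistant" becomes "model", and the system prompt lives elsewhere
    system_texts = [m["text"] for m in messages if m["role"] == "system"]
    contents = [
        {"role": "model" if m["role"] == "assistant" else "user",
         "parts": [{"text": m["text"]}]}
        for m in messages if m["role"] != "system"
    ]
    payload = {"contents": contents}
    if system_texts:
        payload["systemInstruction"] = {
            "parts": [{"text": t} for t in system_texts]
        }
    return payload

msgs = [
    {"role": "system", "text": "Be terse."},
    {"role": "user", "text": "Hi"},
    {"role": "assistant", "text": "Hello."},
]
```

The divergence isn't huge per field, but it's exactly the kind of thing that leaks through a too-thin abstraction boundary.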
Google's was 100% designed by a committee of people who had never seen anyone else's API, and if they had, they would have dismissed it via NIH. (disclaimer: ex-Googler, no direct knowledge)
> Google's was 100% designed by a committee of people who had never seen anyone else's API
Google made their API before the others had one, since they were the first to make these kinds of language models. It's just that it was an internal API before.
Google started including LLM features in internal products by 2019 at least; I know because I worked there then. I can't remember exactly when LLM-generated snippets and suggestions started showing up everywhere, but they were there by 2019. So they have had internal APIs for this for quite some time.
> All this AI stuff was under lock and key until Nov 2022
That is all wrong... Did you work there? What do you base this on? Google had been experimenting with LLMs internally ever since the original paper. I worked in search then, and I remember my senior manager saying this was the biggest revolution in natural language processing ever.
So even if Google added a few concepts from OpenAI, or renamed them, they still have had plenty of experience working with LLM APIs internally and that would make them want different things in their public API as well.
> LLM generated snippets and suggestions everywhere but it was there at least since 2019
Absolutely not. Note that, e.g., Google's AI answers are not from an LLM, and they're very proud of that.
> So they have had internal APIs for this for quite some time.
We did not have internal or external APIs for "chat completions" with chat messages, roles, and JSON schemas until after OpenAI.
> Did you work there?
Yes
> What do you base this on?
The fact it was under lock and key. You had to jump through several layers of approvals to even get access to a standard text-completion GUI, never mind API.
> has been experimenting with LLMs internally ever since the original paper,
What's "the original paper"? Are you calling BERT an LLM? Do you think transformers implied "chat completions"?
> that would make them want different things in their public API as well.
It's a nice theoretical argument.
If you're still convinced Google had a conversational LLM API before OpenAI, or that we need to quibble over everything because I might be implying Google didn't invent transformers, there's a much more damning point:
The API is Gemini-specific and was released with Gemini, ~December 2023. There's no reason for it to be so different other than NIH and proto-based thinking. It's not great. That's why, e.g., we see the other comment where Cloud built out a whole other API and framework that can be used with OpenAI's Python library.
> All this AI stuff was under lock and key until Nov 2022, then it was an emergency.
This is absolutely false, as the other person said.
As one example: We had already built and were using AI based code completion in production by then.
Pretending that was an LLM as it is understood today, that whatever internal API was available for internal use cases is actually the same as the public API for Gemini today, and that it was the same as an API for adding a "chat completion" to a "conversation" with messages, roles, and JSON schemas, is silly.
My understanding is that the original Gmail team actually invented modern LLMs in passing back in 2004, and it’s taken outsiders two decades to catch up because doing so requires setting up the Closure Compiler correctly.
Lol, sounds like you have more experience with other ex-Googlers doing this than I do. I'm honestly surprised; I didn't know there was a whole shell game to be played with "what's an LLM anyway" to justify "what's NIH? our API was designed by experienced experts".
Documentation is a bit sparse, but TL;DR: deploy it in a Cloudflare worker and now you can access about 15 providers (the ones that matter - OpenAI, Cohere, Azure, Bedrock, Gemini, etc.) all with the same API without any issues.
Coming back to write something more full-throated: Klu.ai is a rare thing in the LLM space - well thought out, has the ancillary tools you need, is beautiful, and isn't a giveaway from a BigCo that is a privacy nightmare. E.g., Cloudflare has some sort of halfway-similar nonsense that, in all seriousness, logs all inputs/outputs.
I haven't tried it out in code, it's too late for me and I'm doing native apps, but I can tell you this is a significant step up in the space.
Even if you don't use multiple LLMs yet, and your integration is working swell right now, you will someday. These will be commodities, valuable commodities, but commodities. It's better to get ahead of it now.
E.g., if you were using GPT-4 two months ago, you'd be disappointed by GPT-4o, and it'd be an obvious financial and quality decision to at least _try_ Claude 3.5 Sonnet.
It's a weird one. Benchmarks great. Not bad. Pretty damn good. But, e.g., it's now the only provider I have to worry about for RAG. The prompt says "don't add footnotes, pause at the end silently, and I will provide citations", and GPT-4o does nonsense like saying "I am now pausing silently for citations:" followed by a markdown-formatted divider.
Using Llama Index for this via the `llama_index.core.base.llms.base.BaseLLM` interface. Using config files to describe the args to different models makes swapping models as easy as editing a config entry.
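The actual snippet didn't make it into the comment, so as a stand-in, here's a hypothetical, stubbed sketch of the config-driven pattern. The `StubLLM` class, the config keys, and the model names are mine for illustration - the real code would construct llama_index LLM classes instead:

```python
import json

# Hypothetical config; in practice this would live in a file on disk.
CONFIG = json.loads("""
{
  "fast":  {"provider": "openai",    "model": "gpt-4o-mini",       "temperature": 0.2},
  "smart": {"provider": "anthropic", "model": "claude-3-5-sonnet", "temperature": 0.0}
}
""")

class StubLLM:
    """Stand-in for a real BaseLLM implementation."""
    def __init__(self, provider, model, **kwargs):
        self.provider, self.model, self.kwargs = provider, model, kwargs

def load_llm(name: str) -> StubLLM:
    # Swapping models means editing the config, not touching call sites.
    return StubLLM(**CONFIG[name])

llm = load_llm("fast")
```

The appeal is that every call site depends only on the interface, while the config file owns the provider-specific details.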
LiteLLM seemed to be the best approach for what I needed - simple integration with different models (mainly OpenAI and the various Bedrock models) and the ability to track costs / limit spending. It's working really well so far.
Use a consistent argument structure and make a simple class or function for each provider that translates that to the specific API calls. They are very similar APIs. Maybe select the function call based on the model name.
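A minimal sketch of that suggestion, with stubbed functions standing in for the real SDK calls (the function names, return values, and prefix table here are hypothetical):

```python
def call_openai_style(model, messages, temperature):
    # placeholder for the real SDK call, e.g. chat-completions style
    return {"provider": "openai", "model": model}

def call_anthropic_style(model, messages, temperature):
    # placeholder; the real API differs in details (system prompt, max tokens)
    return {"provider": "anthropic", "model": model}

# Select the translator function based on the model name, as suggested.
PROVIDERS = {
    "gpt": call_openai_style,
    "claude": call_anthropic_style,
}

def complete(model, messages, temperature=0.0):
    for prefix, fn in PROVIDERS.items():
        if model.startswith(prefix):
            return fn(model, messages, temperature)
    raise ValueError(f"no provider registered for {model}")
```

Each `call_*` function is the one place where a provider's quirks live; everything else speaks the consistent argument structure.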
Just to echo what you're saying: I've read the first chapter, and I thought the thesis was interesting and the writing good, but I wasn't convinced, because it makes a lot of classic scientific mistakes. Even though logical arguments are being made, there's no attempt to avoid overfitting the data.
The author brings out a lot of stats - "smart high-schooler", "effective compute", "OOM", "test scores", "inference efficiency" - but doesn't do a good job of explaining how the author predicted these things beforehand (preregistering), how they will actually result in new technologies, or how we can extrapolate past the trend line.
Also in the unhobbling section
"Tools: Imagine if humans weren’t allowed to use calculators or computers. We’re only at the beginning here, but ChatGPT can now use a web browser, run some code, and so on. "
This is so non-specific (because no one has really commercialized anything with this yet) that I worry we don't actually know whether we can make the kinds of effective tools the author is talking about. Would love some feedback on these criticisms.
Also, one funny thing: the author mentions power constraints, but then doesn't calculate, for example, how many teraflops the US grid could power.
At least in the US, the Fed has specifically had full employment as one of its goals, at least for the last few years, so I'm pretty skeptical of "corporations control the Fed" arguments.
Like, I agree there's a lot of regulatory capture in the US, but I just don't see any reason to think the Fed is responding to big business rather than its own preferences.
There are in fact people who are saying it's only corporate inflation.
The real reason I'm skeptical of the greed argument is that it's just a vibes-based guess at what's happening, not based on any data or insight into the situation.
Dan Davies did a great interview on Odd Lots about this; he called it "accountability sinks".