Hacker News | shenberg's comments

Using existing enterprise apps probably - this solution is scalable for the vendor and it's easier to sell using existing software as-is than to start out by writing new custom tools.

Yes, I can see the use case for legacy desktop apps etc., but the web? It's a DOM. WebMCP is coming now too, so no need for screenshotting or DOM querying then either.

After doing a few experiments, I think that having agents work through the browser for all tasks wouldn't be best, due to factors like token cost, safety, etc. But a browser/computer can be a tool that the agent uses alongside MCPs to complete tasks that require interaction with such modalities.

Midway through I realized this was AI writing (took me a while), then I read a quote in the text about a comment that "The tragedy isn’t that they cheated; it’s that the system was designed to let them thrive for a decade before anyone bothered to look at the data." I didn't find this comment on EJMR, or anywhere on the internet except the OP post, for that matter.

Moshi was an amazing tech demo, building the entire stack from scratch in 6 months with a small team was an amazing show of skill: 7B text LLM data + training, emotive TTS for synth data generation (again model + data collection), synth data pipeline, novel speech codec, rust inference stack for low latency, audio LLM architecture incl. text "thoughts" stream which was novel.

But this is a fluff piece: "underfunded" means a total of around $400 million ($330 million in the initial round, $70 million for Gradium). Compare that to ElevenLabs, which used a $2 million pre-seed to create its initial product.

A bunch of other stuff there is disingenuous, like comparing their 7B model to Llama-3 405B (hint: the 7B model is a _lot_ dumber). There's also the outright lie: team of 4 made Moshi, which is corrected _in the same piece_ to 8 if you read enough.


Stopped reading there: "This model (Moshi) could [...] recite an original poem in a French accent (research shows poems sound better this way)."

Location: Paris, France (US citizen, EU resident)

Remote: Yes

Willing to relocate: Not until 2027

Technologies:

ML / DS: PyTorch, CUDA, distributed training & inference, performance profiling/optimization (audio & speech focus, some LLM inference acceleration)

Systems: C/C++/asm, low-level performance work, reliability/scale engineering

Backend / Infra: Python/Java/C# prod services, ETL / data pipelines, k8s (incl. operator work)

Roles: Tech Lead, Research Engineer (training and/or inference of large models)

Résumé/CV: https://www.linkedin.com/in/roeeshenberg/

Email: [email protected]

I’m a hands-on engineer who’s spent the last 6 years doing freelance ML + data science, primarily in audio/speech, and before that 10+ years in startups building and scaling production systems.

I’m looking for where research meets real systems: training and/or inference for large models, especially roles that value end-to-end ownership. Open to freelance engagements or full-time roles.


There are two ingredients that don't fit in the "attention-is-kernel-smoothing" as far as I can tell: positional encoding and causal masking (another way to say positional encoding, I guess)

Also, simplicial attention is pretty much what the OP was going for, but the hardware lottery is such that it's gonna be pretty difficult to get competitive in terms of engineering, not that people aren't trying (e.g. https://arxiv.org/pdf/2507.02754)
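
To make the kernel-smoothing reading concrete, here is a minimal sketch in plain Python, assuming single-head dot-product attention with no learned projections. The function name kernel_smooth_attention and its arguments are made up for illustration. Each output is a weighted average of values with weights given by an exponential kernel on the query-key dot product. The causal mask is the only place where a position index enters, and no positional encoding is included.

```python
import math

def kernel_smooth_attention(queries, keys, values, causal=True):
    """Weighted average of values with an exponential dot-product kernel.

    Hypothetical helper for illustration: queries, keys and values are
    lists of equal-length float lists, one entry per position.
    """
    dim = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Causal masking: position i only averages over positions j <= i.
        # This is the one place where the position index is used.
        visible = range(i + 1) if causal else range(len(keys))
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in visible
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Normalized kernel weights times values: a smoothed estimate.
        out = [
            sum(w * values[j][d] for w, j in zip(weights, visible)) / total
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

With causal=False the result for a given query does not change when the key/value pairs are shuffled together, which is the sense in which plain kernel smoothing has no notion of order. With causal=True the result depends on where each pair sits in the sequence.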


I don't understand how using group-theory language to describe number-theoretic properties provides extra insight in this case (e.g. conjecture: all perfect numbers are even is more concise than the group-theoretic description given in the page). Can you expand on why you believe the tools of group theory have something to say about this? (e.g. for polynomial roots, the connection with symmetry groups comes from symmetries of factorized polynomials, while there's no obvious-to-me connection here as there is no unique-up-to-symmetry integer factorization)


I just found it interesting that certain problems in number theory could be rephrased as problems about cyclic groups. Maybe it could potentially make some easier to solve.


ssh exe.dev works


The short and unsatisfying answer is that LLM generation is a Markov chain, except that instead of counting n-grams in order to generate the next-token distribution, the training process compresses the statistics into the LLM's weights.

There was an interesting paper a while back which investigated using unbounded n-gram models as a complement to LLMs: https://arxiv.org/pdf/2401.17377 (I found the implementation to be clever and I'm somewhat surprised it received so little follow-up work)
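
As a rough sketch of the counting side of that comparison, the snippet below estimates a next-token distribution by counting what followed the longest suffix of the context that appears in a corpus. It is a naive linear scan for illustration only and is not the implementation from the linked paper. The function name next_token_distribution and the toy corpus are assumptions made for this example.

```python
from collections import Counter

def next_token_distribution(corpus, context):
    """Estimate P(next token | context) by counting.

    Tries the full context first, then progressively shorter suffixes,
    ending with the empty suffix (plain unigram counts).
    """
    for start in range(len(context) + 1):
        suffix = context[start:]
        n = len(suffix)
        # Count every token that follows an occurrence of the suffix.
        counts = Counter(
            corpus[i + n]
            for i in range(len(corpus) - n)
            if corpus[i:i + n] == suffix
        )
        if counts:
            total = sum(counts.values())
            return {token: c / total for token, c in counts.items()}
    return {}  # only reached when the corpus is empty

# Toy corpus, whitespace-tokenized (illustrative data only).
corpus = "the cat sat on the mat the cat ran".split()
print(next_token_distribution(corpus, ["the", "cat"]))
```

Sampling from the returned distribution, appending the token to the context, and repeating gives a Markov chain whose state is the context. An LLM plays the same role as the counting step, with the statistics stored in weights instead of a table.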


When countries like North Korea, which depends on cybercrime to fund itself, are signatories, you have to wonder whether this agreement means what its title says.


They have also had the longest ongoing embargo on earth, right after they were nearly wiped out by a genocidal war on behalf of the US.

I don't doubt their history explains the shape of their economy.

This may seem like I am defending North Korea, but in reality I am putting in perspective who they are and why. These are facts which nearly amount to propaganda to Western nations.


I don't think it's right to blame ordinary North Koreans for the state of their country like that. Clearly it has more to do with the paranoid authoritarianism of 1 guy, rather than the collective war trauma of the people. Just look at South Korea, the other party of that "genocidal war". They moved on a long time ago, because their national politics allowed them to.


Not really. Squid Game is loosely based on for-profit concentration camps run by US-backed dictators in South Korea. The true story is actually worse than the fictional show.

All US colonies have had terrible dictatorships, the news of which purposefully does not reach the western core. Saying "they've moved on" is ignoring what actually happened, and I'm not bringing this up in a "let's focus on the past" sort of way, but to show how it actually got there.

Is a murderous dictatorship that forcefully bashes the economy into the shape it wants the same as "getting over it"?

And blaming how North Korea is on one guy is a cop-out. Like I said, I'm not even defending the guy, but you can't simplify history too much either, or it starts to look like a cartoon. There are 4 other guys who lead North Korea; Kim Jong Un just has a family legacy involved in its founding and is the general secretary of state. The real President, coincidentally, died yesterday, Nov 3rd.


The old “think of the children/fight terrorism/support our troops/be a good person” style of naming propositions to destroy data privacy.


Not just data privacy; this is intended to destroy free speech.


The reality of meetings in most places I've seen is that key stakeholders have already formed an opinion beforehand; the meeting is a place to disseminate decisions that have already been made and align the organization.

