
I’m not a mathematician (just a programmer), but reading this made me wonder—doesn’t this kind of dimensional weirdness feel a bit like how LLMs organize their internal space? Like how similar ideas or meanings seem to get pulled close together in a way that’s hard to visualize, but clearly works?

That bit in the article about knots only existing in 3D really caught my attention. "And dimension 3 is the only one that can contain knots — in any higher dimension, you can untangle a knot even while holding its ends fast."

That’s so unintuitive… and I can't help thinking of how LLMs seem to "untangle" language meaning in some weird embedding space that’s way beyond anything we can picture.

Is there a real connection here? Or am I just seeing patterns where there aren’t any?



> That’s so unintuitive…

It's pretty simple, actually. Imagine you have a knot you want to untie. Lay it out in a knot diagram, so that there are just finitely many crossings. If you could pass the string through itself at any crossing, flipping which strand is over and which is under, it would be easy, wouldn't it? It's only knotted because those over/unders are in an unfavorable configuration. Well, with a 4th spatial dimension available, you can't pass the string through itself, but you can still invert any crossing by using the extra dimension to move one strand around the other, in a way that wouldn't be possible in just 3 dimensions.
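
Here's a toy numerical version of that (my own sketch, nothing canonical): two straight strands that collide at the origin if both live in 3D, but never touch once one of them is allowed to bulge into a 4th coordinate w near the crossing.

    import numpy as np

    # Toy sketch: strand A runs along the x-axis, strand B along the y-axis, so in
    # 3D (w = 0 everywhere) they intersect at the origin. Pushing B into a 4th
    # coordinate w near the crossing keeps them apart the whole way, which is the
    # freedom you use to flip an over-crossing into an under-crossing.

    t = np.linspace(-1.0, 1.0, 401)
    zeros = np.zeros_like(t)

    strand_a = np.stack([t, zeros, zeros, zeros], axis=1)   # along the x-axis
    flat_b   = np.stack([zeros, t, zeros, zeros], axis=1)   # along the y-axis, same plane
    bump     = np.exp(-(t / 0.2) ** 2)                      # nonzero only near the crossing
    lifted_b = np.stack([zeros, t, zeros, bump], axis=1)    # same strand, pushed into w

    def min_distance(p, q):
        d = p[:, None, :] - q[None, :, :]
        return np.sqrt((d ** 2).sum(axis=2)).min()

    print(min_distance(strand_a, flat_b))    # 0.0  -- they intersect in 3D
    print(min_distance(strand_a, lifted_b))  # > 0  -- the extra dimension keeps them apart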

> Or am I just seeing patterns where there aren’t any?

Pretty sure it's the latter.


That makes sense for a 2D rope in 4D space, but I’m not convinced the same approach holds for a 3D “hyperrope” in 4D space.


Your intuition is correct, it doesn't! A "3D hyperrope" is in fact just the surface of a ball[1], and it turns out that you can actually form non-trivial knots of that spherical surface in a 4-dimensional ambient space (and analogously they can be un-knotted if you then move up to a 5-dimensional ambient space, although the mechanics for doing so might be a little trickier than in the 1d-in-4d case). In fact, if you have a k-dimensional sphere, you can always knot it up in a (k+2)-dimensional ambient space (and it can then always be unknotted if you add enough additional dimensions).

[1] note that a [loop of] rope is actually a 1-dimensional object (it only has length, no width), so the next dimension up should be a 2-dimensional object, which is true of the surface of a ball. a topologist would call these things a 1-sphere and a 2-sphere, respectively


Any time I am tempted to feel smart, I try to go and study some linear algebra and walk away humbled. I will probably spend 20-30 minutes trying to understand what you said (and I think you typed it out quite reasonably), but first I have to figure out how... a 3D hyperrope is the same as the surface of a ball...


I'm not sure what you mean here. This is discussing a 1-dimensional structure embedded in 4-dimensional space. If you're not sure it works for something else, well, that isn't what's under discussion.

If you just mean you're unclear on the first step, of laying the knot out in 2D with crossings marked over/under, that's always possible after just some ordinary 3D adjustments. Although, yeah, if you asked me to prove it, I dunno that I could give one, I'm not a topologist... (and I guess now that I think about it the "finitely many" crossings part is actually wrong if we're allowing wild knots, but that's not really the issue)


There is a real connection insofar as the internal space of an LLM is a vector space, so things which hold for vector spaces hold for the internal space of an LLM. This is the power of abstract algebra. When an algebraic structure can be identified, you all of a sudden know a ton of things about it, because mathematicians have been working to understand those structures for a while.

The internal space of an LLM would also have things in common with how, say, currents flow in a body of water, because that too is a vector space. When you study this stuff you get this sort of zen sense of everything getting connected to everything else. E.g. in one of my textbooks you look at how pollution spreads through the Great Lakes, and then literally the next example looks at how drugs are absorbed into the bloodstream through the stomach, and it's exactly the same dynamic matrix and set of differential equations. Your stomach works the same as the Great Lakes on a really fundamental level.
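
A rough sketch of what I mean, with made-up rate constants (not the actual numbers from the textbook): the same linear system dx/dt = Ax tells both stories, and only the labels change.

    import numpy as np
    from scipy.integrate import solve_ivp

    # Toy two-compartment model: material drains from compartment 1 (upstream
    # lake / gut) into compartment 2 (downstream lake / bloodstream), which then
    # clears it. Swapping the story only changes the labels and the numbers in A.

    A = np.array([[-0.30,  0.00],    # compartment 1 empties at rate 0.30
                  [ 0.30, -0.05]])   # compartment 2 fills from 1, clears at 0.05

    x0 = [1.0, 0.0]                  # everything starts in compartment 1
    sol = solve_ivp(lambda t, x: A @ x, (0, 60), x0, t_eval=np.linspace(0, 60, 7))
    print(np.round(sol.y, 3))        # amount in each compartment over time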

The spaces being described here are a little more general than vector spaces, so some of the things which are true about vector spaces wouldn’t necessarily work the same way here.


> The spaces being described here are a little more general than vector spaces

You probably mean considerably more special than a general vector space. We do have differentiable manifolds here.


If you're holding a hammer, everything looks like a nail ...


I would be careful about drawing any analogies which are “too cute”. We use LLMs because they work, not because they are theoretically optimal. They are full of lossy tradeoffs that work in practice because they are a good match for the hardware and data we have.

What is true is that you can get good results by projecting lower dimensional data into higher dimensions, applying operations, and then projecting it back down.
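
Something like this, as a toy sketch (random weights, dimensions picked arbitrarily, roughly the shape of a transformer feed-forward block):

    import numpy as np

    # Project a low-dimensional vector up, apply a simple nonlinearity, then
    # project back down. The weights are random here; this only shows the plumbing.

    rng = np.random.default_rng(0)
    d_model, d_hidden = 8, 32            # "up" by 4x, then back "down"

    W_up   = rng.normal(size=(d_hidden, d_model)) / np.sqrt(d_model)
    W_down = rng.normal(size=(d_model, d_hidden)) / np.sqrt(d_hidden)

    def block(x):
        h = np.maximum(W_up @ x, 0.0)    # lift to the higher dimension + ReLU
        return W_down @ h                # project back to the original dimension

    x = rng.normal(size=d_model)
    print(block(x).shape)                # (8,) -- same dimension we started with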


> "And dimension 3 is the only one that can contain knots — in any higher dimension, you can untangle a knot even while holding its ends fast."

Maybe you could create "hyperknots", e.g. in 4D a knot made of a surface instead of a string? Not sure what "holding one end" would mean though.


Yes, circles don't knot in 4D, but the 2-sphere does: https://en.wikipedia.org/wiki/Knot_theory#Higher_dimensions

Warning: If you get too deep into this, you're going to find yourself dealing with a lot of technicalities like "are we talking about smooth knots, tame knots, topological knots, or PL knots?" But the above statement I think is true regardless!


Yep — you can always “knot” a sphere of two dimensions lower, starting with a circle in 3D and a sphere in 4D.


It's not just LLMs. Deep learning in general forms these multi-d latent spaces.


When you untie a knot, its ends are fixed in time.

Humans also unravel language meaning from within a hyperdimensional manifold.


I don't think this is true; I believe humans unravel language meaning in the plain old 3+1-dimensional Galilean manifold of events in nonrelativistic spacetime, just as animals do with vocalizations and body language, and LLM confabulations / reasoning errors are fundamentally due to their inability to access this level of meaning. (Likewise with video generators not understanding object permanence.)


> Or am I just seeing patterns where there aren’t any?

Meta: there are patterns to seeing patterns, and it's good to understand where your doubt springs from.

1: Hallucinating connections/metaphors can be a sign you're spending too much time within a topic. The classic is binging on a game for days, and then resurfacing back into a warped reality where everything you see relates back to the game. Hallucination is the wrong word, sorry, because sometimes the metaphors are deeply insightful and valuable: e.g. new inventions or unintuitive cross-discipline solutions to unsolved maths problems. Watch when others see connections to their pet topics: eventually you'll learn to internally discern your valuable insights from your more fanciful ones. One can always consider whether a temporary change to another topic would be healthy. However, sometimes diving deeper helps. How to choose??

2: There's a narrow path between valuable insight and debilitating overmatching. Mania and conspiratorial paranoia find amazing patterns, but they tend to be rather unhelpful overall. Seek a good balance.

3: Cultivate the joy within yourself and others; art and poetry are fun. Finding crazy connections is worthwhile and often a basis for humour. Engineering is inventive, and being a judgy killjoy is unhealthy for everyone.

Hmmm, I usually avoid philosophical stuff like that. Abstract stuff is too difficult to write down well.


A lot of innovation is stealing ideas from two domains that often don’t talk to each other and combining them. That’s how we get simultaneous invention. Two talented individuals both realize that a new fact, when combined with existing facts, implies the existence of more facts.

Someone once asserted that all learning is compression, and I’m pretty sure that’s how polymaths work. Maybe the first couple of domains they learn occupy considerable space in their heads, but then patterns emerge, and this school has elements from these other three, with important differences. X is like Y except for Z. Shortcut is too strong a word, but recycling perhaps.


I'm unsure if I misunderstand you or your writing ingroup!

> learning is compression

I don't think I know enough about compression to find that metaphor useful

> occupy considerable space in their heads

I reckon this is a terribly misleading cliche. Our brains don't work like hard drives. From what I see we can keep stuffing more in there (compression?). Much of my past learning is now blurred but sometimes it surfaces in intuitions? Perhaps attention or interest is a better concept to use?

My favorite thing about LLMs is wondering how much of people's (or my own) conversations are just LLMs. I love the idea of playing games with people to see if I can predictably trigger phrases from people, but unfortunately I would feel like a heel doing that (so I don't). And catching myself doing an LLM reply is wonderful.

Some of the other sibling replies are also gorgeously vague-as (and I'm teasing myself with vagueness too). Abstracts are so soft.


If you have some probability distribution over finite sequences of bits, a stream of independent samples drawn from that distribution can be compressed so that the number of bits in the compressed stream per sample from the original stream is (in the long run) the (base-2) entropy of the distribution. Likewise if, instead of independent samples from a distribution, there is a Markov process or something like that, with some fixed average entropy rate.

The closer one can get to this ideal, the closer one is to having a complete description of the distribution.

I think this is the sort of thing they were getting at with the compression comment.
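
For a concrete toy version (my own example, not theirs): bits that are 1 with probability 0.1 carry about 0.47 bits of entropy each, and a general-purpose compressor gets far below the 8 raw bits each symbol occupies here, though not all the way down to the limit.

    import math, random, zlib

    # Draw independent 0/1 symbols with P(1) = 0.1, compute the source entropy,
    # and compare with what zlib actually achieves per symbol. The gap between
    # the zlib rate and the entropy is what a better model of the source would
    # recover.

    random.seed(0)
    p, n = 0.1, 200_000
    data = bytes(1 if random.random() < p else 0 for _ in range(n))

    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    zlib_rate = 8 * len(zlib.compress(data, 9)) / n

    print(f"entropy:   {entropy:.3f} bits/symbol")
    print(f"zlib rate: {zlib_rate:.3f} bits/symbol")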


I think LLM layers are basically big matrices, which are one of the most popular many-dimensional objects that we non-mathematician mortals get to play with.
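
Toy example (numbers invented): an embedding table is just such a matrix, one row per token, and "similar meanings end up close together" becomes an ordinary geometric statement about its rows.

    import numpy as np

    # Made-up 4-dimensional embeddings for a handful of "tokens".
    E = np.array([
        [0.9, 0.1, 0.0, 0.0],   # "cat"
        [0.8, 0.2, 0.1, 0.0],   # "dog"    -- deliberately near "cat"
        [0.0, 0.0, 0.9, 0.4],   # "tensor"
        [0.1, 0.0, 0.8, 0.5],   # "matrix" -- deliberately near "tensor"
    ])

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(E[0], E[1]))   # cat vs dog: high
    print(cosine(E[0], E[2]))   # cat vs tensor: low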



