mcabbott · 2025-08-17T23:31:09

This doesn't really help with programming, but in physics it's traditional to use up- and down-stairs indices, which makes the distinction you want very clear.

If input x has components xⁿ, and output f(x) components fᵐ, then the Jacobian is ∂ₙfᵐ which has one index upstairs and one downstairs. The derivative has a downstairs index... because x is in the denominator of d/dx, roughly? If x had units seconds, then d/dx has units per second.

Whereas if g(x) is a number, the gradient is ∂ₙg, and the Hessian is ∂ₙ∂ₙ₂g with two downstairs indices. You might call this a (0,2) tensor, while the Jacobian is (1,1). Most of the matrices in ordinary linear algebra are (1,1) tensors.
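As a rough illustration of how those index counts show up as array shapes, here is a finite-difference sketch in plain Julia. The helper names (`fd_jacobian`, `fd_gradient`, `fd_hessian`), the test functions and the step sizes are all made up for this example; a real program would use an AD package instead.

```julia
# Central finite-difference Jacobian: J[m, n] ≈ ∂f[m] / ∂x[n]
# (one output index m, one input index n).
function fd_jacobian(f, x; h = 1e-6)
    cols = map(eachindex(x)) do n
        e = zeros(length(x))
        e[n] = h
        (f(x .+ e) .- f(x .- e)) ./ (2h)
    end
    return reduce(hcat, cols)
end

# Gradient of a scalar function: a single input index n.
fd_gradient(g, x; h = 1e-6) = vec(fd_jacobian(y -> [g(y)], x; h = h))

# Hessian: differentiate the gradient again, giving two input indices.
fd_hessian(g, x) = fd_jacobian(y -> fd_gradient(g, y), x; h = 1e-4)

f(x) = [x[1] * x[2], sin(x[3])]   # input has 3 components, output has 2
g(x) = x[1]^2 * x[2] + x[3]       # input has 3 components, output is a number

x0 = [1.0, 2.0, 0.5]
J = fd_jacobian(f, x0)   # size (2, 3)
H = fd_hessian(g, x0)    # size (3, 3), symmetric up to rounding
```

The Jacobian is rectangular because its two indices live in different spaces, while both Hessian indices range over the components of x.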

flufluflufluffy · 2025-08-18T01:13:00

We always referred to them as super/sub-scripts. So like xₙ is read “x sub n”

Upstairs/downstairs is kinda cute tho xD

mcabbott · 2025-08-18T04:34:04

Covariant and contravariant indices would be the formal terms. I'm not really sure whether I've seen "upstairs" written down.

Sub/superscript... strike me as the typographical terms, not the meaning? Like $x_\mathrm{alice}$ is certainly a subscript, and footnote 2 is a superscript, but neither is an index.

mcabbott · on Aug 30, 2024

That's what Reactant.jl is aiming to do: take the LLVM code from Julia and pipe it to XLA, where it can benefit from all the investment which makes JAX fast.

Same author as the more mature Enzyme.jl, which is AD done at the LLVM level.

Julia's flexibility makes it easy to write a simple AD system, but perhaps we're learning that it's the wrong level to write the program transformations needed for really efficient AD. Perhaps the same is true of "run this on a GPU, or 10" transformations.

mcabbott · on April 27, 2024

This is also how it works in Julia, where macros digest notation for einsum-like operations before compile-time. In fact the linked file's explanatory comment:

     (einsum (A i j) (B i k) (C k j)) 
    results in the the updates
      A[i,j] = \\sum_k B[i,k]C[k,j],
    which is equivalent to matrix multiplication.

very nearly contains the syntax used by all the Julia packages (where @ marks a macro), which is

    @tensor A[i,j] = B[i,k] * C[k,j]

(using https://github.com/Jutho/TensorOperations.jl, but see also OMEinsum, Finch, and my Tullio, TensorCast.)
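For anyone unfamiliar with the notation, the contraction spelled out as plain loops looks roughly like this. It is only a sketch of what the expression means, not what any of these macros actually generate, and `matmul_loops` is a name invented here.

```julia
# A[i,j] = sum over k of B[i,k] * C[k,j]; the repeated index k is summed.
function matmul_loops(B, C)
    T = promote_type(eltype(B), eltype(C))
    A = zeros(T, size(B, 1), size(C, 2))
    for j in axes(C, 2), k in axes(B, 2), i in axes(B, 1)
        A[i, j] += B[i, k] * C[k, j]
    end
    return A
end

B = [1 2; 3 4]
C = [5 6; 7 8]
A = matmul_loops(B, C)   # [19 22; 43 50]
```

Indices that appear only on the right (here k) are summed over, and indices on the left (i, j) label the output.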

mcabbott · on Dec 4, 2023

    using Unitful: m; foo(2.0m)

With the above definition, this will give DimensionError: 2.0 m and 1.0 are not dimensionally compatible.

Probably this means you should define `foo(x::Number) = x + oneunit(x)` to respect the Number interface. But this interface isn't very strictly defined. I believe `Base.oneunit` was added after someone started writing the Unitful package -- building something useful in a legal grey area, and formalising later?
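A self-contained sketch of the difference, without Unitful. `Meters` here is a hypothetical stand-in for a unitful quantity, not Unitful's actual type, and `foo_bare` / `foo_unit` are names made up for the two variants. The failure is also a different error from Unitful's `DimensionError`.

```julia
# Hypothetical stand-in for a quantity with units; NOT Unitful.jl's type.
struct Meters <: Number
    val::Float64
end

# Lengths add to lengths. No method or promotion rule mixes Meters with bare numbers.
Base.:+(a::Meters, b::Meters) = Meters(a.val + b.val)

# oneunit keeps the unit, so x + oneunit(x) stays dimensionally consistent.
Base.oneunit(::Type{Meters}) = Meters(1.0)
Base.oneunit(x::Meters) = oneunit(Meters)

foo_bare(x::Number) = x + 1            # adds a dimensionless 1
foo_unit(x::Number) = x + oneunit(x)   # adds one unit of whatever x is

plain = foo_unit(2.0)            # 3.0
length = foo_unit(Meters(2.0))   # Meters(3.0)

# foo_bare(Meters(2.0)) throws: Meters and Int have no common type to promote to.
bare_failed = try
    foo_bare(Meters(2.0))
    false
catch
    true
end
```

Both variants agree on ordinary numbers, which is why the difference goes unnoticed until a type like this comes along.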

sertbdfgbnfgsd · on Dec 4, 2023

Interesting case! Let's discuss.

I don't know/use Unitful, but if that function didn't fail it's because the guys who wrote Unitful defined a promotion rule from Int to their Unit thingy. So.... that's how their type works. Don't use it. Or better, open an issue on github, they might have an explanation that's escaping us. I suspect it has to do with their mental model of what Unitful is supposed to achieve.

Let me tell you that my intuition agrees with you. As an ex-physicist, if I was designing a lib called Unitful I wouldn't let you sum 1 meter plus 1 unitless thing.

EDIT:

Actually, I just tried running your code and I do get DimensionError

    foo(2.0m)
    ERROR: DimensionError: 2.0 m and 1.0 are not dimensionally compatible.

I'm guessing you have some seriously outdated versions of something?

EDIT2: Sorry I misread you. You do get an error. Ok I see your point. Maybe the Julia docs should be more explicit about what the Number interface entails. Is `+ 1` allowed? You're assuming the answer is obvious, but it's not to me. In particular, that's not generic at all. You're probably right about `+ oneunit(T)`

mcabbott · on April 7, 2023

Yes, it obeys enough algebraic laws that calling it a derivative is useful.

But I don't think there's any underlying notion of small changes to something continuous. It is not the slope of some smooth function.

mcabbott · on April 7, 2023

Or perhaps better "foreigner", legal alien, someone from over the border.

The alien derivative is an operation which moves you from one sector to another. A bit like crossing a branch cut.

mcabbott · on Nov 17, 2022

I presume so.

Note that this pseudo-code is very nearly valid Julia code (with my package):

    @tullio (min) C[i,j] := A[i,k] + B[k,j]
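Written out by hand, that line should amount to a min-plus matrix product, roughly as below. This is a plain-Julia sketch of the meaning, with `minplus` a name made up here; it says nothing about how the macro implements it.

```julia
# C[i,j] = minimum over k of (A[i,k] + B[k,j]):
# like matrix multiplication, with + replaced by min and * replaced by +.
minplus(A, B) = [minimum(A[i, k] + B[k, j] for k in axes(A, 2))
                 for i in axes(A, 1), j in axes(B, 2)]

A = [0 3; 1 0]
B = [0 5; 2 0]
C = minplus(A, B)   # [0 3; 1 0]
```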

mcabbott · on May 16, 2022

Yes, although that seems like the easy half of this, making sure `struct NewNum <: AbstractFloat` defines everything. There aren't yet tools for this but they are easy to imagine. And missing methods do give errors.

The hard half seems to be correctness of functions which accept quite generic objects. For example writing `f(x::Number)` in order to allow units, means you also allow quaternions, but many functions doing that will incorrectly assume numbers commute. (And not caring is, for 99% of these, the intention. But it's not encoded anywhere.) Less obviously, we can differentiate many things by passing dual numbers through `f(x::Real)`, but this tends to find edge cases nobody thought of. Right now if your algorithm branches on `if det(X) == 0` (or say a check that X is upper triangular) then it will sometimes give wrong answers. This one should be fixed soon, but I am sure there are other subtleties.
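A toy version of the branching problem, using a hand-rolled dual number. The `Dual` type below is a minimal sketch for illustration only (it is not how any AD package defines things, and it supports just the three operations this example needs). It assumes `==` compares only the value part, which is the behaviour that causes trouble.

```julia
# Minimal dual number: a value and a derivative part.
struct Dual
    val::Float64
    der::Float64
end

Base.:*(a::Real, b::Dual) = Dual(a * b.val, a * b.der)
Base.zero(::Dual) = Dual(0.0, 0.0)

# Assumption for this sketch: equality looks at the value only.
Base.:(==)(a::Dual, b::Real) = a.val == b

# Mathematically f(x) = 2x everywhere; the branch is a harmless-looking shortcut.
f(x) = x == 0 ? zero(x) : 2 * x

derivative(f, x) = f(Dual(x, 1.0)).der

d1 = derivative(f, 1.0)   # 2.0, correct
d0 = derivative(f, 0.0)   # 0.0, but the true slope at 0 is 2
```

The shortcut branch returns a constant, so the derivative part is silently dropped exactly at the point the branch was written for.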

mcabbott · on May 5, 2022

The unicode epsilon isn't in the public API, it's describing the 3rd positional argument.

This was added recently, and for some reason the PR (1840) didn't fix the docs, which is bad. The Optimisers.jl version has an explanation: https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.RMSProp

mcabbott · on May 5, 2022

The struct's field name is `eta`, but this is an internal detail. Its constructor takes a positional-only argument, no public name.

The Greek letter is used in the documentation. And the reason is that every optimiser's documentation links to the original paper, and tries to follow that. If the Adam paper calls the two decay rates β1, β2, then staying close to that seems the least confusing option.
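To illustrate the pattern, here is a toy struct. It is not Flux's source: `ToyRMSProp`, the field names `rho` and `epsilon`, and the default values are all placeholders chosen for this sketch.

```julia
struct ToyRMSProp
    eta::Float64       # ASCII field names, an internal detail
    rho::Float64
    epsilon::Float64
    # Positional-only constructor: the Greek names exist only inside this
    # definition (and in the docs), so callers never have to type them.
    ToyRMSProp(η = 0.001, ρ = 0.9, ϵ = 1e-8) = new(η, ρ, ϵ)
end

opt = ToyRMSProp(0.01)   # override only the first positional argument
opt.eta                  # 0.01
```

Since the arguments are positional, the names inside the constructor could change without affecting anyone calling it.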

nullstyle · on May 5, 2022

Perhaps I'm missing your point, but I think you're focussing too much on the specific case that someone who isn't me came up with.

My most general point is that the identifiers we use in our code are almost never just convention or taste when we are sharing that code with anyone else (and for most, "anyone else" includes our future selves). Getting a little more specific, I'm specifically interested in Julia and look forward to working in it further, but I've personally felt pain around scientific/mathematical notation when trying to understand code I've found on github. tagrun dismissing my pain as nonsense and the people who argue for my ilk as perpetrators of bikeshedding is dipshitted. Yeah, I'm probably the asshole for being a college dropout trying to leverage modern scientific computing for my own ends (snoogins), but I'm also willing to bet tagrun is probably the member of a team that talks down to junior members and complains they haven't read enough papers or the right papers to see the magnificence of their code ;).

It's fine to write code that demands a domain expert to understand, but don't pretend like it's good across all dimensions. There are tradeoffs involved.

Personally I find the preponderance of scientific/mathematical notation (whatever you want to call it) in Julia to be cute; it certainly does bind the code to linked papers in a pretty cool way when it all fits together properly. That said, it's a pain in the ass when it doesn't fit together properly and I've personally had a journey into Julia spoiled due to the frequency at which I had to figure out how to notate something or what word to use when regarding some squiggle I haven't encountered before. I look forward to having a better intuition for the Greek alphabet but until then Julia will often be harder to read, let alone understand when compared to ruby or javascript or go or C# or any other of the roughly dozen programming languages I've worked with and feel comfortable translating between.