This doesn't really help with programming, but in physics it's traditional to use up- and down-stairs indices, which makes the distinction you want very clear.
If input x has components xⁿ, and output f(x) components fᵐ, then the Jacobian is ∂ₙfᵐ which has one index upstairs and one downstairs. The derivative has a downstairs index... because x is in the denominator of d/dx, roughly? If x had units seconds, then d/dx has units per second.
Whereas if g(x) is a number, the gradient is ∂ₙg, and the Hessian is ∂ₙ∂ₙ₂g with two downstairs indices. You might call this a (0,2) tensor, while the Jaconian is (1,1). Most of the matrices in ordinary linear algebra are (1,1) tensors.
Covariant and contravariant indices would be the formal terms. I'm not really sure whether I've seen "upstairs" written down.
Sub/superscript... strike me as the typographical terms, not the meaning? Like $x_\mathrm{alice}$ is certainly a subscript, and footnote 2 is a superscript, but neither is an index.
That's what Reactant.jl is aiming to do: Take the LLVM code from Julia and pipe to to XLA, where it can benefit from all the investment which make Jax fast.
Same author as the more mature Enzyme.jl, which is AD done at the LLVM level.
Julia's flexibility makes it easy to write a simple AD system, but perhaps we're learning that it's the wrong level to write the program transformations needed for really efficient AD. Perhaps the same is true of "run this on a GPU, or 10" transformations.
This is also how it works in Julia, where macros digest notation for einsum-like operations before compile-time. In fact the linked file's explanatory comment:
(einsum (A i j) (B i k) (C k j))
results in the the updates
A[i,j] = \\sum_k B[i,k]C[k,j],
which is equivalent to matrix multiplication.
very nearly contains the syntax used by all the Julia packages (where @ marks a macro), which is
With the above definition, this will give DimensionError: 2.0 m and 1.0 are not dimensionally compatible.
Probably this means you should define `foo(x::Number) = x + oneunit(x)` to respect the Number interface. But this interface isn't very strictly defined. I believe `Base.oneunit` was added after someone started writing the Unitful package -- building something useful in a legal grey area, and formalising later?
I don't know/use Unitful, but if that function didn't fail it's because the guys who wrote Unitful defined a promotion rule from Int to their Unit thingy. So.... that's how their type works. Don't use it. Or better, open an issue on github, they might have an explanation that's escaping us. I suspect it has to do with their mental model of what Unitful is supposed to achieve.
Let me tell you that my intuition agrees with you. As an ex-physicist, if I was designing a lib called Unitful I wouldn't let you sum sum 1 meter plus 1 unitless thing.
EDIT:
Actually, I just tried running your code and I do get DimensionError
foo(2.0m)
ERROR: DimensionError: 2.0 m and 1.0 are not dimensionally compatible.
I'm guessing you have some seriously outdated versions of something?
EDIT2: Sorry I misread you. You do get an error. Ok I see your point. Maybe the Julia docs should more explicit about what the Number interface entails. Is `+ 1` allowed? You're assuming the answer is obvious, but it's not to me. In particular, that's not generic at all. You're probably right about `+ oneunit(T)`
Yes, although that seems like the easy half of this, making sure `struct NewNum <: AbstractFloat` defines everything. There aren't yet tools for this but they are easy to imagine. And missing methods do give errors.
The hard half seems to be correctness of functions which accept quite generic objects. For example writing `f(x::Number)` in order to allow units, means you also allow quaternions, but many functions doing that will incorrectly assume numbers commute. (And not caring is, for 99% of these, the intention. But it's not encoded anywhere.) Less obviously, we can differentiate many things by passing dual numbers through `f(x::Real)`, but this tends to find edge cases nobody thought of. Right now if your algorithm branches on `if det(X) == 0` (or say a check that X is upper triangular) then it will sometimes give wrong answers. This one should be fixed soon, but I am sure there are other subtleties.
The struct's field name is `eta`, but this is an internal detail. Its constructor takes a positional-only argument, no public name.
The greek letter is used in the documentation. And the reason is that every optimiser's documentation links to the original paper, and tries to follow that. If the Adam paper calls the two decay rates β1, β2, then staying close to that seems the least confusing option.
Perhaps I'm missing your point, but I think you're focussing too much on the specific case that someone who isn't me came up with.
My most general point is that the identifiers we use in our code are almost never just convention or taste when we are sharing that code with anyone else (and for most, "anyone else" includes our future selves). Getting a little more specific, I'm specifically interested in Julia and look forward to working in it further, but I've personally felt pain around scientific/mathematical notation when trying to understand code I've found on github. tagrun dismissing my pain as nonsense and the people who argue for my ilk as perpetrators of bikesheddding is dipshitted. Yeah, I'm probably the asshole for being a college dropout trying to leverage modern scientific computing for my own ends (snoogins), but I'm also willing to bet tagrun is probably the member of a team that talks down to junior members and complains they haven't read enough papers or the right papers to see the magnificience of their code ;).
It's fine to write code that demands a domain expert to understand, but don't pretend like its good across all dimensions. There are tradeoffs involved.
Personally I find the preponderance of scientific/mathematical notation (whatever you want to call it) in Julia to be cute; It certainly does bind the code to linked papers in a pretty cool way when it all fits together properly. That said, its a pain in the ass when it doesn't fit together properly and I've personally had a journey into Julia spoiled due to frequency at which I had to figure out how to notate something or what word to use when regarding some squiggle I haven't encountered before. I look forward to having a better intuition for the greek alphabet but until then Julia will often be harder to read, let alone understand when compared to ruby or javascript or go or C# or any other of the roughly dozen programming languages I've worked with and feel comfortable translating between.
If input x has components xⁿ, and output f(x) components fᵐ, then the Jacobian is ∂ₙfᵐ which has one index upstairs and one downstairs. The derivative has a downstairs index... because x is in the denominator of d/dx, roughly? If x had units seconds, then d/dx has units per second.
Whereas if g(x) is a number, the gradient is ∂ₙg, and the Hessian is ∂ₙ∂ₙ₂g with two downstairs indices. You might call this a (0,2) tensor, while the Jaconian is (1,1). Most of the matrices in ordinary linear algebra are (1,1) tensors.