I hope Julia succeeds and replaces the clunky R and numpy+python "ecosystems". Every few months I try to decide to do all my computing in Julia and quit fooling around with lesser environments. And every time I get stuck at the same point: the slow startup time. I want to draw 100 plots per second, by calling the same Julia script in a bash loop 100 times; but this is utterly impossible. Of course, the Julia community has a standard answer to this concern: this is not how you are supposed to use the language. But I don't listen to them, for the best tools are those that perform well doing tasks they were not designed to do. I feel the same dismay when I get stuck at slow loops in Python, and the Python people tell me that I'm not supposed to use loops. Well, this is the main reason that I want to move away from Python and into Julia.
I'm not interested in the implementation details of the language interpreter. The language is already very good, and the libraries are excellent. My main point of friction against using Julia is the slow startup time that forbids its use in a wide variety of contexts. I feel like the best usage of resources for the Julia community would be to spend all available money in hiring a Mike Pall-esque figure to advise them on JIT. Even if it was a part-time or a one-shot hire.
This just seems like such an artificially constructed problem. You just move that loop into Julia and the problem is solved. It is not hard to write shell code in Julia.
I have rewritten complex build scripts in bash into Julia code. It was not very hard and it made everything run way faster despite mostly shelling out to external programs.
I don't know how secret/protected this shell script of yours is, but I'd be willing to have a go at rewriting the loop part in Julia if you showed it to me.
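For illustration, the shell side of "moving the loop into Julia" could look like the sketch below: instead of starting one process per input, hand every input to a single process. The helper name plot_batch and the script name plot.jl are hypothetical, and the sketch assumes the script itself loops over ARGS.

```shell
# Julia binary to use; override with JULIA=/path/to/julia if needed.
JULIA=${JULIA:-julia}

# Slow pattern: one Julia startup per input file.
#   for f in data/*.csv; do julia plot.jl "$f"; done
#
# Batched pattern: one Julia startup for all inputs.
# Assumption: the script iterates over ARGS and plots each file.
plot_batch() {
    script=$1; shift
    "$JULIA" --startup-file=no "$script" "$@"
}
```

Called as `plot_batch plot.jl data/*.csv`, this pays the startup and compilation cost once rather than once per file.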
Anyway I do think the startup speed problem can be solved in Julia and they don't need a rockstar advisor to do it. I think the solutions are already pretty well known. It is more of an issue of manpower. Somebody has to put in the hours to do it.
And that is already being done. First time to plot will be twice as fast as it used to be in the next Julia release. Further improvements can be done, but again that requires somebody to put in the hours. Julia does not have access to Google level resources.
> the startup speed problem can be solved in Julia and they don't need a rockstar advisor to do it. I think the solutions are already pretty well known.
> And that is already being done. First time to plot will be twice as fast as it used to be in the next Julia release. Further improvements can be done
I'm really, really happy to hear that! I hope the Julia runtime becomes more and more streamlined in the (near) future.
See, I was just stating my use case, without pretense that it is representative at all. Yet I received 20 upvotes in a few minutes, and Julia is maybe the only language where "time to first plot" is a thing. So I'm not completely alone in my (admittedly minority) concern.
You say that rewriting everything in Julia would solve my problem. I'm sure that this is the case, but this is not at all my point. Some of us do not want a shell replacement, we want a bc replacement, and julia is a nearly perfect one, if it wasn't for the outrageously slow startup time. I have zero interest in the julia REPL, I just write julia scripts (among scripts in other languages) and I'm not willing to change that.
Julia is a compiled language, like C. Using it in this way is basically like writing a shell script in C and calling GCC on every invocation. You can't expect good performance from that, and it's really a testament to Julia that it feels so dynamic that you feel as if it should do that.
If you AOT-compiled your Julia "script" (program) to a binary and invoked that, your startup time problems would go away. Julia's "application deployment" stack is underdeveloped compared to its REPL experience, but it's still possible to do this today with PackageCompiler.jl and will only get easier with time. I think that will prove to be the "right" way to solve this problem in the long run.
(Or, if you don't care about performance, you can just turn off the compiler and interpret everything, and you basically have Python-but-in-Julia. Fast startup, slow running. Just run it with julia --compile=no)
> writing a shell script in C and calling GCC on every invocation. You can't expect good performance from that
Of course you can. Have you ever used the "-run" option of the TCC compiler? It's blazingly fast. With gcc it's a bit slower, but still orders of magnitude faster than julia. You can use pre-compiled libraries and the linking to your freshly compiled code is extremely fast. The fact is that compiling and linking C code is much faster than just launching the Julia environment with some packages. There's no fundamental reason for that enormous disparity in running time. I agree that it is a completely irrelevant nuisance for most people; but still, for some workflows not blessed by the Julia developers, it is the main point of friction.
EDIT: if you want to try it yourself, write the following text into a .c file, chmod +x it, and you can run it like a C script on most unix systems:
//usr/bin/gcc -O0 "$0" -lpng && exec ./a.out "$@"
#include <png.h>
int main(int c, char *v[])
{
// do stuff with png images
return 0;
}
This seems too obvious to even comment, but timing the compilation of a no-op program doesn’t show much. The meaningful comparison would be compiling a C program that does the same thing as some Julia code with `gcc -O2`. Btw, you can also run Julia in `-O0` or even better `-O1` mode — Julia even uses these same flags at the command line. These low optimization modes are extremely snappy — time to first plot is no issue. Of course, if you want to run some compute intensive code, it’s much slower, which is why `-O1` isn’t the default.
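As a rough sketch of how the flags mentioned in this thread are passed on the command line (the wrapper names julia_lowopt and julia_interp are made up for illustration; `-O0`, `-O1` and `--compile=no` are the flags themselves):

```shell
# Julia binary to use; override with JULIA=/path/to/julia if needed.
JULIA=${JULIA:-julia}

# Run with a reduced optimization level (0 or 1). Per the discussion above,
# this trades slower compute-heavy code for less time spent compiling.
julia_lowopt() {
    level=$1; shift
    "$JULIA" "-O$level" "$@"
}

# Run with the compiler turned off, so the code is interpreted.
julia_interp() {
    "$JULIA" --compile=no "$@"
}
```

For example, `julia_lowopt 1 plot.jl data.csv` or `julia_interp plot.jl data.csv`, where plot.jl is a hypothetical script. Whether either mode is fast enough for a given workload is something to measure, not assume.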
This is not to dismiss the TTFP issue, just pointing out that your argument seems to be that gcc is faster than Julia, which is definitely not the case. Indeed, gcc is about the same speed as clang, which, like Julia, uses LLVM. The way Julia uses LLVM is a bit different, but something would be very wrong if it took Julia much longer to compile code with the same functionality than it takes gcc or clang. Julia spreads the compilation out over time, but when you do something complex, a lot of compilation happens all at once. However, a static compilation would not do this work any faster; static compilers just do the work in a separate phase rather than interleaved with execution.
Point taken, but I think you overreach a bit with:
>There's no fundamental reason for that enormous disparity in running time
I mean... Julia does full type inference, which it uses to present a dynamic type interface. It's not necessarily possible to statically compile a module ahead of time, because the module is designed to be generic and will generate different code if fed different types, which is how Julia attains its extraordinary composability. In other words, it's a much nicer language than C, and correspondingly much harder to compile. I'd call that a pretty fundamental reason.
Perhaps C was a poor example for me to pick. C++, maybe?
p.s. your "script" example clobbers any file named "a.out" in the current directory.
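One way around the clobbering, sketched here as a shell wrapper rather than as a self-running .c file: compile into a private temporary directory and remove it afterwards. The name crun and the use of an LDLIBS variable for extra linker flags are assumptions made for this example.

```shell
# C compiler to use; override with CC=gcc, CC=tcc, etc.
CC=${CC:-cc}

# Compile a C source file into a throwaway directory, run the result with
# the remaining arguments, then clean up. Nothing is written to the
# current directory, so an existing ./a.out is left alone.
crun() {
    src=$1; shift
    dir=$(mktemp -d) || return 1
    # LDLIBS is intentionally unquoted so "-lpng -lm" splits into words.
    if "$CC" -O0 -o "$dir/prog" "$src" ${LDLIBS:-}; then
        "$dir/prog" "$@"
        status=$?
    else
        status=1
    fi
    rm -rf "$dir"
    return "$status"
}
```

With the example above that would be something like `LDLIBS=-lpng crun script.c input.png`.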
> In other words, it's a much nicer language than C, and correspondingly much harder to compile.
Sounds like an anti-feature to me. What I want is "matlab with fast loops". I couldn't care less about "extraordinary composability", "full type inference" or "genericity". Well, I do care because these unneeded features make my stuff much slower. It's a hefty price to pay for uncalled-for features!
Yes, my script was just a silly joke, to show that compiling and linking C code is really fast.
Yep, Octave is what I currently use when I can choose. I loved the concept of Julia as an "octave with fast loops". But it seems that there are some compromises with the Julia interpreter that go against my interests. Maybe there's still space for a modern language for numerical computation whose efficiency is not encumbered by the need to support strings, dictionaries, multiple dispatch and the like?
> Maybe there's still space for a modern language for numerical computation whose efficiency is not encumbered by the need to support strings, dictionaries, multiple dispatch and the like?
I don't think so: I don't know of any scientific code that doesn't have to interact with its environment, if only to import/export data, and for that, strings at the very least are necessary.
Multiple dispatch is the same thing for me; either it or another form of polymorphism will very quickly be asked for by the users of a scientific language, as no one wants to write the same functions dozens of times for different types, and the HPC community loves its programs to go fast, so they need/want to be able to choose their types.
Often the reason I have a bash script is that I'm running some AI tool to solve lots of problems in parallel, then gathering and filtering the results, and finally doing plotting and things.
Personally, I never really like solutions which are "you can't do 5% of stuff in X, you have to do everything in X". I like trying new things out, but I'm not willing to 100% invest everything in Julia up front, rather than just trying out some small bits.
Also, I think my collaborators would get annoyed if I rewrote all our bash scripts in Julia -- I can't expect them all to learn Julia.
A tool with slow startup time that forces me to structure the workflow around its idiosyncrasies is less convenient than a tool with fast startup time that can be seamlessly integrated into pre-existing workflows (eg driving pipelines via Makefiles).
Start up time for the interpreter is 0.13s for me: ~ time julia -E "1+1"
What takes time is precompilation of packages and functions - with Julia 1.6 the precompilation is much faster now than before.
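To see how much of the cost is bare process startup and how much is package loading, it may help to time repeated invocations. A rough sketch: time_runs is a hypothetical helper, and `date +%s` only has one-second resolution, so the run count should be large enough for the total to be meaningful.

```shell
# Run a command N times, discarding its output, and print the total
# wall-clock time in whole seconds.
time_runs() {
    n=$1; shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}
```

Comparing `time_runs 100 julia --startup-file=no -e '1+1'` against `time_runs 20 julia --startup-file=no -e 'using Plots'` would separate the two costs, assuming Plots is installed.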
Your bash script that calls Julia 100 times is indeed not something that Julia was made for. It excels in many other areas and that's quite fine. I'm okay with plotting in a bash script in Python if it means that I can use Julia for everything else.
Your last sentence seems to ignore the reality that plotting is only one step in a larger pipeline. It sounds miserable to need to write analytics code twice, once in Julia “for everything else” and again in Python just for the plotting. I’ll just write the whole thing in Python and save myself the headache.
What I meant with this is that Julia isn't intended as a bash replacement. You can write your code in Julia and circumvent the overhead of having to start up the interpreter every time. But if you try to execute it 100 times per second then of course the overhead will add up.
Ah, I understand what you mean now. And you may be right that there’s a Better Way to do it natively in Julia. But there’s lots of friction to adopting entirely new dev practices, and I’m inclined to just stick with the tried and true methods I’m already familiar with. Old habits die hard! And that’s a big friction against Julia adoption (IMO).
There's nothing wrong with using the tools you know. But IMO it's quite interesting to use languages that might just be a big improvement over how things have been done so far. I think Julia is such a language when compared to Python (excluding the ecosystem, of course).
Also, if you come back to Julia sometime there's this:
My main use case for that would be: I'm a generic user. I just want to run and use a Julia script for some output, by adding 'julia somescript.jl'. I don't want to modify it because I don't know the language.
Are you fitting, evaluating, and plotting complex models 100 times per second?
I would be quite okay with logging the results into a file and only plotting them with Python if this was the cost of using Julia, yes. This might not be for everybody, but your scenario sounds strange to me to begin with.
No, but I am applying a serialized fitted model to 100 separate out-of-sample datasets and generating diagnostic plots for every output of predictions / scorings.
While moving the loop into Julia (as others suggested) is probably the better option, an alternative you could consider is DaemonMode: https://github.com/dmolina/DaemonMode.jl
I.e., have a background Julia process so that you only have to pay the precompile cost once.
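A minimal sketch of that setup from the shell, assuming DaemonMode is installed in the default environment. The `serve()` and `runargs()` calls follow the package's README; the wrapper names daemon_start and daemon_run are made up here.

```shell
# Julia binary to use; override with JULIA=/path/to/julia if needed.
JULIA=${JULIA:-julia}

# Start the long-lived server process in the background (do this once).
daemon_start() {
    "$JULIA" --startup-file=no -e 'using DaemonMode; serve()' &
}

# Send a script and its arguments to the running server. A Julia client
# process is still started for each call; what gets reused is the
# server's already-loaded and already-compiled code.
daemon_run() {
    "$JULIA" --startup-file=no -e 'using DaemonMode; runargs()' "$@"
}
```

The bash loop then calls `daemon_run plot.jl "$f"` instead of `julia plot.jl "$f"`, with plot.jl standing in for the real script.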
If you are going to use Julia in one of the absolute worst workflows for how it is designed, you shouldn’t be surprised it doesn’t work well... That said, have you tried using PackageCompiler to add your needed libraries to the system image? This seems to show a factor of 100 speed up for the time to first plot: https://julialang.github.io/PackageCompiler.jl/dev/examples/...
Why do you find R’s ecosystem “clunky”? The Tidyverse is unequaled for its elegance. I come from the CS world, so I’m supposed to like languages like Python, but I really, really like R, mainly for its elegance.
The Tidyverse is great, but vanilla R is a monstrosity. After five years of heavy use, I still don't really understand the random idiosyncrasies of its various types. Arrays and lists and dataframes and tibbles are confusingly named, and operations that work on one type often balk at the others, without telling you what's wrong. I have lost many many nerves with it.
I agree that for statistics and data exploration R is certainly not clunky. My use case is more discrete PDE, where the R capabilities for sparse matrices and advanced linear algebra are a bit limited (but this may be just because I'm more used to the annoyances of numpy).
If this is your big setback, did you try using a sysimage? The VSCode extension even has a build task for it. To make the sysimage, just be in the base environment, then press Ctrl+Shift+B and select the Julia build sysimage task. The terminal will tell you where it's saving the sysimage. It reduced my startup time to unnoticeable (at least to a person used to MATLAB/Python). I am not a bash guru so I don't know how you do it on the command line, but it's a parameter to the julia interpreter.
One thing to check is that you're not running a bash script which simply causes julia to "include" stuff, effectively recompiling everything each time the interpreter is run.
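On the command line, the parameter in question is `--sysimage` (short form `-J`). A hedged sketch of both steps, assuming PackageCompiler and Plots are installed in the active environment; the wrapper names and the file name plots.so are illustrative only.

```shell
# Julia binary to use; override with JULIA=/path/to/julia if needed.
JULIA=${JULIA:-julia}

# Build a sysimage that bakes in Plots. $1 is the output path; the usual
# extension is .so on Linux, .dylib on macOS and .dll on Windows.
build_sysimage() {
    "$JULIA" --startup-file=no -e \
        'using PackageCompiler; create_sysimage(["Plots"]; sysimage_path=ARGS[1])' "$1"
}

# Run a script against a previously built sysimage.
run_with_sysimage() {
    image=$1; shift
    "$JULIA" --sysimage "$image" "$@"
}
```

Building is slow and only needs to happen once (`build_sysimage plots.so`); after that each call is `run_with_sysimage plots.so plot.jl data.csv`.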
As long as you make sure that all the custom code you want to run is in the form of a precompiled module, I think the time required for the interpreter to launch per se shouldn't be that much of a problem.
Yes, but if the road to cached builds or running in interpreted mode is not low friction or no friction, that matters and it's the fault of the language / ecosystem.
> I get stuck at the same point: the slow startup time…
The language maintainers steadfastly refuse to include ahead-of-time compilation. They seem super focused on their narrow use case scenario and ignore everything outside that.
It is not nice to lie about people like that. They have never refused to do that and in fact you can already do ahead-of-time compilation in Julia. Many of us have already done it.
It is not great yet, but it is an ongoing problem, which they constantly work on and improve.
Claiming they refuse to do it is either ignorant or a flat out lie.
> It is not great yet, but it is an ongoing problem, which they constantly work on and improve.
That's just it. This effort has been dragging on for years now, slow as molasses in winter. If it were a regular goal it would have been a solved problem long ago. That hacks and workarounds for this have been deemed acceptable for so long just goes to show that it's not on the list.
I'm sorry if I'm blunt, but the last year of compiler improvements has targeted nothing but this exact purpose. That's half of all the time spent since 1.0! The issue tracker is filled to the brim with PRs and issues about making _everything_ faster. How does this constitute "refusal of an obvious step" for you?