I hope Julia succeeds and replaces the clunky R and numpy+python "ecosystems". E...

socialdemocrat · on March 8, 2021

This just seems like such an artificially constructed problem. You just move that loop into Julia and the problem is solved. It is not hard to write shell code in Julia.

I have rewritten complex bash build scripts in Julia. It was not very hard, and it made everything run way faster despite mostly shelling out to external programs.

I don't know how secret/protected this shell script of yours is, but I'd be willing to have a go at rewriting the loop part in Julia if you showed it to me.

Anyway, I do think the startup speed problem can be solved in Julia, and they don't need a rockstar advisor to do it. I think the solutions are already pretty well known. It is more an issue of manpower: somebody has to put in the hours to do it.

And that is already being done. First time to plot will be twice as fast as it used to be in the next Julia release. Further improvements can be done, but again that requires somebody to put in the hours. Julia does not have access to Google level resources.

enriquto · on March 8, 2021

> the startup speed problem can be solved in Julia and they don't need a rockstar advisor to do it. I think the solutions are already pretty well known.

> And that is already being done. First time to plot will be twice as fast as it used to be in the next Julia release. Further improvements can be done

I'm really, really happy to hear that! I hope the Julia runtime becomes more and more streamlined in the (near) future.

See, I was just stating my use case, without any pretense that it is representative. Yet I received 20 upvotes in a few minutes, and Julia is maybe the only language where "time to first plot" is a thing. So I'm not completely alone in my (admittedly minority) concern.

You say that rewriting everything in Julia would solve my problem. I'm sure that this is the case, but this is not at all my point. Some of us do not want a shell replacement, we want a bc replacement, and Julia is a nearly perfect one, if it weren't for the outrageously slow startup time. I have zero interest in the Julia REPL; I just write Julia scripts (alongside scripts in other languages), and I'm not willing to change that.

dTal · on March 8, 2021

Julia is a compiled language, like C. Using it this way is basically like writing a shell script in C and calling GCC on every invocation. You can't expect good performance from that, and it's really a testament to Julia that it feels so dynamic that you feel as if it should handle that.

If you AOT-compiled your Julia "script" (program) to a binary and invoked that, your startup time problems would go away. Julia's "application deployment" stack is underdeveloped compared to its REPL experience, but it's still possible to do this today with PackageCompiler.jl and will only get easier with time. I think that will prove to be the "right" way to solve this problem in the long run.

(Or, if you don't care about performance, you can just turn off the compiler and interpret everything, and you basically have Python-but-in-Julia. Fast startup, slow running. Just run it with julia --compile=no)

enriquto · on March 8, 2021

> writing a shell script in C and calling GCC on every invocation. You can't expect good performance from that

Of course you can. Have you ever used the "-run" option of the TCC compiler? It's blazingly fast. With gcc it's a bit slower, but still orders of magnitude faster than Julia. You can use precompiled libraries, and linking them to your freshly compiled code is extremely fast. The fact is that compiling and linking C code is much faster than just launching the Julia environment with some packages. There's no fundamental reason for that enormous disparity in running time. I agree that it is a completely irrelevant nuisance for most people; but still, for some workflows not blessed by the Julia developers, it is the main point of friction.

EDIT: if you want to try it yourself, write the following text into a .c file, chmod +x it, and you can run it like a C script on most Unix systems:

    //usr/bin/gcc -O0 "$0" -lpng && exec ./a.out "$@"
    #include <png.h>
    int main(int c, char *v[])
    {
            // do stuff with png images
            return 0;
    }

StefanKarpinski · on March 8, 2021

This seems too obvious to even comment on, but timing the compilation of a no-op program doesn’t show much. The meaningful comparison would be compiling, with `gcc -O2`, a C program that does the same thing as some Julia code. Btw, you can also run Julia in `-O0` or, even better, `-O1` mode; Julia even uses these same flags at the command line. These low optimization modes are extremely snappy: time to first plot is no issue. Of course, if you want to run some compute-intensive code, it’s much slower, which is why `-O1` isn’t the default.

This is not to dismiss the TTFP issue, just pointing out that your argument seems to be that gcc is faster than Julia, which is definitely not the case. Indeed, gcc is about the same speed as clang, which, like Julia, uses LLVM. The way Julia uses LLVM is a bit different, but something would be very wrong if it took Julia much longer to compile code with the same functionality than it takes gcc or clang. Julia spreads the compilation out over time, but when you do something complex, a lot of compilation happens all at once. However, a static compiler would not do this work any faster; static compilers just do the work in a separate phase rather than interleaved with execution.

dTal · on March 8, 2021

Point taken, but I think you overreach a bit with:

>There's no fundamental reason for that enormous disparity in running time

I mean... Julia does full type inference, which it uses to present a dynamic type interface. It's not necessarily possible to statically compile a module ahead of time, because the module is designed to be generic and will generate different code if fed different types, which is how Julia attains its extraordinary composability. In other words, it's a much nicer language than C, and correspondingly much harder to compile. I'd call that a pretty fundamental reason.
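A toy model in C may help picture what dTal describes. This is an assumption-laden sketch, not how Julia's compiler is actually implemented: a generic `square` keeps a cache of per-type specializations, the first call with a new argument type pays a "compile" step (here only a table lookup plus a counter, standing in for type inference and LLVM code generation), and later calls with that type hit the cache. All names (`TypeTag`, `Value`, `compile_square`, `compile_count`) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical type tags standing in for concrete runtime types. */
typedef enum { T_INT, T_DOUBLE, T_NTYPES } TypeTag;

/* A boxed value: the tag says which union member is live. */
typedef struct {
    TypeTag tag;
    union { long i; double d; } v;
} Value;

typedef Value (*SquareFn)(Value);

/* The two "specializations" a real JIT would generate on demand. */
static Value square_int(Value x)    { x.v.i *= x.v.i; return x; }
static Value square_double(Value x) { x.v.d *= x.v.d; return x; }

/* Method cache: one slot per type, NULL until first use. */
static SquareFn square_cache[T_NTYPES];
static int compile_count = 0;

/* Stand-in for the expensive step (type inference + code generation).
   Here it only selects an existing C function and counts the call. */
static SquareFn compile_square(TypeTag tag) {
    compile_count++;
    return tag == T_INT ? square_int : square_double;
}

/* Generic entry point: "compile" on the first call per type, then reuse. */
Value square(Value x) {
    if (square_cache[x.tag] == NULL)
        square_cache[x.tag] = compile_square(x.tag);
    return square_cache[x.tag](x);
}
```

Calling `square` twice on a double triggers one "compilation"; a later call on an int triggers a second. An ahead-of-time compiler would have to know in advance which tags can show up, which is exactly what a generic module cannot promise.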

Perhaps C was a poor example for me to pick. C++, maybe?

p.s. your "script" example clobbers any file named "a.out" in the current directory.

enriquto · on March 8, 2021

> In other words, it's a much nicer language than C, and correspondingly much harder to compile.

Sounds like an anti-feature to me. What I want is "matlab with fast loops". I couldn't care less about "extraordinary composability", "full type inference" or "genericity". Well, I do care because these unneeded features make my stuff much slower. It's a hefty price to pay for uncalled-for features!

Yes, my script was just a silly joke, to show that compiling and linking C code is really fast.

dTal · on March 9, 2021

Do you want a dynamic language with the speed of C? Then you want type inference.

enriquto · on March 9, 2021

This is not necessary. There could be a single type (e.g., the multidimensional array of floats).

shakow · on March 9, 2021

And how do you multiple dispatch on that?

But if you just want a free MATLAB, use Octave.

enriquto · on March 9, 2021

Yep, Octave is what I currently use when I can choose. I loved the concept of Julia as an "Octave with fast loops". But it seems that there are some compromises in the Julia interpreter that go against my interests. Maybe there's still space for a modern language for numerical computation whose efficiency is not encumbered by the need to support strings, dictionaries, multiple dispatch and the like?

shakow · on March 10, 2021

> Maybe there's still space for a modern language for numerical computation whose efficiency is not encumbered by the need to support strings, dictionaries, multiple dispatch and the like?

I don't think so: I don't know of any scientific code that doesn't have to interact with its environment, if only to import/export data, and for that, strings at the very least are necessary.

Multiple dispatch is the same thing for me: either it or another form of polymorphism will very quickly be asked for by the users of a scientific language, as no one wants to write the same function dozens of times for different types, and the HPC community loves its programs to go fast, so they need (or want) to be able to choose their types.
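For contrast, here is roughly what "writing the same function for different types" looks like in C11, which has neither multiple dispatch nor templates. A macro stamps out one copy of the function per element type, and `_Generic` picks a copy based on the argument's static type. The names `DEFINE_SUM` and `sum` are hypothetical, and this selection happens at compile time only, whereas Julia dispatches on the runtime types of all arguments.

```c
#include <stddef.h>

/* Stamp out one summation function per element type T. */
#define DEFINE_SUM(T)                                   \
    static T sum_##T(const T *x, size_t n) {            \
        T s = 0;                                        \
        for (size_t i = 0; i < n; i++) s += x[i];       \
        return s;                                       \
    }

DEFINE_SUM(int)
DEFINE_SUM(float)
DEFINE_SUM(double)

/* Pick the copy from the static type of the pointer argument.
   Every new element type needs another DEFINE_SUM line and another
   association here; nothing is selected automatically. */
#define sum(x, n) _Generic((x),         \
    int *: sum_int,                     \
    float *: sum_float,                 \
    double *: sum_double)((x), (n))
```

The bookkeeping grows with every new type, which is the burden shakow expects users of a numerical language to push back on.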

dTal · on March 9, 2021

Okay, so go code in Fortran and be happy. Why are you complaining?

CJefferson · on March 8, 2021

Often the reason I have a bash script is that I'm running some AI tool to solve lots of problems in parallel, then gathering and filtering the results, then finally doing plotting and so on.

Personally, I never really like solutions of the form "you can't do 5% of stuff in X, you have to do everything in X". I like trying new things out, but I'm not willing to invest 100% in Julia up front, rather than just trying out some small bits.

Also, I think my collaborators would get annoyed if I rewrote all our bash scripts in Julia -- I can't expect them all to learn Julia.

cygx · on March 8, 2021

A tool with slow startup time that forces me to structure the workflow around its idiosyncrasies is less convenient than a tool with fast startup time that can be seamlessly integrated into pre-existing workflows (e.g., driving pipelines via Makefiles).

dunefox · on March 8, 2021

Startup time for the interpreter is 0.13 s for me: `time julia -E "1+1"`

What takes time is precompilation of packages and functions - with Julia 1.6 the precompilation is much faster now than before.

Your bash script that calls Julia 100 times is indeed not something that Julia was made for. It excels in many other areas and that's quite fine. I'm okay with plotting in a bash script in Python if it means that I can use Julia for everything else.

michaericalribo · on March 8, 2021

Your last sentence seems to ignore the reality that plotting is only one step in a larger pipeline. It sounds miserable to need to write analytics code twice, once in Julia “for everything else” and again in Python just for the plotting. I’ll just write the whole thing in Python and save myself the headache.

socialdemocrat · on March 8, 2021

So why not write the whole thing in Julia? That is the whole problem here, that the whole thing was NOT written in Julia.

Why manage two different languages? Julia is a better shell programming language than bash anyway.

michaericalribo · on March 8, 2021

I mean, the comment I was replying to answers your question:

> not something Julia was made for

dunefox · on March 8, 2021

What I meant with this is that Julia isn't intended as a bash replacement. You can write your code in Julia and circumvent the overhead of having to start up the interpreter every time. But if you try to execute it 100 times per second then of course the overhead will add up.

michaericalribo · on March 8, 2021

Ah, I understand what you mean now. And you may be right that there’s a Better Way to do it natively in Julia. But there’s lots of friction to adopting entirely new dev practices, and I’m inclined to just stick with the tried-and-true methods I’m already familiar with; old habits die hard! And that’s a big source of friction against Julia adoption (IMO).

dunefox · on March 8, 2021

There's nothing wrong with using the tools you know. But IMO it's quite interesting to use languages that might just be a big improvement over how things have been done so far. I think Julia is such a language when compared to Python (excluding the ecosystem, of course).

Also, if you come back to Julia sometime there's this:

https://github.com/JuliaPy/PyCall.jl

https://github.com/JuliaInterop/RCall.jl

physicsguy · on March 8, 2021

My main use case for that would be: I'm a generic user. I just want to run a Julia script and use its output, by typing 'julia somescript.jl'. I don't want to modify it because I don't know the language.

dunefox · on March 8, 2021

Are you fitting, evaluating, and plotting complex models 100 times per second?

I would be quite okay with logging the results into a file and only plotting them with Python if this was the cost of using Julia, yes. This might not be for everybody, but your scenario sounds strange to me to begin with.

michaericalribo · on March 8, 2021

No, but I am applying a serialized fitted model to 100 separate out-of-sample datasets and generating diagnostic plots for every output of predictions / scorings.

celrod · on March 8, 2021

While moving the loop into Julia (as others suggested) is probably the better option, an alternative you could consider is DaemonMode: https://github.com/dmolina/DaemonMode.jl

I.e., have a background Julia process so that you only have to pay the precompile cost once.

krull10 · on March 8, 2021

If you are going to use Julia in one of the absolute worst workflows for how it is designed, you shouldn’t be surprised it doesn’t work well... That said, have you tried using PackageCompiler to add your needed libraries to the system image? This seems to show a factor of 100 speed up for the time to first plot: https://julialang.github.io/PackageCompiler.jl/dev/examples/...

AuthorizedCust · on March 8, 2021

Why do you find R’s ecosystem “clunky”? The Tidyverse is unequaled for its elegance. I come from the CS world, so I’m supposed to like languages like Python, but I really, really like R, mainly for its elegance.

veddox · on March 8, 2021

The Tidyverse is great, but vanilla R is a monstrosity. After five years of heavy use, I still don't really understand the random idiosyncrasies of its various types. Arrays and lists and dataframes and tibbles are confusingly named, and operations that work on one type often balk at the others, without telling you what's wrong. It has frayed my nerves many, many times.

enriquto · on March 8, 2021

I agree that for statistics and data exploration R is certainly not clunky. My use case is more discrete PDE, where the R capabilities for sparse matrices and advanced linear algebra are a bit limited (but this may be just because I'm more used to the annoyances of numpy).

newswasboring · on March 8, 2021

If this is your big setback, did you try using a sysimage? The VSCode extension even has a build task for it. To make the sysimage, just be in the base environment, then press Ctrl+Shift+B and select the Julia build sysimage task. The terminal will tell you where it's saving the sysimage. It reduced my startup time to unnoticeable (at least to a person used to MATLAB/Python). I am not a bash guru, so I don't know how you do it on the command line, but it's a parameter to the julia interpreter.

tpoacher · on March 8, 2021

One thing to check is that you're not running a bash script that simply makes Julia "include" stuff, effectively recompiling everything each time the interpreter is run.

As long as you make sure that all the custom code you want to run is in the form of a precompiled module, I think the time required for the interpreter to launch per se shouldn't be that much of a problem.

jjoonathan · on March 8, 2021

Yes, but if the road to cached builds or running in interpreted mode is not low friction or no friction, that matters and it's the fault of the language / ecosystem.

nojito · on March 8, 2021

Calling R clunky is definitely a reach.

Parameterized reporting in R/knitr is unmatched in the industry

classified · on March 8, 2021

> I get stuck at the same point: the slow startup time…

The language maintainers steadfastly refuse to include ahead-of-time compilation. They seem super focused on their narrow use case scenario and ignore everything outside that.

socialdemocrat · on March 8, 2021

It is not nice to lie about people like that. They have never refused to do that and in fact you can already do ahead-of-time compilation in Julia. Many of us have already done it.

It is not great yet, but it is an ongoing problem, which they constantly work on and improve.

Claiming they refuse to do it is either ignorant or a flat out lie.

classified · on March 8, 2021

> It is not great yet, but it is an ongoing problem, which they constantly work on and improve.

That's just it. This effort has been dragging on for years now, slow as molasses in winter. If it were a regular goal it would have been a solved problem long ago. That hacks and workarounds for this have been deemed acceptable for so long just goes to show that it's not on the list.

dunefox · on March 8, 2021

Please show me a source where they outright refuse AOT compilation.

classified · on March 8, 2021

They don't have to outright say, "we refuse". Not doing this obvious step for years and years and years clearly shows their priorities lie elsewhere.

Sukera · on March 8, 2021

I'm sorry if I'm blunt, but hasn't the last year of compiler improvements been targeted at nothing but this exact purpose? That's half of all the time spent since 1.0! The issue tracker is filled to the brim with PRs and issues about making _everything_ faster. How does this constitute "refusal of an obvious step" for you?