No surprise there - pandas encourages ugly, inefficient code with its bloated, u...

peatmoss · on Dec 5, 2020

R’s metaprogramming facilities are head and shoulders above Python’s, which I think explains the brilliance of dplyr and dbplyr. But I feel like with R you have to peel back a bunch of layers to get to the Schemey parts. I’ve always wondered what Hadley and Co would have done with dplyr and dbplyr had they had something like Racket at their disposal.

kgwgk · on Dec 5, 2020

Unfortunately R's success killed xlisp-stat: http://homepage.divms.uiowa.edu/~luke/xls/xlsinfo/

Edit: or maybe it's not dead? I just found http://www.user2019.fr/static/pres/t246174.pdf

civilized · on Dec 5, 2020

I was offended the first time I encountered R's nonstandard evaluation, but it didn't take long to accept it. Now I wonder why anyone would want to write `mytable.column` a million times when it's obvious from context which `column` is meant, and the computer can reliably figure it out for you with some simple scoping rules. It's a superior notation that facilitates focus on the real underlying problem, and data analysts love that.
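
To make the "simple scoping rules" concrete, here's a toy sketch in plain Python. It is not how R implements nonstandard evaluation; `with_columns`, `mytable`, and `threshold` are hypothetical names made up for illustration. The idea is only that bare names in an expression are looked up as columns first and in the surrounding environment second.

```python
def with_columns(table, expr, env=None):
    """Evaluate expr once per row of a dict-of-lists table.

    Bare names resolve to columns first, then to the optional
    caller-supplied environment (hypothetical helper, illustration only).
    """
    env = env or {}
    n_rows = len(next(iter(table.values())))
    results = []
    for i in range(n_rows):
        row = {name: values[i] for name, values in table.items()}
        # Columns are merged last, so they win over same-named env entries.
        scope = {**env, **row}
        # eval is acceptable for a toy; never feed it untrusted input.
        results.append(eval(expr, {"__builtins__": {}}, scope))
    return results


mytable = {"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]}
threshold = 25.0

# No mytable["price"] * mytable["qty"] repetition: names come from context.
totals = with_columns(mytable, "price * qty")
flags = with_columns(mytable, "price * qty > threshold", {"threshold": threshold})
```

The expression stays readable because the table supplies the scope, which is the point being made about notation.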

em500 · on Dec 5, 2020

IMO they should just bite the bullet and learn proper SQL. I say this as a data scientist who learned SQL later than C, Matlab, R, Python/Pandas (though earlier than PySpark).
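
For anyone who hasn't made the jump, a minimal sketch of a typical grouped aggregation in SQL, run through Python's built-in `sqlite3` module. The `sales` table and its columns are made up for illustration, and SQLite is row-oriented rather than columnar; it's used here only because it ships with Python.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, price REAL, qty INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", 10.0, 1), ("north", 20.0, 2), ("south", 30.0, 3)],
)

# Columns are referenced by bare name; the FROM clause supplies the context.
rows = con.execute(
    """
    SELECT region, SUM(price * qty) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()
con.close()
```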

civilized · on Dec 5, 2020

I agree. SQL is nothing to be afraid of, and there's no happier place to be analyzing huge tabular datasets than in a modern columnar database.

orhmeh09 · on Dec 5, 2020

R’s data.table package is faster at these things out of the box than any single instance of a database server I’ve encountered. This is frustrating because I’m trying to explain some systemic issues we suffer by not using a relational database, but it’s really hard to make my case when data.table is one install.packages away and a version upgrade from Postgres 9 to something a little faster is gatekept by bureaucracy. I’ve been trying for months!

civilized · on Dec 5, 2020

You need a columnar database for good performance. Try DuckDB to ease them into it; it's essentially a columnar SQLite.

orhmeh09 · on Dec 6, 2020

Thanks, I’m checking it out; it seems worth keeping an eye on. It has lots of properties that would be useful in our shared computing environment, like not requiring root or Docker.

hated · on Dec 6, 2020

Might also be worth running a local instance of Postgres 13. Super easy to do on Windows without administrator rights.