From the earlier cited paper, this seems to target relatively simple proofs too (i.e., hardly what "expert mathematicians can prove"): "... a broad coverage of basic mathematical theorems on calculus and the formal proof of the Kepler conjecture."
As another comparison point, my Volt does about 3.5 miles per kWh at 60-65 mph on the highway and close to double that at 20-30 mph in cities. And it’s not that light.
Datomic doesn't have cryptographically guaranteed immutability -- its logical data model is immutable, so you can query past versions of the data and so on, but nothing stops someone from altering the history. In a ledger like this, history cannot be modified (assuming reasonable computational limits). Datomic also has a much richer data model and query language.
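For illustration, here's a minimal sketch of the kind of hash chaining such ledgers rely on (my own toy example in Python, not Datomic's or any particular ledger's actual format): each entry's hash covers the previous entry's hash, so editing history invalidates everything after the edit.

```python
import hashlib
import json

def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash an entry together with the previous entry's hash."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode()).hexdigest()

def append(ledger: list, payload: dict) -> None:
    """Append an entry whose hash chains to the current tail."""
    prev = ledger[-1]["hash"] if ledger else "0" * 64
    ledger.append({"payload": payload, "hash": entry_hash(prev, payload)})

def verify(ledger: list) -> bool:
    """Recompute the chain; an edited historical entry breaks every later hash."""
    prev = "0" * 64
    for e in ledger:
        if e["hash"] != entry_hash(prev, e["payload"]):
            return False
        prev = e["hash"]
    return True

ledger = []
append(ledger, {"op": "credit", "amount": 10})
append(ledger, {"op": "debit", "amount": 4})
assert verify(ledger)

ledger[0]["payload"]["amount"] = 999  # tamper with history
assert not verify(ledger)
```

Forging a consistent altered history would require recomputing every subsequent hash, which is exactly the "reasonable computational limits" assumption above (and distributed ledgers add replication on top of this).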
That paper proposes four reasons for the failure of OODBs, but scalability is not among them.
Interestingly, the issues are largely C++-specific (and I know lots of people using an ORM today but none via C++), and largely driven by historical accident and market forces ("It is interesting to conjecture about the marketplace chances of O2 if they had started initially in the USA with sophisticated US venture capital backing").
I still don't think these sound like good reasons to dismiss the architecture.
I agree that a pure ODBMS doesn't make much sense today, but OrientDB is a multi-model database where the Object Model is one of the supported models. You can mix objects, graphs, schema-less documents and much more, using SQL as the query language. Boom!
I don't know about Neptune -- curious to hear what it is based on -- but TitanDB never really supported cross-machine traversals for the execution engine. The data was stored in a distributed fashion (across say a Cassandra cluster), but any instance of the execution engine was single-machine, with no easy way to talk between multiple instances of the execution engine.
I don't think this is that straightforward. One of the comments on the article said it nicely: graduate students are still being trained and are not very effective researchers, and they are getting a significant education in doing research even if they are not taking classes (whether it is "valuable" or not today is a different discussion).
Already the cost of a postdoc is comparable to that of a student including tuition, and probably about twice that of a student excluding tuition, while their research output can easily be more than twice a student's (on average). There could be adjustments to the model, but charging tuition to students is not that outrageous when you consider what they are learning. Tuition is also typically charged at the standard credit rate for classes (pretty low at public universities).
There is a lot of somewhat-researchy work on this topic, the primary one being AutoAdmin, a Microsoft Research effort that started about 20 years ago. They looked at automatically creating indexes, views, histograms, and the like. Peloton@CMU is a more recent incarnation of the same idea with newer tradeoffs.
Although it might sound easy, this turns out to be a very challenging problem for many technical reasons.
A primary reason it hasn't gotten traction in practice is that database administrators don't like an automated tool messing with their setup and potentially nullifying all the tricks they've played to improve performance. This was especially true in big DB2/Oracle deployments, but it is increasingly less true, which has opened the area up for innovation in the last few years.
Aside from tooling, those systems often perform much better than PostgreSQL on large queries or transactions because they feature much better optimizations. Even setting aside newer techniques like columnar storage, several of those systems do code generation from queries to avoid function calls, branches, etc., which can have huge performance implications. I worked on the internals of PostgreSQL once, and the number of function calls in the innermost loops was very high.
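To make the codegen point concrete, here's a toy sketch (my own example, not PostgreSQL's actual executor; real engines emit C or LLVM IR, not Python): an interpreter pays a function call per operator node per row, while "compiling" the predicate once collapses it into a single specialized function with no per-node dispatch in the inner loop.

```python
# Interpreted form: recursive walk of the expression tree, per row.
def interp_eval(node, row):
    op, *args = node
    if op == "col":
        return row[args[0]]
    if op == "const":
        return args[0]
    if op == ">":
        return interp_eval(args[0], row) > interp_eval(args[1], row)
    if op == "and":
        return interp_eval(args[0], row) and interp_eval(args[1], row)
    raise ValueError(op)

# "Compiled" form: translate the tree to source text once, then run the
# generated function for every row with zero tree-walking overhead.
def compile_pred(node):
    def gen(n):
        op, *args = n
        if op == "col":
            return f"row[{args[0]!r}]"
        if op == "const":
            return repr(args[0])
        if op == ">":
            return f"({gen(args[0])} > {gen(args[1])})"
        if op == "and":
            return f"({gen(args[0])} and {gen(args[1])})"
        raise ValueError(op)
    return eval(f"lambda row: {gen(node)}")

# qty > 5 AND price > 1.0
pred = ("and", (">", ("col", "qty"), ("const", 5)),
               (">", ("col", "price"), ("const", 1.0)))
rows = [{"qty": 10, "price": 2.0}, {"qty": 3, "price": 9.0}]

compiled = compile_pred(pred)
assert [interp_eval(pred, r) for r in rows] == [compiled(r) for r in rows] == [True, False]
```

Multiply the per-node overhead by billions of rows and the difference between the two forms is exactly where a large chunk of the performance gap comes from.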
PostgreSQL also used to be (is?) single-threaded, which limits the performance of a single query on multi-core machines -- I haven't looked to see whether there has been any fundamental change in the architecture in the last 4-5 years.
Yes, I was just reading through that. The server is still single-threaded though -- they are getting the parallelism by starting multiple processes to do independent chunks of work. This makes sense for PostgreSQL, but has some fundamental limitations (e.g., it requires duplicated copies of a hash table to parallelize a hash join).
>The server is still single-threaded though -- they are getting the parallelism by starting multiple processes to do independent chunks of work.
So...it isn't single threaded then? I mean that is exactly how the most advanced competitors operate (Oracle, SQL Server) as well -- a given connection stays on one thread, with the advantages that confers, unless the planner decides to parallelize.
To be technical, MSSQL uses its own bespoke scheduling and will preempt a thread for I/O. All I/O is non-blocking, so the physical thread can vary for this reason. PGSQL really does use synchronous I/O and a single thread, though. The former is probably more scalable, but the latter has been serving PGSQL fine, too.
In the specific case of hash joins, it does build them independently right now. There's a patch to rectify that, though, by putting the hash table into shared memory. Unfortunately, the coordination necessary to make multi-phase batch joins and other such funny bits work made it infeasible to get into PostgreSQL 10.
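A toy sketch of the duplication issue (my own illustration, not PostgreSQL code; the "workers" run sequentially here, whereas real ones are separate processes): with no shared memory, every worker builds its own copy of the build-side hash table before probing its slice of the probe side, while a shared table is built only once.

```python
def build_table(build_rows):
    """Hash the build side: key -> list of matching values."""
    table = {}
    for key, val in build_rows:
        table.setdefault(key, []).append(val)
    return table

def probe(table, probe_rows):
    """Emit (key, probe_val, build_val) for every match."""
    return [(k, v, m) for k, v in probe_rows for m in table.get(k, [])]

build_rows = [(1, "a"), (2, "b"), (2, "c")]
probe_rows = [(2, "x"), (1, "y"), (3, "z")]

# Per-worker copies: the build side is hashed once per worker.
n_workers = 4
chunks = [probe_rows[i::n_workers] for i in range(n_workers)]
tables_built = 0
results = []
for chunk in chunks:
    table = build_table(build_rows)   # duplicated work and memory
    tables_built += 1
    results += probe(table, chunk)
assert tables_built == n_workers

# Shared table: built once, probed by every worker.
shared = build_table(build_rows)
shared_results = []
for chunk in chunks:
    shared_results += probe(shared, chunk)

assert sorted(results) == sorted(shared_results)
```

Both plans return the same rows; the per-worker version just spends N times the memory and build CPU, which is why moving the table into shared memory is worth the extra coordination.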
The biggest issue with SQL Server is that it is myopic. The tooling and everything around it is geared toward SQL Server alone, and so is the database itself, making it a huge pain to get your data out to use with something else like Elasticsearch. It's geared toward being comfortable enough to lock you in and hold your data hostage.