From the earlier cited paper, this seems to target relatively simple proofs too (i.e., hardly what "expert mathematicians can prove"): "... a broad coverage of basic mathematical theorems on calculus and the formal proof of the Kepler conjecture."
As another comparison point, my Volt does about 3.5 miles per kWh at 60-65 mph on the highway and close to double that at 20-30 mph in cities. And it’s not that light.
Datomic doesn't have cryptographically guaranteed immutability -- its logical data model is immutable, so you can query past versions of the data and so on, but nothing stops someone from altering the history. In a ledger like this, history cannot be modified (assuming reasonable computational limits). Datomic also has a much richer data model and query language.
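For illustration, here's a minimal sketch of the kind of hash chaining such ledgers rely on (my own toy example in Python, not Datomic's or any particular ledger's actual format): each entry's hash covers the previous entry's hash, so editing history invalidates everything after the edit.

```python
import hashlib
import json

def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash an entry together with the previous entry's hash."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode()).hexdigest()

def append(ledger: list, payload: dict) -> None:
    """Append an entry whose hash chains to the current tail."""
    prev = ledger[-1]["hash"] if ledger else "0" * 64
    ledger.append({"payload": payload, "hash": entry_hash(prev, payload)})

def verify(ledger: list) -> bool:
    """Recompute the chain; an edited historical entry breaks every later hash."""
    prev = "0" * 64
    for e in ledger:
        if e["hash"] != entry_hash(prev, e["payload"]):
            return False
        prev = e["hash"]
    return True

ledger = []
append(ledger, {"op": "credit", "amount": 10})
append(ledger, {"op": "debit", "amount": 4})
assert verify(ledger)

ledger[0]["payload"]["amount"] = 999  # tamper with history
assert not verify(ledger)
```

Forging a consistent altered history would require recomputing every subsequent hash, which is exactly the "reasonable computational limits" assumption above (and distributed ledgers add replication on top of this).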
That paper proposes four reasons for the failure of OODBs, but scalability is not among them.
Interestingly, the issues are largely C++-specific (and I know lots of people using an ORM today but none via C++), and largely driven by historical accident and market forces ("It is interesting to conjecture about the marketplace chances of O2 if they had started initially in the USA with sophisticated US venture capital backing").
I still don't think these sound like good reasons to dismiss the architecture.
I agree that a pure ODBMS doesn't make much sense today, but OrientDB is a multi-model database where the Object Model is one of the supported models. You can mix objects, graphs, schema-less documents and much more, using SQL as the query language. Boom!
I don't know about Neptune -- curious to hear what it is based on -- but TitanDB never really supported cross-machine traversals for the execution engine. The data was stored in a distributed fashion (across say a Cassandra cluster), but any instance of the execution engine was single-machine, with no easy way to talk between multiple instances of the execution engine.
I don't think this is that straightforward. One of the comments on the article said it nicely: graduate students are still being trained and are not very effective researchers, and they are getting a significant education in doing research even if they are not taking classes (whether it is "valuable" or not today is a different discussion).
Already the cost of a postdoc is comparable to that of a student including tuition, and probably about twice that of a student excluding tuition, while their research output can easily be more than twice a student's (on average). There could be adjustments to the model, but charging tuition to students is not that outrageous when you consider what they are learning. Tuition is also typically charged at the standard credit rate for classes (pretty low at public universities).
There is a lot of somewhat-researchy work on this topic, the primary one being AutoAdmin, a Microsoft Research effort that started about 20 years ago. They looked at automatically creating indexes, views, histograms, and the like. Peloton@CMU is a more recent incarnation of the same idea with newer tradeoffs.
Although it might sound easy, this turns out to be a very challenging problem for many technical reasons.
A primary reason it hasn't gotten traction in practice is that database administrators don't like an automated tool messing with their setup and potentially nullifying all the tricks they've played to improve performance. This was especially true in big DB2/Oracle deployments, but it is increasingly less true, which has opened the area up for innovation in the last few years.
Aside from tooling, those systems often perform much better than PostgreSQL on large queries or transactions because they feature much better optimizations. Even setting aside newer techniques like columnar storage, several of those systems do code generation from queries to avoid function calls, branches, etc., which can have huge performance implications. I worked on the internals of PostgreSQL once, and the number of function calls in the innermost loops was very high.
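To make the codegen point concrete, here's a toy sketch (my own example, not PostgreSQL's actual executor; real engines emit C or LLVM IR, not Python): an interpreter pays a function call per operator node per row, while "compiling" the predicate once collapses it into a single specialized function with no per-node dispatch in the inner loop.

```python
# Interpreted form: recursive walk of the expression tree, per row.
def interp_eval(node, row):
    op, *args = node
    if op == "col":
        return row[args[0]]
    if op == "const":
        return args[0]
    if op == ">":
        return interp_eval(args[0], row) > interp_eval(args[1], row)
    if op == "and":
        return interp_eval(args[0], row) and interp_eval(args[1], row)
    raise ValueError(op)

# "Compiled" form: translate the tree to source text once, then run the
# generated function for every row with zero tree-walking overhead.
def compile_pred(node):
    def gen(n):
        op, *args = n
        if op == "col":
            return f"row[{args[0]!r}]"
        if op == "const":
            return repr(args[0])
        if op == ">":
            return f"({gen(args[0])} > {gen(args[1])})"
        if op == "and":
            return f"({gen(args[0])} and {gen(args[1])})"
        raise ValueError(op)
    return eval(f"lambda row: {gen(node)}")

# qty > 5 AND price > 1.0
pred = ("and", (">", ("col", "qty"), ("const", 5)),
               (">", ("col", "price"), ("const", 1.0)))
rows = [{"qty": 10, "price": 2.0}, {"qty": 3, "price": 9.0}]

compiled = compile_pred(pred)
assert [interp_eval(pred, r) for r in rows] == [compiled(r) for r in rows] == [True, False]
```

Multiply the per-node overhead by billions of rows and the difference between the two forms is exactly where a large chunk of the performance gap comes from.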
PostgreSQL also used to be (is?) single-threaded, which limits the performance of a single query on multi-core machines -- I haven't looked to see whether there has been any fundamental change in the architecture in the last 4-5 years.
Yes, I was just reading through that. The server is still single-threaded though -- they are getting the parallelism by starting multiple processes to do independent chunks of work. This makes sense for PostgreSQL, but has some fundamental limitations (e.g., it requires duplicated copies of a hash table to parallelize a hash join).
>The server is still single-threaded though -- they are getting the parallelism by starting multiple processes to do independent chunks of work.
So...it isn't single threaded then? I mean that is exactly how the most advanced competitors operate (Oracle, SQL Server) as well -- a given connection stays on one thread, with the advantages that confers, unless the planner decides to parallelize.
To be technical, MSSQL uses its own bespoke scheduling and will preempt a thread for I/O. All I/O is non-blocking, so the physical thread can vary for this reason. PGSQL really does use synchronous I/O and a single thread, though. The former is probably more scalable, but the latter has been serving PGSQL fine, too.
In the specific case of hash joins, it does build them independently right now. There's a patch to rectify that, though, by putting the hash table into shared memory. Unfortunately, the coordination necessary to make multi-phase batch joins and other such funny bits work made it infeasible to get into PostgreSQL 10.
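A toy sketch of the duplication issue (my own illustration, not PostgreSQL code; the "workers" run sequentially here, whereas real ones are separate processes): with no shared memory, every worker builds its own copy of the build-side hash table before probing its slice of the probe side, while a shared table is built only once.

```python
def build_table(build_rows):
    """Hash the build side: key -> list of matching values."""
    table = {}
    for key, val in build_rows:
        table.setdefault(key, []).append(val)
    return table

def probe(table, probe_rows):
    """Emit (key, probe_val, build_val) for every match."""
    return [(k, v, m) for k, v in probe_rows for m in table.get(k, [])]

build_rows = [(1, "a"), (2, "b"), (2, "c")]
probe_rows = [(2, "x"), (1, "y"), (3, "z")]

# Per-worker copies: the build side is hashed once per worker.
n_workers = 4
chunks = [probe_rows[i::n_workers] for i in range(n_workers)]
tables_built = 0
results = []
for chunk in chunks:
    table = build_table(build_rows)   # duplicated work and memory
    tables_built += 1
    results += probe(table, chunk)
assert tables_built == n_workers

# Shared table: built once, probed by every worker.
shared = build_table(build_rows)
shared_results = []
for chunk in chunks:
    shared_results += probe(shared, chunk)

assert sorted(results) == sorted(shared_results)
```

Both plans return the same rows; the per-worker version just spends N times the memory and build CPU, which is why moving the table into shared memory is worth the extra coordination.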
The biggest issue with SQL Server is that it is myopic. The tooling and everything around it is geared toward SQL Server alone, and so is the database itself, making it a huge pain to get your data out to use with something else like Elasticsearch. It's geared toward being comfortable enough to lock you in and hold your data hostage.