Spanner stores its data in Colossus, so moving Colossus's metadata from Bigtable to Spanner would mean resolving some bootstrapping issues. (Bigtable has the same bootstrap issues but has already solved them, and there are additional difficulties due to some details that I'm probably not at liberty to share.)
Spanner is used for metadata for many other very very large storage systems, though.
Yeah, that was my understanding as well. Colossus stores metadata in Bigtable on top of a smaller Colossus, which stores its metadata in Bigtable on top of an even smaller Colossus, which… [insert more stacks of turtles here] ends up in Chubby.
Nice, I didn't realize, guess it needs to go into the "reevaluate" list then. Sometimes Apache products tend to blend together in my mind and I get their capabilities confused/conflated.
It's true that the project was initially developed at Cloudera, and Cloudera employees continue to be the main driving force behind development. That said, we have committers and contributors from other companies as well. Roughly half the people who contributed a patch in the last 3 months have been non-Cloudera. Additionally, we are very strict about doing all development upstream (e.g. with the first open source release we spent a lot of effort to open the entire development history going back to 2012, including JIRA, git, etc).
As for users, here are a couple examples off the top of my head who aren't currently paying for any support:
Hopefully this isn't too "pitch"-y, but: if you're looking for a database that's good at time series, will always be open source, and does support scale-out and HA, you might be interested in Apache Kudu (incubating).
FYI on the date issue, lest anyone think we filed the patent trying to steal the work done by others, the patent application says:
"This application claims to the benefit of U.S. Provisional Patent Application No. 61/911,720, entitled “HYBRIDTIME and HYBRIDCLOCKS FOR CLOCK UNCERTAINTY REDUCTION IN A DISTRIBUTED COMPUTING ENVIRONMENT”, which was filed on Dec. 4, 2013, which is incorporated by reference herein in its entirety."
(which predates the creation of the cockroachdb repo and the hybrid logical clock paper).
I know I'm dipping my toe in some history here, but is there a sense of how the patent situation is going to shake out? I think this general family of algorithm is very important.
Agreed -- personally I'm against offensive use of patents like this as well, and it's my understanding that Cloudera doesn't intend to use this patent offensively. If it did, I would be upset and would consider leaving the company - I know many other employees feel the same way. The reason I agree to help write patent applications as an engineer is that I've seen the distraction and damages caused by patent trolls (or even other companies) and the importance of having a defensive portfolio.
Disclaimer: Obviously I'm not speaking for the company or making any promises here :)
Typically QSBR algorithms don't require blocking the world, or even blocking any single thread. They just require each thread to periodically check in and run a bounded amount of code which amounts to "hey, I'm not currently looking at the map".
Some other background collector thread (which is going to actually delete removed objects) just has to wait until it sees every mutator thread cross a safepoint, at which point it knows that none of those threads could be hanging onto references that have been unlinked from the data structure.
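To make that concrete, here's a rough toy sketch of the check-in/collector idea in Python. Everything in it (the `QSBR` class, `register`, `quiescent`, `retire`, `collect`) is a hypothetical name I made up for illustration, not any real library's API. It also takes a lock on every call purely to keep the example short; a real implementation would typically use per-thread counters and atomics so that the check-in never blocks.

```python
import threading


class QSBR:
    """Toy quiescent-state-based reclamation. All names are hypothetical."""

    def __init__(self):
        # A lock keeps the sketch simple; real QSBR avoids locking on check-in.
        self._lock = threading.Lock()
        self._global_epoch = 0
        self._seen = {}     # thread id -> global epoch at its last check-in
        self._retired = []  # (epoch at unlink time, unlinked object)

    def register(self, tid):
        # A new mutator thread starts out as having seen the current epoch.
        with self._lock:
            self._seen[tid] = self._global_epoch

    def quiescent(self, tid):
        # Mutator check-in: "hey, I'm not currently looking at the map".
        with self._lock:
            self._seen[tid] = self._global_epoch

    def retire(self, obj):
        # Call only after obj has been unlinked from the data structure.
        # Stamp it with the current epoch, then bump the epoch so that any
        # later check-in is distinguishable from an earlier one.
        with self._lock:
            self._retired.append((self._global_epoch, obj))
            self._global_epoch += 1

    def collect(self):
        # Collector: an object is safe to delete once every registered thread
        # has checked in at an epoch later than the one it was unlinked in.
        with self._lock:
            safe = min(self._seen.values(), default=self._global_epoch)
            freed = [obj for epoch, obj in self._retired if epoch < safe]
            self._retired = [(e, o) for e, o in self._retired if e >= safe]
            return freed
```

With two registered threads, an object passed to `retire` stays in the pending list until both threads have called `quiescent` after the unlink; only then does `collect` hand it back for deletion. Nobody ever waits on anybody else except the collector, which just tries again later.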
I'd recommend reading some surveys of RCU and SMR algorithms if this stuff is interesting to you.