His creativity is amazing: founder of a number of database companies, including Ingres, Illustra, Cohera, StreamBase Systems, Vertica, VoltDB, and Paradigm4...at MIT, where he has been involved in the development of the Aurora, C-Store, H-Store, Morpheus, and SciDB systems [1]
And his students:
-Daniel Abadi (co-founder and Chief Scientist of Hadapt)
-Michael J. Carey (faculty at UC Irvine, formerly at U. Wisconsin Madison, NAE Member and ACM Fellow)
-Robert Epstein (founder and former VP of Engineering of Sybase)
-Diane Greene (co-founder and former CEO of VMWare)
-Paula Hawthorn (founder of Britton-Lee, formerly VP of Engineering of Informix)
-Marti Hearst (Professor at UC Berkeley)
-Gerald Held (former VP of Engineering of Oracle)
-Joseph M. Hellerstein (faculty at UC Berkeley)
-Anant Jhingran (VP and CTO for IBM's Information Management Division)
-Curt Kolovson (Sr. Staff Research Scientist at VMware)
-Clifford A. Lynch (executive director of the Coalition for Networked Information)
-Mike Olson (former CEO of Sleepycat Software and founding CEO of Cloudera)
-Margo Seltzer (Professor of Computer Science at Harvard, founder and former CTO of Sleepycat Software)
-Dale Skeen (founder of Tibco, founder and CEO of Vitria)
He also has some amazing work on non-traditional databases. For example, he worked on H-Store (http://hstore.cs.brown.edu/), which is now VoltDB, an in-memory distributed database.
I don't know about the episodes you're referring to, but it appears she wrote the log-structured filesystem for BSD (which builds on UFS, created by McKusick: a great filesystem for its day, but one that ultimately needed improvements). It also looks like she founded Sleepycat and knew a lot about Berkeley DB, which hammered the filesystem.
"""She is the author of several widely-used software packages including database and transaction libraries and the 4.4BSD log-structured file system. Dr. Seltzer was a founder and CTO of Sleepycat Software, the makers of Berkeley DB and is now an Architect for Oracle Corporation."""
So, frankly, I think she probably is qualified to tell McVoy (not one of my heroes) and McKusick (one of my heroes) how to up their game.
I can grudgingly accept that people downvote me for saying negative things about someone without linking to evidence (that really does exist, I swear!).
But Berkeley DB not being robust back in the day shouldn't be controversial at all. It's a simple fact.
I downvoted you not because your statement was negative but because it was irrelevant. You jumped in to trash somebody tangentially mentioned apparently because you don't like their politics. When somebody gave you a substantive response, you ignored the meat and came back with personal trivia. Comments like these waste everybody's time.
I didn't downvote you myself, but I will say I have ... um... "some" experience with Berkeley DB, and it's pretty well designed for reliability. Obviously that doesn't mean nobody has ever experienced corruption, but it's been used heavily in production environments as the storage layer for many systems.
I didn't think you did and I apologize for not phrasing it better.
Berkeley DB is an old piece of software that has changed a lot over time, as have the environments it has to run in. I think the big change in robustness happened from 3.x to 4.x.
Can you provide some evidence of this episode? Right now I'm reading her PhD thesis at Berkeley on filesystems and several of her first-author papers with McKusick. Her work cites McVoy's and says it's OK.
Maybe later. I took a quick google around before I posted but they are not as easy to google as they were the last time I looked at it, some years back.
I think I originally found them via McVoy's page back in 2000, and then followed links and googled a bit. There may also have been some discussion available through Deja News.
McVoy had written some interesting stuff even if his BitKeeper thingy looked horrible and very TacKy :)
I became fascinated with what Stonebraker was saying back when I got into the biotech industry. We were dealing with huge amounts of data, and the more I read from him the more it made sense to me, particularly when he talks about the lack of ACID in NoSQL systems being a bad thing. I have a couple of projects involving VoltDB and SciDB on the back burner, and I plan on using VoltDB in future projects wherever possible and applicable. So far I am pretty convinced that both are much more useful than people understand.
If you haven't read up on VoltDB, SciDB, or Stonebraker himself, I highly suggest you do, as it might make you think twice about some of your current setups. Here are a few quotes for the fun of it:
"I think the biggest NoSQL proponent of non-ACID has been historically a guy named Jeff Dean at Google, who’s responsible for, essentially, most to all of their database offerings. And he recently … wrote a system called Spanner,” Stonebraker explained. “Spanner is a pure ACID system. So Google is moving to ACID and I think the NoSQL market will move away from eventual consistency and toward ACID.”
“My prediction is that NoSQL will come to mean not yet SQL,”
"You saw that they went for Cassandra for inbox search and HBase for messaging. The reason they're not doing that on MySQL is that sharding MySQL is a lot of effort and you have to apply that effort to each new project."
That should be enough to get your curiosity piqued.
A lot of that seems anecdotal at best. Now, I'm not one to argue with someone who has as much experience as Stonebraker, but it seems like he's looking at a few specific use cases and formulating a broad opinion on NoSQL from them.
There are plenty of use cases where ACID compliance truly isn't needed.
Also, just because Google has one new database that features ACID compliance does not mean that "Google is moving to ACID"; it simply means that Google has identified a need for a portion of their data to be stored in an ACID-compliant way.
I don't disagree, but I think what he is trying to convey is that there are a lot of places where ACID is needed but isn't being put into place. He's not arguing against non-ACID; he's saying people are using non-ACID systems where they shouldn't. It's a small but important distinction.
Very well deserved. I've looked over dozens of papers for relational db's and every single one of them cites down to his foundational work. Congratulations Professor Stonebraker!
Seconded. He was on episode 199 of Software Engineering Radio two years ago (http://www.se-radio.net/2013/12/episode-199-michael-stonebra...), which I thoroughly enjoyed. It was so informative that I took a page of notes while listening! Really great stuff.
Agreed. I remember being very inspired by some of his papers when I started studying databases in grad school. His work along with all of those who developed the B-Tree and its variants[1] is really foundational to data storage and retrieval. He also helped to kick off the "NoSQL" and "NewSQL" movements with his C-Store paper.[2]
[1] Proposed by Bayer and McCreight, and independently developed by Chiat and Schwartz, and also by Cole, Radcliffe and Kaufman, improved by many including D. Knuth.
[2] C-Store: A Column Oriented DBMS. Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. VLDB, pages 553-564, 2005.
This is what adjuncting is supposed to be for - folks with industry experience moonlight as professors. The benefit is not primarily salary but networking, knowledge sharing, and helping the next generation of industry professionals.
Now it's more of a way to have 2/3 or more of the department work for unliveable wages which allows for an ever-growing administrative overhead in colleges while tuitions double every decade.
As far as I know, that's not true at MIT: teaching is done by tenured or tenure-track professors, with exceptions like this, or SF author Joe Haldeman, that prove the rule.
But, yeah, I hear from lots of sources that adjuncts making peanuts have become the rule rather than the exception in general US academia, and there's no disputing how administrators are taking over higher education, now even desiring to wrest what little the faculty still control from them.
Management in all organizations should be automated by AI, with copious amounts of override buttons sprinkled throughout.
Education administration is like a thorn stuck in my mind: they make _more_ than everyone else and for the most part only act as a gas to support their own structure.
This is how every bureaucracy works, whether government, commercial or academic. Once the institution has enough income/cash flow for momentum, it attracts people who are expert at operating the machine itself, rather than expert in what the machine is supposed to be accomplishing. After the first one lands, they continue to accrete.
It's hard to recognize when it starts, but you'll know it's happened once you see a lot of people who are not connected with the apparent goal of the machine, and there are posters all over the place touting whatever programs the administrators have created to justify their existence, as well as packaged training programs from motivational/educational consultants (think Franklin Covey).
Ha, a fork bomb of Agent Smith crossed with Nancy in Program Outreach.
The accretion or calcification model of bureaucratic formation is compelling, something like how a coral reef grows. The randomized surface provides eddies and pockets of protection for other life to flourish, RFPs and SBIRs can nestle in a protected arena with low local competition.
I just realized that large, messy codebases also follow the reef model of bureaucracy. Hadoop is like that coral reef, providing nooks and crannies for optimizations and integrations to take hold. I used to imagine Hadoop as a whale fall [0], but it is more of a Mandelbulb. Had Hadoop not provided such a rich environment, the secondary ecosystem wouldn't be as vibrant. Fail to Win?
I find management structures fascinating. Whenever I interact with one I probe it to see how much autonomy each individual in it has, and what rules they can bend or not follow. Once the agents participating in the bureaucracy cannot bend the rules, I think it will tend towards dystopia. Maybe 1984 isn't a warning against fascism, but against the natural tendency of all bureaucracies to only support themselves.
note: I might sound like the stereotype of a hackernews-bitcoin-libertarian, but I assure you my politics are much more nuanced than that. I don't think that bureaucracy as a structure is bad, but it needs to be managed with something akin to the voting logic in a triple redundant control circuit [1] [2]. Most bureaucracies exist within a positive feedback loop, which rewards them for growth instead of efficiency. It is like getting paid by LOC instead of 1/LOC or 1/runtime.
Yup. In the case of people like Stonebraker -- or Butler Lampson, who's also officially adjunct at MIT -- think of it as: you get an office, a community, resources with which to do research (and a framework within which to ask for grants/funding if you want), opportunities to advise students, and the ability to teach classes when you want. In exchange, you don't get much money -- but also almost zero responsibilities unless you choose to assume them.
(source: I'm a CS professor and I got my Ph.D. at MIT while both Stonebraker and Lampson were there.)
Adjunct definitely means something different at MIT. For instance, departments are only allowed to hire adjuncts up to 5% of the total normal faculty members of their department[1]. MIT EECS only has six [2]. In effect, this means that adjunct positions are only for really, really qualified people -- they're not there to fill out the teaching staff but to augment the experience of the rest of the faculty.
Adjunct professors can also supervise research, which I believe is uncommon at other institutions.
To learn more about Professor Stonebraker, I can't recommend this excellent interview[0] by se-radio enough. It's easily one of the best interviews that I've heard on a podcast. Do check it out.
Stonebraker was tech advisor starting in 2001 for Addamark/Sensage, which developed a column-oriented/columnar DB for log aggregation/analysis for security/operations. Stonebraker's own C-Store and Vertica came later and were more fully featured. While Sensage's product was integrated into some HP offerings, HP unfortunately (my perspective) chose to buy rival Arcsight and then Vertica.
Mike was my thesis advisor at Cal, and had enormous influence on all sorts of things beyond databases, including (I believe) the founding of the CS department and the negotiation of how Ingres technology was spun off from Cal (which owns the IP), which became the prototype for how others would create companies like Inktomi and many more.
Keep in mind this book is really just a large collection of core papers. If you are looking for something more structured as a how-to of developing DBs this is useful as a reference but not the best introduction.
For an overview of basic concepts in DB implementation, I quite like Database Systems: The Complete Book (By Widom I think...). You could do a lot worse than An Introduction to Database Systems (CJ Date), although many dislike his opinionated style :-).
If you're talking about actually implementing a full transactional database system, strong foundational books are:
Transaction Processing: Concepts and Techniques (Gray and Reuter)
Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery (Vossen and Weikum)
Neither is exactly easy reading, but the concepts therein are really important.
+1 on all of those books. If you are interested in multi-dimensional indices (R-Trees, M-Trees, and many more exotic ones), you should check out "Foundations of Multidimensional and Metric Data Structures" by Hanan Samet. Very comprehensive! "Database Systems: The Complete Book" is fantastic (pick up the previous version, it is cheaper!) but it only touches on multidimensional indexing.
He also contributed 2 modules to the recent "Tackling the Challenges of Big Data" online course from MITx. Among other things, he did a very lucid roundup of legacy vs modern db systems.
Though most articles mention his open source contributions, the Wikipedia page says "PostgreSQL evolved from the Ingres project at the University of California, Berkeley. In 1982 the leader of the Ingres team, Michael Stonebraker, left Berkeley to make a proprietary version of Ingres"
I think I watched a talk by Michael a few years back about the basics of columnar databases but now I can't find it. I recognized the names of the companies he founded from the article. At least I think it was him that gave the talk! Does anyone know what I'm talking about?
He is very, very good. I took his undergrad database class a long time ago. His class was one of my favorites. What he taught I can still use today. I've just done a merge-join thing recently based on what I remembered from his class.
The desire to put in that sort of work to the exclusion of building things and companies, it would appear in his case. He was a full professor at Berkeley. He's also getting old, past the normal age for an MIT professor to become Emeritus.
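For anyone who hasn't run into a merge join before, here is a rough sketch of the idea: sort both inputs on the join key, then walk them in lockstep, pairing up rows whose keys match. This is only an illustration of the textbook sort-merge equi-join, not the code mentioned above or anything from the class. The names `merge_join`, `left_key`, and `right_key` are made up for the example, and it assumes both inputs fit in memory.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

L = TypeVar("L")
R = TypeVar("R")


def merge_join(
    left: Sequence[L],
    right: Sequence[R],
    left_key: Callable[[L], int],
    right_key: Callable[[R], int],
) -> List[Tuple[L, R]]:
    """Sort-merge equi-join of two in-memory sequences (illustrative sketch)."""
    # Step 1: sort both sides on the join key (a real system might already
    # have sorted input, e.g. from an index, and skip this step).
    left_sorted = sorted(left, key=left_key)
    right_sorted = sorted(right, key=right_key)

    out: List[Tuple[L, R]] = []
    i = j = 0
    # Step 2: advance whichever side has the smaller key until keys match.
    while i < len(left_sorted) and j < len(right_sorted):
        lk = left_key(left_sorted[i])
        rk = right_key(right_sorted[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of right rows sharing this key.
            j_end = j
            while j_end < len(right_sorted) and right_key(right_sorted[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against the whole right run,
            # which handles duplicate keys on both sides.
            while i < len(left_sorted) and left_key(left_sorted[i]) == lk:
                for r in right_sorted[j:j_end]:
                    out.append((left_sorted[i], r))
                i += 1
            j = j_end
    return out


orders = [(2, "widget"), (1, "gadget"), (2, "gizmo"), (4, "doohickey")]
customers = [(4, "Dee"), (2, "Bob"), (3, "Cat")]
joined = merge_join(orders, customers, lambda o: o[0], lambda c: c[0])
```

After the sort, the merge phase touches each input once (plus the cross product of matching runs), which is why this approach tends to be attractive when the inputs are already sorted or too large to hash comfortably.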
[1] http://en.wikipedia.org/wiki/Michael_Stonebraker