I never had any issues with Hadoop. It took me about two days to familiarize myself with it, hack together an ad hoc script to do the staging, and set up the local functions that process the data.
I really would like to understand what you consider "hard" about Hadoop or managing a cluster. It's a pretty straightforward idea, the architecture is dead simple, and it requires no specialized hardware at any level. Anyone who is familiar with the Linux CLI and running a dynamic website should be able to grok it easily, imho.
Then again, I come from the /. crowd, so YC isn't really my kind of people, generally.
Is this serious? Have you ported a program to Hadoop? Unless you use Pig or one of those helper layers, it is quite hard for non-trivial problems. And those helper layers usually come with some overhead cost in non-trivial cases, too.
It was a pretty easy problem: parsing logs for performance statistics. But moving the data is the easy part, and that's why I was skeptical of the OP's statement.
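For context, the kind of job I'm describing is tiny. Here's a minimal sketch of a Hadoop Streaming mapper for pulling performance stats out of access logs; the log format, field positions, and names are my own assumptions, not anything from the actual job.

```python
#!/usr/bin/env python3
# Hypothetical Hadoop Streaming mapper: emit (endpoint, response_time_ms)
# pairs from access-log lines. The assumed log format is:
#   2011-03-01T12:00:00 GET /api/users 200 37ms
import sys

def map_lines(lines):
    """Yield tab-separated (endpoint, millis) pairs from log lines."""
    for line in lines:
        parts = line.split()
        if len(parts) < 5 or not parts[4].endswith("ms"):
            continue  # skip malformed lines rather than failing the task
        endpoint, millis = parts[2], parts[4][:-2]
        yield f"{endpoint}\t{millis}"

if __name__ == "__main__":
    for pair in map_lines(sys.stdin):
        print(pair)
```

A reducer on the other side would then aggregate the times per endpoint (mean, p99, etc.); the framework handles the shuffle and grouping by key.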
I'm starting to wonder if this is really "Hacker News" or if it's "we want free advice and comments from engineers on our startups, so let's start a forum with technical articles."
Big Data should be at the petabyte-plus level. Even with 10G Ethernet it takes a lot of bandwidth and time to move things around (and it's very hard to keep a 10G link full at a constant rate from storage). This is hard even for telcos. Note that terabyte-scale data today fits on an SSD.
Not really, "Big Data" has nothing to do with how many bytes you're pushing around.
Some types of data analytics are CPU-heavy and require distributed resources. Your comment about 10G isn't true: you can move around a TB every ten minutes or so, and SSDs or a medium-sized SAN could easily keep up with that bandwidth.
If your workload isn't latency-sensitive and runs in batches, building a Hadoop cluster is a great solution to a lot of problems.
Of course big data is about the number of bytes. That's what something like MapReduce helps with: it depends on breaking your input into smaller chunks, and the number of chunks is certainly related to the number of bytes.
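The chunking point can be seen in a toy word count, sketched here in plain Python rather than Hadoop itself. The framework splits the input into fixed-size pieces, runs a map task per split, and merges the partial results; more input bytes means more splits, which means more map tasks.

```python
# Toy map-reduce word count (illustration only, not Hadoop's API).
from collections import Counter

def map_chunk(lines):
    """Map task: count words within one input split."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge per-split counts into the final total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

lines = ["to be or not", "to be"]
split_size = 1  # lines per split here; Hadoop sizes splits in bytes
chunks = [lines[i:i + split_size] for i in range(0, len(lines), split_size)]
result = reduce_counts(map_chunk(chunk) for chunk in chunks)
print(result)  # Counter({'to': 2, 'be': 2, 'or': 1, 'not': 1})
```

The merge works because word counts are associative: combining partial sums in any order gives the same total, which is exactly the property that lets the map tasks run independently on separate machines.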