
If you want to do something less buzzwordy with lots of real-life applications, look into distributed systems. Try running an Apache big data project yourself and write some programs/queries for it, then try making a change to the project to do something cool. I suggest an Apache big data project only because it gives you a good place to learn, not so you can become a "Hadoop specialist" or anything like that.

There is way more real-world usage of the distributed systems concepts and skills you'd learn there (especially in large tech companies) than of any flavor of the month. While ML is also commonly used in the industry, the signal-to-noise ratio is really bad, because a lot of its uses are superfluous buzzword-driven development. Many, many companies, however, rely on distributed systems to be able to operate at scale.



Absolutely. I often joke that my work as a data scientist is mostly creating bar graphs for people. The actual analysis is often reasonably simple; it's the aggregating of the data that is hard (it's messy, it's not all in one spot, and there is lots of it).

So start with querying your big data to say what the top three event types are. Then slowly crank up the analysis complexity, but not too much. The data engineering has lots of scope for real solid and obvious applications.
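
For what it's worth, here is a minimal sketch of that first "top three event types" query in plain Python. The `top_event_types` helper, the `type` field, and the sample events are all made up for illustration; on a real cluster you would express the same group-and-count in whatever engine you are running.

```python
from collections import Counter


def top_event_types(events, n=3):
    """Return the n most frequent event types as (type, count) pairs."""
    # Count how often each (hypothetical) "type" field value occurs.
    counts = Counter(event["type"] for event in events)
    # most_common sorts by count, highest first.
    return counts.most_common(n)


# Made-up sample data: 4 clicks, 3 views, 2 purchases, 1 signup.
events = (
    [{"type": "click"}] * 4
    + [{"type": "view"}] * 3
    + [{"type": "purchase"}] * 2
    + [{"type": "signup"}]
)

print(top_event_types(events))
# [('click', 4), ('view', 3), ('purchase', 2)]
```

The counting itself is trivial, which is sort of the point: the hard part is getting the messy, scattered data into one place so that a query this simple can run over all of it.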


Oh, so don't do current buzzwords but past ones like Big Data are okay.

And if you wanna learn about distributed systems, there's nothing better than Bitcoin or any cryptocurrency based on a P2P protocol.


Big data tools are just one example of distributed systems. I suggested looking into them because there are a lot of open source ones you can play with, not because I think big data isn't a buzzword (though Spark is definitely used a lot in industry).

Crypto is of course a distributed system too (at least, many are) but in practice it's a bit different than anything you'd see in industry because it's trustless.


I agree that they are novel and interesting to learn, but practically speaking, the person's point is that they are overhyped. And honestly, since most use cases popping up either aren't decentralized or are decentralized but regulated by a centralized party, like a government, they seem like the most inefficient way to run a distributed system.



