Hacker News
The World Needs Data Scientists (businessintelligence.com)
35 points by T-A on Oct 19, 2013 | 48 comments


I doubt it.

The rise and fall of data scientists (if they ever rise) will be swift.

Thing is, it's easy to collect a huge pile of data and then retrospectively look at the data set and see patterns in it (very often, patterns that don't actually exist but that some overfitted model suggests exist).

It's much much harder to get anything tangibly useful from that data.

Sometimes there is no tangibly useful data. It's all just random.*

'What have you got to show the last 2 weeks of effort?' ... 'Well, we definitely proved that there's absolutely no correlation between our manager's new initiative and online sales; that small spike we saw was within seasonal variation'. How long do you think you'll have a job?

What people need is better tools to auto-classify data into data sets and perform real time analysis of their data (like Splunk, which is amazing), so everyone can look at summary information in real time, at any time, and you can respond dynamically to changes.

* - I know, I know, no data of that sort is totally random. ...but it's often so noisy that no statistically valid information is recoverable.
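To make the point about patterns in noise concrete, here is a minimal sketch that is not from the original comment. It assumes a made-up setup: one random "target" series and 200 random "features" of 20 samples each, all drawn independently, so any correlation it reports exists only by chance. The function names and parameters are hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_correlation(n_features=200, n_samples=20, seed=0):
    """Return the largest |r| between a random target and many random features.

    Everything here is independent Gaussian noise (an assumption of this
    sketch), so the "best" feature has no real relationship to the target.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    r = best_spurious_correlation()
    print(f"strongest correlation found in pure noise: {r:.2f}")
```

Searching enough candidate features against a short series will usually turn up one that looks related, which is roughly how a retrospective search can "find" a pattern that is not there.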


I work in risk analytics and think this comment is dead on. Tools such as Splunk can wrap metadata facilitating analysis around arbitrary input data (e.g. admin logs, vulnerability scan results, etc). I expect that such tools will reduce the need for some of the work currently done by data scientists.

With that said, there is still a need for human driven analysis of risk data in order to compose models that prioritize and collate incoming signals in ways to assist in risk management decisions. These models are highly domain specific (e.g. a model for evaluating IT infrastructure assets' risk across a global banking business) and are nontrivial to design.

I suspect that data scientists will be useful in bridging the gap between automated risk analytics collection tools and corporate risk management by assisting in coming up with risk models based on the volumes of incoming data.

Edit: or maybe "quants", not "data scientists" will fill this role. Regardless of the naming convention, it's an interesting and complex space to watch.


Nah, data science is AI-complete, like natural language understanding. Think of current state of data science as just the next iteration / rebranding of statistics, powered by newest developments in CS, machine learning and software industry. There is no shortage of statistical data exploration and modeling applications, and we are pretty far from full software automation in this space.


AI-complete, awesome terminology you coined there.


what does it mean?


Solving it would mean solving AI.


If you are at a large company with a data-driven marketing department (hint: the good ones are), the statement

"Well, we definitely proved that there's absolutely no correlation between our manager's new initiative and online sales; that small spike we saw was within seasonal variation"

will be much appreciated and valued. The data scientist doesn't need to report to the head of Marketing, and won't feel any pressure to make someone else's manager look good.
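As a rough illustration of what "within seasonal variation" could mean, here is a minimal sketch that compares one observation against the same calendar week in earlier years using a z-score. The sales figures, the function name, and the threshold of 2 standard deviations are all hypothetical, and a real analysis would likely model trend and seasonality explicitly rather than rely on this crude check.

```python
import statistics


def within_seasonal_variation(history, observed, z_threshold=2.0):
    """Return (is_within, z): whether `observed` is within z_threshold
    sample standard deviations of the mean of `history`, and its z-score."""
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)  # sample standard deviation
    z = (observed - mean) / spread
    return abs(z) <= z_threshold, z


# Hypothetical weekly sales for the same calendar week in earlier years.
same_week_prior_years = [102.0, 95.0, 110.0, 98.0, 105.0]
# Hypothetical sales for that week after the new initiative launched.
spike_after_initiative = 108.0

ok, z = within_seasonal_variation(same_week_prior_years, spike_after_initiative)
print(f"within seasonal variation: {ok} (z = {z:.2f})")
```

A result inside the threshold does not prove the initiative had no effect; it only says the spike is no larger than what earlier years produced without it.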


I generally agree that people need better tools. However, it is also very difficult to give people the right sort of visualization and analysis tools. If you pick a very, very specific niche, then it's doable. But people have tried for years and years to build more generalized tools, and they still mostly suck. This doesn't mean it is hopeless, just that it is much harder than many people realize.


Good comment, and much of it also applies to vague optimism that some people put into Machine Learning methods.


Are businesses paying handsomely for data scientists?

If not, then they don't need data scientists.

Will they? Maybe, maybe not.

These ten-year industry forecast things are categorically ridiculous, whether they come from the Bureau of Labor Statistics or some private forecaster.


In my experience, people will throw money at "data scientists", if they can find them. It really is hard to find someone who is experienced in Physics (or Math, but they always seem to be physicists), has computational experience, but mostly who has the right frame of mind. A lot of people I've met who would be qualified for the role think it's beneath them: they don't want to use existing libraries, they want to progress the state of the art. And once you've done a PhD, I think it's hard for us to offer them a consistent level of challenging work.

Despite the 'science' part, I think there's a significant engineering component to the job: working within resource constraints and reusing existing tools. I've heard a lot of good explanations of how to approximate an integral, but the same people aren't really keen on working on a production recommendation system.


> And once you've done a PhD, I think it's hard for us to offer them a consistent level of challenging work.

As someone who's almost gotten their PhD in (mathematical models of) medical imaging, I think you'd be surprised at what kind of work would be consistently challenging enough to interest me.

I was at a job forum a couple weeks ago, and whenever I would bring up my incoming doctorate, company reps would be like "Oh, well our stuff must be pretty boring to you, then." I really wanted to say, "If I'm talking to you, it means what you're doing is interesting to me!"

I'm sure some PhDs are driven to improve the state of the art, but there's also a perception that a PhD won't want to go back to working on problems that are already solved in principle -- and that perception scares me, because I'm going to need a job soon....


> I was at a job forum a couple weeks ago, and whenever I would bring up my incoming doctorate, company reps would be like "Oh, well our stuff must be pretty boring to you, then." I really wanted to say, "If I'm talking to you, it means what you're doing is interesting to me!"

Yes! There is an absolute myth in industry that if someone who has a PhD shows interest, then you're best looking elsewhere, because they're likely to not find the work challenging enough and jump ship easily as a result.

I'm on the verge of completing my PhD thesis, so I can easily relate to what you're saying. I've pretty much perfected the art of starting with "I'm getting my PhD, but don't assume that I'm not interested in more 'mundane' things". Pretty much, I think there's a challenge to be had in any tech job, and I think the main things to convey to a possible future employer are that, beyond having a great tech skill set, you know how to manage a project (HUGE skill I've gained from managing my PhD), you know how to persevere in the face of adversity (any PhD candidate that hasn't faced this hasn't done their PhD the "right" way in my opinion), and you're creative right down to your core.


As a recruiter, I’d say that you really need to tell the people who might interview you what you’re thinking. A PhD shows that you’ve spent some years getting your skills certified and have thus mastered some of them. But it doesn’t mean that a person with a PhD is better than one without a PhD but with 5-10 years of experience working in some specific area. Some good programmers and scientists (like data scientists) never get a PhD, and their skills are no worse than those of their certified peers. So my advice would be: lay it on the line regarding what you’re looking for in terms of challenges and rewards, and show what you can do. Papers don’t matter much nowadays, but personalities do.


A PhD might not demonstrate that you're any more capable of a specific job, but by the same reckoning plenty of employers pigeonhole PhDs as "academic types" with no sense of how things work in the real world. With respect to that, it really is about breaking unfounded stereotypes and debunking the myth that if you've gone through formal training and have the "Dr." title attached to your name, you're not really able to be a team player and achieve what's necessary in the business world. The bottom line is that whether you have a PhD or not, as you say, you have to show what you're capable of. The fact of the matter is that completing a PhD brings with it a lot more "real world" skills than simply becoming a domain expert: project management, creativity, passion, and perseverance, to name just a few. My experience, shared by others with a research background, is that the tech industry often needs "educating" that PhDs can bring meaningful value to the table, even for problems that at first might not seem to be of interest.


You should have actually said that to them. It would have most likely helped you get the job.


why would you be looking primarily at physicists if you are trying to find data scientists??


It isn't necessary, but many physicists have to analyze huge amounts of data out of necessity. Telescopes and particle accelerators generate petabytes of data. So they have a lot of practice with this sort of thing. Physicists also have to be ridiculously good at math and particularly statistics, again out of necessity.

Again, physics isn't really a prerequisite. It is just that many physicists have the relevant experience already to be a good data scientist.


Physics is pretty close to data science. In the end, physics is often about applying the scientific method to something. Traditionally, that something is "the world", but it's not too different if it's just another set of data.


Actually yes, and they have been for a decade or more. They're called "quants" tho'. Someone who calls him or herself a "data scientist" is demonstrating a stunning lack of industry awareness - enough to rule them out for any job.


http://analytics.ncsu.edu/?page_id=248

it looks pretty good. i wonder how hard it is to get into these programs.


Gearing myself towards a data science career, it strikes me as a particularly odd way of presenting growth in the field. A lot of these articles seem to suggest that the field has been recently "discovered", when it's more a case of the technology supporting more advanced analysis. "Data science" existed before all of this hype, and I think it's a misnomer to suggest that that wasn't the case. There's definitely a greater spotlight on data science now, but I don't think that's any real reflection of how business intelligence and data analytics were done before the mega-hype kicked in.


Speaking as someone for whom statistical software puts food on my table, one of my favorite statements about data science is (roughly speaking) "If you don't have data, are you really a scientist?"

I would completely agree with your point about technology supporting more advanced analysis: Current information processing capabilities allow us to ask new questions from higher volumes of data than was possible in the past.


"data scientist" sounds much more interesting and succinct than "scientist who transform and analyzes data which can only be transformed and analyzed by computational means".


Unfortunately, because of all the hype, all it is going to get is douchebags instead.


Having graduated in May of 2001, and seeing my CS classes go from full of people to a ghost town from March-May, I wouldn't be surprised.


Count me in!


Does the 15000% YOY increase in "data scientist" job postings reflect an actual greater demand for data scientists, or a trendy retitling of vacancies that were previously given titles like "market analyst", "data product manager", "operational research analyst" or even "actuary" and "economist"?


Great! More analytics people (and that is what they are being used for) looking up our butts so as to spy on inner behaviors, put the "correct" ads in front of our faces, figure out who should be hired and fired, and tell artists what sort of art they have to produce.


Speaking as an artist, if you know someone with provable expertise in analysing what sort of art I should produce, I would very much like to hear from them.

Making art and making stuff you think people will like are not mutually exclusive.

(I am very serious about this, by the way - if you have expertise in analytics for storytelling, or even just interesting ideas for same, please shoot me an email.)


Well, if that's what your craft is all about, Slashdot did an in-depth analysis on the subject here:

http://slashdot.org/topic/bi/can-big-data-prevent-hollywoods...

Here is a white paper on how to roll out your movie based upon "data scientist" analysis.

http://www.google.com/think/research-studies/quantifying-mov...

I have no idea why you are seduced by this. Whistler likened painting portraits (which he was paid very well for) to prostitution, and on his own time & dime gave us his nocturne paintings, ultimately inspiring Monet in the process.

http://www.stargonaut.com/trialog.html

Analytics tells you where you have been, not where you are going. Novelty changes everything and analytics can't predict innovation. That is why Hollywood and the music industry suck.


Thanks for the links - interesting stuff!

Why I'm interested: because much like money, analytics are a terrible master but a wonderful servant.

If you're ruled by them, I agree, that's limiting - but if what they're doing is providing useful information about your audience, that's invaluable. This is particularly true if you're a recording-based artist (I am), as it partially allows you to restore the feedback that you'd get from a live audience.

As many people have said, before you can break the rules you should probably understand them. Likewise, before you go against audience expectations it's helpful to know what they expect.


This intrigues me. I'm fascinated by storytelling and the psychological impact of it, and data science/analytics is what I'm doing for a startup. I had never really considered the intersection of the two, I'm curious where your interest lies there. Analyzing successful stories to find out why they're successful? Finding ways in which stories can be improved?

I think step zero for any kind of "story analysis" would be extracting structured data from the raw text--pulling out things like emotions, foreshadowing, revelations, and so on. This is pretty interesting because I don't know that I've ever heard of this type of thing--for the most part text analysis focuses on things like sentiment, topic extraction and so on. There may be some fertile ground here for new types of analyses based on how stories work.
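A toy sketch of that "step zero" might look like the following: split the raw text into sentences and tag each with emotion counts, giving a crude emotional arc in story order. The lexicon, the function name, and the sample story are all hypothetical, and this is much closer to ordinary lexicon-based tagging than to detecting foreshadowing or revelations, which would need far more than word matching.

```python
import re

# Hypothetical, hand-made lexicon; a real one would be far larger.
EMOTION_WORDS = {
    "fear": {"afraid", "dark", "trembled"},
    "joy": {"laughed", "smiled", "home"},
}


def emotion_arc(text):
    """Return one {emotion: count} dict per sentence, in story order.

    Each count is the number of distinct lexicon words in that sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    arc = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        arc.append(
            {emotion: len(words & vocab) for emotion, vocab in EMOTION_WORDS.items()}
        )
    return arc


story = "The forest was dark and she trembled. At last she saw home and smiled."
for position, scores in enumerate(emotion_arc(story)):
    print(position, scores)
```

The output is a list of per-sentence records, which is the kind of structured data that later analyses (for example, comparing arcs across many stories) could work from.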

Curious to hear your thoughts... feel free to drop me a line as well to chat more, jason [at] applieddatalabs [dot] com


I will drop you a line. My antipathy towards the subject is that I have analytics people telling me what to design and some of them think they are the "center of infinity."

In regards to topic or plot extraction I would start with "The Seven Basic Plots - Why We Tell Stories" by Christopher Booker:

The Guardian review: http://www.theguardian.com/books/2004/nov/21/fiction.feature...

and then break down which plots in both literature and film are "fashionable" or why they work. We are off topic now; I will take this offline.


what about videogame designer or "solar panel consultant" ?


Odd use of a pie chart for a web site devoted to data science, since pie chart portions usually add up to 100%. It would have made sense if they had shown the categories as highest degree earned, instead of degree(s) earned.
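The arithmetic behind this complaint can be sketched with made-up numbers. The respondents, degree labels, and function names below are hypothetical and not taken from the article's chart. When each person can report several degrees, the per-degree shares can exceed 100% in total; counting only the highest degree gives shares that sum to 100% and so suit a pie chart.

```python
from collections import Counter

# Hypothetical respondents: each lists every degree earned.
respondents = [
    ["BS"],
    ["BS", "MS"],
    ["BS", "MS", "PhD"],
    ["BS", "MS"],
]

# Assumed ordering of degrees, used to pick each person's highest one.
DEGREE_RANK = {"BS": 1, "MS": 2, "PhD": 3}


def share_by_degrees_earned(people):
    """Percent of people holding each degree; shares can sum to more than 100."""
    counts = Counter(d for degrees in people for d in degrees)
    return {d: 100 * n / len(people) for d, n in counts.items()}


def share_by_highest_degree(people):
    """Percent of people whose highest degree is each level; shares sum to 100."""
    counts = Counter(max(degrees, key=DEGREE_RANK.__getitem__) for degrees in people)
    return {d: 100 * n / len(people) for d, n in counts.items()}


earned = share_by_degrees_earned(respondents)
highest = share_by_highest_degree(respondents)
print("degrees earned:", earned, "total:", sum(earned.values()))
print("highest degree:", highest, "total:", sum(highest.values()))
```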


Reminds me of my favorite Tumblr lately, wtfviz (http://wtfviz.net/).


Oh god, I'm glad I didn't have coffee in my mouth with some of those. However, there are a few on there that I actually think are pretty good, if unconventional (like the senate voting visualization, which does a great job of showing polarization).


Any advice for an aspiring data scientist?

I am currently finishing my bachelor's in applied math and computer science. I started to focus on machine learning this semester and am hoping to do my capstone on it.


It's work in progress but this book seems interesting so far: http://www.manning.com/zumel/

(Disclosure: written by a friend.)


I skimmed through the intro and just bought the book. Thanks for the rec!

I love the idea of being able to interact with the authors as they write the book.


Minor in economics. Skip the extra macro classes and instead take as many econometrics and micro classes as you can. Economics (or another quantitative social science) is often the missing ingredient for a solid data science background. Sure, physicists have good quantitative skills, but you also need to understand what the important/relevant [business] questions to ask are.


Head over to kaggle.com and participate in a few competitions. It will look good on your CV, and some companies have begun to require that candidates have a "kaggle-score".


You mean Statisticians.


The world sure doesn't need more nerds to tell it how data scientists have been around since the scientific method and it's all a bubble anyway.


We need more data scientists to make more infographics!


And the world also needs good, accessible education and permanent, easy access to knowledge.


Yeah for the spy agencies ....



