My first article on “How To Become A Data Scientist” explored the four basic types of Data Scientist: Data Business People, Data Creatives, Data Developers and Data Researchers (as defined in the O’Reilly study “Analyzing the Analyzers”). It highlighted the need for a data science team with diverse and complementary skill sets. No single “superstar” can fulfil every required role, so it is up to us as recruiters to understand each organisation’s requirements and make sure there are no gaps in its capability.
In this piece, I assess the primary skills of each type of Data Scientist in more detail and look at the areas in which they might collaborate, as a first step towards a basic profile of each role. Next time, I’ll follow up with the different routes to becoming a Data Scientist.
To recap: Data Business People (DB) are leaders and entrepreneurs. Data Creatives (DC) are multi-talented artists and hackers. Data Developers (DD) are programmers and engineers. Data Researchers (DR) are scientists and statisticians.
As the following graphic shows, each Data Scientist group usually has a stronger skill set in particular areas. As recruiters, we need to identify not only the general skill set of our candidates but also where their particular strengths lie.
There are certain areas in which each type will collaborate. For example, Data Creatives might work with Data Researchers on Statistics, Data Developers might work with Data Creatives on ML/Big Data work, while Data Business People may work fairly independently on the Business side.
The skills of a Data Scientist can be broken down into 22 sub-sections, and I offer my interpretation of the key skills for each of the four Data Scientist types. This is my subjective view and of course is open to debate.
Algorithms (ex: computational complexity, CS theory) DD, DR
Back-End Programming (ex: Java/Rails/Objective-C) DC, DD
Bayesian/Monte-Carlo Statistics (ex: MCMC, BUGS) DD, DR
Big and Distributed Data (ex: Hadoop, Map/Reduce) DB, DC, DD
Business (ex: management, business development, budgeting) DB
Classical Statistics (ex: general linear model, ANOVA) DB, DC, DR
Data Manipulation (ex: regexes, R, SAS, web scraping) DC, DR
Graphical Models (ex: social networks, Bayes networks) DD, DR
Machine Learning (ex: decision trees, neural nets, SVM, clustering) DC, DD
Math (ex: linear algebra, real analysis, calculus) DD, DR
Optimization (ex: linear, integer, convex, global) DD, DR
Product Development (ex: design, project management) DB
Science (ex: experimental design, technical writing/publishing) DC, DR
Simulation (ex: discrete, agent-based, continuous) DD, DR
Spatial Statistics (ex: geographic covariates, GIS) DC, DR
Structured Data (ex: SQL, JSON, XML) DC, DD
Surveys and Marketing (ex: multinomial modeling) DC, DR
Systems Administration (ex: *nix, DBA, cloud tech.) DC, DD
Temporal Statistics (ex: forecasting, time-series analysis) DC, DR
Unstructured Data (ex: NoSQL, text mining) DC, DD
Visualisation (ex: statistical graphics, mapping, web-based data-viz) DC, DR
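For readers who want to run this kind of capability audit themselves, the skill-to-type mapping above can be encoded as data and queried. What follows is a minimal Python sketch under stated assumptions: `SKILL_TYPES` is a hand transcription of the entries as listed above (the author’s subjective view), and the helpers `skills_per_type`, `shared_skills` and `coverage_gaps` are hypothetical names invented for illustration, not part of any tool or service mentioned in this article.

```python
from collections import Counter

# Hand transcription of the skill list above (a subjective mapping, open to debate).
# DB = Data Business People, DC = Data Creatives, DD = Data Developers, DR = Data Researchers.
SKILL_TYPES = {
    "Algorithms": {"DD", "DR"},
    "Back-End Programming": {"DC", "DD"},
    "Bayesian/Monte-Carlo Statistics": {"DD", "DR"},
    "Big and Distributed Data": {"DB", "DC", "DD"},
    "Business": {"DB"},
    "Classical Statistics": {"DB", "DC", "DR"},
    "Data Manipulation": {"DC", "DR"},
    "Graphical Models": {"DD", "DR"},
    "Machine Learning": {"DC", "DD"},
    "Math": {"DD", "DR"},
    "Optimization": {"DD", "DR"},
    "Product Development": {"DB"},
    "Science": {"DC", "DR"},
    "Simulation": {"DD", "DR"},
    "Spatial Statistics": {"DC", "DR"},
    "Structured Data": {"DC", "DD"},
    "Surveys and Marketing": {"DC", "DR"},
    "Systems Administration": {"DC", "DD"},
    "Temporal Statistics": {"DC", "DR"},
    "Unstructured Data": {"DC", "DD"},
    "Visualisation": {"DC", "DR"},
}


def skills_per_type(mapping):
    """Count how many skills are marked as key for each Data Scientist type."""
    return Counter(t for types in mapping.values() for t in types)


def shared_skills(mapping, min_types):
    """Return (alphabetically) the skills marked as key for at least `min_types` types."""
    return sorted(skill for skill, types in mapping.items() if len(types) >= min_types)


def coverage_gaps(mapping, team_types):
    """Hypothetical gap check: skills not covered by any type present on the team."""
    team = set(team_types)
    return sorted(skill for skill, types in mapping.items() if not types & team)


if __name__ == "__main__":
    print(skills_per_type(SKILL_TYPES))
    print(shared_skills(SKILL_TYPES, 3))
    print(coverage_gaps(SKILL_TYPES, ["DD", "DR"]))
```

For example, `shared_skills(SKILL_TYPES, 3)` returns the two skills the list assigns to three types (Big and Distributed Data, and Classical Statistics), and `coverage_gaps(SKILL_TYPES, ["DD", "DR"])` flags Business and Product Development as uncovered for a team made up only of Data Developers and Data Researchers. Sets are used for each skill’s types so that the overlap test in the gap check is a simple set intersection.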
The success of your Big Data organisation will depend on how well your team covers these 22 distinct areas. Collaboration between work streams is the key to success, and it is vital that you recruit and retain “T-shaped” individuals: people with a solid general skill set plus one or two stand-out skills.
The next post will explore the different routes to take in becoming a Data Scientist.
We can help compile an audit of your Big Data organisation. Are you sure that you don’t have any gaps? If you do, Big Cloud can help you find the right people to fill them!
Matt Reaney is the Founder and Director at Big Cloud. Big Cloud is a talent search firm focussing on all things Big Data and helps innovative organisations across Europe, APAC and the US find the talent they need to grow.
(Image Credit: Jorge Franganillo)
Quick analysis of the 22 skills: it looks like Classical Statistics and Big & Distributed Data are the most important, since they apply to 3 of the 4 types of data scientist.
Also interesting: there are only 4 skills for the DB to know, the 2 above plus Business and Product Development. At the other end of the spectrum, the DC type needs to know 14 of the 22 skills.
In my experience, DB types earn higher salaries than DC types, so learning those 4 key skills would seem to be the better investment. Obviously that is easier said than done, but then again, useful analysis of unstructured data is no easy task either.