Here is what a recent whitepaper by Dataiku reveals about artificial intelligence and machine learning, with an emphasis on the role of data scientists. Let’s find out.
The year 2018 was supposed to be the one where companies made revolutionary strides in the area of artificial intelligence (AI). But has this happened so far?
It turns out that AI is much easier to talk about than to execute, which is no great surprise. There are still open questions that need to be answered or resolved before most companies can get to a stage where they’ve truly incorporated AI into their business in a real, monumental way.
Despite the hype, not many have achieved this balance: the mutual and optimal interaction between humans and machines. A recent whitepaper by Dataiku, a collaborative data science platform that enables the whole data team to explore, prototype, build and deliver their own data products more efficiently, lays out the questions facing data teams (and data team members) as well as data leaders (including Chief Data Officers, or CDOs, the new C-suite kids on the block). Businesses that want to get ahead will – or at least should – work on tackling these for the rest of 2018. Beyond these trends, the study also gives a peek into the hot topics going into 2019. Here is a look at the trends first:
What Exactly Is a Data Project or Data Science Project?
Are these two terms interchangeable? Not quite. Data projects are simply those whose goal is to build more advanced insights: for example, a marketing attribution model that informs marketing strategy, or a predictive maintenance project in the transportation industry that tries to predict the demand for replacement parts in various locations. These projects can typically be tackled with a relatively simple statistical approach combined with basic business know-how. Data projects become data science projects when additional data from potentially non-traditional sources (usage data, click data, sensor data, social data, etc.) is added to the system and combined in order to support a more advanced machine-learning approach.
In data science projects, there is a natural collaboration between data scientists and data analysts where the former stay focused on potential new data sources and new predictive models. Data scientists’ models are then “packaged” by data analysts in order to be included in the analysis.
What Kind of Data Scientist Should I Be Hiring?
Data scientists come in many flavors with different strengths that may suit different types of enterprises depending on the types of problems or projects they are working on. Not to say that one type is better or worse than another type of data scientist – it all depends on what a business is looking for. But spoiler alert: the fancy ones with the PhDs tend to be unavailable (not to mention expensive), because 80 percent of them are taken by Google. On the other hand, maybe that’s not really what your business needs.
The white paper breaks data scientists down into seven categories: The Legends, The Generalists, The Statisticians, The Dabblers (or The Software Engineers), The ML Engineers, The Vertical Experts, and The Star DS Managers. It is beneficial to understand and admit that there are all kinds of data scientists, even if at the end of the day they all have the same title. Perhaps in the future there will actually be different names for data scientists with different specialities. Here’s hoping that by this time next year it will be an emerging trend.
Why Are So Many Data Scientists Leaving Their Jobs?
Glassdoor names data scientist as the best job in the United States for 2018, and LinkedIn puts it among the top 10. Yet at the same time, FT has published articles explaining that data scientists also top the list of developers looking for new jobs.
So what’s the deal? Data scientists are in demand, which necessarily means they are difficult to keep around – after all, they can easily find a position elsewhere. But it’s also a question of happiness: if data scientists were satisfied with the work they do, they might be less inclined to leave, even if they could find a job elsewhere. And since the position is relatively new, many companies don’t really know what to do to retain people in these important, cutting-edge roles.
Clearly the industry is populated by winners and losers: companies that know how to best use their employees (including their data scientists) and those that don’t. But it’s not just up to the company. There’s work to be done on the side of data scientists as well: they need to market themselves so that the work they do is visible and indispensable to others. That is how they continue to grow and take on more exciting projects, for ever-increasing job satisfaction.
There are numerous reasons why a data scientist would be tempted to leave his or her job. As a manager or business owner, put a stop to the brain drain by ensuring the data team is not siloed and that its members can communicate easily with others who perhaps have more business context or interesting projects to work on. Try marketing the data team: showing other teams throughout the company what the data team is there for, and what it can do to help improve processes or products, will surely drive more projects its way. Focus on hiring data scientists based on other skills besides simply technical ability (per above, technical abilities are important, but not everything), namely communication skills. A data scientist has no hope of enlightening the average business user with an Excel file. To let the data tell a story, a data scientist needs a veritable Swiss army knife of presentation skills to convey results persuasively, to anyone. TIP: When hiring, ask that a data scientist create a presentation to show his or her results.
What Are the Compelling Reasons for Collaboration in Data Science?
Ever since Forbes wrote about collaboration in the world of data science in 2017, it’s been all the rage. Proponents of a collaborative model for data teams present various arguments to justify its efficiency and effectiveness, but there are also a lot of misconceptions about what collaboration actually means in this context. For example, here are a few common notions about collaboration in data science:
- Collaboration allows work to be split among several data scientists.
- Collaboration allows work to be split between more junior and more senior resources on a data team.
- Collaboration allows work to be split between different roles on a data team (such as analysts and data scientists).
Does this mean data analysts work directly on data science projects? Well – yes. In fact, this is by far the most compelling approach to collaboration in data science to date, because there are parts of data science projects better suited to the skills of analysts: think data preparation or cleansing. Once they have handled their part of the project, someone who is more apt to handle the predictive or machine learning component can step in.
This means getting more mileage, so to speak, out of both analysts and data scientists, because they’re only working on the parts of the project that they do best. And because they’re not responsible for the entire project, they can work in their area of expertise on multiple projects at once.
Of course, in some organizations, the number of data science projects to deliver becomes overwhelming. In such situations, the number of data scientists in the organization can become a limiting factor (i.e., it becomes impossible to allocate even one data scientist per project).
It is at this point that, to continue reaping the benefits, the “collaboration interface” between data scientists and data analysts needs to become more formal.
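One way to picture a more formal "collaboration interface" is as an explicit hand-off contract: analysts deliver prepared data in an agreed shape, and data scientists write models against that shape only. The Python sketch below is an illustration under that assumption, not something taken from the whitepaper; the column names, `PreparedDataset`, `prepare`, `score` and `toy_model` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical contract: the columns (and types) analysts agree to deliver.
REQUIRED_COLUMNS: Dict[str, type] = {
    "customer_id": str,
    "visits": int,
    "spend": float,
}


@dataclass(frozen=True)
class PreparedDataset:
    """Hand-off object: produced by analysts, consumed by data scientists."""
    rows: List[Row]


def prepare(raw_rows: List[Row]) -> PreparedDataset:
    """Analyst step: keep only rows that satisfy the agreed contract."""
    clean = [
        row
        for row in raw_rows
        if all(isinstance(row.get(col), typ) for col, typ in REQUIRED_COLUMNS.items())
    ]
    return PreparedDataset(rows=clean)


def score(dataset: PreparedDataset, model: Callable[[Row], float]) -> Dict[str, float]:
    """Data scientist step: apply any model to rows that meet the contract."""
    return {str(row["customer_id"]): model(row) for row in dataset.rows}


def toy_model(row: Row) -> float:
    # Stand-in for a real predictive model: average spend per visit.
    return row["spend"] / max(row["visits"], 1)


raw = [
    {"customer_id": "a", "visits": 4, "spend": 20.0},
    {"customer_id": "b", "visits": None, "spend": 5.0},  # incomplete, dropped
    {"customer_id": "c", "visits": 0, "spend": 3.0},
]
scores = score(prepare(raw), toy_model)
```

Because the model only ever sees a `PreparedDataset`, one data scientist could in principle supply models to several projects whose preparation work is owned by analysts.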
Will My Company Ever Get To a Place Where We Can Deploy (+ Manage) Many Predictive Models?
Well, nothing is impossible; but it’s not going to happen by magic. If this was one of your 2018 goals and you’re nowhere close, step back and re-evaluate:
- Look at organizations that have done it already, especially those in the same industry or with the same types of data science projects. Here’s a start (videos from industry leaders talking about how they got started and executed on data science at scale).
- Ask yourself: what does it mean for your business to deploy predictive models (for example, does it mean rolling out an internal dashboard to another team within the company, or is it exposing a recommendation engine to customers)? Depending on the answer here, determine what tools will allow this to happen more easily and smoothly. For example, if the goal is to deploy a recommendation engine on the website, is the data team empowered to do this? Do they have a clear path forward to get there?
- Determine the situations in which you need to manage multiple models as well as the strategies that would allow for the avoidance of multiple model management in favour of a simpler way.
Don’t miss our second piece on the trends to look out for in 2019. You can find the entire white paper here. (This is the first part of an article series based on a whitepaper by Dataiku.)