What was your first job in the industry and what was the learning curve?
One of the most important lessons I learned when transitioning out of academia related to the deliverables of your work. As an academic, your primary output is scholarship and your audience is other scholars. In data science, you’re instead producing data products. To build those products, you need a strong appreciation of engineering. That makes the difference between someone who can come up with an idea and someone who can execute it.
I was also very glad to find plenty of intelligent, driven people all over the industry. In the academy, there is a tendency to presume that scholars are the true protectors of knowledge and wisdom because you can trace the academic tradition back so far in history, but in reality, there’s an enormous amount of expertise in industry as well. The learning curve for me was to assimilate industry knowledge and ways of thinking.
What did you fail to anticipate or appreciate when making the transition into industry?
The academic mindset is one of thoroughness, in which you try to account for as many potential permutations as possible. From a practical perspective, if you’re working to produce something real and tangible, then prioritization is a much more pressing concern than completeness. It’s important not to invest too much time in projects that aren’t as effective as others – the 80/20 rule is a good guide.
What was your most memorable mistake as someone tasked with execution? What did you learn through the process?
Coming from a scholarly background, I didn’t fully understand the attitudes of engineers, people trained in computer science. Out of my own ignorance, I initially failed to conform to the standard cultural practices of collaborative coding, including code review. Because I didn’t invest the time to learn the culture of engineers before trying to jump in and make really big changes, I ended up making things rockier than they needed to be.
For example, there was a very large data science-driven component of Nomi’s core product, the imputation algorithm. The design grew out of my specialized training as an empirical scientist and statistician. However, even though the code I wrote was well-organized and well-executed, because it drew on some pretty complex mathematics, it was essentially incomprehensible to anyone without a deep background in statistics. I ran into a lot of trouble and was accused of purposefully obfuscating the code. In reality, I was just proposing what I thought to be the best solution to the problem.
I would caution any academic transitioning to industry to be sensitive to the cultural norms of the business unit you’re joining. In this case, I was operating as a software engineer on the technology team, where there were different expectations and value systems. Had I joined a more segregated, standalone data business unit, it would have been a completely different story. In summary, have empathy for cross-departmental values, even if they conflict with your intuitions. I needed more sensitivity to that cultural context because it differed so starkly from my prior experience in academia.
What has been the most important lesson you’ve learned in transitioning to more of a management role?
Data has huge promise to help others make better decisions. People want to feel empowered by this tool, [not threatened]. If you can find ways to empower them instead of creating tools that appear to be a black box, your work will be better received. Practice a version of active empathy and apply it to how you design your tools, as well as to how you build your team.
There is a bit of a transition between a role [as an individual contributor] producing products and a role as a manager producing relationships. Once I appreciated the difference between producing code and architecture and producing relationships, everything started to fall into place.
Who has been your most important mentor?
It’s my academic advisor David Huron, the head of the Cognitive and Systematic Musicology Laboratory at Ohio State University. He taught me the power of developing a narrative iteratively, in which you produce small pieces of persuasion that communicate single ideas strongly. You release them sequentially and then tie them up into a larger argument at the end.
He also introduced me to the command line and basic computational skills that really opened a lot of doors. He offered an example of how to be a consummate self-skeptic as well. If you can constantly try to disprove yourself every step of the way, you’re more likely to weed out mistaken beliefs.
What’s the most misunderstood thing about Big Data?
Everyone knows that Big Data has been an influential idea, but there’s little consensus on what the shift has actually been. Many commentators describe the transition to Big Data as just ‘doing the same thing’ but with more data. That approach defines Big Data as the same old process of summarization or data reduction, in which the goal is to extract one or two insights from a larger data set. This is scientific management from the 19th century and classical statistics from the 20th. It’s Fordism: optimizing things within the domain of your business by using data summarization techniques.
However, I would argue that Big Data represents a different paradigm: rather than finding a single best solution to a problem within an organization, we’re finding as many solutions as there are customers. Big Data’s output space is orders of magnitude larger. That enormous increase in model outputs poses a very different engineering and inference problem than those previously presented by smaller data.
What other companies are innovating in Big Data?
The companies that impress me most embody an ethos of curiosity, innovation, and, more importantly, sharing. There’s no question that Amazon AWS is doing Big Data correctly. The same can be said of Google, whose internal services and technological breakthroughs have generated immense profits, and which has also invested in opening up its own cloud platform so that others can take advantage of those breakthroughs as well.
I think Datadog is an absolutely phenomenal company that has invested enormous effort in making data points collected from a myriad of servers instantly visible. Spotify, LinkedIn, and Etsy have also done a really wonderful job of giving back to the open source community and increasing access to Big Data tooling.
What advice would you impart to a younger version of yourself?
Learn to think like an entrepreneur and speak the language of economics. I only took one business class in college, and I should have taken more. I underestimated, and to some extent still do, the power of the language of economics as a communication tool. My language is that of scientific and humanities scholarship, and it doesn’t cross the same boundaries. Since it’s not nearly as widely shared, sometimes my analogies fall flat. The language of economics is so universally understood that not having an excellent command of it will be limiting.
A question for readers
Is it time to professionalize data science? Lawyers, engineers, and physicians operate under well-defined standards of ethics; given the power of big data, it’s probably time for data scientists to start thinking about those same issues.
This interview is part of Schuyler’s Data Disruptors series over on the StrongDM blog.
(image credit: John Liu, CC 2.0)