Trey Causey is a blogger with experience as a professional data scientist in sports analytics and e-commerce. He’s got some fantastic views about the state of the industry.
What project have you worked on that you wish you could go back to and do better?
The easy and honest answer would be to say all of them. More concretely, I’d love to have had more time to work on my current project, the NYT 4th Down Bot, before going live. The mission of the bot is to show fans that there is an analytical way to decide what to do on 4th down (in American football), and that the conventional wisdom is often too conservative. Doing this means you have to get the “obvious” calls right as close to 100% of the time as possible, but we all know how easy it is to wander down the path to overfitting in these circumstances…
What advice do you have for younger analytics professionals, and in particular for PhD students in the sciences and social sciences?
Students should take as many methods classes as possible. They’re far more generalizable than substantive classes in your discipline. You’ll also probably meet students from other disciplines, and that’s how constructive intellectual cross-fertilization happens. Additionally, learn a little bit about software engineering (as distinct from learning to code). You’ll never have as much time as you do right now for things like learning new skills, languages, and methods. Young professionals should seek out someone more senior, either at their job or elsewhere, and try to learn from that person’s experience. A word of warning, though: mentoring someone is hard work and a big obligation, so don’t feel too bad if you have a hard time finding someone willing to do it at first. Make it worth their while and don’t treat it as your “right” that they spend their valuable time on you. I wish this didn’t even have to be said.
What do you wish you knew earlier about being a data scientist?
It’s a cliché to say it now, but I wish I’d known how much of my time would be spent getting data, cleaning data, fixing bugs, trying to get pieces of code to run across multiple environments, and so on. The “nuts and bolts” work takes up so much of your time, yet it’s what you’re probably least prepared for coming out of school.
How do you respond when you hear the phrase ‘big data’?
Indifference.
What is the most exciting thing about your field?
Probably that it’s only just beginning to be ‘a field’ at all. I suspect that in five years or so the generalist ‘data scientist’ may no longer exist, as we see more differentiation into roles like ‘data engineer’ or ‘experimentalist.’ I’m excited about the prospect of data scientists moving out of tech and into more traditional companies. We’ve only really scratched the surface of what’s possible at companies that aren’t in tech or, amazingly, aren’t located in San Francisco.
How do you go about framing a data problem? In particular, how do you avoid spending too long, how do you manage expectations, and how do you know what is good enough?
That’s a difficult question, along the lines of “how long is a piece of string?” I think the key is to communicate early and often and to define success metrics as precisely as possible at the *beginning* of a project, not at the end. I’ve found that “spending too long” / navel-gazing is a charge many like to level at data scientists, especially former academics, but as often as not it’s the result of goalpost-moving and shifting requirements from management. It’s important to manage up and set expectations aggressively, especially if you’re the only data scientist at your company.
How do you explain the importance of data science to C-level execs? How do you deal with the ‘educated selling’ parts of the job? In particular, how does this differ between sports and other industries?
Honestly, I don’t believe I’ve met any executives who were dubious about the value of data or data science. The challenge is often either a) to temper unrealistic expectations about what is possible in a given time frame (we data scientists mostly have ourselves to blame for this) or b) to convince them to stay the course when the data reveal something unpleasant or unwelcome.
What is the most exciting thing you’ve been working on lately? Tell us a bit about it.
I’m about to start a new position as the first data scientist at ChefSteps, which I’m very excited about, but I can’t tell you about what I’ve been working on there as I haven’t started yet. Otherwise, the 4th Down Bot has been a really fun project to work on. The NYT Graphics team is the best in the business and is full of extremely smart and innovative people. It’s been amazing to see the thought and time that they put into projects.
What is the biggest challenge of leading a data science team?
I’ve written a lot about the unrealistic expectation that all data scientists be “unicorns” who are experts in every possible field, so for me the hardest part of building a team is finding the right people with complementary skills who can work together amicably and constructively. That’s not unique to data science, though.