Ian started out as a theoretical physicist, moved into data science a few years ago and is now part of the data science team at Pivotal Labs, the agile software consulting arm of Pivotal. Ian has worked on a variety of customer engagements at Pivotal including catastrophe risk modelling, fashion & consumer analytics, factory production quality and online marketing. Ian has been building analytical and numerical models for about 10 years and started out building high performance computing models of the earliest moments of the universe after the Big Bang.
Follow Peadar’s series of interviews with data scientists here.
1. What project have you worked on that you wish you could go back to and do better?
First of all, thanks for the opportunity to be part of this interview series! I think if you are continually learning, you always look back on past work with a view to what could have been done better. Having said that, I don’t think there is one particular commercial project that I would pick out to redo, but maybe I don’t have enough perspective yet. I imagine most people who have done a PhD would probably like to redo some of the technical parts but were just relieved to get it finished at the time.
2. What advice do you have for younger analytics professionals and in particular PhD students in the Sciences?
When you are doing a PhD you have a very narrow focus and it can be hard to see where your skills and experience might be valuable outside academia. I would recommend trying to get a bit of an outside perspective: go to industry meetups and any ‘post-academia’ workshops that are available at your university.
It’s helpful to try to understand what someone hiring in industry is looking out for. For me, someone leaving academia doesn’t need to have full technical ability in the new area (e.g. machine learning) but should have made an effort to start down that learning path, and they should make it easy for me to see that. I’ve seen people leaving academia just submit the same academic CV to an industry data science role as they would use for a postdoc physics research position. I would suggest asking someone in the field you want to enter to critique your CV to avoid this kind of mistake.
3. What do you wish you knew earlier about being a data scientist?
I don’t think you can overstate how much of data science is really about working with people of all different technical levels and backgrounds. Coming from a theoretical physics background, which can be quite a solitary environment, I knew that data science and especially consulting would be very different. Every day I am reminded that my role is often more about managing relationships and understanding people’s needs than just writing code.
4. How do you respond when you hear the phrase ‘big data’?
I still cringe a little, but I understand that it is a useful shorthand for a change in behaviour and scale that some parts of the tech industry are still not ready for. I like the more recent categorisation into small, medium and big data, as I think many companies really have medium data problems, where processing on a laptop in memory is not feasible, but they don’t yet need a 10,000-core cluster. There is clearly a lot you can do before you start operating at the very largest scales of places like Facebook or Google. When you do start reaching those scales, however, the problems are very different and the ‘big data’ technologies like Hadoop and massively parallel processing databases really come into their own.
5. What is the most exciting thing about your field?
For me the most exciting thing is that we haven’t figured out all the ways predictive analytics and data science can help solve business problems. There are some well-worn paths now, but each day new applications of machine learning and predictive algorithms are discovered, and new areas of industry become interested.
6. How do you go about framing a data problem? In particular, how do you avoid spending too long, and how do you manage expectations? How do you know what is good enough?
At Pivotal Labs I have learned a lot from our software development team about how to iterate quickly to minimise the risk in a project. For me, open and clear communication is the key to managing expectations and making sure that the project is providing value. If you can show some value very quickly and then build on that iteratively, you can have a continual dialogue about progress and expectations will not easily get out of sync.
A lot of people in this field have a perfectionist streak, so knowing when to stop and what ‘good enough’ looks like is an important skill. Does the time and effort needed to eke out that next 1% in accuracy really provide enough value or is the current performance just as good given the way the model will be applied?
7. You spent some time as a Consultant in Data Analytics. How did you manage cultural challenges, dealing with stakeholders and executives? What advice do you have for new starters about this?
Cultural challenges can be difficult, and even the differences between European and American attitudes to data protection can lead to internal problems in an organisation. As a data scientist, you often get into the ‘ugly baby’ scenario, where you have to explain to a leadership team or organisation that their carefully collected data is not quite as nice as they thought, or that their idea to run their niche business based on real-time Twitter feedback is not going to be possible with the signal-to-noise ratio that is present. I think empathy is a very important trait and the way we hire people tries to select for this. If you can see the situation from the other person’s viewpoint it helps enormously when trying to resolve difficult situations.
8. How do you explain to C-level execs the importance of Data Science? How do you deal with the ‘educated selling’ parts of the job?
Some C-level execs really understand the value that data science can bring. The US has had a bit of a head start in this, and with successful projects under their belts they are ready to use data science more widely in their organisations. In Europe we are still in that learning phase I think, so making a success of that first project is important. Showing value early and often during a project can really help to drive understanding and appreciation of the possibilities that data science can provide.
A lot of people have now heard of data science and machine learning, and there are success stories in the mainstream and industry press. A few years ago this wasn’t the case and you had to spend a long time explaining at a relatively basic level how data science could be useful. You still have to do some of that, but it’s a bit easier and you can point to mainstream examples, which helps a lot. For a layperson it’s still very difficult to understand why one type of analysis is easy and another is very difficult. Randall Munroe captured this well in XKCD 1425 (http://xkcd.com/1425/) but with the progress in computer vision recently, even this example is nearly out of date!
I really enjoy the interview series so thank you for the opportunity to take part!
(image credit: PyData London)