"Those Who Rely On Gut Instinct Or Trivial Analyses Will Be Out-competed" Interview With Data Scientist Jon Sedar

Jon is a consulting data scientist, trained in physics and machine learning, with 10 years’ professional background in data analysis and management consulting. He co-manages a niche data science consultancy called Applied AI, operating primarily in the insurance sector throughout the UK, Ireland and Europe. He’s also an organiser of and volunteer in data-for-good social movements, and an occasional speaker at tech and industry events.

Follow Peadar’s series of interviews with data scientists here.

1. What project that you’ve worked on do you wish you could go back to and do better?

I won’t name names, but throughout my career I’ve encountered projects – and indeed full-time jobs – where major issues arose not from the technology or the analysis, but from ineffective communication, either institutional or interpersonal. To pick one example, a particular job was an analyst’s nightmare because of overbearing senior management and too-rapid engineering: the task was to produce KPIs of the company’s health, but the entire software and hardware stack changed so frequently that getting even the most basic information out was extremely hard work. That could have been fixed by stronger communication and pushback on my part, but my opinions weren’t accepted and it wasn’t to be. Another large project (of which I was only a very minor part) was scuppered due to mishandled client expectations and caused no end of overwork for the consulting team. Every project needs better communication, always.

2. What advice do you have for younger analytics professionals, and in particular for PhD students in the sciences?

I’ll deal with these separately, since there are (or should be) different reasons why people are in each group.

To PhD candidates, I simply hope that they truly love their subject and take care to gain commercially useful skills along the way. I’ve friends who have completed PhDs, some who’ve quit midway, and some like me who considered it but instead returned to industry after an MSc. You might not plan to go into industry, but the following skills are vital for academia too:

Reproducible research (version control, data management, robust / testable / actually maintainable code).
Lightweight programming (learn Python: it’s easy, it can do most things, it’s always available, its packages are very well maintained and its community is very strong).
Statistics (Bayesian, frequentist, whatever – make sure you have a really solid grasp of the fundamentals).
Finally, ensure you have proven capability in high-quality communication – and a dead-tree LaTeX publication doesn’t count. Get yourself blogging, answering questions on Stack Overflow, presenting at meetups and conferences, working with others, consulting in industry etc. As you improve at this you’ll really distinguish yourself from the herd.

Also some flamebait: Whilst I love the idea of improving humanity’s body of knowledge in the hard sciences, I’m not convinced that a PhD in the soft sciences is worthwhile nowadays, at least not straight out of school. If you want to research the humanities just take your degree and go work for a giant search engine / social network / online retailer; you’ll get real-world issues and massive study sizes from day one.

To the younger analytics professionals: regardless of the company or industry in which you find yourself, build up your skills as per the PhD advice above, polish your external profile (blogs, talks, research papers etc.) and don’t ever be afraid to jump ship and try a few things out. Try to keep 3 months’ pay in your savings account, maintain your friendships both local and international, and set up a basic vehicle for doing independent contracting / consulting work.

Over the years I’ve tried a lot of different jobs in a few different locations. I felt happiest once I’d set up my own company and knew that I would always have a method to market my skills independent of anyone else. Data science skills are likely to be important for a good few years yet, so if you’re well-connected, well-respected and mobile, you can try a lot of things, find what you love, and will never be out of work for long.

3. What do you wish you knew earlier about being a data scientist?

Lots to unpack in that question! If I can call myself a scientist at all, then it’s as an empiricist rather than a theoretician. As such I consider data to be the record of things that happen(ed), and science to be the formalisation & generalisation of our understanding of those things. ‘Data scientist’ is thus a useful shorthand for someone who specialises in learning from data, communicating insights, and taking or recommending reasoned actions accordingly.

With that in mind, I’d advise my younger self never to forget that it’s that final step that matters most – allowing decision makers to take reasoned actions according to your well-communicated insights. That decision maker may be your client, your boss or even simply yourself, but without an effective application, ‘data science’ is really research & development – and chances are you’re not being paid to do R&D.

4. How do you respond when you hear the phrase ‘big data’?

I think we’re far enough along the hype cycle now that nearly all data science practitioners recognise both the possibilities and the constraints of performing large-scale analyses. Proper problem definition and product-market fit are the most important things to get right, and hopefully even the typical non-technical business leader is no longer bedazzled by the term and instead wants to see actionable insights that don’t require a major engineering project.

That said, I’m still happy to see experts in the field continue to preach that whilst gathering reams of ‘big’ data (which I take here to be primarily commercially-related data including interface interactions, system log files, audio, images, video feeds, positional info, live market movements etc.) can lead to something immensely powerful, it can easily become a giant waste of everyone’s time and resources.

Truly understanding the behaviour of a system/process, and properly cleaning, reducing and sub-sampling datasets are practices long-understood by the statistics community. A reasoned hypothesis tested with ‘small-medium’ data on a modest desktop machine beats blind number crunching any day.
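The point about sub-sampling can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not Jon’s own method: it assumes the data arrive as a stream too large to hold in memory, draws a uniform random sample with reservoir sampling (the standard “Algorithm R”), and then checks a simple hypothesis about the mean using a normal-approximation confidence interval. The function names, the simulated data and the example hypothesis are all hypothetical.

```python
import math
import random
from typing import Iterable, List, Tuple


def reservoir_sample(stream: Iterable[float], k: int, seed: int = 0) -> List[float]:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R).

    Only k items are held in memory at any time; if the stream has fewer
    than k items, all of them are returned.
    """
    rng = random.Random(seed)
    sample: List[float] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1)
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


def mean_with_ci(xs: List[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Sample mean with an approximate (normal-theory) confidence interval.

    z = 1.96 gives roughly 95% coverage for reasonably large samples.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)  # unbiased sample variance
    half_width = z * math.sqrt(var / n)
    return m, m - half_width, m + half_width


if __name__ == "__main__":
    # Hypothetical data: 200,000 simulated values generated lazily, never stored as a list
    gen = random.Random(42)
    stream = (gen.gauss(100.0, 15.0) for _ in range(200_000))
    sample = reservoir_sample(stream, k=5_000)
    m, lo, hi = mean_with_ci(sample)
    # Hypothetical hypothesis: "the true mean exceeds 95", checked against the interval
    print(f"mean={m:.2f}  95% CI=({lo:.2f}, {hi:.2f})  mean > 95: {lo > 95.0}")
```

The design choice is to touch the full dataset exactly once while holding only `k` values in memory, so the hypothesis test itself runs comfortably on a desktop machine. A sample of a few thousand is often enough to pin down a mean, though the right sample size depends on the variance and on the size of effect you care about.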

5. What is the most exciting thing about your field?

Well, the tools for applying analysis techniques, and the techniques themselves, are certainly moving at a hell of a pace, but science & technology always does. I really enjoy having the opportunity to research novel techniques and apply them to client problems.

More widely, I’m excited to see the principles of gathering, maintaining and learning from data permeate all aspects of businesses and organisations. There are well-developed data science platforms popping up every day, new software packages to use, heavily over-subscribed meetup groups and conferences everywhere, and it’s great to see certain technical aspects being formalised and commoditised. Just as it’s unlikely that anyone today would try to run an enterprise without a website, a telephone or even an accountant, I expect that a data science capability will be at the core of most businesses in future.

6. How do you go about framing a data problem – in particular, how do you avoid spending too long, how do you manage expectations etc. How do you know what is good enough?

I assume you mean an analytical problem rather than a data management problem or something else.

I think it’s quite simple really: it’s just common sense to define the analytical problem well, along with the inputs and outputs of your work. What question are we trying to answer? How should the answer be presented and how will it be used? What analysis and what data will let us provide insights on that question? What data do we have, and what analysis is possible / acceptable within our organisational and technical constraints? Then prototype, develop, communicate and iterate until baked.

7. Do you feel ‘Data Science’ is a thing – or do you feel it is just some engineering functions rebranded? Do you think we could do more hypothesis-driven scientific enquiry?

As above, I think that in future the practice of gathering, maintaining and learning from data will be core to nearly all commercial and social enterprises. Bringing academic research to bear on real-world problems is just too useful, and those who rely on gut instinct or trivial analyses will be out-competed.

That said, I think we’re already seeing a definite split between data science (statistics, experimentation, prediction), data processing (large-scale systems development), and data engineering (acquiring, maintaining and making available high-quality data sources), and no doubt in future more skills will spin out and take on a life of their own. The veritable zoo of job titles spawned from web development is a good example: UI designers, UX designers, JavaScript engineers, mobile app engineers, hosting and replication engineers etc.

Finally, I’d just like to thank you for putting this series of interviews / blog posts together; it’s a really interesting resource, particularly as the data science industry matures.

Peadar Coyle is a Data Analytics Professional based in Luxembourg. He has helped companies solve problems using data relating to Business Process Optimization, Supply Chain Management, Air Traffic Data Analysis, Data Product Architecture and in Commercial Sales teams. He is always excited to evangelize about ‘Big Data’ and the ‘Data Mentality’, which comes from his experience as a Mathematics teacher and his Master’s studies in Mathematics and Statistics. His recent speaking engagements include PyCon Sei in Florence and he will soon be speaking at PyData in Berlin and London. His expertise includes Bayesian Statistics, Optimization, Statistical Modelling and Data Products.