Choosing the right language for data analysis can be almost as complicated as actually learning the language. For many reasons, R and Python are two of the most popular: R is often praised for its great features for data visualization, as it was developed with statisticians in mind; plenty of programmers love multi-purpose Python for its so-simple-a-child-could-do-it syntax.
Why not just learn both?
The fact is, your time is limited. As data scientist and Dataconomy contributor Joshua Ebner says: ‘Learning a new programming language is a large investment in your time, so you need to be strategic about which one you select. The reason to focus on one programming language is because you need to focus much more on process and technique, not syntax. You need to learn how to think about data and how to solve problems using the tools of data science’.
How do these two languages relate to one another? What are the strengths of R over Python, and vice versa? Just like there’s no single best tool in a toolbox, there’s no single programming language that’s perfect for every data problem you want to solve. However, you need to be able to devote a significant amount of your time to truly master one tool. Spending 100 hours on Python or on R will yield considerably better results than splitting that time across ten different tools. In the end, your time ROI will be higher if you concentrate your efforts.
The Data Science Wars
Data science online learning platform DataCamp’s infographic provides a basic comparison between these two programming languages from a data science and statistics perspective, perfect for aspiring data scientists looking for the right language to start with.
And The Winner Is…
Even though the infographic suggests R and Python are equally good for budding data scientists taking their first steps in the field, we believe R is the winner, at least for data science beginners who are moving on from spreadsheets to programming languages. It is not only the most widely used language among data scientists, but it is also popular in academia and in business. R also offers a simple approach to learning the key skills of data science: data manipulation, data visualization, and machine learning. After mastering the fundamentals of data science in R, you’ll probably (want to) learn other languages to solve specific problems.
(Image credit: Michael Doherty)
Brilliant! Very comprehensive comparison. The one thing you did not touch upon is the integration of R and Python with Hadoop and Spark environments. As such, Python programmers will find it much easier to swim in any big data environment, as R is not a scripting language and Python has APIs to communicate with clusters, which R doesn’t.
Disclaimer: I am an R uSer learning Python exactly for the above mentioned reasons.
Great and informative graphic. Just wanted to let you know that we included this in our best of the month roundup: http://www.colocationguard.com/may-2015%E2%80%B2s-best-big-data-sysadmin-start-tech-content/
I liked the whole comparison a lot, but I am not happy to read at the end:
“we believe R is the clear winner”
Clear winner? Really? I may be biased, but I couldn’t disagree more.
I use Python every day and teach it with passion to data scientists. I show them mostly notebooks, CPython and pandas, and at the end of the course you notice stars in their eyes. In the last three years Python has closed the most important gaps in its packages, plus there are so many Python libraries that R never had and never will get. Jupyter, for example, started from Python, and thanks to these guys you can now write R inside it too. But that is how I see it: Python is the solid base; use some R here and there if really, really necessary.
Maybe it is worth underlining more that Python is a cleaner language which will help you write better code, actually much, much better compared with R. The Zen philosophy behind Python programming is such a great piece, so important for first-time programmers; writing code that is easier to maintain (fewer bugs and expressive conventions) is probably something to take more into consideration before deciding which language to go for first.
I end up using those best practices in other languages too.
Great article, up to where the author decides to override the infographic and draw a different conclusion. At some point, if you are a scientist (not a specialised data scientist), then learning a general-purpose programming tool like Python is going to help you tremendously in automating all aspects of your work. If you require specialised statistical analysis, you will need a stats tool (R, SAS, Statistica etc.). As always, your choice is determined by your own needs.
This article is misleading. R is not a programming language from a Computer Science point of view. You may argue that it includes enough operations to code any programme, but there is a reason why very few of us programme in Assembler language. Even as an imperative paradigm R leaves a lot to be desired, and easily leads beginners into bad habits. You should not tell a data scientist (s)he will be learning programming with a tool like R.
The limitations of R are most prominent when it comes to code re-use. Python being an object-oriented language and implementing multiple inheritance makes it intrinsically re-use friendly.
On performance, Python is one of the slowest programming languages around. If execution speed is a requirement, you certainly won’t be using R, but neither will you use Python.
My recommendation would be as follows: if the data scientist wants to learn (or has an interest in) programming then go with Python; if the goal is simply to improve the tools within reach then go with R.
I personally use both R and Python, regarding them as entirely disparate things. Both are useful in different tasks.
Great comparison! Happy to see R is the winner for beginners though. Here is an R software tool that can help R users: https://www.displayr.com/features/