One of our most popular posts this year came from Ferris Jumah, a data scientist at LinkedIn, who mapped the most popular skills of data scientists by scraping LinkedIn profile data. A common comment from data scientists who came across this post (as with most of our posts on data science skillsets) was “Surely, you can’t expect data scientists to have all these skills?”
Naturally, we don’t. Every data science role involves a particular combination of some of these skills, and anyone who had mastered the listed programming languages alone would be some sort of computing demigod. Having said that, there’s always room for growth; thus, we’ve found 10 online resources to help you get acquainted with the 10 biggest skills in the Data Science Skills Network. Whether you’re a data science rookie or a seasoned professional, we hope this list of some of the best courses on the web proves useful.
1. Analytics- the SAS Enterprise Business Intelligence Course
Aimed more at the tech-curious businessperson than the seasoned data scientist, this course is still worth investigating for anyone looking to get started with business reporting in SAS. SAS offers courses for a wide variety of BI roles, from initial platform exploration to designing, tuning and maintaining OLAP cubes.
2. Machine Learning- Coursera
Taught by Andrew Ng (Stanford professor, Baidu’s Chief Scientist, and all-round data science rockstar), this is the course to take if you’re looking to get into data science. The course covers data mining, pattern recognition, and supervised and unsupervised learning, and draws on multiple real-world examples and applications. Plus, it’s absolutely free. The 2015 sessions have yet to be announced, but if you’re interested in delving deeper into machine learning, it’s definitely worth adding this course to your watchlist.
3. Statistics- Google Tech Talks’ Stats 202
Like the Coursera machine learning class, this series of five hour-long talks is based on a Stanford class. Not only do you get the hallmark of a class designed by a world-class institution, you also don’t have to pay a cent to watch it. For anyone with a basic grounding in stats, these talks are highly recommended. Key topics covered include exploring and visualizing data, association analysis, classification, and clustering. Additional complementary resources can be found here.
4. R- Coursera’s Computing for Data Analysis
Part of Coursera’s 9-part Data Science Specialisation, this course is taught by Roger D. Peng of Johns Hopkins University. Originally designed for first-year graduate students in Biostatistics, this course will teach you to program in R and use R for “effective data analysis”. You’ll learn how to program in R, read data into R, access and create R packages, and create data infographics.
5. Python- Codecademy
Simply put, this course has taught Python to over 2.5 million people. This 13-hour course covers all the basics, from Syntax to Strings & Console Output, from Loops to Lists. A perfect (and free) introduction to this powerful programming language.
6. SQL- GalaXQL
There are dozens of great options for learning SQL for free online. “Learn SQL the Hard Way” and “SQL Problems & Solutions” are definitely worth looking into. If you’re looking for something slightly more fun and interactive, try GalaXQL. GalaXQL is a visual platform, offering lessons on SQL in a database of fictional galaxies. The galaxy rendering reflects the changes you make in the database.
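To give a flavour of the kind of statement such a lesson has you write, here is a minimal sketch using Python’s built-in sqlite3 module. The `stars` table, its columns, and the values are hypothetical and chosen purely for illustration; they are not GalaXQL’s actual schema.

```python
import sqlite3

# Hypothetical schema for illustration only; not GalaXQL's real tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stars (id INTEGER PRIMARY KEY, name TEXT, brightness REAL)"
)
conn.executemany(
    "INSERT INTO stars (name, brightness) VALUES (?, ?)",
    [("Alpha", 1.0), ("Beta", 4.5), ("Gamma", 9.2)],
)

# Halve the brightness of every star brighter than 4.
# In a visual tool, this is the change you would then see rendered.
conn.execute("UPDATE stars SET brightness = brightness / 2 WHERE brightness > 4")

# Read the stars back, brightest first.
rows = conn.execute(
    "SELECT name, brightness FROM stars ORDER BY brightness DESC"
).fetchall()
print(rows)
```

The same pattern of changing rows with UPDATE and then reading them back with SELECT carries over to any SQL database.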
7. Algorithms- Udacity’s Intro to Algorithms
This intermediate course on algorithms is built around the mission of “analysing your social network”. You’ll learn about recursion replacements and pairwise connectivity to find the quickest route to Kevin Bacon, and about heaps and how they can help you keep track of your best friends. It’s not the cheapest course in the world, and Udacity recommends you set aside 6 hours a week for three months, but if you have the time and money to spare, this course is as enlightening as it is lighthearted.
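As a rough taste of the two ideas mentioned, the sketch below uses breadth-first search to find a shortest chain of connections and a heap-based helper to pick the top friends. It is not taken from the Udacity course; the network, the names, and the message counts are made up for illustration.

```python
from collections import deque
import heapq


def shortest_path(graph, start, goal):
    """Breadth-first search: return a shortest list of nodes from start to goal, or None."""
    previous = {start: None}  # maps each visited node to the node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the recorded predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical social network (adjacency lists).
network = {
    "You": ["Ann", "Raj"],
    "Ann": ["You", "Lee"],
    "Raj": ["You", "Kevin Bacon"],
    "Lee": ["Ann", "Kevin Bacon"],
    "Kevin Bacon": ["Raj", "Lee"],
}
route = shortest_path(network, "You", "Kevin Bacon")
print(route)

# Hypothetical message counts; heapq.nlargest uses a heap to pick the top entries.
messages = {"Ann": 42, "Raj": 17, "Lee": 58}
best_friends = heapq.nlargest(2, messages, key=messages.get)
print(best_friends)
```

Breadth-first search suits this problem because every connection counts as one step, so the first time the goal is reached, the path is a shortest one.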
8. Hadoop- Udemy’s “Become a Certified Hadoop Developer”
This course will teach you Hadoop right from the beginning, leaving you proficient in programming with Hadoop and MapReduce. An advantage of the Udemy model is that once you’ve paid, you have lifetime access to all of the course materials, meaning you can refer back to them and sharpen your skills at any time. The Hortonworks classes also come highly recommended.
9. Data Mining- Coursera’s “Pattern Discovery in Data Mining”
If you have cash to part with, Coursera’s entire Data Mining specialisation is well worth looking into. However, many of the classes in the specialisation, including the excellent course on pattern discovery, can be taken for free, without certification at the end. This course is taught by Jiawei Han of the University of Illinois, and serves as a great introduction to data mining algorithms, concepts and challenges.
10. MATLAB- MIT’s Introduction to MATLAB
Offered by MIT’s OpenCourseWare project, this four-week class is described as “an aggressively gentle introduction to MATLAB”. It will introduce you to popular MATLAB toolboxes, and includes assignments based around MATLAB problems and challenges.