Data Science 101Understanding Big Data

SQL vs. NoSQL- What You Need to Know

fig2large

‘SQL is outdated’. ‘RDBMS can no longer meet businesses’ data management needs’. ‘New database technologies like NoSQL are the solution for today’s enterprises’. We hear statements like these alot, both inside and outside the database technologies industry. But are they accurate? Is SQL a thing of the past, and are NoSQL solutions the way forward?

In this article, we’ll outline the differences between SQL and NoSQL, the vast array of differences within NoSQL technologies themselves, and discuss if Relational Database Management Systems really are a thing of the past.

SQL vs. NoSQL; An Overview

SQL NoSQL
Data storage Stored in a relational model, with rows and columns. Rows contain all of the information about one specific entry/entity, and columns are all the separate data points; for example, you might have a row about a specific car, in which the columns are ‘Make’, ‘Model’, ‘Colour’ and so on. The term “NoSQL” encompasses a host of databases, each with different data storage models. The main ones are: document, graph, key-value and columnar. More on the distinctions between them below.
Schemas and Flexibility Each record conforms to fixed schema, meaning the columns must be decided and locked before data entry and each row must contain data for each column. This can be amended, but it involves altering the whole database and going offline. Schemas are dynamic. Information can be added on the fly, and each ‘row’ (or equivalent) doesn’t have to contain data for each ‘column’.
Scalability Scaling is vertical. In essence, more data means a bigger server, which can get very expensive. It is possible to scale an RDBMS across multiple servers, but this is a difficult and time-consuming process. Scaling is horizontal, meaning across servers. These multiple servers can be cheap commodity hardware or cloud instances, making it alot more cost-effective than vertical scaling. Many NoSQL technologies also distribute data across servers automatically.
ACID Compliancy (Atomicity, Consistency, Isolation, Durability) The vast majority of relational databases are ACID compliant. Varies between technologies, but many NoSQL solutions sacrifice ACID compliancy for performance and scalability

The Many Faces of NoSQL

Having heard the term “NoSQL”, you could be forgiven for thinking all technologies under this umbrella have the same data model. In fact, NoSQL refers to a whole host of technologies, which store and process data in different ways. Some of the main ways include:

Document Databases

This image from Document Database solution CouchDB sums up the distinction between RDBMS and Document Databases pretty well:SQL vs. NoSQLInstead of storing data in rows and columns in a table, data is stored in documents, and these documents are grouped together in collections. Each document can have a completely different structure. Document databases include the aforementioned CouchDB and MongoDB.

Key-Value Stores

Data is stored in an associative array of key-value pairs. The key is an attribute name, which is linked to a value. Well-known key value stores include Redis, Voldemort (developed by LinkedIn) and Dynamo (developed by Amazon).

Graph Databases

Used for data whose relations are represented well in a graph. Data is stored in graph structures with nodes (entities), properties (information about the entities) and lines (connections between the entities). Examples of this type of database include Neo4J and InfiniteGraph.

Columnar (or Wide-Column) Databases

Instead of ‘tables’, in columnar databases you have column families, which are containers for rows. Unlike RDBMS, you don’t need to know all of the columns up front, each row doesn’t have to have the same number of columns. Columnar databases are best suited to analysing huge datasets- big names include Cassandra and HBase.

SQL vs. NoSQL- Which to Use?

The idea that SQL and NoSQL are in direct opposition and competition with each other is flawed one, not in the least because many companies opt to use them concurrently. As with all of the technologies I’ve previously discussed, there really isn’t a ‘one-system-fits-all’ approach; choosing the right technology hinges on the use case. If your data needs are changing rapidly, you need high throughput to handle viral growth, or your data is growing fast and you need to be able to scale out quickly and efficiently, maybe NoSQL is for you. But if the data you have isn’t changing in structure and you’re experiencing moderate, manageable growth, your needs may be best met by SQL technologies. Certainly, SQL is not dead yet.

(Featured image source: InfoQ)


Eileen McNulty-Holmes – Editor

1069171_10151498260206906_1602723926_n

Eileen has five years’ experience in journalism and editing for a range of online publications. She has a degree in English Literature from the University of Exeter, and is particularly interested in big data’s application in humanities. She is a native of Shropshire, United Kingdom.

Email: eileen@dataconomy.com


Previous post

IBM's Chef Watson: Using Data to Delight Your Tastebuds

Next post

Big Data Proving to Be A Real Challenge for Data Scientists

  • Pingback: Understanding Big Data: Open Source - - Dataconomy()

  • webduvet

    what solution would suit for data collected on social P2P network for analytical purpose?

    • Eileen

      Obviously it’s difficult to say without details about your specific use case, but an SQL on Hadoop solution like Hive might be an avenue worth investigating

      • webduvet

        Hi Thanks for reply.In more detail – our database is nosql document based, We are collecting a lot of information on user activity like time, place, duration of connection, interaction with other users etc. each activity is recorded in json format on logging server. Currently we are pretty much at the beginning of our journey but looking in the future we would need some solid strategy.

  • Pingback: Choosing a NoSQL - Dataconomy()

  • Floyd Higson

    alot ain’t a word… just sayin… :) it is certainly used a lot though

  • Pingback: SQL Versus NoSQL: What are the Differences and How Do You Choose?()

  • Gabe

    Thanks for your post! It can also be added that NoSQL databases tend to represent entities in a more natural way. Normalization rules for relational databases often break entities in an
    artificial way in order to avoid redundancy (which is something NoSQL
    databases don’t care too much about). Thus, relational databases end up,
    more often than not, with objects that are not good representations of
    the real world. NoSQL databases do a better job at representing real
    world objects. It’s all fresh in my mind, I had my exam on Advanced DataBases just yesterday and thus was one of the questions!! More at https://intellipaat.com/nosql-cassandra-hbase-training/

  • Johnny Chan

    Thank you for the post Eileen! One friendly (and objective) question regarding the conclusion on “Sql vs NoSql – which one to use”. You said:

    “if the data you have isn’t changing in structure and you’re experiencing moderate, manageable growth, your needs may be best met by SQL technologies.”

    That makes me wonder, in this scenario, why NoSql (like MongoDb) would not be equality as good (or better) than Sql (like Oracle)? It just sounds to me that NoSql seems more “beeter and not worse” (i.e. good for both fast-changing and not-as-fast-changing data) than Sql (which is only good for not-as-fast-changing data)?

    Are there any examples, that actually shows Sql can do something much better than NoSql?

    (Disclaimer – I have some experience in RDMS/SQL, and Object Orientated Programming. Yet to explore NoSql like MongoDb).

    • artfuldgr

      not if your assignment is to sell noSQL and you dont program or write code…
      there is no way that non relational stuff can beat relational stuff in speed, ever…
      An index with 1 billion keys only takes 30 compares in a relational database
      2 billion keys may take 32 compares…

      you try to go through a billion xml ‘keys’ without that kind of indexing and see what happens.
      meanwhile, if you can index it, you can put it in a db… which is the point of the db

      also. you have about 70 years of computer science and top minds behind databases…

      i was able to go through 1000 records in under 10 seconds using XMLTYPE data calls on oracle.
      you let me know how to do that using solr, or other stuff on the fly, which a DB handles extremely well

      there are so many points on the article above that are just plain wrong.
      see the table with scaling..

      it claims that only new nosql has load balancing…
      load balancing of DB has been going on for several decades…

      and the claim of it being cheaper to use more machines is a joke.
      each machine needs an operating system, and the software, each requiring licenses.
      so 10 cheap boxes with licenses cost more than 1 or 2 boxes with one set of licenses
      then there is the added stuff… how much electricity does 10 machines use?
      how much cooling do 10 machines use?
      At 200 dollars a square foot, for space, how much space does 10 machines vs one use?

      this is as faddy as the java period, where java was to change the world of programming
      then java became bloatware and memory soft (ie, you memorize what it contains rather than have better programmers who can write well)

      it will be a long while before the failures and lack of stuff with this solution gets known and the whole world of it gets pare back. there is a reason that DBs have dominated the field of computers for over 40 years in some form (though i would say quite primitive before 30 years ago)

      • lambdaexpression

        “and the claim of it being cheaper to use more machines is a joke. each machine needs an operating system, and the software, each requiring licenses. ”

        Except with NoSQL (as opposed to, say, MsSQL), most solutions are open source and run on open source platforms. This point is moot with the advent of AWS solutions as well, where the database can be run entirely in the cloud…

        Scaling horizontally in NoSQL is significantly easier than SQL and a lot cheaper because you don’t have to pay for Windows… or licences to use the majority of NoSQL databases commercially.

        The vast majority of machines that you would want to run a horizontally-scaling solution such as NoSQL on would be virtual and or/server racks and not maintained by you but in the cloud, so the point about how much space, cooling and electricity are pretty much moot as well. Again – run it in the cloud. And when you run in the cloud, horizontal scaling is super easy and a lot cheaper than vertical scaling.

        That’s not to say SQL is useless or out dated… they are both different tools to serve different purposes. However to say that NoSQL is bad because horizontal scaling isn’t cheaper because “you need to pay for an operating system, and licenses” is laughable

        • disqusaurus_rex

          Why would this be more true of MongoDB? It can be run on Windows (which costs those licenses), and many forms of SQL (such as MariaDB and MySQL) can be run in free environments. Don’t think this point matters much either way.

  • Kevin Karlin

    Eileen, a couple of statements I think are misleading regarding schemas and flexibility- “Each record conforms to fixed schema, meaning the columns must be decided and locked before data entry and each row must contain data for each column. This can be amended, but it involves altering the whole database and going offline.” RDMS allows the for a field to be required or not – hence the term sparsely populated columns. From a data management standpoint, those sparsely populated data points do drive the question “Are we collecting something of value”. You don’t need to alter an entire database to make changes to schema or how data relates to other data in the database – additionally, depending on the technology and change needed, schema changes do not necessarily require going offline.

    ACID is a key concept to ensure data integrity, if your talking social graphing that’s not so important. If you’re talking banking or e-commerce it’s probably pretty critical.

    • artfuldgr

      Kevin, the person writing has an agenda to push… we now have such a project, it would be better to use the database for most of it, and solr and such for the text part (its a medical project)… however, the vendor sold them a goose egg…

      notice the bio of the author… i have been writing code for 30 years, and this is a mess… all kinds of issues…
      1) take away the database, and you take away the various levels of security, speed, indexing, and reporting.
      2) ever try to make reports for management when every change to the system requires all your report code to fork and requires you go back and rewrite all your reports?
      3) we cant get much past 100 users… the SAME system in a DB version would be able to serve thousands
      4) they have to load the data into hibernate, which is caching… so if the server goes down, whammo
      5) databases have transactions… when they fail in the middle the db can back out the changes, not so with this new stuf
      6) security that really cant be added later is now deemed to slow things down, so it will be added later
      7) management is sold on articles like this by non programmers to the point that they wont listen to programmers telling them how to fix things, speed them up, etc… this means no raises, or such, and in some cases people getting laid off because they cant make a silk purse out of a sows ear, and managment heeds to blame someone for their choices.
      8) the XML is growing larger and larger to the point that the pages dont load timely. i would have sectioned things off, and not used CLOBs and would use XMLTYPE in oracle… but this then leverages a DB they have a license for and not their new trendy stuff.

      note that this is a mission critical project, so we all expect to be fired. when an employee leaves, it takes a long time to get them up to speed so that they can do anything, cause everything that exists now, embodies everything that existe before, so old XML has to work with newer modified structures. you try to do metrics reports when your reports drop data and you dont know it as XML missing a structure for the data is considered null data and not an error… given layers, names, and fields… it sometimes takes over 100 bytes to send an empty field in an XML… the data is so blown out its farcical… more than half the data sent from the system to the browser is not used for anything other than to tell the browser you dont have anything…

      this is a trend that will die over time except for connecting your DB stuff to documents and so on.
      now they are trying to use it to replace the DB, which is the wasteful period where they painfully discover, using your tax money, that there are no panaceas and magic bullets and that their choices are very very expensive.

  • bcegkmsw

    anyone come across the “1010 database”? What do you think of it?

  • Ashish Kumar

    Nice job Eileen…thanks for sharing this information…!!!!

  • I agree, those that like to kick SQL usually don’t understand that all technologies are complementing to each other, giving the developerarchitect the choice to choose the correct db for hisher project.

  • MrDev

    I feel like the writer of this article has already decided that NoSQL is better. Too many cons mentioned for SQL too many pros mentioned for NoSQL

    • Ethan Stewart

      I agree. You could rename the table headings “why SQL is outdated” and “why noSQL kicks its ass”

  • SuperPositiveMegaPosterMan

    For NOSQL, you gain speed and failover. You lose a flexible ability to search the data. the searches for the data are built into the the schema decisions. So, at a later date, if you decided you want to look at the data in a new way, you are out of luck with NOSQL. On the other hand, if your volume grows to a few terabytes you are out of luck for the most part with SQL.

    Both of them are crummy, in other words. If your “growth” is merely volume with no pivoting of the way you use the data, no interesting new features using the same data, NOSQL supports that best. If your “growth” is interesting new features or your whole company wants to pivot, SQL supports that best.

    The reason is of course the essential change that makes NOSQL fast and redundant also makes it inflexible – reducing everything to simplistic key/value pairs. Thats why so many people are using both, because neither really supports the modern business well.

  • Pingback: X Considerations Before Switching Servers - Jixee Blog()

  • Pingback: Considerations Before Switching Servers - Jixee Blog()

  • Pingback: External Links | AWS Research()

  • Pingback: why is sql vertically scalable and nosql horizontally()

  • Pingback: Are You Ready to Rumble? NoSQL and Relational DBs Face Off()

  • I think that this technology is going to be prevailing in most of the Web sites

    http://anaqaty.net/

  • in my opinion nosql is better and i wish to us it in my website
    http://cartoons-network.com/

  • Pingback: Weekly digital resources #13: SQL, JavaScript and HTML5 - DigiTools - nicocasel blog()

  • INF1

    Eileen resumed it well at the end “The idea that SQL and NoSQL are in direct opposition and competition with each other is a flawed one”. One can have SQL and noSQL based systems coexisting in the same framework and each solution respectively doing what it does best.

    Be careful, the SQL acronym could mean multiple things and lead to confusion. In this particular article, SQL is not the Structure Query Language that is talked about, it is the SQL from a data management model or from a system architecture point of view that Eileen is referring to.

    A confusing fact is calling noSQL technology something that is not directly the opposite of that technology. NonSQL could be called dynamic data model instead to remove confusion and increase its acceptation.

    Thank you for this great article!

  • Pingback: Database setup recommendation (Page 2) - Tech Support | futures.io()

  • Pingback: Big Data and Machine Learning Applications |()

  • Pingback: On Cyber Security & How Big Data Analytics Can Help - Datameer()

  • Pingback: SQL vs. NoSQL- What You Need to Know | Datacono...()

  • Pingback: SQL vs. NoSQL- What You Need to Know | Data Sci...()

  • Harold A. Scott

    Thanks, great article, but I am new to this technology and some parts were a little too technical, can anyone point me to a high-level overview of the subject?

  • Carolina Konovich

    Great article by the way!