NoSQL and Big Data

A long time ago in a galaxy far, far away… when I defined the requirements for a new Project Information Manager (PIM), my key needs were:

  • Extensible – I could foresee developments on the horizon that would require the system to be expanded to collect data for new fields.
  • Flexibility – to meet the future needs of the business and satisfy compliance requirements.
  • Future-proof with regard to performance – data needs were only going to grow (as we have all since seen, volumes have risen exponentially). In those days the data covered packaging regulations, the *WEEE, RoHS and REACH directives and regulations, and customer environmental criteria. I could see that this was only going to become more detailed, with more people around the world accessing the PIM to enter, validate and analyse data (the system expanded from the UK to the EU, and then to the US and APAC).

*WEEE – The Waste Electrical and Electronic Equipment Directive

RoHS – The Restriction of Hazardous Substances Directive 2002/95/EC

REACH – The European Union regulation concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals

The explosion of data we have witnessed is proving too large and too complex for relational database management systems (RDBMS), which my PIM was built on, to handle on their own. Fortunately for organizations, a new breed of database has risen to the big data challenge: the Not Only SQL (NoSQL) database.

If only NoSQL had been available when I defined my PIM, instead of only the big players with their reassuringly expensive solutions.

What is NoSQL?

The term “NoSQL” doesn’t necessarily mean “no SQL at all”. It is usually read as “Not only SQL”, because many NoSQL databases do support some elements of SQL. What they don’t rely on is the fixed-schema design of relational databases, which gives NoSQL users more flexibility in how they structure their data.

More information, including a definition of fixed and dynamic schemas, can be found at: http://www.dbms2.com/2011/07/31/dynamic-fixed-schema-databases/

Previously, relational databases built around the SQL query language were the standard choice (as with my PIM) and, in many cases, the only database technology available to organizations. Now, with the emergence of various NoSQL platforms, IT managers and business executives involved in technology decisions have more options for database deployment. NoSQL databases support dynamic schema design, offering the potential for greater flexibility, scalability and customization than relational software. That makes them a good fit for web applications, content management systems and other uses involving large volumes of non-uniform data that is updated frequently and has varying field formats. In particular, NoSQL technologies are designed with “big data” needs in mind.
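To illustrate the difference between a fixed and a dynamic schema, here is a minimal Python sketch. It uses Python's built-in `sqlite3` module as a stand-in for a relational database and a plain list of dictionaries as a stand-in for a document database. The table, the SKUs and the `weee_category` and `rohs_compliant` fields are hypothetical examples loosely inspired by PIM compliance data, not the actual design of my PIM.

```python
import sqlite3

# Fixed schema: the relational table declares its columns up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products (sku, name) VALUES (?, ?)", ("A100", "Kettle"))

def insert_with_new_field(conn):
    """Try to store a (hypothetical) compliance field the table was not designed for."""
    try:
        conn.execute(
            "INSERT INTO products (sku, name, weee_category) VALUES (?, ?, ?)",
            ("A101", "Toaster", "Small household appliances"),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the unknown column: the schema must be changed first.
        return False

ok_before = insert_with_new_field(conn)
conn.execute("ALTER TABLE products ADD COLUMN weee_category TEXT")
ok_after = insert_with_new_field(conn)

# Dynamic schema: a document collection (approximated here by a list of dicts)
# accepts records whose fields differ from one another.
documents = [
    {"sku": "A100", "name": "Kettle"},
    {"sku": "A101", "name": "Toaster", "weee_category": "Small household appliances"},
    {"sku": "A102", "name": "Battery pack", "rohs_compliant": True},
]
with_weee = [d["sku"] for d in documents if "weee_category" in d]

print(ok_before, ok_after, with_weee)  # False True ['A101']
```

The relational table needs an `ALTER TABLE` before it can hold the new field, whereas the document collection simply accepts it. The trade-off cuts both ways: the fixed schema is also what stops malformed or unexpected data from getting in.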

The array of NoSQL database choices may seem confusing or even overwhelming. NoSQL databases are commonly grouped into four primary categories with different architectural characteristics:

  • Document databases
  • Graph databases
  • Key-value databases
  • Wide column stores
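To make the four categories a little more concrete, here is a rough sketch of how a hypothetical PIM product record might be shaped under each model. It uses plain Python dictionaries and lists as stand-ins and does not reflect any real NoSQL product's API; the SKUs, field names and the `neighbours` helper are purely illustrative assumptions.

```python
import json

# Hypothetical PIM product record shaped under each NoSQL data model.
# Plain Python structures stand in for the databases; no real product API is used.

# Key-value: the store maps a key to an opaque value; the application decodes it.
kv_store = {"product:A100": '{"name": "Kettle", "rohs_compliant": true}'}
kettle = json.loads(kv_store["product:A100"])

# Document: self-describing, possibly nested records that can be queried by field.
doc_store = {
    "A100": {"name": "Kettle", "compliance": {"rohs_compliant": True}},
    "A101": {
        "name": "Toaster",
        "compliance": {"rohs_compliant": False, "weee_category": "Small household appliances"},
    },
}
rohs_products = [sku for sku, doc in doc_store.items() if doc["compliance"]["rohs_compliant"]]

# Wide column: each row key holds column families, and each row may carry
# a different set of columns within them.
wide_column = {
    "A100": {"info": {"name": "Kettle"}, "compliance": {"rohs": "yes"}},
    "A101": {"info": {"name": "Toaster"}, "compliance": {"rohs": "no", "weee": "Small household appliances"}},
}

# Graph: nodes plus explicit, typed relationships (edges) between them.
nodes = {"A100": "product", "S1": "supplier", "UK": "market"}
edges = [("S1", "SUPPLIES", "A100"), ("A100", "SOLD_IN", "UK")]

def neighbours(node, relation):
    """Return the targets of a node's outgoing edges of one relationship type."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]

print(kettle["name"], rohs_products, neighbours("S1", "SUPPLIES"))
```

The same information fits every model. What differs is how it is looked up: by a single key, by fields inside a document, by row key and column, or by following relationships.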

Many NoSQL platforms are also tailored to specific purposes, and they may or may not integrate well with SQL technologies, which some organizations depend on. In addition, most NoSQL systems aren’t suitable replacements for relational databases in transaction processing applications, because they lack full *ACID compliance for guaranteeing transactional integrity and data consistency.

*ACID (Atomicity, Consistency, Isolation, Durability) is a set of principles governing how changes are applied to a database. Put very simply:

  • (A) Atomicity: when you make a change to the database, the change should succeed or fail as a whole
  • (C) Consistency: the database should always be left in a valid state (this is a pretty broad topic)
  • (I) Isolation: if other operations are running at the same time, they shouldn’t be able to see data mid-update
  • (D) Durability: if the system crashes (hardware or software), the database needs to be able to recover, and once it reports that an update has been applied, that update must not be lost
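As a small illustration of atomicity (the A), the sketch below uses Python's built-in `sqlite3` module, since SQLite is a relational engine with ACID transactions. The `inventory` table, the warehouse names and the `transfer` function are hypothetical. A transfer first adds stock to one warehouse and then removes it from another. If the second step would break the `CHECK (qty >= 0)` rule, the whole transaction is rolled back, so the first step is undone too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (warehouse TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
)
conn.executemany("INSERT INTO inventory VALUES (?, ?)", [("UK", 5), ("EU", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move stock between warehouses as a single all-or-nothing unit of work."""
    try:
        # Used as a context manager, the connection commits if the block
        # completes and rolls back the open transaction if it raises.
        with conn:
            conn.execute("UPDATE inventory SET qty = qty + ? WHERE warehouse = ?", (amount, dst))
            conn.execute("UPDATE inventory SET qty = qty - ? WHERE warehouse = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed on the second UPDATE; the first was rolled back.
        return False

def quantities(conn):
    return dict(conn.execute("SELECT warehouse, qty FROM inventory ORDER BY warehouse"))

ok = transfer(conn, "UK", "EU", 3)       # succeeds
failed = transfer(conn, "UK", "EU", 10)  # would leave UK negative, so nothing is applied
print(quantities(conn))                  # {'EU': 3, 'UK': 2}
```

Without the transaction, the failed transfer would have left the EU warehouse credited with 10 units that never left the UK: exactly the half-finished update that atomicity is there to prevent.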

Depending on the business problem an organization is trying to solve, IT decision makers may need to weigh the benefits of NoSQL software against those of relational databases.

The near-monopoly held by relational databases such as Oracle, Microsoft SQL Server and MySQL is rapidly changing. Over the last five years, NoSQL databases such as MongoDB, Apache Cassandra, Redis and HBase have enjoyed exponential growth compared with their RDBMS counterparts.

This stratospheric rise in NoSQL adoption does not suggest that the demise of the traditional data warehouse is on the horizon. However, it does show that many organizations are turning to NoSQL as a more cloud-friendly solution to their big data problems.

It’s a much bigger market than you think!

If your organization is ready to do more with big data, here’s a comparison of NoSQL and RDBMS to help you decide whether NoSQL is right for you:

https://www.qubole.com/blog/big-data/nosql-databases/

More information can be found at:

http://www.infoworld.com/article/2610393/application-development/the-dirty-truth-about-big-data-and-nosql.html

http://searchdatamanagement.techtarget.com/essentialguide/Guide-to-NoSQL-databases-How-they-can-help-users-meet-big-data-needs

http://www.forbes.com/sites/davefeinleib/2012/10/08/big-data-and-nosql-five-key-insights/#28ff6d903bf0
