Let’s look at a historical experiment in Big Data: a project was implemented with the intention of determining the population of France and its socio-economic factors, such as health and financial prosperity. Using birth and death records from French parishes, a figure was obtained by multiplying the number of births by 26. Third-party anomalies, such as birth hygiene, were also tracked and accounted for. All this was achieved using only partial parish records, with the projection accurate to within half a million.
Guess what? This was done in 1781 by the French genius Pierre-Simon Laplace.
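Laplace’s method is what statisticians now call a ratio estimator: count the inhabitants and the births in a sample of parishes, take the ratio, and scale the national birth count by it. The Python sketch below only illustrates that arithmetic. The parish figures, the national birth total and the helper name `ratio_estimate` are hypothetical, chosen so the ratio comes out at 26; they are not Laplace’s actual data.

```python
def ratio_estimate(sample_population, sample_births, national_births):
    """Scale national births by the population-to-births ratio
    observed in a sample of fully counted parishes (hypothetical helper)."""
    ratio = sum(sample_population) / sum(sample_births)
    return ratio, ratio * national_births


# Hypothetical sample parishes: (inhabitants, births per year).
parishes = [(2_600, 100), (5_200, 200), (13_000, 500)]
population = [p for p, _ in parishes]
births = [b for _, b in parishes]

# Hypothetical national total of annual births.
ratio, estimate = ratio_estimate(population, births, national_births=1_000_000)
print(f"ratio = {ratio:.1f}, estimated population = {estimate:,.0f}")
```

With these made-up numbers the script prints `ratio = 26.0, estimated population = 26,000,000`. The hard part in 1781 was not this multiplication but judging how far the sampled ratio could drift from the true national one.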
[quote_box_center]
There are many descriptions of what “Big Data” is; the quote below is from Wikipedia:
“Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.”
Another description, frequently quoted by a plethora of cloud and Big Data marketers, is:
“Big data allows us to provide future predictions based on previously collected data.”[/quote_box_center]
Monsieur Laplace built upon Bayes’ Theorem, an often-maligned theory first conceived by Thomas Bayes in the 1740s. Bayes, a Presbyterian minister, wanted to reason rationally about the existence of God and, strangely, based his theorem on a cue ball thrown onto a table. Bayes wanted to discover the probability of where the cue ball had landed. Each further throw generated a new data point that updated his estimate, starting from a prior probability, and each throw brought Bayes closer to the answer.
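Bayes’ table experiment maps neatly onto what is now called Beta-binomial updating. In the usual telling, the cue ball is thrown first and hidden, and each later ball is reported as landing to its left or its right. The minimal Python sketch below assumes a uniform prior over the cue ball’s position and uses an invented sequence of reports; the helper names `update` and `posterior_mean` are illustrative, not from any library.

```python
from fractions import Fraction


def update(alpha, beta, landed_left):
    """One Bayesian update of a Beta(alpha, beta) belief about p, the cue
    ball's position measured from the left edge as a fraction of the table.
    A new ball lands to the cue ball's left with probability p."""
    if landed_left:
        return alpha + 1, beta
    return alpha, beta + 1


def posterior_mean(alpha, beta):
    """Expected position of the cue ball under a Beta(alpha, beta) belief."""
    return Fraction(alpha, alpha + beta)


alpha, beta = 1, 1  # uniform prior: Beta(1, 1), every position equally likely
reports = [True, False, True, True, False, True, True, False]  # hypothetical: True = "landed left"

for throw, landed_left in enumerate(reports, start=1):
    alpha, beta = update(alpha, beta, landed_left)
    print(f"after throw {throw}: estimated position = {float(posterior_mean(alpha, beta)):.3f}")
```

After five “left” reports in eight throws the estimate is 6/10 = 0.6, the same answer as Laplace’s rule of succession, (k + 1)/(n + 2). Each report shifts the belief only slightly, which is exactly the “each throw brings you closer” behaviour described above.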
The point here is that today we would generally call these statistical/theoretical models “algorithms”, but remember there was no Hadoop, Apache Spark, BigInsights or MapReduce in those days. Data sets were compiled and analysed with paper and quill (not even a ballpoint pen).
Let’s roll on to the 1940s. During the Battle of the Atlantic, the United Kingdom’s merchant fleets were being decimated by U-boat attacks. From the U-boat pens of Lorient in France, communications were issued that were encoded by a machine called “Enigma”.
Cryptography had moved on considerably, and many thought the German naval code simply unbreakable (the same way many feel about today’s encryption).
Faced with seemingly impossible odds, in stepped Alan Turing. Using a refined version of Bayesian theory, Turing slowly but surely built Bombes (electromechanical code-breaking machines) and used long strips of thin cardboard to decrypt the German cyphers.
Turing eventually found a way to weigh probabilities across large data sets, scoring how likely it was that each deciphered message belonged to a given character set. This allowed German naval messages to be decrypted, saving thousands of lives.
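Turing’s Bayesian technique at Bletchley Park, known as Banburismus, measured evidence in units he called bans and decibans: the base-10 logarithm of a likelihood ratio, so that independent clues can simply be added together. The Python sketch below is a simplified illustration in that spirit, not a reconstruction of the wartime procedure. It scores how strongly letter coincidences between two aligned texts favour “correctly aligned” over “unrelated”. The repeat rate of 1/17 for aligned German text is an assumed, illustrative figure, the sample strings are invented, and the function names are hypothetical.

```python
import math

# Assumed, illustrative rates: the chance that two letters at the same
# position coincide if the texts are correctly aligned versus unrelated.
P_REPEAT_ALIGNED = 1 / 17
P_REPEAT_RANDOM = 1 / 26


def decibans(likelihood_ratio):
    """Weight of evidence in decibans: ten times the base-10 log of the ratio."""
    return 10 * math.log10(likelihood_ratio)


def score_alignment(text_a, text_b):
    """Sum the evidence from each aligned letter pair; positive favours alignment."""
    total = 0.0
    for a, b in zip(text_a, text_b):
        if a == b:
            total += decibans(P_REPEAT_ALIGNED / P_REPEAT_RANDOM)
        else:
            total += decibans((1 - P_REPEAT_ALIGNED) / (1 - P_REPEAT_RANDOM))
    return total


# Invented example strings, not real intercepts.
print(f"{score_alignment('QWERTZUIOPAS', 'QWXRTZLIOPMS'):+.2f} db")
```

Under these assumed rates each coincidence adds about +1.8 decibans and each mismatch subtracts about 0.1, so a handful of matches quickly outweighs many misses. Adding logarithms rather than multiplying probabilities is what made this kind of scoring practical by hand.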
Roll on to 2016: so where does all this leave us today? Let’s look at artificial intelligence. Many of today’s AI algorithms use Bayes to estimate probabilities and make judgements. I would also argue that most, if not all, big data and analytics products in existence are based in some form on Bayesian principles.
The analysis of large data sets in fields from DNA through to health, economics, chemistry and computer science uses these principles to forecast, analyse and refine existing data and predictions. The use of so-called Bayesian networks, which model cause-and-effect relationships, coupled with the falling costs of computing and cloud, is propelling discovery like at no other time before.
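A Bayesian network encodes cause and effect as a graph, with a table of conditional probabilities for each node, and answers questions by applying Bayes’ Theorem across that graph. The minimal Python sketch below uses the textbook rain/sprinkler/wet-grass example with made-up probabilities and does exact inference by brute-force enumeration. The variable names and the `query` helper are illustrative; production tools use more efficient algorithms such as variable elimination, but the underlying logic is the same.

```python
from itertools import product

# Made-up probabilities. Rain and sprinkler are independent causes of wet grass.
P_RAIN = 0.2
P_SPRINKLER = 0.1
P_WET = {  # P(wet grass | rain, sprinkler)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Probability of one complete assignment, using the network's factorisation."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    p_wet = P_WET[(rain, sprinkler)]
    return p * (p_wet if wet else 1 - p_wet)


def query(target, evidence):
    """P(target is True | evidence), summing the joint over every consistent world."""
    numerator = denominator = 0.0
    for rain, sprinkler, wet in product([True, False], repeat=3):
        world = {"rain": rain, "sprinkler": sprinkler, "wet": wet}
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(rain, sprinkler, wet)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


print(f"P(rain | wet) = {query('rain', {'wet': True}):.2f}")  # 0.74
print(f"P(rain | wet, sprinkler) = {query('rain', {'wet': True, 'sprinkler': True}):.2f}")  # 0.24
```

Seeing wet grass raises the probability of rain from 0.2 to about 0.74, but also learning that the sprinkler was on drops it to about 0.24: the second cause “explains away” the effect. Reasoning from effects back to competing causes in this way is what these networks are used for.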
My advice, though: if you are thinking about adopting a big data approach in your organisation, why not look to the past and show your project sponsors that Big Data is actually not new at all?
For further research, I highly recommend the book “The Theory That Would Not Die” by Sharon Bertsch McGrayne.
Andrew McLean is the Studio Director at Disruptive Live, a Compare the Cloud brand. He is an experienced leader in the technology industry, with a background in delivering innovative & engaging live events. Andrew has a wealth of experience in producing engaging content, from live shows and webinars to roundtables and panel discussions. He has a passion for helping businesses understand the latest trends and technologies, and how they can be applied to drive growth and innovation.