It’s always interesting when technology issues reach a broad audience, and when Foreign Affairs published “The Rise of Big Data: How It’s Changing the Way We Think About the World,” you can be sure it created some hype.
It was written by Viktor Mayer-Schoenberger, Professor of Internet Governance and Regulation at the Oxford Internet Institute, and Kenneth Cukier, Data Editor of The Economist.
What’s so ingenious about the piece is that it puts Big Data into a much larger context, dipping back to the third century BC, when the sum of all human knowledge was thought to reside in the Library of Alexandria, and contrasting that with the roughly 1,200 exabytes of data estimated to exist in the world today. By doing so, Mayer-Schoenberger and Cukier have changed how we perceive Big Data. According to them, Big Data changes the playing field not just for technology, but for all types of data and our understanding of them.
In their piece, Cukier and Mayer-Schoenberger identify three ways Big Data will change our understanding of data:
Before the rise of Big Data, the only practical way to understand human behavior was to sample specific, segmented groups. Now, with Big Data, the sample is vastly larger; in fact, we can go so far as to say it approaches “all” of the data.
“Big data is a matter not just of creating somewhat larger samples but of harnessing as much of the existing data as possible about what is being studied,” the authors write. “We still need statistics; we just no longer need to rely on small samples.”
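The shift from small samples to harnessing everything can be sketched in a few lines of Python. This is a toy illustration with made-up numbers (the variable names and data are mine, not the authors’):

```python
import statistics

# Toy data set (made-up numbers): 1,000 observations of some metric.
all_observations = list(range(100, 1100))

# Traditional approach: estimate from a small sample.
small_sample = all_observations[::100]           # every 100th observation, n = 10
sample_mean = statistics.mean(small_sample)      # 550.0

# "N = all" approach: just compute over the entire data set.
population_mean = statistics.mean(all_observations)  # 599.5

print(sample_mean, population_mean)
```

The point is not that sampling is wrong, only that when the full data set is cheap to process, there is less reason to settle for an estimate of it.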
Data management experts spend much of their time discussing the importance of data governance and data quality, and there has been plenty of debate over how those disciplines could be applied to Big Data so that it can support informed decisions.
The problem is that much of this data is neither under your control nor easily covered by a governance program, which raises obvious quality concerns. Statisticians will tell you that with very large data sets this generally isn’t a problem, but it’s worth remembering that questions about the quality of a data set don’t simply disappear with Big Data.
The article clears this up by recounting the history of computer-aided language translation. In the 1990s, for instance, IBM attempted to build a translation system using probability and clean, professionally translated parliamentary transcripts. It wasn’t well received, nor was it very good.
Fast-forward to today: where do you go when you want an on-the-spot translation? Google. The company has accomplished something astounding by harnessing “messy” data from the internet, billions of translations scattered across the web, and turning it into a service covering more than 65 languages.
“Large amounts of messy data trumped small amounts of cleaner data,” the authors write.
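A toy sketch of why that can be true: when measurement errors are unbiased, they tend to cancel out across a large data set, while a small clean sample can still miss badly if it only covers a narrow slice. All numbers below are invented for illustration:

```python
import statistics

true_values = list(range(1000))      # the underlying quantity; true mean = 499.5

# Small, clean data: 10 exact measurements, but only of the first slice (biased sample).
clean_small = true_values[:10]       # mean = 4.5, far from the truth

# Large, messy data: every value measured, each off by an alternating +7/-7 error.
messy_large = [v + (7 if i % 2 == 0 else -7) for i, v in enumerate(true_values)]

print(statistics.mean(clean_small))  # the small clean sample misses badly
print(statistics.mean(messy_large))  # the errors cancel across the full data set
```

The errors here cancel by construction, which is the favorable case; systematically biased "mess" would not wash out this way.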
It’s one thing to know about data sets; it’s another to actually understand them and act on them. According to the authors, that is what drives the third shift in how we comprehend data: a focus on correlations rather than causation.