Big Data is a term often used to denote collections of data that are too large to be processed by conventional data-analysis methods. Many fields have an abundance of data, for example chess, particle physics, and astrophysics. This keynote aims to shed light on this emerging field, to show present and future technical challenges for artificial intelligence and, above all, to demystify Big Data by separating it from the meanings and opinions that laymen attach to it.
Almost all available knowledge is currently stored in large databases. Traditional data-extraction methods assume that the data under investigation is structured in such a way that the relevant information is already known. For Big Data, it is known in advance neither what is relevant nor how to extract it. Identifying and extracting relevant data from these enormous sources is therefore one of the central challenges.
A second crucial difference between traditional data processing and the processing of Big Data is their focus, which has shifted from establishing causal relations to detecting correlations. In practice, correlations are easy to derive from data. Moreover, they form the relevant information for users and customers. As a case in point, we mention a well-known result in crime prevention: correlation analysis shows that more crimes happen around supermarkets when it is warm outside. The causal relation between the two is not immediately obvious, but this does not affect law-enforcement policy: there should be more police around supermarkets when it is hot, regardless of the cause of the crime. The observation that correlations are replacing causal relations in many domains is called the computational turn. Effectively, this characterizes the importance of Big Data. Moreover, finding correlations requires little intelligence, artificial or otherwise.
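To illustrate how little machinery is needed, the sketch below computes the Pearson correlation coefficient for the supermarket example. The temperature and incident figures are invented for demonstration only and carry no empirical weight.

```python
import numpy as np

# Hypothetical illustration: daily temperature readings and counts of
# incidents reported near supermarkets. The numbers are invented for
# demonstration purposes only.
temperature = np.array([12, 15, 19, 22, 25, 28, 30, 31])   # degrees Celsius
incidents   = np.array([ 3,  4,  4,  6,  7,  9, 10, 11])   # reports per day

# Pearson correlation coefficient: a single line of analysis suffices,
# which is exactly why correlations are so easy to derive from data.
r = np.corrcoef(temperature, incidents)[0, 1]
print(f"Pearson r = {r:.2f}")  # close to 1 for this strongly correlated sample
```

A coefficient near 1 signals a strong linear association, yet it says nothing about why the association holds, which is precisely the point of the computational turn.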
It follows that one of the challenges of Big Data lies in the interpretation of (correlational) results, by (1) visualizing the data and (2) forming a narrative. Visualizing the data is important for understanding the correlations that are found. It is also the first step of a narrative: what story does the data tell? For example, the Google Flu Trends database shows the regional prevalence of the flu. A narrative would then seek the underlying cause of these regional differences.
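As a minimal sketch of this first step, the snippet below plots hypothetical regional prevalence figures; the region names and percentages are invented stand-ins for the kind of output Google Flu Trends produced.

```python
import matplotlib.pyplot as plt

# Hypothetical regional flu-prevalence figures (invented for illustration);
# a real analysis would load actual surveillance data instead.
regions    = ["North", "East", "South", "West", "Centre"]
prevalence = [2.1, 3.4, 1.8, 4.0, 2.9]   # percent of the population

# Even a plain bar chart makes the regional differences visible at a
# glance, which is where the narrative begins.
plt.bar(regions, prevalence)
plt.ylabel("Flu prevalence (%)")
plt.title("Regional flu prevalence (hypothetical data)")
plt.show()
```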
Big Data is a promising area for future research in artificial intelligence. First of all, the identification of relevant data is a challenging task and will only become more challenging as the amount of collected data grows exponentially. Next, there is a theoretical challenge: the computational turn has shifted the focus to correlation. However, as the intelligence of computer programs progresses, it may again become possible to reason about data, build models, and derive causation from correlation.
Below we list ten application areas, two of which will be discussed during the keynote lecture.
- Safety (politics, military)
- Public Safety (Live View)
- Commerce (ads)
- Banking (money streams)
- Judiciary (CODR)
- Waterway transport
- Communication (Twitter, phablet)
- Education (MOOC)
- Public governance
- Warfare (Multi Agent Systems, Socio Cognitive Models)
To conclude, if advances in AI enable us to tackle the causation problem, the Big Data of today will become the traditional data of tomorrow.