
Open Source in Artificial Intelligence: Why it Matters to Give Away Your Software for Free

I. Introduction

Open sourcing a technology can seem counterintuitive at first glance. Why on earth should a company give away something it has invested money and people in? I have already written on this trend, but I keep sharpening my thinking around it, and this post is the result of some recent considerations.

The open source model is quite hard to reconcile with the traditional SaaS model, especially in the financial sector. Yet we are seeing many firms provide cutting-edge technologies and algorithms for free. While in some cases there is a specific business motivation behind it (e.g., Google releasing TensorFlow to avoid conflicts of interest with its cloud offering), the decision to open source (part of) one's technology represents an emerging trend in its own right.

Tools are nowadays less relevant than people or data, and a sharing mindset is a key asset for organizations. Starting from this premise, we can divide the considerations on open source into two clusters: business considerations and individual considerations.


II. The Business Perspective

From a business perspective, the basic idea is that it is really hard to keep pace with current technological development, and you don't want your technology to become obsolete in three months' time. It is better to give it out for free and set ...


Read More on Datafloq

A Brief History of AI

This article appeared on Medium, where it was awarded the Silver badge by KDnuggets, and the blog has also been recognized as one of the top 50 AI blogs by Feedspot.

I. The origins

In spite of all the current hype, AI is not a new field of study; its foundations lie in the fifties. If we exclude the purely philosophical reasoning path that runs from the Ancient Greeks to Hobbes, Leibniz, and Pascal, AI as we know it officially started in 1956 at Dartmouth College, where the most eminent experts gathered to brainstorm on the simulation of intelligence.

This happened only a few years after Asimov set down his three laws of robotics, but more relevantly after the famous paper published by Turing (1950), in which he proposed for the first time the idea of a thinking machine and the now-popular Turing test to assess whether such a machine shows, in fact, any intelligence.

As soon as the research group at Dartmouth publicly released the contents and ideas that arose from that summer meeting, a flow of government funding was directed toward the study of creating a nonbiological intelligence.


II. The phantom menace

At that time, AI seemed to be within easy reach, but it turned out that was not the case. At ...


Read More on Datafloq

How Artificial Intelligence Impacts Financial Services

In financial services, Artificial Intelligence uses structured and unstructured data to improve customer experience and engagement, detect outliers and anomalies, increase revenues, reduce costs, find predictability in patterns, and increase forecast reliability... but isn't that true of every other industry as well? We all know this story, right? So what is really peculiar about AI in financial services?

First of all, FS is an industry full of data. You might expect this data to be concentrated in the hands of big financial institutions, but much of it is actually public, and thanks to the new EU payment directive (PSD2), larger datasets are becoming available to smaller players as well. AI can then be developed and applied more easily, because the barriers to entry are lower than in other sectors.

Second, many of the underlying processes are relatively easy to automate, while many others can be improved by either brute-force computation or sheer speed. Historically, it is also one of the sectors that has needed this type of innovation the most; it is incredibly competitive and always looking for new sources of ROI. Bottom line: the marginal impact of AI is greater than in other sectors.

Third, the transfer of wealth across different generations makes the field really fertile for AI. AI needs ...


Read More on Datafloq

How Artificial Intelligence Affects and Changes the Insurance Industry

There are plenty of startups out there working at the intersection of AI and insurance, and it is essential to look at least at some of them to understand the future direction of the industry, as well as the kind of improvements AI is bringing to the insurtech space. An interesting thing to notice is that most of the innovation is happening in the UK rather than in other countries, across all the segments proposed below.

Claim processing

Shift Technology separates valid claims from those that deserve further validation; Tractable is trying to automate experts' tasks for insurers; ControlExpert has a specific focus on car claims; Cognotekt optimizes internal business processes, as does Snapsheet; Motionscloud offers mobile claim management solutions; and finally, RightIndem aims to help insurers deliver on-premise, smoothing the claims flow.

Virtual Agents & Chatbots

Spixii is an automated insurance agent that helps you buy any insurance coverage you might want; Cognicor is a virtual assistant that offers customer care services; Conversica identifies which leads intend to purchase, while Your.MD is a personal health assistant that analyzes symptoms and produces advice. MedWhat instead uses EMR (medical records) to assist the patient as if it were a virtual doctor, and Babylon gives medical advice while taking care of tight budget constraints. Insurify is another personal insurance agent that works as a comparator for car ...


Read More on Datafloq

The New CxO Gang: Data, AI, and Robotics

It has been said that this new wave of exponential technologies will threaten a lot of jobs, both blue- and white-collar ones. But if on the one hand many roles will disappear, on the other hand, in the very short term, we are seeing new people emerge from the crowd to lead this revolution and set the pace.

These are people who really understand both the technicalities of the problems and the business implications of the new technologies, and who can easily plan how to embed these new capabilities in enterprise contexts.

Hence, I am going to briefly present three of them, i.e., the Chief Data Officer (CDO), the Chief Artificial Intelligence Officer (CAIO), and the Chief Robotics Officer (CRO). Sadly, I have never heard of a 'Chief of Data Science'; for some strange reason, the role is usually called either 'Head of Data Science' or 'Chief Analytics Officer' (as if data scientists did not deserve someone at C-level to lead their efforts).

Let’s see then who they are and what they would be useful for.

I. The Chief Data Officer (CDO)

Apparently, it is a new role, born in a lighter form straight after the financial crisis and springing from the need to ...


Read More on Datafloq

Big Data and Risk Management in Financial Markets (Part II)

I. Introduction to forecasting

If you missed the first part, I suggest you read it before going through this article. It gives a good introduction as well as an overview of traditional risk management and big data simulation. This article is instead more focused on big data forecasting.

There are nowadays several new techniques or methods borrowed from other disciplines which are used by financial practitioners with the aim of predicting future market outcomes.

Eklund and Kapetanios (2008) provided a good review of all the new predictive methods, and I am borrowing their classification of forecasting methods here, which divides techniques into four groups: single-equation models that use the whole dataset; models that use only a subset of the whole dataset, even though a complete set is available; models that use partial datasets to estimate multiple forecasts that are averaged later on; and finally, multivariate models that use the whole dataset.


II. Single equation models

The first group is quite wide and includes common techniques used differently, such as ordinary least squares (OLS) regression or Bayesian regression, as well as new advancements in the field, as in the case of factor models.

In the OLS model, when the number of regressors exceeds the number of observations, the generalized inverse has to be used in order to estimate the parameters.
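To make that concrete, here is a minimal sketch in Python with synthetic data (purely illustrative, not drawn from Eklund and Kapetanios): when the regressors outnumber the observations, X'X cannot be inverted, and NumPy's Moore-Penrose pseudo-inverse provides the minimum-norm least-squares estimate instead.

```python
# Minimal sketch with synthetic data: OLS when regressors outnumber observations.
# X'X is singular here, so the generalized (Moore-Penrose) inverse is used.
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_vars = 50, 200                      # more predictors than observations
X = rng.normal(size=(n_obs, n_vars))
beta_true = np.zeros(n_vars)
beta_true[:5] = [1.5, -2.0, 0.7, 3.0, -1.2]  # only a handful of true drivers
y = X @ beta_true + 0.1 * rng.normal(size=n_obs)

# np.linalg.inv(X.T @ X) would fail here (singular matrix); pinv does not.
beta_hat = np.linalg.pinv(X) @ y             # minimum-norm least-squares solution
print(np.round(beta_hat[:5], 2))             # shrunk estimates of the first coefficients
```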

Bayesian regression (De Mol, Giannone, ...


Read More on Datafloq

Big Data Strategy (Part III): is Your Company Data-Driven?

If you missed the first two parts, I previously proposed some tips for analyzing corporate data as well as a data maturity map to understand an organization's stage of data development. In this final article, I want to conclude the mini-series with some final food for thought and considerations on big data capabilities in a company context.

I. Where is home for big data capabilities?

First of all, I want to spend a few more words on the organizational home (Pearson and Wegener, 2013) for data analytics. I have claimed that the Centre of Excellence is the cutting-edge structure for incorporating and supervising the data functions within a company. Its main task is to coordinate cross-unit activities, which include:


Maintaining and upgrading the technological infrastructures;
Deciding what data have to be gathered and from which department;
Helping with talent recruitment;
Planning the insight generation phase and setting the privacy, compliance, and ethics policies.


However, other forms may exist, and it is essential to know them since sometimes they might fit better into preexisting business models.



Figure: Data analytics organizational models

The figure shows different combinations of data analytics independence and business models. It ranges from business units (BUs) that are completely independent of one another, to independent BUs that join efforts in some ...


Read More on Datafloq

AI and Speech Recognition: A Primer for Chatbots

Conversational User Interfaces (CUIs) are at the heart of the current wave of AI development. Although many applications and products out there are simply "Mechanical Turks" — machines that pretend to be automated while a hidden person actually does all the work — there have been many interesting advancements in speech recognition, coming from both the symbolic and the statistical learning approaches.

In particular, deep learning is drastically augmenting the abilities of bots with respect to traditional NLP (i.e., bag-of-words clustering, TF-IDF, etc.) and is creating the concept of "conversation-as-a-platform", which is disrupting the apps market.
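As a point of reference, here is a minimal sketch of the "traditional NLP" baseline mentioned above (bag-of-words counts re-weighted by TF-IDF), assuming scikit-learn is available; the toy utterances are invented for illustration only.

```python
# Toy illustration of the traditional NLP baseline: bag-of-words + TF-IDF weighting.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "I want to renew my car insurance",
    "how do I file a claim for my car",
    "what is the premium for home insurance",
]

vectorizer = TfidfVectorizer()             # tokenizes, builds the vocabulary, applies TF-IDF
tfidf = vectorizer.fit_transform(corpus)   # sparse matrix: one weighted vector per utterance

print(vectorizer.get_feature_names_out())  # the bag of words the model "understands"
print(tfidf.toarray().round(2))            # word order and context are lost, which is what deep learning adds back
```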

Our smartphone currently represents the most expensive area we purchase per square centimeter (even more expensive than the price per square meter of houses in Beverly Hills), and it is not hard to envision that having a bot as the unique interface will make this area worth almost nothing.

None of this would be possible, though, without heavy investment in speech recognition research. Deep Reinforcement Learning (DRL) has been the boss in town for the past few years, and it has been fed by human feedback. However, I personally believe that soon we will move toward B2B (bot-to-bot) training for a very simple reason: the reward structure. Humans spend time training their bots if ...


Read More on Datafloq

The Psychology of Data Science

I have recently published a piece on what it means and what it takes to be a data scientist. I want to add a further consideration, which lies at the intersection of science and psychology.

I. Data Scientists’ Characteristics

No scientist is exactly like another, and this is true for data scientists as well. Even if data science seems to be a field mainly run by American white males with a PhD (as I inferred from King and Magoulas, 2015), this says nothing conclusive about the ideal candidate to hire. The suggestion is to value skills and capabilities more than titles or formal education (there are not many academic programs structured well enough to signal the right set of competencies to potential employers).

So far, the paths followed to become a data scientist have been unconventional and varied, so it is important to assess abilities rather than simply deciding based on the type of background or degree level. Never forget that one of the real extra values added by data science is the contamination between different fields and its cross-sectional applications.

But there is also another relevant aspect to take into account in choosing the right candidate for your team, and that is Psychology.

II. How ...


Read More on Datafloq

Data Security, Data Ethics, and Data Ownership

I. The problem(s)

Data security represents one of the main problems of this data-rich generation, since a greater magnitude of data comes with looser control and a higher probability of fraud, a higher likelihood of losing one's privacy, and of becoming the target of illicit or unethical activities. Today more than ever, a universal data regulation is needed — and some steps have already been taken toward one (OECD, 2013). This is especially true because everyone complains about privacy leakages, but no one wants to give up the extra services and customized products that companies are developing based on our personal data.

It is essential to protect individual privacy without erasing companies’ capacity to use data for driving businesses in a heterogeneous but harmonized way. Any fragment of data has to be collected with prior explicit consent, and guaranteed and controlled against manipulation and fallacies. A privacy assessment to understand how people would be affected by the use of data is crucial as well.

II. Fairness and Data Minimization

There are two important concepts to be considered from a data protection point of view: fairness and minimization.

Fairness concerns how data are obtained, and the transparency needed from organizations that are collecting them, especially about their future potential uses.

Data minimization, instead, concerns the ability to gather ...


Read More on Datafloq

Who is Best Positioned to Invest in Artificial Intelligence? A Descriptive Analysis

It seems to me that the hype around AI makes it really difficult for experienced investors to understand where the real value and innovation are. I would therefore like to humbly try to bring some clarity to what is happening on the investment side of the artificial intelligence industry.

We have seen how, in the past, the development of AI was halted by the absence of funding, and thus studying the current investment market is crucial to identifying where AI is going. First of all, it should be clear that investing in AI is extremely cumbersome: the level of technical complexity goes beyond the purely commercial scope, and not all venture capitalists are able to fully comprehend the functional details of machine learning. This is why the figures of the "Advisor" and "Scientist-in-Residence" are becoming extremely important nowadays. Those roles also help in setting the right level of expectations, and in figuring out what is possible and what is not.

AI investors are also slightly different from other investors: they should have a deep capital base (it is still not clear which approach will pay off) and a higher-than-usual risk tolerance: investing in AI is a marathon, and it might take ten years ...


Read More on Datafloq

Artificial Intelligence Classification Matrix: A Framework to Classify AI Startups

All the problems discussed in the previous posts can create two major cross-sectional issues: the likelihood of running out of money before hitting relevant milestones toward the next investment, and the dilemma of whether to pursue specific business applications to break even instead of focusing on product development.

When it comes to classifying the different companies operating in the space, there are several ways to think about machine intelligence startups (e.g., the classification proposed by Bloomberg Beta investor Shivon Zilis in 2015 is very accurate and useful for this purpose). I believe, though, that too narrow a framework might be counterproductive given the flexibility of the sector and the ease of transitioning from one group to another, so I preferred to create a four-major-cluster categorization:

i) Academic spin-offs: these are the more long-term, research-oriented companies, which tackle problems that are hard to crack. The teams are usually really experienced, and they are the real innovators who make the breakthroughs that advance the field;

ii) Data-as-a-service (DaaS): this group includes companies that collect specific huge datasets, or create new data sources by connecting unrelated silos;

iii) Model-as-a-service (MaaS): this seems to be the most widespread class of companies, and it is made up of firms that are commoditizing their ...


Read More on Datafloq

The 37–78 Paradigm: A New Artificial Intelligence Framework

I truly believe AI is life-changing, and I want to highlight three final characteristics that AI-as-a-technology is introducing. First of all, AI is disrupting the traditional IoT business model because it brings analytics to final customers instead of centralizing it.

Second, it is forcing businesses to keep customers in the loop — and the reasons are manifold:

i) establishing trust in the product and the company;

ii) increasing client retention by building habitual behaviours;

iii) sensibly improving the product through feedback.

This shift of focus toward the final user as part of product development is quickly becoming essential, to the point that it represents a new business paradigm, i.e., the "37–78 Paradigm". I named this new pattern after the events of March 2016, in which AlphaGo defeated Lee Sedol at the game of Go. With move 37, AlphaGo surprised Lee Sedol with a move that no human would ever have tried or seen coming, and thus it won the second game. Lee Sedol reflected on that game, getting used to that kind of move and building the habit of thinking from a new perspective. He started realizing (and trusting) that the move made by the machine was indeed superb, and in game four he surprised ...


Read More on Datafloq

How AI is Revolutionizing Business Models

AI is introducing radical innovation even in the way we think about business, and the aim of this section is indeed to categorize different AI companies and business models.

In terms of business models, it is possible to see the AI sector as really similar to the biopharma industry: expensive and long R&D; long investment cycles; low-probability, enormous returns; and a concentration of funding in specific phases of development. There are, however, two differences between the two fields: the experimentation phase, which is much faster and less painful for AI, and the (absent) patenting period, which forces AI companies to continuously evolve and to use alternative revenue models (e.g., the freemium model).

I. The DeepMind Strategy and the Open Source Model

If we look at the incumbents' side, we can notice two different nuances in the evolution of their business models. First, the growth model is changing: instead of competing with emerging startups, the biggest incumbents are pursuing an aggressive acquisition strategy.

I named this new expansion strategy the "DeepMind strategy" because it became extremely common after Google's acquisition of DeepMind.

The companies are purchased when they are still early stage, in their first 1–3 years of life, when the focus is more on people and pure technological advancement than on revenues (AI is the ...


Read More on Datafloq

How Artificial Intelligence Will Revolutionize the Insurance Industry

This is a short piece summarizing applications of AI in the insurance industry. A longer article on this vertical will be out in a few weeks.

Artificial Intelligence (AI) is revolutionizing every industry, and insurance will be affected as well. As already stated in previous posts, AI today is perceived in three different ways: it is something that might answer all your questions with an increasing degree of accuracy ("the Oracle"); it could do anything it is commanded to do ("the Genie"); or it might act autonomously to pursue a certain long-term goal ("the Sovereign"). My personal definition is the following:

An artificial intelligence is a system that can learn how to learn, or in other words a series of instructions (an algorithm) that allows computers to write their own algorithms without being explicitly programmed.

An artificial engine can also be classified in one of three ways. The first is narrow AI, which is nothing more than a specific domain application or task that gets better by ingesting further data and "learns" how to reduce its output error. An example here is DeepBlue for chess, but more generally this group includes all the functional technologies that serve a specific purpose. These systems are usually quite controllable ...


Read More on Datafloq

What You Are Too Afraid to Ask About Artificial Intelligence (Part III): Technologies

As explained before, the recent surge of AI, and the fact that it is rapidly becoming a dominant discipline, are partially due to the exponential technological progress we have witnessed over the last few years. What is interesting to point out, though, is that AI is deeply influencing and shaping the course of technology as well.

First of all, Graphics Processing Units (GPUs) have been adapted from traditional graphical user interface applications to alternative parallel computing operations. NVIDIA is leading this trend and pioneering the market with the CUDA platform and the recent introduction of the Tesla P100 (the first GPU designed for hyperscale data center applications). On top of the P100, they also created the first full server appliance platform (named DGX-1), which will bring deep learning to an entirely new level. Very recently, they also released the Titan X, the biggest GPU ever built (3,584 CUDA cores).
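For readers who have never touched GPU computing, here is a minimal sketch of the kind of parallel numerical work these cards accelerate, assuming an NVIDIA GPU with CUDA drivers and the CuPy library (which mirrors the NumPy API); the matrix sizes are arbitrary.

```python
# Minimal sketch: offloading a dense matrix product to the GPU with CuPy.
# Assumes an NVIDIA GPU, CUDA drivers, and the cupy package are installed.
import numpy as np
import cupy as cp

a_cpu = np.random.rand(2048, 2048).astype(np.float32)
b_cpu = np.random.rand(2048, 2048).astype(np.float32)

a_gpu = cp.asarray(a_cpu)   # copy the operands into GPU memory
b_gpu = cp.asarray(b_cpu)
c_gpu = a_gpu @ b_gpu       # the multiply runs in parallel across the CUDA cores
c_cpu = cp.asnumpy(c_gpu)   # copy the result back to the host

print(c_cpu.shape)          # (2048, 2048)
```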

In general, the most impressive developments we have observed are related to chips, especially Neuromorphic Processing Units (NPUs) designed to emulate the human brain. Specific AI chips have been created by major incumbents: in 2016 IBM released the TrueNorth chip, which is claimed to work very similarly to a mammalian brain. The chip is made of 5.4 billion transistors, and ...


Read More on Datafloq

What You are Too Afraid to Ask About Artificial Intelligence (Part I): Machine Learning

AI is moving at a stellar speed and is probably one of the most complex and most relevant sciences of the moment. The complexity here is not meant as the level of difficulty in understanding and innovating (although that is, of course, quite high), but as the degree of interrelation with other, apparently disconnected, fields.

There are basically two schools of thought on how an AI should be properly built: the Connectionists start from the assumption that we should draw inspiration from the neural networks of the human brain, while the Symbolists prefer to start from banks of knowledge and fixed rules about how the world works. Given these two pillars, they believe it is possible to build a system capable of reasoning and interpreting.

In addition, a strong dichotomy is naturally taking shape in terms of problem-solving strategy: you can solve a problem with a simple algorithm that increases its accuracy over time (the iterative approach), or you can divide the problem into smaller and smaller blocks (the parallel, sequential decomposition approach).

To date, there is no clear answer as to which approach or school of thought works best, and thus I find it appropriate to briefly discuss the major advancements in both pure machine learning techniques (Part I) ...


Read More on Datafloq

13 Forecasts on Artificial Intelligence

We have discussed some AI topics in the previous posts, and the extraordinarily disruptive impact AI has had over the past few years should now seem obvious. However, what everyone is now wondering is where AI will be in five years' time. I find it useful, then, to describe a few emerging trends we are starting to see today, as well as to make a few predictions about future developments in machine learning. The following list does not claim to be either exhaustive or set in stone, but it comes from a series of personal considerations that might be useful when thinking about the impact of AI on our world.

The 13 Forecasts on AI

1. AI is going to require fewer data to work

Companies like Vicarious or Geometric Intelligence are working toward reducing the amount of data needed to train neural networks. The amount of data required today represents the major barrier to the spread of AI (as well as the major competitive advantage), and the use of probabilistic induction (Lake et al., 2015) could solve this major problem for AGI development. A less data-intensive algorithm might eventually use the concepts it learns and assimilates in richer ways, whether for action, imagination, or exploration.

2. New types of learning methods are the key

The new ...


Read More on Datafloq

Artificial Intelligence: What is It and Why Now?

Artificial Intelligence (AI) nowadays represents a paradigm shift that is driving both scientific progress and the evolution of industry. Given the intense level of domain knowledge required to really appreciate the technicalities of artificial engines, what AI is and what it can do is often misunderstood: the general audience is fascinated by its development and frightened by Terminator-like scenarios; investors are mobilizing huge amounts of capital without a clear picture of the competitive drivers that characterize companies and products; and managers are rushing to get their hands on the latest software that may improve their productivity and revenues, and eventually their bonuses.

Even though the general optimism about advancements in artificial intelligence is evident (Muller and Bostrom, 2016), I believe that in order to foster the pace of growth facilitated by AI, it is necessary to clarify some concepts.

Basic Definitions & Categorization

First, let’s describe what artificial intelligence means. According to Bostrom (2014), AI today is perceived in three different ways: it is something that might answer all your questions, with an increasing degree of accuracy (“the Oracle”); it could do anything it is commanded to do (“the Genie”), or it might act autonomously ...


Read More on Datafloq

Why Artificial Intelligence is Now More Important Than Ever

The reason why we are studying AI more actively right now is clearly the potential applications it might have, the attention it has received from the media and the general public, and the incredible amount of funding that investors are devoting to it as never before.

Machine learning is quickly being commoditized, and this encourages a more profound democratization of intelligence, although this is true only for low-order knowledge. If on the one hand a large set of services and tools is now available to final users, on the other hand the real power is concentrating in the hands of the few major incumbents with the data availability and computational resources to really exploit AI at a higher level.

Apart from this technological polarization, the main problem the sector is experiencing can be divided into two key branches: two misalignments, i) between long-term AGI research and the short-term business applications it is being sacrificed for, and ii) between what AI can actually do and what people think or assume it does. Both issues stem from the high level of technical knowledge intrinsically required to understand AI, and they are creating hype around it. Part of the hype is clearly justified, because AI has been useful in ...


Read More on Datafloq

Big Data and Risk Management in Financial Markets (Part I)

We have seen how the interdisciplinary use of big data has affected many sectors. Examples include contagion spreading (Culotta, 2010); predicting the success of music albums (Dhar and Chang, 2009); and presidential elections (Tumasjan et al., 2010).

In financial markets, sentiment analysis probably represents the largest and best-known application of machine learning techniques to big datasets (Bollen et al., 2011). In spite of all the hype, though, risk management is still an exception: new information and technology stacks have not brought as many benefits to risk management as they have, for instance, to trading.

Risk is indeed usually addressed from an operational perspective, from a customer-relationship angle, or specifically for fraud prevention and credit scoring. However, applications strictly related to financial markets are still not widespread, mainly because of the following problem: in theory, more information should entail a higher degree of accuracy, while in practice it (also) exponentially increases the complexity of the system, making it really complicated to identify and promptly analyze the unstructured data that can be extremely valuable in such fast-paced environments.

The likelihood (and the risk) of a systemic network failure is then multiplied by the increasing degree of interconnection of the markets. More and more data can help central ...


Read More on Datafloq

Big Data Strategy (Part II): a Data Maturity Map

As shown in Part I, there is a series of issues related to internal data management policies and approaches. The answers to these problems are not trivial, and we need a framework to approach them.

A Data Stage of Development Structure (DS2) is a maturity model built for this purpose, a roadmap developed to implement a revenue-generating and impactful data strategy. It can be used to assess the current situation of the company and to understand the future steps to undertake to enhance internal big data capabilities.

The following table provides a four-by-four matrix in which the increasing stages of evolution are labelled Primitive, Bespoke, Factory, and Scientific, while the dimensions along which they are assessed are Culture, Data, Technology, and Talent. The final considerations are drawn in the last row, which concerns the financial impact on the business of a well-set data strategy.



Figure 1. Data Stage of Development Structure (DS2)

Stage one is about raising awareness: the realization that data science could be relevant to the company's business. In this phase, there is no governance structure in place, no pre-existing technology and, above all, no organization-wide buy-in. Yet tangible projects still result from individuals' data enthusiasm being channeled into something actionable. The set of skills owned is ...


Read More on Datafloq

Big Data Strategy (Part I): Tips for Analyzing Your Data

We have seen in a previous post what the common misconceptions in big data analytics are, and how important it is to start looking at data with a goal in mind.

Even if I personally believe that posing the right question is 50% of what a good data scientist should do, there are alternative approaches that can be implemented. The main one that is often suggested, in particular by non-technical professionals, is the "let the data speak" approach: a sort of magical random data discovery that should spot valuable insights a human analyst would not notice.

Well, the reality is that this is a highly inefficient method: (random) data mining is resource-consuming and potentially value-destructive. The main reason why data mining is often ineffective is that it is undertaken without any rationale, and this leads to common mistakes such as false positives; over-fitting; neglected spurious relations; sampling biases; causation-correlation reversal; wrong variable inclusion; or poor model selection (Doornik and Hendry, 2015; Harford, 2014). We should pay specific attention to the causation-correlation problem, since observational data only capture the latter. However, according to Varian (2013), the problem can be addressed through experimentation.
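To see why rationale-free mining produces false positives, here is a minimal sketch (synthetic data, illustrative only): a purely random target still appears to have several "significant" predictors once enough candidate variables are screened.

```python
# Illustrative sketch: mining random data "finds" spurious predictors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_obs, n_features = 100, 200                  # many candidate variables, few observations
X = rng.normal(size=(n_obs, n_features))
y = rng.normal(size=n_obs)                    # the target is pure noise, unrelated to X

p_values = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(n_features)])
false_hits = int((p_values < 0.05).sum())
print(f"'Significant' predictors found by chance: {false_hits} of {n_features}")
# Roughly 5% of the features (~10 here) will clear the 5% threshold by luck alone,
# which is exactly the false-positive trap of goal-free data mining.
```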

Hence, I think that a hybrid approach is ...


Read More on Datafloq

Data science and Big Data: Definitions and Common Myths

Big data is nowadays one of the most common buzzwords you might have heard of. There are many ways to define what big data is, and this is probably why it still remains a really difficult concept to grasp.

Some describe big data as datasets bigger than a certain threshold, e.g., over a terabyte (Driscoll, 2010), while others look at big data as datasets that crash conventional analytical tools like Microsoft Excel. More renowned works, though, have identified big data as data that display the features of Variety, Velocity, and Volume (Laney, 2001; McAfee and Brynjolfsson, 2012; IBM, 2013; Marr, 2015). All of these definitions are somehow true, although I think incomplete.

The first class of definitions is indeed partial, since it relates to a purely technological issue, i.e., the computational need exceeding the available analytical power of a single tool or machine. This would not explain, however, why big data emerged only a few years ago and not back in the Nineties.

The second opinion is instead too constraining, since it assumes that all the features have to be satisfied to talk about big data, and it also seems to identify the causes that originated big data (i.e., a huge amount of fast and diverse new data sources), ...


Read More on Datafloq
