Big Data Strategy (Part III): is Your Company Data-Driven?
If you missed the first two parts, I have previously proposed some tips for analyzing corporate data as well as a data maturity map to understand the stage of data development of an organization. Now, in this final article, I want to conclude this mini-series with some final food for thought and considerations on big data capabilities in a company context.
I. Where is home for big data capabilities?
First of all, I want to spend a few more words on the organizational home (Pearson and Wegener, 2013) for data analytics. I claimed that the Centre of Excellence is the cutting-edge structure to incorporate and supervise the data functions within a company. Its main task is to coordinate cross-unit activities, which include:
Maintaining and upgrading the technological infrastructures;
Deciding what data have to be gathered and from which department;
Helping with talent recruitment;
Planning the insights generation phase and stating the privacy, compliance, and ethics policies.
However, other forms may exist, and it is essential to know them since sometimes they might fit better into preexisting business models.
Data analytics organizational models
The figure shows different combinations of data analytics independence and business models. It ranges from business units (BUs) that are completely independent of one another, to independent BUs that join efforts in some ...Read More on Datafloq
AI and Speech Recognition: A Primer for Chatbots
Conversational User Interfaces (CUI) are at the heart of the current wave of AI development. Although many applications and products out there are simply “Mechanical Turks” (machines that pretend to be automated while a hidden person is actually doing all the work), there have been many interesting advancements in speech recognition from the symbolic or statistical learning approaches.
In particular, deep learning is drastically augmenting the abilities of the bots with respect to traditional NLP (e.g., bag-of-words clustering, TF-IDF) and is creating the concept of “conversation-as-a-platform”, which is disrupting the apps market.
Our smartphones currently represent the most expensive area to purchase per square centimeter (even more expensive than the price per square meter of houses in Beverly Hills), and it is not hard to envision that having a bot as the sole interface will make this area worth almost zero.
None of this would be possible, though, without heavy investment in speech recognition research. Deep Reinforcement Learning (DRL) has been the boss in town for the past few years and it has been fed by human feedback. However, I personally believe that soon we will move toward B2B (bot-to-bot) training for a very simple reason: the reward structure. Humans spend time training their bots if ...Read More on Datafloq
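For readers unfamiliar with the traditional techniques mentioned above, here is a minimal sketch of bag-of-words counting and TF-IDF weighting using only the Python standard library. The function names, the whitespace tokenizer, and the three sample sentences are illustrative assumptions, not taken from any specific chatbot system, and several variants of the TF-IDF formula exist.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / total number of terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    bags = [bag_of_words(doc) for doc in documents]
    n_docs = len(bags)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())

    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical user utterances, invented for illustration.
docs = [
    "book a flight to rome",
    "book a table for two",
    "cancel my flight",
]
scores = tf_idf(docs)
print(sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True))
```

Terms that appear in only one sentence (such as "rome") receive a higher weight than terms shared across sentences (such as "book"), which is the whole point of the weighting. Word order and context are ignored entirely, which is one reason deep learning approaches improve on it.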
The Psychology of Data Science
I have recently published a piece on what it means and what it takes to be a data scientist. I want to add a further consideration, which lies at the intersection of science and psychology.
I. Data Scientists’ Characteristics
No two scientists are exactly alike, and this is true for data scientists as well. Even if data science seems to be a field mainly run by white American males with a PhD (what I inferred from King and Magoulas, 2015), this says nothing conclusive about the ideal candidate to hire. The suggestion is to value skills and capabilities more than titles or formal education (few academic programs are structured well enough to signal the right set of competencies to potential employers).
So far, the paths to becoming a data scientist have been varied and unconventional, so it is important to assess abilities instead of simply deciding based on the type of background or degree level. Never forget that one of the real sources of added value in data science is the cross-contamination of different fields and cross-sectional applications.
But there is also another relevant aspect to take into account in choosing the right candidate for your team, and that is psychology.
II. How ...Read More on Datafloq
Data Security, Data Ethics, and Data Ownership
I. The problem(s)
Data security represents one of the main problems of this data-intensive era, since a larger volume of data is correlated with looser control and a higher probability of fraud, as well as a higher likelihood of losing one’s privacy and becoming a target of illicit or unethical activities. Today more than ever a universal data regulation is needed, and some steps have already been taken toward one (OECD, 2013). This is especially true because everyone complains about privacy leaks, but no one wants to give up the extra services and customized products that companies are developing based on our personal data.
It is essential to protect individual privacy without erasing companies’ capacity to use data for driving businesses in a heterogeneous but harmonized way. Any fragment of data has to be collected with prior explicit consent, and guaranteed and controlled against manipulation and fallacies. A privacy assessment to understand how people would be affected by the use of data is crucial as well.
II. Fairness and Data Minimization
There are two important concepts to be considered from a data protection point of view: fairness and minimization.
Fairness concerns how data are obtained, and the transparency needed from organizations that are collecting them, especially about their future potential uses.
Data minimization, instead, concerns the ability to gather ...Read More on Datafloq
Who is Best Positioned to Invest in Artificial Intelligence? A Descriptive Analysis
It seems to me that the hype about AI makes it really difficult for experienced investors to understand where the real value and innovation are. I would therefore like to humbly try to bring some clarity to what is happening on the investment side of the artificial intelligence industry.
We have seen how, in the past, the development of AI was halted by the absence of funding, and thus studying the current investment market is crucial to identify where AI is going. First of all, it should be clear that investing in AI is extremely cumbersome: the level of technical complexity goes beyond the purely commercial scope, and not all venture capitalists are able to fully comprehend the functional details of machine learning. This is why the roles of “Advisor” and “Scientist-in-Residence” are becoming extremely important nowadays. Those roles also help in setting the right level of expectations, and in figuring out what is possible and what is not.
AI investors are also slightly different from other investors: they should have a deep capital base (it is still not clear what approach will pay off), and a higher than usual risk tolerance: investing in AI is a marathon, and it might take ten years ...Read More on Datafloq
Artificial Intelligence Classification Matrix: A Framework to Classify AI Startups
All the problems discussed in the previous posts can create two major cross-sectional problems: the likelihood of running out of money before hitting relevant milestones toward the next investment, and the question of whether to pursue specific business applications to break even instead of focusing on product development.
As for classifying the different companies operating in the space, there are several ways to think about machine intelligence startups (e.g., the classification proposed by Bloomberg Beta investor Shivon Zilis in 2015 is very accurate and useful for this purpose). I believe, though, that too narrow a framework might be counterproductive given the flexibility of the sector and the ease of transitioning from one group to another, and so I preferred to create a categorization with four major clusters:
i) Academic spin-offs: these are the more long-term, research-oriented companies, which tackle hard-to-crack problems. The teams are usually really experienced, and they are the real innovators who make breakthroughs that advance the field;
ii) Data-as-a-service (DaaS): this group includes companies that collect huge specific datasets, or create new data sources by connecting unrelated silos;
iii) Model-as-a-service (MaaS): this seems to be the most widespread class of companies, and it is made of those firms that are commoditizing their ...Read More on Datafloq
The 37–78 Paradigm: A New Artificial Intelligence Framework
I truly believe AI is life-changing, and I want to highlight three final characteristics that AI-as-a-technology is introducing. First of all, AI is disrupting the traditional IoT business model because it is bringing analytics to final customers instead of centralizing it.
Second, it is forcing businesses to keep customers in the loop, and the reasons are manifold:
i) establishing trust in the product and the company;
ii) increasing client retention by building habitual behaviours;
iii) significantly improving the product through feedback.
The shifting focus on the final user as part of product development is quickly becoming essential, to such a point that it represents a new business paradigm, i.e., the “Paradigm 37–78”. I named this new pattern after the events of March 2016, in which AlphaGo defeated Lee Sedol at the game of Go. With move 37, AlphaGo surprised Lee Sedol with a move that no human would have ever tried or seen coming, and thus it won the second game. Lee Sedol reflected on that game, getting used to that kind of move and building the habit of thinking with a new perspective. He started realizing (and trusting) that the move made by the machine was indeed superb, and in game four he surprised ...Read More on Datafloq
How AI is Revolutionizing Business Models
AI is introducing radical innovation even in the way we think about business, and the aim of this section is indeed to categorize different AI companies and business models.
In terms of business models, the AI sector can be seen as very similar to the biopharma industry: expensive and long R&D; long investment cycles; low-probability enormous returns; concentration of funding toward specific phases of development. There are, however, two differences between the two fields: the experimentation phase, which is much faster and less painful for AI, and the (absent) patenting period, which forces AI to continuously evolve and to use alternative revenue models (e.g., the freemium model).
I. The DeepMind Strategy and the Open Source Model
If we look from the incumbents’ side, we might notice two different nuances in the evolution of their business models. First, the growth model is changing. Instead of competing with emerging startups, the biggest incumbents are pursuing an aggressive acquisition strategy.
I named this new expansion strategy the “DeepMind strategy” because it has become extremely common since Google’s acquisition of DeepMind.
The companies are purchased when they are still at an early stage, in their first 1–3 years of life, when the focus is more on people and pure technological advancements than on revenues (AI is the ...Read More on Datafloq
How Artificial Intelligence Will Revolutionize the Insurance Industry
This is a short piece summarizing applications of AI in the insurance industry. A longer article on this vertical will be out in a few weeks.
Artificial Intelligence (AI) is revolutionizing every industry, and insurance will be affected as well. As already stated in previous posts, AI today is perceived in three different ways: it is something that might answer all your questions, with an increasing degree of accuracy (“the Oracle”); it could do anything it is commanded to do (“the Genie”), or it might act autonomously to pursue a certain long-term goal (“the Sovereign”). My personal definition is the following one:
An artificial intelligence is a system that can learn how to learn, or in other words a series of instructions (an algorithm) that allows computers to write their own algorithms without being explicitly programmed to do so.
An artificial engine can also be classified in three ways: a narrow AI, which is nothing more than a specific domain application or task that gets better by ingesting further data and “learns” how to reduce the output error. An example here is Deep Blue for chess, but more generally this group includes all the functional technologies that serve a specific purpose. These systems are usually quite controllable ...Read More on Datafloq
What You Are Too Afraid to Ask About Artificial Intelligence (Part III): Technologies
As we explained before, the recent surge of AI and its rapid rise to a dominant discipline are partially due to the exponential technological progress we have witnessed over the last few years. What is interesting to point out, though, is that AI is deeply influencing and shaping the course of technology as well.
First of all, Graphics Processing Units (GPUs) have been adapted from traditional graphical user interface applications to alternative parallel computing operations. NVIDIA is leading this flow and is pioneering the market with the CUDA platform and the recent introduction of the Tesla P100 platform (the first GPU designed for hyperscale data center applications). On top of the P100, they also created the first full server appliance platform (named DGX-1), which will bring deep learning to an entirely new level. Very recently, they also released the Titan X, which is the biggest GPU ever built (3,584 CUDA cores).
In general, the most impressive developments we observed are related to chips, especially Neuromorphic Processing Units (NPUs) designed to emulate the human brain. Specific AI chips have been created by major incumbents: IBM released in 2016 the TrueNorth chip, which is claimed to work very similarly to a mammalian brain. The chip is made of 5.4 billion transistors, and ...Read More on Datafloq
What You are Too Afraid to Ask About Artificial Intelligence (Part I): Machine Learning
AI is moving at stellar speed and is probably one of the most complex sciences of the present day. The complexity here is not meant as a level of difficulty in understanding and innovating (although, of course, this is quite high), but as the degree of interrelation with other, apparently disconnected fields.
There are basically two schools of thought on how an AI should be properly built: the Connectionists start from the assumption that we should draw inspiration from the neural networks of the human brain, while the Symbolists prefer to move from banks of knowledge and fixed rules on how the world works. Given these two pillars, they think it is possible to build a system capable of reasoning and interpreting.
In addition, a strong dichotomy is naturally taking shape in terms of problem-solving strategy: you can solve a problem through a simpler algorithm that increases its accuracy over time (iteration approach), or you can divide the problem into smaller and smaller blocks (parallel sequential decomposition approach).
To date, there is no clear answer on which approach or school of thought works best, and thus I find it appropriate to briefly discuss major advancements in both pure machine learning techniques (Part I) ...Read More on Datafloq
13 Forecasts on Artificial Intelligence
We have discussed some AI topics in the previous posts, and the extraordinary disruptive impact AI has had over the past few years should now seem obvious. However, what everyone is now thinking about is where AI will be in five years’ time. I find it useful then to describe a few emerging trends we are starting to see today, as well as to make a few predictions about future machine learning developments. The following list is meant to be neither exhaustive nor set in stone, but it comes from a series of personal considerations that might be useful when thinking about the impact of AI on our world.
The 13 Forecasts on AI
1. AI is going to require less data to work
Companies like Vicarious or Geometric Intelligence are working toward reducing the data burden needed to train neural networks. The amount of data required nowadays represents the major barrier to the spread of AI (and the major competitive advantage), and the use of probabilistic induction (Lake et al., 2015) could solve this major problem for AGI development. A less data-intensive algorithm might eventually use the concepts learned and assimilated in richer ways, either for action, imagination, or exploration.
2. New types of learning methods are the key
The new ...Read More on Datafloq
Artificial Intelligence: What is It and Why Now?
Artificial Intelligence (AI) nowadays represents a paradigm shift that is driving both scientific progress and industry evolution. Given the intense level of domain knowledge required to really appreciate the technicalities of artificial engines, what AI is and can do is often misunderstood: the general audience is fascinated by its development and frightened by Terminator-like scenarios; investors are mobilizing huge amounts of capital but do not have a clear picture of the competitive drivers that characterize companies and products; and managers are rushing to get their hands on the latest software that may improve their productivity and revenues, and eventually their bonuses.
Even though the general optimism around creating advancements in artificial intelligence is evident (Muller and Bostrom, 2016), in order to foster the pace of growth facilitated by AI I believe it would be necessary to clarify some concepts.
Basic Definitions & Categorization
First, let’s describe what artificial intelligence means. According to Bostrom (2014), AI today is perceived in three different ways: it is something that might answer all your questions, with an increasing degree of accuracy (“the Oracle”); it could do anything it is commanded to do (“the Genie”), or it might act autonomously ...Read More on Datafloq
Why Artificial Intelligence is Now More Important Than Ever
The reason why we are studying AI right now more actively is clearly because of the potential applications it might have, because of the media and general public attention it received, as well as because of the incredible amount of funding investors are devoting to it as never before.
Machine learning is being quickly commoditized, and this encourages a more profound democratization of intelligence, although this is true only for low-order knowledge. If, on the one hand, a large set of services and tools is now available to final users, on the other hand the real power is concentrating in the hands of a few major incumbents with the data availability and computational resources to really exploit AI at a higher level.
Apart from this technological polarization, the main problem the sector is experiencing can be divided into two key misalignments: i) long-term AGI research being sacrificed for short-term business applications, and ii) what AI can actually do versus what people think or assume it does. Both issues stem from the high technical knowledge intrinsically required to understand it, and they are creating hype around AI. Part of the hype is clearly justified, because AI has been useful in ...Read More on Datafloq
Big Data and Risk Management in Financial Markets (Part I)
We have seen how the interdisciplinary use of big data has affected many sectors. Examples include contagion spreading (Culotta, 2010), music album success prediction (Dhar and Chang, 2009), and presidential elections (Tumasjan et al., 2010).
In financial markets, sentiment analysis probably represents the major and best-known implementation of machine learning techniques on big datasets (Bollen et al., 2011). In spite of all the hype, though, risk management is still an exception. New information and technology stacks have not brought as many benefits to risk management as they have to trading, for instance.
Risk is indeed usually addressed from an operational perspective, from a customer relationship angle, or specifically for fraud prevention and credit scoring. However, applications strictly related to financial markets are still not so widespread, mainly because of the following problem: in theory, more information should entail a higher degree of accuracy, while in practice it (also) exponentially increases the system complexity, making it really complicated to identify and analyze in a timely manner unstructured data that might be extremely valuable in such fast-paced environments.
The likelihood (and the risk) of a network systemic failure is then multiplied by the increase in the interconnection degree of the markets. More and more data can help central ...Read More on Datafloq
Big Data Strategy (Part II): a Data Maturity Map
As shown in Part I, there is a series of issues related to internal data management policies and approaches. The answers to these problems are not trivial, and we need a framework to approach them.
A Data Stage of Development Structure (DS2) is a maturity model built for this purpose, a roadmap developed to implement a revenue-generating and impactful data strategy. It can be used to assess the current situation of the company and to understand the future steps to undertake to enhance internal big data capabilities.
The following table provides a four-by-four matrix where the increasing stages of evolution are indicated as Primitive, Bespoke, Factory, and Scientific, while the metrics through which they are assessed are Culture, Data, Technology, and Talent. The final considerations are drawn in the last row, the one that concerns the financial impact on the business of a well-set data strategy.
Figure 1. Data Stage of Development Structure (DS2)
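To make the structure of the matrix concrete, here is a minimal sketch of how the four stages and four metrics could be represented in code. Only the stage and metric names come from the text. The `assess` helper and its rule that a company's overall stage equals its weakest metric are hypothetical simplifications introduced purely for illustration, and the sample company is invented.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The four DS2 stages, ordered from least to most mature."""
    PRIMITIVE = 1
    BESPOKE = 2
    FACTORY = 3
    SCIENTIFIC = 4


# The four metrics named in the text.
METRICS = ("Culture", "Data", "Technology", "Talent")


def assess(scores):
    """Return (overall stage, lagging metrics) for a {metric: Stage} mapping.

    ASSUMPTION: the overall stage is taken to be the weakest metric.
    This rule is not part of the DS2 model; it is an illustrative choice.
    """
    missing = [m for m in METRICS if m not in scores]
    if missing:
        raise ValueError(f"missing metrics: {missing}")
    overall = min(scores[m] for m in METRICS)
    lagging = [m for m in METRICS if scores[m] == overall]
    return overall, lagging


# Hypothetical company profile, invented for this example.
company = {
    "Culture": Stage.BESPOKE,
    "Data": Stage.FACTORY,
    "Technology": Stage.BESPOKE,
    "Talent": Stage.PRIMITIVE,
}
overall, lagging = assess(company)
print(overall.name, lagging)
```

Under the stated assumption, the lagging metrics point at where the next investment would move the company to the following stage.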
Stage one is about raising awareness: the realization that data science could be relevant to the company business. In this phase, there is neither any governance structure in place nor any pre-existing technology and, above all, no organization-wide buy-in. Yet, tangible projects are still the result of individuals’ data enthusiasm being channeled into something actionable. The set of skills owned is ...Read More on Datafloq
Big Data Strategy (Part I): Tips for Analyzing Your Data
We have seen in a previous post what the common misconceptions in big data analytics are, and how important it is to start looking at data with a goal in mind.
Even if I personally believe that posing the right question is 50% of what a good data scientist should do, there are alternative approaches that can be implemented. The main one that is often suggested, in particular by non-technical professionals, is the “let the data speak” approach: a sort of magic random data discovery that should spot valuable insights that a human analyst does not notice.
Well, the reality is that this is a highly inefficient method: (random) data mining is resource-consuming and potentially value-destructive. The main reason why data mining is often ineffective is that it is undertaken without any rationale, and this leads to common mistakes such as false positives; over-fitting; neglected spurious relations; sampling biases; causation-correlation reversal; wrong variables inclusion; and, finally, model selection (Doornik and Hendry, 2015; Harford, 2014). We should pay specific attention to the causation-correlation problem, since observational data only capture correlation. However, according to Varian (2013), the problem can be easily solved through experimentation.
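The false-positive risk of undirected data mining can be illustrated with a small simulation. The sketch below is an illustrative assumption rather than anything taken from the cited works: it generates a random target and many unrelated random candidate variables, then reports the strongest correlation found by pure chance. The sample size, the number of candidates, and the seed are arbitrary choices.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(target, candidates):
    """Largest absolute correlation between the target and any candidate."""
    return max(abs(pearson(target, c)) for c in candidates)


rng = random.Random(42)  # fixed seed so the run is repeatable
n_obs = 50

# Every series is independent noise: any correlation found is spurious.
target = [rng.gauss(0, 1) for _ in range(n_obs)]
candidates = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(1000)]

best_of_10 = strongest_spurious_correlation(target, candidates[:10])
best_of_1000 = strongest_spurious_correlation(target, candidates)
print(f"best of 10 candidates:   {best_of_10:.3f}")
print(f"best of 1000 candidates: {best_of_1000:.3f}")
```

Because the 1,000 candidates include the first 10, the second figure can never be smaller than the first: the more variables are screened without a guiding question, the stronger the best chance correlation looks.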
Hence, I think that a hybrid approach is ...Read More on Datafloq
Data science and Big Data: Definitions and Common Myths
Big data is nowadays one of the most common buzzwords you might have heard of. There are many ways to define what big data is, and this is probably why it remains a really difficult concept to grasp.
Some describe big data as a dataset bigger than a certain threshold, e.g., over a terabyte (Driscoll, 2010), while others look at big data as a dataset that crashes conventional analytical tools like Microsoft Excel. More renowned works, though, identify big data as data that display features of Variety, Velocity, and Volume (Laney, 2001; McAfee and Brynjolfsson, 2012; IBM, 2013; Marr, 2015). All of them are somehow true, although I think incomplete.
The first class of definitions is indeed partial, since it is related to a purely technological issue, i.e., the computational need exceeds the available analytical power of a single tool or machine. This would not explain, however, why big data emerged a few years ago and not back in the Nineties.
The second opinion is instead too constraining, since it assumes that all the features have to be satisfied to talk about big data, and it also seems to identify the causes that originated big data (i.e., a huge amount of fast and diverse new data sources), ...Read More on Datafloq