So, WTF is Artificial Intelligence Anyway?

Image By Seanbatty (Pixabay)

According to Encyclopedia Britannica, artificial intelligence (AI) can be defined as:

"The ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings. The term is frequently applied to the project of developing systems endowed with the intellectual processes characteristic of humans, like the ability to reason, discover meaning, generalize, or learn from previous experiences."

By now, we have all heard about how AI can make it possible for computers, machines and other electronic devices to perform increasingly complex and human-like tasks.

While all this sounds almost like magic, with machines performing increasingly complex tasks (from new gaming computers to self-driving cars), in reality most AI technologies rely on a blend of software methods and technologies that involve collecting, processing, and recognizing patterns within large amounts of data.

So, How Does AI Work?

AI’s development began as an effort to create systems with human-like intelligence capabilities, pursuing two main goals:

  • The creation of expert systems − systems that exhibit intelligent behavior and can learn, demonstrate, explain, and advise their users.
  • The implementation of artificial “human intelligence” in machines − creating systems that understand, think, learn, and behave like humans.

By providing these devices with key abilities, like learning from experience and adjusting to the type of input received, software providers enable them to change and adapt, producing insights by detecting patterns and variations in the data.

Through its evolution, AI has continuously incorporated technological contributions from many sciences and disciplines, ranging from mathematics to biology and computer science, and has evolved in parallel with many sub-field disciplines, or subareas, of AI (Figure 1).

Some of these subareas include:

  • Machine learning (ML). ML uses methods from statistics, neural networks, operations research, and other fields to automate the analytical model-building process, making it possible to find hidden patterns and insights in large data sets. You can check WTF is ML here.
  • Neural networks. A neural network is a specific type of machine learning method built of interconnected units (a network) that iteratively process data by responding to external inputs and relaying information between units. The process requires multiple passes over the data set to find connections and derive meaning from undefined data.
  • Deep learning. A special case of ML that applies neural networks composed of many layers of processing units. Deep learning has taken advantage of continuous advances in computing power, as well as new training techniques, to “learn” complex patterns within large data sets. Image and speech recognition are among its most prominent applications. Check WTF is Neural Network here.
  • Natural language processing (NLP), or the technology that gives computers the ability to “understand” and generate human language, written or spoken. Today, NLP includes human-computer interaction, in which devices and humans communicate using normal, everyday language.
  • Computer vision. Relying on some of the technologies mentioned previously, especially pattern recognition and deep learning, computer vision aims to recognize what is in an image or video. By analyzing and understanding images, computers and devices can capture images or video in real time and accurately interpret what they contain.
  • Cognitive computing. A more recent addition to the AI field, cognitive computing also aims to provide information and insights that improve decision-making, while enabling natural interaction between computers, devices, and users. The main objective is to enable machines to mimic human processes and deliver insights in a natural, human fashion (language, images, etc.).

Figure 1. Some of the many AI sub-fields

In essence, AI consists of developing algorithms and models that can ingest large amounts of data and, through an iterative process, progressively learn and adapt to improve their output.

With each iteration, AI learns and acquires a new “skill” that improves the way it performs a classification or a prediction.

Today, AI and many of its sub-fields are especially suited to problems that require working with data that:
  • Comes in large amounts
  • Is unstructured, poorly organized, or poorly formatted
  • Changes constantly
As AI finds structure and regularities in the data, the algorithm keeps improving and acquiring skill; it keeps executing until it can accurately classify or predict.
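The "keep executing until it classifies accurately" loop can be sketched in a few lines. This is a hypothetical, perceptron-style example (the tiny data set and the update rule are mine, chosen for illustration, not taken from any specific product):

```python
# Hypothetical training set: label is 1 when x1 + x2 > 1, else 0.
data = [((0.2, 0.1), 0), ((0.9, 0.8), 1), ((0.4, 0.3), 0), ((1.0, 0.5), 1)]

def train_perceptron(examples, lr=0.1, max_epochs=100):
    """Iterate over the data until every example is classified correctly."""
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), label in examples:
            pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            if pred != label:
                errors += 1
                delta = label - pred      # +1 or -1: nudge toward the answer
                w1 += lr * delta * x1
                w2 += lr * delta * x2
                b += lr * delta
        if errors == 0:                   # the "skill" has been acquired
            break
    return w1, w2, b

w1, w2, b = train_perceptron(data)
predict = lambda x1, x2: 1 if w1 * x1 + w2 * x2 + b > 0 else 0
print([predict(x1, x2) for (x1, x2), _ in data])
```

Each pass over the data is one iteration; the model's parameters improve a little every time it is wrong, until its classifications match the labels.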

A key aspect of AI models is that they adapt when given new data, which allows the model to adjust through training.

Traditional and AI-based programs differ in important ways: while traditional programs are coded with a set of precise instructions and rules to answer specific questions, AI programs are flexible enough to answer more generic questions.

According to Dr. Rajiv Desai, there are important differences between traditional and AI-based software solutions, which include processing, the nature of the data input, and structure, among others (Figure 2):

Figure 2. Conventional programming vs AI programming (Credit: Dr. Rajiv Desai, An Educational Blog)

As opposed to conventional coding, where the code guides the process, in AI the data, rather than the algorithm, is the key value. Conventional software programs are given data and told how to solve a problem, while AI programs exploit inference capabilities to gain knowledge about a specific domain.
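A toy contrast between the two styles might look like this. Both the hand-coded rule and the "learned" word list below are deliberately simplified, hypothetical examples; the point is only where the rule comes from (the programmer vs. the data):

```python
# Conventional program: the rule is hand-coded by the programmer.
def is_spam_rules(subject):
    return "free money" in subject.lower()

# AI-style program (a deliberately tiny sketch): the rule is inferred
# from labeled examples instead of being written by hand.
def learn_spam_words(examples):
    counts = {}
    for subject, label in examples:
        for word in subject.lower().split():
            spam, ham = counts.get(word, (0, 0))
            counts[word] = (spam + 1, ham) if label == 1 else (spam, ham + 1)
    # A word "signals spam" if it showed up more often in spam than not.
    return {w for w, (spam, ham) in counts.items() if spam > ham}

def is_spam_learned(subject, spam_words):
    words = subject.lower().split()
    return sum(w in spam_words for w in words) > len(words) / 2

examples = [("win free money now", 1), ("free money offer", 1),
            ("meeting notes attached", 0), ("lunch tomorrow", 0)]
spam_words = learn_spam_words(examples)
print(is_spam_learned("free money inside", spam_words))
```

The conventional version only ever catches the exact phrase it was told about; the learned version generalizes from whatever patterns the training data contains, for better or worse.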

The following table (Figure 3), also provided by Dr. Rajiv Desai, illustrates the main differences between programming with and without AI.

Figure 3. Programming with and without AI programming (Credit: Dr. Rajiv Desai, An Educational Blog)

Due to its modular nature, AI is, in many cases, incorporated within existing applications rather than sold as an individual solution, although a new generation of programming platforms exists that enables users and organizations to develop AI-based applications.

Good, But Still, What’s With All the Recent Hype Around AI?

While we can date AI’s initial development back to the 1940s (roughly in parallel with the evolution of computer systems itself), it’s only in recent years that AI has become almost omnipresent in virtually every type of software system available. Why?

While traditional computer programs can today perform simple and increasingly complex tasks and data analysis (especially due to advances in processing speed, memory, and storage), new business models keep increasing the demand for systems that can provide better insights and even act or decide on them, as is the case with new technologies like mobility, cloud computing, and the internet of things.

All of this is triggering the need for systems capable of analyzing, predicting, and autonomously improving, features that traditional systems don't have.

So, aside from the new AI-based applications that keep emerging, AI’s modular nature means its whole subset of methods and technologies can embed “intelligence” into existing software applications. Today, a myriad of computers and new devices already on the market are being improved with new AI capabilities, which is why more and more applications are being infused with this pervasive technology.

So today, a myriad of services, ranging from conversational platforms to bots and smart machines, are being applied to ever more software and products to improve their services at home, in the workplace, and even on the streets.

From Siri, added as a feature to all Apple products, to the brand new autonomous database services offered by the Oracle DWH Automation service, many products are now poised to be infused with advanced AI capabilities.

How About the Potential Applications of AI?

As mentioned, software applications in all industries and business areas keep incorporating small and big pieces of AI functionality within their domains. A good sample of the many current uses of AI includes:

  • Cybersecurity. A growing number of organizations incorporate AI and ML algorithms to, for example, detect malware. ML algorithms and models can predict with increasing accuracy which files carry malware by looking into patterns within the file or how the data was accessed which can signal its presence.
  • Fraud Detection. As AI and ML algorithms improve and become more efficient, so do the solutions for detecting potential fraud. New systems now incorporate AI for spotting and predicting potential cases of fraud across diverse fields, including banking and online purchasing sites. Organizations use AI’s capabilities to continuously improve their fraud-spotting mechanisms by comparing millions of transactions and distinguishing between legitimate and fraudulent ones.
  • Health Care. New AI applications can now provide personalized medicine and X-ray readings by analyzing images while AI based personal health care assistants can remind you to take your pills, exercise or eat healthier relying on the analysis of your personal health data.
  • Manufacturing. As data is streamed from connected equipment, AI based software can analyze manufacturing equipment’s data and forecast expected load and demand or predict its maintenance cycle by using specific types of deep learning networks that use sequence data.
  • Retail. AI can now provide retailers with virtual shopping capabilities and offer personalized services and recommendations to users, while also enabling more efficient stock management and site layout via the improved analysis and insight AI provides.
  • Sports. New AI based solutions in sports can now be applied for image capturing of game plays and provide coaches with reports that can help them improve game tactics and strategy.

As we can see from the samples above, there are several cases where AI can be effectively applied for process improvement, efficient analysis, and better decision making.

How About the Software Available and its Adoption in an Organization?

Despite AI sounding like a complicated and, worst of all, expensive technology to adopt, AI has now become accessible to almost any type of organization.

AI is now embedded in so many software solutions that organizations of all sizes can adopt it in some form, and for a great number of business uses; it wouldn't even be surprising if you are already using an AI-enabled solution without being aware of it.

So, where should we start using AI within our organization? Well, this will depend on your organization’s budget, the complexity of your use case(s), and your existing AI expertise, which together define what type of AI and, consequently, what type of provider and vendor you should pick.

A good starting point is to classify the companies offering AI solutions, so as to understand the varied types of AI companies and how each could potentially help us adopt some form of AI within our organization.
In her blog post The 3 major categories of AI companies, Catherine Lu makes an interesting classification of AI companies, dividing them into three main categories:

  • Data science consulting firms: low productization
“Data science consulting firms are defined by their low level of productization. Their main advantage is that it’s easier for them to deliver great results, as AI models require customization and are highly dependent on customer data. Their disadvantage is that they cannot scale quickly. For companies that are expected to be high growth, they will need to figure out how to move out of this category.”

  • AI platform companies: high productization targeting many use cases
“AI platform companies offer to be the underlying infrastructure on top of which specific AI solutions live. They can allow end users to import data, perform data wrangling and transformations, train models, and perform model validation.”
This includes platforms like Databricks.

  • Vertical AI companies: high productization targeting few use cases
“Vertical AI companies solve a particular business problem or set of problems with a productized solution. They enable their enterprise customers to achieve additional lift from AI without needing to build or maintain models in-house. Examples on this end are more numerous.”
This includes companies like DigitalGenius (customer support), Entelo (recruiting), Cylance (cybersecurity), or DataVisor (fraud detection).

On a brief note: while Ms. Lu emphasizes her belief that vertical AI companies will be the ones to succeed, thanks to their ability to provide productized solutions that scale, the recent development and evolution of low-code technologies makes me think a bit differently. Low-code platforms are enabling a larger number of organizations, instead of adopting vertical solutions, to acquire AI development platforms with lower learning curves, and consequently to produce custom solutions with less effort and more customized capabilities.

Examples? Some include IBM (Watson), Amazon and Microsoft.

So… What’s in it for Me and my Organization?

Well, in short, AI can offer effective ways to achieve improvement on different fronts, including business operations, analytics efficiency, and decision making.
In a wider view, the benefits of AI adoption come in different forms: properly deployed, AI solutions can allow organizations to streamline and improve operations via automation and adaptation, while also improving analysis processes to increase accuracy and the chances of successful decisions.

Whether your organization decides to go easy and adopt a proven vertical AI solution, or to jump directly into developing AI solutions in-house, as more and more software providers keep infusing AI into their offerings it is only natural to expect that AI will keep evolving and, as it does, keep improving the way many software solutions work.

So, while science fiction novels and movies portray AI as machines and robots that can and will eventually rule the world, in reality, up to now AI is more about enhancing than replacing what humans can do. Or can’t?

The BBBT Sessions: WhereScape

Originally founded in 1997 in Auckland, NZ as a data warehouse consulting company, WhereScape has evolved to become a solution provider and —especially in the last five years— a key player in the data management market and especially in the data warehousing and big data spaces.

During a great session with the BBBT, WhereScape showed their “latest and greatest” news and innovations, triggering meaningful discussions and interactions with the analysts of the BBBT.

Here, a summary and commentary of that cool session.

WhereScape at a glance

As mentioned before, through an evolution spanning more than 20 years, WhereScape became a provider of data infrastructure and automation solutions. It currently offers three main products:
  • WhereScape 3D. A solution to assist in planning, modeling and designing data infrastructure projects as well as enabling rapid prototyping.
  • WhereScape RED. A solution to enable fast-track development, deployment and operation of data infrastructure projects and reduce delivery times, effort, cost and risk of new projects.
  • WhereScape Data Vault Express. A solution specifically designed to automate the entire life cycle of project delivery based on Data Vault 2.0, a modern database modeling method.
Along with its vast experience in the consulting field and, later, with its software solutions, WhereScape has become a major player at the frontline of the data warehouse and data infrastructure automation market.

It is in this context of “automation” that the BBBT session with WhereScape was focused, centering on the software company’s advances within its data infrastructure software.

Today, and especially over the last couple of years, it comes as no surprise that automation is becoming increasingly important for almost all major software technology footprints, including the data warehousing and data management markets.

Well, because as Neil Barton, WhereScape’s chief technology officer, explained neatly during the briefing: automating a data infrastructure can provide great value to an organization, value that translates into many aspects of its operation, from cost, development time, and risk reduction to enabling IT to keep pace with business needs.
This is especially true given that data warehousing has traditionally been done in a complicated and cumbersome way (Figure 1).

Figure 1.  The traditional data warehousing development/maintenance cycle (Image courtesy of WhereScape)

Consequently, WhereScape aims to automate and simplify most, if not all, of the data warehousing process, incorporating not just plain task automation but also the encapsulation of methodologies, best practices, and industry standards, to enhance simplicity, reduce delivery time, and still ensure compliance with internal and external data management regulations.

One aspect of WhereScape I find unique is the company’s “holistic” view of automation, in which the whole data cycle is considered via a metadata-driven approach. This makes it possible for WhereScape to easily provide documentation and lineage, to manage the full cycle within a single solution (Figure 2), and, more so, to bring integration to WhereScape’s own set of solutions.

Figure 2. Simplification and automation with WhereScape (Image courtesy of WhereScape)

Interestingly, while not trying to “reinvent the wheel”, and remaining loyal to a well-proven life cycle (Figure 3), WhereScape has made this life cycle an intelligent and automated one, aiming to improve all its phases, from discovery through the rest of its stages, to reduce time and effort and increase efficiency.

By being metadata-driven and having an integrated structure across all its components, WhereScape incorporates automation into the entire life cycle process and enables smoother documentation.

This is especially true for the design and operation processes, normally complex and tedious stages. For design, WhereScape introduces an automated model generation engine that speeds up the common iterative design cycle; for operation, it provides solid dependency management with integrated scheduling and logging to ensure efficient auditing and operational analysis.

Figure 3. WhereScape’s automation life cycle (Image courtesy of WhereScape)

In addition to showing efficient automation control and management, WhereScape also shows flexibility by offering support for a range of project types, including:
  • Data Warehouses
  • Data Marts
  • Data Vaults
  • Cloud deployments and
  • Big data support
This makes WhereScape count as an important emerging player in the data management automation market.

WhereScape at the BBBT

After a summary of what WhereScape is all about, allow me to highlight some of the most relevant aspects of what was a great presentation conducted by Neil Barton, WhereScape’s Chief Technology Officer (CTO).

The briefing gave us good insight into what WhereScape has been working on in recent months and what it is now planning for the future, some of which includes:
  • Increasing data platform coverage. WhereScape has been working to expand its set of “data infrastructure automation” solutions to a wider number of data platforms, which now includes:
    • Amazon Redshift
    • Microsoft Azure SQL Data Warehouse
    • Snowflake
    • EXASOL
    • SAP HANA
    • PostgreSQL
  • Extensive support for Data Vaults.  WhereScape has worked on providing extensive support for Data Vault —a database modeling method designed to provide long-term historical storage of data coming in from multiple operational systems— and more specifically engineered for Data Vault 2.0.
  • Support for Data Streaming. The briefing included an interesting discussion about WhereScape’s new support for data streaming, a feature aimed at helping IT teams manage hybrid flows of streaming real-time and traditional batch-based data by enabling the design, development, and deployment of more advanced data infrastructures.
  • Reinforcing Support for Cloud Data Platforms. Within the company’s efforts to expand support for data platforms, its continuous support for cloud-based data management engines can be highlighted, including direct support for Snowflake, Amazon Redshift, and Microsoft Azure SQL.
  • Addressing a Hybrid Reality. WhereScape has taken steps to address the reality of companies that now navigate data existing both in the cloud and in on-premises sources, making it possible to move data between both worlds and easing the transition.

At another key moment of the presentation, WhereScape guided us through what the company has developed, via its solutions combined with key partners including StreamSets, to enable organizations to support streaming IoT integration.

Real-time data flows, or “streaming” data sources, can now be collected from many areas within an organization’s data landscape, including in-field units sharing sensor-based data, social media feeds to support sentiment analysis, or internal systems feeds.

Utilizing the industry-leading dataflow management technology developed by StreamSets and the proven efficiencies of WhereScape’s automation solutions WhereScape RED and WhereScape Data Vault Express, WhereScape automation with streaming minimizes the learning curve for IT teams working with these new data sources and ensures best-practice integration of streaming data (Figure 4).

Figure 4. WhereScape’s Streaming IoT Integration (Image courtesy of WhereScape)

Serving as a central element, WhereScape aims to provide all the tools needed to complete the full IoT cycle, from data collection to the delivery of insights and information to users, automating the event queuing, transformation, storage, and administration stages. According to WhereScape, the solution can:
  • Open new analytic opportunities
  • Speed up pipeline development
  • Hide the complexity of underlying technologies
  • Minimize the learning curve for teams
  • Integrate with batch-based data
  • Ease ongoing management

The briefing painted a clear picture of WhereScape’s role in the market, as well as of what the company is doing to improve its solutions to adapt to new technologies and needs. A great, clear, and concise briefing by WhereScape.

WhereScape: A Couple of Final Thoughts

From what we can see, WhereScape has been evolving in an interesting way, from being a data warehouse automation solution to becoming what the company calls a data infrastructure automation solution.

Interestingly, the company has shown the savviness to evolve its solutions so that, via model-driven design, they incorporate not just connectors to new data sources but also the tools and design needed to enable organizations to modernize full data management platforms in completely hybrid environments.

Figure 5. WhereScape’s RED Multiple Data Model Architecture Overview (Image courtesy of WhereScape)

Of course, even though WhereScape has built a solid and intelligent solution strategy, from a market perspective, gaining market share hasn’t been, and will not be, a walk in the park.

As automation technologies continue to gain steam in all main software areas, a larger number of software companies are incorporating increasingly sophisticated automation mechanisms into their data management and analytics solutions, making automation another key feature in an already extremely competitive market. Such is the case for Oracle with its new Oracle Autonomous Data Warehouse Cloud Service, and for offerings from other competitors like Attunity Compose or Panoply.

So, can WhereScape lead in data infrastructure automation?

Of course, but it will need to make sure it can consistently deliver to IT organizations the means to achieve effective automation within all phases (design, development, and deployment) of, especially, a data warehouse and, more generally, a full-fledged data infrastructure.

This includes achieving seamless collection and processing of multiple source types and, of course, effective complexity cutting.

From the briefing, it seems WhereScape is, if not there yet, on the right path to achieving some or many of these goals, making it one of the companies up to the challenge of winning the favor of organizations with complex data infrastructures to deal with.

WhereScape is rapidly realigning to serve a new generation of data management systems and needs.

Informatica Partners with Google Cloud to Provide AI-Driven Integration

As cloud computing adoption continues to grow, so does the need for modern and more efficient business and data integration capabilities.

And while many aspects of business and data integration are being simplified and automated, increasingly sophisticated business needs and the requirement for continuous, highly efficient integration are forcing organizations to make new and ongoing digital transformation efforts.

In this vein, interesting news came just a couple of weeks ago, when a partnership was announced between Informatica, a big player in integration platform as a service (iPaaS), and tech giant Google.

Of course, the mere fact that two major players in the software industry decide to partner is already something worth listening to, but this partnership is particularly interesting because it involves the provision of artificial intelligence (AI)-driven integration services, in an enormous effort from both companies to bring integration services to a new level.

The Details

The announcement specifically describes a new relationship between Informatica and Google Cloud’s Apigee product group (the company’s group devoted to helping organizations design, secure, and scale application programming interfaces, or APIs) to enable customers to rapidly leverage data and applications deployed within hybrid and multi-cloud environments through innovative AI-driven integrations and APIs.

According to both companies' communiqué, customers can now develop APIs that easily enable access to applications, data, and metadata, and make use of AI-driven predictive analysis and recommendations powered by CLAIRE, Informatica’s enterprise unified metadata intelligence engine.

The new Informatica Integration Cloud for Apigee aims to provide “zero-code” API development and management capabilities. According to the announcement:
“Developers, integrators, and analysts will be able to point to any data or application, turn it into a secure, managed API with the click of a button, and then integrate and orchestrate business processes with a simple, wizard-driven drag and drop.”
Other relevant aspects of the Informatica-Google partnership include product-level integrations, enabling organizations to take business integration processes built in Informatica and publish them as managed and secure APIs to the Apigee Edge API Platform. Secure API proxies can be quickly built by automatically discovering Informatica business integration data and processes.

From Apigee’s side, the platform will provide customers with Informatica Intelligent Cloud Services as an integrated edition within the Apigee Edge API Platform.

So What?

Well... while perhaps not flashy, this new partnership carries, in my view, a relevant message to the data management market. It signals, on one hand, the increasing importance of seamless integration of enterprise software services and a new approach to designing, developing, and deploying intelligent, universally embeddable, and easy-to-use process and data management tasks. And on the other, the continuous effort of software providers, including Microsoft, Oracle, and others, to enhance their ability to simplify business and function integration through APIs.

To this point, Ed Anuff, director of product management, Google Cloud mentioned:
“Modern business isn’t just about adopting a mobile strategy or using the cloud to generate efficiency savings. Enterprises are leveraging new integrations for seamless workflows that allow them to use data and applications to create remarkable experiences for their customers, employees and partners. With the product-level integrations between Apigee Edge and Informatica's Integration Cloud, we can deliver end-to-end API life cycle management and integration capabilities to help enterprises accelerate their journey to become modern, connected digital businesses.”
Personally, I think this news keeps signaling what will be a next stage in enterprise software, where integration will be developed under new paradigms to enable low code, increased device and platform portability, and neater third-party integration.

Finally, you can either take a moment to read a good piece on the Google Cloud Platform Blog introducing these and more new capabilities being introduced within the new version of the Apigee platform or throw me a comment in the lines below.
WTF is Deep Learning Anyway

Following on my previous WTF post on machine learning, it just makes sense to continue this line of thought and address another of the many popular and trendy concepts out there. We are talking about: Deep Learning.

So without further ado, let’s explain WTF deep learning is, shall we?

Simply put, and as can be inferred from the post mentioned above, deep learning is one of the now many approaches to machine learning out there, along the lines of other approaches like decision tree learning, association rule learning, or Bayesian networks.

While deep learning is not new (it was introduced by Dr. Rina Dechter in 1986), it’s only in recent years that this approach has gained fame and popularity among users, and particularly among software companies adopting it within their analytics arsenals.

Deep learning makes it possible to train a computer to perform tasks such as recognizing speech, identifying images, or making predictions. Instead of organizing data to run through predefined equations, deep learning sets up basic parameters about the data and trains the computer to “learn” by recognizing patterns across many layers of processing.

So, What Has Made Deep Learning so Popular?

Many factors have come together to enable the popularity of machine learning in general, and of deep learning in particular.

Today, modern deep learning provides a powerful framework for supervised learning and for addressing increasingly complex problems. Consequently, it has gained huge popularity in many fields of computing, including computer vision, speech and audio recognition, natural language processing (NLP), bioinformatics, drug design, and many others. But why?

This popularity has to do, on one hand, with the fast evolution of deep learning algorithms, but also with the converged evolution of core processing technologies, including big data, cloud computing, and in-memory processing, which has allowed deep learning algorithms that require intensive computing resources to be deployed on increasingly fast and efficient computing infrastructures.

On the other hand, the evolution and consumerization of peripheral technologies like mobile and smart devices have made it possible for providers to embed deep learning functionality within an increasing number of systems and use cases, reaching more audiences that can use and develop deep learning in a more “natural” way.

How Does Deep Learning Work?

In general, most deep learning architectures are constructed from a type of computing system called an artificial neural network (ANN) (I know, we will get to its own WTF soon), yet they can also include other computing structures and techniques. Inspired by the structure and functions of the brain, deep learning's use of ANNs recreates the interconnection of neurons through algorithms that mimic the brain's biological structure.

Within an ANN, units (neurons) are organized in discrete layers and connected to other units so that each layer picks a specific feature to learn (shapes, patterns, etc.). Each layer adds a depth of “learning”, or a “feature to learn”, so that by adding more layers and more units within a layer, a deep network can represent functions of increasing complexity. It is this layering, or depth, that gives deep learning its name (Figure 1).

Figure 1.  A 3-layer neural network with three inputs, two hidden layers of 4 neurons each and one output layer. (Source: CS231n Convolutional Neural Networks for Visual Recognition)
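
To make the idea concrete, a forward pass through a network with the same shape as Figure 1 (three inputs, two hidden layers of four units each, one output) can be sketched in a few lines of NumPy. This is a minimal illustration with random, untrained weights, not the actual network from the figure's source:

```python
import numpy as np

def relu(x):
    # Rectified linear activation, a common choice for hidden layers
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)

# Weight matrices for the shape in Figure 1:
# 3 inputs -> 4 hidden units -> 4 hidden units -> 1 output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    # Each layer transforms the previous layer's output,
    # adding one level of "depth" to the representation
    h1 = relu(W1 @ x + b1)
    h2 = relu(W2 @ h1 + b2)
    return W3 @ h2 + b3

x = np.array([0.5, -1.0, 2.0])   # an input vector with 3 features
y = forward(x)                   # a single output value
print(y.shape)                   # (1,)
```

Training would adjust the weights from data; here they only serve to show how an input vector flows through the layers to an output vector.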

Until now, most, if not all, deep learning applications deal with tasks or problems that, as the previous figure shows, consist of mapping an input vector to an output vector, making it possible to solve problems provided the models and datasets are large enough.

These problems are commonly those that humans solve relatively easily and without much reflection (identifying forms and shapes, for example); yet, thanks to the increasing computing power available and the continuous evolution of deep learning, computers can now perform some of them even faster than humans.

It's clear, then, that both machine learning in general and deep learning in particular change the common paradigm for analytics: instead of developing an algorithm or algorithms to instruct a computer system on how to specifically solve a problem, a model is developed and trained so that the system can “learn” and solve the problem by itself (Figure 2).

Figure 2.  Traditional programming vs Machine learning approaches. (Source: CS231n Convolutional Neural Networks for Visual Recognition)
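
The contrast in Figure 2 can be illustrated with a deliberately tiny example: a hand-written rule versus the same rule estimated from example data. The temperature conversion here is just an assumed toy task, chosen because the "right answer" is known:

```python
import numpy as np

# Traditional programming: the rule is written by hand.
def fahrenheit_traditional(celsius):
    return celsius * 9.0 / 5.0 + 32.0

# Machine learning: the rule is estimated from example data.
celsius = np.array([0.0, 10.0, 20.0, 30.0, 40.0])      # inputs
fahrenheit = np.array([32.0, 50.0, 68.0, 86.0, 104.0])  # observed outputs

# Fit y = a*x + b from the examples instead of coding the formula
a, b = np.polyfit(celsius, fahrenheit, deg=1)

def fahrenheit_learned(celsius_value):
    return a * celsius_value + b

print(round(fahrenheit_learned(25.0), 1))  # 77.0, matching the hand-written rule
```

The learned model recovers the same mapping without anyone ever writing the conversion formula, which is the essence of the paradigm shift the figure describes.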

A key advantage of deep learning lies in how it replaces this workflow. A traditional approach starts by using the available data to perform feature engineering, then selects a model and estimates its parameters within an often repetitive and complex cycle before finally arriving at an effective model. Deep learning replaces that cycle with a layered approach in which each layer can recognize key features, patterns, or regularities in the data.

Hence, deep learning replaces the explicit formulation of a model with hierarchically organized characterizations (layers) that can “learn” to recognize features from the available data (Figure 3), resulting in the construction of "systems of prediction" that can:

  • Avoid use of hard parameters and business rules
  • Make better generalizations
  • Improve continuously
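
As a toy illustration of layers recognizing intermediate features, the classic XOR function (which no single-layer model can represent) can be computed by a two-layer network whose hidden units act as OR and AND feature detectors. The weights below are set by hand for clarity; in real deep learning they would be learned from data:

```python
import numpy as np

def step(x):
    # Hard threshold activation: 1 if the input is positive, else 0
    return (x > 0).astype(float)

# Hand-set weights for a tiny 2-layer network computing XOR.
# The hidden layer represents two intermediate "features":
#   h1 ~ OR(x1, x2), h2 ~ AND(x1, x2)
W1 = np.array([[1.0, 1.0],    # h1 fires if either input is on
               [1.0, 1.0]])   # h2 fires only if both are on
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -2.0])    # output = OR minus twice AND -> XOR
b2 = -0.5

def xor_net(x):
    h = step(W1 @ x + b1)     # layer 1: intermediate features
    return step(W2 @ h + b2)  # layer 2: combine the features

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, int(xor_net(np.array(x, dtype=float))))
```

No single unit can compute XOR on its own; it is the composition of the two layers, each recognizing a simpler feature, that solves the problem.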

Figure 3.  Machine learning vs Deep learning. (Source: Xenonstack)

On the downside, one common challenge when deploying an application of deep learning is that it requires intensive computational power, due to:

  1. The iterative nature of deep learning algorithms
  2. The increasing complexity as the number of layers increases
  3. The need for large volumes of data to train the neural networks

Still, deep learning's capacity for continuous improvement sets an ideal stage for any organization to implement dynamic behavior within its analytics platforms.

What Are Some Applications of Deep Learning?

Today, deep learning has already been applied in many industries and lines of business, and its adoption keeps increasing at a constant pace. Some areas where deep learning has been successfully applied include:

Recommendation Systems

This is perhaps the flagship use case for machine learning and deep learning. Companies including Amazon and Netflix have used these techniques to develop systems that can, with a good chance of success, predict what a viewer might be interested in watching or purchasing next, based on his/her past behavior.

Deep learning enhances their recommendations in complex environments by incrementally learning users' interests across multiple platforms.

Image and Speech Recognition

Another common application of deep learning in the software industry is speech and image recognition. On the speech recognition side, companies like Google, Apple, and Microsoft have applied deep learning in products like Google Now, Siri, and Cortana to recognize voice patterns and human speech.

On the image recognition side, regardless of how challenging it can be, it's possible to find projects already applying deep learning with different levels of success. Companies like DeepGlint are using deep learning to recognize and acquire real-time insights from the behavior of cars, people, and practically any other object.

Applications like this have huge potential in sectors including law enforcement or self-driving cars.

Natural Language Processing

Neural networks and deep learning have been key to the development of natural language processing (NLP), an area of artificial intelligence that develops techniques and solutions to allow “natural” interaction between computers and human languages, especially to enable the processing of large amounts of natural language data.

Companies like MindMeld use deep learning and other techniques to develop intelligent conversational interfaces.

We could go on describing more use cases for deep learning, but perhaps it is fair to simply say that the number and types of applications for deep learning keep growing.

What is Out There in the Market?

Currently, there are varied options for using or deploying deep learning, whether to start experimenting and developing or to deploy enterprise-ready solutions that apply deep learning.

For those organizations with the will to develop and innovate, open source deep learning frameworks and analytics tools like TensorFlow, Caffe, or PyTorch represent a great opportunity to get up and running.

Other great solutions for developing and applying deep learning include data science platforms like Dataiku, DataRobot, or the platform just recently acquired by Oracle.

Also, users and organizations can take a practical approach and use niche vertical solutions such as cloud-native endpoint protection platform CrowdStrike, healthcare software provider Flatiron Health, or security intelligence & analytics (SIA) company Endgame, among many others.

Today, deep learning and machine learning solutions are increasingly available to small, medium, and large companies, promoting a continuous and fast evolution of these techniques within the market landscape. Not surprisingly, user expectations are high that they will address and solve increasingly complex problems.

It also hints that perhaps, with new advances and techniques seeing the light of day so frequently, we are just at the beginning of a new era in the analytics marketplace.

It seems deep learning is no joke, or is it?

Oracle’s New Cloud Services: A New Big Push for Automation


With a recent announcement, Oracle, the global software and hardware powerhouse, continues its effort to equip all the solutions in its Cloud Platform with autonomous capabilities.

As part of a venture that started earlier this year with the announcement of the first set of autonomous services, including the Oracle Autonomous Data Warehouse Cloud Service, and of Oracle 18c as Oracle's first fully autonomous database, the company is now extending these capabilities with the launch of another set of services in the cloud.

This time it is the turn of three new services: Oracle Autonomous Analytics Cloud, Oracle Autonomous Integration Cloud, and Oracle Autonomous Visual Builder Cloud. According to Oracle, these will be followed by the release of more autonomous services later in the year, focused on mobile, chatbots, data integration, blockchain, security, and management, as well as more traditional database workloads, including OLTP.

Built from the ground up with advanced artificial intelligence (AI) and machine learning algorithms, the new autonomous services in Oracle's Cloud Platform aim, according to Oracle, to automate and/or eliminate tasks so organizations can lower costs, reduce risks, accelerate innovation, and gain predictive insights.

In this regard, Amit Zavery, executive vice president of development for Oracle Cloud Platform, mentioned:
“Embedding AI and machine learning in these cloud services will help organizations innovate in revolutionary new ways. These new cloud services are the latest in a series of steps from Oracle to incorporate industry-first autonomous capabilities that will enable customers to significantly reduce operational costs, increase productivity, and decrease risk.”

Oracle's new and existing autonomous services within the Oracle Cloud Platform all follow the company's guidelines and fundamental autonomous capabilities, which can be summarized as:

  • Self-Driving capabilities that reduce or eliminate human labor throughout all processes: provisioning, securing, monitoring, storing, copying, and troubleshooting.
  • Self-Securing capabilities that protect services from external attacks and malicious internal users, including the automatic application of security updates, protection against cyberattacks, and the automatic encryption of all data.
  • Self-Repairing capabilities that provide automated protection against planned and unplanned downtime, including maintenance.
The new autonomous services announced by Oracle are planned to impact different functional aspects of an organization's enterprise software services, from analytics to software development. A brief description of these new services follows:

Oracle Autonomous Analytics Cloud

This service assembles a combination of technologies, including machine learning, adaptive intelligence, and service automation, within an analytics platform that aims to change the way users analyze, understand, and act on information.

Oracle's Autonomous Analytics Cloud service also includes functionality that enables business users to uncover insights by asking questions on their mobile devices. Natural language processing techniques convert these questions into queries to be processed in the back end so the system can deliver visualizations on the device.

The service's machine learning functionality can autonomously gain intelligence and proactively suggest insights on data the user might not even have asked for, or reveal hidden patterns.

The service is designed so it can provide predictive analytics on IoT data by applying domain-specific machine learning algorithms to large volumes of sensor data and historical patterns.

Oracle Autonomous Integration Cloud

This service aims to speed an organization’s complex application integration process via automation.

Business processes that span both Oracle and non-Oracle applications, whether on-premises or SaaS, can be embedded and integrated through a best-practice, guided, autonomous application integration process using machine learning and pre-built application integration techniques.

The Autonomous Integration Cloud service delivers an adaptive case management system through APIs with AI and machine learning frameworks, which also enables Robotic Process Automation (RPA) to deliver process automation to systems with no APIs.

Autonomous Visual Builder Cloud

Oracle’s Autonomous Visual Builder is designed to help companies accelerate their mobile and web application development cycles by providing business users and developers a framework to build applications with no coding.

By using the latest industry-standard technologies, the service automates code generation and allows deployment with a single click.

Aside from enabling rapid application development, the service also automates the delivery of mobile applications across multiple mobile platforms, including iOS and Android, and supports development on standard open source technologies, including Oracle JET and Swagger.

So what?

Well, with a set of significant and continuous moves toward automation, Oracle aims to gain a significant edge in a software industry that has become increasingly competitive.

Oracle is making clear it will extend autonomous capabilities throughout its entire Cloud Platform, committing to provide self-driving, self-securing, and self-repairing capabilities across all its PaaS services. Yet, in my view, even with all the potential advantages these moves might bring, the company is taking no small risk, one perhaps comparable to IBM's with Watson, which for some time seemed to have been launched at a moment when most users were not ready for all the goodness the AI system could provide them with.

That being said, it's still hard, of course, not to be excited about Oracle's new promise of a new generation of fully autonomous software, able to achieve many of the end objectives that expert systems and artificial intelligence visionaries have dreamed of.

In the meantime, I can hardly wait to see the response in the market, both from users and, of course, from Oracle's competitors, to these new releases from Oracle.

Hadoop Platforms: The Elephants in the Room


"When there’s an elephant in the room introduce him"
-Randy Pausch

It is common that, when speaking about Big Data, two major assumptions take place:

One: Hadoop comes to mind right by its side, and the two are many times even considered synonyms, which they are not.

While Big Data is the boilerplate concept that refers to the process of handling enormous amounts of data coming in different forms (structured and unstructured), independent of the use of any particular technology or tool, Hadoop is, in fact, a specific open source technology for dealing with these sorts of voluminous data sets.

But before we continue, and as a refresher, let's remind ourselves what Hadoop is, in the Apache Software Foundation's own words:
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
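
The "simple programming models" the definition refers to are chiefly MapReduce. A minimal sketch of the idea, simulated in plain Python rather than on an actual cluster, is the classic word count:

```python
# Hadoop Streaming-style word count: the mapper and reducer are
# ordinary functions; on a real cluster, Hadoop handles distribution,
# sorting between the two phases, and fault tolerance.

def mapper(lines):
    # Map phase: emit a (word, 1) pair for every word seen
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # Reduce phase: Hadoop delivers pairs grouped by key;
    # here we simulate that with an in-memory aggregation.
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts

text = ["Hadoop scales out", "Hadoop handles failures"]
print(reducer(mapper(text)))  # {'hadoop': 2, 'scales': 1, ...}
```

The appeal of the model is exactly this simplicity: the developer writes only the map and reduce logic, and the framework takes care of running it over clusters of commodity machines.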
Commercial Hadoop distributions assemble different combinations of various open source components from the Apache Software Foundation and more specifically from the Apache Hadoop stack.

These distributions integrate all components within a single product, offered as an enterprise-ready commercial solution. In many cases, distributions also bundle proprietary software, support, consulting services, and training as part of their offering.

Two: When talking about Hadoop and its commercial use, three usual suspects quite often come to mind which, due to their history and ties with the evolution of Hadoop, have become major players: Cloudera, Hortonworks, and MapR.

While there's no doubt these Hadoop-based data platforms are major players, nowadays we can find a significant number of options from which a company can choose. So, to follow Mr. Pausch's advice, let's take a look at a list of Hadoop-based data platforms available in the market and introduce them.

Alibaba Cloud
Solution: Alibaba E-MapReduce Service

Alibaba Cloud Elastic MapReduce (E-MapReduce) is a cloud-based big data processing solution built on Apache Hadoop and Apache Spark. E-MapReduce's flexibility allows the platform to be applied to different big data use cases, including trend analysis, data warehousing, and the analysis of continuously streaming data.

Being in the cloud, E-MapReduce offers big data processing within a flexible and scalable platform of distributed Hadoop clusters, along with seamless integration with the rest of the available Alibaba Cloud offerings.

Amazon Web Services
Solution: Amazon EMR

With Amazon EMR, the company provides a cloud-based managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances.
With Amazon EMR it is also possible to deploy and run other open source distributed frameworks, including Spark, HBase, Presto, and Flink, and to interact with data stored in other AWS data stores such as Amazon S3 and Amazon DynamoDB.

Amazon EMR includes interesting features for log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics.

Solution:  Arenadata Hadoop (Open Analytical Platform)

The Arenadata Unified Data Platform is composed of a set of components built around Hadoop, including all the software necessary to access, manipulate, protect, and analyze data.

Arenadata Hadoop (ADH) is an enterprise-ready, Apache Hadoop based distribution aimed at handling semi-structured and unstructured data. Today, ADH is certified as fully compliant with the ODPi (Open Data Platform initiative) standard, deploying and assembling a complete set of Apache-based open source products without proprietary software.

Arenadata Hadoop provides a full set of tools for autonomous installation on physical as well as virtual machines. Monitoring and administration software helps optimize performance across all the system's components, while Apache Ambari provides the interfaces required for integration with existing administrative systems such as Microsoft System Center and Teradata ViewPoint.

Solution: Cloudera Enterprise Data Hub

The Enterprise Data Hub (EDH) is Cloudera's Hadoop data platform distribution, a solution intended to make big data software fast, secure, and easy, from data science and engineering, to powering an operational database, to running large-scale analytics, all within the same product.

Offered in different flavors (Analytic DB, Operational DB, Data Science & Engineering, as well as an Essentials version), Cloudera's EDH also offers, aside from its analytics and data management capabilities, features for running in the cloud, such as:

  • High-performance analytics. Able to run any analytics tool of choice against cloud-native object store, Amazon S3.
  • Elasticity and flexibility. Support for transient Hadoop clusters with the ability to scale up and down as needed, as well as permanent clusters for long-running BI and operational jobs.
  • Multi-cloud provisioning. Deploy and manage Cloudera Enterprise across AWS, Google Cloud Platform, Microsoft Azure, and private networks.
  • Automated metering and billing. To only pay for what a company needs, when it needs it.

Solution:  Gluent Data Platform

Implemented in large organizations around the world, across industries including finance, telecom, retail, and healthcare, the Gluent Data Platform offers a Hadoop data platform for data offloading, access, and analysis.

Some benefits and features offered by Gluent include, among others:

  • High parallelism in Hadoop using cheap Hadoop cluster hardware and software
  • No changes required to existing application code for connection with sources by using Gluent’s Smart Connector
  • The capability to choose from and use multiple data engines (such as Impala, Hive, and Spark) to process your data
  • No data conversion or export/import is needed when using new engines on Hadoop

Google Cloud Platform
Solution:  Cloud Dataproc

Google's Cloud Dataproc is a fully managed cloud service for running Apache Spark and Apache Hadoop clusters. Some of the features of Cloud Dataproc include:

  • Automated cluster management
  • Re-sizable clusters
  • Versioning
  • High availability
  • Integration with developer tools
  • Automatic or manual configuration
  • Flexible virtual machines

Cloud Dataproc also easily integrates with other Google Cloud Platform (GCP) services to provide a complete platform for data processing, analytics and machine learning.

Solution:  Hortonworks Data Platform (HDP)

HDP is an enterprise-ready and secure Apache Hadoop distribution built on a centralized YARN-based architecture. HDP aims to address the complete set of needs for data at rest, power real-time customer applications, and deliver robust big data analytics solutions.

Whether on-premises or in the cloud, Hortonworks provides the flexibility to run the same industry-leading open source platform to gain data insights in the data center as well as on the public cloud of choice (Microsoft Azure, Amazon Web Services, or Google Cloud Platform).
Solution:  Infosys Information Platform (IIP)

IIP is a data and analytics platform designed to help enterprises leverage their data assets for innovation and enhance business growth. The solution can easily integrate with proprietary software, to allow companies to maximize value from existing investments.

According to Infosys, IIP is a collaborative platform that enables data engineers, data analysts, and data scientists to work jointly across business domains and verticals. IIP can be deployed with ease and without vendor lock-in.

With improved security through role-based access controls that include cell-level authorizations, IIP helps enterprises simplify their data management operations and understand data better, accelerating the data-insight-action cycle.

IIP aims to be the right tool for organizations that want to gain real-time insights, get faster business value, stay compliant with updated governance and robust security, and reduce total cost of ownership with high availability.

Solution:  MapR Converged Data Platform

MapR’s Converged Data Platform integrates Hadoop, Spark, and Apache Drill along with real-time database capabilities, global event streaming, and scalable enterprise storage to provide a full enterprise ready big data management platform with Hadoop.

The MapR Platform aims to deliver enterprise grade security, reliability, and provide real-time performance capabilities while lowering both hardware and operational costs for applications and data.

The MapR Converged Data Platform can simultaneously run analytics and applications at high speed, with scaling and reliability. The strategy is to converge all data within a data fabric, allowing its storage, management, processing, and analysis as the data is being generated.

Mastodon C
Solution:  Kixi

Mastodon C's open source data platform Kixi uses Hadoop, Cassandra, and a set of open source technologies to ingest and integrate batch and real-time data within a single repository, from which the platform can aggregate, model, and analyze it.

Some of Kixi's main features include:

  • Handling of real-time and sensor data via Apache Kafka
  • ETL and batch processing capabilities
  • Data Science capabilities for advanced data analysis
  • Ongoing support to ensure efficient data processing, plus continuous review and improvement of customers' data pipelines and models

Microsoft Azure
Solution:  Microsoft Azure HDInsight

Backed by Hortonworks, Azure HDInsight is, according to Microsoft, a fully managed, full-spectrum open source analytics service for enterprises.

The Azure HDInsight service aims to provide a fully-managed cloud service to make it easy for organizations to process massive amounts of data via popular open source frameworks including Hadoop, Spark, Hive, LLAP, Kafka, Storm, R and others.

Azure HDInsight provides an architecture landscape for different use cases including ETL, Data Warehousing, Machine Learning, IoT and other services within an integrated platform.

Solution:  NEC Data Platform for Hadoop

Another offering powered by Hortonworks, NEC's "Data Platform for Hadoop" is a pre-designed and pre-validated Hadoop appliance that integrates NEC's specialized hardware with the Hortonworks Data Platform.

This NEC Hadoop appliance comes pre-tuned to work with an enterprise-ready Hortonworks platform, certified for NEC's server hardware.

Solutions: Oracle Big Data Cloud Service and Oracle Big Data Cloud 

Oracle has gone “big” with big data: the mega tech vendor offers a couple of Hadoop-based data management platforms, the Oracle Big Data Cloud Service and Oracle Big Data Cloud.

Derived from a partnership with Cloudera, the Oracle Big Data Cloud Service aims to help organizations launch their big data efforts by providing a data platform within a secure, automated, and scalable service that can easily be fully integrated with existing enterprise data in Oracle Database. The service has been designed to:

  • Deliver high performance through dedicated instances
  • Allow dynamic scaling as needed
  • Reinforce and extend security to Hadoop and NoSQL processes
  • Deliver a comprehensive solution that includes robust data integration capabilities and integration with R, spatial, and graph software

Oracle Big Data Cloud is an enterprise-ready Hadoop data platform intended for those organizations that want to run big data workloads, including batch processing, streaming, and/or machine learning, in a public or private cloud configuration.

Solution:  Qubole Data Service (Apache Hadoop as a Service)

Qubole offers an autonomous data platform implementation of Apache Hadoop in the cloud. Apache Hadoop as a Service, part of Qubole Data Service, is a self-managing and self-optimizing implementation of Apache Hadoop that can run on different public cloud infrastructures, including AWS, Azure, and Oracle Cloud.

Qubole’s Hadoop service runs applications in MapReduce, Cascading, Pig, Hive, and Scalding. The service is optimized for faster workload performance and incorporates an enterprise-ready data security infrastructure.

Solution:  SAP Cloud Platform Big Data Services

SAP's Big Data Services offering on its Cloud Platform is a full-service, cloud-based big data platform built on Hadoop and Spark.

The platform allows companies to utilize Apache Hadoop, Spark, Hive, and Pig, as well as several third-party applications, to take advantage of the most recent innovations in big data and attend to the diverse set of use cases an organization might have.

Also worth mentioning is that the service integrates with SAP Leonardo, the company's IoT and digital innovation platform, enabling a systematic approach to digital innovation with SAP Leonardo's capabilities while, according to SAP, meeting rigorous demands for reliability, scalability, and security.

Solution:  Syncfusion Big Data Platform

Syncfusion Big Data Platform is a full-fledged Hadoop distribution designed for Windows, Linux, and Azure. One of the things that makes this Hadoop platform interesting, aside from its features for managing huge data loads, is its ability to easily create, deploy, and scale a secure Syncfusion Hadoop cluster with basic or Kerberos-enabled authentication in a Microsoft Azure Virtual Machines environment.

The Syncfusion cluster manager allows users to effectively manage resources in Microsoft Azure, with options to track billing details; shut down, restart, and destroy virtual machines as required; or start and stop the virtual machines of a Hadoop cluster at scheduled intervals.

Additionally, the Syncfusion Big Data Platform includes support for creating and managing Hadoop clusters within Linux environments, Azure Blob storage for Azure VM-based Hadoop clusters, and integration with Elasticsearch and MongoDB data access through Spark, among many other features.

Solution:  T-Systems Big Data Platform

The T-Systems Big Data Platform is a full Hadoop and in-memory based solution that comprises the consultancy, planning, implementation, and optimization of big data analysis solutions and processes.

Through partnerships with Cloudera and SAP HANA, along with other best-of-breed data management tools, T-Systems provides organizations with a Hadoop ecosystem. T-Systems' big data solution offers a scalable big data platform in the cloud.

The solution offers a full set of functions for the collection, backup and processing of large sets of unstructured data.

Additionally, T-Systems’ big data solution includes capabilities for real-time analytics, done with SAP HANA's in-memory architecture, which allows all data to be directly stored in main memory (RAM).

Solution:  Teradata Appliance for Hadoop

The Teradata Appliance for Hadoop is Teradata's enterprise Hadoop implementation approach: a ready-to-run enterprise platform pre-configured and optimized specifically to run enterprise-class big data workloads.

The appliance features optimized versions of either Hortonworks HDP or Cloudera CDH running on top of Teradata hardware and a comprehensive set of Teradata-developed software components. Some features of the Teradata Appliance for Hadoop include:

  • Optimized hardware and flexible configurations
  • High-speed connectors and enhanced software usability features
  • Systems monitoring and management portals
  • Continuous availability and linear scalability
  • Teradata's world-class service and support

Solution: TickVault

TickVault is a Hadoop-based big data platform with the purpose of collecting, storing, transforming, analyzing and providing insights from structured and unstructured financial data. This includes trade & quote history, news and events, research and corporate actions among others.

The platform has been designed to help organizations speed up the development and management of financial big data projects. It provides APIs and integrates with pre-existing business software solutions, including Matlab, R, or Excel, to avoid business disruption and speed up the analytics process.

Its unified web interface aims to provide easy data access and distribution within a secure environment, allowing flexible management of granular permissions.

Hadoop Platforms: Mature and Enterprise Ready Big Data Platforms

From the list above it's easy to see that gone are the days when just a few vendors would provide enterprise-ready options for undertaking a Hadoop-based big data project. The Hadoop space continues to evolve, and a more than decent number of vendors now offer reliable solutions for deploying Hadoop, both on-premises and in the cloud, to cover most of the use cases an organization needs to address.

Granted, of course, deciding which Hadoop data platform is best for an organization requires much more information, but this list can provide a place to start exploring the possibilities for new small or big data projects involving Hadoop.

Finally, I wouldn't be surprised to discover there are other Hadoop platforms I have not mentioned here. Please feel free to let me know about any other distribution I'm not considering in this list, or drop me a comment or feedback below.


  • During the writing of this piece, it wasn't possible to gather links and information regarding Huawei's FusionInsight Big Data Platform, which is why it does not appear as part of our list.
  • While IBM will continue to offer a Hadoop-based product, it will do so by integrating Hortonworks into its analytics arsenal rather than through the existing IBM BigInsights. For more information read here.
  • All logos and trademarks are the property of their respective owners.

WTF is Machine Learning Anyway?

In a world we might think is ruled and controlled by tech geeks and data scientists, during meetings and phone calls with customers I’m still often hit with honest and candid questions about data and analytics topics, and asked to give my personal take on them.

In light of this, I’ve decided to take a shot at a series of posts to answer, as plainly as I possibly can, common questions I receive in my day-to-day life as a consultant and analyst.

Starting with my most popular question nowadays: WTF is machine learning?
So, here we go...

Machine Learning in a Tiny Nutshell

The discipline of machine learning evolved as part of larger fields including data mining and artificial intelligence (AI) and, in many ways, has grown side by side with traditional statistics and other mathematical disciplines.

So, simply put, machine learning concerns the development of mathematical models and algorithms with the ability to “learn” from data input, adapt, and subsequently improve their outcomes. The concept of "learning" in machine learning, while far from simple in practice, starts with a simple definition:
  • Learning = Representation + Evaluation + Optimization
In which:

  • Representation is the classifier itself, expressed in a formal language that a computer can handle and interpret;
  • Evaluation is a function that distinguishes good classifiers from bad ones; and
  • Optimization is the method used to search among the classifiers the language can express to find the highest-scoring ones.
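To make the three components concrete, here is a minimal sketch in Python. Everything in it (the one-dimensional threshold classifier, the toy data, and the function names) is invented for illustration; real systems use far richer representations and optimizers.

```python
# Toy illustration of Learning = Representation + Evaluation + Optimization.

# Representation: a one-dimensional threshold rule, f(x) = 1 if x >= t else 0
def classify(x, threshold):
    return 1 if x >= threshold else 0

# Evaluation: accuracy, the fraction of labeled examples the rule gets right
def accuracy(data, threshold):
    return sum(classify(x, threshold) == y for x, y in data) / len(data)

# Optimization: a naive search over candidate thresholds for the best score
def learn(data):
    candidates = sorted(x for x, _ in data)
    return max(candidates, key=lambda t: accuracy(data, t))

# Tiny invented data set: inputs below 5 labeled 0, at or above 5 labeled 1
data = [(1, 0), (2, 0), (4, 0), (5, 1), (7, 1), (9, 1)]
best = learn(data)
print(best, accuracy(data, best))  # → 5 1.0
```

Swapping any one component, say accuracy for a different evaluation function, yields a different learner, which is exactly what the definition implies.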
From the previous idea, machine learning can be done by applying specific learning strategies, including:
  • Supervised strategy or learning, to map the data inputs and model them against desired outputs
  • Unsupervised strategy or learning, to map the inputs and model them to find new trends
Of course, derivatives that combine these have appeared, such as semi-supervised learning, opening the door to a multitude of new approaches to machine learning and the incorporation of diverse data analysis disciplines into its arsenal, as is the case for predictive analytics and pattern recognition.
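To illustrate the contrast between the two strategies, here is a hand-rolled sketch applying both to the same one-dimensional points; the data and function names are invented for illustration, and a real implementation would rely on an established library rather than this minimal loop.

```python
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]

# Supervised: labels are given, so we map inputs to the desired outputs
# by learning one mean (centroid) per class from the labeled examples.
labels = [0, 0, 0, 1, 1, 1]

def supervised_fit(xs, ys):
    means = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(vals) / len(vals)
    return means

def supervised_predict(means, x):
    # Predict the class whose learned mean is closest to x
    return min(means, key=lambda c: abs(x - means[c]))

# Unsupervised: no labels; a crude 2-means loop discovers the grouping itself.
def unsupervised_fit(xs, iters=10):
    c0, c1 = min(xs), max(xs)  # start the two centers at the extremes
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1

means = supervised_fit(points, labels)
print(supervised_predict(means, 7.5))  # → 1 (the high-valued class)
print(unsupervised_fit(points))        # two centers, one per natural group
```

The supervised learner needs the `labels` list to exist; the unsupervised one finds the same two groups from the raw points alone, which is the essential difference between the strategies.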
As approaches and algorithms emerge, they have frequently been organized in taxonomies and classified according to different criteria, including the type of input and output required and their use in different situations and use-case scenarios.

Some of these approaches include (in alphabetical order):
  • Association rule learning
  • Artificial neural networks
    • Deep learning
  • Bayesian networks
  • Clustering
  • Decision tree learning
  • Genetic algorithms
  • Inductive logic programming
  • Reinforcement learning
  • Representation learning
  • Rule-based machine learning
    • Learning classifier systems
  • Similarity and metric learning
  • Sparse dictionary learning
  • Support vector machines
Then, What is a Machine Learning Software Solution?

A combination of factors, including the evolution of machine learning approaches and algorithms as well as continuous improvements in software and hardware technologies, has enabled machine learning software to be applied to more types of problems and adopted in an increasing number of business processes.

In essence, a machine learning software solution is simply a piece of software with specific machine learning features aimed at solving specific or general problems where machine learning is applicable; we can see machine learning software evolving in two main ways:

So, today it’s likely that we, as information workers or as everyday users of a given piece of software, are in one way or another consuming software that actually uses some form of machine learning.

Then, How Can I Use Machine Learning in My Organization?

As the adoption of machine learning increases, so do its use cases. The following brief list describes some uses of machine learning in different industries and lines of business:
  • Recommendation systems. Probably its most common use case: machine learning algorithms are deployed to analyze the online activity of an organization’s customer base to determine individual and/or collective buying preferences, enabling the system to learn ever more about customer behavior and improve its prediction accuracy. Companies including Amazon, Netflix and Best Buy rely on such systems.
  • Marketing Personalization. Today, some organizations apply machine learning techniques to better understand their customers and consequently improve their marketing campaigns. By learning customer behavior, organizations can personalize, for example, which email campaigns a customer should receive, or which direct mailings, coupons or offerings will likely have more impact if shown as “recommended”.
  • Fraud Detection. Companies like PayPal now use machine learning software that analyzes all their transactions, learning to tell fraudulent transactions from legitimate ones while increasing accuracy over time.
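As a rough sketch of the idea behind the recommendation use case, the following hypothetical example scores items by how often they co-occur with a customer's own purchases in other customers' histories; the data and names are invented, and production systems at the companies mentioned use far more sophisticated models.

```python
from collections import Counter

# Invented purchase histories, one set of items per customer
purchases = {
    "ana":   {"laptop", "mouse", "keyboard"},
    "bruno": {"laptop", "mouse", "monitor"},
    "carla": {"laptop", "keyboard", "monitor"},
}

def recommend(customer, history, top_n=2):
    scores = Counter()
    for other, items in history.items():
        if other == customer:
            continue
        # Items shared with this customer make the other customer's
        # remaining items more likely recommendations (co-occurrence count).
        overlap = len(history[customer] & items)
        for item in items - history[customer]:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("ana", purchases))  # → ['monitor']
```

The same learn-from-collective-behavior pattern, with richer features and models, underlies the marketing personalization and fraud detection examples above.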

These, of course, are just a few examples of a wide set of use cases across industries, including healthcare, data security and many others.


On one hand, today it is not hard to find use cases for machine learning, and the list keeps growing, so if you are looking into adopting a machine learning solution, there is a good chance you will find one that fits your current needs for improving your organization’s analysis capabilities. Also, given the many types of machine learning solutions on the market, both commercial and open source, it might not be cost-prohibitive to at least evaluate some of the available options to get a sense of the benefits of having machine learning capabilities within your organization.

On the other, it is important to note that, as with any other type of software, you will need to do the legwork and assemble a coherent approach to adopting a machine learning initiative in order to get the best out of it, including a clear definition, scoping and evaluation of your actual needs, which will help you pick the best solution on the market.

A small piece of advice: don’t look for a vanilla solution; look for the one most convenient for your organization.

You can find another example (pun intended) of the use of machine learning and other technologies in Google’s latest product: The bad joke detector.

Finally, you are welcome to leave a comment in the box below or download our very first DoT Industry Note report here.

The BBBT Sessions: Zoomdata and The New Generation of BI Solutions

As mentioned right at the end of my Look Back Into 2017 And Forward To 2018 post, I did start this year looking forward to an exciting 2018 and, well, it seems my wish is coming true.

Right in January, the BBBT group, which I’m proudly part of, hosted a briefing with visual analytics provider Zoomdata, one of what I like to call new generation analytics solutions.

As usual, this briefing was a great opportunity to learn what the company is about and what it’s up to, both in the present and for the future.

So, here is a brief report on what happened during this insightful encounter with Zoomdata and my fellow BBBT members.

About Zoomdata

Innovative company Zoomdata develops what they describe as:

“The world’s fastest visual analytics solution for big and streaming data. Using patented data sharpening and micro-query technologies, Zoomdata empowers business users to visually consume data in seconds, even across tens of billions of rows of data.”

According to the software provider, its offering enables interactive analytics across disparate data sources and helps bridge modern and legacy data architectures, enabling effective blending of real-time data streams and historical data from both on-premises and cloud environments.

Interestingly, Zoomdata’s offering uses a microservices architecture to provide elastic scalability and the ability to run on-premises, in the cloud, or embedded within a third-party application.

Another thing that makes Zoomdata such an appealing solution is its ability to sit right in the middle of the action (Figure 1), and by doing so, aim to become an ideal broker for the full analytics and business intelligence process, especially for analytics performed over big data sources.

Figure 1. Zoomdata in the center of the analytics process (Courtesy of Zoomdata)

Some features/advantages offered by Zoomdata include:
  • No data movement
  • Native support for widest breadth of modern data sources, including streaming and search
  • Modern elastically scalable microservices based architecture
    • Secure – in context of underlying data platform
    • Easy to embed and extend
  • Fastest time to insight – modern user experience, minimal IT intervention, no data modeling required
Zoomdata is venture-backed by Accel, Columbus Nova Technology Partners, Comcast Ventures, Goldman Sachs, NEA and Razor’s Edge.

The company currently has offices in Chicago, New York, San Mateo, CA and Reston, VA.

Zoomdata: New Generation BI

Presented by the current leadership team of Nick Halsey (president and CEO), Justin Langseth (founder and chairman), Ruhollah Farchtchi (CTO) and Ian Fyfe (senior director of product management), the session spent some time on the fundamental question of why Zoomdata needed to be built, gave an update on Zoomdata's new features and, of course, provided the company's current figures.

Founded in 2012, the company has since steadily grown to more than 50 customer accounts, achieving consistent 100% sales growth year over year and 200% growth internationally.

Additionally, Zoomdata has been able to engage in relevant partnerships with companies like Infosys, Deloitte and Atos, aiming to further expand its presence across the big data and data management landscape.

Currently, the company has extended its presence internationally to places including Tokyo, Singapore and Sydney, and accounts for more than 80 employees in its Reston and San Mateo offices.

OK, So Why Zoomdata?

Formed by an experienced team of executives from the business intelligence market, it is no surprise that the origin of Zoomdata had to do with developing a solution that addresses existing gaps in traditional BI tools.

While most traditional BI solutions are fundamentally designed to work on structured transactional datasets, with the advent of the big data revolution a growing number of companies needed to deal also with unstructured or multi-structured data sets and, many times, to address them all within a single solution instead of a combination of analytics tools.

The Zoomdata team realized early on that while structured data sets are not going away, companies need to incorporate new data sets such as interactions (click-streams, social) and observations (IoT and sensors) into the analytics mix (Figure 2).

Figure 2. Zoomdata in the Evolution of the Big Data Landscape (Courtesy of Zoomdata)

Realizing that “legacy” BI systems were not designed to take on these emerging types of data sets, Zoomdata took on the challenge of providing a solution able to process both traditional structured and new big data sources within a single environment and to provide a consistent analytics framework for all of them.

According to Zoomdata, this puts it in a position to better address the challenges that other BI systems face, such as insufficient support for streaming and/or unstructured data and scalability limitations, among others.

Throughout the session, both during the presentation and the demo, Zoomdata pinpointed the main features and benefits it offers to provide users with a reliable new generation analytics solution.

So, instead of changing companies’ existing data frameworks, Zoomdata aims to ensure organizations can deal as neatly as possible with the complete stack of “old” and “new” data sources, using a single analytics solution that bonds them.

One noteworthy aspect of Zoomdata is its holistic approach to data and, despite providing strong support for Hadoop sources, the company focuses on being a connecting hub for a plethora of other data sources using native connectivity and providing access to real-time data by having a native streaming architecture.

About the Session

The Zoomdata team gave us a nice company and product presentation, as well as a good opportunity to have interesting discussions ranging from the most effective role of Zoomdata within a company’s data and analytics strategy, to the role and importance of real-time data within a big data initiative.

The demo session also gave us an opportunity to check some of the most important functionality claimed by Zoomdata, including:

  • User experience and empowerment. From what we were able to witness from the demo, Zoomdata’s user interface looks friendly and easy to operate with a nice look and feel.
  • Wide connectivity to data sources. Zoomdata includes many connectors for both traditional structured, and modern data sources via a set of native connectors for Hadoop, NoSQL, as well as streaming and cloud data sources directly, avoiding data movement (Figure 3).
  • Embedding capabilities. The team at Zoomdata stressed the solution’s capabilities and the rich set of APIs available to easily embed it into third-party applications. This includes an SDK that allows development of custom analytics extensions.
  • Real-time streaming data analysis capabilities. Here, Zoomdata emphasized the core capabilities it relies on to connect to and work with streaming infrastructures and effectively visualize real-time information, going beyond the traditional business intelligence approach of working with historical data.

Figure 3. Zoomdata’s Screenshot (Courtesy of Zoomdata)

Also, relevant aspects of Zoomdata’s capabilities arsenal include the fact that no data movement is needed, and a flexible micro-services architecture, including micro-querying and a scalable in-memory caching configuration to increase processing speeds.

An interesting discussion erupted in the session regarding where Zoomdata fits best given its design: large data volumes with low processing complexity or, on the contrary, smaller yet more complex data sets.

While I did not perceive a conclusive answer during the discussion, my guess is that, given its micro-service and embedded nature, Zoomdata naturally fits deployments with large and less complex data sets. Still, there is no reason, in my view, not to consider Zoomdata for more complex deployments acting as an ideal “intelligent connector”: especially with data infrastructures that are by nature decomposed and fragmented, Zoomdata can be the right interface to harmonize analysis coming from different sources. In some ways, this reminds me of an Enterprise Information Integration configuration with additional features.

So what?

Aside from a nice briefing and demo full of examples and case studies, the Zoomdata team provided us with a wide view of the solution and where it might, or might not, fit. It seems Zoomdata plays well in scenarios where traditional and big data sources need to be placed jointly to work together.

Zoomdata is a solution to consider especially when big data initiatives already act together with traditional structured sources, and where customization and embedding within third-party systems play a relevant role in a project. This might mean Zoomdata is not necessarily designed with non-expert users in mind; a certain amount of data management expertise might be required to take full advantage of Zoomdata’s capabilities, yet this learning effort might pay good dividends in the end.

Want to know more about Zoomdata or the BBBT session?

You can watch the video trailer below, visit Zoomdata’s web page, or leave a comment right in the space below.

IBM Advances its High Performance Data Analytics Arsenal with its Spectrum Computing Platform

As the need for gathering data continues, organizations keep dealing with increasing amounts of information that need to be stored, processed, and analyzed faster and better, stimulating the growth and evolution of the high performance computing (HPC) market.

One key segment of this market that continues to grow, especially in recent years, is high performance data analytics (HPDA), as organizations continue to adopt and evolve their big data and data lake initiatives, to the point that IDC forecasts that in 2018 the HPDA server market will reach $2.6 billion (23.5% CAGR) and the HPDA external storage market will add $1.6 billion (26.5% CAGR).

It is not strange, then, that software companies like IBM are keen to develop solutions capable of addressing the HPDA market segment, and IBM has been working specifically on it with its Spectrum software line.

IBM Evolves Spectrum to Keep Pace with the HPDA Market Segment

Late in November of last year, IBM announced a brand new software offering called IBM Spectrum Computing. The announcement signals how IBM is setting itself en route to address a modest but important segment of the IT industry: the provision of analytics solutions for high performance workloads, or High Performance Data Analytics (HPDA).

With this, IBM aims to continue its strategy of serving and providing advanced data management and analytics solutions for an increasing set of computing ecosystems and environments. The new offering will enable organizations to work on high data volumes analyzed within advanced analytics software solutions, including Spark, TensorFlow or Caffe, to deploy applications that use machine learning, artificial intelligence and deep learning.

As IBM mentions:

IBM Spectrum Computing uses intelligent workload and policy-driven resource management to optimize resources across the data center, on premises and in the cloud.
Also, according to IBM, the new software platform is designed to set up distributed, mission-critical HPC infrastructures, scalable to over 160,000 cores, and to enable execution of big data analytics applications at speeds up to 150x faster.

The new Spectrum Computing offering from IBM is now set to deliver open offerings developed especially to speed the adoption and production of parallel processing and clustered computing tasks, allowing organizations to:
  • Simplify deployment via cluster virtualization, enabling disperse systems to work together as one by including shared computing, data services and cluster management specifically configured for deep learning/machine learning tasks.
  • Deliver artificial intelligence (AI) for the enterprise, enable centralized management and reporting and allow multi-tenant access with end-to-end security.
  • Openness to the latest technology so it can provide support for the latest open source tools and protocols, including Spark and other deep learning software options.
  • Ease the deployment and adoption of cognitive workloads so end users will have access to the consumption of cluster resources across applications without having the need for specialized cluster knowledge.
  • Provide elasticity for hybrid clouds by simplifying cloud usage in distributed clustered environments, incorporating automated workload-driven provisioning and de-provisioning routines to encourage intelligent workload and data transfer to and from the cloud.

IBM Spectrum Cluster Foundation Screencap (Courtesy of IBM)

According to the "big blue", the Spectrum Computing family has been designed to support hybrid architectures so it can accommodate specific workloads as well as specific platforms to enable optimal performance. Interestingly, IBM Spectrum Computing offers support for mixed x86 and IBM POWER platform environments.

Scalable to over 160,000 cores, IBM aims to enable users to fulfill their high performance analytics deployment needs with state-of-the-art software-defined computing technology for complex distributed, mission-critical, high-performance computing (HPC) infrastructures.

Core components of the Spectrum Computing Platform include:
  • IBM Spectrum LSF. Spectrum platforms’ workload management solution which provides user management and workload administration tasks.
  • IBM Spectrum MPI. Spectrum’s message passing interface (MPI) with high-performance capabilities and configured specifically to support distributed computing environments.
  • IBM Spectrum Cluster Foundation. IBM Spectrum’s infrastructure life cycle management solution for scale-out environments and coming at no charge.
  • IBM High Performance Services. Hosted on the Softlayer (IBM Cloud), this is IBM’s HPC and storage clusters as a service infrastructure.
In addition to the launch comes a new set of tools developed to help users design artificial intelligence (AI) models using many of the leading deep learning frameworks, including TensorFlow and Caffe.

One key aspect of this set of releases is the latest version of IBM Spectrum Scale software (Spectrum’s advanced storage management), which now provides full support for moving workloads, including unified files, objects and HDFS, right from where they’re stored to where they’re analyzed.

This latest version of IBM Spectrum LSF Suites is now generally available, along with the new versions of IBM Spectrum Conductor with Spark, Deep Learning Impact and IBM Spectrum Scale.

So What?

Well, as solutions for HPDA evolve, so does their market reach. We are seeing continued adoption within many industries and business segments (finance, security, retail) and the adoption of new technologies that will require solutions capable of analyzing petabytes of data through parallel processing compute resources.

The latest version of IBM Spectrum Computing aims to put IBM at the forefront of the HPDA segment and better compete with other giants in this market like HPE and Hitachi. A competition that will be interesting to keep an eye on in the coming years.

Finally, feel free to check this Spectrum intro video from IBM and of course, to let me know your thoughts right below.

Teradata Opens its Data Lake Management Strategy with Kylo: Literally

Still distilling good results from the acquisition of former consultancy company Think Big Analytics, Teradata, a powerhouse in the data management market, took one step further to expand its data management stack and make an interesting contribution to the open source community.

Fully developed by the team at Think Big Analytics, in March of 2017 the company launched Kylo, a full data lake management solution, but with an interesting twist: as a contribution to the open source community.

Offered as an open source project under the Apache 2.0 license, Kylo is, according to Teradata, a new enterprise-ready data lake management platform that enables self-service data ingestion and preparation, as well as the necessary functionality for managing metadata, governance and security.

One appealing aspect of Kylo is that it was developed over an eight-year period as the result of a number of internal projects with Fortune 1000 customers, which has enabled Teradata to incorporate several best practices into Kylo. This way, Teradata has given the project the necessary maturity and testing under real production environments to launch a mature product.

Using some of the latest open source technologies, including Apache Hadoop, Apache Spark and Apache NiFi, Kylo was designed by Teradata to help organizations address the common challenges of a data lake implementation and provide the common use cases that help reduce implementation cycles, which average 6 to 12 months.

Teradata’s decision to release Kylo through an open source model, instead of a traditional commercial one, also comes as part of an interesting shift.

Traditionally a fully commercial software provider, the company has undergone a core transformation in recent years, becoming increasingly open to new business models and approaches, including its Teradata Everywhere strategy to provide increasing access to Teradata solutions and services across all on-premises and cloud platforms.

This broad strategy includes increased support for the open source community, as is the case with the Hadoop community on different projects, with Presto, and now, of course, with Kylo.

Teradata’s business model for Kylo is based on the services its big data services company Think Big can offer on top of Kylo; these optional services include support and training, as well as implementation and managed services.
According to Teradata, Kylo will enable organizations to address specific challenges implied within common data lake implementation efforts, including:

  • Shortage of skilled and experienced software engineers and administrators
  • Implementation of best practices regarding data lake governance
  • Reinforce data lake adoption beyond engineers and specific IT teams
With Kylo, Teradata aims for a data lake platform that requires no code and enables self-service data ingest and preparation via an intuitive user interface, helping accelerate the development process with reusable templates that increase productivity.

From a functions and features perspective, Kylo has been designed to provide the necessary data management capabilities for the deployment of a data lake:

  • Data Ingestion. Self-service data ingest capabilities along with data cleansing, validation, and automatic profiling.
  • Data Preparation. Data-handling capabilities through a visual SQL and interactive data transformation user interface.
  • Data Discovery. Data searching and exploration capabilities as well as metadata, view lineage, and profile statistics.
  • Data Monitoring. Monitoring capabilities for the health of feeds and services throughout the complete data lake, as well as tracking service level agreements (SLAs) and troubleshooting performance.
  • Data Pipeline Design. Capabilities for designing batch and/or streaming pipeline templates in Apache NiFi to be registered with Kylo, allowing user self-services.

In the words of Oliver Ratzesberger, Executive Vice President and Chief Product Officer at Teradata:
“Kylo is an exciting first in open source data lake management, and perfectly represents Teradata’s vision around big data, analytics, and open source software. Teradata has a rich history in the development of many open source projects, including Presto and Covalent. We know how commercial and open source should work together. So we engineer the best of both worlds, and we pioneer new approaches to open source software as part of our customer-choice strategy, improving the commercial and open source landscape for everyone.”

With Kylo, Teradata aspires to play a leadership role in the data lake, governance, and stewardship market, a difficult goal given niche vendors like Zaloni and Podium Data, or big vendors like Informatica with its Data Lake Management solution stack. At first glance, though, Kylo looks like a solution to follow closely, especially considering the price point its business model allows versus the other commercial offerings.

Want more information?

Kylo software, documentation and tutorials can be found on the Kylo project website or at the project’s GitHub site, or check the following video and its page on YouTube:

Book Commentary: Thank you for being late

Not long ago I had the opportunity to read a book from my long reading list. "Thank You for Being Late: An Optimist's Guide to Thriving in the Age of Accelerations" is a book written by famous author and journalist Thomas L. Friedman and logically as you might know, a best-seller.

Admitting I have a mild tendency to avoid best-sellers (I’ve run into some disappointments when reading them), I was a bit reluctant to read it, especially because this was, according to the back cover, close to things I’m familiar with as an industry analyst and consultant in the technology market.
Yep, a typical case of “why should I read it if I know what it’s about.”

I was wrong; right from the first pages the book was engaging, entertaining and yet quite insightful. It guides you nicely through the recent and profound evolution of information technology in a fluent way.

Full of information from extensive interviews and with a clear narrative (essential, I think, for a book that addresses technology), the book narrates nicely the events that have shaped the evolution of technology and, from there, the huge effect this has had on our lives.

Moreover, it provides a structure and describes the connection between the different elements of technology that are dramatically changing the world: cloud computing, mobility, big data and of course the Internet of Things and so on.

Mr. Friedman has truly done a good job describing and connecting the dots on how modern life has many benefits while also contributing to our sense of unease and anxiety, as we struggle to keep pace with technological advances, with increasing volumes of information to digest, or with the worries that come with having our information moving along public roads or resting in servers potentially vulnerable to attacks and security breaches.

The book also contains several descriptions from direct conversations with those making history. Described with a good level of detail, it does not become just a gathering of facts but a nice compendium of some of their reflections and thinking, which enables us to get a glimpse of how these people transforming our lives think about the present and the future.

According to Mr. Friedman, aside from how problematic life can be in a time of continuous acceleration, the book takes an optimistic approach: even when modern life can sometimes be daunting and overwhelming, many of these technologies are here to make our lives better. This is perhaps one of the few points I personally found myself at odds with.

I would have liked to find further exploration of the not-so-pleasant side of technology (security concerns, and the sociological and societal issues it provokes) to get a more balanced view of what technology means in our lives today.

Finally, despite the fact that this book discusses technology in a general way, and contrary to some who would not recommend it for people in the tech field, I’ll dare to do so, as it presents a fresh perspective on the evolution of technology and the reality of the world, and its potential impact on our present and our future.

It is also a nice call to slow down, reflect, and live through our time at our own speed.

Mr. Friedman's book contains good nuggets of information that can be entertaining and informative for those both in and out of the tech scene.
‘D’ of Things: A look back into 2017 and forward to 2018

(Image by Elisa Riva)

As we inevitably approach the end of the year, a year marked by many important advances in all areas of the data management space, I just can’t avoid thinking with expectation and excitement about what should be just around the corner for 2018.

If 2017 was all but boring, 2018 looks like another promising one, no less fast and competitive than 2017.

But recalling what Niels Bohr once said:

Prediction is very difficult, especially if it's about the future.

I will avoid making big prediction statements and instead take a look at some relevant things that happened this ending year, and at what will be interesting to follow closely next year.

Still, feel free to call me out next year on what I missed.

A look back at 2017

2017 was a year full of exciting events and news, and yet the following are, in my view, the ones that deserve the most attention due to their transformational nature for the industry.

So here, in no particular order, is a summary of relevant trends in 2017:

Security and Governance for Big Data

As existing and new big data projects evolve, along with the myriad data security incidents that happened throughout the year, it is only natural that companies take further steps to make these projects interoperate more, and more efficiently, with the rest of the enterprise software stack while keeping them safe and secure. More than ever, companies need to reinforce data protection measures, and their need to govern access and usage increases as well.

Much discussion and work in previous years went into enabling companies to improve their data governance practices, yet 2017 was especially important for companies on both the user and vendor sides.

This year, major efforts were made to crystallize these practices and to consolidate big data governance with efficient security and data protection, so users can get their hands on the data they need in the safest and most efficient way possible. This has enabled, or forced, vendors to increase the capabilities of their existing data governance solutions, especially in relation to big data and Internet of Things (IoT) initiatives.

Two identifiable trends seemed to emerge as many organizations aim to consolidate big data projects within their existing data platforms in a secure and efficient way:

  • One comes with the growth of a new generation of solutions that incorporate capabilities for governing and securing big data sources. Offerings in the likes of Collibra’s Data Governance Center and Alation Data Catalog, or those coming from large data management companies, including IBM’s InfoSphere Information Governance Catalog and SAS Data Governance, are enabling a new generation of data governance solutions with specific capabilities for dealing with big data sources.
  • The second trend is increased interest in integrated solutions able to view data management and governance initiatives through a single lens, via a relatively new way of organizing data called the “data lake”. This includes data lake management platforms, like those offered by Zaloni and Podium Data, or solutions from software powerhouses such as Microsoft, with its data lake solution, and Informatica, with its Intelligent Data Lake.

Without a doubt, 2017 signaled an increasing consciousness within many organizations of the importance of adopting comprehensive, enterprise-wide approaches to their data management initiatives, to enable smoother consolidation and greater efficiency. It is a trend I certainly expect to evolve further through 2018.
Analytics consolidation: a data scientist's dream come true?

In 2017, analytics was a software market on fire, especially with the increasing incorporation of machine learning and artificial intelligence and their continuous integration with existing business intelligence (BI) and enterprise performance management (EPM) solutions.

It seems an internal revolution is taking place within the analytics market, with many things happening at once, including the evolution and incorporation of so-called “data science” within many organizations. Controversies aside, this has opened new avenues for the development of a new generation of analytics platforms and the emergence of new types of analytics specialists and information workers.

In 2017 we witnessed this evolution: on one side, a new generation of analytics platforms consolidated many analytics capabilities within a single platform, while others went further, incorporating the ability to automate many of the data management processes that need to happen beforehand, including data profiling, preparation, and integration.

New companies designed as data science platforms now include the ability to consolidate many, if not all, of the functional features needed to perform a full advanced analytics cycle, and this trend was particularly clear through 2017.

Companies like Alpine Data, Dataiku, and DataRobot are taking data science into the enterprise software mainstream, while others like Emcien and BigML are taking innovative approaches, providing self-service, easy-to-use solutions that apply advanced and automated algorithms to effectively solve practical cases.

Additionally, major BI and analytics players have been working to take their offerings to the next stage, or even to come up with brand-new solutions. Examples include Tableau and Qlik: the former is now working to put technology from previous acquisitions to work, including its new Hyper high-performance database and former natural language startup ClearGraph, to expand Tableau’s analytics capabilities; the latter has new offerings including Qlik Sense and its relatively new Qlik Analytics Platform.

Moreover, adding to this trend, during the second half of 2017 global software powerhouses announced or released brand-new analytics platforms, including SAP with SAP Data Hub and SAP Vora, Teradata with its new Analytics Platform and IntelliSphere, and IBM with its Integrated Analytics System.

Another key element is the emergence of a group of next-generation BI solutions, both in the cloud and on-premises, all with a myriad of capabilities for analyzing new data sources and many key new features that make BI easier to handle and to integrate with third-party applications. Some solutions worth a look include Dundas BI, AtScale, Yellowfin, Pyramid Analytics, and Phocas, to name just a few.

Preparing for the next database revolution?

While sometimes less hyped, with most of the interest placed on the consumer end of data management software (analysts, data scientists, CxOs, and others), less attention is paid to what is going on with key data technologies, including the database market and many of its derivatives. And yet a lot is happening in this area, so here are some of the most notable events of 2017.

  • The commoditization of the In-Memory DB

With a continuous increase in the number and complexity of transactions to be managed, and despite the hype big data and analytics have enjoyed within the software industry, companies, vendors, and consumers seem to have gained renewed interest in new database management technologies for transactional systems, especially those aiming to maintain efficiency under extreme transaction processing.

Renewed interest from buyers and software vendors in keeping pace with this phenomenon has not only remained but increased, especially in the areas where extreme transaction processing occurs the most, such as communications and finance, triggering interest in producing and deploying faster and better technology.

During the last couple of years, and especially in 2017, the adoption of in-memory technologies applied to transactional database systems has gained significant interest, especially among large companies that are renewing and updating their existing database offerings.

Examples include SAP, with HANA and most recently with SAP ASE’s incorporation of in-memory processing for extreme transaction processing, Oracle with its in-memory options, SQL Server’s in-memory capabilities, and smaller yet powerful proponents including McObject, Altibase, and VoltDB.

  • Distributed databases

As businesses continue to globalize operations, so does the need for databases that can scale out and carry massively scalable applications across the globe.

While not new, distributed databases were particularly highlighted this year by the releases made by major software powerhouses: Cloud Spanner by Google and Azure Cosmos DB by Microsoft.

These two announcements reminded us how important new database technologies will be for supporting next-generation software solutions in the years to come. It seems logical to suspect that players in this field, including GridGain and Clustrix, as well as other established players like Apache Cassandra-based company DataStax, will enter a new phase of competition for opportunities in markets like mobile and IoT. I suspect there is much more to come in the coming years.
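To get a feel for the basic mechanism behind scale-out databases like these, here is a minimal, purely illustrative Python sketch of hash-based partitioning, which lets every node compute a row's placement independently. The node names and modulo scheme are my own simplification, not any vendor's actual design:

```python
import hashlib

# Hypothetical node names; real systems use richer placement strategies
# (replication, consistent hashing rings, geo-aware policies).
NODES = ["node-us-east", "node-eu-west", "node-ap-south"]

def partition_for(key, nodes=NODES):
    """Map a row key to a node with a stable hash, so every peer agrees."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Every replica computes the same placement with no coordinator involved.
placement = {k: partition_for(k) for k in ["order-1001", "order-1002", "order-1003"]}
```

The appeal is that routing is deterministic and local; the hard parts the real products solve are rebalancing when nodes join or leave, and keeping replicas consistent across regions.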

  • Database and containers

One thing worth following next year and in the years to come will be the incorporation of databases into containers.

An interesting series of discussions in favor (faster and automated deployments) and against (potential networking and security concerns) has been published, analyzing the feasibility, benefits, and challenges of databases offered in containers. Even so, 2017 marked a significant movement toward offering database and data management solution images within containers; examples include Microsoft SQL Server and Cloudera on Docker.

Now, will this be a successful trend?
Only time will tell, but despite documented challenges and failures, it is fair to assume the container-database combination will evolve well enough to become a viable option for some organizations.

2018: Yep, I’m looking forward to it

From what I witnessed this year, there is no sign we will be slowing down soon, and I suspect we will again be “drinking from the fire hose” in 2018, with many upcoming innovations.

And while I expect much more than this, I have prepared a small yet meaningful list of the things I will personally keep a close eye on this year, and you might want to as well:

The rise of full database self-service and automation

Just in October, Oracle unveiled its Oracle Autonomous Database Cloud, setting the stage for what could be an interesting battlefield in the database scene, as the rest of the competitors take steps to follow suit, or even disrupt the market with technological innovation, to build and release fully autonomous database offerings.

It will also be exciting to see Oracle’s autonomous database become a reality this year and evolve while we wait for a new generation of fully automatic databases.

BTW: personally, I don’t expect DBAs to disappear any time soon.

The Rise of the GPU?

As many organizations try to cope with an increasing need for faster and better ways to perform advanced analytics, new technologies continue to be developed to improve data management and analytics performance and capabilities. The graphics processing unit, or GPU, is one of those technologies, and one with huge potential.

Originally used for gaming, this processing unit is now increasingly being used for performing analytics.

It will be interesting to watch how this processing unit evolves and is embedded within the software mainstream in 2018 and beyond.

Security & Privacy

Of course, and not surprisingly, security and privacy will be in the headlines for a long time.

By the way, how are you doing with your GDPR compliance project?

Surely, following the progress companies make on GDPR compliance will be a topic to watch this year as the deadline approaches, as will the aftermath of its implementation and enforcement. It will be interesting to see the effects and impact on companies’ security, analytics, and data governance practices during and after implementation.

Security Analytics

Finally, on this topic, another aspect worth following in 2018 is the rise of security analytics platforms: the potential and evolution of these tools, as well as the impact these solutions have on organizations’ general security and privacy strategies.

So much more to come...

Of course, there is a lot more worth covering for 2018, but as I write this my head keeps bringing up topics and I fear I’ll never stop. And, well, I need to do some actual work now.

But before I go, I want to thank you for being a reader of this blog during 2017. Exciting things are also coming in 2018 for the ‘D’ of Things, so stay tuned, and please feel free to leave me a comment in the space below.

Finally, I wish you all a successful 2018, full of goals accomplished and health for you and your loved ones.
Next-generation Business Process Management (BPM)—Achieving Process Effectiveness, Pervasiveness, and Control

The range of what we think and do is limited by what we fail to notice. And because we fail to notice that we fail to notice there is little we can do to change until we notice how failing to notice shapes our thoughts and deeds.
—R.D. Laing

Amid the hype surrounding technology trends such as big data, cloud computing, or the Internet of Things, for a vast number of organizations, a quiet, persistent question remains unanswered: how do we ensure efficiency and control of our business operations?

Business process efficiency and proficiency are essential ingredients for ensuring business growth and competitive advantage. Every day, organizations are discovering that their business process management (BPM) applications and practices are insufficient to take them to higher levels of effectiveness and control.

Consumers of BPM technology are now pushing the limits of BPM practices, and BPM software providers are urging the technology forward. So what can we expect from the next generation of BPM applications and practices?

BPM Effectiveness Via Automation

Effective business process management software can help you keep efficient and accurate track of your business processes. Mihai Badita, senior business analyst at UiPath, a software company that offers solutions for automating manual business processes, said, “We estimate that around 50 to 60 percent of tasks can be automated, for the time being.”

This is a bold but not unexpected statement from a relatively new company that appears to rival established robotic process automation software companies such as Blue Prism, Automation Anywhere, and Integrify—the latter offering an interesting workflow automation solution that can automate the process of collecting and routing requests—as well as market-leading BPM software providers such as Appian and Pegasystems. According to the Institute for Robotic Process Automation (IRPA), process automation can generate cost savings of 25 to 50 percent and enable business process execution on a 24/7 basis, 365 days a year.

Aside from the obvious effects that automation might have on business processes, such as cost savings and freeing up time and resources, business process automation can help many organizations address repetitive tasks that involve a great deal of detail. Many delays during business process execution are caused by these manual and repetitive tasks, and bottlenecks can arise when decisions need to be made manually. Such processes could be automated and executed entirely without human intervention.

Process robots are a set of specific software modules capable of capturing information from different systems, manipulating data, and connecting with systems for processing one or multiple transactions. Of course, it’s important to consider the role of effectively training these process robots—including programming and implementing them—to ensure efficiency and precision, making sure business rules are well-defined even before this training to ensure success of the automation strategy.
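The capture, manipulate, and connect stages described above can be sketched in a few lines. This is a deliberately toy Python illustration of the shape of such a robot, not any vendor's actual framework; the invoice fields, the 13% tax rule, and the in-memory "ledger" are all invented for the example:

```python
def capture(source_record):
    """Step 1: capture the fields a human would read off a screen or document."""
    return {"invoice_id": source_record["id"], "amount": float(source_record["amt"])}

def transform(data):
    """Step 2: manipulate the data per a predefined business rule (13% tax here)."""
    return {**data, "total": round(data["amount"] * 1.13, 2)}

def submit(data, ledger):
    """Step 3: connect to the target system; a plain list stands in for it here."""
    ledger.append(data)

ledger = []
for record in [{"id": "INV-7", "amt": "100.00"}, {"id": "INV-8", "amt": "25.50"}]:
    submit(transform(capture(record)), ledger)
```

The point of the sketch is the one made in the text: the robot only works if the business rules in the transform step are well-defined before automation begins.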

There are indications that automation will grow in the BPM arena in the coming years, with the incorporation of improved advanced machine learning techniques and artificial intelligence algorithms.

BPM Pervasiveness Through Mobility, Development, and the Cloud

Mobile technology affects perhaps no other component of the enterprise software stack as strongly as BPM. The first mobility goal of every organization has been to enable employees involved in all stages of every business process to operate independently, unrestricted by location and time. A user with a new purchase order to submit, confirm, or authorize should be able to do so using a mobile device no matter where he or she is located or what time it is.

To address security and privacy concerns and to meet specific governance and business requirements, companies realize it is imperative to take this effective yet simple solution-to-mobile-app interaction schema to the next level of integration.

Organizations are recognizing the need for increased enterprise software integration of BPM routines at all levels, and as a result they are taking a wider approach to mobile adoption. Many organizations are taking further steps to develop and deploy custom mobile solutions, and many if not all of those deployments involve business process improvements and the ability to integrate with the rest of the enterprise software stack. A study from MGI Research notes that, at the time of the study, 75 percent of all companies reported a mobile apps development cycle of nine months or less.

With this trend, many BPM software providers already offer customers the ability to accelerate the development of mobile and custom process-oriented applications, with development tools that can avoid or minimize the need for coding. They can also offer visual and modular components to accelerate development, with different degrees of integration and compliance with internal IT regulations for security, privacy, and governance. To mention just a couple, companies such as South Africa-based K2 and former French company W4, now part of Itesoft, have developed capabilities well beyond traditional BPM features for modeling and executing business processes, allowing organizations to develop fully customizable process-oriented business applications.

Another component of pervasive business process provision has to do with the development of process-oriented applications with a high degree of integration with the different systems of record (for example, ERPs, CRMs, and others) to effectively improve the way users move across business processes and interact with existing systems. Companies such as Kofax, with its process automation offerings, aim to enable organizations to develop so-called smart process applications (SPAs): process-based applications that integrate well with existing systems and can be embedded to work seamlessly in different operating and platform environments, providing the ability to execute business processes from the user’s platform and device of choice while preserving data accuracy and consistency across platforms.

Other important factors of a more pervasive BPM framework have to do, respectively, with the integration of BPMs mobile capabilities within larger corporate mobile strategies and solutions, including enterprise mobile management (EMM) or enterprise mobile application development platforms (MADPs) and, of course, the adoption of corporate business process management in the cloud.

Interestingly, some BPM providers are rapidly increasing their ability to incorporate more control and management capabilities to mobile app environments, such as improved security and role administration. Without being a substitute for the previous solutions mentioned, this can be an effective first step in encouraging corporate BPM apps development.

With regards to cloud adoption, aside from lower costs and faster return of investment already discussed, the possibility that specialized service providers can take care of the development and administration of a reliable and secure environment can, within many organizations, encourage rapid and effective development of mobile and embeddable process-oriented applications.

Not BI Versus BPM, But BI and BPM

Software companies have now realized that business intelligence also needs to be process-oriented. A sample of this new direction could be seen when Swedish enterprise software provider IFS acquired a company called VisionWaves. VisionWaves, now the IFS Enterprise Operational Intelligence (EOI) offering, is an interesting product that aims to give organizations a wide view of the state of the business, via a corporate cockpit that combines views and analysis of process and performance within a single environment.

This signals an increasing interest in process and performance within the software industry. The need for information and the speed of business make operations and data analysis run at different paces, creating silos that sometimes make things difficult to understand.

Some organizations are realizing that as the use of analytics becomes more important, its effectiveness and impact depend on its ability to participate in actual decision making at all levels. The need for information never wavers—its value remains and even increases—but the need for collaboration, process control, and performance monitoring also increases at the point when risk mitigation, opportunity identification, and actual informed decisions are to be made.

In order to improve business operations through the use of analytics, business intelligence (BI) needs to be naturally process-oriented, embedded within a user’s operational environment to provide collaboration and synergy, and, of course, efficient and fast enough to provide information in real time.

Vitria, with its operational intelligence approach, Kofax, with its process intelligence analytics, and Salient, with its Collaborative Intelligence Suite, all aim to give users a process-centric data view, infusing analytics right in the trenches of business operations.

Last but not least, something worth mentioning—and that in my view has great potential for improving the synergy between BI/analytics and BPM—has to do with recent efforts and developments around the decision-making process of an organization. This includes the recent publication of the Decision Model and Notation (DMN), an industry-standard modeling notation for decision management and business rules, by the Object Management Group (OMG).

Widespread use of more formal methods for decision management can certainly have a big impact on the way organizations design the use of analytics directly involved in decision making at different levels and aspects of an organization, to gain control, measurement, and business operations effectiveness.
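To see what a DMN-style decision model boils down to in practice, here is a small Python sketch that evaluates a decision table, with rules expressed as data. The discount rules themselves are invented for illustration, and this simplifies DMN considerably; real DMN tables also define input/output types, hit policies, and richer FEEL expressions:

```python
# Each rule pairs a condition over the inputs with an output value. The first
# matching rule wins, a simplified version of DMN's "first" hit policy.
DISCOUNT_RULES = [
    (lambda c: c["segment"] == "vip" and c["order_total"] > 1000, 0.15),
    (lambda c: c["segment"] == "vip", 0.10),
    (lambda c: c["order_total"] > 1000, 0.05),
    (lambda c: True, 0.00),  # catch-all default keeps the table complete
]

def decide_discount(case):
    """Evaluate the decision table against one input case."""
    for condition, discount in DISCOUNT_RULES:
        if condition(case):
            return discount

decide_discount({"segment": "vip", "order_total": 1500})  # -> 0.15
```

The benefit argued for in the text is visible even here: the business logic sits in one inspectable table rather than being scattered through application code, which is what makes it measurable and controllable.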

Conclusions—BPM for the Future

Never before has there been such an accumulated effort—from vendors incorporating new technology within BPM solutions, to user and professional groups modernizing BPM practices—to increase operational efficiency in organizations. Still, challenges remain: achieving effective collaboration and communication of insights, obtaining an efficient analytical view of the entire organization, and closing important operational gaps, including those between technology and business.

As we noted in the beginning of this look at business process management and automation, the range of what we think and do is limited by what we fail to notice. There is also a lot of value to be unveiled within processes, if we optimize them properly and take advantage of the tools available to us.

(Originally published in TEC's Blog)

SAP Data Hub and the Rise of a New Generation of Analytics Solutions

“Companies are looking for a unified and open approach to help them accelerate and expand the flow of data across their data landscapes for all users.

SAP Data Hub bridges the gap between Big Data and enterprise data, enabling companies to build applications that extract value from data across the organization, no matter if it lies in the cloud or on premise, in a data lake or the enterprise data warehouse, or in an SAP or non-SAP system.”

This is part of what Bernd Leukert, member of SAP’s executive board for products and innovation, said during SAP’s Big Data event held at the SAP Hudson Yards office in New York City as part of the SAP Data Hub announcement. In my view, it marked the beginning of a small yet important trend within analytics: the launch of new or renewed integrated software platforms for analytics, BI, and data science.

This movement, marked by other important announcements including Teradata’s new Analytics Platform and IBM’s Integrated Analytics System, is another step toward a new generation of platforms and a consolidation of functions and features for data analysis and data science.

According to SAP, the new SAP Data Hub solution offers customers:

  • A simpler, more scalable approach to data landscape integration, management and governance
  • Easier creation of powerful data processing pipelines to accelerate and expand data-centric projects
  • A modern, open architecture approach that includes support for different data storage systems

One way SAP aims to achieve this with its Data Hub solution is by creating value across the intricate and diverse data management processes that go from data collection, through integration and transformation, to preparation for generating insight and action.

To increase efficiency across all management stages, including data integration, data orchestration, and data governance, the new SAP Data Hub creates “data pipelines” to accelerate business results, all coordinated under a centralized “Data Operations Cockpit”.

From what we can see, SAP aims to position the new solution as the ideal data management platform for the rest of the SAP analytics and BI product stack (including neat integration with SAP HANA and the ability to take advantage of solutions like SAP Vora, SAP’s in-memory, distributed computing solution) as well as with core Big Data sources, including Apache Hadoop and Apache Spark (see figure below).

SAP’s Data Hub General Architecture (Courtesy of SAP) 

SAP Data Hub’s data pipelines can access, process, and transform data coming from different sources into information, to be used along with external computation and analytics libraries, including Google’s TensorFlow.
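Conceptually, a data pipeline of this kind is just a chain of operators, each consuming the previous one's output: a source, one or more transforms, and a hand-off to analytics. The generic Python sketch below illustrates that idea only; the operator names, the sensor records, and the error handling are all illustrative and are not SAP Data Hub's actual interfaces:

```python
def ingest():
    """Source operator: pull raw records (a hard-coded stand-in here)."""
    yield from [{"sensor": "s1", "reading": "21.5"}, {"sensor": "s2", "reading": "bad"}]

def clean(records):
    """Transform operator: drop records that fail validation."""
    for r in records:
        try:
            yield {**r, "reading": float(r["reading"])}
        except ValueError:
            continue  # a production pipeline would route these to an error port

def flag(records):
    """Transform operator: derive a field before handing off to analytics."""
    for r in records:
        yield {**r, "alert": r["reading"] > 25.0}

# Compose the operators into a pipeline; generators keep it streaming.
results = list(flag(clean(ingest())))  # one clean record; the bad one is dropped
```

Using generators means each record flows through the whole chain as it arrives, which mirrors the streaming, operator-graph style such pipeline platforms promote.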

Another interesting aspect of the new SAP Data Hub is that it aims to provide an agile, easier way to develop and deploy data-driven applications, allowing users to develop and configure core data management activities and workflows via a central platform, speeding the development process and results.

Key functional elements included within the new platform include:

Some of SAP’s Data Hub Major Functional Elements (Courtesy of SAP)

According to SAP, this new solution will become, along with SAP Vora and SAP Cloud Platform Big Data Services, a key component of SAP's Leonardo digital innovation system.

Analytics and BI on the Verge of a New Generation

As companies witness their data landscapes grow and become more complex, new solutions are taking over the analytics landscape, pushed in great measure by newer companies in the likes of Tableau, Qlik, and Dataiku, to name just a few.

It seems big software powerhouses are now pushing hard to come up with a new generation of tools to consolidate their data management and analytics offerings.

With this, it is not difficult to foresee a new competitive arena, a race to gain the favor of a totally new generation of data specialists, and one I’m eager to keep track of, of course.

In the meantime take a look below at SAP’s Data Hub intro video and get a glimpse of this new solution.

Of course, please do let me know if you have comments or feedback; let’s keep the conversation going.

* All logos and images are trademarks and property of their respective owners
Data & Analytics with Maple Flavour: Canadian Data & Analytics Companies. Part 2

In a continuation of my tour across the Canadian data management and analytics landscape started in Part 1, I will now describe a new group of companies from both ends of this great country that have incorporated “state-of-the-art” data and analytics technologies into their solutions.

These companies, many of them startups climbing the market ladder, are dramatically changing not just the Canadian market but the global market, introducing innovative solutions in many key areas of the data management space, ranging from data visualization to advanced analysis and data warehousing.

So, here is a complementary list of Canadian data solutions:

Solution(s): CrowdBabble

Crowdbabble is a social media analytics company from Toronto, based at the Ryerson Futures Technology Accelerator in the DMZ, that aims to help marketers eliminate the complexity and time involved in tying social media activities to business outcomes.

With its software-as-a-service (SaaS) platform, Crowdbabble lets customers measure, benchmark, and optimize their social media performance.

With users in 450 cities around the world, including top customers, the company offers a platform that enables customers to drill down and dig deeper to figure out, according to Crowdbabble, the “why” behind their social media performance strategies and tell a better story.

Some features offered by Crowdbabble include:
  • 1-Click chart export, enabling fast download of any chart as an image for insertion into an MS PowerPoint or Keynote presentation
  • Visual social media performance monitoring
  • Growth tracking of key metrics
  • Drill-down for in-depth analysis into the details of the data, to identify the drivers of social media performance
  • Social media content optimization, comparing the performance of posts to learn which content works best for the audience
CrowdBabble’s Screencap (Courtesy of CrowdBabble)

Solution(s): Envision5

ENVIRONICS Analytics is another company from Toronto with a fresh look at how analytics is being done.

With the latest edition of ENVISION5, its platform for providing business intelligence (BI) on customers and markets from anywhere in North America, the company offers an easy-to-use, powerful, cloud-based platform with a complete set of geo- and segment-based routines for customer insights, site evaluation, and media planning, as well as a large set of consumer data.

Some features offered by ENVISION5 include:

  • A web-services architecture
  • A responsive design that makes ENVISION5 tablet and mobile compatible
  • A workflow engine that lets users define processes following a suggested path for importing data, analyzing trade areas, or creating maps to locate promising prospects and develop marketing campaigns at the national, regional and local levels
  • Geographic and location capabilities to allow users to geocode and map the location of their customers and create reports to better understand who they are, what they spend money on or how they consume media
  • Capabilities for sharing data and results across the organization using dashboards and micro-sites
  • The ability to enable users to create target groups from any customer file, whether the targeting is based upon life stage, assets, language spoken at home or views regarding technology
ENVISION5’s Screencap (Courtesy of ENVIRONICS Analytics)

Solution: Map4Decision, Map4Web

With more than 20 years of experience in the field, this company from Quebec delivers cutting-edge technology in the fields of geomatics and spatial BI systems. Originating from the team’s work at the Centre de Recherche en Géomatique at Université Laval, Intelli3 develops Map4Decision and, more recently, Map4Web (Intelli3’s SaaS offering) to deliver high-quality geo-data analysis to analysts and business users.

Through a quick-to-deploy and easy-to-use set of solutions involving on-demand map production, Intelli3 aims to lower the level of uncertainty in analysis results involving geomatics and map production.

Some functional highlights included within Map4Decision and Map4Web include the ability to:

  • Explore aggregated views of information
  • Interactively drill-down to achieve more detailed views (e.g., province, region, city)
  • Dynamically intersect various themes of interest (e.g., time and territory)
  • Obtain instantaneously statistical charts, diagrams and thematic maps
  • Automatically create maps conforming to customer’s and official rules for visual communication and semiology (colors, symbols, patterns, etc.)
  • Navigate from one visualization type to another (e.g., statistical charts to multi-maps).
Map4Web’s Screencap (Courtesy of Intelli3)
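The drill-down behavior described above can be pictured with a small sketch. The records, field names and totals below are invented purely for illustration and have nothing to do with Intelli3's actual implementation; they only show the idea of aggregating one measure at different geographic levels.

```python
from collections import defaultdict

# Hypothetical sales records tagged with a geographic hierarchy.
records = [
    {"province": "QC", "city": "Montreal", "sales": 120},
    {"province": "QC", "city": "Quebec City", "sales": 80},
    {"province": "ON", "city": "Toronto", "sales": 200},
]

def rollup(rows, level):
    """Aggregate the sales measure at the chosen geographic level."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[level]] += row["sales"]
    return dict(totals)

# Aggregated view by province, then drill down to the city level.
by_province = rollup(records, "province")  # {'QC': 200, 'ON': 200}
by_city = rollup(records, "city")
```

The same rollup function serves every level of the hierarchy, which is what makes interactive drill-down cheap once the data is tagged.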

Solution(s): Kaypok Insight, Kaypok Briefly 

Ontario-based Kaypok is a company devoted to the development of enterprise unstructured-text analytics solutions. Kaypok's technology analyzes data regardless of source, including social media feeds, customer surveys, email, blogs and internal proprietary data.

According to the company, Kaypok's high-performance algorithm processes noisy, unstructured information, extracting usable knowledge and insights about what people are saying, their sentiments, and the root information elements driving analytics.

With a combination of two solutions, Kaypok Insight and Kaypok Briefly, the company targets the two major aspects of text and social data analysis:
  • Kaypok Insight provides content analytics technology, whether for external unstructured data or internally residing enterprise application logs.
  • Kaypok Briefly allows users to analyze large volumes of textual content; aggregate data from different sources including RSS feeds, Google alerts, blogs and forums; filter, summarize, sort and share the content; and immediately find the most negative/positive articles.
Kaypok’s Screencap (Courtesy of Kaypok)

Kaypok's offerings are available both as Software as a Service (SaaS) and as an integrated enterprise model; they are compatible with various big data platforms and work on desktop and mobile devices.
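To make the sentiment side of text analytics concrete, here is a minimal lexicon-based scoring sketch. The word list and posts are invented, and this toy is in no way Kaypok's algorithm — real systems use far richer linguistic models — but it shows the basic idea of turning raw text into a polarity signal.

```python
# Toy polarity lexicon, invented for illustration only.
LEXICON = {"love": 1, "great": 1, "slow": -1, "broken": -2}

def sentiment(text):
    """Sum the polarity of known words; unknown words score zero."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())

posts = ["Love the new update", "App is slow and broken"]
scores = [sentiment(p) for p in posts]  # one positive, one negative
```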

Solution: Klipfolio

Klipfolio is the company behind the eponymous cloud-based application for developing and deploying real-time business dashboards to be used on many types of devices including web browsers, TV monitors and mobile devices.

The Ottawa-based company claims it can connect to virtually any data source, on-premises or in the cloud: from web services to files stored on a computer, a server, or in a data warehouse.

Klipfolio's simple, flexible data architecture lets data sources live outside the platform: users create connections to their data sources and define which portions of the data to pull into the application, as well as the load frequency.

From there, users can easily and quickly add pre-built data visualizations and dashboards, build data visualizations from scratch, or edit pre-built data visualizations and dashboards.

Some major functional features from Klipfolio include:
  • Support for connections to over 100 cloud applications including Facebook, Twitter, Moz, Pingdom, Salesforce, Marketo, Google Analytics, Google AdWords, Xero, HubSpot and others, as well as various web services
  • Connection to local and server Excel, CSV and XML, FTP, SFTP files as well as data from DropBox, Box, and Google Drive file sharing services
  • Connection to all major database management systems including MSSQL, MySQL, Oracle Thin, Oracle OCI, Sybase SQL Anywhere, PostgreSQL, Firebird and DB2
  • Multiple ways to upload computer files, including pulling the data from web services like Facebook and Google Analytics, pushing the data in from an API, sending it as an email attachment, or accessing the data from databases and servers
  • Sharing of visualizations and dashboards within the organization or externally, with a configurable periodicity of updates
Klipfolio’s Screencap (Courtesy of Klipfolio)
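What a data-source connection boils down to, once a feed has been pulled, is parsing the raw payload and computing a metric for a dashboard widget. A minimal sketch of that step, with an invented CSV feed (Klipfolio's own connectors are, of course, far more capable):

```python
import csv
import io

# A small CSV feed, as a dashboard connector might receive it
# from a web service (contents invented for illustration).
feed = io.StringIO("date,visits\n2017-10-01,120\n2017-10-02,95\n")

# Parse the feed and derive two widget-ready values.
rows = list(csv.DictReader(feed))
total_visits = sum(int(r["visits"]) for r in rows)
latest = max(rows, key=lambda r: r["date"])
```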

Solution(s): KNOMOS

West coast (Vancouver) company KNOMOS employs modern app design principles grounded in the user experience to deliver software tools for the legal industry, from law students learning in the classroom, to lawyers better serving their clients, to engaged citizens managing their practical legal affairs.

Focused on technology-driven solutions, KNOMOS has developed a data-driven solution to provide effective search, analysis and visualization of legal information. Built for law students, lawyers and engaged citizens, KNOMOS offers a single access point for legal information along with the tools for its management and analysis.

Some important features offered by KNOMOS’ data solution include:
  • A single access point for Federal & British Columbia (BC) laws, regulations and cases all accessed within an interactive visual navigation interface
  • A dual search display providing an instant overview, with visual search results sized by the number of matches, clustered by document type, and accompanied by relevant text previews
  • Visual navigation capabilities to help identify key information in context of a legal source’s structure, including frequency heat maps for keyword search results in a law or case
  • Citation heat maps to display the frequency of cross-references between legal sources including when a law cites, or is cited by, another law or a case, along with unique color coding for incoming & outgoing links
  • A centralized location to organize and save all content, personal notes, favorites, tags, and highlighted texts for future reference
  • Ability to pinpoint citations and link documents at the specific paragraph or section level with direct access to related content
  • Ability to filter and sort personal annotations
KNOMOS’ Screencap (Courtesy of KNOMOS)
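The citation heat maps described above ultimately rest on counting cross-references in both directions. A minimal sketch with invented citing/cited pairs (this is not KNOMOS' data model, just the counting idea):

```python
from collections import Counter

# Invented (citing, cited) pairs extracted from legal documents.
citations = [
    ("Case A", "Act s.7"), ("Case B", "Act s.7"),
    ("Case A", "Act s.1"), ("Case C", "Act s.7"),
]

# Incoming frequency colors a cited source; outgoing colors a citing one.
incoming = Counter(cited for _, cited in citations)
outgoing = Counter(citing for citing, _ in citations)
```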

Solution(s): Mnubo SmartObjects

Internet of Things (IoT) and artificial intelligence (AI) developer Mnubo is an innovative company from Montreal. Mnubo delivers out-of-the-box insights, automated reports and advanced IoT data science solutions.

It offers a SaaS solution to enable product manufacturers to connect their products with its platform to ingest, enrich and analyze their product’s generated data.

Mnubo’s SmartObjects offering is a complete SaaS solution developed to avoid long roll-out plans, extensive IT resources or additional development skills with an approach to serve customers in consumer, enterprise and industrial verticals.

Major functional features of Mnubo's SmartObjects include:
  • Big data storage and archival
  • Data cleanup and enrichment
  • Rich, flexible, fully documented JSON REST APIs that include an advanced query language
  • Data visualization and reporting features to access pre-built dashboards and reports, or create new customized dashboards without additional coding
  • A hosted, managed, multi-tenant solution available on multiple cloud environments (e.g., Azure, AWS, Google)
  • Plug-and-play features to eliminate months of software integration and machine-learning model training
  • An integrated view of data to ensure real-time data is delivered to the appropriate stakeholder
  • Self-service querying, at scale, of the user's big data repository of sensor data to enable ROI-driven insights
  • Out-of-the-box insights to understand operations, faults, product usage, customer engagement/churn and more, quickly and easily
  • Security features including the OAuth2 authorization framework, a secure HTTP JSON REST API, data encryption at rest, and no storage of personally identifiable information (PII)
  • A cloud-platform-neutral design to support AWS, Azure, Google, and other cloud providers
Mnubo’s Screencap (Courtesy of Mnubo)
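To illustrate the general shape of a bearer-authorized JSON REST call like the ones such IoT platforms expose, here is a sketch that assembles (but does not send) a request. Every field name and the token are hypothetical; this is not Mnubo's actual API schema.

```python
import json

def build_event_request(token, device_id, reading):
    """Assemble (but do not send) a bearer-authorized JSON event POST.
    Field names here are hypothetical, not Mnubo's actual schema."""
    headers = {
        "Authorization": "Bearer " + token,
        "Content-Type": "application/json",
    }
    body = json.dumps({"device": device_id, "temperature_c": reading})
    return headers, body

headers, body = build_event_request("demo-token", "sensor-42", 21.5)
```

An actual integration would hand these to an HTTP client against the platform's documented endpoint.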

Solution(s): Nexalogy Free, Nexalogy Enterprise

Founded by astrophysicist Claude Théoret along with an experienced team, Montreal-based Nexalogy applies Claude's algorithms and technology, originally developed to study black holes and how stars interact with one another, to analyze connections between words and the people who write them throughout the social Web.

Consequently, Nexalogy can reveal undiscovered risks, opportunities, and hidden intelligence. Nexalogy's cloud-based solution provides different social media intelligence services that help companies of all sizes and industries make better decisions.

Some capabilities and services offered by Nexalogy include:
  • A scalable distributed system behind the analysis and data management
  • Data collection capabilities from many data sources around the web
  • Reports and visualization capabilities
  • The ability to routinely process millions of social media posts
  • Algorithms to identify themes, people, relationships, topics and content
  • Easy visual interaction with all social data including posts, topics and influencers via a set of dynamic visualizations.
Nexalogy's Screencap (Courtesy of Nexalogy)
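Analyzing connections between words, at its simplest, means counting which words appear together across posts. The posts below are invented and this is only a toy co-occurrence counter, not Nexalogy's technology, but it conveys the kind of relationship graph such algorithms start from.

```python
from collections import Counter
from itertools import combinations

# Invented posts; the idea is counting which words appear together.
posts = [
    "battery life is great",
    "battery drains fast",
    "screen is great",
]

# Count every unordered word pair that co-occurs within a post.
pairs = Counter()
for post in posts:
    words = sorted(set(post.lower().split()))
    pairs.update(combinations(words, 2))
```

Edges with high counts become the links in a word-relationship visualization.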

Solution(s): PHEMI Central

PHEMI is a big data warehouse company from Vancouver whose PHEMI Central solution allows organizations to easily access and mine data of any variety or volume.

PHEMI Central is, according to the company:

“a production-ready big data warehouse with built-in privacy, data sharing, and data governance. PHEMI Central delivers the scalability and economics of Hadoop with indexing, cataloging, fine-grained access control, and full life cycle, enterprise-grade data management.”

Built on Hadoop, the PHEMI Central Big Data Warehouse aims to unlock siloed data and make it available for analytic and operational applications.

With the incorporation of big data technology, PHEMI allows users to scale to petabytes of data with cluster economics. PHEMI Central also adds simplified deployment and out-of-the-box operations, as well as the ability to integrate immediately with existing data sources and analytics tools.

Main features of PHEMI Central include:
  • Availability of the solution on-premises or as a managed service on Amazon, Microsoft, Oracle, or ClearDATA HIPAA-compliant clouds
  • Ability to integrate with most leading analytics tools, including Tableau, Qlik, Power BI, R, SAS and SPSS
  • Availability of data processing functions including an Excel/CSV Reader, a Genomics Reader (VCF/gVCF), a JSON Reader, an XML/HL7 Reader, and custom DPFs
  • A strong emphasis on security, so that:
    • PHEMI’s access control strategy takes into account both user attributes and data characteristics
    • Metadata and user attributes are brought together into simple yet robust rules that indicate who can see and do what with the data
    • Policy-based enforcement means access control is implemented automatically and uniformly
    • On-site processing provides the ability to securely process data so it can be presented as different views to different users based on their authorizations, without a person having to intervene
PHEMI Central Architecture (Courtesy of PHEMI)
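Combining user attributes and data characteristics into an access decision, as described above, can be sketched as a single rule function. The attributes, roles and values below are invented and far simpler than PHEMI's policy engine; the point is only how both sides of the rule feed one decision.

```python
def can_view(user, record):
    """One toy rule combining user attributes and data characteristics."""
    # Restricted data is visible only to a privileged role.
    if record["sensitivity"] == "restricted" and user["role"] != "officer":
        return False
    # Otherwise, visibility follows the user's assigned regions.
    return record["region"] in user["regions"]

analyst = {"role": "analyst", "regions": {"BC", "ON"}}
open_record = {"sensitivity": "open", "region": "BC"}
restricted_record = {"sensitivity": "restricted", "region": "BC"}
```

In a policy-based system, rules like this are evaluated automatically on every access rather than coded per application.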

Solution(s): RubiCore, RubiOne, Promotion Manager, Lifecycle Manager

This Toronto-based startup has worked with more than a dozen global, multi-billion-dollar retailers; its investors include Horizons Ventures, Access Industries, and the MaRS Investment Accelerator Fund.

Rubikloud’s big data architecture gathers retailer data on online and offline consumer behavior and uses it to help retailers gain insight into consumer preferences, such as product affinity and price sensitivity, to enable better demand prediction and forecasting.

The Rubikloud data platform includes a series of solutions providing easy-to-use yet effective analysis and data management functions that help retailers take control of their data and make better data-driven decisions.

Some key features coming from Rubikloud include:
  • The ability to discover the connections and insights within internal data
  • Collaboration capabilities to share work with other data scientists and analysts
  • The ability to deploy and manage multiple users through a built-in authentication service
  • Capabilities to develop proprietary models, at scale
  • The ability to connect models in pipelines to deploy multi-stage systems into production
  • The possibility to incorporate or benchmark against Rubikloud’s own models, trained on several years of retail data
  • A set of visualization libraries to compare and monitor the performance of models
  • Ability to gain a complete view into historical performance
  • Functionality to forecast the outcomes of promotional decisions before taking in-market action
  • Fine-tuning capabilities for pricing and promotional strategy
Rubikloud Screencap (Courtesy of Rubikloud)

Solution(s): ThinkCX

ThinkCX is a company devoted to delivering advanced, automated, massive-scale analytic solutions for the detection of consumer switching events in the smartphone OEM and wireless carrier marketplace across North America.

The company, from Langley, B.C., uses a patented machine-learning solution that locates and confirms millions of device and carrier switching events yearly, which are then used to drive a series of analytical models.

Some key advantages of ThinkCX’s cloud-based platform include:
  • A complete market view: in addition to internal churn activity, ThinkCX can provide similar insights about the competition’s subscribers
  • The platform uses commercially available external data as its only input, so no integration with a CRM is required
  • Simple deployment: solutions can be deployed in minutes, with no heavy lifting required from IT and business teams
  • ThinkCX’s carrier and device insights can be delivered via a custom dashboard or integrated into Adobe Marketing Cloud solutions or DCM.

A final Note

As I mentioned in the first part, it is possible I’m still leaving some companies out, so please feel free to use the comment space to share your thoughts or the name of a new Canadian analytics solution we all should know about.

P.S. As a note to Kevin Smith, I’ve decided to leave Keboola out of this group, but I will include them in another post devoted to data companies from Europe, as Keboola is actually based in the Czech Republic.

* All logos and images are trademarks and property of their respective owners

Oracle 18c Goes for Database Automation in the Cloud

Oracle 18c Goes for Database Automation in the Cloud

In what was probably the most important announcement made during the 2017 edition of Oracle’s OpenWorld conference, the company announced the release of version 18c of its world-renowned database management system, which includes two key features: database automation and cyber-security automation.

Oracle’s founder and CTO Larry Ellison made the announcement of the autonomous database, which includes database and cyber-security automation because, according to Mr. Ellison, “human processes stink”.

According to Oracle, the autonomous database will practically eliminate all human intervention associated with database management activities such as tuning, patching, updating and maintenance by including three major capabilities:

  • Self-Driving: Provides continuous adaptive performance tuning based on machine learning. Automatically upgrades and patches itself while running. Automatically applies security updates while running to protect against cyber-attacks.
  • Self-Scaling: Instantly resizes compute and storage without downtime. Cost savings are multiplied because Oracle Autonomous Database Cloud consumes less compute and storage than Amazon, with lower manual administration costs.
  • Self-Repairing: Provides automated protection from downtime. SLA guarantees 99.995 percent reliability and availability, which reduces costly planned and unplanned downtime to less than 30 minutes per year.

To achieve this, the new autonomous database integrates applied machine-learning techniques to deliver, without human intervention, self-driving, self-tuning, self-recovering, and self-scaling management capabilities that aim to streamline operations, consume resources more efficiently, and provide higher security and reliability.

But first... the Data Warehouse

Oracle’s autonomous database service can handle different workload types, including transactional, non-transactional, mixed, graph and IoT workloads. Yet, while the automated OLTP version is scheduled to be available by June 2018, Oracle’s first autonomous database service will be directed at data warehouse workloads and is planned to be available in 2017.

Much like all of Oracle’s services, the design of the Autonomous Database Cloud Service for Data Warehouse relies on machine learning to enable automatic tuning and performance optimization. By using artificial intelligence and machine learning, Oracle aims to achieve autonomous control and to offer reliable, high-performance and highly elastic data management services, as well as fast deployments that can be done in seconds.
According to Oracle, some features to be offered by the new service include capabilities to:
  • Execute high-performance queries and concurrent workloads with optimized query performance and pre-configured resource profiles for different types of users
  • Deploy highly elastic, pre-configured compute and storage architectures that instantaneously scale up or down, avoiding overpaying for fixed blocks of resources
  • Integrate Oracle SQL DWCS with all business analytics tools that support the Oracle database
  • Make use of built-in, web-based notebooks based on Apache Zeppelin
  • Deploy a self-driving, fully automated database that tunes, patches and upgrades itself while the system is running
  • Take advantage of dedicated, cloud-ready migration tools for easy migration from Amazon AWS Redshift, SQL Server and other databases
  • Perform cloud-based, scalable data loading from Oracle Object Storage, AWS S3, or on-premises sources
  • Deploy under an enterprise-grade security schema in which data is encrypted by default in the cloud, as well as in transit and at rest
The new Oracle autonomous database cloud service for data warehousing aims to eliminate manual configuration errors and ensure continuous reliability and self-correction. According to Oracle, it also includes unlimited concurrent access and advanced clustering technology to enable organizations to scale without any downtime.

With the inclusion of this service, Oracle is expanding its data warehouse software stack portfolio, extending its services across on-premises and cloud platforms and across different data services, aiming to reach a greater number of organizations, each with different data warehousing management needs and complexities; examples include the existing data warehouse services available within Oracle Exadata and Exadata Cloud, now joined by the autonomous database cloud service.

The Rise of the Automated Database?

The ideal of full database automation is not new, and many, if not all, software vendors have made important efforts to automate different aspects of the database administration cycle; examples include Teradata and Attunity for automating data ingestion and data warehousing, as well as efforts by third-party software providers like BMC with BladeLogic Database Automation. Yet, until now, full automation seemed to be an impossible task.

One main reason is that database automation involves not just automating common, repetitive database configuration tasks, such as initial schema and security configuration, but also much more complex tasks such as database tuning and performance monitoring, which require the system to learn and adapt to changing conditions.

The evolution of machine learning, artificial intelligence and cognitive computing technologies is certainly making these automation efforts possible, and Oracle deserves significant credit for embracing these technologies and taking a step further toward full database automation.

As we should expect, it will not take long for other software providers to join the race and offer fully automated database solutions. As a cautionary message, though, it will be critical, in my view, to start by making comprehensive assessments of these solutions’ capabilities and accuracy before rushing to push the autopilot button and getting rid of your DBAs just yet.

You might realize it will take some time before you can lower your IT footprint.

Comments? Let me know your thoughts

IBM’s Integrated Analytics System Joins the Ranks of Full Powered Analytics Platforms

IBM’s Integrated Analytics System Joins the Ranks of Full Powered Analytics Platforms

As we get deeper into an era of new software platforms, both big players and newcomers are industriously working to reshape or launch their proposed new-generation analytics platforms, especially aiming to appeal to the growing community of new information workers or “data scientists”, a community always eager to attain the best possible platform to “crunch the numbers”. Examples include Teradata with its new analytics platform and Cloudera with its Data Science Workbench.

So now the turn is for IBM, which recently unveiled its Integrated Analytics System. IBM’s new offering represents the company’s unified data system, aimed at providing organizations with an easy yet sophisticated platform for doing data science with data from on-premises, private, public or hybrid cloud environments.

The new offering coming from the “Big Blue” company is set to incorporate a myriad of data science tools and functionality features, as well as the proper data management processes, for developing and deploying advanced analytics models in-place.

The new offering aims to allow data scientists to perform all data science tasks, including moving workloads to the public cloud, so they can begin automating their businesses with machine learning easily and rapidly.

The system is built on the IBM common SQL engine so users can employ a common language and engine across both hosted and cloud-based databases, allowing them to move and query data across multiple data stores, including Db2 Warehouse on Cloud or the Hortonworks Data Platform.
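The idea of one SQL engine spanning several stores can be sketched with SQLite's ATTACH mechanism, used here purely as a small stand-in: IBM's common SQL engine plays the analogous role across Db2 and Hadoop-based stores at an entirely different scale, and the tables below are invented for illustration.

```python
import sqlite3

# Two separate in-memory stores, reached through a single SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

conn.execute("CREATE TABLE local_sales(region TEXT, amount INT)")
conn.execute("CREATE TABLE warehouse.targets(region TEXT, target INT)")
conn.execute("INSERT INTO local_sales VALUES ('ON', 90)")
conn.execute("INSERT INTO warehouse.targets VALUES ('ON', 100)")

# One query spans both attached stores.
row = conn.execute(
    "SELECT s.region, s.amount, t.target "
    "FROM local_sales s JOIN warehouse.targets t ON s.region = t.region"
).fetchone()
```

The value of the common-engine approach is exactly this: the query author never needs to know which store holds which table.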

According to IBM, the product team has developed the Integrated Analytics System to work seamlessly with IBM’s Data Science Experience, Apache Spark and Db2 Warehouse on Cloud, where:

  • The Data Science Experience provides the necessary critical data science tools and a collaborative workspace
  • Apache Spark enables in-memory data processing to speed analytic applications
  • Db2 Warehouse on Cloud enables deployment and management of cloud-based Db2 Warehouse on Cloud clusters within a single management framework

All of this is aimed at allowing data scientists to create new analytic models that developers can then use to build and deploy intelligent applications easily and rapidly.

According to Vitaly Tsivin, Executive Vice President at AMC Networks:

“The combination of high performance and advanced analytics – from the Data Science Experience to the open Spark platform – gives our business analysts the ability to conduct intense data investigations with ease and speed. The Integrated Analytics System is positioned as an integral component of an enterprise data architecture solution, connecting IBM Netezza Data Warehouse and IBM PureData System for Analytics, cloud-based Db2 Warehouse on Cloud clusters, and other data sources.”

The Integrated Analytics System is built with the IBM common SQL engine to enable users to seamlessly integrate the unit with cloud-based warehouse solutions and to give them the option of moving workloads seamlessly to public or private cloud environments with Spark clusters, according to their specific requirements.

Some capabilities and power include:

  • Asymmetric massively parallel processing (AMPP) with IBM Power technology and flash memory storage hardware
  • A design built on the IBM PureData System for Analytics and the previous IBM Netezza data warehouse offerings
  • Support for a variety of data types and data services, from the Watson Data Platform and IBM Db2 Warehouse on Cloud to Hadoop and IBM BigSQL.

Also, the new Integrated Analytics System incorporates hybrid transactional/analytical processing (HTAP), which can run predictive analytics on transactional and historical data in the same database with faster response times.

Additionally, the Integrated Analytics System is designed to provide built-in data virtualization and compatibility with the rest of the IBM data management product stack including Netezza, Db2, and IBM PureData System for Analytics.

According to IBM, later this year, the company has plans to incorporate support for HTAP within the IBM Db2 Analytics Accelerator for z/OS to enable the new platform to seamlessly integrate with IBM z Systems infrastructures.

A new “data science” platform era?

It seems a major reshaping is ongoing in the BI and analytics software market as new-generation solutions keep emerging or getting more robust.

It also seems that this transformation, seen from the user’s point of view, is enabling traditional business intelligence tasks to evolve, blurring the lines between traditional BI analysis and data science, helping departments evolve their BI teams more naturally into robust advanced analytics departments, and even easing somewhat the educational process these departments need to go through to help their personnel evolve with the times.

It seems we are entering a new era in the evolution of enterprise BI/analytics/data science platforms, one that is about to take over the world. A new space worth keeping an eye on, I think.

Analytics with Maple Syrup Flavour: Canadian Data & Analytics Companies. Part 1

Analytics with Maple Syrup Flavour: Canadian Data & Analytics Companies. Part 1

We all know Silicon Valley is the mecca of technology and, of course, this applies to the business intelligence (BI) and analytics market too, as it concentrates many of the market’s vendors.

Still, it is not hard to realize that around the world we can find tech companies developing innovative technology and software in many areas of the data management space, both already consolidated companies and vibrant startups looking to disrupt the market.

While for many people the relevant role some Canadian companies have played in the evolution of the BI and analytics market is no surprise, for some it is still unclear which companies in Canada are setting the mark for the evolution of local Canadian data management technology.

As a brief sample, here are some honorable mentions of Canadian companies that played a key role in the evolution of BI:

  • Former Ottawa-based software company Cognos, a fundamental player in the BI enterprise performance management software market acquired by IBM
  • Dundas, a longtime runner that remains a relevant player and that sold part of its dashboard and reporting technology to Microsoft, where it became part of the latter’s large reporting and analytics arsenal
  • Or, more recently, Datazen, an innovative mobile BI and data visualization developer also acquired by Microsoft

So, without further ado, here’s a list of some current players making waves in the BI and analytics market:

Core Analytx 
Solution(s): OnTime Analytx 

Based in Markham, Ontario, Core Analytx is the developer of OnTime Analytics, the company’s flagship product and main analytics offering.

With its solution offered in flavors including standard (SaaS-based) and enterprise (on-premises, as well as on private cloud), the company aims to encourage, guide and assist organizations with the implementation of analytics-centric processes.

Core Analytx develops its proprietary technology around the principles of ease of use and a self-service approach, providing practical and efficient analytics products and services to organizations from different industries and lines of business.

Major functions and features offered by OnTime Analytics include:

  • Data ingestion from databases, flat files, mainframes and others
  • Configurable web services for data connectivity
  • Test data Management
  • Basic and advanced analytics features
  • Custom training features
  • Data transformation capabilities
  • Data visualization and publishing capabilities
  • Importing data via a data loader that connects to all standard databases (e.g., SQL Server, MySQL, Oracle, etc.)
  • “What-if scenario” capabilities
  • Application customization via developer API
  • Ad-hoc report creation
  • Integration with key partners’ software, including Oracle Cloud
  • Customized security capabilities

OnTime Analytx’ Screencap (Courtesy of Core Analytx)

Solution(s): Coveo Intelligent Search Platform

Coveo is a company with a great deal of experience when it comes to searching for, identifying and providing contextual information to end users.

Based in Quebec City, its flagship Intelligent Search Platform offers a number of data analysis and management capabilities bundled under Coveo’s proprietary Coveo AI™ technology. With this technology, Coveo can search, find and deliver predictive insights across different cloud and on-premises systems.

Already well known for being a provider of enterprise search solutions, the company has expanded its solution to offer much more, using its now cloud-based solution.

Some core functional elements offered within its platform include:

  • Artificial Intelligence(AI)-powered search
  • Relevant search results
  • Advanced query suggestions, intelligent recommendations for website visitors, and automatic relevance tuning to recommend the best content
  • Single-Sign On (SSO), for unified search that honors user and group permissions across all enterprise content sources
  • Personal business content, like emails, can only be searched and seen by the individual user, and any designated super-users (e.g. compliance officers)
  • Usage Analytics
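Query suggestions of the kind listed above can be sketched as prefix matching over a frequency-weighted query log. The log below is invented and this toy ignores everything a real engine like Coveo's learns from (clicks, context, permissions); it only shows the basic ranking idea.

```python
from collections import Counter

# Invented query log; real engines learn from far richer signals.
history = Counter({"sales report": 40, "sales forecast": 25, "salary bands": 5})

def suggest(prefix, k=2):
    """Return the k most frequent past queries matching the prefix."""
    matches = [(q, n) for q, n in history.items() if q.startswith(prefix)]
    return [q for q, _ in sorted(matches, key=lambda m: -m[1])[:k]]

top = suggest("sales")
```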

Coveo includes partnerships with key software companies to allow its platform to integrate and work with data from Microsoft, Sitecore and
Coveo’s Screencap (Courtesy of Coveo)

DMTI Spatial
Solution(s): Location Hub Analytics

For over 20 years, DMTI Spatial has been providing industry-leading location economics and Master Address Management (MAM) solutions to Global 2000 companies and government agencies. It is also the creator of the CanMap mapping solutions and the award-winning Location Hub. DMTI Spatial is headquartered in Markham, Ontario.

Location Hub Analytics is a self-service data analytics engine that provides Canada’s robust, accurate and up-to-date location-based data.

Relevant functional features of Location Hub Analytics include:

  • Automatically consolidates, cleanses, validates and geocodes your address database
  • Each record is assigned a Unique Address Identifier (UAID™)
  • Quickly processes and analyzes data, to objectively reveal meaningful patterns and trends to help better understand customers and prospects
  • Allows you to visualize and interact with your results on a map for better data profiling
  • Enriches data with Canadian demographic information for further analysis and greater customer intelligence
  • Helps generate new business prospect lists by infilling the addresses within a specific territory that are not in your current database

Location Hub Analytics (Courtesy of DMTI Spatial)
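The consolidate-cleanse-identify flow above can be sketched in miniature: normalize an address so spelling variants collapse to one key, then derive a stable identifier from that key. The hashing scheme below is an invented stand-in, not DMTI's actual UAID assignment, which is a far more rigorous, curated process.

```python
import hashlib

def normalize(address):
    """Crude normalization so spelling variants map to one key."""
    return " ".join(address.lower().replace(".", "").split())

def uaid(address):
    # Illustrative stand-in for an address identifier; DMTI's real
    # UAID assignment is a far more rigorous, curated process.
    return hashlib.sha1(normalize(address).encode()).hexdigest()[:12]

a = uaid("123 Main St. Markham ON")
b = uaid("123  main st Markham ON")  # variant spelling, same identifier
```

A stable per-address key is what makes downstream deduplication, geocoding and enrichment joins reliable.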

Solution(s): Dundas BI

Dundas is an experienced company on the business intelligence scene. Headquartered in Toronto, the company offers, via its flagship product Dundas BI, a robust BI and analytics platform.

With its BI solution, Dundas aims to give users full control over their data so it can be quickly delivered in the most actionable way. The Dundas BI platform enables organizations to prepare and transform their data and subsequently explore it visually within dashboards, reports, and visual data analytics tools.

Also worth mentioning: much of Dundas’ success relies on its ability to build a solution with a wide range of built-in functionality and a rich set of open APIs.

Main functional features include:

  • Customizable dashboards
  • Communication and collaboration tools
  • Slideshows
  • Rich, interactive Scorecards
  • Ad-hoc reporting
  • Mobile features
  • Predictive and advanced data analytics
  • Embedded BI with seamless data integration
  • Support for Windows authentication
  • Multi-tenancy support

Dundas BI’s Screencap (Courtesy of Dundas)

Panorama Software
Solution: NECTO

Necto is Panorama Software’s full BI and analytics solution. The Toronto-based company, with offices in the US, UK, and Israel, develops a business intelligence and analytics solution that offers automated analysis and recommendations, easily disseminated throughout the organization.

With a fully customizable layout that can be adapted to fit an organization’s language, and easy point-and-click functionality, Panorama aims with Necto to take collaboration to the next level via business intelligence reporting tools that communicate real data.

Key features offered in Necto include:

  • Centrally administered, fully web based system
  • Fully functional dashboard design capabilities & simplified info-graphics
  • Automated analysis & hidden insights
  • Easy sharing of BI content
  • High security & scalability
  • Powered with KPI alerts
  • Mashup data from multiple sources
  • Simple & fast development

Necto’s Screencap (Courtesy of Panorama Software)

Semeon
Solution: Semeon Insights

With a great deal of machine learning and artificial intelligence (AI) R&D experience within its corridors and offices, Montreal-based Semeon develops Semeon Insights, a next-generation, cloud-based “AI linguistic” text analytics platform that serves businesses interested in better understanding what is being said about their brand, company, products, staff, competitors, and more.

All of Semeon’s solutions are developed using a series of patented semantic algorithms that can determine the sentiment, intent, and predictive behaviors of clients, buyers, or customers.

Key features offered by Semeon Insights include:

  • Sifts through public (Social Media, forums, blogs, review sites) as well as private data (CRM data, Customer Service data) to enhance customer-driven campaigns.
  • Uses a number of techniques to uncover key insights and influencers, including:
    • Sentiment Analysis
    • Concept Clouds
    • Timeline Tracking
    • Content Classification
    • Sources/channels
    • Influencer identification
    • Data Visualization
    • Geolocation
    • Intent Analysis
  • Leverages concepts and opinions that drive public perception to fuel content creation teams and boost ROI, and gleans insights from competitors’ digital campaigns.

Semeon’s Screencap (Courtesy of Semeon)
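As a point of reference for readers new to text analytics: the simplest form of sentiment analysis is a lexicon lookup. Semeon’s patented semantic algorithms are far more sophisticated than this, but a toy version conveys the basic idea:

```python
# A deliberately minimal lexicon-based sentiment scorer, purely to
# illustrate the concept; real semantic engines also handle negation,
# context, and intent, which this sketch ignores.

POSITIVE = {"great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "slow", "broken", "hate"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this brand, support is fast"))  # positive
print(sentiment("the app is slow and often broken"))    # negative
```

Run over thousands of social posts or CRM notes, even a crude scorer like this begins to show the brand-perception trends the feature list above describes; the commercial value comes from doing it accurately at scale.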

Ahh! And There’s More

So, in the second part of this series I will include some other start-ups and projects that will catch your attention with their innovation and the opportunities they offer, whether you use them or build a business with them.

In the meantime, and considering I might be leaving some companies out, please feel free to share your comments or the name of any new Canadian analytics solution we all should know about.

* All logos are trademarks of their respective owners
Teradata Aims for a New Era in Data Management with its New IntelliSphere Offering


As Teradata continues to expand its Teradata Everywhere initiative, major announcements came out of its 2017 Partners conference: along with its brand-new analytics platform, the company also unveiled a new comprehensive software portfolio that adds the data management power needed behind the analytics scenario.

According to Teradata, IntelliSphere is “a comprehensive software portfolio that unlocks a wealth of key capabilities for enterprises to leverage all the core software required to ingest, access, deploy and manage a flexible analytical ecosystem”.

(Image courtesy of Teradata)

Teradata IntelliSphere is intended to complement the ongoing Teradata Everywhere initiative, serve as a natural companion for the Teradata Analytics Platform, and act as an important tool for enabling users across the organization to use their preferred analytic tools and engines across data sources at scale, while providing all the necessary components to ensure efficient data management from ingestion to consumption.

According to Oliver Ratzesberger, Executive Vice President and Chief Product Officer at Teradata:

“With IntelliSphere, companies no longer need to purchase separate software applications to build and manage their ecosystem. Companies can design their environment to realize the full potential of their data and analytics today, with a guarantee that future updates can be leveraged immediately without another license or subscription.”

Available for purchase now, the IntelliSphere software portfolio includes a series of key capabilities to ensure efficiency across the whole data process:

  • Ingest, so companies can easily capture and distribute high-volume data streams, with ready-to-run elastic architecture and quick access for business-critical analysis.
  • Access, so companies can gain easy access to data stored in a hybrid cloud or heterogeneous technology environment.
  • Deploy applications and analytic models for easy user access and enterprise collaboration.
  • Manage, to allow ad-hoc data movement, as well as ongoing monitoring and control via an operational interface.

According to the data management company, Teradata IntelliSphere is composed of ten software components.

Finally, the company mentions that, in the future, all new software releases will become part of the IntelliSphere bundle, a logical step toward building a consistent and more homogeneous analytics ecosystem that can help Teradata provide simplicity and functional power to its user base.

As I mentioned in another blog in this same vein, it seems we are facing a new stage in the analytics and data management software market, in which software companies are fully revamping their offerings to consolidate as many functions as possible within single enterprise platforms that blend all analytics needs with a robust data engine.

In future posts I’ll try to bring more information about this and the rest of Teradata’s new set of offerings so, stay tuned.
Teradata includes brand New Analytics Platform to its Teradata Everywhere Initiative


(Image Courtesy of Teradata)
In a recent announcement made during its 2017 Partners conference, data management software provider Teradata made an important new addition to its global Teradata Everywhere initiative: a brand-new analytics platform.

The new offering, available for early access later this year, aims to let users work in the analytics environment of their choice. According to the company, the new analytics platform is planned to enable access to a myriad of analytics functions and engines so users can develop full analytics processes and business solutions with the tools they prefer. Initially, the new platform will natively integrate with Teradata and Aster technology (Figure 1) and, in the near future, will enable integration with leading analytics engines including Spark, TensorFlow, Gluon, and Theano.

Figure 1.  Aster Analytics Functions (Courtesy of Teradata)

As corporate data is increasingly captured and stored in a wider variety of formats, the platform includes support for several data types from multiple data sources, from traditional formats to newer social media and IoT formats, including text, spatial, CSV, and JSON, as well as Apache Avro and other open-source data types that allow programmers to dynamically process data schemas.
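The “dynamically process data schemas” part is the essence of schema-on-read: field names and types are discovered from the records themselves rather than declared up front. A generic Python illustration (not Teradata’s API):

```python
# Schema-on-read illustration: infer a field -> type mapping from
# semi-structured JSON records as they arrive, instead of declaring
# columns in advance. Generic Python, unrelated to any vendor API.

import json

def infer_schema(records):
    """Merge field names and observed Python type names across records."""
    schema = {}
    for rec in records:
        for key, value in rec.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return {k: sorted(v) for k, v in schema.items()}

raw = [
    '{"device": "sensor-1", "temp": 21.5}',
    '{"device": "sensor-2", "temp": 19, "alert": true}',
]
records = [json.loads(line) for line in raw]
print(infer_schema(records))
# {'device': ['str'], 'temp': ['float', 'int'], 'alert': ['bool']}
```

Note how the second record adds a field and changes a type; a schema-on-read pipeline absorbs that drift instead of rejecting the row, which is precisely why it suits IoT and social media feeds.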

As part of a new set of functional features, the Teradata Analytics Platform provides scalable analytic functions such as attribution, path analytics, and time series, along with a number of statistical, text, and machine learning algorithms.

With support for multiple languages, including Python, R, SAS, and SQL, and tools like Jupyter, RStudio, KNIME, SAS, and Dataiku, Teradata expects experienced users to use their tool of choice not just to develop with less disruption but also to promote efficiency via code and model re-use through Teradata’s AppCenter, which allows analysts to share analytic applications and deploy reusable models within a web-based interface.

According to Oliver Ratzesberger, Teradata’s executive vice president and chief product officer:

“In today’s environment different users have different analytic needs, this dynamic causes a proliferation of tools and approaches that are both costly and silo-ed. We solve this dilemma with the unmatched versatility of the Teradata Analytics Platform, where we are incorporating a choice of analytic functions and engines, as well as an individual’s preferred tools and languages across data types. Combined with the industry’s best scalability, elasticity and performance, the Teradata Analytics Platform drives superior business insight for our customers.”

According to Teradata, the benefits offered by the new analytics platform include:

  • Simplification of data access to both data warehouse and data lakes
  • Speed data preparation with embedded analytics
  • Allow fast and easy access to cutting-edge advanced analytics and AI technologies
  • Support for preferred data science workbenches and languages like R, Python, and SQL
  • Helping to make prescriptive analytics operational to enable autonomous decisioning
  • Minimize risk of existing analytical architectures with Teradata Everywhere

More important, the announcement of the new analytics platform came alongside that of Teradata’s new comprehensive software portfolio initiative, IntelliSphere, the company’s new proposal for easy data access, ingestion, deployment, and management.

According to Teradata, the new platform will be flexibly delivered on-premises, via public and private clouds, or as a managed cloud, all of which will use the same software.

Teradata is definitely aiming to be everywhere

Teradata seems to have understood how important it is, now and in the future, to offer software solutions built on more open and agile architectures that play well with others yet remain solid and secure. It is a movement other data management companies are already exploring and adopting, as is the case, among others, for Cloudera with its new Data Science Workbench and SAS with its Open Analytics Platform.

It seems we are facing a new stage in the analytics and data management software market, in which software companies are reshaping their offerings to consolidate as many functions as possible within single enterprise platforms that blend all analytics needs with a robust data engine.

In the meantime, I’m personally eager to see the new Teradata Analytics Platform in action.

The BBBT Sessions: Outlier, and the Importance of Being One


It has been some time since my last write-up about my briefings with the Boulder Business Intelligence Brain Trust (BBBT); multiple business engagements and, yes, perhaps a bit of laziness are to blame.

Now I have the firm intention of covering this series of great analyst sessions on a more regular basis, hoping, of course, that my hectic life will not stand in my way.

So, to resume my coverage of this great series of sessions with software vendors and analysts, I have picked one that, while not that recent, was especially significant for the BBBT group and the vendor itself. I’m talking about a new addition to the analytics and BI landscape called Outlier.

Members of the BBBT and I had the pleasure of witnessing the official launch of this new analytics and business intelligence (BI) company and its solution to the market.

Outlier presented its solution to our analyst gathering in what was an appealing session. So here, a summary of the session and info about this newcomer to the BI and Analytics space.

About Outlier

Outlier, the company, was founded in 2015 in Oakland, CA by seasoned tech entrepreneur Sean Byrnes (CEO) and experienced data scientist Mike Kim (CTO), with funding from First Round Capital, Homebrew, and Susa Ventures.

Devoting more than a year to developing the new solution, Outlier kept it in beta through most of 2016 before finally releasing it in February 2017, aiming to offer users a unique approach to BI and analytics.

With its product named after the company, Outlier aims to be, well, precisely that, by offering a different approach to analytics, one that:

“Monitors your business data and notifies you when unexpected changes occur.”

Which means that, rather than taking a reactive approach in which the system waits for the business user to launch the analytics process, the system will take a proactive approach and signal or alert when these changes occur, triggering action from analysts. 
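There are many ways to implement such proactive alerting; Outlier’s actual detection methods are proprietary, but a classic baseline is to flag values that stray several standard deviations from their recent history:

```python
# Generic z-score anomaly alerting on a daily metric. This is a
# textbook baseline for "notify me when something unexpected happens",
# NOT Outlier's proprietary detection method.

import statistics

def alerts(series, window=7, threshold=3.0):
    """Flag indexes where a value strays more than `threshold` standard
    deviations from the mean of the preceding `window` observations."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.stdev(history) or 1e-9  # avoid divide-by-zero
        if abs(series[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged

daily_signups = [100, 103, 98, 101, 99, 102, 100, 97, 240, 101]
print(alerts(daily_signups))  # [8] -> the 240-signup spike
```

The point of the proactive model is that a check like this runs on every metric every day, so the analyst is summoned only when index 8 happens, rather than having to eyeball dashboards looking for it.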

Now, to be honest, this is not the first time I have heard this claim from a vendor and, frankly, as many modern BI solutions incorporate increasingly sophisticated alerting mechanisms, I’m less interested in hearing the claim and more interested in discovering how each software provider addresses the issue of making analytics and BI solutions truly proactive.

During the session, Sean Byrnes and Doug Mitarotonda, CEO and Head of Customer Development respectively, gave us a thorough overview of Outlier’s new approach to BI and analytics. Here is a summary of the briefing.

Outlier AI and a New Analytics Value Chain

Being data scientists themselves, Outlier’s team understands the hardships, complexities, and pains data scientists and business analysts undergo to design, prepare, and deploy BI and analytics solutions. With this in mind, Outlier was born, aiming to provide a fresh approach to business intelligence.

Rather than creating dashboards or running queries against business data to satisfy analysis requirements, the approach developed by Outlier is to consistently and automatically watch business data and alert users when unexpected changes occur. To do this:

Outlier connects directly to a number of business data sources, such as Google Analytics, Adobe Cloud, Salesforce, Stripe, SQL databases, and many others, and then automatically monitors the data and alerts on unexpected behavior.

Along with the ability to proactively monitor business data and alert on changes, Outlier can sift through metrics and dimensions to understand and identify business cycles, trends, and patterns, automating the business analysis process and, consequently, positioning itself in the realm of a new generation of BI solutions (Figure 1).

Figure 1. Outlier’s positioning themselves as new generation BI (Courtesy of Outlier)

During the BBBT session with Outlier, one key thing brought up by Sean Byrnes was the fact that the company’s leadership understands the analytics and business intelligence (BI) market is changing and yet, many companies are still struggling now, not with the availability of data but with the questions themselves, as the analytics formulation process becomes increasingly complex.

According to the company, as part of its effort to automate the monitoring and analytics process and ease regular monitoring for users, once deployed Outlier can provide daily headlines on key business dimensions, enabling users to ask critical questions knowing there will be a regular answer, while still letting them formulate new questions to keep discovering what is important (Figure 2).

Figure 2. Outlier’s positioning themselves as new generation BI (Courtesy of Outlier)

Interestingly, I find this process to be useful, especially to:
  • Carry on with common data analysis and reporting tasks while truly automating the analytics process so it can detect when a significant change occurs.
  • Take a proactive approach that encapsulates the complexities of data management and presents insights properly so users can make business decisions ―act on data.
  • Filter data to recognize what is important to know when making a decision.

Outlier: It is not Just About the Common, but the Uncommon

Today, many organizations can know how much they sold last month or how much they spent last quarter. Those are relevant yet common questions that can be answered with relative ease. Increasingly, however, it is also about discovering not just answers but new questions that can unveil key insights, opportunities, and risks.

Outlier identified this as a key need and acted upon it, knowing that constructing the infrastructure to achieve it can be far more than a trivial task, as it often forces organizations to radically modify existing traditional BI platforms to accommodate new or additional analytics capabilities ―predictive, mining, etc.― that may or may not fit well with the BI solutions already in place.

Outlier aims to automate this process by making it possible for organizations to connect directly with the various sources a business analyst takes data from, and by guiding them through automation of the monitoring process.

One key aspect of Outlier worth mentioning is how the company strives to augment rather than replace the capabilities of existing analytics and data management solutions, fitting into a specific point of what the company calls the analytics value chain (Figure 3).

Figure 3. Outlier’s Analytics Value Chain Proposition (Courtesy of Outlier)

Other relevant aspects shown during the demo session include useful functional elements such as headlines, dashboards, and scorecards that nicely combine graphical and textual information (Figure 4), as well as a large set of connectors for different data sources, including traditional databases and social media sources.

Also worth mentioning is Outlier’s effort to educate potential users in the field of BI and analytics ―and on the potential use of Outlier in different industries and lines of business― via a section of its portal with helpful information, ranging from how to analyze customer acquisition cost to performing customer segmentation.

Figure 4. Outlier’s Screencap (Courtesy of Outlier)

Outlier and a New Generation of BI and Analytics Solutions

As part of a new wave of analytics and BI solutions, Outlier is constantly working to introduce new technologies and techniques into the common portfolio of data analysis tasks, and it has many appealing functions and features to modernize the current state of analytics.

Of course, Outlier will face significant competition from incumbents already in the market, such as Yellowfin, Board, AtScale, and Pyramid Analytics. But if you are searching for, or just curious about, new analytics and BI offerings, it might be a good idea to check out this new solution, especially if your organization requires an innovative and agile approach to analytics with full monitoring and alerting capabilities.

Finally, you can start by checking, aside from its website, some additional information right from the BBBT, including a nice podcast and the session’s video trailer.
Book Commentary: Predictive Analytics by Eric Siegel


As much as we’d like to imagine that the deployment and use of predictive analytics has become a commodity for every organization and is in use in every “modern” business, the reality is that many small, medium, and even large organizations are still not using predictive analytics and data mining solutions as part of their core business software stack.

Reasons are plenty: insufficient time, budget, or human resources, as well as a dose of inexperience and ignorance of its real potential benefits. These and other reasons came to mind when I had the opportunity to read Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, written by former Columbia University professor and founder of the Predictive Analytics World conference series, Eric Siegel.

Aside from being a clear, well-written book filled with examples and bits of humor to make it enjoyable, what makes this book stand out in my view is that it is written mostly for a general audience, in plain English, which makes it a great option for those new to the field who want to fully understand what predictive analytics is and its potential effects and benefits for any organization.

With plenty of industry examples and use cases, Mr. Siegel neatly introduces the reader to the world of predictive analytics, what it is, and how this discipline and its tools are currently helping an increasing number of organizations ―in the likes of Facebook, HP, Google, Pfizer, and other big players in their fields― to discover hidden trends, predict, and plan for making better decisions with data.

Another great aspect of the book is its clear and easy explanation of important current topics, including data mining and machine learning, as keys to more advanced topics such as artificial intelligence and deep learning. It also does a good job of mentioning the caveats and dangers of making wrong assumptions when using predictive analytics.

I especially enjoyed the central section of the book, filled with examples and use cases of predictive analytics in different industries and lines of business ―healthcare, finance, and law enforcement, among others― as well as the list of resources at the end of the book.

Of course, for me, having been a practitioner for many years, there was a small sense of wanting a bit more technical and theoretical detail. Still, the book is a great introductory reference, both for novices who need to grasp the full potential of predictive analytics and for those familiar with the topic who want to know what their peers are doing and expand their view of its application in their organization.

If you are still struggling to understand what predictive analytics is and what it can do to improve your organization’s decision making and planning abilities, or want a fresh view of the new use cases for this discipline and its software solutions, Predictive Analytics by Eric Siegel is certainly a reference you should consider having on your physical or virtual bookshelf.

Have you read the book? About to do it? Don’t be shy, share your comments right below…

BOARD International: Cognitive, Mobile, and Collaborative


Business Intelligence (BI) and Enterprise Performance Management (EPM) software provider BOARD International recently released version 10.1 of its all-in-one BI and EPM solution. This release includes new user experience, collaboration, and cognitive capabilities, which will enable BOARD to enter into the cognitive computing field.

By incorporating all these new capabilities into its single BI/EPM offering, BOARD continues to uphold its philosophy of offering powerful capabilities within a single platform.

With version 10.1, BOARD aims to improve the way users interact with data significantly. The new version’s interface introduces new user interaction functionality in areas such as user experience and storytelling and is a major improvement on that of the previous version.

BOARD gave me an exclusive overview of the main features of version 10.1 and the company's product roadmap. Read on for details.

Getting On Board with Cognitive Technologies

With version 10.1, BOARD seems to be making its solution fit for a new era centered on machine learning. The solution uses natural language recognition (NLR) and natural language generation (NLG) capabilities to offer users new ways to interact with data (see Figure 1).

Figure 1. BOARD’s assistant (image courtesy of Board International)

For instance, users can now create an entire report in a drag-and-drop interface. They can also directly ‘talk’ to the system through spoken and written language: the system takes search-like strings and automatically translates speech into words, words into queries, queries into reports, and finally into reports that include the most important insights from the source information.

One key aspect of these features is that users can create a report by simply writing a search string or request. Specifically, BOARD uses a fuzzy search mechanism that matches character sequences that are not only identical but also similar to the query term, transforming the request into a machine-generated report (Figure 2).

Figure 2. BOARD’s machine-generated report analysis (image courtesy of Board International)
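Fuzzy matching of this sort can be approximated with edit-distance-style similarity scores. The following sketch uses only Python’s standard library and illustrates the principle; it is not BOARD’s implementation:

```python
# Rough illustration of fuzzy search: rank stored report names by
# string similarity to the query, so near-misses (typos, word swaps)
# still surface. Not BOARD's proprietary mechanism.

from difflib import SequenceMatcher

def fuzzy_find(query, report_names, cutoff=0.6):
    """Return report names scoring above `cutoff`, best match first."""
    scored = [
        (SequenceMatcher(None, query.lower(), name.lower()).ratio(), name)
        for name in report_names
    ]
    return [name for score, name in sorted(scored, reverse=True) if score >= cutoff]

reports = ["Sales by Region", "Sales by Reigon 2016", "Inventory Aging"]
# The misspelled "Reigon" report still matches the query
print(fuzzy_find("sales by region", reports))
```

This is also the mechanism that makes the report-reuse scenario work: a user’s free-text request can surface existing reports by other authors whose titles are merely close to, not identical with, the search string.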

BOARD can also identify, recover, and list reports that match the search criteria, such as reports generated by other users. This capability speeds up the solution development process by enabling users to identify existing work that can be used for a new purpose.

In-context Collaboration

BOARD has also improved its collaboration strategy, specifically by facilitating communication between users. The vendor has introduced an in-context collaboration feature that enables users to share their analyses, communicate via live chat, and jointly edit and author reports in a single interface. Embedded security (Figure 3) ensures users have the right level of access and defines user groups. This enables users to share analytics securely and seems to improve the overall analysis of data and the development of analytics apps.

Figure 3. BOARD’s embedded collaboration features (Courtesy of Board International)

User Experience and Storytelling

BOARD is also continuing to focus heavily on customer experience and functional efficiency.

The latest version of BOARD’s BI and EPM platform has redesigned user interfaces, including a color-coded tile menu with icons to improve hierarchy management and touchscreen usability. In addition, the configuration panel now offers more time and analytics functions.

10.1 also introduces Presentations—a new storytelling capability that enables users to personalize their reports and save them as a live presentation. This enables users to share presentations that incorporate live information rather than static images and graphs with other users and groups, improving user collaboration.

This new feature lets BOARD stay up to date with current trends in BI and compete with other players in the field that already offer similar capabilities, such as Tableau and Yellowfin.

Mobility, Cognitive Capabilities, and Collaboration: BOARD’s Bet for the Future

BOARD also explained that it‘s paving the way for medium- and long-term product advancements.

In its latest release, BOARD has ensured its HTML5-based client will replicate all the functionality of its existing Windows client interface in the future. This will enable users to choose between mobile and desktop devices.

10.1 also introduces new mobile apps and add-ons, which widen BOARD’s intrinsic analytics and data management capabilities as well as the solution’s mobile functions and features. The company is also reinforcing the product’s interaction with the Microsoft Office software stack in a continuous effort to help users increase productivity. This will help users conduct BI and EPM analysis more easily, as they will have access to embedded analytics services within standard Office applications such as Word and Excel.

Lastly, 10.1 also includes more features for accessing big data sources and cloud-based technologies, and BOARD has partnered with a leading cloud CRM and business software provider. It’s also worth noting that BOARD is now expanding its North American presence; specifically, the vendor is increasing its human and material resources to reinforce its marketing and sales efforts and its support and services capabilities.

BOARD 10.1 offers a good balance of analytics and enterprise performance management capabilities. It could be a solution for those looking to start using analytics or enhance their existing analytics capabilities.

(Originally published on TEC's Blog)
2017 Teradata Influencer Summit: Blending In on the New Management Era


A couple of weeks ago I was fortunate to be invited to attend the 2017 Influencer Summit in the beautiful venue chosen by Teradata in La Jolla, California. Aside from the venue, a great event took place, one which was insightful, interesting, and, well, fun; a confirmation of Teradata’s evolution on both the technical and business sides, and of how radically the IT and software industry has changed in the last couple of years.

Since last year’s Partners Conference and influencer events, Teradata has kept moving forward with its evolutionary process to adapt to the new business and technical dynamics of the market. This year’s event allowed analysts, pundits, and influencers alike to glimpse what Teradata is doing to deliver value to current and new customers.

More Analytics, More Integration, More Scale...

In a continuous effort, Teradata is making sure its offerings are available in all shapes and forms, more precisely in all major cloud and on-premises flavors, as part of the Teradata Everywhere strategy. This includes launching Teradata in the Azure marketplace and increasing geographic coverage for its own Managed Cloud. At the same pace, the company is also working to rapidly adjust to business and industry changes to continuously improve solution delivery and services.

Right from the get-go, John Dining, Teradata’s Executive Vice President & Chief Business Officer, gave us a clear overview of how the enterprise analytics and data management software provider is working on different strategic paths to ensure the company remains at the top of its market segment.

John Dining presenting at Teradata’s 2017 Influencer Summit Event

One key and noteworthy aspect of this overall strategy is Teradata’s bold approach and continuing effort to match its product development with a coherent business proposal via three areas:

  • Reinforcing its multi-genre analytics strategy, which means widening the offering of analytics capabilities to strengthen user’s capabilities in areas such as text, path and graph analysis, among others.
  • Bolstering Teradata’s power to perform more versatile and flexible data movement and integration operations, to support an increasing number of sources and complex operations with data. This includes increasing Teradata’s ability to incorporate intelligence and automation into data management operations, as well as developing vertical solutions for specific areas such as communications and finance, or lines of business like marketing and DevOps.
  • Increasing Teradata’s ability to scale according to customers’ needs, especially for those with big data management demands.

One important takeaway, in my view, is Teradata’s clear path from a technical perspective: focusing on the real technical challenges faced by a majority of organizations while, at the same time, changing its message to be less technical and more business oriented to provide clarity, especially to the enterprise market, a market it knows perfectly well.

Blended Architectures Are the Future. Oh! And Yes, They Need Services

In a time where organizations seem to be increasingly reluctant to invest in consulting services and keen to look for vanilla deployment solutions, Teradata seems to be taking a more realistic approach.

On one hand, by putting in place specific measures to reinforce its services business, and on the other, by clearly acknowledging that blended architectures and hybrid deployments will be the norm in the coming years, or at least for the time being. This means that high-quality consulting and services can be key to ensuring success, especially in complex analytics deployment scenarios.

Aside from its incumbent software solutions, by taking aim to restructure its service and consulting areas, Teradata aims to be better positioned to act on these complex deployments that require specialized services.

According to Teradata, the company has been working to consolidate its services areas via important acquisitions, the likes of ThinkBig, Claraview and Big Data Partnership, as well as working to integrate them into a coherent service model, its Teradata Global Services Initiative.

The initiative is organized around three main areas:

  • Think Big Analytics, the global analytics consultancy group with leading expertise in areas such as data science, solution development and data visualization for different industries and functions.
  • Enterprise Data Consulting, the technology-enabled group with strong expertise in analytical ecosystems, providing services ranging from architecture, data management and governance, and managed services to security.
  • Customer Services, the group responsible for ensuring the value and availability of analytic platforms via change management services, with expertise in systems and software management.

The strategy seems to be well complemented by the inclusion of a complete business value framework that, aside from a comprehensive analytics strategy for customers and education, includes Teradata’s Rapid Consulting Engagement (RACE) strategy, aimed at helping customers deploy comprehensive solutions in a matter of weeks and providing “agile” development models for its customers.

Teradata’s approach seems to make perfect sense, enabling the company to grow efficiently on the technology side, especially towards a hybrid cloud approach, while ensuring the offering of high-quality consulting services.

Now, can this approach carry challenges for the company?

It is possible. Perhaps one challenge for Teradata will be ensuring successful delivery in areas where being “agile” is a must, particularly in big data and data science projects, which more often than not require fast deployment times. Teradata will need to make sure its consulting, educational and other service offerings are fine-tuned, and in tune with the evolution of its own software and hardware offerings.

For this, the company is working to align its technical and business messaging with the company’s strategy: the offering of hybrid cloud solutions, business analytics solutions and full-fledged ecosystem architecture consulting.

So, aside from reinforcing its go-to-cloud strategy, part of this strategy includes accelerating its existing release calendar to offer three major releases a year for its flagship product, Teradata Database; reinforcing its Intelliflex data warehouse appliance with new functionality; launching Teradata Intellibase, Teradata’s compact environment for data warehousing; and continuing the evolution of Intellicloud, the company’s secure managed cloud offering.

So, on the Big Picture...

Many more things happened and were revealed by Teradata, both publicly and under disclosure, but from a personal view, what still sticks with me as the relevant story is how Teradata is managing to keep its transformation at a pace and form that continues to strike a fine balance between its more “traditional” data management customers and its new customers. This ensures it serves both those in the “typical” data warehousing and analytics space and those that require innovation via new advanced analytics and big data ecosystems.

Challenges may still lie ahead for Teradata due to increased and fiercer competition, but the data warehousing company seems to be adapting well to the new data management era.

DomoPalooza 2017: Flare, Stravaganza…and Effective Business Management

Logo courtesy of DOMO , Inc.
When you decide to show up at Domopalooza, Domo’s big user event, you don’t know for sure what you will find, but from the very beginning you can feel that you’ll have a unique experience. From the individual sessions and training, the partner summit and the concert line-up, to what might come from Domo’s CEO/rock-star Josh James, who certainly is one of a kind in the software industry; you know that you’ll witness a delightful event.

This year, under the strings of Styx, Mr. James kicked off an event that amalgamated business, entertainment, fun and work in a unique way —a very Domo way.

With no more preambles, here is a summary of what happened during Domo’s 2017 DomoPalooza user conference.

Josh James at DomoPalooza 2017 (Photo courtesy of DOMO)
Key Announcements

Before entering the subjective domain of my opinion about Domo’s event and solutions, let’s take a minute to pinpoint some of the important announcements made prior to and during the event:
  • The first news came some days before the user event, when Domo announced its new model for rapid deployment dashboards. This solution consists of a series of tools that accelerate and ease the dashboard deployment process. From its large number of connectors to diverse data sources to a set of pre-installed and easy-to-configure dashboards, this model will enable developers to quickly and easily deploy dashboards that decision makers can use effectively.
  • The next important announcement occurred during the conference. Domo came out with the release of Mr. Roboto, DOMO’s new set of capabilities for performing machine learning, predictive analytics and predictive intelligence. According to DOMO, the new offering will be fully integrated within DOMO’s business cloud, aiming for fast and non-disruptive business adoption. Two major features of Mr. Roboto include the Alerts Center, a personalized visual console powered by advanced analytics functionality to provide insights and improve decision making, and a data science interface that enables users to apply predictive analytics, machine learning and other advanced analytics algorithms to their data sets. This is for sure one product I’m looking forward to analyzing further!

The introduction of new features, especially directed at narrowing the technical-business gap within the C-suite of an organization and giving decision makers easier, customized access to insights, will enable business management and monitoring using DOMO. Some of these features include the introduction of:
  • Annotations, so information workers and decision makers can highlight significant insights in the process on top of a chart or data point.
  • Enhancements to its Analyzer tool, with the incorporation of a visual data lineage tool to enable users to track data from source to visualization.
  • Data slicing within DOMO’s cards to create more guided analysis paths business users and decision makers can take advantage of.
  • More than 60 chart families to enhance the rich set of visual options already within DOMO’s platform.

DOMO’s new features seem to fit well within a renewed effort from the company to address bigger enterprise markets and increase presence within segments which traditionally are occupied by other enterprise BI contenders.

It may also signal DOMO’s necessary adaptation to a market currently racing to include advanced analytics features, in order to address larger and new user footprints within organizations, such as data scientists and a new, more tech-savvy generation of information workers.

There is much more behind Domo’s Curtains

Perhaps the one thing I enjoyed the most about the conference was having a continuous sense of discovery —different from previous interactions with DOMO, which somehow left me with a sense of incompleteness. This time I had the chance to discover that there is much more to DOMO behind the curtains.

Having a luminary such as Josh James as CEO can be a two-edged sword. On one side, his glowing personality has served well to enhance DOMO’s presence in a difficult and competitive market. Josh has the type of personality that attracts, creates and sells the message, and without doubt drives the business.

On the other end, however, if not backed and handled correctly, his strong message can create some scepticism, making some people think a company is all about a message and less about the company’s substance. But this year’s conference helped me discover that DOMO is way more than what can be seen on the surface.

Not surprising is the fact that Josh and Chris Harrington —savvy businessmen and smart guys— have been keen to develop DOMO’s business intelligence and analytics capabilities to achieve business efficiency, working towards translating technical complexity into business-oriented ease of use. To achieve this, on the technical side DOMO has put together a very knowledgeable team led by Catherine Wong and Daren Thayne, DOMO’s Chief Product Officer and Chief Technology Officer respectively, both with wide experience ranging from cloud platforms and information management to data visualization and analysis. On the business side, an experienced team that includes tech veterans like Jay Heglar and Paul Weiskopf leads strategy and corporate development, respectively.

From a team perspective, this balance between tech experience and business innovation seems to be paying off as, according to the company, it has been growing steadily and gaining the favour of big customers such as TARGET, Univision and Sephora, some of which were present during the event.

From an enterprise BI/Analytics perspective, it seems DOMO has achieved a good balance in at least two major aspects that ensure BI adoption and consumption:

  • The way BI services can be offered to different user groups —especially the C-level team— which requires a special degree of simplification but, at the same time, efficiency in the way data is shown.
  • The way BI services can encapsulate complex data processing problems and hide them from the business user. 

On this topic, during the conference we had the chance to see examples of the aforementioned aspects, both onstage and offstage. One came from Christel Bouvron, Head of Business Intelligence at Sephora Southeast Asia, who commented the following regarding the adoption and use of DOMO:

“We were able to hook in our data sets really quickly. I had sketched out some charts of what I wanted. They didn’t do that, but what they did was even better. I really liked that it wasn’t simply what I was asking for – they were trying to get at the business problem, the outcomes we were trying to get from it, and think about the bigger picture.”

A good example of the shift DOMO wants to convey is that it is now changing its approach from addressing a business problem with a technical perspective to addressing the business problem with a business perspective, with a technical platform in the background to support it. Of course, this needs to come with the ability to effectively encapsulate technical difficulties in a way that is efficient and consumable for the business.

Christel Bouvron at DomoPalooza 2017 (Photo courtesy of DOMO)

It was also good to hear from the customers that they acknowledge that the process wasn’t always that smooth, but it helped to trigger an important cultural shift within their organization.

The takeaway

Attending Domopalooza 2017 was informative and very cool indeed. DOMO’s team showed me a thing or two about the true business of DOMO and its interaction with real customers; this includes the fact that DOMO is not a monolithic solution. Besides its already rich set of features, it enables key customization aspects to provide unique customers with unique ways to solve their problems. While DOMO is a software rather than a service company, customers expressed satisfaction with the degree of customization and services DOMO provides —this was especially true with large companies.

DOMO has done a great job of simplifying the data consumption process so that data feeds are digestible enough. The solution concentrates more on the business problem than the technical one, giving many companies the flexibility and time to make the development of business intelligence solutions more agile and effective. Although these results might not be fully achieved in all cases, DOMO’s approach can certainly help organizations benefit from a more agile and fast deployment process and, thus, become more efficient and productive.

Despite being a cloud-based software company, DOMO seems to understand quite well that a great number of companies are working, by necessity or by choice, in hybrid cloud/on-premises environments. Accordingly, it enables customers to easily connect and quickly interact with on-premises systems, whether via a simple connection to a database/table source or through more sophisticated data extraction and transformation specifications.

There is no way that in the BI and analytics market a company such as DOMO —or any other player— will have a free ticket to success. The business intelligence market is diversifying as an increasing number of companies seem to need its services, but DOMO’s offering is, by all means, one to be considered when evaluating a new-generation BI solution to meet the increasing demand for insights and data analysis.

Finally, well... what can be a better excuse to watch Styx's Mr. Roboto than this.

(All photos credited to Domo, Inc.)
A D3 Image is Worth a Thousand Words: Interview with Morgane Ciot

Many things have been said and done in the realm of analytics, but visualizations remain at the forefront of the data analysis process, where intuition and correct interpretation can help us make sense of data.

As an increasing number of tools emerge, current visualizations are far more than mere pictures on a screen, allowing for movement, exploration and interaction.

One of these tools is D3, an open-source Javascript data visualization library. D3 is perhaps the most popular tool for developing rich and interactive data visualizations, used by small and large organizations such as Google and the New York Times.

With the next Open Data Science Conference in Boston coming soon, we had the opportunity to talk with DataRobot’s Morgane Ciot, an ODSC speaker, about her workshop session, “Intro to D3”, the state of data visualization and her very own perspectives on the analytics market.

Morgane Ciot is a data visualization engineer at DataRobot, where she specializes in creating interactive and intuitive D3 visualizations for data analysis and machine learning. Morgane studied computer science and linguistics at McGill University in Montreal. Previously, she worked in the Network Dynamics Lab at McGill, answering questions about social media behavior using predictive models and statistical topic models.

Morgane enjoys studying machine learning (ML), reading, writing, and staging unusual events.

Let's get to know more about Morgane and her views as a data visualization engineer.

Morgane, could you tell us a bit more about yourself, especially about your area of expertise, and what was your motivation to pursue a career in analytics and data science?

I went to school for computer science and linguistics. Those two fields naturally converge in Natural Language Processing (NLP)/Artificial Intelligence (AI), an intersection that was unfortunately not exploited by my program but that nonetheless got me interested in machine learning.

One of the computer science professors at my school was doing what essentially amounted to sociological research on social media behavior using machine learning techniques. Working with him furthered my interest in ML, NLP, and topic modeling, and I began to also explore how to visualize some of the unmanageable amounts of data we had (like, all of Reddit).

I’m probably indebted to that part of my life, and my professor, for my current position as a data viz engineer. Also, machine learning's practical ramifications are going to be game changing. I want to live closest to the eye of the storm when the singularity hits.

Based on your experience, which attributes or skills should every data master have if he/she wants to succeed, and what would be your recommendations for those looking for an opportunity at this career?

Stats, problem-solving skills, and engineering or scripting abilities all converge in the modern data scientist.

You have to be able to understand how to formulate a data science problem, how to approach it, and how to build the ad hoc tools you’ll need to solve it. At least some basic statistical knowledge is crucial. Elements of Statistical Learning by Hastie and Andrew Ng’s Coursera course both provide a solid foundational understanding of machine learning and require some statistical background.

Learn at least one programming language — Python or R are the most popular. R is the de facto language for statisticians, and Python has a thriving community and a ton of data science libraries like scikit-learn and pandas. It’s also great for writing scripts to scrape web data. If you’re feeling more adventurous, maybe look into Julia.

As usual, don’t just learn the theory. Find a tangible project to work on. Kaggle hosts competitions you can enter and has a community of experts you can learn from.

Finally, start learning about deep learning. Many of the most interesting papers in the last few years have come out of that area and we’re only just beginning to see how the theory that has been around for decades is going to be put into practice.

Talking about data visualization, what is your view of the role it plays within data science? How important is it in the overall data science process?

Data visualization is pretty fundamental to every stage of the data science process. I think how it’s used in data exploration — viewing feature distributions — is fairly obvious and well-practiced, but people often overlook how important visualizations can be even in the modeling process.

Visualizations should accompany not just how we examine our data, but also how we examine our models! There are various metrics that we can use to assess model performance, but what’s really going to convince an end user is a visualization, not a number. That's what's going to instill trust in model decisions.

Standard introductions to machine learning lionize the ROC curve, but there are plenty of other charts out there that can help us understand what and how a model is doing: plotting predicted vs. actuals, lift charts, feature importance, partial dependence, etc. — this was actually the subject of my ODSC talk last year, which should be accessible on their website.

A visualization that rank-orders the features that were most important to the predictive capacity of a model doesn’t just give you insight, it also helps you model better. You can use those top features to build faster and more accurate models. 
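To make the rank-ordering idea concrete, here is a minimal Python sketch. The feature names and importance scores are entirely hypothetical, standing in for what a fitted model would report; the point is only the mechanic Morgane describes: sort features by importance, then keep the top ones to build a faster, leaner model.

```python
# Hypothetical feature importances, as a fitted model might report them.
# The names and scores below are made up purely for illustration.
importances = {
    "account_age_days": 0.34,
    "num_purchases": 0.27,
    "avg_session_minutes": 0.18,
    "referral_source": 0.12,
    "newsletter_opt_in": 0.06,
    "favorite_color": 0.03,
}

# Rank-order features from most to least important.
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

# Keep only the top-k features for a faster follow-up model.
top_k = [name for name, score in ranked[:3]]
print(top_k)  # ['account_age_days', 'num_purchases', 'avg_session_minutes']
```

The same rank-ordered list is exactly what a feature-importance chart plots, which is why the visualization and the modeling shortcut come from one computation.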

What do you think will be the most important data visualization trend in the next couple of years?

Data is becoming evermore important basically everywhere, but popular and even expert understanding hasn’t quite kept up.

Data is slowly consuming us, pressing down from all angles like that Star Wars scene where Luke Skywalker and Princess Leia get crushed by trash. But are people able to actually interpret that data, or are they going to wordlessly nod along to the magical incantations of “data” and “algorithms”?

As decisions and stories become increasingly data-driven, visualizations in the media are going to become more important. Visualizations are sort of inherently democratic.

Everyone who can see can understand a trend; math is an alien language designed to make us feel dumb. I think that in journalism, interactive storytelling — displaying data with a visual and narrative focus — is going to become even more ubiquitous and important than it already is. These visualizations will become even more interactive and possibly even gamified.

The New York Times did a really cool story where you had to draw a line to guess the trend for various statistics, like the employment rate, during the Obama years, before showing you the actual trend. This kind of quasi-gamified interactivity is intuitively more helpful than viewing an array of numbers.

Expert understanding will benefit from visualizations in the same way. Models are being deployed in high-stakes industries, like healthcare and insurance, that need to know precisely why they’re making a decision. They’ll need to either use simplified models that are inherently more intelligible, at the expense of accuracy, or have powerful tools, including visualizations, to persuade their stakeholders that model decisions can be interpreted.

The EU is working on “right to explanation” legislation, which allows any AI-made decision to be challenged by a human. So visualizations focused on model interpretability will become more important.

A few other things….as more and more businesses integrate with machine learning systems, visualizations and dashboards that monitor large-scale ML systems and tell users when models need to be updated will become more prevalent. And of course, we’re generating staggering amounts of new data every day, so visualizations that can accurately summarize that data while also allowing us to explore it in an efficient way — maybe also through unsupervised learning techniques like clustering and topic modeling— will be necessary. 

Please tell us a bit about DataRobot, the company you work at.

We’re a machine learning startup that offers a platform data scientists of all stripes can use to build predictive models. I’m equal parts a fan of using the product and working on it, to be honest. The app makes it insanely easy to analyze your data, build dozens of models, use the myriad visualizations and metrics we have to understand which one will be the best for your use case, and then use that one to predict on new data.

The app is essentially an opinionated platform on how to automate your data science project. I say opinionated because it’s a machine that’s been well-oiled by some of the top data scientists in the world, so it’s an opinion you can trust. And as a data scientist, the automation isn’t something to fear. We’re automating the plumbing to allow you to focus on the problem-solving, the detective work. Don’t be a luddite! 

It’s really fun working on the product because you get to learn a ton about machine learning (both the theoretic and real-world applications) almost by osmosis. It’s like putting your textbook under your pillow while you sleep, except it actually works. And since data science is such a protean field, we’re also covering new ground and creating new standards for certain concepts in machine learning. There’s also a huge emphasis, embedded in our culture and our product, on — “democratizing” is abusing the term, but really putting data science into as many hands as possible, through evangelism, teaching, workshops, and the product itself.

Shameless promotional shout-out: we are hiring! If you’re into data or machine learning or python or javascript or d3 or angular or data vis or selling these things or just fast-growing startups with some cool eclectic people, please visit our website and apply!

As a data visualization engineer at DataRobot, what are the key design principles the company applies for development of its visualizations?

The driving design principle is functionality. Above all, will a user be able to derive an insight from this visualization? Will the insight be actionable? Will that insight be delivered immediately, or is the user going to have to bend over backwards scrutinizing the chart for its underlying logic, trying to divine from its welter of hypnotic curves some hidden kernel of truth? We’re not in the business of beautiful, bespoke visualizations,  like some of the stuff the NYTimes does.

Data visualization at DataRobot can be tricky because we want to make sure the visualizations are compatible with any sort of data that passes through — and users can build predictive models for virtually any dataset — which means we have to operate at the right level of explanatory and visual abstraction. And we want users of various proficiencies to immediately intuit whether or not a model is performing well, which requires thinking about how a beginner might be able to understand the same charts an expert might expect. So by “functionality” I mean the ability to quickly intuit meaning.

That step is the second in a hierarchy of insight: the first is looking at a single-valued metric, which is only capable of giving you a high-level summary, often an average. This could be obfuscating important truths. A visualization —the second step— exposes these truths a bit further, displaying multiple values at a time over slices of your data, allowing you to see trends and anomalous spots. The third step is actually playing with the visualization. An interactive visualization confirms or denies previous insights by letting you drill down, slice, zoom, project, compare — all ways of reformulating the original view to gain deeper understanding. Interactive functionality is a sub-tenet of our driving design principle. It allows users to better understand what they’re seeing while also engaging them in (admittedly) fun ways. 
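The first step of that hierarchy, a single-valued metric obfuscating important truths, is easy to demonstrate. In this small Python sketch (with made-up numbers), two segments share the same overall average, yet only a view over slices of the data, i.e. a visualization, would reveal that they behave completely differently.

```python
from statistics import mean

# Made-up weekly values for two hypothetical user segments.
segment_a = [10, 10, 10, 10, 10]   # flat week over week
segment_b = [2, 6, 10, 14, 18]     # steadily climbing

# The high-level summary metric is identical for both...
assert mean(segment_a) == mean(segment_b) == 10

# ...but looking at the values over slices (weeks) tells two very
# different stories: one trend is flat, the other is growing fast.
trend_a = segment_a[-1] - segment_a[0]
trend_b = segment_b[-1] - segment_b[0]
print(trend_a, trend_b)  # 0 16
```

A chart of the two series makes the divergence obvious at a glance, which is the second step of the hierarchy; interactively slicing or zooming into either series is the third.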

During the ODSC in Boston, you will be presenting an intro to D3, can you give us a heads up? What is D3 and what are its main features and benefits?

D3 is a data visualization library built in Javascript. It represents data in a browser interface by binding data to a webpage’s DOM elements. It’s very low-level, but there are plenty of wrapper libraries and frameworks built around it that are easier to use, such as C3.js or other, much more sophisticated options. If you find a browser-rendered visualization toolkit, it’s probably using D3 under the hood. D3 supports transitions and defines a data update function, so you can create really beautiful custom and dynamic visualizations with it, such as these simulations or this frankly overwrought work of art.

D3 was created by Mike Bostock as a continuation of his graduate work at Stanford. Check out the awesome examples.

Please share with us some details about the session. What will attendees get from it?

Attendees will learn the basics of how D3 works. They’ll come away with a visualization in a static HTML file representing some aspect of a real-world dataset, and a vague sense of having been entertained. I’m hoping the workshop will expose them to the tool and give them a place to start if they want to do more on their own. 

What are the prerequisites attendees should have to take full advantage of your session?

Having already downloaded D3 4.0 (4.0!!!!!) will be useful, but really just a working browser — I’ll be using Chrome — and an IDE or text editor of your choice. And a Positive Attitude™.

Finally, on a more personal tenor, what's the best book you've read recently? 

Story of O: a bildungsroman about a young French girl's spiritual growth. Very inspiring!

Thank you Morgane for your insights and thoughts.

Morgane's “Intro to D3” workshop session will be part of the Open Data Science Conference taking place in Boston, MA, from May 3 to 5.

A good excuse to visit beautiful Boston and have a great data science learning experience!

Cloudera Analyst Event: Facing a New Data Management Era

I have to say that I attended this year’s Cloudera analyst event in San Francisco with a mix of excitement, expectation and a grain of salt also.

My excitement and expectation were fuelled by all that has been said about Cloudera and its close competitors in the last couple of years, and also by the fact that I am currently focusing my own research on big data and “New Data Platforms”. Moreover, when it comes to events hosted by vendors, I always recommend taking their statements with a grain of salt, because logically the information might be biased.

However, in the end, the event turned out to be an enriching learning experience, full of surprises and discoveries. I learnt a lot about a company that is certainly contributing big time to the transformation of the enterprise software industry.

The event certainly fulfilled many of my “want-to-know-more” expectations about Cloudera and its offering stack; the path the company has taken; and their view of the enterprise data management market.

Certainly, it looks like Cloudera is leading and strongly paving the way for a new generation of enterprise data software management platforms.

So, let me share with you a brief summary and comments about Cloudera’s 2017 industry analyst gathering.

OK, Machine Learning and Data Science are Hot Today

One of the themes of the event was Cloudera’s keen interest and immersion into Machine Learning and Data Science. Just a few days before the event, the company made two important announcements:

The first one was the beta release of Cloudera Data Science Workbench (Figure 1), the company’s new self-service environment for data science on top of Cloudera Enterprise. This new offering comes directly from the smart acquisition of a machine learning and data science startup.

Screencap of Cloudera's Data Science Workbench (Courtesy of Cloudera) 
Some of the capabilities of this product allow data scientists to develop in some of the most popular open source languages —R, Python and Scala— with native Apache Spark and Apache Hadoop integration, which in turn speeds up project deployments, from exploration to production.

In this regard, Charles Zedlewski, senior vice president of Products at Cloudera, mentioned that:

“Cloudera is focused on improving the user experience for data science and engineering teams, in particular those who want to scale their analytics using Spark for data processing and machine learning. The acquisition of and its team provided a strong foundation, and Data Science Workbench now puts self-service data science at scale within reach for our customers.”

One key approach Cloudera takes with the Data Science Workbench is that it aims to enable data scientists to work in a truly open space that can expand its reach to use, for example, deep learning frameworks such as TensorFlow, Microsoft Cognitive Toolkit, MXNet or BigDL, but within a secure and contained environment.

This is certainly a new offering with huge potential for Cloudera to increase its customer base, but also to reaffirm and grow its presence within existing customers, which can now expand their use of the Cloudera platform without needing to look for third-party options to develop on top of.

The second announcement was the launch of the Cloudera Solution Gallery (Figure 2), which showcases Cloudera’s large partner base —more than 2,800 partners globally— and a storefront of more than 100 solutions.

This news should not be taken lightly, as it shows Cloudera’s ability to start building a complete ecosystem around its robust set of products, which in my view is a defining trait of companies that want to become an industry de facto standard.

Figure 2. Cloudera Solution Gallery (Courtesy of Cloudera)

Cloudera: Way More than Hadoop

During an intensive two-day event filled with presentations, briefings and interviews with Cloudera’s executives and customers, a persistent message prevailed. While the company recognizes its origin as a provider of a commercial distribution for Hadoop, it is now making it clear that its current offering has expanded way beyond the Hadoop realm to become a full-fledged open source data platform. Hadoop is certainly at the core of Cloudera as the main data engine itself but, with support for 25 open source projects, its platform is currently able to offer much more than Hadoop distributed storage capabilities.
This is reflected throughout Cloudera’s offerings, from the full-fledged Cloudera Enterprise Data Hub, its comprehensive platform, to Cloudera’s special configurations.

Cloudera’s executives made it clear that the company’s strategy is to make sure it can provide, via open source offerings, efficient enterprise-ready data management solutions.

However, don’t be surprised if the message from Cloudera changes over time, especially if the company wants to aim at larger organizations, which most of the time rely on providers that can center IT services on the business and are not necessarily tied to any particular technology.

Cloudera is redefining itself so it can reposition its offering as a complete data management platform. This is a logical step considering that Cloudera wants to take a bigger piece of the large enterprise market, even when the company’s CEO stated that they “do not want to replace the Netezzas and Oracles of the world”.

Based on these events, it is clear to me that Cloudera will eventually end up competing head-on in specific segments of the data management market —especially with IBM, through IBM BigInsights, and with Teradata, whose multiple products have left and keep leaving a very strong footprint in the data warehouse market. Whether we like it or not, big data incumbents such as Cloudera seem destined to enter the big fight.

The Future, Cloudera and IoT

During the event I also had a chance to attend a couple of sessions specifically devoted to showing Cloudera’s deployments in the context of IoT projects. Another thing worth noticing is that, even though Cloudera has some really good stories to tell about IoT, the company does not seem to be in a hurry to jump directly on this bandwagon.

Perhaps it’s better to let this market mature and become consistent enough before devoting larger technical investments to it. It is always very important to know when and how to invest in an emerging market.

However, we should be very well aware that Cloudera, and the rest of the big data players, will be vital for the growth and evolution of the IoT market.

Figure 3. Cloudera Architecture for IoT (Courtesy of Cloudera)

It’s Hard to Grow Gracefully

Today it’s very hard, if not impossible, to deny that Hadoop is strongly immersed in the enterprise data management ecosystem of almost every industry. Cloudera’s analyst event was yet another confirmation. Large companies are now increasingly using some of Cloudera’s different options and configurations for mission-critical functions.

For Cloudera, then, the nub of the issue is no longer how to get to the top, but how to stay there, evolve, and leave its footprint at the top.

Cloudera has been very smart and strategic to get to this position, yet it now finds itself in a place where the competition will get even tougher. From this point on, convincing companies to open their big wallets will take much more than a solid technical justification.

At the time of writing this post, I learnt that Cloudera has filed to go public and will trade on the New York Stock Exchange, and as an article in Fortune mentions:

“Cloudera faces tough competition in the data analytics market and cites in its filing several high-profile rivals, including Amazon Web Services, Google, Microsoft, Hewlett Packard Enterprise, and Oracle.”

It also mentions the case of Hortonworks, which:

“went public in late 2014 with its shares trading at nearly $28 during its height in April 2015. However, Hortonworks’ shares have dropped over 60% to $9.90 on Friday as the company has struggled to be profitable.”

In my opinion, in order for Cloudera to succeed while taking this critical step, it will have to show that it is well prepared in business, technical, and strategic terms, and also ready for the unexpected; only then will it be able to grow gracefully and play big, with the big guys.

Always keep in mind that, as Benjamin Franklin said:

Without continual growth and progress, such words as improvement,
achievement, and success have no meaning.

Enterprise Performance Management: Not That Popular But Still Bloody Relevant


While performing my usual Googling during preparation for one of my latest reports on enterprise performance management (EPM), I noticed a huge difference in popularity between EPM and, for example, big data (Figure 1).

From a market trend perspective, it is fair to acknowledge that the EPM software market has taken a hit from the hype surrounding the emergence of technology trends in the data management space, such as business analytics and, particularly, big data.

Figure 1: Searches for big data, compared with those for EPM (Source: Google Trends)

In the last four years, at least, interest in big data has grown exponentially, making it a huge emerging market in the software industry. The same has happened with other data management related solutions such as analytics.

While this is not that surprising, my initial reaction came with a bit of discomfort. Such a huge difference makes one wonder how many companies have simply jumped on the big data bandwagon rather than making a measured, thoughtful decision about how best to deploy their big data initiative within the larger data management infrastructure already in place, especially with regard to having the system co-exist and collaborate effectively with EPM and existing analytics solutions.

Now, don’t get me wrong; I’m not against the deployment of big data solutions and all their potential benefits. On the contrary, I think these solutions are changing the data management landscape for good. But I can’t deny that, over the past couple of years, a number of companies, once past the hype and euphoria, have raised valid concerns about the efficiency of their existing big data initiatives and have questioned their value within the overall data management machinery already in place, especially alongside EPM and analytics solutions, which are vital for measuring performance and providing the right tools for strategy and planning.

The Analytics/EPM/Big Data Conundrum
A study published by Iron Mountain and PwC titled How Organizations Can Unlock Value and Insight from the Information they Hold, for which researchers interviewed 1,800 senior business executives in Europe and North America, concluded that:

“Businesses across all sectors are falling short of realizing the information advantage.”

Even more interesting is that, in the same report, when evaluating what they call an Information Value Index, the authors realized that:

“The enterprise sector, scoring 52.6, performs only slightly better than the mid-market (48.8).”

For some, including me, this statement is surprising. One might have imagined that large companies, which commonly have large data management infrastructures, would logically have already mastered, or at least reached an acceptable level of maturity with, their general data management operations. But despite the availability of a greater number of tools and solutions to deal with data, important issues remain as to finding, on one hand, the right way to make existing and new sources of data play a better role within the intrinsic mechanics of the business, and, on the other, how these solutions can play nicely with existing data management solutions such as EPM and business intelligence (BI).

Despite a number of big data success stories—and examples do exist, including Bristol-Myers Squibb, Xerox, and The Weather Company—some information workers, especially those in key areas of the business like finance and other related areas, are:

  • somehow not understanding the potential of big data initiatives within their areas of interest and how to use these to their advantage in the operational, tactical, and strategic execution and planning of their organization, rather than using them for tangential decisions or for relevant yet siloed management tasks.
  • oftentimes swamped with day-to-day data requests and the pressure to deliver based on the amount of data already at their disposal. This means they have a hard time deciphering exactly how to integrate these projects effectively with their own data management arsenals.

In addition, it seems that for a number of information workers on the financial business planning and execution side, key processes and operations remain isolated from others that are directly related to their areas of concern.

The Job Still Needs to Be Done
On the flip side, despite the extensive growth of and hype for big data and advanced analytics solutions, for certain business professionals, especially those in areas such as finance and operations, interest in the EPM software market has not waned.

In every organization, key people from these important areas of the business understand that improving operations and performance is an essential organizational goal. Companies still need to reduce the cost of their performance management cycles as well as make them increasingly agile to be able to promptly respond to the organization’s needs. Frequently, this implies relying on traditional practices and software capabilities.

Activities such as financial reporting, performance monitoring, and strategy planning still assume a big role in any organization concerned with improving its performance and operational efficiency (Figure 2).

Figure 2: Population’s perception of EPM functional area relevance (%)
(Source: 2016 Enterprise Performance Management Market Landscape Report)

So, as new technologies make their way into the enterprise world, a core fact remains: organizations still have basic business problems to solve, including budget and sales planning, and financial consolidation and reporting.

Not only do many organizations find the basic aspects of EPM relevant to their practices, but an increasing number of them are also becoming more conscious of the importance of performing specific tasks with the software. This signals that organizations have a need to continuously improve their operations and business performance and analyze transactional information while also evolving and expanding the analytic power of the organization beyond this limit.

How Can EPM Fit Within the New Data Management Technology Framework?
When confronted with the need for better integration, some companies will find they need to deploy new data technology solutions, while others will need to make existing EPM practices work along with new technologies to increase analytics accuracy and boost business performance.

In both cases, a number of organizations have taken a holistic approach, to balance business needs by taking a series of steps to enable the integration of data management solutions. Some of these steps include:

  • taking a realistic business approach towards technology integration. Understanding the business model and its processes is the starting point. But while technical feasibility is vital, it is equally important to take into account a practical business approach to understand how a company generates value through the use of data. This usually means taking an inside-out approach to understanding, by taking control of data from internal sources and that which might come from structured information channels and/or tangible assets (production, sales, purchase orders, etc.). Only after this is done should the potential external data points be identified. In many cases these will come in the form of data from intangible assets (branding, customer experiences) that can directly benefit the specific process, whether new or already in place.

  • identifying how data provided by these new technologies can be exploited. Once you understand the business model and how specific big data points can benefit the existing performance measuring process, it is possible to analyze and understand how these new incoming data sources can be incorporated or integrated into the existing data analysis cycle. This means understanding how it will be collected (period, frequency, level of granularity, etc.) and how it will be prepared, curated, and integrated into the existing process to increase its readiness for the specific business model.
  • recognizing how to amplify the value of data. By recognizing and making one or two of these sources effectively relate and improve the existing analytics portfolio, organizations can build a solid data management foundation. Once organizations can identify where these new sources of information can provide extended insights into common business processes, the value of the data can be amplified to help explain customer behavior and needs; to see how branding affects sales increases or decreases; or even to find out which sales regions need improved manufacturing processes.
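As a rough illustration of the last two steps, here is a minimal Python sketch of relating an external data source to internal metrics. All field names and figures are invented for illustration; a real pipeline would pull from actual sales systems and sentiment feeds.

```python
from collections import defaultdict

# Hypothetical internal records (structured data from sales systems).
sales = [
    {"region": "north", "product": "A", "units": 120},
    {"region": "north", "product": "B", "units": 45},
    {"region": "south", "product": "A", "units": 80},
]

# Hypothetical external data points (intangible asset: brand sentiment by region).
sentiment = {"north": 0.72, "south": 0.41}

def amplify(sales_rows, sentiment_by_region):
    """Pair each region's internal sales volume with an external
    sentiment score, so analysts can see where weak sentiment
    coincides with weak sales."""
    totals = defaultdict(int)
    for row in sales_rows:
        totals[row["region"]] += row["units"]
    return {
        region: {"units": units, "sentiment": sentiment_by_region.get(region)}
        for region, units in totals.items()
    }
```

The point of the sketch is only the shape of the step: one or two external sources joined against an existing metric, not a wholesale replacement of the analytics already in place.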

All this may be easier said than done, and the effort devoted to achieving this is considerable, but if you are thinking in terms of the overall business strategy, it makes sense to take a business-to-technical approach that can have a direct impact on the efficiency, efficacy, and success of the adoption of EPM/big data projects while also improving chances of adoption, understanding, and commitment to these projects.

Companies need to understand how the value of data can be amplified by integrating key big data points with the “traditional” data management cycle so it effectively collaborates with the performance management process, from business financial monitoring to planning and strategy.

While enterprise performance management initiatives are alive and kicking, new big data technologies can be put to work alongside them to expand the EPM software’s capabilities and reach.

The full potential of big data for enterprise performance management will only be realized when enterprises are able to fully leverage all available internal and external data sources towards the same business performance management goal to better understand their knowledge-based capital.

(Originally published on TEC's Blog)
(Guest Post) Value And Insights: Yves Mulkers ahead of Big Data World 2017


 John Bensalhia talks to Yves Mulkers, freelance Data Architect and blogger at 7wData, about the benefits, developments and challenges linked with Big Data...

“I'm an explorer on new technologies and Data Visualisation, and keep my finger on what's happening with Big Data from an architecture point of view.”

So says Yves Mulkers, freelance Data Architect and social media influencer. Yves is speaking ahead of the upcoming Big Data World event in London, where he will make an appearance. Listing the key benefits of what Big Data can offer, Yves says that these are:

“Scalability, cost reduction, new products and revenue streams, tailored solutions and targeting, enterprise wide insights, and Smart cities.”

Having worked as a software developer in various industries, Yves developed deep expertise in object-oriented thinking and development.
“Doing the full cycle of software development from analysis, implementation, support and project management in combination with a strong empathy, he positioned himself as a technical expert bridging and listening into the needs of the business and end-users.” 

Yves says that this past year has seen a number of breakthroughs in the development of Big Data such as:
“Integrated platforms, data preparation automation, automating automation, GPU and in-memory databases, Artificial Intelligence, micro services, IoT (Internet Of Things), and self-service analytics.”

Big Data can be used to create a competitive advantage in various ways for businesses. In addition to a 360° customer view and narrower segmentation of customers, Yves says that next-generation products, real-time customization, and business models based on data products are the new approaches. In addition, better-informed decisions, such as measuring consumer sentiment, are good gauges of the value Big Data can bring.

Businesses must consider a variety of aspects in order to ensure successful Data implementation. Yves says that businesses must have clear business processes and information state diagrams, and should also ensure that they are on top of their game with respect to training and documentation. Data standards must also be developed and complied with.

For applying data analytics and applications in a business, Yves explains that there are challenges to tackle:
“Creating value from your data products, finding the right talent and tools, maturity of the organisation in information management, and trusting the results of analytics. It's worth noting that Big Data and analytics are not the same as business intelligence.”

In the next five to 10 years, Yves says that:
“Big Data will become the business intelligence of now.”

In addition to businesses and companies, aspects of Big Data will be for everyone to take advantage of:
“Big Data will be embedded in companies’ strategy, and analytics will become available to everyone.”
“Data volumes will keep on growing as data products will become a commodity and improve our quality of life.”

Looking ahead to the event, Yves says that he expects it to bring a lot of value and insights.
“The combination with the sidetracks around Cloud and others, will bring a broader view on the complete architecture (business, technical and data) needed to be successful in Big Data implementations.”

SAP Leonardo, SAP’s IoT Platform Now Has a Name: Interview with SAP’s Rakesh Gandhi


As the “Internet of Things (IoT)” market becomes less hype and more reality, German software powerhouse SAP is moving fast, with significant economic and research investments aimed at becoming a leader in the IoT field.

One key move is the recent announcement of SAP’s Leonardo Innovation Portfolio, a comprehensive solution offering to enable organizations to plan, design, and deploy IoT solutions.

Of course, with these announcements we felt compelled to reach out to SAP and learn, in their own words, the details of the company’s new IoT portfolio.

As a result we had the opportunity to speak with Rakesh Gandhi, Vice President for IOT GTM & Solutions at SAP America.

Rakesh is an innovation enthusiast and IoT evangelist; he is currently responsible for GTM and solutions management for the SAP Leonardo portfolio for IoT innovation. A 12-year veteran at SAP, Rakesh has been involved in incubating new innovations in Mobile, Cloud for Customer, CEC, and now IoT.

Thank you Mr. Gandhi:

Last year SAP announced an ambitious €2 billion investment plan to help companies and government agencies develop their IoT and Big Data initiatives. Could you share with us some details about this program and what it involves in a general sense?

IoT is one of the key pillars of SAP’s strategy to enable customers’ digital transformation journey. Over the past several years SAP has been developing its IoT portfolio working closely with our customers. The recent announcement of the SAP Leonardo brand is a continuation of SAP’s commitment and plans in the following key areas:

  • Accelerate innovation of the IoT solution portfolio, both organically and inorganically through acquisitions.
  • Create awareness of SAP’s IoT innovations, which empower customers to run a live business with smart processes across all lines of business and re-invent business models.
  • Drive customer adoption; scale service, support, and co-innovation.
  • Most importantly, grow its ecosystem of partners and startups in the IoT market.

To date, key announcements include:

Key acquisitions such as:

  • Fedem: With this acquisition SAP can now build an end-to-end IoT solution in which a digital avatar continuously represents the state of operating assets through feeds from sensors, replacing the need for physical inspection with a “digital inspection.” Additionally, the solution is intended to consider complex forces in play and detect both instantaneous consequences of one-off events and long-term health effects of cyclic loads, making possible accurate monitoring of maintenance requirements and remaining-life prediction for assets.
  • This acquisition helped provide expertise and technology to accelerate the availability of key IoT capabilities in SAP HANA Cloud Platform, such as advanced lifecycle management for IoT devices, broad device connectivity, strong IoT edge capabilities that work seamlessly with a cloud back end, end-to-end role-based security and rapid development tools for IoT applications.
  • Altiscale: This acquisition is helping our customers create business value by harnessing the power of BIG DATA generated by the connected world.
The launch of the SAP Leonardo brand for the IoT innovation portfolio: This was a major step in announcing our brand for IoT-driven innovation.

SAP Leonardo jumpstart program: This is a major step in our commitment to help our customers drive adoption and rapidly deploy core IoT applications in a short time frame of three months, with fixed scope and price.

A partner ecosystem is critical to our success; we are working closely with partners to create an ecosystem that our customers can leverage to further simplify their deployment projects.

Additionally, SAP is on track to open IoT labs to collaborate on Industry 4.0 and IoT innovations with our customers, partners, and startups.

Can you share with us some of the details of the new enablement program as well as the general features of the Leonardo IoT Portfolio?

What we are observing in the marketplace is that many organizations are starting with small experimental IoT projects, or may have started to collect and store sensor data with some visualization capabilities.

However, it is still generally believed that IoT as a topic is very low on the maturity curve. SAP now has a very robust portfolio that has been co-innovated with our early-adopter customers and proven to deliver business value.

The second challenge is the general perception among customers that IoT is still in the hype phase and difficult to deploy, so we decided it was very important for SAP to support our customers’ adoption and showcase that they can go live in production in a short time frame for a first pilot.

This jumpstart program supports three scenarios as three distinct packages:

  • Vehicle Insights for fleet telematics,
  • Predictive Maintenance & Service with Asset Intelligence Network for connected assets
  • Connected Goods for scenarios such as connected coolers, connected vending machines, and other mass-market things.
Customers can now deploy one of these scenarios in a three-month time frame. It is a very structured three-step process: first, SAP teams work with the customer, leveraging a half-day design thinking workshop to agree on the pilot deployment scope; second, they deliver a rapid prototype to demonstrate the vision and get customer buy-in.

In the final step, toward the end of the three-month engagement, they deliver a productive pilot system.

Lastly, SAP will continue to engage with customers to help with their IoT roadmap for next processes and business case.

It seems natural to assume SAP has already started working to support IoT projects in key industries and/or lines of business. Could you talk about some of these industry/LoB efforts?

The SAP Leonardo IoT innovation portfolio powers digital processes across lines of business and industries.

As an example, we have released a new value map [here] of supply chain processes, now referred to as the digital supply chain, and this is powered by the SAP Leonardo IoT innovation portfolio.

The same applies to other LoBs, e.g. customer service processes that enable predictive and proactive maintenance, and also to industry-specific end-to-end solutions powered by SAP Leonardo, e.g. SAP Connected Goods for the CPG and retail industries.

Is this program designed mostly for SAP’s existing partners and customers? How could non-SAP customers take advantage of it?

The jumpstart program is designed to support all our customers, both existing customers and net-new prospects.

This mirrors how the SAP Leonardo portfolio of IoT solutions is designed to work with SAP or non-SAP back ends; it is agnostic in that regard.

Finally, what are the technical and/or business requirements for applicants of this program?

As mentioned above, the SAP Leonardo jumpstart program is initially offered for three packages: SAP Vehicle Insights, SAP Connected Goods, and SAP Predictive Maintenance and Service with Asset Intelligence Network.

These are cloud solutions, and the use cases covered by each of these packages are applicable across multiple industries.

Thank you again Mr. Gandhi!

You can learn more about SAP Leonardo by reaching its web site and/or reading this post by Hans Thalbauer.
In the meantime, you can take a look at the video introduction produced by SAP.

(Guest Post) Winning Solutions: Kirk Borne Discusses the Big Data Concept Ahead of Big Data World London


Looking ahead to 2017's Big Data World event, Booz Allen Hamilton's Principal Data Scientist discusses the Big Data concept, benefits and developments in detail with John Bensalhia...

2017's Big Data World promises plenty in the way of insightful talks and discussions on the subject. One of the unmissable talks to watch out for in March will come from Kirk Borne, Booz Allen Hamilton's Principal Data Scientist, who will look at “The Self-Driving Organisation and Edge Analytics in a Smart IoT World”.

“I will describe the concept of a self-driving organisation that learns, gains actionable insights, discovers next-best move, innovates, and creates value from streaming Big Data through the application of edge analytics on ubiquitous data sources in the IoT-enriched world.”

As part of this discussion, Kirk will also present an Analytics Roadmap for the IoT-enabled Cognitive Organisation.

“In this case, the “self-driving organisation” is modeled after the self-driving automobile, but applicable organisations include individual organisations, and also smart cities, smart farms, smart manufacturing, and smart X (where X can be anything). The critical technologies include machine learning, machine intelligence, embedded sensors, streaming analytics, and intelligence deployed at the edge of the network.”
“Big Data and data science are expanding beyond the boundaries of your data centre, and even beyond the Cloud, to the point of data collection at the point of action! We used to say “data at the speed of business”, but now we say “business at the speed of data.”
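The idea of analytics at the point of data collection can be sketched, under assumed parameters and synthetic readings (nothing here comes from Kirk's talk), as an edge filter that forwards only significant events instead of shipping the raw stream to a central system:

```python
from collections import deque
from statistics import mean, stdev

def edge_filter(readings, window=5, threshold=2.0):
    """Yield only readings that deviate sharply from the recent window,
    so the full raw stream never has to leave the sensor.
    window and threshold are illustrative defaults, not standards."""
    recent = deque(maxlen=window)
    for value in readings:
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            # Forward an anomaly to the central system; everything
            # else is absorbed locally at the edge.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield value
        recent.append(value)
```

For a stream of near-constant sensor values with one spike, only the spike is emitted; the design choice is that bandwidth and central compute are spent on the events that matter, which is the essence of "business at the speed of data".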

Having achieved a Ph.D. in astronomy from Caltech, Kirk focused most of the first 20 years of his career on astrophysics research (“colliding galaxies and other fun stuff”), including a lot of data analysis as well as modelling and simulation.

“My day job for nearly 18 years was supporting large data systems for NASA astronomy missions, including the Hubble Space Telescope. So, I was working around data all of the time.”
“When data set sizes began to grow “astronomically” in the late 1990s, I began to focus more on data mining research and data science. It became apparent to me that the whole world (and every organisation) was experiencing large growth in digital data. From these observations, I was convinced that we needed to train the next-generation workforce in data skills. So, in 2003, I left my NASA job and joined the faculty at George Mason University (GMU) within the graduate Ph.D. program in Computational Science and Informatics (Data Science).”

As a Professor of Astrophysics and Computational Science at GMU, Kirk helped to create the world’s first Data Science undergraduate degree program.

“I taught and advised students in data science until 2015, at which point the management consulting firm Booz Allen Hamilton (BAH) offered me the position as the firm’s first Principal Data Scientist. I have been working at BAH since then.”

Booz Allen Hamilton offers management consulting services to clients in many sectors: government, industry, and non-profit. “Booz Allen Hamilton (BAH) is over 100 years old, but has reinvented itself as an agile leading-edge technology consultant,” says Kirk.

“Our market focus is very broad, including healthcare, medicine, national defense, cyber-security, law enforcement, energy, finance, transportation, professional sports, systems integration, sustainability, business management, and more. We deliver systems, technology strategy, business insights, consultative services, modelling, and support services in many technology areas: digital systems, advanced analytics, data science, Internet of Things, predictive intelligence, emerging technologies, Cloud, engineering, directed energy, unmanned aerial vehicles (drones), human capital, fraud analytics, and data for social good (plus more, I am sure).”

Discussing Big Data, Kirk regards this as a “concept”.

“It is not really about “Big” or “Data”, but it is all about value creation from your data and information assets. Of course, it is data. But the focus should be on big value, not on big volume; and the goal should be to explore and exploit all of your organisation’s data assets for actionable information and insights.”
“I like to say that the key benefits of Big Data are the three D2D’s: Data-to-Discovery (data exploration), Data-to-Decisions (data exploitation), and Data-to-Dividends (or Data-to-Dollars; i.e., data monetisation).”

Looking back to the the past year, Kirk says that there have been several significant Big Data-related developments.

“These include the emergence of the citizen data scientist, which has been accompanied by a growth in self-service tools for analytics and data science. We are also seeing maturity in deep learning tools, which are now being applied in many more interesting contexts, including text analytics. Machine intelligence is also being recognised as a significant component of processes, products, and technologies across a broad spectrum of use cases: connected cars, Internet of Things, smart cities, manufacturing, supply chain, prescriptive machine maintenance, and more.”
“But I think the most notable developments are around data and machine learning ethics – this has been evoked in many discussions around privacy and fairness in algorithms, and it has been called out also in some high-profile cases of predictive modelling failures. These developments demand that we be more transparent and explanatory to our clients and to the general public about what we are doing with data, especially their data!”

Much value can be gleaned from the Smart IoT World for businesses, and in a number of ways, as Kirk explains.

“First of all, businesses can learn about the latest products, the newest ideas, and the emerging technologies. Businesses can acquire lessons learned, best practices, and key benefits, as well as find business partners to help them on this journey from digital disruption to digital transformation.”
“The “Smart” in “Smart IoT” is derived from machine learning, data science, cognitive analytics, and technologies for intelligent data understanding. More than ever, businesses need to focus more on the “I” in “IT” – the Information (i.e., the data) is now the fundamental asset, and the Technology is the enabler. IoT is about ubiquitous sensors collecting data and tracking nearly everything in your organisation: People, Processes, and Products. Smart IoT will deliver big value from Big Data.”

Kirk says that the past few years of Big Data have been described as the End of Demographics and the Age of Personalisation. The next five to ten years, on the other hand, will be the Age of Hyper-Personalisation.

“More than ever, people are at the centre of business,” explains Kirk.

“Big Data can and will be used to engage, delight, and enhance employee experience (EX), user experience (UX), and customer experience (CX). The corresponding actionable insights for each of these human experiences will come from “360 view” Big Data collection (IoT), intelligence at the point of data collection (Edge Analytics), and rich models for behavioural insights (Data Science).”
“These developments will be witnessed in Smart Cities and Smart Organisations of all kinds. The fundamental enabler for all of this is Intelligent Data Understanding: bringing Big Data assets and Data Science models together within countless dynamic data-driven application systems.”

With Big Data World only weeks away, Kirk is looking forward to the great opportunities that it will bring.

“I expect Big Data World to be an information-packed learning experience like no other. The breadth, depth, and diversity of useful Smart IoT applications that will be on display at Big Data World will change the course of existing businesses, inspire new businesses, stimulate new markets, and grow existing capabilities to make the world a better place.”
“I look forward to learning from technology leaders about Smart Cities, IoT implementations, practical business case studies, and accelerators of digital transformation. It is not true that whoever has the most data will win; the organisation that wins is the one who acts on the most data! At Big Data World, we can expect to see many such winning solutions, insights, and applications of Big Data and Smart IoT.”

Not Your Father’s Database: Interview with VoltDB’s John Piekos


As organizations deal with challenging times, both technologically and business-wise, managing increasing volumes of data has become a key to success.

As data management rapidly evolves, the main Big Data paradigm has changed from just “big” to “big, fast, reliable and efficient”.

Now more than ever in the evolution of the big data and database markets, the pressure is on for software companies to deliver new and improved database solutions capable not just of dealing with increasing volumes of data but also of doing it faster, better, and more reliably.

A number of companies have taken the market by storm, infusing the industry with new and spectacularly advanced database software (for both transactional and non-transactional operations) that is rapidly changing the database software landscape.

One of these companies is VoltDB. This New England (Massachusetts) based company has rapidly become a reference in next-generation database solutions and has gained the favor of important customers in key industries such as communications, finance, and gaming.

VoltDB was co-founded by none other than world-renowned database expert and 2014 ACM A.M. Turing Award recipient Dr. Michael Stonebraker, who has been key to the development of a new-generation database solution and to the formation of the talented team in charge of its development.

With the new VoltDB 7.0 already in the market, we had the opportunity to chat with VoltDB’s John Piekos about VoltDB’s key features and evolution.

John is Vice President of Engineering at VoltDB, where he heads up engineering operations, including product development, QA, technical support, documentation, and field engineering.

John has more than 25 years of experience leading teams and building software, delivering both enterprise and Big Data solutions.

John has held tech leadership positions at several companies, most recently at Progress Software where he led the OpenEdge database, ObjectStore database and Orbix product lines. Previously, John was vice president of Web engineering at EasyAsk, and chief architect at Novera Software, where he led the effort to build the industry’s first Java application server.

John holds an MS in computer science from Worcester Polytechnic Institute and a BS in computer science from the University of Lowell.

Thank you John, please allow me to start with the obvious question:

What’s the idea behind VoltDB, the company, and what makes VoltDB the database, to be different from other database offerings in the market?

What if you could build a database from the ground-up, re-imagine it, re-architect it, to take advantage of modern multi-core hardware and falling RAM prices, with the goal of making it as fast as possible for heavy write use cases like OLTP and the future sensor (IoT) applications?  That was the basis of the research Dr. Stonebraker set out to investigate.

Working with the folks at MIT, Yale, and Brown, they created the H-Store project and proved out the theory that if you eliminated the overhead of traditional databases (logging, latching, buffer management, etc.), ran an all-in-memory workload, spread that workload across all the available CPUs on the machine, and horizontally scaled it across multiple machines, you could get orders-of-magnitude better performance out of the database.

The commercial realization of that effort is VoltDB.  VoltDB is fully durable, able to process hundreds of thousands to millions of multi-statement SQL transactions per second, all while producing SQL-driven real-time analytics.

Today an increasing number of emerging databases work partially or totally in-memory while existing ones are changing their design to incorporate this capability. What are in your view the most relevant features users need to look for when trying to choose from an in-memory based database?

First and foremost, users should realize that not all in-memory databases are created equal.  In short, architecture choices require trade-offs.  Some IMDBs are created to process reads (queries) faster and others, like VoltDB, are optimized for fast writes.  It is impractical (impossible) to get both the fastest writes and the fastest reads at the same time on the same data, all while maintaining high consistency, because the underlying data organization and architecture for writes (row-oriented) differs from that for reads (columnar).

 It is possible to maintain two separate copies of the data, one in row format, the other in compressed column format, but that reduces the consistency level - data may not agree, or may take a while to agree between the copies.
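The row-versus-column trade-off John describes can be sketched in a few lines. This is a toy illustration, not VoltDB's storage engine; the table and field names are invented:

```python
# Row-oriented storage keeps each record contiguous, so a write is one
# append. Columnar storage keeps each field together, so a scan over a
# single column touches only that column's data.

row_store = []                                # list of whole records
col_store = {"user_id": [], "amount": []}     # one list per column

def write(record):
    # Fast in a row store: one append of the whole record.
    row_store.append(record)
    # In a column store the same write is scattered across every column.
    for field, value in record.items():
        col_store[field].append(value)

for i in range(5):
    write({"user_id": i, "amount": i * 10})

# A column aggregate reads one contiguous list in the column store...
total = sum(col_store["amount"])
# ...but must visit every record in the row store.
total_row = sum(r["amount"] for r in row_store)
assert total == total_row == 100
```

Keeping both layouts in sync, as the text notes, is exactly where the consistency cost appears: every write must now update two structures.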

Legacy databases can be tweaked to run in memory, but realize that, short of a complete re-write, the underlying architecture may still be disk-based, and thus incur significant (needless) processing overhead.

VoltDB defines itself as an in-memory and operational database. What does this mean in the context of Big Data, and what does it mean in the context of IT’s traditional separation between transactional and analytical workloads? How does VoltDB fit into or reshape these schemas?

VoltDB supports heavy write workloads - it is capable of ingesting never-ending streams of data at high ingestion rates (100,000+/second per machine, so a cluster of a dozen nodes can process over a million transactions a second).

While processing this workload, VoltDB can calculate (via standard SQL) and deliver strongly consistent real-time analytics, either ad hoc, or optimally, as pre-computed continuous queries via our Materialized View support.

These are capabilities simply not possible with traditional relational databases.  In the Big Data space, this places VoltDB at the front end, as the ingestion engine for feeds of data, from telco, digital ad tech, mobile, online gaming, IoT, Finance and numerous other application domains.
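The "pre-computed continuous query" idea behind materialized views can be shown in miniature. This is a hedged sketch of the concept, not VoltDB's implementation; the keys and values are invented:

```python
# A materialized view in miniature: instead of re-scanning all ingested
# events on every query, maintain the aggregate incrementally as each
# event arrives, so reads of the aggregate are O(1).
from collections import defaultdict

events = []                      # the "base table" of ingested events
view_count = defaultdict(int)    # materialized COUNT(*) GROUP BY key
view_sum = defaultdict(float)    # materialized SUM(value) GROUP BY key

def ingest(key, value):
    events.append((key, value))
    view_count[key] += 1         # view updated in the same "transaction"
    view_sum[key] += value

for k, v in [("a", 1.0), ("b", 2.0), ("a", 3.0)]:
    ingest(k, v)

assert view_count["a"] == 2 and view_sum["a"] == 4.0
# Equivalent to the ad hoc query, without scanning the base table:
assert view_sum["a"] == sum(v for k, v in events if k == "a")
```

Updating the view inside the same transaction as the write is what keeps the "continuous query" strongly consistent with the base data.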

Just recently, with VoltDB 6.4, VoltDB passed the famous Jepsen testing for improving the safety of distributed databases. Could you share with us some details of the test, the challenges, and the benefits it brought for VoltDB?

We have a nice landing page with this information, including blog posts by Kyle and by VoltDB founding engineer John Hugg.

In summary, distributed systems programming is hard. Implementing the happy path isn’t hard, but doing the correct thing (such as returning the correct answer) when things go wrong (nodes failing, networks dropping), is where most of the engineering work takes place. VoltDB prides itself on strong consistency, which means returning the correct answer at all times (or not returning an answer at all - if, for example, we don’t have all of the data available).

Kyle’s Jepsen test is one of the most stringent tests out there.  And while we hoped that VoltDB would pass on the first go-around, we knew Kyle was good at breaking databases (he’s done it to many before us!).  He found a couple of defects, thankfully finding them before any known customer found them, and we quickly went to work fixing them. Working with Kyle and eventually passing the Jepsen test was one of the 2016 engineering highlights at VoltDB. We’re quite proud of that effort.


One interesting aspect of VoltDB is that it’s a relational database that complies fully with ACID and brings native SQL support. What are the differences between this design and, for example, NoSQL and some so-called NewSQL offerings? Advantages, tradeoffs perhaps?

In general, NoSQL offerings favor availability over consistency - specifically, the database is always available to accept new content and can always provide content when queried, even if that content is not the most recent (i.e., correct) version written.

NoSQL solutions rely on non-standard query languages (some are SQL-like), to compute analytics. Additionally, NoSQL data stores do not offer rich transaction semantics, often providing “transactionality” on single key operations only.

Not all NewSQL databases are created equal. Some favor faster reads (over fast writes).  Some favor geo-distributed data sets, often resulting in high, or at least unpredictable, latency for access and update patterns.  VoltDB’s focus is low and predictable OLTP (write) latency at high transactions-per-second scale, offering rich and strong transaction semantics.

Note that not all databases that claim to provide ACID transactions are equal. The most common place where ACID guarantees are weakened is isolation. VoltDB offers serializable isolation.

Other systems offer multiple levels of isolation, with a performance tradeoff between better performance (weak guarantees) and slower performance (strong guarantees). Isolation models like Read-Committed and Read-Snapshot are examples; many systems default to one of these.

VoltDB’s design trades off complex multi-dimensional (OLAP) style queries for high throughput OLTP-style transactions while maintaining an ACID multi-statement SQL programming interface. The system is capable of surviving single and multi-node failures.

Where failures force a choice between consistency and availability, VoltDB chooses consistency. The database supports transactionally rejoining failed nodes back to a surviving cluster and supports transactionally rebalancing existing data and processing to new nodes.

Real-world VoltDB applications achieve 99.9% latencies under 10ms at throughput exceeding 300,000 transactions per second on commodity Xeon-based 3-node clusters.

How about the handling of non-structured information within VoltDB? Is VoltDB expected to take care of it, or does it integrate with alternative solutions? What’s the common architectural scenario in those cases?

VoltDB supports the storage of JSON strings and can index, query and join on fields within those JSON values. Further, VoltDB can process streamed JSON data directly into the database using our Importers (See the answer for question #9) and custom formatters (custom decoding) - this makes it possible for VoltDB to transactionally process data in almost any format, and even to act as an ETL engine.
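The idea of indexing and querying fields inside stored JSON values can be sketched generically. This is an illustration of the pattern, not VoltDB's API; the field names are invented:

```python
# Sketch: store rows as JSON strings, then build an index over one
# field extracted from each blob, so lookups on that field avoid
# re-parsing every row.
import json

rows = [json.dumps({"user": "alice", "score": 10}),
        json.dumps({"user": "bob", "score": 7})]

# Simple index on the "user" field inside each JSON value.
index = {}
for i, blob in enumerate(rows):
    index[json.loads(blob)["user"]] = i

# Query by the indexed JSON field.
hit = json.loads(rows[index["bob"]])
assert hit["score"] == 7
```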

How does VoltDB interact with players in the Big Data space such as Hadoop, both open source and commercial distributions?

The VoltDB database supports directly exporting data into a downstream data lake.  This target could be Hadoop, Vertica, a JDBC sink, or even flat files.  VoltDB handles the real-time data storage and processing, as it is capable of transactionally ingesting (database “writes”) millions of events per second.

Typically the value of this data decreases with age - it becomes cold or stale - and eventually would be migrated to historical storage such as Hadoop, Spark, Vertica, etc.  Consider applications in the telco or online gaming space - the “hot data” may have a lifespan of one month in telco, or even one hour or less, in the case of game play.

Once the data becomes “historical” and is of less immediate value, it may be removed from VoltDB and stored on disk in the historical archive (such as Hadoop, Vertica, etc).

What capabilities does VoltDB offer, not just for database administration but for development on top of VoltDB with Python, R, or other languages?

While VoltDB offers traditional APIs such as JDBC, ODBC, Java and C++ native bindings, as well as Node.js, Go, Erlang, PHP, Python, etc., I think one of the more exciting next-generation features VoltDB offers is the ability to stream data directly into the database via our in-process Importers. VoltDB is a clustered database, meaning a database comprises one or more processes (usually a machine, VM or container).

A database can be configured to have an “importer,” which is essentially a plug-in that listens to a source, reads incoming messages (events, perhaps) and transactionally processes them. If the VoltDB database is highly available, then the importer is highly available (surviving node failure).  VoltDB supports a Kafka Importer and a socket importer, as well as the ability to create your own custom importer.

Essentially this feature “eliminates the client application” and data can be transactionally streamed directly into VoltDB.  The data streamed can be JSON, CSV, TSV or any custom-defined format.  Further, the importer can choose which transactional behavior to apply to the incoming data.  This is how future applications will be designed: by hooking feeds, streams of data, directly to the database - eliminating much of the work of client application development.
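The importer pattern John describes can be sketched generically: a loop that reads messages from a source, decodes a configurable format, and hands each event to a transactional procedure. This is not VoltDB's actual importer API, just an illustration of the idea; the function names are invented:

```python
# Generic importer sketch: consume payloads from a source (e.g. a
# Kafka topic or a socket), decode them per a configured format, and
# apply each decoded event transactionally.
import csv
import io
import json

def decode(payload, fmt):
    if fmt == "json":
        return json.loads(payload)
    if fmt == "csv":
        return next(csv.reader(io.StringIO(payload)))
    raise ValueError("unknown format: " + fmt)

def run_importer(source, fmt, apply_txn):
    # In a real importer, apply_txn would invoke a stored procedure;
    # here it just collects the decoded events.
    for payload in source:
        apply_txn(decode(payload, fmt))

stored = []
run_importer(['{"event": "login"}', '{"event": "score"}'],
             "json", stored.append)
assert stored[0]["event"] == "login" and len(stored) == 2
```

The key point is that the client application disappears: the decoding and transactional apply live inside the database process itself.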

We have one customer who has produced one of the top 10 games in the app store - their application streams in-game events into VoltDB at a rate upwards of 700,000 events per second.  VoltDB hosts a Marketing Optimization application that analyzes these in-game events in an effort to boost revenue.

If you had a crystal ball, how would you visualize the database landscape in 5 years from now? Major advancements?

Specialized databases will continue to carve out significant market share from established vendors.
IoT will be a major market, and will drive storage systems to support two activities: 1) Machine learning (historical analysis) on the Data Lake/Big Data; storage engines will focus on enabling data scientists to capture value from the vast increases of data, and 2) Real-time processing of streams of data. Batch processing of data is no longer acceptable - real-time becomes a “must have”.

Data creation continues to accelerate and capturing value from fresh data in real-time is the new revenue frontier.

Finally, could you tell us a song that is an important part of the soundtrack of your life?

I’m a passionate Bruce Springsteen fan (and also a runner), so it would have to be “Born to Run”.

Springsteen captures that youthful angst so perfectly, challenging us to break out of historic norms and create and experience new things, to challenge ourselves.

This perfectly captures the entrepreneurial spirit of both the personal “self” and the “professional self,” and it matches the unbridled spirit of what we’re trying to accomplish with VoltDB. “Together we could break this trap / We’ll run till we drop, baby we’ll never go back.”

Here, There and Everywhere: Interview with Brian Wood on Teradata’s Cloud Strategy


(Image Courtesy of Teradata)
In a post about Teradata’s 2016 Partners event, I wrote about the big effort Teradata is making to ensure its software offerings are now available both on-premises and in the Cloud, in a variety of forms and shapes, making a big push to ensure Teradata’s availability, especially for hybrid cloud configurations.

So, the data management and analytics software giant seems to be sticking to its promise by increasingly bringing its flagship Teradata Database and other solutions to the Cloud, in the form of its own Managed Cloud for the Americas and Europe, as a private cloud-ready solution, or via public cloud providers such as AWS and, as most recently announced, Microsoft’s Azure Marketplace.

To chat about this latest news and Teradata’s overall cloud strategy, we sat down with Teradata’s Brian Wood.

Brian Wood is director of cloud marketing at Teradata. He is a results-oriented technology marketing executive with 15+ years of digital, lead gen, sales / marketing operations & team leadership success.

Brian has an MS in Engineering Management from Stanford University, a BS in Electrical Engineering from Cornell University, and served as an F-14 Radar Intercept Officer in the US Navy.

Throughout 2016, and especially during its 2016 Partners conference, Teradata made it clear it is undergoing an important transformation process, and a key part of that strategy is its path to the cloud. Offerings such as Teradata Database on different private and public cloud configurations, including AWS, VMware, Teradata Managed Cloud, and of course Microsoft Azure, are available now. Could you share some details about the progress of this strategy so far?

Thanks for asking, Jorge. It’s been a whirlwind because Teradata has advanced tremendously across all aspects of cloud deployment in the past few months; the progress has been rapid and substantial.

To be clear, hybrid cloud is central to Teradata’s strategy and it’s all about giving customers choice. One thing that’s unique to Teradata is that we offer the very same data and analytic software across all modes of deployment – whether managed cloud, public cloud, private cloud, or on-premises.

What this means to customers is that it’s easy for them to transfer data and workloads from one environment to another without hassle or loss of functionality; they can have all the features in any environment and dial it up or down as needed. Customers like this flexibility because nobody wants to be locked in, and it’s also helpful to be able to choose the right tool for the job and not worry about compatibility or consistency of results.

Specific cloud-related advancements in the last few months include:
  • Expanding Teradata Managed Cloud to now include both Americas and Europe
  • Increasing the scalability of Teradata Database on AWS up to 64 nodes
  • Launching Aster Analytics on AWS with support up to 33 nodes
  • Expanding Teradata Database on VMware scalability up to 32 virtual nodes
  • Bolstering our Consulting and Managed Services across all cloud options
  • And announcing upcoming availability of Teradata Database on Azure in Q1
These are just the ones that have been announced; there are many more in the pipeline queued up for release in the near future. Stay tuned!

The latest news is the availability of Teradata Database on Microsoft’s Azure Marketplace. Could you give us the details around the announcement?

We’re very excited about announcing Q1 availability for Teradata Database on Azure because many important Teradata customers have told us that Microsoft Azure is their preferred public cloud environment. We at Teradata are agnostic; whether AWS, Azure, VMware, or other future deployment options, we want what’s best for the customer and listen closely to their needs.

It all ties back to giving customers choice in how they consume Teradata, and offering the same set of capabilities across the board to make experimentation, switching, and augmentation as easy as possible.

Our offerings on Azure Marketplace will be very similar to what we offer on AWS Marketplace, including:
  • Teradata Database 15.10 (our latest version)
  • Teradata ecosystem software (including QueryGrid, Unity, Data Mover, Viewpoint, Ecosystem Manager, and more)
  • Teradata Aster Analytics for multi-genre advanced analytics
  • Teradata Consulting and Managed Services to help customers get the most value from their cloud investment
  • Azure Resource Manager Templates to facilitate the provisioning and configuration process and accelerate ecosystem deployment

What about configuration and licensing options for Teradata Database in Azure?

The configuration and licensing options for Teradata Database on Azure will be similar to what is available on AWS Marketplace. Customers use Azure Marketplace as the medium through which to find and subscribe to Teradata software; they are technically Azure customers but Teradata provides Premier Cloud Support as a bundled part of the software subscription price.

One small difference between what will be available on Azure Marketplace compared to what is now available on AWS Marketplace is subscription duration. Whereas on AWS Marketplace we currently offer both hourly and annual subscription options, on Azure Marketplace we will initially offer just an hourly option.

Most customers choose hourly for their testing phase anyway, so we expect this to be a non-issue. In Q2 we plan to introduce BYOL (Bring Your Own License) capability on both AWS Marketplace and Azure Marketplace which will enable us to create subscription durations of our choosing.

Can we expect technical and functional limitations from this version compared with the on-premises solution?

No, there are no technical or functional limitations of what is available from Teradata in the cloud versus on-premises. In fact, this is one of our key differentiators: customers consume the same best-in-class Teradata software regardless of deployment choice. As a result, customers can have confidence that their existing investment, infrastructure, training, integration, etc., is fully compatible from one environment to another.

One thing to note, of course, is that a node in one environment will likely have a different performance profile than what is experienced with a node in another environment. In other words, depending on the workload, a single node of our flagship Teradata IntelliFlex system may require up to six to ten instances or virtual machines in a public cloud environment to yield the same performance.

There are many variables that can affect performance – such as query complexity, concurrency, cores, I/O, internode bandwidth, and more – so mileage may vary according to the situation. This is why we always recommend a PoC (proof of concept) to determine what is needed to meet specific customer requirements.

Considering a hybrid cloud scenario. What can we expect in regards to the integration with the rest of the Teradata stack, especially on-premises?

Hybrid cloud is central to Teradata’s strategy; I cannot emphasize this enough. We define hybrid cloud as a customer environment consisting of a mix of managed, public, private, and on-premises resources orchestrated to work together.

We believe that customers should have choice and so we’ve made it easy to move data and workloads in between these deployment modes, all of which use the same Teradata software. As such, customers can fully leverage existing investments, including infrastructure, training, integration, etc. Nothing is stranded or wasted.

Hybrid deployment also introduces the potential for new and interesting use cases that were less economically attractive in an all-on-premises world. For example, three key hybrid cloud use cases we foresee are:
  • Cloud data labs – cloud-based sandboxes that tie back to on-premises systems
  • Cloud disaster recovery – cloud-based passive systems that are quickly brought to life only when needed
  • Cloud bursting – cloud-based augmentation of on-premises capacity to alleviate short-term periods of greater-than-usual utilization

How about migrating from existing Teradata deployments to Azure? What is the level of support Teradata and/or Azure will offer?

Teradata offers more than a dozen cloud-specific packages via our Consulting and Managed Services team to help customers get the most value from their Azure deployments in three main areas: Architecture, Implementation, and Management.

Specific to migration, we first always recommend that customers have a clear strategy and cloud architecture document prior to moving anything so that the plan and expectations are clear and realistic. We can facilitate such discussions and help surface assumptions about what may or may not be true in different deployment environments.

Once the strategy is set, our Consulting and Managed Services team is available to assist customers or completely own the migration process, including backups, transfer, validation, testing, and so on. This includes not only Teradata-to-Teradata migration (e.g., on-premises to the cloud), but also competitor-to-Teradata migrations as well. We especially love the latter ones!

Finally, can you share with us a bit of what is next for Teradata in the Cloud?

Wow, where should I start? We’re operating at breakneck pace. Seriously, we have many new cloud developments in the works right now, and we’ve been hiring cloud developers like crazy (hint: tell ‘em Brian sent you!).

You’ll see more cloud announcements from us this quarter, and without letting the cat out of the bag, expect advancements in the realm of automation, configuration assistance, and an expansion in managed offers.

Cloud is a key enabler to our ability to help customers get the most value from their data, so it’s definitely an exciting time to be involved in helping define the future of Teradata.
Thanks for your questions and interest!

Yep, I’m Writing a Book on Modern Data Management Platforms (2017-02 Update)


(Image courtesy of Thomas Skirde)
As I mentioned in a first blog post about the book, I'm now working hard to deliver a piece that will, hopefully, serve as a practical guide for the implementation of a successful modern data management platform.

I'll try to provide frequent updates and, perhaps, share some pains and gains about its development.
For now, here's some additional information, including the general outline and the type of audience it is intended for.

I invite you to be part of the process and leave your comments, observations and encouragement quotes right below, or better yet, to consider:
  • Participating in our Data Management Platforms survey (to obtain a nice discount right off the bat)
  • Pre-ordering the book; soon I’ll provide details on how to pre-order your copy, but in the meantime, you can show your interest by signing up to our pre-order list, or
  • Providing us with information about your own successful enterprise use case, which we may use in the book
Needless to say, the information you provide will be kept confidential and used only for the purpose of developing this book.
So here, take a look at the update...

New Data Management Platforms

Discovering Architecture Blueprints

About the Book

What Is This Book About?

This book is the result of a comprehensive study into the improvement, expansion, and modernization of different types of architectures, solutions, and platforms to address the need for better and more effective ways of dealing with increasing and more complex volumes of data.

In conducting his research for the book, the author has made every effort to analyze in detail a number of successful modern data management deployments as well as the different types of solutions proposed by software providers, with the aim of providing guidance and establishing practical blueprints for the adoption and/or modernization of existing data management platforms.
These new platforms have the capability of expanding the ability of enterprises to manage new data sources—from ingestion to exposure—more accurately and efficiently, and with increased speed.

The book is the result of extensive research conducted by the author examining a wide number of real-world, modern data management use cases and the plethora of software solutions offered by various software providers that have been deployed to address them. Taking a software vendor‒agnostic viewpoint, the book analyzes what companies in different business areas and industries have done to achieve success in this endeavor, and infers general architecture footprints that may be useful to those enterprises looking to deploy a new data management platform or improve an already existing one.

Who Is This Book For?

This book is intended for both business and technical professionals in the area of information technology (IT). These roles would include chief information officers (CIOs), chief technology officers (CTOs), chief financial officers (CFOs), data architects, and data management specialists interested in learning, evaluating, or implementing any of the plethora of new technologies at their disposal for modernizing their existing data management frameworks.

The book is also intended for students in the fields of computer sciences and informatics interested in learning about new trends and technologies for deploying data architecture platforms. It is not only relevant for those individuals considering pursuing a big data/data management‒related career, but also for those looking to enrich their analytics/data sciences skills with information about new platform technologies.
This book is also relevant for:

Main Post Image courtesy of Thomas Skirde 

Intelligent Automation for DevOps: An Interview with Rocana’s CTO & Co-Founder Eric Sammer

Recently, Rocana, a big data and analytics software company specializing in solutions that bring visibility to IT and DevOps teams, announced a new release of its data platform, Rocana Ops.

It is in this context that we had the chance to interview Eric Sammer, CTO and Co-Founder of Rocana, who kindly agreed to share insights about the company and its software offering, as well as details of the new version.

Eric has served as a Senior Engineer and Architect at several large-scale data-driven organizations, including Experian and Conductor. Most recently, he served as an Engineering Manager at Cloudera, where he was responsible for working with hundreds of partners to develop robust solutions and integrate them tightly with Cloudera's Enterprise Data Hub.

He is deeply entrenched in the open source community and has an appetite for solving difficult scaling and processing problems. Passionate about challenging assumptions and showing large, complex enterprises new ways to solve large, complex IT infrastructure challenges, Eric now leads Rocana's product development and company direction as CTO.

Eric is also the author of Hadoop Operations, published by O'Reilly Media, and a frequent speaker on technology and techniques for large-scale data processing, integration, and system management.

Hi Eric, so, what was the motivation behind founding Rocana, the company, and developing Rocana Ops the product?

Rocana was founded directly in response to the growing sophistication of the infrastructure and technology that run the modern business, and the challenges companies have in understanding those systems. Whether it's visibility into health and performance, investigating specific issues, or holistically understanding the impact infrastructure health and well-being have on the business, many businesses are struggling with the complexity of their environments.

These issues have been exacerbated by trends in cloud computing, hybrid environments, microservices, and data-driven products and features such as product recommendations, real-time inventory visibility, and customer account self-management that rely on data from, and about, the infrastructure and the business. There are a greater number of more varied data sources, producing finer-grained data faster than ever before.

Meanwhile, the existing solutions to understand and manage these environments are not keeping pace. All of them focus on interesting, but limited, slices of the problem - just log search, just dashboards of metrics, just the last 30 minutes of network flow data, only security events - making it almost impossible to understand what’s happening. These tools tend to think of each piece of infrastructure as a special case rather than the data warehousing and advanced analytics problem it is.

Outside of core IT, it’s natural to source feeds of data from many different places, cleanse and normalize that data, and bring it into a central governed repository where it can be analyzed, visualized, or used to augment other applications.

We want to extend that thinking into infrastructure, network, cloud, database, platform, and application management to better run the business, while at the same time opening up new opportunities to bring operational and business data together. That means all of the data, from every data source, in real time, with full retention, on an open platform, with advanced analytics to make sense of that data.

How would you describe what Rocana Ops is?

Rocana Ops is a data warehouse for event-oriented data. That includes log events, infrastructure and application metrics, business transactions, IoT events, security events, or anything else with a time stamp. It includes the collection, transformation and normalization, storage, query, analytics, visualization, and management of all event-oriented data in a single open system that scales horizontally on cost-effective hardware or cloud platforms.
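To make "event-oriented data" concrete, here is a minimal sketch of such a record in Python. The field names and shapes are illustrative assumptions of mine, not Rocana Ops' actual schema; the point is that a log line and a metric reading fit the same time-stamped shape.

```python
from dataclasses import dataclass, field
from time import time

@dataclass
class Event:
    """A generic event-oriented record: anything with a time stamp."""
    ts: float                 # epoch seconds when the event occurred
    source: str               # host, device, or application that produced it
    kind: str                 # e.g. "log", "metric", "netflow", "security"
    body: str                 # raw payload as collected
    attributes: dict = field(default_factory=dict)  # normalized key/values

# A log line and an infrastructure metric share the same shape, so they
# can be stored, queried, and correlated in one system:
log_evt = Event(ts=time(), source="web-01", kind="log",
                body="GET /checkout 500", attributes={"status": "500"})
cpu_evt = Event(ts=time(), source="web-01", kind="metric",
                body="cpu.user 87.5", attributes={"value": "87.5"})
```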

A normal deployment of Rocana Ops for our customers will take in anywhere from 10 to 100TB of new data every day, retaining it for years. Each event captured by the system is typically available for query in less than one second, and is always online and queryable thanks to a fully parallelized storage and query platform.

Rocana is placed in a very interesting segment of the IT industry. What are in your view, the differences between the common business analytics user and the IT user regarding the use of a data management and analytics solution? Different needs? Different mindsets? Goals?

I think the first thing to consider when talking about business analytics - meaning both custom-built and off-the-shelf BI suites - and IT focused solutions is that there has historically been very little cross-pollination of ideas between them. Business users tend to think about customized views on top of shared repositories, and building data pipelines to feed those repositories.

There tends to be a focus on reusing data assets and pipelines, lineage concerns, governance, and lifecycle management. IT users on the other hand, think about collection through analytics for each data source as a silo: network performance, application logs, host and process-level performance, and so on each have dedicated collection, storage, and analytics glued together in a tightly coupled package.

Unlike their business counterparts, IT users have very well known data sources and formats (relatively speaking) and analytics they want to perform. So in some ways, IT analytics have a more constrained problem space, but less integration. This is Conway’s Law in serious effect: the notion that software tends to mimic the organizational structures in which it’s developed or designed. These silos lead to target fixation.

IT users can wind up focusing on making sure the operating system is healthy, for example, while the business service it supports is unhealthy. Many tools tend to reinforce that kind of thinking. That extends to diagnostics and troubleshooting, which is even worse. Again, we're talking in generic terms here, but business users tend to have a holistic focus on an issue relevant to the business rather than on limited slices.

We want to open that visibility to the IT side of the house, and hopefully even bring those worlds together.

What are the major pains of IT Ops, and how does Rocana help solve them?

Ops is really a combination of both horizontal and vertically focused groups. Some teams are tasked with building and/or running a complete vertical service like an airline check-in and boarding pass management system. Other teams are focused on providing horizontal services such as data center infrastructure with limited knowledge or visibility into what those tens of thousands of boxes do.

Let’s say customers can’t check in and get their boarding passes on their mobile devices. The application ops team finds that a subset of application servers keep losing connections to the database servers holding reservations, but there’s no reason why, and nothing has changed. Meanwhile, the networking team may be swapping out some bad optics in a switch that has been flaky, thinking that traffic is being properly routed over another link. Connecting these two dots within a large organization can be maddeningly time consuming - if it even happens at all - leading to some of the high-profile outages we see in the news.

Our focus is really on providing a shared view over all systems under management. Each team still has their focused view on their part of the infrastructure in Rocana Ops, but in this example, the application ops team could also trace the failing connections through to link state changes on switches and correlate that with traffic changes in network flow patterns.
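The cross-team correlation in that example can be illustrated with a toy time-window join over two event streams. The field names and 30-second window below are assumptions for the sketch, not anything from Rocana's product:

```python
# Sketch of cross-team correlation: find switch link-state changes that
# occurred within `window` seconds of each application connection failure.
def correlate(app_failures, link_changes, window=30.0):
    hits = []
    for f in app_failures:
        for l in link_changes:
            if abs(f["ts"] - l["ts"]) <= window:
                hits.append((f, l))
    return hits

failures = [{"ts": 1000.0, "source": "app-07", "msg": "db connection lost"}]
changes  = [{"ts": 990.0, "source": "switch-3", "msg": "link down port 12"},
            {"ts": 500.0, "source": "switch-9", "msg": "link up port 4"}]

matches = correlate(failures, changes)  # only switch-3 falls in the window
```

At scale this would be a streaming join rather than a nested loop, but the dot-connecting logic is the same.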

Could you describe Rocana’s main architecture?

Following the data flow through Rocana Ops, data is first collected by one of the included data collection methods. These include native syslog, file and directory tailing, netflow and IPFIX, Windows event log, application and host metrics collection, and native APIs for popular programming languages, as well as REST.

As data is collected, basic parsing is performed, turning all data into semi-structured events that can be easily correlated regardless of their source. These events flow into an event data bus, forming a real-time stream of the cleansed, normalized events. All of the customer-configurable and extensible transformation, model building and application (for features like anomaly detection), complex event processing, triggering, alerting, and other data services are real-time, stream-oriented services.
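The "basic parsing" step can be pictured as turning a raw line into a semi-structured, correlatable event. The regex and field names below are illustrative, not Rocana's parsing pipeline or schema:

```python
import re
from typing import Optional

# Hypothetical parser: turn a raw syslog-style line into a semi-structured
# event dict that can be correlated with events from other sources.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) (?P<proc>[\w\-/]+): (?P<msg>.*)$"
)

def parse_syslog(line: str) -> Optional[dict]:
    m = LINE_RE.match(line)
    if not m:
        return None  # unparseable lines could be kept as raw events instead
    evt = m.groupdict()
    evt["kind"] = "log"
    return evt

evt = parse_syslog("Feb  3 14:02:11 web-01 sshd: Accepted publickey for deploy")
```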

Rocana's General Architecture (Courtesy of Rocana)

A number of representations of the data are stored in highly optimized data systems for natural-language search, query, analysis, and visualization in the Rocana Ops application. Under the hood, Rocana Ops is built on top of a number of popular open source systems, in open formats, that may be used for other applications and systems, making lock-in a non-issue for customers.

Every part of Rocana’s architecture - but notably the collection, processing, storage, and query systems - is a parallelized, scale-out system, with no single point of failure.

What are the basic or general requirements needed for a typical Rocana deployment?

Rocana Ops is really designed for large deployments as mentioned earlier - 10s to 100s of terabytes per day.

Typically customers start with a half-rack (10 nodes), each with 2 x 8+ core CPUs, 12 x 4 or 8TB SATA II drives, 128 to 256GB of RAM, and a 10Gb network (typical models are the HP DL380 G9 or Dell R730xd), or the cloud equivalent (Amazon d2.4xl or 8xl), for the data warehouse nodes.

A deployment this size easily handles in excess of a few terabytes per day of data coming into the system from tens to hundreds of thousands of sources.
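As a back-of-the-envelope illustration of that kind of sizing, using the drive counts quoted above as inputs. The replication factor and compression ratio here are my own illustrative assumptions, not Rocana's figures:

```python
# Rough capacity sizing: how much raw storage does an ingest rate imply,
# and how many nodes does that take at a given per-node drive capacity?
def raw_storage_tb(ingest_tb_per_day, retention_days,
                   replication=3, compression=0.5):
    # compression < 1 shrinks the data; replication multiplies it
    return ingest_tb_per_day * retention_days * replication * compression

# 3 TB/day retained for one year:
need = raw_storage_tb(3, 365)          # 1642.5 TB before headroom
per_node_tb = 12 * 8                   # 12 x 8TB SATA drives per node
nodes = -(-need // per_node_tb)        # ceiling division -> node count
```

With these assumptions the year of retention lands at 18 data nodes, which is why customers "begin adding nodes" as they onboard sources or extend retention.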

As customers onboard more data sources or want to retain more data, they begin adding nodes to the system. We have a stellar customer success team that helps customers plan, deploy, and service Rocana Ops, so customers don’t need to worry about finding “unicorn” staff.

What are then, the key functional differentiators of Rocana?

Customers pick Rocana for a few reasons: scale, openness, advanced data management features, and cost. We’ve talked a lot about scale already, but openness is equally critical.

Enterprises, frankly, are done with being locked into proprietary formats and vendors holding their data hostage. Once you’re collecting all of this data in one place, customers often want to use Rocana Ops to provide real time streams to other systems without going through expensive translations or extractions.

Another major draw is that other systems lack advanced data management features such as record-level role-based access control, data lifecycle management, encryption, and auditing facilities. When your log events potentially contain personally identifiable information (PII) or other sensitive data, this is critical.
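To show what record-level, role-based access control with PII masking can look like, here is a small sketch. The roles, event schema, and redaction regex are hypothetical, not Rocana Ops' actual security model:

```python
import re

# Illustrative record-level access control: a role sees an event only if
# permitted for its source type, and PII-like fields are masked for
# non-admin roles.
ROLE_SOURCES = {"netops": {"switch", "router"},
                "appops": {"app", "db"},
                "admin":  {"switch", "router", "app", "db"}}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def visible(event, role):
    if event["source_type"] not in ROLE_SOURCES.get(role, set()):
        return None                       # record-level denial
    body = event["body"]
    if role != "admin":
        body = EMAIL_RE.sub("<redacted>", body)  # mask PII-like tokens
    return {**event, "body": body}

evt = {"source_type": "app", "body": "login failed for jane@example.com"}
```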

Finally, operating at scale is both a technology and an economic issue. Rocana Ops’ licensing model is based on users rather than nodes or data captured by the system, freeing customers to think about how best to solve problems rather than perform license math.

Recently, you've released Rocana Ops 2.0, could you talk about these release’s new capabilities?

Rocana Ops 2.0 is really exciting for us.

We’ve added Rocana Reflex, which incorporates complex event processing and orchestration features, allowing customers to perform actions in response to patterns in the data. Actions can be almost anything you can think of, including REST API calls to services and sending alerts.
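A complex event processing rule of that general shape (when a pattern recurs within a time window, fire an action) can be sketched in a few lines. The rule API below is invented for illustration and is not Reflex's actual interface:

```python
from collections import deque

def make_rule(predicate, threshold, window, action):
    """Fire `action` when `threshold` matching events land within `window` seconds."""
    seen = deque()  # timestamps of recent matching events
    def feed(event):
        if not predicate(event):
            return
        seen.append(event["ts"])
        # drop timestamps that fell out of the sliding window
        while seen and event["ts"] - seen[0] > window:
            seen.popleft()
        if len(seen) >= threshold:
            action(list(seen))  # stand-in for a REST call or alert
            seen.clear()
    return feed

fired = []
feed = make_rule(lambda e: "timeout" in e["msg"], threshold=3,
                 window=60.0, action=fired.append)
for t in (0.0, 10.0, 20.0):
    feed({"ts": t, "msg": "db timeout"})
```

Three timeouts inside the window trigger the action once; in a real system the action would be an orchestration step rather than a list append.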

Reflex is paired with a first responder experience designed to help ops teams to quickly triage alerts and anomalies, understand potential causes, collaborate with one another, and spot patterns in the data.

One of the major challenges customers face in deploying dynamic next-generation platforms is operational support, so 2.0 includes first-class support for Pivotal CloudFoundry instrumentation and visibility. Those are just a small sample of what we’ve done. It’s really a huge release!

How does Rocana interact with the open source community, especially the Apache Hadoop project?

Open source is core to what we do at Rocana, and it’s one of the reasons we’re able to do a lot of what we do in Rocana Ops.

We’re committed to collaborating with the community whenever possible. We’ve open sourced parts of Rocana Ops where we believe there’s a benefit to the community (like Osso - A modern standard for event-oriented data). As we build with projects like Apache Hadoop, Kafka, Spark, Impala, and Lucene, we look closely at places where we can contribute features, insight, feedback, testing, and (most often) fixes.

The vast majority of our engineers, customer success, and sales engineers come from an open source background, so we know how to wear multiple hats.

Foremost is always our customers’ success, but it’s absolutely critical to help advance the community along where we are uniquely positioned to help. This is an exciting space for us, and I think you’ll see us doing some interesting work with the community in the future.

Finally, what is in your opinion the best and geekiest song ever?

Now you’re speaking my language; I studied music theory.
Lateralus by Tool, for the way it plays with the Fibonacci sequence and other math without being gimmicky or unnatural.
A close second goes to Aphex Twin’s Equation, but I won’t ruin that for you.

DrivenBI Helps Companies Drive Analytics to the Next Level

Privately held company DrivenBI was formed in 2006 by a group of seasoned experts and investors in the business intelligence (BI) market in Taiwan and the United States. Currently based in Pasadena, California, the company has been steadily growing in the ten years since, gaining more than 400 customers in both the English and Chinese markets.

Led by founder and CEO Ben Tai (previously VP of global services with the former BusinessObjects, now part of SAP), DrivenBI would be considered part of what I call a new generation of BI and analytics solutions that is changing the analytics market panorama, especially in the realm of cloud computing.

A couple of weeks ago, I had the opportunity to speak with DrivenBI’s team for a briefing and demonstration, mostly regarding their current analytics offerings, the company’s business strategy, and its industry perspective, all of which I will share with you here.

How DrivenBI Drives BI
DrivenBI’s portfolio is anchored by SRK, DrivenBI’s native cloud self-service BI platform and collaboration hub.

SRK provides a foundation for sourcing and collecting data in real time within a collaborative environment. Being a cloud platform, SRK can combine the benefits of a reduced IT footprint with a wide range of capabilities for efficient data management.

The SRK native cloud-centralized self-service BI solution offers many features, including:
  • the ability to blend and work with structured and unstructured data using industry-standard data formats and protocols;
  • a centralized control architecture providing security and data consistency across the platform;
  • a set of collaboration features to encourage team communication and speed decision making; and
  • agile reporting and well-established data processing logic.
SRK’s collaborative environment, featuring data and information sharing between users within a centralized setting, allows users to maintain control over every aspect and step of the BI and analytics process (figure 1).

Figure 1. DrivenBI’s SRK self-driven and collaborative platform (courtesy of DrivenBI)
DrivenBI: Driving Value throughout Industries, Lines of Business, and Business Roles

One important aspect of the philosophy embraced by DrivenBI has to do with its design approach, providing, within the same platform, valuable services across the multiple functional areas of an organization, including lines of business such as finance and marketing, inventory control, and resource management, as well as across industries such as fashion, gaming, e-commerce, and insurance.

Another element that makes DrivenBI an appealing offering is its strategic partnership with Microsoft Azure, which gives DrivenBI the ability to integrate with the powerhouse cloud offering.

I had the opportunity to play around a bit with DrivenBI’s platform, and I was impressed with the ease of use and intuitive experience in all stages of the data analytics process, especially for dynamic reporting and dashboard creation (figure 2).

Figure 2. DrivenBI’s SRK dashboard (courtesy of DrivenBI)
Other relevant benefits of the DrivenBI platform that I observed include:
  • elimination/automation of some heavy manual processes;
  • analysis and collaboration capabilities, particularly relevant for companies with organizational and geographically distributed operations, such as widespread locations, plants, and global customers;
  • support for multiple system data sources, including structured operational data, unstructured social media sources, and others.
As showcased in its business-centered approach and design, DrivenBI is one of a new generation of BI and analytics offerings that reduce the need for IT intervention in comparison to peer solutions like Domo, Tableau, and GoodData. These new-generation solutions are offered through cloud delivery, a method that seems to suit analytics and BI offerings and their holistic take on data collection well. As an alternative to expensive IT-centric BI tools, the DrivenBI cloud platform can replace or minimize the use of complex spreadsheets and difficult analytics processes.

DrivenBI’s Agile Analytics
My experience with DrivenBI was far more than “interesting.” DrivenBI is a BI software solution that is well designed and built, intuitive, and quick to learn. Its well-made architecture makes the solution easy to use and versatile. Its approach—no spreadsheets, no programming, no data warehouse—is well suited to those organizations that truly need agile analytics solutions. Still, I wonder how this approach fits with large BI deployments that require robust data services, especially in the realms of merging traditional analytics with big data and Internet of Things (IoT) strategies.

To sample what DrivenBI has to offer, I recommend checking out its SRK demo.

(Originally published on TEC's Blog)

Yep, I’m Writing a Book on Modern Data Management Platforms

Over the past couple of years, I have spent lots of time talking with vendors, users, consultants, and other analysts, as well as plenty of people from the data management community, about the wave of new technologies and continued efforts aimed at finding the best software solutions to address the increasing number of issues associated with managing enterprise data. In this way, I have gathered much insight on ways to exploit the potential value of enterprise data through efficient analysis for the purpose of “gathering important knowledge that informs better decisions.”

Many enterprises have had much success in deriving value from data analysis, but a more significant number of these efforts have failed to achieve much in the way of useful results. And yet other users are still struggling to find the right software solution for their business data analysis needs, perhaps confused by the myriad solutions emerging nearly every single day.

It is precisely in this context that I’ve decided to launch this new endeavor and write a book that offers a practical perspective on those new data platform deployments that have been successful, as well as practical use cases and plausible design blueprints for your organization or data management project. The information, insight, and guidance that I will provide is based on lessons I’ve learned through research projects and other efforts examining robust and solid data management platform solutions for many organizations.

In the following months, I will be working hard to deliver a book that serves as a practical guide for the implementation of a successful modern data management platform.
The resources for this project will require crowdfunding efforts, and here is where your collaboration will be extremely valuable.
There are several ways in which you can participate:

  • Participating in our Data Management Platforms survey (to obtain a nice discount right off the bat)
  • Pre-ordering the book (soon, I’ll provide you with details on how to pre-order your copy, but in the meantime, you can show your interest by signing up at the link below)
  • Providing us with information about your own successful enterprise use case, which we may use in the book

To let us know which of these options best fits with your spirit of collaboration, and to receive the latest updates on this book, as well as other interesting news, you just need to sign up to our email list here. Needless to say, the information you provide will be kept confidential and used only for the purpose of developing this book.

In the meantime, I’d like to leave you with a brief synopsis of the contents of this book, with more details to come in the near future:

New Data Management Platforms

Discovering Architecture Blueprints

About the Book

What Is This Book About?

This book is the result of a comprehensive study into the improvement, expansion, and modernization of different types of architectures, solutions, and platforms to address the need for better and more effective ways of dealing with increasing and more complex volumes of data.

In conducting his research for the book, the author has made every effort to analyze in detail a number of successful modern data management deployments as well as the different types of solutions proposed by software providers, with the aim of providing guidance and establishing practical blueprints for the adoption and/or modernization of existing data management platforms.
These new platforms have the capability of expanding the ability of enterprises to manage new data sources—from ingestion to exposure—more accurately and efficiently, and with increased speed.

The book is the result of extensive research conducted by the author examining a wide number of real-world, modern data management use cases and the plethora of software solutions offered by various software providers that have been deployed to address them. Taking a software vendor‒agnostic viewpoint, the book analyzes what companies in different business areas and industries have done to achieve success in this endeavor, and infers general architecture footprints that may be useful to those enterprises looking to deploy a new data management platform or improve an already existing one.

Who Is This Book For?

This book is intended for both business and technical professionals in the area of information technology (IT). These roles would include chief information officers (CIOs), chief technology officers (CTOs), chief financial officers (CFOs), data architects, and data management specialists interested in learning, evaluating, or implementing any of the plethora of new technologies at their disposal for modernizing their existing data management frameworks.

The book is also intended for students in the fields of computer sciences and informatics interested in learning about new trends and technologies for deploying data architecture platforms. It is not only relevant for those individuals considering pursuing a big data/data management‒related career, but also for those looking to enrich their analytics/data sciences skills with information about new platform technologies.
This book is also relevant for:

  • Professionals in the IT market who would like to enrich their knowledge and stay abreast of developments in information management.
  • Entrepreneurs who would like to launch a data management platform start-up or consultancy, enhancing their understanding of the market, learning about some start-up ideas and services for consultants, and gaining sample business proposals.
  • Executives looking to assess the value and opportunities of deploying and/or improving their data management platforms. 
  • Finally, the book can also be used by a general audience from both the IT and business areas to learn about the current data management landscape and technologies in order to acquire an informed opinion about how to use these technologies for deploying modern technology data management platforms. 

What Does This Book Cover? 

The book covers a wide variety of topics, from a general exploration of the data management landscape to a more detailed review of specific topics, including the following:

  • The evolution of data management
  • A comprehensive introduction to Big Data, NoSQL, and analytics databases 
  • The emergence of new technologies for faster data processing—such as in-memory databases, data streaming, and real-time technologies—and their role in the new data management landscape
  • The evolution of the data warehouse and its new role within modern data management solutions 
  • New approaches to data management, such as data lakes, enterprise data hubs, and alternative solutions 
  • A revision of the data integration issue—new components, approaches, and solutions 
  • A detailed review of real-world use cases, and a suggested approach to finding the right deployment blueprint 

How Is the Book Structured?

The book is divided into four comprehensive parts that offer a historical perspective and the groundwork for the development of data management platforms and associated concepts, along with an analysis of real-world cases of modern data management frameworks toward establishing potential deployment blueprints.

  • Part I. A brief history of diverse data management platform architectures, and how their evolution has set the stage for the emergence of new data management technologies. 
  • Part II. The need for and emergence of new data management technologies such as Big Data, NoSQL, data streaming, and real-time systems in reshaping existing data management infrastructures. 
  • Part III. An in-depth exploration of these new technologies and their interaction with existing technologies to reshape and create new data management infrastructures. 
  • Part IV. A study of real-world modern data management infrastructures, along with a proposal of a concrete and plausible blueprint. 

General Outline

The following is a general outline of the book:

<Table of Content>
Preface x 
Acknowledgment xi 
Prologue xii 
Introduction xiii 
Part I. Brief History of Data Management Platform Architectures 
          Chapter 1. The Never-Ending Need to Manage Data
          Chapter 2. The Evolution of Structured Data Repositories
          Chapter 3. The Evolution of Data Warehouse as the Main Data Management Platform
Part II. The Need for and Emergence of New Data Management Technologies 
          Chapter 4. Big Data: A Primer
          Chapter 5. NoSQL: A Primer
          Chapter 6. Need for Speed 1: The Emergence of In-Memory Technologies
          Chapter 7. Need for Speed 2: Events, Streams, and the Real-Time Paradigm
          Chapter 8. The Role of New Technologies in Reshaping the Analytics and Business Intelligence Space
Part III. New Data Management Platforms: A First Exploration 
          Chapter 9. The Data Warehouse, Expanded and Improved
          Chapter 10. Data Lakes: Concept and Approach
          Chapter 11. Data Hub: Concept and Approach
          Chapter 12. Data Lake vs. Data Hub: Key Differences and Considerations
          Chapter 13. Analysis of Alternative Solutions
          Chapter 14. Considerations on Data Ingestion, Integration, and Consolidation
Part IV. Studying Plausible New Data Management Platforms 
          Chapter 15. Methodology
          Chapter 16. Data Lakes
               Sub-Chapter 16.1. Analyzing three real-world use cases
               Sub-Chapter 16.2. Proposing a feasible blueprint
          Chapter 17. Data Hubs
               Sub-Chapter 17.1. Analyzing three real-world use cases
               Sub-Chapter 17.2. Proposing a feasible blueprint
          Chapter 18. Summary and Conclusions
Appendix A. The Cloud Factor: Data Management Platforms in the Cloud
Appendix B. Brief Intro into Analytics and Business Intelligence with Big Data
Appendix C. Brief Intro into Virtualization and Data Integration
Appendix D. Brief Intro into the Role of Data Governance in Big Data & Modern Data Management Strategies

About the Author 
Jorge Garcia is an industry analyst in the areas of business intelligence (BI) and data management. He’s currently a principal analyst with Technology Evaluation Centers (TEC).

His experience spans more than 25 years across all phases of application development, database, data warehouse (DWH), and analytics and BI solution design, including more than 15 years in project management, covering best practices and new technologies in the BI/DWH space.

Prior to joining TEC, he was a senior project manager and senior analyst developing BI, DWH, and data integration applications using solutions such as Oracle, SAP, Informatica, IBM, and Teradata, among others. Garcia also worked on projects related to the implementation of data management solutions for the private and public sectors, including banking, insurance, retail, and services.

A proud member of the Boulder BI Brain Trust, Garcia also makes frequent public speaking appearances, and is an educator and influencer on different topics related to data management.

When not busy researching, speaking, consulting, and mingling with people in this industry, Garcia finds solace as an avid reader, music lover, and soccer fan, as well as proud father "trying" to raise his three lovely kids while his wife tries to re-raise him.

Disrupting the data market: Interview with EXASOL’s CEO Aaron Auld

Processing data fast and efficiently has become a never-ending race. With companies' increasing need to consume data comes a corresponding "need for speed" in processing it, and consequently the emergence of a new generation of database software solutions built to fulfill this need for high-performance data processing.

These new database management systems incorporate novel technology to provide faster, more efficient access to and processing of large volumes of data.

EXASOL is one of these disruptive new database solutions. Headquartered in Nuremberg, Germany, with offices around the globe, EXASOL has worked hard to bring a fresh, new approach to the data analytics market by offering a world-class database solution.

In this interview, we took the opportunity to chat with EXASOL’s Aaron Auld about the company and its innovative database solution.

Aaron Auld is the Chief Executive Officer as well as the Chairman of the Board at EXASOL, positions he has held since July 2013. He was made a board member in 2009.

As CEO and Chairman, Aaron is responsible for the strategic direction and execution of the company, as well as growing the business internationally.

Aaron embarked on his career back in 1996 at MAN Technologie AG, where he worked on large industrial projects and M&A transactions in the aerospace sector. Subsequently, he worked for the law firm Eckner-Bähr & Colleagues in the field of corporate law.

After that, the native Brit joined Océ Printing Systems GmbH as legal counsel for sales, software, R&D and IT. He then moved to Océ Holding Germany and took over the global software business as head of corporate counsel. Aaron was also involved in the IPO (Prime Standard) of Primion Technology AG in a legal capacity, and led investment management and investor relations.

Aaron studied law at the Universities of Munich and St. Gallen. Passionate about nature, Aaron likes nothing more than to relax by walking or sailing and is interested in politics and history.

So, what is EXASOL and what is the story behind it?

EXASOL is a technology vendor that develops a high-performance in-memory analytic database that was built from the ground up to analyze large volumes of data extremely fast and with a high degree of flexibility.
The company was founded back in the early 2000s in Nuremberg, Germany, and went to market with the first version of the analytic database in 2008.

Now in its sixth generation, EXASOL continues to develop and market the in-memory analytic database, working with organizations across the globe to help them derive business insights from their data and drive their businesses forward.

How does the database work? Could you tell us some of the main features?

We have always focused on delivering an analytic database with ultra-fast, massively scalable analytic performance. The database combines in-memory, columnar storage, and massively parallel processing technologies to provide unrivaled performance, flexibility, and scalability.

The database is tuning-free and therefore helps to reduce the total cost of ownership while enabling users to solve analytical tasks instead of having to cope with technical limits and constraints.

With the recently-announced version 6, the database now offers a data virtualization and data integration framework which allows users to connect to more data sources than ever before.

Also, alongside out-of-the-box support for R, Lua, Python and Java, users can integrate the analytics programming language of their choice and use it for in-database analytics.

Especially today, speed of data processing is important. I’ve read EXASOL has taken some benchmarks in this regard. Could you tell us more about it?

One of the few truly independent sets of benchmark tests available is offered by the Transaction Processing Performance Council (TPC). A few years ago we decided to take part in the TPC-H benchmark, and ever since we have topped the tables not only in terms of performance (i.e., analytic speed) but also in terms of price/performance (i.e., cost aligned with speed) when analyzing data volumes ranging from 100GB right up to 100TB. No other database vendor comes close.
The information is available online here.

One of the features of EXASOL, if I’m not mistaken, is that it can be deployed on commodity hardware. How does EXASOL’s design guarantee optimal performance and reliability?

Offering flexible deployment models in terms of how businesses can benefit from EXASOL has always been important to us at EXASOL.

Years ago, the concept of the data warehouse appliance was talked about as the optimum deployment model, but in most cases it meant that vendors were forcing users to run their database on bespoke hardware that could not then be repurposed for any other task.  Things have changed since: while the appliance model is still offered, ours is and always has been one that uses commodity hardware.

Of course, users are free to download our software and install it on their own hardware too.
It all makes for a more open and transparent framework where there is no vendor lock-in, and for users that can only be a good thing.  What’s more, because the hardware and chip vendors are always innovating, when a new processor or server is released, users only stand to benefit, as they will see even faster performance when they run EXASOL on that new technology.
We recently discussed this in a promotional video for Intel.

Price point related, is it intended only for large organizations, what about medium and small ones with needs for fast data processing?

We work with organizations both large and small.  The common denominator is always that they have an issue with their data analytics or incumbent database technology and that they just cannot get answers to their analytic queries fast enough.

Price-wise, our analytic database is extremely competitively priced, and we allow organizations of all shapes and sizes to use our database software on terms that best fit their own requirements, be that via a perpetual license model, a subscription model, or a bring-your-own-license (BYOL) model – whether on-premises or in the cloud.

What would be a minimal configuration example? Server, user licensing etc.?

Users can get started today with the EXASOL Free Small Business Edition.  It is a single-node edition of the database software, and users can pin up to 200GB of data in RAM.

Given that we advocate a 1:10 ratio of RAM to raw data volume, this means that users can put 2TB of raw data into their EXASOL database instance and still get unrivaled analytic performance on their data – all for free. There are no limitations in terms of users.
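As a quick back-of-the-envelope check, the 1:10 ratio mentioned above can be expressed in a few lines of Python (the helper name here is ours, purely illustrative, and not part of any EXASOL tooling):

```python
# Hypothetical sizing helper illustrating the 1:10 RAM-to-raw-data
# ratio cited in the interview; not part of any EXASOL API.

def raw_data_capacity_gb(ram_gb, ratio=10):
    """Raw data volume (GB) a given amount of RAM can serve at 1:ratio."""
    return ram_gb * ratio

# The Free Small Business Edition allows pinning up to 200GB in RAM,
# which at a 1:10 ratio corresponds to 2TB (2000GB) of raw data.
print(raw_data_capacity_gb(200))  # → 2000
```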

We believe this is a very compelling advantage for businesses that want to get started with EXASOL.

Later, when data volumes grow and when businesses want to make use of advanced features such as in-database analytics or data virtualization, users can then upgrade to the EXASOL Enterprise Cluster Edition which offers much more in terms of functionality.

Regarding big data requirements, could you tell us some of the possibilities to integrate or connect EXASOL with big data sources/repositories such as Hadoop and others?

EXASOL can be easily integrated into any IT infrastructure.  It is SQL-compliant, is compatible with leading BI and ETL products such as Tableau, MicroStrategy, Birst, IBM Cognos, SAP BusinessObjects, Alteryx, Informatica, Talend, Looker, and Pentaho, and provides the most flexible Hadoop connector on the market.

Furthermore, through an extensive data virtualization and integration framework, users can now analyze data from more sources more easily and faster than ever before.

Recently, the company announced that EXASOL is now available on Amazon. Could you tell us a bit more about the news? EXASOL is also available on Azure, right?

As more and more organizations are deploying applications and their systems in the cloud, it’s therefore important that we can allow them to use EXASOL in the cloud, too.  As a result, we are now available on Amazon Web Services as well as Microsoft Azure.  What’s more, we continue to offer our own cloud and hosting environment, which we call EXACloud.

Finally, on a more personal topic. Being a Scot who lives in Germany, would you go for a German beer or a Scottish whisky?

That’s an easy one.  First enjoy a nice German beer (ideally, one from a Munich brewery) before dinner, then round the evening off by savoring a nice Scottish whisky.  The best of both worlds.

Logging challenges for containerized applications: Interview with Eduardo Silva

Next week, another edition of the CloudNativeCon conference will take place in the great city of Seattle. One of the key topics in this edition is containers, a software technology that is enabling and easing the development and deployment of applications by encapsulating them so they can be deployed through a single, simple process.

In this installment, we took the opportunity to chat with Eduardo Silva a bit about containers and his upcoming session, Logging for Containers, which will take place during the conference.

Eduardo Silva is a principal open source developer at Treasure Data Inc., where he currently leads the efforts to make the logging ecosystem more friendly in embedded, container, and cloud environments.

He also directs the Monkey Project organization which is behind the Open Source projects Monkey HTTP Server and Duda I/O.

A well-known speaker, Eduardo has spoken at events across South America and at recent Linux Foundation events in the US, Asia, and Europe.

Thanks so much for your time Eduardo!

What is a container and how is it applied specifically in Linux?

When deploying applications, it is always desirable to have full control over the given resources, and ideally we want the application to be as isolated as possible. Containers are the concept of packaging an application with its entire runtime environment in an isolated way.
To accomplish this at the operating-system level, Linux provides us with two features that make containers possible: cgroups and namespaces.

  • cgroups (control groups) allow us to limit the resource usage of one or more processes, so you can define how much CPU or memory a program may use when running.
  • namespaces (associated with users and groups), on the other hand, allow us to define restricted access to specific resources such as mount points, network devices, and IPC, among others.

In short, if you like programming, you can implement your own containers with a few system calls. Since this can be tedious from an operability perspective, there are libraries and services that abstract away the details and let you focus on what really matters: deployment and monitoring.
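To make the "few system calls" remark concrete, here is a minimal, illustrative Python sketch (our own, not from the interview) that calls the Linux unshare(2) syscall through ctypes to request a new UTS (hostname) namespace. The flag values come from <linux/sched.h>; actually entering a namespace requires Linux and sufficient privileges, so the call is guarded:

```python
import ctypes
import os
import sys

# Namespace flag constants from <linux/sched.h>
CLONE_NEWNS  = 0x00020000  # mount points
CLONE_NEWUTS = 0x04000000  # hostname and domain name
CLONE_NEWIPC = 0x08000000  # System V IPC
CLONE_NEWPID = 0x20000000  # process IDs
CLONE_NEWNET = 0x40000000  # network devices

def enter_new_uts_namespace():
    """Try to move this process into a new UTS namespace.

    Returns True on success, False otherwise (typically EPERM when
    the process lacks CAP_SYS_ADMIN).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    return libc.unshare(CLONE_NEWUTS) == 0

if __name__ == "__main__" and sys.platform == "linux" and os.geteuid() == 0:
    # As root on Linux, hostname changes made after this call are
    # invisible to the rest of the system -- one small building block
    # of what container runtimes do for you.
    print(enter_new_uts_namespace())
```

Real runtimes combine several such namespaces with cgroup limits; this sketch only shows the shape of the underlying syscall interface.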

So, what is the difference between a Linux Container and, for example a virtual machine?

A container aims to be a granular unit of an application and its dependencies; it's one process or a group of processes. A virtual machine runs a whole operating system, which, as you might guess, is a bit heavier.

So, if we ought to define some advantages of containers versus virtualization, could you tell us a couple of advantages and disadvantages of both?

There are many differences, pros and cons. But taking into account our cloud world, when you need to deploy applications at scale (and many times just on demand), containers are the best choice: deploying a container takes just a small fraction of a second, while deploying a virtual machine may take a few seconds and a bunch of resources that will most likely be wasted.

Due to the opportunities it brings, there are some container projects and solutions out there such as LXC, LXD or LXCFS. Could you share with us what is the difference between them? Do you have one you consider your main choice and why?

Having the technology to implement containers is the first step, but as I said before, not everybody wants to play with system calls; instead, different technologies exist to create and manage containers. LXC and LXD provide the next level of abstraction for managing containers, while LXCFS is a user-space file system for containers (it works on top of FUSE).
Since I don't play with containers at a low level, I don't have a strong preference.

And what about solutions such as Docker, CoreOS or Vagrant? Any take on them?

Docker is the big player nowadays; it provides good security and mechanisms to manage and deploy containers. CoreOS has a promising container engine called Rocket (rkt); I have not used it, but it looks well designed and implemented, and orchestration services like Kubernetes are already providing support for it.

You are also working on a quite interesting project called Fluent-Bit. What is the project about?

I will give you a bit of context. I'm part of the open source engineering team at Treasure Data; our primary focus in the team is to solve data collection and data delivery for a wide range of use cases and integrations. To accomplish this, Fluentd exists. It's a very successful project which nowadays is solving logging challenges in hundreds of thousands of systems, and we are very proud of it.
A year ago we decided to dig into the embedded Linux space, and as you might know, the capacity of these devices in terms of CPU, memory, and storage is usually more restricted than that of a common server machine.
Fluentd is really good, but it also has its technical requirements: it's written in a mix of Ruby and C, and having Ruby on most embedded Linux systems could be a real challenge or a blocker. That's why a new solution was born: Fluent Bit.
Fluent Bit is a data collector and log shipper written 100% in C. It has a strong focus on Linux, but it also works on BSD-based systems, including OSX/macOS. Its architecture has been designed to be very lightweight and to provide high performance from collection to distribution.
Some of its features are:

  • Input / Output plugins
  • Event driven (async I/O operations)
  • Built-in Metrics
  • Security: SSL/TLS
  • Routing
  • Buffering
  • Fluentd Integration

Despite being initially conceived for embedded Linux, it has evolved, gaining features that make it cloud friendly without sacrificing its performance and lightweight goals.
If you are interested in collecting data and delivering it somewhere, Fluent Bit allows you to do that through built-in plugins, some of which are:

  • Input
    • Forward: a protocol on top of TCP to get data from Fluentd or Docker containers.
    • Head: reads the initial chunks of bytes from a file.
    • Health: checks whether a remote TCP server is healthy.
    • kmsg: reads kernel log messages.
    • CPU: collects CPU usage metrics, globally and per core.
    • Mem: memory usage of the system or of a specific running process.
    • TCP: expects JSON messages over TCP.
  • Output
    • Elasticsearch database
    • Treasure Data (our cloud analytics platform)
    • NATS Messaging Server
    • HTTP end-point

So as you can see, with Fluent Bit it would be easy to aggregate Docker logs into Elasticsearch, monitor your current OS resource usage, or collect JSON data over the network (TCP) and send it to your own HTTP endpoint.
The use cases are many, and this is a very exciting tool, not just from an end-user perspective but also from a technical implementation point of view.
The project is moving forward pretty quickly and getting exceptional new features, such as support for writing your own plugins in Golang (yes, C -> Go). Isn't that neat?
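As a sketch of one such pipeline, a minimal Fluent Bit configuration in the classic INI-style syntax might look like the following; it collects CPU metrics with the built-in cpu input and ships them to an HTTP endpoint (the host, port, and URI below are placeholders, not real endpoints):

```ini
[SERVICE]
    Flush        5

[INPUT]
    Name         cpu
    Tag          metrics.cpu

[OUTPUT]
    Name         http
    Match        metrics.*
    Host         logs.example.com
    Port         8080
    URI          /ingest
```

Running fluent-bit with a file like this would, roughly, poll CPU usage every flush interval and POST the records to the configured endpoint; swapping the output section for the Elasticsearch or Forward plugin retargets the same input at a different destination.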

You will be presenting at CNCF event CloudNativeCon & KubeCon in November. Can you share with us a bit of what you will be presenting about in your session?

I will share our experience with logging in critical environments and dig into common pains and best practices that can be applied to different scenarios.
It will cover everything about logging in the scope of (but not limited to) containers, microservices, distributed logging, aggregation patterns, Kubernetes, and open source solutions for logging, along with demos.
I'd say that everyone who's a sysadmin, devops engineer, or developer will definitely benefit from the content of this session; logging is needed everywhere.

Finally, on a personal note. Which do you consider to be the geekiest songs of this century?

That's a difficult question!
I am not an expert on geek music, but I would vouch for "Spybreak!" by Propellerheads (from The Matrix).

Teradata Partners Conference 2016: Teradata Everywhere

Our technologized society is becoming opaque.
As technology becomes more ubiquitous and our relationship with digital devices ever
more seamless, our technical infrastructure seems to be increasingly intangible.
- Honor Harger

An idea that I could sense was in the air during my last meeting with Teradata’s crew in California, at their most recent influencer event, was confirmed and reaffirmed a couple of weeks ago during Teradata’s big partner conference: Teradata is now in full-fledged transformational mode.

Of course, for companies like Teradata that are used to being on the front line of the software industry, particularly in the data management space, transformation has now become much more than a “nice to do”. These days it’s pretty much the life breath of any organization at the top of the software food chain.

These companies have the complicated mandate to, if they want to stay at the top, be fast and smart enough to provide the software, the method, and the means to enable customers to gain technology and business improvements and the value that results from these changes.

And while it seems Teradata has taken its time with this transformation, it is also evident that the company is taking it very seriously. Will this be enough to keep pace with peer vendors within a very active, competitive, and transformational market? Well, it’s hard to say, but certainly with a number of defined steps, Teradata looks like it will be able to meet its goal of remaining a key player in the data management and analytics industry.

Here we take an up-to-date look at Teradata’s business and technology strategy, including its flexible approach to deployment and ability for consistent and coherent analytics over all types of deployment, platforms, and sources of data; and then explore what the changes mean for the company and its current and future customers.

The Sentient Enterprise
As explained in detail in a previous installment, Teradata has developed a new approach towards the adoption of analytics, called the “sentient enterprise.” This approach aims to guide companies to:

  • improve their data agility
  • adopt a behavioral data platform
  • adopt an analytical application platform
  • adopt an autonomous decision platform

While we won’t give a full explanation of the model here (see the video below or my recent article on Teradata for a fuller description of the approach), there is no doubt that this is a crucial pillar for Teradata’s transformational process, as it forms the backbone of Teradata‘s approach to analytics and data management.

Teradata Video: The Sentient Enterprise

As mentioned in the previous post, one aspect of the “sentient enterprise” approach from Teradata that I particularly like is the “methodology before technology” aspect, which focuses on scoping the business problem, then selecting the right analytics methodology, and at the end choosing the right tools and technology (including tools such as automatic creation models and scoring datasets).

Teradata Everywhere
Another core element of the new Teradata approach consists of spreading its database offering wide, i.e., making it available everywhere, especially in the cloud. This movement involves putting Teradata’s powerful analytics to work. Teradata Database will now be available in different delivery modes and via different providers, including on:

  • Amazon Web Services—Teradata Database will be available in a massively parallel processing (MPP) configuration, scalable up to 32 nodes, including services such as node failure recovery and backup, as well as restoring and querying data in Amazon’s Simple Storage Service (S3). The system will be available in more than ten geographic regions.
  • Microsoft’s Azure—Teradata Database is expected to be available by Q4 of 2016 in the Microsoft Azure Marketplace. It will be offered with MPP (massively parallel processing) features and scalability for up to 32 nodes.
  • VMware—via the Teradata Virtual Machine Edition (TVME), users have the option of deploying a virtual machine edition of Teradata Database for virtual environments and infrastructures.
  • Teradata Database as a Service—Extended availability for the Teradata Database will be available to customers in Europe through a data center hosted in Germany.

  • Teradata’s own on-premises IntelliFlex platform.

Availability of Teradata Database on different platforms

Borderless Analytics and Hybrid Clouds
The third element in the new Teradata Database picture involves a comprehensive provision of analytics despite the delivery mode chosen, an offering which fits the reality of many organizations—a hybrid environment consisting of both on-premises and cloud offerings.

With a strategy called Borderless Analytics, Teradata allows customers to deploy comprehensive analytics solutions within a single analytics framework. Enabled by Teradata solutions such as QueryGrid, its multi-source SQL and processing engine, and Unity, its orchestration engine for Teradata multi-system environments, this strategy proposes a way to perform consistent and coherent analytics over heterogeneous platforms with multiple systems and sources of data, i.e., in the cloud, on-premises, or in virtual environments.

At the same time, this also serves Teradata as a way to set the basis for its larger strategy for addressing the Internet of Things (IoT) market. Teradata is addressing this goal with the release of a set of new offerings called Analytics of Things Accelerators (AoTAs), comprised of technology-agnostic intellectual property that emerged from Teradata’s real-life IoT project engagements.

These accelerators can help organizations determine which IoT analytical techniques and sensors to use and trust. Because the AoTAs are designed for enterprise readiness, companies can deploy them at enterprise scale without going through time-consuming experimentation phases to ensure the right analytical techniques have been used. Teradata’s AoTAs accelerate adoption, enabling deployment cost reduction and ensuring reliability. This is a noteworthy effort to provide IoT projects with an effective enterprise analytics approach.

What Does this Mean for Current and Potential Teradata Customers?
Teradata seems to have a concrete, practical, and well-thought-out strategy regarding the delivery of new generation solutions for analytics, focusing on giving omnipresence, agility, and versatility to its analytics offerings, and providing less product dependency and more business focus to its product stack.

But one thing Teradata needs to consider, given the increasing number of solutions in its portfolio, is to provide clarity and guidance to customers regarding which blend of solutions to choose. This is especially true when the choice involves increasingly sophisticated big data solutions, a market that is maturing but is certainly still difficult to navigate, especially for those new to big data.

Teradata’s relatively new leadership team seems to have sensed right away that the company is currently in a very crucial position not only within itself but also within the industry of providing insights. If its strategy works, Teradata might be able to not only maintain its dominance in this arena but also increase its footprint in an industry destined to expand with the advent of the Internet of Things.

For Teradata’s existing customer base, these moves could be encouraging, as they could mean being able to expand the company’s existing analytics platforms using a single platform, and therefore without friction and with cost savings.

For those considering Teradata as a new option, it means having even more options for deploying end-to-end data management solutions using a single vendor rather than having a “best of breed” approach. Either way though, Teradata is pushing towards the future with a new and comprehensive approach to data management and analytics in an effort to remain a key player in this fierce market.

The question is if Teradata’s strategic moves will resonate effectively within the enterprise market to compete with the existing software monsters such as Oracle, Microsoft, and SAP.

Are you a Teradata user? If so, let us know what you think in the comments section below.

(Originally published on TEC's Blog)
Salesforce Acquires BeyondCore to Enable Analytics . . . and More

In October of 2014, Salesforce announced the launch of Salesforce Wave, the cloud-based company’s analytics cloud platform. By that time, Salesforce had already realized that to be able to compete with the powerful incumbents in the business software arena—the Oracles, SAPs and IBMs of the world—arriving to the cloud at full swing would require it to expand its offerings to the business
IT Sapiens, for Those Who Are Not

Perhaps one of the most refreshing moments in my analyst life is when I get the chance to witness the emergence of new tech companies—innovating and helping small and big organizations alike to solve their problems with data. This is exactly the case with Latvia-based IT Sapiens, an up-and-coming company focused on helping those small or budget-minded companies to solve their basic yet crucial
Influencer Summit 2016—Teradata Reshapes Itself with Analytics and the Cloud

For anyone with even a small amount of understanding regarding current trends in the software industry it will come as no surprise that the great majority of enterprise software companies are focusing on the incorporation of analytics, big data, cloud adoption, and especially the Internet of Things into their software solutions. In fact, these capabilities have become so ubiquitous that for
Zyme: Emergence and Evolution of Channel Data Management Software

Previous to the official launch of the new version of Zyme’s solution, I had the opportunity to chat and be briefed by Ashish Shete, VP of Products and Engineering at Zyme, in regard to version 3.0 of what Zyme describes as its channel data management (CDM) solution platform. This conversation was noteworthy from both the software product and industry perspectives. In particular, the solution
An Interview with Dataiku’s CEO: Florian Douetteau

As an increasing number of organizations look for ways to take their analytics platforms to higher ground, many of them are seriously considering the incorporation of new advanced analytics disciplines; this includes hiring data science specialists and adopting solutions that can enable the delivery of improved data analysis and insights. As a consequence, new companies and offerings are emerging in this area.

Dataiku is one of this new breed of companies. With its Data Science Studio (DSS) solution, Dataiku aims to offer a full data science solution for both experienced and inexperienced data science users.

On this occasion I had the chance to interview Florian Douetteau, Dataiku’s CEO, and pick up some of his thoughts and interesting views regarding the data management industry and, of course, his company and software solution.
A brief Bio of Florian 

In 2000, at age 20, he dropped out of the math courses at the prestigious École Normale Supérieure and decided to look for the largest dataset he could find, and the hardest related problem he could solve.

That’s how he started working at Exalead, a search engine company that at the time was developing technologies in web mining, search, natural language processing (NLP), and distributed computing. At Exalead, Florian rose to be VP of Product and R&D. He stayed at the company until it was acquired in 2010 by Dassault Systèmes for $150M (a pretty large amount by French standards).

Still in 2010, when the data deluge was pouring into new seas, Florian worked in the social gaming and online advertising industry, where machine learning was already being applied to petabytes of data. Between 2010 and 2013 he held several positions as consultant and CTO.

In 2013, Florian, along with three other co-founders, created Dataiku with the goal of making advanced data technologies accessible to companies that are not digital giants. Since then, one of Florian’s main goals as CEO of Dataiku has been to democratize access to data science.

So, you can watch the video or listen to the podcast in which Florian shares with us some of his views on the fast evolution of data science, analytics, big data and of course, his data science software solution.

 Of course, please feel free to let us know your comments and questions.
Altiscale Delivers Improved Insight and Hindsight to Its Data Cloud Portfolio

Logo courtesy of Altiscale

Let me just say right off the bat that I consider Altiscale to be a really nice alternative to big data service providers such as Hortonworks, Cloudera, or MapR. The Palo Alto, California–based company offers a full big data platform based in the cloud via the Altiscale Data Cloud offering. In my view, Altiscale has dramatically increased the appeal of
Hortonworks’s New Vision for Connected Data Platforms

Courtesy of Hortonworks
On March 1, I had the opportunity to attend this year’s Hortonworks Analyst Summit in San Francisco, where Hortonworks announced several product enhancements and new versions and a new definition for its strategy going forward.

Hortonworks seems to be making a serious attempt to take over the data management space while maintaining its commitment to open source, and especially to the Apache Foundation. As Hortonworks keeps gaining momentum, it is also consolidating its corporate strategy and bringing a new balance to its message, combining both technology and business.

By reinforcing alliances and at the same time moving further into the business mainstream with more concise messaging around enterprise readiness, Hortonworks is declaring itself ready to win the battle for the big data management space.

The big question is whether the company’s strategy will be effective enough to succeed at this goal, especially in a market that is already overcrowded and fiercely defended by big software providers.

Digesting Hortonworks’s Announcements
The announcements at the Hortonworks Analyst Summit included news on both the product and partner fronts. With regard to products, Hortonworks announced new versions of both its Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF).

HDP—New Release, New Cycle
Alongside specific features to improve performance and reinforce ease of use, the latest release, HDP 2.4 (figure 1), includes Spark 1.6, the latest generation of Apache’s large-scale data processing framework, along with Ambari 2.2, the Apache project for making Hadoop management easier and more efficient.

The inclusion of Ambari seems to be key to providing a solid, centralized management and monitoring tool for Hadoop clusters.

Figure 1. Hortonworks emphasizes enterprise readiness for its HDP version
(Image courtesy of Hortonworks)

Another key announcement was a new release cycle for HDP, which aims to provide users with a consistent product built on a stable core. Under the new cycle, core HDP services such as HDFS, YARN, and MapReduce, as well as Apache Zookeeper, will be released yearly, aligned with the compatible Apache Hadoop version in the “ODPi Core,” currently 2.7.1. This standardization ensures a stable software base for mission-critical workloads.

On the flip side, the extended services that run on top of the Hadoop core, including Spark, Hive, HBase, Ambari, and others, will be released continually throughout the year to keep these projects up to date.

Last but not least, HDP’s new version also comes with the new SmartSense 1.2, Hortonworks’s issue-resolution application, featuring automatic scheduling and uploading as well as over 250 new recommendations and guidelines.


Growing NiFi to an Enterprise Level
Along with HDP, Hortonworks also announced version 1.2 of HDF, Hortonworks’s offering for managing data in motion by collecting, manipulating, and curating data in real time. The new version includes new streaming analytics capabilities for Apache NiFi, which powers HDF at its core, and support for Apache Storm and Apache Kafka (figure 2).

Another noteworthy feature coming to HDF is its support for integration with Kerberos, which will enable and ease centralized authentication management across the platform and other applications. According to Hortonworks, HDF 1.2 will be available to customers in Q1 of 2016.

Figure 2. Improved security and control added to Hortonworks new HDF version
(Image courtesy of Hortonworks)

Hortonworks Adds New Partners to its List
The third announcement from Hortonworks at the conference was a partnership with Hewlett Packard Labs, the central research organization of Hewlett Packard Enterprise (HPE).

The collaboration is mainly a joint effort to enhance the performance and capabilities of Apache Spark. According to Hortonworks and HPE, it will focus on the development and analysis of a new class of analytic workloads that benefit from using large pools of shared memory.

Says Scott Gnau, Hortonworks’s chief technology officer, with regard to the collaboration agreement:

This collaboration indicates our mutual support of and commitment to the growing Spark community and its solutions. We will continue to focus on the integration of Spark into broad data architectures supported by Apache YARN as well as enhancements for performance and functionality and better access points for applications like Apache Zeppelin.

According to both companies, this collaboration has already generated interesting results, including more efficient memory usage, faster sorting, and improved in-memory computation, all of which boost Spark’s performance.

The results of these collaborations will be contributed back to the Apache Spark community as new technology, with beneficial impacts for this important piece of the Apache Hadoop ecosystem.

Commenting on the new collaborations, Martin Fink, executive vice president and chief technology officer of HPE and board member of Hortonworks, said:

We’re hoping to enable the Spark community to derive insight more rapidly from much larger data sets without having to change a single line of code. We’re very pleased to be able to work with Hortonworks to broaden the range of challenges that Spark can address.

Additionally, Hortonworks signed a partnership with Impetus Technologies, Inc., another solution provider built on open source technology. The agreement includes collaboration around StreamAnalytix™, an application that provides tools for rapid, low-code development of real-time analytics applications using Storm and Spark. Both companies intend for HDF and StreamAnalytix together to give companies a complete and stable platform for the efficient development and delivery of real-time analytics applications.

But The Real News Is …
Hortonworks is rapidly evolving its vision of data management and integration, and this, in my opinion, was the biggest news of the analyst event. Hortonworks’s strategy is to integrate the management of data at rest (data residing in HDP) and data in motion (data HDF collects and curates in real time), since managing both can power actionable intelligence. It is in this context that Hortonworks is working to increase integration between the two platforms.

Hortonworks is now taking a new go-to-market approach to increase the quality and enterprise readiness of its platforms. Along with ensuring that ease of use does not become a barrier to end-user adoption, its marketing message is changing. The Hadoop-based company now sees the need to take a step further and convince businesses that open source does more than just do the job; it is in fact becoming the quintessential tool for any important data management initiative, and, of course, that Hortonworks is the best vendor for the job. Along these lines, Hortonworks is taking steps to provide Spark with enterprise-ready governance, security, and operations to ensure readiness for rapid enterprise integration, to be gained through the inclusion of Apache Ambari and other Apache projects.

One additional yet important aspect of this strategy has to do with Hortonworks’s work on enterprise readiness, especially issue tracking (figure 3), monitoring for mission-critical workloads, and security reinforcement.

Figure 3. SmartSense 1.2 includes more than 250 recommendations
(Image courtesy of Hortonworks)

It will be interesting to see how this new strategy works for Hortonworks, especially in a big data market where competition is extremely fierce and many other vendors, including important partners of Hortonworks, are pushing hard to get a piece of the pie.

Taking its data management strategy to a new level is indeed bringing many opportunities for Hortonworks, but these do not come without challenges as the company moves into the bigger enterprise footprint of the data management industry.

What do you think about Hortonworks’s new strategy in data management? If you have any comments, please drop me a line below and I’ll respond as soon as I can.

(Originally published)
Creating a Global Dashboard. The GDELT Project


There is probably no bigger dream for a data geek like me than creating the ultimate data dashboard or scorecard of the world: one that summarizes and enables the analysis of all the data in the world. Well, for those of you who have also dreamt about this, Kalev H. Leetaru, a senior fellow at the George Washington University Center for Cyber & Homeland Security, has tapped into your
Dell Toad’s Big Jump into the BI and Analytics Market


Having a background in software and database development and design, I have a special nostalgia and appreciation for Toad’s set of database solutions, as in my past working life I was a regular user of these and other tools for database development. Of course, Toad’s applications have grown and expanded over the years and now cover the areas within data management that are key to many
TIBCO Spotfire Aims for a TERRific Approach to R


terrific /təˈrɪfɪk/ adjective 1. very great or intense: a terrific noise 2. (informal) very good; excellent: a terrific singer (The British Dictionary)

R is quickly becoming the most important letter in the world of analytics. The open source environment for statistical computing is now at the center of major strategies within many software companies. R is here to stay. As mentioned
Microsoft and the Revolution… Analytics


You say you want a revolution
Well, you know
We all want to change the world
You tell me that it's evolution
Well, you know
We all want to change the world
(Revolution, Lennon & McCartney)

With a recent announcement, Microsoft took another step toward what is now a clear internal and external revolution regarding the future of the company.

By announcing the acquisition of Revolution Analytics, a company that in just a few years has become a leading provider of predictive analytics solutions, Microsoft looks not just to strengthen its already wide analytics portfolio but perhaps also to increase its presence in the open source and data science communities, the latter having huge future potential. An interesting move, no doubt, but… Was this acquisition one that Microsoft needed to boost its analytics strategy against its biggest competitors? Will this move really give Microsoft’s revolution a better entrance into the open source space, especially within the data science community? Is Microsoft ready for open source, and vice versa?

The Appeal of Revolution Analytics
Without a doubt, Revolution Analytics is quite an interesting company. Founded less than 10 years ago (in 2007), it has become one of the most representative providers of predictive analytics software in the market. The formula has been, if not easy to achieve, simple and practical: Revolution R has been built on top of the increasingly popular programming language ‘R’.

As a programming language, R is designed especially for the development of statistical and predictive analytics applications. Because this is a language that emerged from the trenches of academia and because of its open source nature, it has grown and expanded to the business market along with a vibrant community which develops and maintains its Comprehensive R Archive Network (CRAN), R’s wide library of functions.

Revolution Analytics had the apparently simple yet clever strategy of developing and enhancing its analytics platform on top of R in order to offer a debugged, commercial-ready R offering. It has also been clever in offering different flavors of software, ranging from a free version to an enterprise-ready one.

At the same time, Revolution Analytics has maintained its close relationship with both the R and open source communities and has developed a wide range of partnerships with important vendors such as Teradata, HP, IBM, and many others, increasing its market presence, adoption, and continuing technical development.

At first glance, of course, Revolution Analytics is quite an interesting bet, not just for Microsoft but for many other software providers eager to step big into the predictive analytics arena. But…

Not so fast, Microsoft… Was it a good idea?
In an article published recently on Forbes, Dan Woods states that Microsoft’s acquisition of Revolution Analytics is the wrong way to embrace R. He explains that the acquisition represents a step forward for the R language but will limit what R could bring to Microsoft’s own business. According to Mr. Woods:

It is vital to remember that R is not a piece of software created by software engineers. Like much of the open source world, R was created by those who wanted to use it – statisticians and data scientists. As a result, the architecture of the implementation has weaknesses that show up at scale and in other inconvenient ways. Fixing this architecture requires a major rewrite.


While Microsoft will be able to make its Hadoop offering on Azure better with what Revolution has done, the open source model will inhibit the wider deployment of R throughout the rest of the Microsoft ecosystem.

Both points are absolutely valid, especially considering how the open source code would need to be accommodated within the Microsoft analytics portfolio. However, I would not be surprised if Microsoft had already taken this into account, treating R on Azure as a short-term priority and the immersion of R into the rest of the portfolio as a medium-term one, considering that it has acquired not just the software but the expertise of the Revolution Analytics team. It will then be important to maintain team cohesion to pursue these major changes.

Another interesting aspect is Mr. Woods’s comparison of Microsoft’s acquisition with TIBCO’s approach: TIBCO took the radical posture of re-implementing R to make it suitable for high-performance tasks and highly compatible with its complete set of analytics offerings, thus creating TERR.

While TIBCO’s approach is quite outstanding (it deserves its own post), it was somewhat more feasible for TIBCO because of its experience with Bell Labs’ S, a precursor to and offering similar to R, and its longtime expertise in the predictive analytics field. Microsoft, on the contrary, needs to close the distance with IBM, SAS, and many others to enter the space with a strong foothold, one R can certainly provide, and also to give the company some room to further develop an already stable product such as the one provided by Revolution Analytics.

One thing to consider, though, is Microsoft’s ability to enter and remain active in a community that at times has proven hostile to the software giant and, of course, willing to turn its back on it. About this, David Smith, Chief Community Officer at Revolution Analytics, mentioned:

Microsoft might seem like a strange bedfellow for an open-source company, but the company continues to make great strides in the open-source arena recently. Microsoft has embraced Linux as a fully-supported operating system on its Azure cloud service.

While it’s true that Microsoft has increased its presence in the open source community, whether by supporting Linux on Azure, contributing to its kernel, or maintaining close partnerships with Hortonworks (big data’s big name), being able to convince and conquer the huge R community may prove difficult, yet it is highly significant for increasing Microsoft’s presence in a market with huge potential.

This is, of course, considering that Microsoft has changed its strategy regarding its development platforms, opening them up to enable free development and community growth, as with .NET, Microsoft’s now open source development platform.

Embracing the revolution
While the road to embracing R can potentially be bumpy for Microsoft, it might still prove to be the way to go, if not the only one, toward a bright future in the predictive analytics market. Much work will perhaps need to be done, including rewriting and optimizing, but at the end of the day this move could put Microsoft in better shape to compete in the predictive analytics market before it is too late.

At this point, Microsoft seems to trust that the open source movement is mature enough to accept it as another common contributor, while Microsoft seems ready to take what appears to be a logical step to reposition itself in line with modern times and embrace new tech trends.

Like any new relationship, adjustment and adaptation are needed. Microsoft’s (R)evolution and transformation seem to be underway.
Have a comment? Drop me a line below. I’ll respond as soon as I can.

Machine Learning and Cognitive Systems, Part 3: A ML Vendor Landscape


In Parts One and Two of this series I explained a little about what machine learning is and some of its potential benefits, uses, and challenges within the scope of business intelligence and analytics.

In this installment, the last devoted to machine learning before we step into cognitive systems, I will attempt to provide a general overview of the machine learning (ML) market landscape, describing some, yes, only some, of the vendors and software products that use ML to perform analytics and intelligence. So, here is a brief market landscape overview.

Machine learning: a common guest with no invitation

It is quite surprising to find how vast a presence machine learning has in many of today’s modern analytics applications. Its use is driven by:

  • The increasing need to crunch more complex and more voluminous data, at greater speed and with more accuracy (I mean really big data)
  • The need to solve increasingly complex business problems that require methods beyond conventional data analysis.

An increasing number of traditional and new software providers, whether forced by market needs to radically evolve their existing solutions or moved by the pure spirit of innovation, have incorporated new data analytics techniques into their analytics stacks, either explicitly or hidden behind the curtain.

For software providers that already offer advanced analytics tools such as data mining, incorporating machine learning functionality into their existing capabilities stack is an opportunity to evolve their current solutions and take analytics to the next level.

So, it is quite possible that if you are using an advanced business analytics application, especially for Big Data, you are already using some machine learning technology, whether you know it or not.

The machine learning software landscape, in brief 

One of the interesting aspects of this seemingly new need to deal with increasingly large and complex sets of information is that many machine learning techniques originally used in pure research labs have already gained entrance into the business world via their incorporation within analytics offerings. Newer vendors may incorporate machine learning as the core of their analytics offering, or simply as another functional feature in their stack.

Taking this into consideration, we can find a great number of software products that offer machine learning functionality to different degrees. Consider the following products, loosely grouped by type:

From the lab to the business

In this group we can find a number of products, most of them based on an open-source licensing model, that can help organizations test machine learning and perhaps take their first steps.


Weka

A collection of machine learning algorithms written in Java that can be applied directly to a dataset or called from a custom Java program, Weka is one of the most popular machine learning tools in research and academia. It is released under the GNU General Public License, so it can be downloaded and used freely as long as you comply with the license terms.

Because of its popularity, a lot of information is available about using and developing with Weka. It can still prove challenging for users not familiar with machine learning, but it’s quite good for those who want to explore the bits and bytes of machine learning analysis on large datasets.


R

Probably the most popular language and environment for statistical computing and graphics, R is a GNU project that comprises a wide variety of statistical and graphical techniques with a high degree of extensibility. No wonder R is one of the statistical tools most widely used by students.

The R project is designed around a base system with a core set of statistical features and functions that can be extended with a large set of packages available from the Comprehensive R Archive Network (CRAN).

From CRAN, it is possible to download the necessary packages for multivariate analysis, data mining, and machine learning. But it is fair to say that it takes a bit of effort to put machine learning to work with R.

Note: R is also of special interest owing to its increasing popularity and adoption via Revolution Analytics’ commercial offering for R, which I discuss below.


Jubatus

Jubatus is an online distributed machine learning framework. It is distributed under the GNU Lesser General Public License version 2.1, which makes Jubatus another good option for learning, trialing, and (why not?) exploiting machine learning techniques on a reduced budget.

The framework can be installed on different flavors of Linux, such as Red Hat and Ubuntu, as well as on Mac OS X. Jubatus includes client libraries for C++, Python, Ruby, and Java. Its functional features include machine learning libraries for techniques such as graph mining, anomaly detection, clustering, classification, regression, and recommendation.

Apache Mahout

Mahout is Apache’s machine learning algorithm library. Distributed under a commercially friendly Apache software license, Mahout comprises a core set of algorithms for clustering, classification and collaborative filtering that can be implemented on distributed systems.

Mahout supports three basic types of algorithms or use cases to enable recommendation, clustering and classification tasks.

One interesting aspect of Mahout is its goal to build a strong community for the development of new and fresh machine learning algorithms.

Apache Spark

Spark is Apache’s general engine for processing large-scale data sets, commonly deployed alongside Hadoop. The Spark engine is also open source and enables users to write applications in Java, Scala, or Python.

Just like the rest of the Hadoop family, Spark is designed to deal with large amounts of data, both structured and unstructured. The Spark design supports cyclic data flow and in-memory computing, making it ideal for processing large data sets at high speed.

In this scenario, one of the engine’s main components is MLlib, Spark’s machine learning library. MLlib uses the Spark engine to perform faster than MapReduce and can operate in conjunction with NumPy, Python’s core scientific computing package, giving developers a great deal of flexibility when designing new applications.

Some of the algorithms included within MLlib are:

  • K-means clustering with K-means|| initialization
  • L1- and L2-regularized linear regression
  • L1- and L2-regularized logistic regression
  • Alternating least squares collaborative filtering, with explicit ratings or implicit feedback
  • Naïve-Bayes multinomial classification
  • Stochastic gradient descent

While this set of tools gives users hands-on machine learning at no cost, they can still be somewhat challenging to put to work. Many require special skills in the art of machine learning, or in Java or MapReduce, to fully develop a business solution.
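To give a flavor of what the simplest algorithm on that list does, here is a minimal, purely illustrative k-means sketch in plain Python. It mirrors the basic idea behind K-means clustering (assign each point to its nearest centroid, then recompute the centroids), but it is a toy for one-dimensional data, not MLlib’s actual API:

```python
def kmeans(points, k, iters=20):
    """Illustrative 1-D k-means: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    pts = sorted(points)
    # deterministic init: spread initial centroids across the data range
    centroids = [pts[i * (len(pts) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in pts:
            nearest = min(range(k), key=lambda c: (x - centroids[c]) ** 2)
            clusters[nearest].append(x)
        # a cluster that ends up empty keeps its previous centroid
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# two obvious one-dimensional clusters, around 0 and around 10
print(kmeans([0.0, 0.5, 1.0, 9.0, 9.5, 10.0], k=2))  # → [0.5, 9.5]
```

Production libraries such as MLlib add smarter initialization (the k-means|| scheme mentioned above), distributed execution, and multi-dimensional distance computations, but the core loop is the same.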

Still, these applications can enable new teams to start working on machine learning and experienced ones to develop complex solutions for both small and big data. 

Machine learning by the existing players

As we mentioned earlier in this series, the evolution of Business Intelligence is demanding an increasing incorporation of machine learning techniques into existing BI and Analytics tools.

A number of popular enterprise software applications have already expanded their functional coverage to include machine learning—a useful ally—within their stacks.

Here are just a couple of the vast number of software vendors that have added machine learning either to their core functionality or as an additional feature of their stack.


IBM

It is no secret that IBM is betting strongly on advanced analytics and cognitive computing, especially with Watson, IBM’s cognitive computing initiative, an offering we will examine in the cognitive computing part of this series. IBM enables users to develop machine learning analytics approaches via its SPSS product stack, which incorporates the ability to build some specific machine learning algorithms via SPSS Modeler.


SAS

Indubitably, SAS is one of the key players in the advanced analytics arena, with a solid platform for performing mining and predictive analysis for both general and industry-vertical purposes. It has incorporated key machine learning techniques for different uses. Several ML techniques can be found across SAS’ vast analytics platform, from the SAS Enterprise Miner and Text Miner products to its SAS High-Performance Optimization offering.

An interesting point to consider is SAS’ ability to provide industry and line-of-business approaches for many of its software offerings, encapsulating functionality in prepackaged vertical solutions.

Embedded machine learning

Significantly, machine learning techniques are reaching the core of many existing powerhouses as well as newcomers in the data warehouse and Big Data spaces. Some analytic and data warehouse providers have now embedded machine learning techniques, to varying degrees, directly within their database technologies.


1010data

The New York-based company, a provider of Big Data and discovery software solutions, offers what it calls in-database analytics, in which a set of analytics capabilities is built right into 1010data’s database management engine. Machine learning is included alongside in-database analytics such as clustering, forecasting, and optimization.


Teradata

Among its multiple offerings for enterprise data warehouse and Big Data environments, Teradata offers the Teradata Warehouse Miner, an application that packages a set of data profiling and mining functions, including machine learning algorithms alongside predictive and mining ones. The Warehouse Miner can perform analysis directly in the database without a data movement operation, which eases the process of data preparation.


SAP

SAP HANA, which may be SAP’s most important technology initiative ever, now supports almost all (if not actually all) of SAP’s analytics initiatives, and its advanced analytics portfolio is no exception.

Within HANA, SAP originally launched SAP HANA Advanced Analytics, which houses a number of functions for performing mining and prediction. Within this set of solutions it is possible to find specific algorithms for performing machine learning operations.

Additionally, SAP has expanded its reach into predictive analysis and machine learning via the SAP InfiniteInsight predictive analytics and mining suite, a product developed by KXEN, which SAP recently acquired.

Revolution Analytics

As mentioned previously, the open source R language is becoming one of the most important resources for statistics and mining available in the market. Revolution Analytics, a company founded in 2007, has been able to foster the work of the huge R community while developing a commercial offering that exploits R’s benefits, giving R more power and performance via technology that enables its use in data-intensive enterprise applications.

Revolution R Enterprise is Revolution Analytics’ main offering and contains the wide range of libraries provided by R, enriched with major technology improvements for building enterprise-ready analytics applications. It is available for download in both workstation and server versions, as well as on demand via the AWS Marketplace.

The new breed of advanced analytics

The advent and hype of Big Data has also become a sweet spot for innovation in many areas of the data management spectrum, especially in the area of providing analytics for large volumes of complex data.

A new wave of fresh and innovative software providers is emerging with solutions that enable businesses to perform advanced analytics over Big Data, using machine learning as a key component or enabler of this analysis.

A couple of interesting aspects of these solutions:

  1. Their unique approach to providing specific solutions to complex problems, especially adapted for business environments, combining flexibility and ease of use so that business users with a certain degree of statistical and mathematical preparation can address complex business problems.
  2. Many of them have already, at least partially, configured and prepared specific solutions for common line-of-business and industry problems via templates or predefined models, easing the preparation, development, and deployment process.

Here is a sampling of some of these vendors and their solutions:


Skytree

Given that Skytree’s tagline is “The Machine Learning Company,” it’s pretty obvious that the company has machine learning in its veins. Skytree has entered the Big Data analytics space with a machine learning platform for performing mining, prediction, and recommendation, with what Skytree calls an enterprise-grade machine learning offering.

Skytree Server is its main offering. A Hadoop-ready machine learning platform with high-performance analytics capabilities, it can also connect to diverse data streams and can compute real-time queries, enabling high-performance analytics services for churn prediction, fraud detection, and lead scoring, among others.

Skytree also offers a series of plug-ins that connect to the Skytree Server Foundation to extend Skytree’s existing capabilities with specific, more advanced machine learning models and techniques.


BigML

If you Google BigML, you will find that “BigML is Machine Learning for everyone.”

The company, founded in 2011 in Corvallis, Oregon, offers a cloud-based, large-scale machine learning platform centered on business usability at highly competitive cost, providing advanced analytics via a subscription-based offering.

The application enables users to prepare complete analytics solutions for a wide range of analysis scenarios, from collecting the data and designing the model to creating special analytics ensembles.

Since it is a cloud-based platform, users can start using BigML services via a number of subscription-based and/or dedicated options. It is an attractive approach for organizations trying to get the best of advanced analytics with fewer technical and monetary resources.

Yottamine Analytics

Founded in 2009 by Dr. David Huang, Yottamine has put Dr. Huang’s contributions to the theory of machine learning into practice, reflected in the Yottamine Predictive Service (YPS).

YPS is an on-demand advanced analytics solution delivered as web services, allowing users to build, develop, and deploy advanced big data analytics solutions.

As an on-demand solution, it offers a series of subscription models based on clusters and nodes, with payment tied to usage of the service in node-hours, a pretty interesting quota approach.
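To illustrate the node-hour quota idea, here is a trivial sketch with hypothetical figures (the rate and cluster size below are invented for illustration, not Yottamine's actual pricing):

```python
def node_hour_cost(nodes, hours, rate_per_node_hour):
    """Usage-based charge: every node counts for every hour the cluster runs."""
    return nodes * hours * rate_per_node_hour

# A hypothetical 8-node cluster running 12 hours at $0.50 per node-hour:
print(node_hour_cost(8, 12, 0.50))  # 48.0
```

The appeal of the model is that cost scales down as well as up: shut the cluster off and the meter stops.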

Machine learning is pervasive

Of course, this is just a sample of the many advanced analytics offerings that exist, and others are emerging. They use machine learning techniques to different degrees and for many different purposes, specific or general. New companies such as BuildingIQ, emcien, BayNote, Recommind, and others are taking advantage of machine learning to provide unique offerings in a wide range of industry and business sectors.

So what?

One of the interesting effects of companies dealing with increasing volumes of data, and increasingly complex problems to solve, is that techniques such as machine learning and other artificial intelligence and cognitive computing methods are gaining ground in the business world.

Companies and information workers are being forced to learn about these new disciplines and use them to improve the accuracy of analysis, the ability to react and decide, and the quality of prediction, encouraging the rise of what some call the data science discipline.

Many of the obscure tools for advanced analytics traditionally used in the science lab or at pure research centers are now surprisingly popular within many business organizations—not just within their research and development departments, but within all their lines of business.

New software, meanwhile, is increasingly able not only to support the decision-making process, but also to proactively reproduce and automatically improve complex analysis models, recommendations, and scenario analyses, enabling early detection, prediction and, potentially, data-based decisions.

Whether measuring social media campaign effectiveness, effectively predicting sales, detecting fraud, or performing churn analysis, these tools are remaking the way data analysis is done within many organizations.

But this might be just the beginning of a major revolution in the way software serves and interacts with humans. An increasing number of Artificial Intelligence disciplines, of which machine learning is a part, are rapidly evolving and reaching mainstream spaces in the business software world in the form of next-generation cognitive computing systems.

Offerings such as Watson from IBM might be the instigators of a new breed of solutions that go well beyond what we have so far experienced with regard to computers and the analysis process. So stay tuned for my next installment on cognitive systems, and walk with me as we discover these new offerings.

Qlik: Newer, Bigger, Better?


Originally published in the TEC Blog

During the second half of last year and the part of this year that has already passed, the Pennsylvania-based software company QlikTech has undergone a number of important adjustments, from its company name to a series of changes allowing it to remain a main force driving the evolution of the business intelligence (BI) and analytics scene. Are these innovations enough to enable the in-memory software company to retain its success and acceptance within the BI community?

From QlikTech to Qlik

One big shift in the past few months was the company's name change, from QlikTech to Qlik. Though mainly cosmetic, it is still worth noting, as it will make the software provider easier to identify and brand, and will help reposition its entire product portfolio as well as the company's services, resources, and communities.

Having a simple, identifiable name as the umbrella for a product stack that has been growing over time is a smart move from business, marketing, and even technical perspectives.

Qlik goes Big… Data

A second recent event within Qlik's realm is the revelation of its big data strategy, something Qlik had been quietly working on for some time. During a very interesting call, John Callan, senior director of global product marketing, took us through some of the details of Qlik's recently revealed strategy to help users make use of the company's big data initiatives. Two opening statements could not have stated more clearly the role of Qlik, and of many other BI providers, in the big data space. The first:

QlikView as the catalyst for implementing big data

This certainly is true, as many new big data projects find their motivation in the data analysis and discovery phases, and it’s also true that an offering like QlikView can lower some of the technical and knowledge barriers when implementing a big data initiative.

The second statement was:

QlikView relieves the big data bottleneck.

According to Qlik, it opens big data up to a wider number of users, augmenting the potential use of big data and providing implicit benefits: access to a wider range of data sources along with QlikView's in-memory computing power.

True to its goal of bringing BI closer to the business user, the approach from Qlik is to enable the use of big data and offer a new connection and integration with technology provided by some of the most important big data players in the market: Cloudera, Hortonworks, MongoDB, Google BigQuery, Teradata, HP Vertica and Attivio.

What makes QlikView so interesting in the context of big data is that, being a long-time provider of an in-memory architecture for data analysis and having a unique data association model, it can not only ensure a reliable architecture for a big data analysis platform, but it can also add speed to the process. Plus, QlikView’s data association model, along with its business user orientation, can provide an ease-of-use component, often hard to accomplish within a big data initiative.
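The associative idea is easiest to see in miniature: selecting a value in one field narrows the associated values shown in every other field. Here is a toy sketch of that behavior in plain Python, an illustration of the concept only, not Qlik's engine or API:

```python
# Rows linking customers, products, and regions, as a tiny in-memory table.
rows = [
    {"customer": "Acme", "product": "Gears", "region": "East"},
    {"customer": "Acme", "product": "Bolts", "region": "West"},
    {"customer": "Birch", "product": "Gears", "region": "East"},
    {"customer": "Cedar", "product": "Nuts", "region": "North"},
]

def associated(rows, field, value):
    """Return, for every other field, the values still associated
    after selecting `value` in `field` (the green/white selection idea)."""
    matching = [r for r in rows if r[field] == value]
    out = {}
    for f in rows[0]:
        if f != field:
            out[f] = sorted({r[f] for r in matching})
    return out

print(associated(rows, "product", "Gears"))
# selecting product=Gears leaves customers Acme and Birch, region East
```

A real associative engine also tracks the excluded values and does this across millions of rows in memory, but the user-facing effect is the same narrowing of context on every selection.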

So, while QlikView provides for its users all the necessary connectors from their big data partners, it also makes an effort to maintain simplicity of use when dealing with information coming from other more common sources.

On this same topic, one key aspect of Qlik’s approach to big data is the vendor’s flexibility regarding data sourcing; Qlik provides users with three possibilities for performing data exploration and discovery from big data sources:

  1. Loading the information within Qlik’s in-memory computing engine;
  2. Performing data discovery directly from big data sources; and
  3. A hybrid approach, which includes the possibility to combine both previous models, configuring which data should be in-memory and which should be based on direct discovery.
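The hybrid option can be pictured as a simple routing policy: small sources are loaded into memory, while very large ones are queried in place. The sketch below is hypothetical (the threshold and class names are invented, not Qlik's):

```python
# Hypothetical per-source policy: cache small tables in memory; push
# queries on very large sources down to the external big data engine.
IN_MEMORY_LIMIT = 1_000_000  # rows; an assumed threshold for illustration

class HybridSource:
    def __init__(self, name, row_count, fetch_all, run_query):
        self.name = name
        # Load the whole table only if it fits the in-memory budget.
        self.cache = fetch_all() if row_count <= IN_MEMORY_LIMIT else None
        self.run_query = run_query

    def query(self, predicate):
        if self.cache is not None:                # in-memory path
            return [r for r in self.cache if predicate(r)]
        return self.run_query(predicate)          # direct-discovery path

# Toy demo: a small dimension table gets cached on construction.
small = HybridSource("regions", 3,
                     fetch_all=lambda: [{"region": "East"},
                                        {"region": "West"},
                                        {"region": "North"}],
                     run_query=lambda p: [])
print(small.query(lambda r: r["region"] != "West"))
```

The hard part in practice, as noted above, is deciding per source which side of the threshold gives better performance, which is exactly where users may need guidance.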

This three-pronged approach could prove effective for organizations in the initial phases of big data adoption, especially while running early tests, as well as for those that already require big data services with a certain degree of functionality. It remains to be seen, however, whether it creates difficulties for users and organizations in finding the appropriate schema, or in identifying when and where to apply each approach for better performance and effectiveness.

New “Natural” Analytics

Recently, a blog written by Donald Farmer, VP of product management at Qlik, established what Qlik has been up to for some time now: working towards bringing a new generation of analytics to the market. In this sense, two things seem to be particularly interesting.

First, there is Qlik's continuous work on evolving analytics beyond its traditional role, delivering new ways of performing analysis and improving associations and correlations to provide richer context. As Farmer states:

Consider how we understand customers and clients. What patterns do we see? What do they buy? How are they connected or categorized? Similarly every metric tracking our execution makes a basic comparison.

These artifacts may be carefully prepared and designed to focus on what analysts used to call an "information sweet spot"—the most valuable data for the enterprise, validated, clarified, and presented without ambiguity to support a specific range of decisions.

Second, there is the goal of giving users the ability not just to predict, but to actually anticipate and discover:

It's not enough to know our quarter's sales numbers. We must compare them to the past, to our plans, and to competitors. Knowing these things, we ask what everyone in business wants to know: What can we anticipate? What does the future hold?

Particularly interesting is how Farmer addresses a core aspect of the decision-making process: anticipation. In a modern business world that operates increasingly in real time, this means finding ways to move away from traditional operations run in a linear sequence with long latencies.

Of course, little can be said here about Qlik’s future vision, but we can get a glimpse—Qlik has built a prototype showing its new natural analytics approach and much more in QlikView > next, Qlik’s own vision of the future of BI.

This is a vision in which analysis is carried out following five basic themes, to accomplish, according to Qlik, two main objectives: 1) an understanding of what people need, and 2) an understanding of who those people are.

These five themes are:

  • Gorgeous and genius—a user interface that is intuitive and natural to use, while aiming to be productive and improving the user's visual and analysis experience.
  • Mobility and agility—access to the Qlik business discovery platform from any device, with a seamless user experience.
  • Compulsive collaboration—giving users more than one way to collaborate, analyze, and solve problems as a team by providing what Qlik calls a "new point of access".
  • The premier platform—Qlik's vision for giving users improved ways to deliver new apps quickly and easily.
  • Enabling the new enterprise—Qlik aims to give IT infrastructure the necessary resources to offer true self-service to users, while easing the process of scaling and reconfiguring QlikView infrastructures to meet new requirements.

Qlik, Serving Modern BI with a Look Into the Future

Qlik has been an in-memory computing pioneer in the business space since its inception, and it retains that pioneer status two decades later: innovative in both back-end and front-end design, and able to wear more than one hat in the business intelligence space. With an end-to-end platform spanning storage, analysis, and visualization, Qlik is both adapting to the increasingly fast-paced evolution of BI and looking to the future to maintain and gain market share in this disputed space.

However, to maintain its place in the industry it will be crucial for Qlik to keep pace on the many fronts where QlikView, its flagship product, is front and center: business-ready for small to medium-sized customers, as well as powerful, scalable, and governable for large organizations. These days Qlik is surrounded by other innovation sharks in the BI ocean, so remaining unique, original, and predominant will prove increasingly difficult for Qlik and the rest of the players in the space. As in nature, those most capable of fulfilling their customers' needs will survive and prosper.

It comes as no surprise that Qlik is already looking to anticipate the next step in the evolution of BI and analytics. Qlik has a brand that stands for innovation, and certainly, Qlik is working to make QlikView newer, better, and bigger. It will be really interesting to see how the company's innovative vision plays out, and whether it gains as much traction as, or more than, Qlik's previous innovations in the market.

Have a comment on Qlik or the BI space in general? Let me know by dropping a line or two below. I'll respond as soon as I can.
The BBBT Sessions: HortonWorks, Big Data and the Data Lake


Some of the perks of being an analyst are the opportunities to meet with vendors and hear about their offerings, their insight on the industry and best of all, to be part of great discussions and learn from those that are the players in the industry.

For some time now, I have had the privilege of being a member of the Boulder BI Brain Trust (BBBT), an amazing group consisting of Business Intelligence and Data Management analysts, consultants and practitioners covering various specific and general topics in the area. Almost every week, the BBBT engages a software provider to give us a briefing of their software solution. Aside from being a great occasion to learn about a solution, the session is also a tremendous source for discussion. 

I will be commenting on these sessions here (in no particular order), providing information about the vendor presenting, giving my personal view, and highlighting any other discussion that might arise during the session.

I would like to start with Hortonworks, one of the key players in the Big Data space, and a company that has a strong influence on how Big Data is evolving in the IT industry.

The session

In a session conducted by David McJannet and Jim Walker, Hortonworks' Marketing VP and Director of Product Marketing respectively, BBBT members had the chance to learn in more detail about Hortonworks' offerings, strategy, and services aimed at bringing Hadoop to the enterprise, as well as to discuss Big Data and its insertion into the enterprise data management infrastructure, especially in relation to data warehousing, analytics, and governance. Here are some of the highlights of the session…

About Hortonworks 

Hortonworks is a young company, but one with a lot of experience in the Big Data space. Founded in 2011, it was formed by the original Hadoop development and operations team from Yahoo! Why is this so relevant? Because Hortonworks lives and breathes Hadoop: the company makes its living by building data solutions on top of Hadoop and many of its derivative projects. And Hadoop is arguably one of the most important open source software projects of all time, second perhaps only to Linux.

Hadoop is described on its Web page as follows:

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers […].
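The "simple programming models" the description refers to are chiefly MapReduce. A single-process, toy imitation of the map, shuffle, and reduce phases (no Hadoop involved, just the shape of the model) looks like this:

```python
from collections import defaultdict

# Minimal single-process imitation of Hadoop's map -> shuffle -> reduce flow,
# using word count, the canonical MapReduce example.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)  # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # group all values by key
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["Hadoop scales out", "Hadoop handles failures"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"])  # 2
```

The point of Hadoop is that each phase runs in parallel across a cluster, with the framework handling data placement and machine failures; the developer writes only the map and reduce functions.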

Hortonworks focuses on driving innovation exclusively via the Apache Software Foundation, producing open source–based software that enables organizations to deal with their Big Data initiatives by delivering Apache Hadoop solutions ready for enterprise consumption. Hortonworks’ mission, as stated in Hortonworks’ presentation title:

Our mission is to enable your Modern Data Architecture by delivering Enterprise Apache Hadoop.

Hortonworks’ commitment to Hadoop

One of the interesting aspects of Hortonworks is its commitment to Hadoop, in many regards, from the way it handles Hadoop offerings for corporate consumption, to the amount of effort Hortonworks’ team devotes to evolving and enhancing Hadoop’s capabilities. To this point, Hortonworks shared the following graph, in which it’s possible to see the level of contribution of Hortonworks to the famed Apache project in 2013.

Figure 1. List of contributors for Hadoop and number of lines contributed (Courtesy of: Hortonworks)

In the same vein, the Hortonworks team's contribution to Hadoop extends across its multiple subprojects—HBase (Hadoop's distributed data store), Pig (Hadoop's large data set analysis language), and Hive (Hadoop's data warehouse infrastructure), among others (Figure 2)—making Hortonworks a hub for some of the most important experts in Apache Hadoop, with a strong commitment to its open source nature.

Figure 2. List of contributors to Hadoop and number of lines contributed (Courtesy of: Hortonworks)

Hortonworks’ approach to the business market is quite interesting. While maintaining its commitment to both Hadoop and open source ecosystems, Hortonworks has also been able to:

  1. Package corporate-ready solutions, and
  2. Ensure strong partnerships with important software companies such as Microsoft, Teradata, SAP, HP, RackSpace, and, most recently, Red Hat, extending Hortonworks' reach and influence in the Big Data space, especially into corporate markets.

So what does Hortonworks offer?

Hortonworks says it clearly: they do Hadoop. This means that Hortonworks' flagship product—the Hortonworks Data Platform (HDP2)—is an enterprise solution based 100% on the open source Apache Hadoop platform. HDP2 packages the core set of Hadoop modules, fully tested and certified for enterprise use, together with a complete set of professional services provided by Hortonworks for its customers.

Another offering from the company is the Hortonworks Sandbox, a Hadoop environment that includes interactive tutorials and the most recent Hadoop developments for learning and testing.

How does Hortonworks fit into an organization?

One of the main concerns of many organizations trying to embrace Big Data is how their Big Data initiative will fit within their existing data management infrastructure. More importantly, the organization needs to evolve its traditional data management infrastructure (Figure 3) so that Big Data adoption doesn't generate more problems than solutions. Hortonworks is by no means the only software provider here; vendors such as Cloudera and MapR also embrace Hadoop to solve an organization's Big Data issues, but with a different approach.

Figure 3. A traditional data management approach (Courtesy of: Hortonworks)

Wayne Eckerson explains in The Battle for the Future of Hadoop:

Last November, Cloudera finally exposed its true sentiments by introducing the Enterprise Data Hub in which Hadoop replaces the data warehouse, among other things, as the center of an organization's data management strategy. In contrast, Hortonworks takes a hybrid approach, partnering with leading commercial data management and analytics vendors to create a data environment that blends the best of Hadoop and commercial software.

During the session, aside from the heated debates about whether or not to replace the data warehouse with new information hubs, both David McJannet and Jim Walker confirmed Hortonworks' position, which consists of enabling companies to expand their existing data infrastructures (in contrast to Cloudera's approach), letting them evolve without replacing their data management platforms (Figure 4).

Figure 4. Hortonworks expands an organization’s traditional data management capabilities for addressing Big Data (Courtesy of: Hortonworks)

The appealing part of Hortonworks' scheme is that its Hadoop offerings act as an expansion of the rest of the data repository spectrum (relational databases, data warehouses, data marts, and so on). This makes sense in the context of coupling new data management strategies with existing ones; while Hadoop has proven effective for certain tasks and types of data, some problems still need to be handled with "traditional" methods and existing tools. According to Mark Madsen (What Hadoop Is. What Hadoop Isn't.):

What it doesn’t resolve is aspects of a database catalog, strong schema support, robust SQL, interactive response times or reasonable levels of interactive concurrency—all things needed in a data warehouse environment that delivers traditional BI functions. In this type of workload, Hadoop doesn’t come close to what a parallel analytic database can achieve, including scaling this workload into the Petabyte range.

Yet Hadoop offers features the database can’t: extremely low cost storage and retrieval, albeit through a limited SQL interface; easy compatibility with parallel programming models; extreme scalability for storing and retrieving data, provided it isn’t for interactive, concurrent, complex query use; flexible concepts of schema (as in, there is no schema other than what you impose after the fact); processing over the stored data without the limitations of SQL, without any limitations other than the use of the MapReduce model; compatibility with public or private cloud infrastructures; and free, or support-only, so a price point far below that of databases.

Hortonworks' approach, then, is to enable expansion and evolution of the existing data management platform by offering an enterprise-ready version of Hadoop, one that integrates nicely and fills the gap between the data warehouse and the analysis of huge amounts of non-structured (polystructured) information.

What is Hortonworks for, anyway?

Despite the hype and eagerness about Big Data, many people still don't have a clear idea of the contexts and use cases where a Hadoop approach can be useful. Hortonworks showed us a good list of examples of how some of its customers are using Hortonworks. Current deployments run mainly within the financial services, telecom, retail, and manufacturing industries, and extend to applications such as fraud prevention, trading risk, call detail records, and infrastructure investment, as well as assembly-line quality assurance and many other potential uses.

How Hortonworks addresses its customers' Big Data needs is demonstrated by how a customer typically embraces Hadoop when working with increasing volumes of information.

The graph below correlates data volume with the value it can bring to the organization by enhancing its capability to derive insight.

Figure 5. Described as a “Common journey to the data lake,” Hortonworks shows the relation between data volume and its potential value in the context of addressing specific problems (Courtesy of: Hortonworks)

Another interesting thing about this is the notion of the data lake. Pentaho CTO James Dixon, who’s credited with coining the term, describes it in the following simple terms:

If you think of a datamart as a store of bottled water—cleansed and packaged and structured for easy consumption—the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.

Hortonworks uses Hadoop as the platform to provide a solution for the two main issues that this implies:

  1. A new approach to analytics, expanding from a single query engine and a deterministic list of questions to a schema-on-read basis that supports analysis of polystructured data in both real-time and batch modes.
  2. A means for data warehouse optimization, expanding the boundaries of strict data schemas.
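Schema-on-read means structure is imposed when the data is queried rather than when it is loaded. A minimal sketch in plain Python (no Hadoop or Hive, just the principle, with invented field names):

```python
import json

# Raw records land as-is; structure is imposed only when they are read.
raw = [
    '{"user": "ana", "clicks": 3}',
    '{"user": "bo", "clicks": "7", "referrer": "ad"}',   # inconsistent types
    'not json at all',                                   # noise survives ingestion
]

def read_with_schema(raw_lines):
    """Schema-on-read: parse, coerce, and filter at query time."""
    for line in raw_lines:
        try:
            rec = json.loads(line)
            # The schema lives in the reader, not in the storage layer.
            yield {"user": str(rec["user"]), "clicks": int(rec["clicks"])}
        except (ValueError, KeyError, TypeError):
            continue  # malformed rows are skipped at read, not rejected at load

print(list(read_with_schema(raw)))
```

Contrast this with a data warehouse, where the second and third lines would be rejected or cleansed before loading; here nothing is lost at ingestion, and a different reader could later extract the `referrer` field the first schema ignored.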

The Hortonworks Data Platform uses the full open source Hadoop platform. It provides an enterprise-ready basis for handling Big Data within an organization, and aims to fit and optimize—not disrupt—the existing data platform (Figure 6). Some of the challenges of Hadoop deployments have been coping with an often unfriendly environment and a lack of technical expertise to handle Hadoop projects properly, especially in hybrid, complex environments that mix and interconnect traditional and Hadoop deployments.

The addition of YARN—Hadoop's new resource, job, and application manager—in Hadoop 2.0, and its inclusion in Hortonworks' HDP2, enabled Hortonworks to provide a more robust processing platform, one that can now handle workloads beyond MapReduce and manage both MapReduce and external applications and resources more efficiently. The Hortonworks website has a good summary of the use of YARN within HDP.

Figure 6. Hortonworks Data Platform General Architecture (Courtesy of: Hortonworks)

Open source software, especially projects based on Hadoop and big data, traditionally has a Linux orientation, so it's worth mentioning that the HDP2 platform is available on both Linux and Windows operating systems.

Hortonworks Data Platform, enterprise tested

During the session, one thing David McJannet and Jim Walker emphasized was Hortonworks' testing and quality assurance model, which includes testing HDP directly within Yahoo's data environment, providing Hortonworks with a vast testing platform full of complex, data-flooded scenarios—an ideal proving ground for any data application.

To conclude

I have no doubt that the new breed of solutions such as Hortonworks and others offer impressive and innovative approaches to the analysis and management of complex and big data problems. Clearly, frameworks such as the data warehouse need to adapt to these new conditions or die (I tend to believe they will not die).

Instead, it seems that data warehouse methodologies and platforms potentially have the necessary elements—such as enterprise readiness, methodology, and stability—to evolve and include these new computing paradigms, or at least live within these new ecosystems.

So some of the challenges of deploying Big Data solutions, aside from the natural technological issues, could come from how these new concepts fit within existing infrastructures. They need to avoid task duplication, actually streamline processes and data handling, and fit within complex IT and data governance initiatives, ultimately to procure better results and return on investment for an organization.

Hortonworks takes an approach that should appeal to many organizations by fitting within their current infrastructures and enabling a smooth yet radical evolution of their existing data management platforms, whether via its HDP2 platform or delivered via Hortonworks’ strategic partners. It will be interesting to see what their big competitors have to offer.

But don't just take my word for it. You can replay the session with Hortonworks—just go to the BBBT web page and subscribe.

Have comments? Feel free to drop me a line and I’ll respond as soon as possible.

BI on the Go: About Functionality and Level of Satisfaction


Originally published on the TEC Blog

TEC recently published its 2014 Mobile BI Buyers Guide and a related blog post discussing some results from a survey on mobile business intelligence (BI) usage, needs, and trends. We thought it would be useful to take another look at what the survey revealed about what's important for mobile BI users and, of course, how satisfied they are with the mobile BI solutions they work with. Here we discuss two additional criteria and how they affect mobile BI practices and decision-making: functionality and level of satisfaction.

General Functionality: What Tops the List?

One of the questions we asked mobile BI users in the survey had to do with the functionality they find most important in their mobile BI application. From the list we provided, including ad hoc querying, alerting, collaboration, data analysis and discovery, and dashboarding (Figure 1), users were clear that both dashboarding and data analysis/discovery are an essential part of their day-to-day lives with a mobile BI application. It is also clear that reporting on mobile devices is slowly declining, leaving space for more data discovery functions.

On the other hand, two things surprised me. First was the level of importance users gave to alerting functionality over collaboration abilities. Despite the buzz around embedding collaboration within all types of enterprise software, the ability of a mobile BI application to quickly alert users to any emergency or contingency is vital, especially these days, when acting in real time is becoming increasingly important for many organizations.

Second, I was surprised that collaboration was positioned in fifth place, while the top places went to more common BI functionality features such as dashboarding, data analysis, reporting, and alerting. It seems that although collaboration is important, users have clear priorities: first and foremost, they want analysis capabilities and other key tasks covered in a BI application.

Figure 1. Top functionality (Source: TEC Mobile BI Survey 2014)

Mobile BI Satisfaction Levels: Still Not There Yet?

Another question we asked in the survey refers to how satisfied users are with their mobile BI applications. As Figure 2 shows, while the survey did not reveal high levels of dissatisfaction, it did indicate that many respondents are only "somewhat satisfied," revealing that a high number of users are still not totally impressed with what a mobile BI solution can do for them. Why is this?

Many things can play into these results, from limitations of mobile BI applications to misconceptions about what a mobile BI application should or should not be able to do. But in the technological world we live in, mobile is a synonym for innovation and user experience, so users are paying broad attention not just to the efficiency of mobile BI applications, but increasingly to the degree of innovation of mobile apps.

Figure 2. General satisfaction level (Source: TEC Mobile BI Survey 2014)

According to an article in Enterprise Apps Today, big business intelligence vendors are not quite satisfying users. The article mentions a Gartner study looking at mobile BI based on its ability to "provide devices with reports and other dashboard content." The study revealed that mobile BI usage "showed the highest growth among all business intelligence features considered, with 6.2 percent of respondents stating they make extensive use of mobile functionality."

And, according to the article,

small independent vendors continue to lead the way on mobile business intelligence. However, mega-vendors and large independents are beginning to gain some ground. That said, they still have a good amount of ground to cover based on the number of them being graded below average.

While I have also noted that mobile BI has recently been outpacing other BI features in popularity, our survey gave us a slightly different view, with customer satisfaction located mostly in the middle: the majority of users are very or somewhat satisfied, indicating perhaps that efficiency still makes up a huge portion of what matters to BI users. Of course, many users are hoping for more than that: the real wow factor that delivers the kind of benefit the mobile experience may already be providing them through their non-commercial mobile applications, mobile social platforms, and maybe even other mobile business applications.

Along the same lines, and to make things a bit more interesting, let’s mix these two results together and see what happens (Figure 3).

Figure 3. Satisfaction level vs functionality (Source: TEC Mobile BI Survey 2014)

When looking at top functionality and customer satisfaction together, it is interesting to note several things:

  1. Across the board, dashboarding remains as one of the most important features for performing business intelligence with mobile devices. Across this sample of mobile BI users, in the “not very satisfied” category of users dashboarding seems to be quite popular, perhaps signaling what we mentioned before: users are waiting to see more enriching experiences within their mobile BI applications.
  2. For those users that are “completely satisfied” with their current mobile BI solution, alerting plays an important role within their mobile BI criteria, an essential feature for enabling early issue, risk, or opportunity detection. It is possible that for these organizations having an effective way to receive alerts is key to ensuring successful operation and planning.
  3. It seems users increasingly expect more features for performing data analysis and discovery; this is somewhat surprising, as I know many business intelligence providers are making big efforts to improve their functionality in this area.

So, it seems users recognize the importance of three main functional features (dashboarding, data discovery, and alerting) for a reliable mobile BI solution, but they still expect mobile BI functionality to evolve further.

Functionality and Organization Size: How Do They Relate?

In a final exercise, we segmented our respondents according to the size of their organization and their most relevant functional features (Figure 4) and noted some clear differences among different sized organizations.

Figure 4. Functionality vs company size (Source: TEC Mobile BI Survey 2014)

As the graph shows, for very small organizations the functional interest is distributed relatively evenly across all six main functional features, with data analysis and discovery ranking as the most important. For corporations, on the other hand, it is clear that dashboarding and data analysis/discovery, as well as alerting, all play a major role. This seems to be a good indication that efficiency and responsiveness are extremely important for mobile BI users on staff at large corporations. Meanwhile, for those organizations sitting in the middle (from 250 to 1,000 employees), dashboarding is clearly the most important feature, which makes sense, as many of these organizations might have reached a level of BI maturity at which dashboarding remains key to the decision-making process.

It is also relevant to note that collaboration features, which I personally expected to rank higher, did not display a high level of importance in our survey results, showing that while collaboration is a basic feature of mobile BI applications, other features are a higher priority for end users.

Where Will Mobile BI Go From Here? 

In this final part of our mobile BI mini-series (in the first part we explored who is using mobile BI offerings and which vendors they are selecting) we have found that despite being an important change agent in the business intelligence space, the mobile BI arena still has a lot of potential and a lot of ground to break.

As organizations on one side (and mobile BI products on the other) mature and grow, the adoption and evolution of mobile BI applications will enable both end-users and vendors to incorporate key functionalities into mobile BI solutions, for example, reinforcing collaboration, making mobile BI customization and configuration more flexible and accessible, and enabling mobile BI to continue changing the way traditional users consume and produce business intelligence and analytics solutions.

But what do you think? Tell us your experience with mobile BI. Drop me a line below and I’ll respond as soon as I can.

Further Reading

BI on the Go . . . So, Who’s Using Mobile BI? (February 2014)
TEC 2014 Mobile BI Buyer's Guide (January 2014)
BI on the Go Infographic (January 2014)
VIDEO: Mobile Business Intelligence in the Enterprise (November 2013)
This Week in the DoT, 03/14/2014

As my father used to say, better late than never. So here is a list of things you might want to check out, including news, humor, and more…


In the news:

To read:

To watch:

The Internet of Things: Dr. John Barrett at TEDxCIT

Kinoma Create — The JavaScript-Powered Internet of Things Construction Kit

Influencers on Twitter you certainly need to follow:

  • Cindi Howson (@BIScorecard)
  • Claudia Imhoff (@Claudia_Imhoff)
  • Colin White (@ColinWhite) 
  • Curt Monash (@curtmonash)
  • Howard Dresner (@howarddresner)
  • Jim Harris (@ocdqblog)
  • Josep di Paloantonio (@JAdP)
  • Julie Hunt (@juliehunt)
  • Karen Lopez (@datachick)
  • Marcus Borba (@marcusborba)
  • Merv Adrian  (@merv)
  • Neil Raden (@NeilRaden)
  • Richard Hackathorn (@hackathorn)

Finally, to end your week with a smile:

- Agile Methodology - Applied to Other Fields...
- Big Data Analysis... in the Cloud

Bon weekend!

This Week in the DoT, 03/07

Another week, another month, and the year goes by...

Before heading to your local… place of weekend rest, here’s a list of things I’ve come across this week that you might want to check out.

 Have a tremendous weekend!

In the news:

To read:

To watch:

Big Data and the Rise of Augmented Intelligence: Sean Gourley at TEDxAuckland

Teradata and Big Data - from the CTO's Point of View - Stephen Brobst

Influencers on Twitter you certainly need to follow:

  • Carla Gentry (@data_nerd)
  • Cindi Howson (@BIScorecard)
  • Claudia Imhoff (@Claudia_Imhoff)
  • Colin White (@ColinWhite) 
  • Curt Monash (@curtmonash)
  • Howard Dresner (@howarddresner)
  • Jim Harris (@ocdqblog)
  • Josep di Paloantonio (@JAdP)
  • Julie Hunt (@juliehunt)
  • Karen Lopez (@datachick)
  • Marcus Borba (@marcusborba)
  • Mark Smith (@marksmithvr)
  • Merv Adrian  (@merv)
  • Mike Ferguson (@mikeferguson1)
  • Neil Raden (@NeilRaden)
  • Richard Hackathorn (@hackathorn)

Some Humor:

Machine Learning and Cognitive Systems, Part 2: Big Data Analytics

In the first part of this series, I described a bit of what machine learning is and its potential to become a mainstream technology in the enterprise software industry, serving as the basis for many other advances in the incorporation of technologies related to artificial intelligence and cognitive computing. I also briefly mentioned how machine learning is becoming increasingly important for many companies in the business intelligence and analytics industry.

In this post I will discuss further the importance that machine learning already has and can have in the analytics ecosystem, especially from a Big Data perspective.

Machine learning in the context of BI and Big Data analytics

Just as in the lab and other areas, one of the reasons machine learning has become extremely important and useful in enterprise software is its potential to deal not just with huge amounts of data and the extraction of knowledge from it (which can to some extent be addressed with disciplines such as data mining or predictive analytics), but also with complex problems in which the algorithms used need to adapt to frequently changing conditions. This is the case for successful applications of machine learning techniques such as spam detection, Amazon’s use of machine learning to automate employee access control, and Cornell’s use of it for protecting animals.

But the incorporation of machine learning techniques within enterprise software is rapidly expanding to many other areas of business, especially those related to business intelligence and analytics, or, more generally, the decision support framework of an organization. As I mentioned in Part 1, as information collection increases in volume, velocity, and variety (the three Vs of Big Data) and as business pressure grows to expedite analysis and decrease its latency, new and existing business software solutions are incorporating improved ways to analyze these large and complex data sets, and, most importantly, furthering the reach of what analytics and BI solutions can do.

As data sources become increasingly complex, so do the means of analyzing them, and the maturity model of the analytics BI platform is forced to accommodate the process and expand to the next level of evolution—and sometimes even revolution—of the decision-making process. So the role of a BI and analytics framework is changing from being solely a decision support companion to a framework that can trigger decision automation. To illustrate this, I have taken the standard BI maturity model from TEC’s BI Maturity and Software Selection Perspectives report (Figure 1), which shows in simple form some of the pressures that this complexity puts on the maturity process. As a consequence, the process expands to a double-phase decision-making process, which implies giving the system an increased role in the decision.

Figure 1. Standard BI maturity model is being expanded by complexity of data and processes

The decision phase can happen in two ways: as a supported decision made by users, or by enabling the system to delegate the ability to make a decision to itself, automating the decision-making process based on previous analysis and letting the system learn and adapt. By delegating the decision to the system, the process extends the reach of analytics to predictive analysis, early warning messaging, and data discovery.
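As a rough sketch of this double-phase idea (the threshold, labels, and function names here are hypothetical, not any vendor's implementation), the routing between assisted and automated decisions might look like this:

```python
# Hypothetical sketch: a double-phase decision process in which the system
# either automates a decision (high confidence) or defers it to a human
# analyst as decision support (low confidence). Thresholds are illustrative.

def decide(prediction: str, confidence: float, threshold: float = 0.9):
    """Return the decision and which phase handled it."""
    if confidence >= threshold:
        # The system is confident enough to act on its own.
        return prediction, "automated"
    # Otherwise the prediction is surfaced to a user as decision support.
    return prediction, "assisted"

print(decide("approve", 0.95))  # handled automatically
print(decide("approve", 0.60))  # routed to a human
```

In practice the confidence score would come from the underlying model, and the threshold itself could be tuned as the system learns.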

At this stage we might find more permutations of analytics platforms and frameworks that combine both assisted and automated decisions, ideally increasing the effectiveness of the process and streamlining it (Figure 2).

Figure 2. Standard BI maturity model expands to be able to automate decisions

In this context, due to new requirements coming from different directions, especially from Big Data sources in which systems deal with greater and more complex sets of data, BI and analytics platforms become, most of the time, hubs containing dynamic information that changes in volume, structure, and value over time.

In many cases decisions are still made by humans, but with software assistance to different degrees. In some more advanced cases, decisions are made by the system with no human intervention, triggering the evolution of analytics systems, especially in areas such as decision management, and closing the gap between analytics and operations, which can mean boosting tighter relations between the operations, management, and strategy of an organization.

Opportunities and challenges

The opportunities for implementing machine learning within the context of Big Data, and especially Big Data analytics, are enormous. From the point of view of decision support, it can enhance the complete decision management cycle by

  1. Enhancing existing business analytics capabilities, such as data mining and predictive analytics, enabling organizations to address more complex problems and enhancing the precision of the analysis process.
  2. Enhancing the level of support for decisions by providing increased system abilities for performing adaptable data discovery, such as detecting patterns, enabling more advanced search capabilities, and reinforcing knowledge discovery by identifying correlations, much along the same lines as what data mining and predictive analytics can do.
  3. Boosting the incorporation of early detection capabilities within traditional or new BI and analytics systems, a key component for modern organizations that want to anticipate or detect short-term trends that might have great impact on the organization.
  4. Enabling a system to make autonomous decisions, at least at early stages, to optimize the decision process in cases where the application can decide by itself.
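As an illustrative sketch of the early detection capability mentioned above (the window size and tolerance are made-up parameters, not any vendor's method), a trailing-average check can flag readings that deviate sharply from recent history:

```python
# Toy early-warning detector: flag values that deviate from the trailing
# mean of the last `window` readings by more than `tolerance` (expressed
# as a fraction of that mean). Parameters here are purely illustrative.

def early_warnings(series, window=3, tolerance=0.5):
    """Return the indices of readings that warrant an alert."""
    alerts = []
    for i in range(window, len(series)):
        baseline = sum(series[i - window:i]) / window
        if baseline and abs(series[i] - baseline) / baseline > tolerance:
            alerts.append(i)
    return alerts

sales = [100, 102, 98, 101, 250, 99, 97]  # a sudden spike at index 4
print(early_warnings(sales))              # [4]: only the spike is flagged
```

A real BI system would feed such alerts into messaging or dashboards, but the principle is the same: detect the short-term deviation early, then let a person or a downstream rule act on it.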

Many organizations that already use machine learning can be considered to be exploiting the first level of this list—improving and enabling the analysis of large volumes of complex data. A smaller number of organizations can be considered to be transitioning to the subsequent levels of Big Data analysis using machine learning.

At this point in time, much of the case for the application of machine learning is based on reinforcing the first point of the list. But aside from its intrinsic relevance, it is, in my view, in the area of early detection and automation of decisions where machine learning has a great deal of potential to help boost BI and analytics to the next level. Of course this will occur most probably alongside other new information technologies in artificial intelligence and other fields.

Many organizations that already have robust analytics infrastructures need to take steps to incorporate machine learning techniques within their existing BI and analytics platforms, for example, by building machine learning into their strategies. But organizations that wish to leverage machine learning’s potential may encounter some challenges:

  1. The complexity of applying machine learning requires a great deal of expertise. This in turn leads to the challenge of gaining the expertise to interpret the right patterns for the right causes.
  2. There may be a shortage of people who can take care of a proper deployment. Intrinsically, the challenge is to find the best people in this discipline.
  3. As machine learning is an emerging technology, for some organizations it is still a challenge to measure the value of applying these types of advanced analytics disciplines, especially if they don’t have sufficiently mature BI and Big Data analytics platforms.
  4. Vendors need to make these technologies increasingly suitable for the business world, easing both deployment and development processes.

Despite these challenges, there is little doubt that over time an increasing number of organizations will continue to implement machine learning techniques, all in order to enhance their analytics potential and consequently mature their analytics offerings.

Some real-life use cases

As we mentioned earlier, there are a number of cases where machine learning is being used to boost an organization’s ability to satisfy analytics needs, especially for analytics applied to Big Data platforms. Following are a couple of examples of what some organizations are doing with machine learning applied to Big Data analytics, which surprisingly are tied to solving not complex scientific projects but more business-oriented ones. These cases were taken from existing machine learning and Big Data analytics vendors, which we will describe in more detail in the next post of this series:

Improving and optimizing energy consumption

  • NV Energy, the electricity utility in northern Nevada, is now using software from Big Data analytics company BuildingIQ for an energy-efficient pilot project using machine learning at their headquarters building in Las Vegas. The 270,000-square-foot building uses BuildingIQ to reduce energy consumption by using large sets of data such as weather forecasts, energy costs and tariffs, and other datasets within proprietary algorithms to continuously improve energy consumption for the building.

Optimizing revenue for online advertising

  • Adconion Media Group, an important media company with international reach, uses software from machine learning and Big Data analytics provider Skytree for ad arbitrage, improving predictions for finding the best match between buyers and sellers of web advertising.

Finding the right partner

  • eHarmony, the well-known matchmaking site, uses advanced analytics provided by Skytree to find the best possible matches for prospective relationship seekers. Skytree’s machine learning finds the best possible matching scenarios for each customer, using profile data and website behavior along with specific algorithms.

This is just a small sample of real use cases of machine learning in the context of Big Data analytics. There is new but fertile ground for machine learning to take root in and grow.

So what?

Well, in the context of analytics, and specifically Big Data analytics, the application of machine learning has a lot of potential for boosting the use of analytics to higher levels and extending its use alongside other disciplines, such as artificial intelligence and cognition. But these applications need to be approached with machine learning as an enabler and enhancer, and must be integrated within an organizational analytics strategy.

As with other disciplines, the success of the implementation of machine learning and its evolution to higher stages needs to be ensured by an organization’s extensive adaptability to business needs, operations, and processes.

One of the most interesting trends in analytics is its increasing pervasiveness and tighter relationship with all levels of an organization. As the adoption of new features increases the power of analytics, it also closes the gap between two traditionally separate worlds within the IT space, the transactional and the non-transactional, enabling analytics to be consumed and used in ways that were unimaginable just a decade ago. The line between business operations and analysis is blurrier than ever, and disappearing. The new IT space will live within these colliding worlds, with analytics being performed at every level of an organization, from operations to strategy.

In upcoming posts in this series, we will address the machine learning market landscape and look at some vendors that currently use machine learning to perform Big Data analytics. And we will go a step further, into the space of cognitive systems.

In the meantime, please feel free to drop me a line with your comment. I’ll respond as soon as I can.

This Week in the DoT, 02/28

Yep, thank God it’s Friday.

And before you go home and hopefully have a relaxing weekend, here is a list of some interesting things that happened in the Data of Things during this week: news, events, tweets and a bit of humor.

These are some relevant things you might want to check...

Snow Boarding in Fernie, Canada by Chris Barton

In the news:

With interesting readings:

Interesting to watch:

Live from Strata 2014: In-Hadoop analytics and IBM's Watson

What Does Collaboration Among Humans and Machines Actually Look Like? Structure:Data 2013

Influencers on Twitter you certainly need to follow:

  • Cindi Howson (@BIScorecard)
  • Claudia Imhoff (@Claudia_Imhoff)
  • Colin White (@ColinWhite) 
  • Curt Monash (@curtmonash)
  • Howard Dresner (@howarddresner)
  • Jim Harris (@ocdqblog)
  • Josep di Paloantonio (@JAdP)
  • Julie Hunt (@juliehunt)
  • Karen Lopez (@datachick)
  • Marcus Borba (@marcusborba)
  • Merv Adrian  (@merv)
  • Neil Raden (@NeilRaden)
  • Richard Hackathorn (@hackathorn)

And finally, to end your week with a smile:


This Week in the DoT (Data of Things)

Every Friday, starting today, I will try to post some of what in my view were the relevant events during the week for the Data of Things, including news, videos, etc.

For today, I have a short list of influencers on Twitter — in no particular order — that you might want to follow for all data-related topics. I’m sure you will enjoy their tweets as much as I do:

  • Claudia Imhoff (@Claudia_Imhoff)
  • Merv Adrian (@merv)
  • Neil Raden (@NeilRaden)
  • Marcus Borba (@marcusborba)
  • Howard Dresner (@howarddresner)
  • Curt Monash (@curtmonash)
  • Cindi Howson (@BIScorecard)
  • Jim Harris (@ocdqblog)
  • Julie Hunt (@juliebhunt)

Of course, the list will grow in time. For now, enjoy following this group of great data experts.

Bon weekend!

BI on the Go . . . So, Who’s Using Mobile BI?

Piggybacking on the success of the most recent TEC Buyer’s Guide, the 2014 Buyer’s Guide on Mobile BI applications, we took the opportunity to survey users of mobile business intelligence (BI) applications and collect their impressions of these tools. Most of the results of this survey of more than 250 respondents were captured in an Infographic. Additional information garnered from the survey, while not conclusive, may provide a glimpse into the sentiment of the respondents on their use of mobile BI applications. In this post, I’ll describe some of those results, which may be useful for organizations evaluating a new mobile BI solution for their business needs.

Most popular mobile BI apps

The top 10 mobile BI apps in use are depicted in figure 1. Microsoft takes a clear lead, followed by the other big software powerhouses SAP, Oracle, and IBM. These results are in line with what we would expect considering that most of these vendors already have large sets of BI implementations worldwide and that customers tend to choose mobile BI offerings from their existing BI provider.

Figure 1 shows that more than 8 percent of respondents either still have not implemented or are not using a mobile BI application. This is a relatively large segment of potential BI users, especially considering that most of the existing BI software providers now have mobile BI offerings, suggesting that it’s relatively effortless to put them in place. This apparent avoidance of mobile BI offerings within some organizations may stem from the following:

  • Lack of use case for mobile BI apps
  • Technical limitations to implementing a mobile BI app
  • Budget restrictions

Figure 1. Top 10 Mobile Apps Used by Respondents to TEC’s 2014 Mobile BI Survey

Other mobile BI offerings used in the organizations of the survey respondents come from QlikView, MicroStrategy, and Tableau, all great promoters of mobile offerings in the BI space, which are rapidly increasing their footprint not only in mobile BI but in the mobile space overall. The remaining mobile BI offerings cited in our survey come from the long-time and well-known BI player Information Builders; Infor, a traditional player in the enterprise resource planning (ERP) space that has been growing its BI presence; and, last but not least, an experienced BI player from the open-source community, Pentaho, which now has a robust mobile BI solution covering most, if not all, aspects of an enterprise BI solution.

Who’s using mobile BI and how?

We also wanted to determine who is using mobile BI solutions and which mobile BI applications. When we took our top 10 provider list and segmented it in terms of the size of the company our survey respondents work for, some results became immediately apparent. Microsoft was by far the most widely used mobile BI solution by companies with 1 to 50 employees, while SAP was the most popular solution among companies with 51 to 100 employees (see figure 2). On the other hand, there seems to be healthy competition between the big four (IBM, SAP, Microsoft, and Oracle) in the large enterprise segment, with increased presence of other players such as Information Builders, MicroStrategy, QlikView, and Tableau.

Figure 2. Top 10 Mobile BI Apps Used by Respondents According to their Company Size (TEC’s 2014 Mobile BI Survey)

Figure 2 shows that the most widely used mobile BI offerings are from Microsoft and SAP regardless of respondent’s company size—from small companies with 1 to 50 employees to large enterprises with more than 10,000 employees. These results may reflect the intense efforts both these vendors have undertaken to evolve their enterprise BI solutions with new mobile technologies and capabilities to enable customers to use mobile more seamlessly.

If we look at the type of industry the respondents’ companies belong to, we can see the top 10 industries in figure 3. The computer, IT, and software industry takes the lead in usage of mobile BI solutions, a field that of course typically spurs trends in technology adoption. Additionally, business services and consulting and manufacturing are in second and third place, respectively, followed by finance and banking in fourth place. All these industries are, in my opinion, justified in their need for mobile services, as are many of their lines of business. I have to admit I was surprised to find hotels and restaurants in the top 10 industries using mobile BI offerings, not because there is no use case for mobile BI in that industry, but because other industries appear, according to previous research, to be more amenable to the adoption of mobile BI solutions. Examples include utilities and service industries.

Figure 3. Top 10 Industries Using Mobile BI Apps (TEC’s 2014 Mobile BI Survey)

If we dig a little deeper and look at what mobile BI apps are used by the top 10 industries, we see that Microsoft still leads the pack, with a dominant presence in the hotels and restaurants industry and in finance and banking. SAP also shows a strong presence in hotels and restaurants, as well as in the finance and banking and manufacturing industries.

It is worth mentioning that QlikView, among other vendors, has a strong presence in electronics and high tech. Organizations in these areas typically have great technical expertise, attesting to QlikView’s technical and functional capabilities.

Figure 4. Mobile BI Apps Used by the Top 10 Industries (TEC’s 2014 Mobile BI Survey)

Additionally, Oracle shows mobile BI presence nearly across the board, from the software industry to electronics and banking. Furthermore, the three powerhouses Oracle, SAP, and Microsoft dominate the mobile BI usage in the insurance and finance and banking industries.


Based on the results of our survey on mobile BI usage, we can see that the four main players—Microsoft, SAP, IBM and Oracle—are well positioned in the mobile BI market, pretty much inheriting success from their high-profile BI solutions.

Other vendors such as Tableau, QlikView, MicroStrategy, and Information Builders are rapidly establishing themselves as major BI providers and making their presence known on the mobile BI stage.

Though the information presented in this post can be considered neither conclusive nor extensive, it can be used as a good starting point or basic point of reference for gauging what mobile BI solutions your peers in companies with a similar size as yours and in the same industry are using. This information may be useful before you embark upon the venture of acquiring a new mobile BI solution or replacing the one you already have.

Stay tuned for a second post on the survey, where I will present the most requested mobile BI functionality and the users’ level of satisfaction with their mobile BI offerings.

Link to original article

Machine Learning and Cognitive Systems, Part 1: A Primer

IBM’s recent announcements of three new services based on Watson technology make it clear that there is pressure in the enterprise software space to incorporate new technologies, both in hardware and software, in order to keep pace with modern business. It seems we are approaching another turning point in technology where many concepts that were previously limited to academic research or very narrow industry niches are now being considered for mainstream enterprise software applications.

Image by Penn State

Machine learning, along with many other disciplines within the field of artificial intelligence and cognitive systems, is gaining popularity, and it may in the not so distant future have a colossal impact on the software industry. This first part of my series on machine learning explores some basic concepts of the discipline and its potential for transforming the business intelligence and analytics space.

So, what is machine learning anyway?

In simple terms, machine learning is a branch of the larger discipline of artificial intelligence that involves the design and construction of computer applications or systems able to learn based on their data inputs and/or outputs. Basically, a machine learning system learns by experience; that is, based on specific training, the system is able to make generalizations from its exposure to a number of cases and can then perform actions after new or unforeseen events.

The discipline of machine learning also incorporates other data analysis disciplines, ranging from predictive analytics and data mining to pattern recognition. A variety of specific algorithms, frequently organized in taxonomies, are used for this purpose, depending on the type of input required (a list of algorithms, organized by type, can be found on Wikipedia).

As a discipline, machine learning is not new. Initial documents and references can be traced back to the early fifties with the work of Alan Turing, Arthur Samuel, and Tom M. Mitchell. And the field has undergone extensive development since that time.

One of the more important applications of machine learning is automating the acquisition of knowledge bases used by so-called expert systems, systems that aim to emulate the decision-making process of a human expert in a field. But the scope of its application has been growing. In Applications of Machine Learning and Rule Induction, Langley and Simon review some major paradigms for machine learning scenarios, all based on a very important premise:

to improve performance on some task; the general approach involves finding and exploiting regularities in training data.

The major approaches include neural networks, case-based learning, genetic algorithms, rule induction, and analytic learning. While in the past they were applied independently, in recent times these paradigms or models are being used in a hybrid fashion, blurring the boundaries between them and enabling the development of more effective models. The combination of analytic methods can ensure effective, repeatable, and reliable results, a required component for practical usage in mainstream business and industry solutions.

According to A Few Useful Things to Know about Machine Learning, while the discipline itself is far from simple, it is based on a simple (but not simplistic) principle: every learning algorithm combines just three components, namely representation, evaluation, and optimization, where:
  • representation means the use of a classifier element represented in a formal language that a computer can handle and interpret; 
  • evaluation consists of a function needed to distinguish or evaluate the good and bad classifiers; and
  • optimization represents the method used to search among these classifiers within the language to find the highest scoring ones.

As the paper states:

The fundamental goal of machine learning is to generalize beyond the examples in the training set.

This way the system can infer new decisions or correct answers that then serve to increase learning and optimize accuracy and performance.
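As an assumed toy-scale illustration (my own sketch, not an example from the paper), the three components can be rendered in a few lines of Python: threshold classifiers as the representation, accuracy as the evaluation function, and exhaustive search as the optimizer.

```python
# Toy sketch of the three components of a learning algorithm.
# All names, data, and candidate values here are illustrative assumptions.

def make_classifier(threshold):
    # Representation: hypotheses of the form "predict 1 if x >= threshold",
    # expressed in a formal language the computer can handle.
    return lambda x: 1 if x >= threshold else 0

def accuracy(classifier, data):
    # Evaluation: the fraction of labeled examples classified correctly.
    return sum(classifier(x) == y for x, y in data) / len(data)

def best_threshold(data, candidates):
    # Optimization: search the hypothesis space for the highest-scoring classifier.
    return max(candidates, key=lambda t: accuracy(make_classifier(t), data))

train = [(1, 0), (2, 0), (3, 1), (4, 1)]
t = best_threshold(train, candidates=[1, 2, 3, 4])
print(t)                       # 3: separates the training examples perfectly
print(make_classifier(t)(10))  # 1: the learned rule applies to unseen input
```

The last line is the point of the quote above: the learned threshold generalizes to an input that was never in the training set.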

Also, each component of the machine learning process comprises a good mix of mathematical techniques, algorithms, and methodologies that can be applied (Figure 1).

Figure 1. The Three components of learning algorithms. Source: A Few Useful Things to Know about Machine Learning

In this context, machine learning can be done by applying specific learning strategies, such as:

  • a supervised strategy, to map the data inputs and model them against desired outputs, and
  • an unsupervised strategy, to map the inputs and model them to find new trends.

Derivative strategies that combine these, such as semi-supervised approaches, are also used. This opens the door to a multitude of applications in which machine learning can be used, in many areas, to describe, prescribe, and discover what is going on within large volumes of diverse data.
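To make the two strategies concrete, here is a minimal self-contained sketch on one-dimensional toy data (an illustrative assumption, not tied to any product or library): a nearest-neighbor predictor stands in for the supervised case, and a two-center clustering loop (1-D k-means) for the unsupervised one.

```python
# Supervised vs. unsupervised learning on 1-D toy data.

def nearest_neighbor(train, x):
    # Supervised: inputs are mapped against desired outputs (labels);
    # predict the label of the closest labeled example.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def two_means(xs, iters=10):
    # Unsupervised: no labels; discover two groups in the inputs by
    # iteratively refining two cluster centers (1-D k-means).
    a, b = min(xs), max(xs)
    for _ in range(iters):
        ga = [x for x in xs if abs(x - a) <= abs(x - b)]
        gb = [x for x in xs if abs(x - a) > abs(x - b)]
        a, b = sum(ga) / len(ga), sum(gb) / len(gb)
    return a, b

labeled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
print(nearest_neighbor(labeled, 1.5))   # "low": learned from labeled examples
print(two_means([1.0, 2.0, 8.0, 9.0]))  # (1.5, 8.5): groups found without labels
```

A semi-supervised variant would seed the clustering with the few labeled points and let the unlabeled ones refine the centers.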

The increasing presence of machine learning in business, especially for analytics

Thanks to the success of machine learning applications within disciplines such as speech recognition, computer vision, bio-surveillance, and robot control, interest in and adoption of machine learning technologies has grown, particularly over the last decade. It is also interesting how, in many fields, machine learning is escaping the confines of the science lab to reach commercial and business applications.

There are several scenarios in which machine learning can play a key role: systems so complex that effective algorithms are very hard to design by hand, applications that require the software to adapt to an operational environment, and complex systems that need to work with extensive and complex data sets. In this way, machine learning methods play an increasing role not just in the field of computer science generally, but also in enterprise software applications, especially those that need in-depth data analysis and adaptability. These areas include analytics, business intelligence, and Big Data.

Why business intelligence and Big Data?

In 1958, H. P. Luhn wrote what is perhaps the first paper on Business Intelligence. Its abstract begins:

An automatic system is being developed to disseminate information to the various sections of any industrial, scientific or government organization. This intelligence system will utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the “action points” in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points. This paper shows the flexibility of such a system in identifying known information, in finding who needs to know it and in disseminating it efficiently either in abstract form or as a complete document.

The premise of BI systems has remained pretty much the same: to collect an organization’s data from disparate sources and process it in the best possible way to produce useful information that—and perhaps this is the most important part—helps decision makers make the best-informed decisions for the benefit of the organization. A simple definition, not a simple task.
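That premise can be sketched in miniature: pull records from two hypothetical sources (say, an operational database and a planning spreadsheet), merge them, and aggregate them into something a decision maker can act on. All of the names and figures here are invented for illustration.

```python
# The BI premise in miniature: combine disparate sources into decision-ready
# information (actual revenue per region, compared against its target).

sales = [  # e.g. rows from an operational database
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 60.0},
]
targets = {"east": 150.0, "west": 100.0}  # e.g. from a planning spreadsheet

def revenue_vs_target(sales_rows, target_map):
    """Return {region: (actual, target, target_met)} for each region."""
    totals = {}
    for row in sales_rows:  # aggregate the raw operational records
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return {r: (totals.get(r, 0.0), t, totals.get(r, 0.0) >= t)
            for r, t in target_map.items()}

report = revenue_vs_target(sales, targets)
print(report)   # → {'east': (180.0, 150.0, True), 'west': (80.0, 100.0, False)}
```

Real BI platforms do this at scale, with ETL pipelines, warehouses, and dashboards in place of dictionaries and a print statement, but the collect-merge-aggregate-inform shape is the same.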

In this regard, Business Intelligence has been adapting and evolving, with varying degrees of success, to provide information workers with the ability to make these decisions, and has played a very important role in the decision support platforms of many organizations.

This evolution has changed the role of BI systems: they no longer just provide high-level decision support at a strategic level, but inform an increasing number of areas involved with middle management and operations. It has also increased the need for BI systems and initiatives to evolve so that they are able to deal with increasingly complex data analysis problems. Applications need to be boosted so that they can handle larger and more complex amounts of data, and can not only report current status, but also predict, play with hypothetical scenarios, and finally learn to make accurate suggestions: a green field for machine learning (Figure 2).

Figure 2. Some factors triggering the need for faster, better, and improved ways for decision support, analytics, and BI systems

A good model for understanding the evolution of BI systems is D. J. Power’s history of decision support systems, of which BI is, of course, an important part. According to Power, decision support systems and applications have evolved through the following stages:

  1. Model Driven. Emphasizes access to and manipulation of financial, optimization, and/or simulation models. Simple quantitative models provide the most elementary level of functionality. These systems use limited data and parameters provided by decision makers to aid them in analyzing a situation; in general, large databases are not needed for model-driven DSS.
  2. Data Driven. In general, a data-driven DSS emphasizes access to and manipulation of a time-series of internal company data and sometimes external and real-time data. Simple file systems accessed by query and retrieval tools provide the most elementary level of functionality. Data warehouse systems that allow the manipulation of data by computerized tools tailored to a specific task and setting, or by more general tools and operators, provide additional functionality. Data-driven DSS with On-line Analytical Processing (OLAP) provide the highest level of functionality and decision support, linked to the analysis of large collections of historical data.
  3. Communications Driven. Communications-driven DSS use network and communications technologies to facilitate decision-relevant collaboration and communication. In these systems, communication technologies are the dominant architectural component. Tools used include groupware, video conferencing and computer-based bulletin boards.
  4. Document Driven. Uses computer storage and processing technologies to provide document retrieval and analysis. Large document databases may include scanned documents, hypertext documents, images, sounds and video. Examples of documents that might be accessed by a document-driven DSS are policies and procedures, product specifications, catalogs, and corporate historical documents, including minutes of meetings and correspondence. A search engine is a primary decision-aiding tool associated with a document-driven DSS. These systems have also been called text-oriented DSS.
  5. Knowledge Driven. Knowledge-driven DSS can suggest or recommend actions to managers. These DSS are person-computer systems with specialized problem-solving expertise. The "expertise" consists of knowledge about a particular domain, understanding of problems within that domain, and "skill" at solving some of these problems.

Within these descriptions there are clear elements in place to boost the adoption of technologies and methodologies such as machine learning: collaboration, intensive data management, and the growth of non-traditional (non-relational) data. The need for systems that can cope with this complexity coincides with the advent of phenomena such as Big Data and advanced analytics in business, giving machine learning a natural opening to help crunch big sets of complex data and to become part of the increasingly complex machinery in place for data analysis and decision making.

Along with disciplines like data mining, natural language processing, and others, machine learning is being seen in business as a tool of choice for transforming what used to be a business intelligence application approach into a wider enterprise intelligence or analytics platform or ecosystem. This goes beyond the traditional scope of BI, which focuses on answering “what is going on with my business?”, to give all possible answers to “why are we doing what we’re doing?”, “how can we do it better?”, and even “what should we do?”.

As business models become more complex and produce massive amounts of data that must be handled with less and less latency, decision support and BI systems are required to grow in complexity and in their ability to handle those volumes of data. This demand is boosting the growth of more sophisticated solutions that address specific business and industry problems; it is not enough to spit out a straightforward result, systems need to provide business guidance.

Some scenarios where machine learning is gaining increased popularity in the context of analytics and BI can be found in applications for risk analysis, marketing analytics, and advanced analytics for Big Data sources.

Machine learning is a reality for business

As Tim Negris states in Getting ready for machine learning:

Despite what many business people might guess, machine learning is not in its infancy. It has come to be used very effectively across a wide array of applications.

And it’s being increasingly adopted within many analytics, Big Data, and business intelligence initiatives, either as a component lying side by side with other analytics solutions, or packaged within a solution that has already adopted it as part of its functional stack.

In either case, machine learning is set to be part of the next evolution of enterprise intelligence business offerings.

In the next part of this series on machine learning, I will address some specifics of the use of machine learning as part of Big Data and Advanced Analytics, as well as its role in the formation of the new so-called area of cognitive systems. In the meantime, please share comments below and let me know your thoughts.

Hello World!

There is a first time for everything... at least, that’s what my father used to say, and sometimes he was right. I have been blogging for quite some time for my employers or through other channels, but I think the time has come for me to have a personal blog: one that allows me a bit more freedom to explore what is closer to my personal interests, where I can let go a bit and include a deeper (or not), more personal view on topics concerning data:

Data in its several forms, with multiple layers, and from many perspectives. From traditional databases to new databases, from small to big data, simple to complex events. Intelligent and not so intelligent data.

Hello to the Data of Things!

I want to start with the iconic Hello World! phrase because it marked one of the most important moments in my career in IT. The phenomenal book written by Brian W. Kernighan and Dennis Ritchie called “The C programming language” was my introduction to the world of C and UNIX, which led, eventually, via a software programming career, to the challenging and awesome experience of data mingling.

Brian Kernighan paying tribute to Dennis Ritchie at Bell Labs

Data has become a fundamental material for almost all human activities, and as this presumably will not change but, on the contrary, will be reinforced, we need to think about data as a key driver of current and future human life. This blog will be devoted to talking about data, the technology, and the people who work with it: from its source, through its processing and movement, to its destination. People are changing our lives by using data in unique and special ways.

So, dearest reader, this blog is devoted to the Data of Things, from data sources and targets, the technologies involved, and those who produce it, use it, and manage it, … and maybe more.

A huge chunk to bite off, I know, but a delicious one, too. :)

Of course, do not hesitate to comment, discuss, and make this blog live… You just need to use the comment space below to start the conversation.

Copyright © 2019 BBBT - All Rights Reserved