Next-generation Business Process Management (BPM)—Achieving Process Effectiveness, Pervasiveness, and Control

The range of what we think and do is limited by what we fail to notice. And because we fail to notice that we fail to notice there is little we can do to change until we notice how failing to notice shapes our thoughts and deeds.
—R.D. Laing

Amid the hype surrounding technology trends such as big data, cloud computing, or the Internet of Things, for a vast number of organizations, a quiet, persistent question remains unanswered: how do we ensure efficiency and control of our business operations?

Business process efficiency and proficiency are essential ingredients for ensuring business growth and competitive advantage. Every day, organizations are discovering that their business process management (BPM) applications and practices are insufficient to take them to higher levels of effectiveness and control.

Consumers of BPM technology are now pushing the limits of BPM practices, and BPM software providers are urging the technology forward. So what can we expect from the next generation of BPM applications and practices?

BPM Effectiveness Via Automation

Effective business process management software can help you track your business processes efficiently and accurately. Mihai Badita, senior business analyst at UiPath, a software company that offers solutions for automating manual business processes, said, “We estimate that around 50 to 60 percent of tasks can be automated, for the time being.”

This is a bold but not unexpected statement from a relatively new company that appears to rival established robotic process automation software companies such as Blue Prism, Automation Anywhere, and Integrify—the latter offering an interesting workflow automation solution that can automate the process of collecting and routing requests—as well as market-leading BPM software providers such as Appian and Pegasystems. According to the Institute for Robotic Process Automation (IRPA), process automation can generate cost savings of 25 to 50 percent and enable business process execution on a 24/7 basis, 365 days a year.

Aside from the obvious effects that automation might have on business processes, such as cost savings and freeing up time and resources, business process automation can help many organizations address repetitive tasks that involve a great deal of detail. Many delays during business process execution are caused by these manual and repetitive tasks, and bottlenecks can arise when decisions need to be made manually. Such processes could be automated and executed entirely without human intervention.

Process robots are specific software modules capable of capturing information from different systems, manipulating data, and connecting with systems to process one or multiple transactions. Of course, effectively training these process robots—including programming and implementing them—is important for efficiency and precision, and business rules must be well-defined before this training to ensure the success of the automation strategy.
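In code, the core of such a robot can be sketched as a loop that applies pre-defined business rules to each transaction. The following Python sketch is purely illustrative—the invoice fields, the approval threshold, and the routing rules are all invented for the example, not any vendor’s actual RPA API:

```python
# Hypothetical sketch of a rule-driven "process robot": it captures records
# from a source system, applies pre-defined business rules, and routes each
# transaction without human intervention. All names, fields, and thresholds
# are invented for illustration -- not any vendor's actual RPA API.

APPROVAL_LIMIT = 1000.0  # assumed business rule: auto-approve below this amount

def route_invoice(invoice):
    # The business rules must be well-defined before "training" the robot
    if not invoice.get("vendor_id"):
        return "reject: missing vendor"
    if invoice["amount"] <= APPROVAL_LIMIT:
        return "auto-approve"
    return "escalate to human reviewer"

def run_robot(invoices):
    # Process a batch of transactions and tally the outcomes
    results = {}
    for invoice in invoices:
        decision = route_invoice(invoice)
        results[decision] = results.get(decision, 0) + 1
    return results

batch = [
    {"vendor_id": "V1", "amount": 250.0},
    {"vendor_id": "V2", "amount": 4800.0},
    {"vendor_id": None, "amount": 90.0},
]
print(run_robot(batch))
# {'auto-approve': 1, 'escalate to human reviewer': 1, 'reject: missing vendor': 1}
```

Note that only the cases a rule cannot decide are escalated to a person, which is precisely where the bottleneck-reduction benefit comes from.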

There are indications that automation will grow in the BPM arena in the coming years, with the incorporation of improved advanced machine learning techniques and artificial intelligence algorithms.

BPM Pervasiveness Through Mobility, Development, and the Cloud

Mobile technology affects perhaps no other component of the enterprise software stack as strongly as BPM. The first mobility goal of every organization has been to enable employees involved in all stages of every business process to operate independently, unrestricted by location and time. A user with a new purchase order to submit, confirm, or authorize should be able to do so using a mobile device no matter where he or she is located or what time it is.

To address security and privacy concerns and to meet specific governance and business requirements, companies are realizing it is imperative to take this simple yet effective mobile app interaction schema to the next level of integration.

Organizations are recognizing the need for increased enterprise software integration of BPM routines at all levels, and as a result they are taking a wider approach to mobile adoption. Many organizations are taking further steps to develop and deploy custom mobile solutions, and many if not all of those deployments involve business process improvements and the ability to integrate with the rest of the enterprise software stack. A study from MGI Research notes that, at the time of the study, 75 percent of all companies reported a mobile apps development cycle of nine months or less.

With this trend, many BPM software providers are already offering customers the ability to accelerate the development of mobile and custom process-oriented applications with development tools that can either avoid or minimize the need for coding. They can also offer visual and modular components to accelerate speed of development with different degrees of integration and compliance with internal IT regulations for security, privacy, and governance. To mention just a couple, companies such as South Africa-based K2 and former French company W4, now part of Itesoft, have developed capabilities well beyond traditional BPM features for modeling and executing business processes, to allow organizations to develop fully customizable process-oriented business applications.

Another component in the provision of pervasive business processes is the development of process-oriented applications with a high degree of integration with the different systems of record—for example, ERPs, CRMs, and others—to effectively improve the way users move across business processes and interact with existing systems. Companies such as Kofax, with its process automation offerings, aim to enable organizations to develop so-called smart process applications (SPAs): process-based applications that integrate well with existing systems and can be embedded to work seamlessly in different operating and platform environments, providing the ability to execute business processes from the user’s platform and device of choice while preserving data accuracy and consistency across platforms.

Other important factors in a more pervasive BPM framework have to do, respectively, with the integration of BPM’s mobile capabilities within larger corporate mobile strategies and solutions, including enterprise mobility management (EMM) or mobile application development platforms (MADPs), and, of course, the adoption of corporate business process management in the cloud.

Interestingly, some BPM providers are rapidly increasing their ability to incorporate more control and management capabilities to mobile app environments, such as improved security and role administration. Without being a substitute for the previous solutions mentioned, this can be an effective first step in encouraging corporate BPM apps development.

With regards to cloud adoption, aside from the lower costs and faster return on investment already discussed, the possibility that specialized service providers can take care of developing and administering a reliable and secure environment can, within many organizations, encourage rapid and effective development of mobile and embeddable process-oriented applications.

Not BI Versus BPM, But BI and BPM

Software companies have now realized that business intelligence also needs to be process-oriented. One example of this new direction came when Swedish enterprise software provider IFS acquired a company called VisionWaves. VisionWaves, now the IFS Enterprise Operational Intelligence (EOI) offering, is an interesting product that aims to give organizations a wide view of the state of the business, via a corporate cockpit that combines views and analysis of process and performance within a single environment.

This signals an increasing interest in process and performance within the software industry. The need for information and the speed of business force operational data analysis to run at different paces, creating silos that work at different rhythms and sometimes make the overall picture difficult to understand.

Some organizations are realizing that as the use of analytics becomes more important, its effectiveness and impact depend on its ability to feed into actual decision making at all levels. The need for information never wavers—its value remains and even increases—but the need for collaboration, process control, and performance monitoring also increases at the point when risks are to be mitigated, opportunities identified, and informed decisions made.

In order to improve business operations through the use of analytics, business intelligence (BI) needs to be naturally process-oriented, embedded within a user’s operational environment to provide collaboration and synergy and be, of course, efficient and fast enough to provide information in real-time.

Interesting approaches come from Vitria with its operational intelligence methodology, Kofax with its process and intelligence analytics, and Salient with its Collaborative Intelligence Suite. These all aim to provide users with a process-centric view of data, infusing analytics right in the trenches of business operations.

Last but not least, something worth mentioning—and that in my view has great potential for improving the synergy between BI/analytics and BPM—has to do with recent efforts and developments around the decision-making process of an organization. This includes the recent publication of the Decision Model and Notation (DMN), an industry-standard modeling notation for decision management and business rules, by the Object Management Group (OMG).

Widespread use of more formal methods for decision management can certainly have a big impact in the way organizations design the use of analytics that are directly involved in decision making at different levels and aspects of an organization, to gain control, measurement, and business operations effectiveness.
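To give a flavor of the decision-table idea behind DMN (the real standard is an OMG modeling notation, typically executed by a decision engine, not Python), here is a simple first-hit rule table; the discount rules themselves are invented for the example:

```python
# A DMN-style decision table sketched in plain Python. Illustrative only --
# real DMN is an OMG XML-based notation executed by a decision engine, and
# the discount rules here are invented for the example.

def decide_discount(customer_type, order_size):
    # Each rule pairs input conditions with an output; "first hit" policy
    rules = [
        (lambda c, n: c == "business" and n >= 10, 0.15),
        (lambda c, n: c == "business",             0.10),
        (lambda c, n: c == "private" and n >= 10,  0.05),
    ]
    for condition, output in rules:
        if condition(customer_type, order_size):
            return output  # first matching rule wins
    return 0.0  # default: no discount

print(decide_discount("business", 12))  # 0.15
print(decide_discount("private", 3))    # 0.0
```

Making such rules explicit, rather than burying them in application code, is exactly what lets analytics plug directly into operational decision points.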

Conclusions—BPM for the Future

Never before has there been such an accumulated effort—from vendors incorporating new technology within BPM solutions, to user and professional groups modernizing BPM practices—to increase operation efficiency in organizations. Still, the challenges remain—achieving effective collaboration and communication of insights, obtaining an efficient analytical view of the entire organization, and closing important operational gaps, including those between technology and business.

As we noted in the beginning of this look at business process management and automation, the range of what we think and do is limited by what we fail to notice. There is also a lot of value to be unveiled within processes, if we optimize them properly and take advantage of the tools available to us.

(Originally published in TEC's Blog)

SAP Data Hub and the Rise of a New Generation of Analytics Solutions

“Companies are looking for a unified and open approach to help them accelerate and expand the flow of data across their data landscapes for all users.

SAP Data Hub bridges the gap between Big Data and enterprise data, enabling companies to build applications that extract value from data across the organization, no matter if it lies in the cloud or on premise, in a data lake or the enterprise data warehouse, or in an SAP or non-SAP system.”

This is part of what Bernd Leukert, member of SAP’s executive board for products and innovation, said during SAP’s Big Data event held at the SAP Hudson Yards office in New York City as part of the SAP Data Hub announcement. In my view, the announcement marked the beginning of a small yet important trend within analytics: the launch of new or renewed, integrated software platforms for analytics, BI, and data science.

This movement, marked by other important announcements including Teradata’s new Analytics Platform and IBM’s Integrated Analytics offering, is another step toward a new generation of platforms and a consolidation of functions and features for data analysis and data science.

According to SAP, the new SAP Data Hub solution offers customers a:

  • Simpler, more scalable approach to data landscape integration, management and governance
  • Easier creation of powerful data processing pipelines to accelerate and expand data-centric projects
  • Modern, open architecture approach that includes support for different data storage systems

One way SAP aims to achieve this with its Data Hub solution is by creating value across the intricate and diverse data management process, from data collection, through integration and transformation, to its preparation for generating insight and action.

To increase efficiency across all management stages, including data integration, data orchestration, and data governance, the new SAP Data Hub creates “data pipelines” to accelerate business results, all coordinated under a centralized “Data Operations Cockpit”.
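Conceptually, such a data pipeline is a chain of operators in which each stage’s output feeds the next. The following Python sketch illustrates only the general idea, not SAP Data Hub’s actual pipeline modeling API; the stage names and sample records are invented:

```python
# Generic sketch of a data pipeline as a chain of operators. Illustrative
# of the concept only -- not SAP Data Hub's actual modeling API; the stage
# names and sample records are invented.
from functools import reduce

def extract(_):
    # Stand-in for reading from a source system (file, API, data lake, ...)
    return [{"sensor": "a", "temp": "21.5"}, {"sensor": "b", "temp": "bad"}]

def clean(rows):
    # Drop records whose temperature cannot be parsed
    cleaned = []
    for row in rows:
        try:
            cleaned.append({**row, "temp": float(row["temp"])})
        except ValueError:
            pass  # discard malformed record
    return cleaned

def enrich(rows):
    # Add a derived field that downstream consumers can act on
    return [{**row, "alert": row["temp"] > 30.0} for row in rows]

def run_pipeline(stages):
    # Each stage's output is "piped" into the next stage
    return reduce(lambda data, stage: stage(data), stages, None)

result = run_pipeline([extract, clean, enrich])
print(result)  # [{'sensor': 'a', 'temp': 21.5, 'alert': False}]
```

A central cockpit, in these terms, is the component that schedules, monitors, and governs many such chains at once.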

From what we can see, SAP aims to position the new solution as the ideal data management platform for the rest of the SAP analytics and BI product stack (including tight integration with SAP HANA and the ability to take advantage of solutions like SAP Vora, SAP’s in-memory, distributed computing solution) as well as with core Big Data platforms including Apache Hadoop and Apache Spark (see figure below).

SAP’s Data Hub General Architecture (Courtesy of SAP) 

SAP Data Hub’s data pipes can access, process and transform data coming from different sources into information, to be used along with external computation and analytics libraries including Google’s TensorFlow.

Another interesting aspect of the new SAP Data Hub is that it aims to provide an agile, easier way to develop and deploy data-driven applications, allowing users to develop and configure core data management activities and workflows from a central platform to accelerate the development process and speed up results.

Key functional elements of the new platform include:

Some of SAP’s Data Hub Major Functional Elements (Courtesy of SAP)
According to SAP, this new solution will become, along with SAP Vora and SAP Cloud Platform Big Data Services, a key component of SAP's Leonardo digital innovation system.

Analytics and BI on the Verge of a New Generation

As companies watch their data landscapes grow and become more complex, new solutions are taking over the analytics landscape, pushed in great measure by newer companies such as Tableau, Qlik, or Dataiku, to name just a few.

It now seems the big software powerhouses are pushing hard to come up with a new generation of tools to consolidate their data management and analytics offerings.

With this, it is not difficult to foresee a new competitive arena in the race to gain the favor of a totally new generation of data specialists, one I’m eager to keep track of, of course.

In the meantime take a look below at SAP’s Data Hub intro video and get a glimpse of this new solution.

Of course, please do let me know if you have comments or feedback; let’s keep the conversation going.

* All logos and images are trademarks and property of their respective owners
Data & Analytics with Maple Flavour: Canadian Data & Analytics Companies. Part 2

In a continuation of my tour across the Canadian data management and analytics landscape started in Part 1, I will now describe a new group of companies from both ends of this great country that have incorporated “state-of-the-art” data and analytics technologies into their solutions.

Many of these companies are startups climbing the market ladder and dramatically changing not just the Canadian market but the global one, introducing innovative solutions across key areas of the data management space, from data visualization to advanced analysis and data warehousing.

So, here is a complementary list of Canadian data solutions:

Solution(s): CrowdBabble

Crowdbabble is a social media analytics company from Toronto, based at the Ryerson Futures Technology Accelerator in the DMZ, that aims to help marketers eliminate the complexity and time involved in tying social media activities to business outcomes.

With its Software as a Service (SaaS) platform, Crowdbabble lets customers measure, benchmark, and optimize their social media performance.

With users in 450 cities around the world, including top customers, the company offers a platform that enables customers to drill down and dig deeper to figure out, according to Crowdbabble, the “why” behind their social media performance strategies and tell a better story.

Some features offered by Crowdbabble include:
  • 1-Click export for charts to enable fast downloads of any chart as an image to be inserted within an MS PowerPoint or Keynote presentation
  • Visual social media performance monitoring
  • Growth tracking of key metrics
  • Drill-Down for in-depth analysis into the details of data and identify the drivers of social media performance
  • Social media content optimization, to learn which content works best for the audience by comparing the performance of posts
CrowdBabble’s Screencap (Courtesy of CrowdBabble)

Solution(s): Envision5

ENVIRONICS Analytics is another company from Toronto with a fresh look at how analytics is being done.

With the latest edition of ENVISION5, its platform for providing business intelligence (BI) on customers and markets from anywhere in North America, the company offers an easy-to-use yet powerful cloud-based platform with a complete set of geo- and segment-based routines for customer insights, site evaluation, and media planning, as well as a large set of consumer data.

Some features offered by ENVISION5 include:

  • A web-services architecture
  • A responsive design that makes ENVISION5 tablet and mobile compatible
  • A workflow engine that lets users define processes following a suggested path for importing data, analyzing trade areas, or creating maps to locate promising prospects and develop marketing campaigns at the national, regional, and local levels
  • Geographic and location capabilities to allow users to geocode and map the location of their customers and create reports to better understand who they are, what they spend money on or how they consume media
  • Capabilities for sharing data and results across the organization using dashboards and micro-sites
  • The ability to enable users to create target groups from any customer file, whether the targeting is based upon life stage, assets, language spoken at home or views regarding technology
ENVISION5’s Screencap (Courtesy of ENVIRONICS Analytics)

Solution: Map4Decision, Map4Web

With more than 20 years of experience in the field, this company from Quebec delivers cutting-edge technology in the fields of geomatics and spatial BI systems. Originating from the team’s work at the Centre de Recherche en Géomatique at Université Laval, Intelli3 develops Map4Decision and, more recently, Map4Web —Intelli3’s SaaS offering— to deliver high-quality geo-data analysis to analysts and business users.

Through a quick-to-deploy and easy-to-use set of solutions involving on-demand map production, Intelli3 aims to lower the level of uncertainty in analysis results involving geomatics and map production.

Some functional highlights included within Map4Decision and Map4Web include the ability to:

  • Explore aggregated views of information
  • Interactively drill-down to achieve more detailed views (e.g., province, region, city)
  • Dynamically intersect various themes of interest (e.g., time and territory)
  • Instantly obtain statistical charts, diagrams, and thematic maps
  • Automatically create maps conforming to customer’s and official rules for visual communication and semiology (colors, symbols, patterns, etc.)
  • Navigate from one visualization type to another (e.g., statistical charts to multi-maps).
Map4Web’s Screencap (Courtesy of Intelli3)

Solution(s): Kaypok Insight, Kaypok Briefly 

Ontario-based Kaypok is a company devoted to the development of enterprise unstructured text analytics solutions. Kaypok’s technology analyzes data regardless of source, including social media feeds, customer surveys, email, blogs, and internal proprietary data.

According to the company, Kaypok’s high-performance algorithm processes noisy, unstructured information, extracting usable knowledge and insights about what people are saying, their sentiments, and the root information elements driving the analytics.

With a combination of two solutions —Kaypok Insight and Kaypok Briefly— the company targets the two major aspects of text and social data analysis:
  • Kaypok Insight provides content analytics technology, whether for external unstructured data or internally residing enterprise application logs.
  • Kaypok Briefly allows users to analyze large volumes of textual content; aggregate data from different sources including RSS feeds, Google Alerts, blogs, and forums; filter, summarize, sort, and share the content; and immediately find the most negative or positive articles.
Kaypok’s Screencap (Courtesy of Kaypok)

Kaypok’s offerings are available both as Software as a Service (SaaS) and as an integrated enterprise model, are compatible with various big data platforms, and work on desktop and mobile devices.

Solution: Klipfolio

Klipfolio is the company behind the eponymous cloud-based application for developing and deploying real-time business dashboards to be used on many types of devices including web browsers, TV monitors and mobile devices.

The Ottawa-based company claims it can connect to virtually any data source, on-premises or in the cloud: from web services to files stored on a computer, a server, or in a data warehouse.

Klipfolio’s simple and flexible data architecture lets data sources live outside the platform: users create connections to data sources, define which portions of the data to pull into the application, and set the load frequency.
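The idea of declaring a connection, the portion of data to pull, and the load frequency can be sketched roughly as follows; the field names and helper below are hypothetical, not Klipfolio’s actual configuration format:

```python
# Hypothetical declaration of an external data source connection. The field
# names below are illustrative only -- not Klipfolio's actual configuration.

data_source = {
    "name": "monthly_sales",
    "kind": "csv_over_sftp",                    # where the data lives
    "path": "/reports/sales.csv",
    "columns": ["region", "month", "revenue"],  # the portion of data to pull
    "refresh_minutes": 60,                      # load frequency
}

def select_columns(rows, source):
    # Keep only the columns the connection declares; the data itself keeps
    # living outside the dashboard platform.
    return [{k: row[k] for k in source["columns"] if k in row} for row in rows]

raw = [{"region": "east", "month": "01", "revenue": 1200, "internal_id": 7}]
print(select_columns(raw, data_source))
# [{'region': 'east', 'month': '01', 'revenue': 1200}]
```

The design point is that the platform stores connection definitions, not copies of entire source systems.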

From there, users can easily and quickly add pre-built data visualizations and dashboards, build data visualizations from scratch, or edit pre-built data visualizations and dashboards.

Some major functional features from Klipfolio include:
  • Support connection to over 100 cloud applications including Facebook, Twitter, Moz, Pingdom, Salesforce, Marketo, Google Analytics, Google Adwords, Xero, HubSpot and others, as well as different web services
  • Connection to local and server Excel, CSV and XML, FTP, SFTP files as well as data from DropBox, Box, and Google Drive file sharing services
  • Connection to all major database management systems including MSSQL, MySQL, Oracle Thin, Oracle OCI, Sybase SQL Anywhere, PostgreSQL, Firebird and DB2
  • Multiple ways to upload computer files, including pulling the data from web services like Facebook and Google Analytics, pushing the data in from an API, sending it as an email attachment, or accessing the data from databases and servers
  • Sharing of visualizations and dashboards within the organization or externally, as well as defining the periodicity of updates
Klipfolio’s Screencap (Courtesy of Klipfolio)

Solution(s): KNOMOS

West coast (Vancouver) company KNOMOS employs modern app design principles grounded in user experience to deliver software tools for the legal industry: from law students learning in the classroom, to lawyers better serving their clients, to engaged citizens managing their practical legal affairs.

Focused on technology-driven solutions, KNOMOS has developed a data-driven solution to provide effective search, analysis, and visualization of legal information. Built for law students, lawyers, and engaged citizens, KNOMOS offers a single access point for legal information and the tools for its management and analysis.

Some important features offered by KNOMOS’ data solution include:
  • A single access point for Federal & British Columbia (BC) laws, regulations and cases all accessed within an interactive visual navigation interface
  • A dual search display to provide an instant overview, with visual search results sized by the number of matches, clustered by document type, and accompanied by relevant text previews
  • Visual navigation capabilities to help identify key information in context of a legal source’s structure, including frequency heat maps for keyword search results in a law or case
  • Citation heat maps to display the frequency of cross-references between legal sources including when a law cites, or is cited by, another law or a case, along with unique color coding for incoming & outgoing links
  • A centralized location to organize and save all content, personal notes, favorites, tags, and highlighted texts for future reference
  • Ability to pinpoint citations and link documents at the specific paragraph or section level with direct access to related content
  • Ability to filter and sort personal annotations
KNOMOS’ Screencap (Courtesy of KNOMOS)

Solution(s): Mnubo SmartObjects

Mnubo is an innovative Internet of Things (IoT) and artificial intelligence (AI) company from Montreal that delivers out-of-the-box insights, automated reports, and advanced IoT data science solutions.

It offers a SaaS solution that enables product manufacturers to connect their products to its platform to ingest, enrich, and analyze the data their products generate.

Mnubo’s SmartObjects offering is a complete SaaS solution developed to avoid long roll-out plans, extensive IT resources, or additional development skills, with an approach that serves customers in consumer, enterprise, and industrial verticals.

Major functional features of Mnubo’s SmartObjects include:
  • Big Data storage and archival
  • Data cleanup & enrichment
  • Rich, flexible, fully documented JSON REST APIs that include an advanced query language
  • Data visualization & reporting features to access pre-built dashboards and reports or create new customized dashboards without additional coding
  • A hosted and managed solution with a multi-tenant solution available on multiple cloud environments (e.g. Azure, AWS, Google, etc)
  • Plug & Play features to eliminate months of software integration and machine learning models training
  • An integrated view of data to ensure real-time data is delivered to the appropriate stakeholder
  • Self-service support to query users’ big data repositories of sensor data at scale and enable ROI-driven insights
  • Out-of-the-box insights to understand operations, faults, product usage, customer engagement/churn, etc., quickly and easily
  • Security features including OAuth2 authorization framework, HTTP secure JSON REST API, data encryption at rest, no personally identifiable information (PII) stored
  • Cloud platform neutral design to support AWS, Azure, Google, and other cloud providers
Mnubo’s Screencap (Courtesy of Mnubo)

Solution(s): Nexalogy Free, Nexalogy Enterprise

Founded by astrophysicist Claude Théoret along with an experienced team, Montreal-based Nexalogy applies algorithms and technology Claude developed for studying black holes and how stars interact with each other to analyze the connections between words and the people who write them across the social Web.

Consequently, Nexalogy can reveal undiscovered risks, opportunities, and hidden intelligence. Nexalogy’s cloud-based solution provides social media intelligence services that help companies of all sizes and industries make better decisions.
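To give a flavor of what analyzing connections between words can look like (a minimal illustration only, not Nexalogy’s actual algorithms; the posts and stopword list are invented), one can count how often word pairs co-occur in the same post:

```python
# Minimal sketch of a word co-occurrence network: words that appear in the
# same post become linked, and heavily linked pairs hint at shared themes.
# Illustrative only -- not Nexalogy's actual algorithms; posts are invented.
from collections import Counter
from itertools import combinations

STOPWORDS = {"on", "this", "is", "again"}

def cooccurrence(posts):
    pairs = Counter()
    for post in posts:
        # Unique, sorted content words so each pair is counted once per post
        words = sorted({w for w in post.lower().split() if w not in STOPWORDS})
        pairs.update(combinations(words, 2))
    return pairs

posts = [
    "battery life on this phone is great",
    "phone battery died again",
    "great camera on this phone",
]
pairs = cooccurrence(posts)
print(pairs[("battery", "phone")])  # 2 -- the pair co-occurs in two posts
```

Treating each pair count as an edge weight yields a graph whose clusters correspond to themes and, with author information attached, to communities of people.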

Some capabilities and services offered by Nexalogy include:
  • A scalable distributed system behind the analysis and data management
  • Data collection capabilities from many data sources around the web
  • Reports and visualization capabilities
  • The ability to routinely process millions of social media posts
  • Algorithms to identify themes, people, relationships, topics and content
  • Easy visual interaction with all social data including posts, topics and influencers via a set of dynamic visualizations.
Nexalogy's Screencap (Courtesy of Nexalogy)

Solution(s): PHEMI Central

PHEMI is a big data warehouse company from Vancouver whose PHEMI Central solution allows organizations to easily access and mine data of any variety or volume.

PHEMI Central is, according to the company:

“a production-ready big data warehouse with built-in privacy, data sharing, and data governance. PHEMI Central delivers the scalability and economics of Hadoop with indexing, cataloging, fine-grained access control, and full life cycle, enterprise-grade data management.”

Built on Hadoop, the PHEMI Central Big Data Warehouse aims to unlock silo-ed data and make it available to analytic and operational applications.

By incorporating big data technology, PHEMI allows users to scale to petabytes of data with cluster economics. PHEMI Central additionally adds simplified deployment and out-of-the-box operations, as well as the ability to integrate immediately with existing data sources and analytics tools.

Main features of PHEMI Central include:
  • Availability of the solution on-premises or as a managed service on Amazon, Microsoft, Oracle, or ClearData HIPAA-compliant clouds
  • Ability to integrate with most leading analytics tools, including Tableau, Qlik, Power BI, R, SAS, and SPSS
  • Availability of data processing functions including an Excel/CSV Reader, Genomics Reader (VCF/gVCF), JSON Reader, XML/HL7 Reader, and custom DPFs
  • A strong emphasis on security, so that:
    • PHEMI’s access control strategy takes into account both user attributes and data characteristics
    • Metadata and user attributes are brought together into simple yet robust rules to indicate who can see and do what with the data
    • A policy-based enforcement so access control is implemented automatically and uniformly
    • On-site processing to securely process data so it can be presented as different views to different users based on their authorizations, without human intervention
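The pattern of combining user attributes with data characteristics into automatically enforced rules is generic attribute-based access control (ABAC). A minimal sketch, with invented roles and policies (not PHEMI’s actual policy engine):

```python
# Generic attribute-based access control (ABAC) sketch: policies combine
# user attributes with data characteristics, and enforcement is automatic
# and uniform. Roles and rules are invented for illustration -- this is
# not PHEMI Central's actual policy engine.

def can_access(user, record):
    policies = [
        # researchers may see de-identified records only
        lambda u, r: u["role"] == "researcher" and not r["contains_pii"],
        # clinicians may see records from their own department
        lambda u, r: u["role"] == "clinician" and u["dept"] == r["dept"],
    ]
    return any(policy(user, record) for policy in policies)

researcher = {"role": "researcher", "dept": "oncology"}
record_pii = {"dept": "cardiology", "contains_pii": True}
record_deid = {"dept": "cardiology", "contains_pii": False}

print(can_access(researcher, record_pii))   # False
print(can_access(researcher, record_deid))  # True
```

Because every request passes through the same policy check, access decisions stay consistent no matter which application or view the data is served through.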
PHEMI Central Architecture (Courtesy of PHEMI)

Solution(s): RubiCore, RubiOne, Promotion Manager, Lifecycle Manager

Rubikloud, a Toronto-based startup, has worked with more than a dozen global, multi-billion-dollar retailers; its investors include Horizons Ventures, Access Industries, and the MaRS Investment Accelerator Fund.

Rubikloud’s big data architecture gathers retailer data on online and offline consumer behavior and uses it to help retailers gain insight into customer preferences (product affinity and price sensitivity, among others) to enable better demand prediction and forecasts.

The Rubikloud data platform includes a series of solutions that provide easy-to-use yet effective analysis and data management functions, helping retailers take control of their data and improve decisions with it.

Some key features coming from Rubikloud include:
  • The ability to discover the connections and insights within internal data
  • Collaboration capabilities to share work with other data scientists and analysts
  • The ability to deploy and manage multiple users through a built-in authentication service
  • Capabilities to develop proprietary models, at scale
  • The ability to connect models in pipelines to deploy multi-stage systems into production
  • The possibility to incorporate or benchmark against Rubikloud’s own models, trained on several years of retail data
  • A set of visualization libraries to compare and monitor the performance of models
  • Ability to gain a complete view into historical performance
  • Functionality to forecast the outcomes of promotional decisions before taking in-market action
  • Fine-tuning capabilities for pricing and promotional strategy
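The pipeline idea in the list above, connecting models into multi-stage systems, can be sketched as plain functions chained in sequence. The stages below are illustrative stand-ins, not Rubikloud's actual models.

```python
# Sketch of a multi-stage model pipeline: each stage transforms the data
# and hands it to the next. Stage logic here is purely illustrative.

def clean(rows):
    # Drop records with a missing price
    return [r for r in rows if r.get("price") is not None]

def score_affinity(rows):
    # Toy "product affinity": flag items bought more than twice
    for r in rows:
        r["affinity"] = r["units"] > 2
    return rows

def forecast(rows):
    # Naive demand forecast: last observed units, per product
    return {r["sku"]: r["units"] for r in rows}

def run_pipeline(rows, stages):
    for stage in stages:
        rows = stage(rows)
    return rows

data = [
    {"sku": "A1", "price": 9.99, "units": 3},
    {"sku": "B2", "price": None, "units": 1},
]
result = run_pipeline(data, [clean, score_affinity, forecast])
print(result)  # {'A1': 3}
```

In a production platform the stages would be trained models deployed behind an authentication layer, but the chaining pattern is the same.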
Rubikloud Screencap (Courtesy of Rubikloud)

Solution(s): ThinkCX

ThinkCX is a company devoted to delivering advanced, automated, massive-scale analytics solutions for detecting consumer switching events in the smartphone OEM and wireless carrier marketplace across North America.

The company, based in Langley, B.C., uses a patented machine learning solution that locates and confirms millions of device and carrier switching events yearly, which then feed a series of analytical models.

Key advantages of ThinkCX’s cloud-based platform include:
  • A complete market view: in addition to internal churn activity, ThinkCX can provide similar insights about the competition’s subscribers
  • The platform uses commercially available external data as its only inputs, so no integration with a CRM is required
  • Simple deployment: solutions can be deployed in minutes, with no heavy lifting required from IT and business teams
  • ThinkCX’s carrier and device insights can be delivered via a custom dashboard or integrated into Adobe Marketing Cloud solutions or DCM.

A Final Note

As I mentioned in the first part, it is possible I’m still leaving some companies out, so please feel free to use the comment space to share your thoughts or the name of a new Canadian analytics solution we should all know about.

P.S. As a note to Kevin Smith, I’ve decided to leave Keboola out of this group but will include them in another post devoted to data companies from Europe, as Keboola is actually based in the Czech Republic.

* All logos and images are trademarks and property of their respective owners

Oracle 18c Goes for Database Automation in the Cloud

Oracle 18c Goes for Database Automation in the Cloud

In what was probably the most important announcement made during the 2017 edition of Oracle’s OpenWorld conference, the company announced the release of version 18c of its world-renowned database management system, designed to be fully automated.

Oracle’s founder and CTO Larry Ellison made the announcement of the autonomous database, which includes database and cyber-security automation because, according to Mr. Ellison, “human processes stink”.

According to Oracle, the autonomous database will practically eliminate all human intervention associated with database management activities such as tuning, patching, updating, and maintenance, through the following major capabilities:

  • Self-Driving: Provides continuous adaptive performance tuning based on machine learning. Automatically upgrades and patches itself while running. Automatically applies security updates while running to protect against cyber-attacks.
  • Self-Scaling: Instantly resizes compute and storage without downtime. Cost savings are multiplied because Oracle Autonomous Database Cloud consumes less compute and storage than Amazon, with lower manual administration costs.
  • Self-Repairing: Provides automated protection from downtime. The SLA guarantees 99.995 percent reliability and availability, which reduces costly planned and unplanned downtime to less than 30 minutes per year.
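The self-repairing SLA figure is easy to sanity-check: 99.995 percent availability over a full year leaves roughly 26 minutes of allowable downtime, consistent with the sub-30-minute claim.

```python
# Quick check of the SLA arithmetic above: 99.995 percent availability
# over a year leaves roughly 26 minutes of allowable downtime.
minutes_per_year = 365 * 24 * 60                     # 525,600 minutes
allowed_downtime = (1 - 0.99995) * minutes_per_year  # unavailable fraction * year
print(round(allowed_downtime, 1))                    # about 26.3 minutes
```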

To achieve this, the new autonomous database integrates applied machine learning techniques to deliver self-driving, self-tuning, self-recovering, and self-scaling management capabilities without human intervention, aiming to streamline operations, consume resources more efficiently, and provide higher security and reliability.

But first... the Data Warehouse

Oracle’s autonomous database service can handle different workload types, including transactional, non-transactional, mixed, graph, and IoT workloads. While the automated OLTP version is scheduled to be available by June 2018, Oracle’s first autonomous database service is directed at data warehouse workloads and planned to be available in 2017.

As with all of its services, the design of Oracle’s Autonomous Database Cloud Service for Data Warehouse relies on machine learning to enable automatic tuning and performance optimization. By using artificial intelligence and machine learning, Oracle aims to achieve autonomous control, offering reliable, high-performance, and highly elastic data management services, and to enable fast deployments that can be done in seconds.
According to Oracle, some features to be offered by the new service include capabilities to:
  • Execute high-performance queries and concurrent workloads with optimized query performance and pre-configured resource profiles for different types of users
  • Deploy highly elastic, pre-configured compute and storage architectures that instantly scale up or down, avoiding overpaying for fixed blocks of resources
  • Integrate Oracle SQL DWCS with all business analytics tools that support the Oracle database
  • Make use of built-in, web-based notebooks based on Apache Zeppelin
  • Deploy a self-driving, fully automated database that tunes, patches, and upgrades itself while the system is running
  • Take advantage of dedicated, cloud-ready migration tools for easy migration from Amazon Redshift, SQL Server, and other databases
  • Perform cloud-based, scalable data loading from Oracle Object Storage, AWS S3, or on-premises sources
  • Deploy under an enterprise-grade security schema in which data is encrypted by default in the cloud, both in transit and at rest
The new Oracle autonomous database cloud service for data warehouse aims to eliminate manual configuration errors and ensure continuous reliability and self-correction. It also includes, according to Oracle, unlimited concurrent access and advanced clustering technology to enable organizations to scale without any downtime.

With this service, Oracle expands its data warehouse software stack portfolio, extending its services across on-premises and cloud platforms and across different data services. The aim is to reach a greater number of organizations, each with different data warehousing management needs and complexities, as is the case with the existing data warehouse services available within Oracle Exadata, Exadata Cloud, and now the autonomous database cloud service.

The Rise of the Automated Database?

The ideal of full database automation is not new, and many, if not all, software vendors have made important efforts to automate different aspects of the database administration cycle (examples include Teradata and Attunity for automating data ingestion and data warehousing, or efforts by third-party software providers like BMC with BladeLogic Database Automation), and yet, until now, full automation seemed an impossible task.

One main reason is that database automation involves not just automating common, repetitive configuration tasks, such as initial schema and security setup, but also much more complex tasks such as database tuning and performance monitoring, which require the system to learn and adapt to changing conditions.

The evolution of machine learning, artificial intelligence, and cognitive computing technologies is certainly making these automation efforts possible, and Oracle deserves significant credit for embracing these technologies and taking a step further by aiming to achieve full database automation.

As we should expect, it will not take long for other software providers to join the race and swell the ranks of vendors offering fully automated database solutions. As a cautionary message, it will be critical, in my view, to start by making comprehensive assessments of these solutions’ capabilities and accuracy before rushing to push the autopilot button and getting rid of your DBAs just yet.

You might realize it will take some time before you can lower your IT footprint.

Comments? Let me know your thoughts

IBM’s Integrated Analytics System Joins the Ranks of Full Powered Analytics Platforms

IBM’s Integrated Analytics System Joins the Ranks of Full Powered Analytics Platforms

As we get deeper into an era of new software platforms, both big players and newcomers are industriously working to reshape or launch their proposed new-generation analytics platforms, especially aiming to appeal to the growing community of new information workers, or “data scientists”, a community always eager to attain the best possible platform to “crunch the numbers”. Examples include Teradata with its new analytics platform and Cloudera with its Data Science Workbench.

So now it is IBM’s turn; the company recently unveiled its Integrated Analytics System. IBM’s new offering represents the company’s unified data system, aimed at providing organizations with an easy yet sophisticated platform for doing data science with data from on-premises, private, public, or hybrid cloud environments.

The new offering from the “Big Blue” company incorporates a myriad of data science tools and functionality features, as well as the proper data management processes for developing and deploying advanced analytics models in place.

The new offering aims to allow data scientists to perform all data science tasks with ease, including moving workloads to the public cloud, so they can begin automating their businesses with machine learning rapidly.

The system is built on the IBM common SQL engine so users can use a common language and engine across both hosted and cloud-based databases, allowing them to move and query data across multiple data stores, including Db2 Warehouse on Cloud and the Hortonworks Data Platform.

According to IBM, the product team developed the Integrated Analytics System to blend with and work seamlessly alongside IBM’s Data Science Experience, Apache Spark, and Db2 Warehouse on Cloud, where:

  • The Data Science Experience provides the necessary critical data science tools and a collaborative workspace
  • Apache Spark enables in-memory data processing to speed analytic applications
  • Db2 Warehouse on Cloud enables deployment and management of cloud-based warehouse clusters within a single management framework

All of this is aimed at allowing data scientists to create new analytic models that developers can then use to build and deploy intelligent applications easily and rapidly.

According to Vitaly Tsivin, Executive Vice President at AMC Networks:

“The combination of high performance and advanced analytics – from the Data Science Experience to the open Spark platform – gives our business analysts the ability to conduct intense data investigations with ease and speed. The Integrated Analytics System is positioned as an integral component of an enterprise data architecture solution, connecting IBM Netezza Data Warehouse and IBM PureData System for Analytics, cloud-based Db2 Warehouse on Cloud clusters, and other data sources.”

The Integrated Analytics System is built with the IBM common SQL engine to let users integrate the unit with cloud-based warehouse solutions, and to provide the option to move workloads seamlessly to public or private cloud environments with Spark clusters, according to their specific requirements.

Key capabilities include:

  • Asymmetric massively parallel processing (AMPP) with IBM Power technology and flash memory storage hardware
  • A design built on the IBM PureData System for Analytics and the previous IBM Netezza data warehouse offerings
  • Support for a variety of data types and data services, from the Watson Data Platform and IBM Db2 Warehouse on Cloud to Hadoop and IBM Big SQL

Also, the new Integrated Analytics System incorporates hybrid transactional/analytical processing (HTAP), which can run predictive analytics over transactional and historical data on the same database with faster response times.

Additionally, the Integrated Analytics System is designed to provide built-in data virtualization and compatibility with the rest of the IBM data management product stack including Netezza, Db2, and IBM PureData System for Analytics.

According to IBM, later this year, the company has plans to incorporate support for HTAP within the IBM Db2 Analytics Accelerator for z/OS to enable the new platform to seamlessly integrate with IBM z Systems infrastructures.

A new “data science” platform era?

It seems a major reshaping is ongoing in the BI and analytics software market as new-generation solutions keep emerging or getting more robust.

It also seems this transformation, seen from the user’s perspective, is enabling traditional business intelligence tasks to evolve, blurring the lines between traditional BI analysis and data science, helping departments evolve their BI teams more naturally into robust advanced analytics departments, and even easing the educational process those departments must undergo to help their personnel evolve with the times.

It seems we are entering a new era in the evolution of enterprise BI/analytics/data science platforms, one that is about to take over the world. A space worth keeping an eye on, I think.

Analytics with Maple Syrup Flavour: Canadian Data & Analytics Companies. Part 1

Analytics with Maple Syrup Flavour: Canadian Data & Analytics Companies. Part 1

We all know Silicon Valley is the mecca of technology and, of course, this applies to the business intelligence (BI) and analytics market too, as the Valley concentrates many of its vendors.

Still, it is not hard to see that around the world there are tech companies developing innovative technology and software in many areas of the data management space, from consolidated companies to vibrant startups looking to disrupt the market.

While for many people the relevant role some Canadian companies have played in the evolution of the business intelligence (BI) and analytics market is no surprise, for some it is still unclear which companies in Canada are setting the mark for the evolution of Canadian data management technology.

As a brief sample, here are some honorable mentions of Canadian companies that played a key role in the evolution of BI:

  • Former Ottawa-based software company Cognos, a fundamental player in the BI and enterprise performance management software market, acquired by IBM
  • Dundas, a longtime runner that remains a relevant player and sold part of its dashboard and reporting technology to Microsoft, adding to Microsoft’s large reporting and analytics arsenal
  • More recently, Datazen, an innovative mobile BI and data visualization developer, also acquired by Microsoft

So without further ado, here’s a list of some current players making waves in the BI and analytics market:

Core Analytx 
Solution(s): OnTime Analytx 

Based in Markham, Ontario, Core Analytx is the developer of OnTime Analytics, the company’s flagship product and main analytics offering.

With its solution offered in several flavors, including standard (SaaS-based) and enterprise (on-premises, as well as on private cloud), the company aims to encourage, guide, and assist organizations with the implementation of analytics-centric processes.

Core Analytx develops its proprietary technology around the principles of ease of use and self-service, providing practical and efficient analytics products and services to organizations across different industries and lines of business.

Major functions and features offered by OnTime Analytics include:

  • Data ingestion from databases, flat files, mainframes and others
  • Configurable web services for data connectivity
  • Test data Management
  • Basic and advanced analytics features
  • Custom training features
  • Data transformation capabilities
  • Data visualization and publishing capabilities
  • Importing data via a data loader that connects to all standard databases (e.g., SQL Server, MySQL, Oracle)
  • “What-if scenario” capabilities
  • Application customization via developer API
  • Ad-hoc report creation
  • Integration with key partner software, including Oracle Cloud
  • Customized security capabilities

OnTime Analytx’ Screencap (Courtesy of Core Analytx)

Coveo
Solution(s): Coveo Intelligent Search Platform

Coveo is a company with a great deal of experience when it comes to searching for, identifying, and providing contextual information to end users.

Based in Quebec City, the company’s flagship Intelligent Search Platform uses a number of data analysis and management capabilities bundled under its proprietary Coveo AI™ technology. With this technology, Coveo can search, find, and deliver predictive insights across different cloud and on-premises systems.

Already well known for being a provider of enterprise search solutions, the company has expanded its solution to offer much more, using its now cloud-based solution.

Some core functional elements offered within its platform include:

  • Artificial Intelligence(AI)-powered search
  • Relevant search results
  • Advanced query suggestions and intelligent recommendations for website visitors, with automatic relevance tuning to recommend the best content
  • Single sign-on (SSO) for unified search that honors user and group permissions across all enterprise content sources
  • Personal business content, like emails, can only be searched and seen by the individual user, and any designated super-users (e.g. compliance officers)
  • Usage Analytics

Coveo maintains partnerships with key software companies that allow its platform to integrate and work with data from Microsoft, Sitecore, and others.
Coveo’s Screencap (Courtesy of Coveo)

DMTI Spatial
Solution(s): Location Hub Analytics

For over 20 years, DMTI Spatial has been providing industry-leading location economics and Master Address Management (MAM) solutions to Global 2000 companies and government agencies. It is also the creator of CanMap mapping solutions and the award-winning Location Hub. DMTI Spatial is headquartered in Markham, Ontario.

Location Hub Analytics is a self-service data analytics engine that provides robust, accurate, and up-to-date Canadian location-based data.

Relevant functional features of Location Hub Analytics include:

  • Automatically consolidates, cleanses, validates and geocodes your address database
  • Each record is assigned a Unique Address Identifier (UAID™)
  • Quickly processes and analyzes data, to objectively reveal meaningful patterns and trends to help better understand customers and prospects
  • Allows you to visualize and interact with your results on a map for better data profiling
  • Enriches data with Canadian demographic information for further analysis and greater customer intelligence
  • Helps generate new business prospect lists by infilling the addresses within a specific territory that are not in your current database
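The consolidate-cleanse-identify flow described above can be illustrated with a toy example: normalize an address string, then derive a stable identifier from the normalized form so duplicate records collide. The normalization rules and the ID scheme below are hypothetical, not DMTI's actual UAID algorithm.

```python
# Toy address cleansing and stable-ID sketch. Illustrative only:
# the abbreviation table and hash-based ID are hypothetical, not DMTI's UAID.
import hashlib

def normalize(address):
    # Uppercase, strip punctuation, collapse whitespace, expand common abbreviations
    addr = address.upper().replace(",", " ").replace(".", " ")
    expand = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD"}
    return " ".join(expand.get(tok, tok) for tok in addr.split())

def address_id(address):
    # Stable identifier: hash of the normalized form, so duplicates map to one ID
    return hashlib.sha1(normalize(address).encode()).hexdigest()[:12]

a = address_id("123 Main St, Markham")
b = address_id("123  main street, Markham")
print(a == b)  # True: the two spellings consolidate to the same record
```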

Location Hub Analytics (Courtesy of DMTI Spatial)

Dundas
Solution(s): Dundas BI

Dundas is an experienced company in the business intelligence scene. Headquartered in Toronto, the company offers, via its flagship product Dundas BI, a robust BI and analytics platform.

With its BI solution, Dundas aims to give users full control over their data so it can be quickly delivered in the most actionable way. The Dundas BI platform enables organizations to work with data, prepare and transform it, and subsequently explore it visually within dashboards, reports, and visual data analytics tools.

Also worth mentioning is that Dundas’ success relies on its ability to build a solution with a wide range of built-in functionality and a rich set of open APIs.

Main functional features include:

  • Customizable dashboards
  • Communication and collaboration tools
  • Slideshows
  • Rich, interactive Scorecards
  • Ad-hoc reporting
  • Mobile features
  • Predictive and advanced data analytics
  • Embedded BI with seamless data integration
  • Support for Windows authentication
  • Multi-tenancy support

Dundas BI’s Screencap (Courtesy of Dundas)

Panorama Software
Solution: NECTO

Necto is Panorama Software’s full BI and analytics solution. Headquartered in Toronto, with offices in the US, UK, and Israel, the company develops a business intelligence and analytics solution that offers automated analysis and recommendations that are easily disseminated throughout the organization.

With a fully customizable layout that can be adapted to each organization’s language, and with easy point-and-click functionality, Panorama aims for Necto to take collaboration to the next level with business intelligence reporting tools that communicate real data.

Key features offered in Necto include:

  • Centrally administered, fully web based system
  • Fully functional dashboard design capabilities & simplified info-graphics
  • Automated analysis & hidden insights
  • Easy sharing of BI content
  • High security & scalability
  • Powered with KPI alerts
  • Mashup data from multiple sources
  • Simple & fast development

Necto’s Screencap (Courtesy of Panorama Software)

Semeon
Solution: Semeon Insights

With a great deal of machine learning and artificial intelligence (AI) R&D experience within its corridors and offices, Montreal-based Semeon develops Semeon Insights, a next-generation, cloud-based “AI linguistic” text analytics platform that serves businesses interested in better understanding what is being said about their brand, company, products, staff, competitors, and more.

All of Semeon’s solutions are developed using its series of patented Semantic Algorithms, which can determine the sentiment, intent, and predictive behaviors of clients, buyers, or customers.

Key features offered by Semeon Insights include:

  • Sifts through public data (social media, forums, blogs, review sites) as well as private data (CRM data, customer service data) to enhance customer-driven campaigns
  • Uses a number of techniques to uncover key insights and influencers, including:
    • Sentiment Analysis
    • Concept Clouds
    • Timeline Tracking
    • Content Classification
    • Sources/channels
    • Influencer identification
    • Data Visualization
    • Geolocation
    • Intent Analysis
  • Leverages concepts and opinions that drive public perception to fuel content creation teams and boost ROI, as well as to glean insights from competitors’ digital campaigns
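To make the sentiment analysis technique listed above concrete, here is a toy lexicon-based scorer. Platforms like Semeon use trained semantic models; this word list and scoring rule are purely illustrative.

```python
# Toy lexicon-based sentiment scorer. Real text analytics platforms use
# trained models; these word sets and the scoring rule are illustrative only.
POSITIVE = {"great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "slow", "broken", "hate"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this brand, support is great"))          # positive
print(sentiment("shipping was slow and the box arrived broken"))  # negative
```

Even this crude approach shows why sifting public reviews and social posts at scale is valuable: each mention becomes a labeled signal that can be aggregated per brand, product, or competitor.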

Semeon’s Screencap (Courtesy of Semeon)

Ahh! And There’s More

So, in the second part of this series I will include some other startups and projects that will catch your attention with their innovation and the opportunity to either use them or build businesses with them.

In the meantime, and considering I might be leaving some companies out, please feel free to let me know your comments or the name of a new Canadian analytics solution we should all know about.

* All logos are trademarks of their respective owners
Teradata Aims for a New Era in Data Management with its New IntelliSphere Offering

Teradata Aims for a New Era in Data Management with its New IntelliSphere Offering

As Teradata continues to expand its Teradata Everywhere initiative, major announcements came out of its 2017 Partners conference: along with its brand new analytics platform, the company also unveiled a comprehensive software portfolio that adds the data management power needed behind the analytics scenario.

According to Teradata, IntelliSphere is “a comprehensive software portfolio that unlocks a wealth of key capabilities for enterprises to leverage all the core software required to ingest, access, deploy and manage a flexible analytical ecosystem”.

(Image courtesy of Teradata)

Teradata IntelliSphere is intended to complement the ongoing Teradata Everywhere initiative, serving as a natural companion to the Teradata Analytics Platform and as an important tool for enabling users across the organization to use their preferred analytic tools and engines across data sources at scale, while having all the necessary components to ensure efficient data management from ingestion to consumption.

According to Oliver Ratzesberger, Executive Vice President and Chief Product Officer at Teradata:

“With IntelliSphere, companies no longer need to purchase separate software applications to build and manage their ecosystem. Companies can design their environment to realize the full potential of their data and analytics today, with a guarantee that future updates can be leveraged immediately without another license or subscription.”

Available for purchase now, the IntelliSphere software portfolio includes a series of key capabilities to ensure efficiency across the entire data process:

  • Ingest, so companies can easily capture and distribute high-volume data streams, with a ready-to-run elastic architecture and quick access for business-critical analysis.
  • Access, so companies can gain easy access to data stored in a hybrid cloud or heterogeneous technology environment.
  • Deploy applications and analytic models for easy user access and enterprise collaboration.
  • Manage, to allow ad-hoc data movement, as well as ongoing monitoring and control via an operational interface.

According to the data management company, Teradata IntelliSphere is composed of ten software components.

Finally, the company mentions that in the future all new software releases will become part of the IntelliSphere bundle, a logical step towards building a consistent and more homogeneous analytics ecosystem that can help Teradata provide simplicity and functional power to its user base.

As I mentioned in another blog in this same vein, it seems we are facing a new stage in the analytics and data management software market, in which software companies are fully renovating their offerings to consolidate as many functions as possible within single enterprise platforms that blend all analytics needs with a robust data engine.

In future posts I’ll try to bring more information about this and the rest of Teradata’s new offerings, so stay tuned.
Teradata includes brand New Analytics Platform to its Teradata Everywhere Initiative

Teradata includes brand New Analytics Platform to its Teradata Everywhere Initiative

(Image Courtesy of Teradata)
In a recent announcement made during its 2017 Partners conference, data management software provider Teradata made an important new addition to its global Teradata Everywhere initiative: a brand new analytics platform.

The new offering, to be available for early access later this year, aims to let users work in the analytics environment of their choice. According to the company, the platform is planned to provide access to a myriad of analytics functions and engines so users can develop full analytics processes and business solutions using the tools they prefer. Initially, the platform will natively integrate with Teradata and Aster technology (Figure 1), and in the near future it will integrate with leading analytics engines including Spark, TensorFlow, Gluon, and Theano.

Figure 1.  Aster Analytics Functions (Courtesy of Teradata)

As corporate data is increasingly captured and stored in a wider number of formats, the platform includes support for several data types from multiple data sources, from traditional formats to newer social media and IoT formats, including text, spatial, CSV, and JSON, as well as Apache Avro and other open-source data types that allow programmers to dynamically process data schemas.

As part of a new set of functional features, the Teradata Analytics Platform provides different scalable analytic functions, such as attribution, path analytics, and time series, along with a number of statistical, text, and machine learning algorithms.

With support for multiple languages, including Python, R, SAS, and SQL, and for tools like Jupyter, RStudio, KNIME, SAS, and Dataiku, Teradata expects experienced users to work in their tool of choice, not just to develop with less disruption but to promote efficiency through code and model reuse via Teradata’s AppCenter, which allows analysts to share analytic applications and deploy reusable models within a web-based interface.
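As a rough illustration of the path analytics mentioned above, the sketch below counts the most common sequence of events ending in a purchase. In Teradata/Aster these are built-in, scalable SQL functions (nPath-style analysis); this standalone Python version only conveys the idea.

```python
# Toy "path analytics": find the most common event sequence across sessions.
# Teradata/Aster expose this as scalable SQL functions; this sketch only
# conveys the idea on a tiny in-memory dataset.
from collections import Counter

sessions = [
    ("u1", ["search", "view", "cart", "buy"]),
    ("u2", ["search", "view", "buy"]),
    ("u3", ["search", "view", "cart", "buy"]),
]

paths = Counter(" -> ".join(seq) for _, seq in sessions)
top_path, count = paths.most_common(1)[0]
print(top_path, count)  # search -> view -> cart -> buy 2
```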

According to Oliver Ratzesberger, Teradata’s executive vice president and chief product officer:

“In today’s environment different users have different analytic needs, this dynamic causes a proliferation of tools and approaches that are both costly and silo-ed. We solve this dilemma with the unmatched versatility of the Teradata Analytics Platform, where we are incorporating a choice of analytic functions and engines, as well as an individual’s preferred tools and languages across data types. Combined with the industry’s best scalability, elasticity and performance, the Teradata Analytics Platform drives superior business insight for our customers.”

According to Teradata, the benefits offered by the new analytics platform include:

  • Simplified data access to both data warehouses and data lakes
  • Faster data preparation with embedded analytics
  • Fast and easy access to cutting-edge advanced analytics and AI technologies
  • Support for preferred data science workbenches and languages like R, Python, and SQL
  • Help in making prescriptive analytics operational to enable autonomous decisioning
  • Minimized risk for existing analytical architectures with Teradata Everywhere

More importantly, the announcement of the new analytics platform came alongside the announcement of Teradata’s new comprehensive software portfolio initiative, IntelliSphere, the company’s new proposal for easy data access, ingestion, deployment, and management.

According to Teradata, the new platform is planned to be delivered flexibly on-premises or via public, private, and managed cloud options, all of which will use the same software.

Teradata is definitely aiming to be everywhere

Teradata seems to have understood how important it is, and will continue to be, to offer new software solutions built on more open and agile architectures that play well with others yet remain solid and secure. Other data management companies are already exploring and adopting this approach, as is the case with, among others, Cloudera and its new Data Science Workbench, and SAS with its Open Analytics Platform.

It seems we are facing a new stage in the analytics and data management software market, in which software companies are reshaping their offerings to consolidate as many functions as possible within single enterprise platforms that blend all analytics needs with a robust data engine.
In the meantime, I’m personally eager to see the new Teradata Analytics Platform in action.

The BBBT Sessions: Outlier, and the Importance of Being One

The BBBT Sessions: Outlier, and the Importance of Being One

It has been some time since my last write-up about my briefings with the Boulder Business Intelligence Brain Trust (BBBT); multiple business engagements and, yes, perhaps a bit of laziness can be blamed for it.

Now I firmly intend to resume coverage of this series of great analyst sessions on a more regular basis, hoping, of course, that my hectic life will not stand in my way.

So, to resume my coverage of this great series of sessions with software vendors and analysts, I have picked one that, while not that recent, was especially significant for the BBBT group and the vendor itself. I’m talking about a new addition to the analytics and BI landscape called Outlier.

Members of the BBBT, myself included, had the pleasure of witnessing the official launch of this new analytics and business intelligence (BI) company and its solution.

Outlier presented its solution to our analyst gathering in an appealing session. Here is a summary of the session and some information about this newcomer to the BI and analytics space.

About Outlier

Outlier, the company, was founded in 2015 in Oakland, CA by seasoned tech entrepreneur Sean Byrnes (CEO) and experienced data scientist Mike Kim (CTO), with funding from First Round Capital, Homebrew, and Susa Ventures.

After devoting more than a year to developing the new solution, Outlier kept it in beta through most of 2016 and finally released it in February 2017, aiming to offer users a unique approach to BI and analytics.

With its product named after the company, Outlier aims to be, well, precisely that, by offering a different approach to analytics, one that:

“Monitors your business data and notifies you when unexpected changes occur.”

This means that, rather than taking a reactive approach in which the system waits for the business user to launch the analytics process, the system takes a proactive approach and signals or alerts when these changes occur, triggering action from analysts.

Now, to be honest, this is not the first time I have heard this claim from a vendor. Frankly, as many modern BI solutions incorporate increasingly sophisticated alerting mechanisms, I’m less concerned with hearing the claim and more with discovering how each software provider addresses the issue of making analytics and BI solutions truly proactive.

During the session, Sean Byrnes and Doug Mitarotonda, CEO and Head of Customer Development respectively, gave us a great overview of Outlier’s new approach to BI and analytics. Here is a summary of the briefing.

Outlier AI and a New Analytics Value Chain

Being data scientists themselves, Outlier’s team understands the hardships, complexities, and pains data scientists and business analysts undergo to design, prepare, and deploy BI and analytics solutions. Outlier was born from this understanding, aiming to provide a fresh approach to business intelligence.

Rather than creating dashboards or running queries against business data, Outlier’s approach is to watch business data consistently and automatically and to alert users when unexpected changes occur.

Outlier connects directly to a number of business data sources, such as Google Analytics, Adobe Cloud, Salesforce, Stripe, SQL databases, and many others, and then automatically monitors the data and alerts on unexpected behavior.

Along with the ability to proactively monitor business data and alert on changes, Outlier can sift through metrics and dimensions to understand and identify business cycles, trends, and patterns, automating the business analysis process and, consequently, positioning itself within a new generation of BI solutions (Figure 1).

Figure 1. Outlier positioning itself as new-generation BI (Courtesy of Outlier)
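The core idea of watching a metric and flagging unexpected changes can be sketched in a few lines. The following is purely my own illustration, not Outlier’s actual algorithm: it flags a value when it strays several standard deviations from its recent history.

```python
# Illustrative sketch (not Outlier's algorithm): flag a metric value as
# "unexpected" when it deviates strongly from its trailing window of history.
from statistics import mean, stdev

def unexpected_changes(series, window=7, threshold=3.0):
    """Return (index, value) pairs deviating more than `threshold`
    standard deviations from the trailing `window` of observations."""
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            alerts.append((i, series[i]))
    return alerts

# A stable daily-signups metric with one sudden spike on day 10:
signups = [100, 102, 98, 101, 99, 103, 100, 101, 99, 102, 250, 100]
print(unexpected_changes(signups))  # the spike at index 10 is flagged
```

A real product would of course handle seasonality, trends, and business cycles rather than a flat baseline, but the proactive pattern is the same: the system watches and raises its hand; the analyst does not have to ask.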

During the BBBT session, one key point brought up by Sean Byrnes was that the company’s leadership understands the analytics and BI market is changing, and yet many companies still struggle, not with the availability of data, but with the questions themselves, as the analytics formulation process becomes increasingly complex.

According to the company, as part of automating the monitoring and analytics process and easing users’ regular monitoring routine, once deployed Outlier can provide daily headlines from key business dimensions. This lets users ask their critical questions knowing there will be a regular answer, while still being able to formulate new questions to keep discovering what is important (Figure 2).

Figure 2. Outlier positioning itself as new-generation BI (Courtesy of Outlier)
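To give a sense of what a “daily headline” might amount to in practice, here is a sketch of my own (invented names, not Outlier’s implementation): turn the day-over-day movement of each key metric into a short plain-language sentence, reporting only changes big enough to matter.

```python
# Hypothetical sketch of the "daily headline" idea: compare today's metrics
# with yesterday's and describe any movement above a relative threshold.
def daily_headlines(today, yesterday, min_change=0.10):
    """Compare two {metric: value} dicts; describe changes >= min_change."""
    lines = []
    for metric, value in today.items():
        prev = yesterday.get(metric)
        if not prev:
            continue  # skip new or zero-valued metrics
        change = (value - prev) / prev
        if abs(change) >= min_change:
            direction = "up" if change > 0 else "down"
            lines.append(f"{metric} is {direction} {abs(change):.0%} vs. yesterday")
    return lines

print(daily_headlines(
    {"web sessions": 5800, "orders": 410, "refunds": 12},
    {"web sessions": 5000, "orders": 405, "refunds": 11},
))  # only the 16% jump in web sessions makes the headline
```

The value of the approach is in the filtering: the user reads two or three sentences a day instead of scanning dozens of dashboards for movement.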

Interestingly, I find this process useful, especially to:
  • Carry out common data analysis and reporting tasks and, above all, truly automate the analytics process so it can detect when a significant change occurs.
  • Take a proactive approach that encapsulates the complexities of data management and presents insights in a way that lets users make business decisions, that is, act on data.
  • Filter data to recognize what is important to know when making a decision.

Outlier: It is not Just About the Common, but the Uncommon

Today, many organizations can know how much they sold last month or how much they spent last quarter. Those are relevant yet common questions that can be answered with relative ease. Increasingly, however, it is also about discovering not just answers but new questions that can unveil key insights, opportunities, and risks.

Outlier identified this as a key need and acted on it, knowing that building the infrastructure to achieve it can be far from trivial: it often forces organizations to radically modify existing traditional BI platforms to accommodate new or additional analytics capabilities (predictive, mining, etc.) that may or may not fit well with the BI solutions already in place.

Outlier aims to automate this process by connecting directly to the various sources a business analyst takes data from and guiding the analyst through an automated monitoring process.

One aspect of Outlier worth mentioning is how the company strives to augment rather than replace the capabilities of existing analytics and data management solutions, fitting into a specific point of what the company calls the analytics value chain (Figure 3).

Figure 3. Outlier’s Analytics Value Chain Proposition (Courtesy of Outlier)

Other relevant aspects shown during the demo session include headlines, dashboards, and scorecards that nicely combine graphical and textual information (Figure 4), as well as a large set of connectors for different data sources, including traditional databases and social media sources.

Also worth mentioning is the effort Outlier is making to educate potential users in the field of BI and analytics across different industries and lines of business: a section of its portal offers helpful information ranging from how to analyze customer acquisition cost to performing customer segmentation.

Figure 4. Outlier’s Screencap (Courtesy of Outlier)

Outlier and a New Generation of BI and Analytics Solutions

As part of a new wave of solutions providing analytics and BI services, Outlier is constantly working on introducing new technologies and techniques to the common portfolio of data analysis tasks, and it seems to have plenty of appealing functions and features to modernize the current state of analytics.

Of course, Outlier will face significant competition from incumbents already in the market, such as Yellowfin, Board, AtScale, and Pyramid Analytics. But if you are searching for, or just curious about, new analytics and BI offerings, it might be a good idea to check out this solution, especially if your organization requires an innovative and agile approach to analytics with full monitoring and alerting capabilities.

Finally, you can start by checking, aside from its website, some additional information right from the BBBT, including a nice podcast and the session’s video trailer.

Book Commentary: Predictive Analytics by Eric Siegel


As much as we’d like to imagine that the deployment and use of predictive analytics has become a commodity for every organization and is in use in every “modern” business, the reality is that many small, medium, and even large organizations are still not using predictive analytics and data mining solutions as part of their core business software stack.

The reasons can be plenty: insufficient time, budget, or human resources, as well as a dose of inexperience and ignorance of its real potential benefits. These and other reasons came to mind when I had the opportunity to read Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, written by former Columbia University professor and founder of the Predictive Analytics World conference series, Eric Siegel.

Aside from being a clear, well-written book filled with examples and bits of humor to make it enjoyable, what makes this book stand out in my view is that it is written mostly for a general audience, in plain English, which makes it a great option for those new to the field to fully understand what predictive analytics is and its potential effects and benefits for any organization.

With plenty of industry examples and use cases, Mr. Siegel neatly introduces the reader to the world of predictive analytics: what it is and how this discipline and its tools are currently helping an increasing number of organizations, the likes of Facebook, HP, Google, and Pfizer, among other big players in their fields, discover hidden trends, predict, and plan to make better decisions with data.

Another great aspect of the book is its clear and easy explanation of important current topics, including data mining and machine learning, as a gateway to more advanced topics such as artificial intelligence and deep learning. It also does a good job of mentioning the caveats and dangers of making wrong assumptions when using predictive analytics.

I especially enjoyed the central section of the book, filled with examples and use cases of predictive analytics in different industries and lines of business, healthcare, finance, and law enforcement among others, as well as the list of resources at the end of the book.

Of course, as a practitioner for many years, I was left wanting a bit more technical and theoretical detail. Still, the book is a great introductory reference both for novices who need to grasp the full potential of predictive analytics and for those familiar with the topic who want to know what their peers are doing, expanding their view of how predictive analytics can be applied in their organization.

If you are still struggling to understand what predictive analytics is and what it can offer your organization to improve your decision-making and planning abilities, or you want a fresh view of the new use cases for this discipline and its software solutions, Predictive Analytics by Eric Siegel is certainly a reference to consider having on your physical or virtual bookshelf.

Have you read the book? About to do it? Don’t be shy, share your comments right below…

BOARD International: Cognitive, Mobile, and Collaborative


Business Intelligence (BI) and Enterprise Performance Management (EPM) software provider BOARD International recently released version 10.1 of its all-in-one BI and EPM solution. This release includes new user experience, collaboration, and cognitive capabilities, which will enable BOARD to enter the cognitive computing field.

By incorporating all these new capabilities into its single BI/EPM offering, BOARD continues to uphold its philosophy of offering powerful capabilities within a single platform.

With version 10.1, BOARD aims to significantly improve the way users interact with data. The new version’s interface introduces new user interaction functionality in areas such as user experience and storytelling, and is a major improvement over the previous version’s.

BOARD gave me an exclusive overview of the main features of version 10.1 and the company's product roadmap. Read on for details.

Getting On Board with Cognitive Technologies

With version 10.1, BOARD seems to be making its solution fit for a new era centered on machine learning. The solution uses natural language recognition (NLR) and natural language generation (NLG) capabilities to offer users new ways to interact with data (see Figure 1).

Figure 1. BOARD’s assistant (image courtesy of Board International)

For instance, users can now create an entire report in a drag-and-drop interface. They can also ‘talk’ to the system directly through spoken and written language. The system uses search-like strings, automatically translating human speech into words, words into queries, and queries into reports that include the most important insights from the source information.

One key aspect of these features is that users can create a report by simply writing a search string or request. Specifically, BOARD uses a fuzzy search mechanism that matches character sequences that are similar, not just identical, to the query term, and transforms the request into a machine-generated report (Figure 2).

Figure 2. BOARD’s machine-generated report analysis (image courtesy of Board International)
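Python’s standard library gives a glimpse of how fuzzy matching of this kind can work in principle. The sketch below is entirely my own illustration, with an invented report catalog and function; BOARD’s actual mechanism is surely more sophisticated.

```python
# Toy fuzzy search over report names: a misspelled, partial query still
# finds close matches, mirroring the idea of matching similar (not only
# identical) character sequences.
import difflib

# Hypothetical report catalog; names are invented for illustration.
report_catalog = [
    "Quarterly Sales by Region",
    "Monthly Revenue Forecast",
    "Inventory Turnover Analysis",
]

def find_reports(query, catalog, cutoff=0.6):
    """Return catalog entries whose names are similar to the query string."""
    lowered = {name.lower(): name for name in catalog}
    hits = difflib.get_close_matches(query.lower(), lowered, n=3, cutoff=cutoff)
    return [lowered[h] for h in hits]

# A misspelled, incomplete query still locates the intended report:
print(find_reports("quartely sales region", report_catalog))
```

The same idea underpins the report-reuse capability described next: matching a free-form request against work that already exists rather than building from scratch.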

BOARD can also identify, recover, and list reports that match the search criteria, such as reports generated by other users. This capability speeds up the solution development process by enabling users to identify existing work that can be used for a new purpose.

In-context Collaboration

BOARD has also improved its collaboration strategy, specifically by facilitating communication between users. The vendor has introduced an in-context collaboration feature that enables users to share their analyses, communicate via live chat, and co-edit and co-author reports in a single interface. Embedded security (Figure 3) ensures users have the right level of access and defines user groups. This enables users to share analytics securely and seems to improve the overall analysis of data and the development of analytics apps.

Figure 3. BOARD’s embedded collaboration features (Courtesy of Board International)

User Experience and Storytelling

BOARD is also continuing to focus heavily on customer experience and functional efficiency.

The latest version of BOARD’s BI and EPM platform has redesigned user interfaces, including a color-coded tile menu with icons to improve hierarchy management and touchscreen usability. In addition, the configuration panel now offers more time and analytics functions.

10.1 also introduces Presentations—a new storytelling capability that enables users to personalize their reports and save them as a live presentation. This enables users to share presentations that incorporate live information rather than static images and graphs with other users and groups, improving user collaboration.

This new feature lets BOARD stay up to date with current trends in BI and compete with other players in the field that already offer similar capabilities, such as Tableau and Yellowfin.

Mobility, Cognitive Capabilities, and Collaboration: BOARD’s Bet for the Future

BOARD also explained that it's paving the way for medium- and long-term product advancements.

In its latest release, BOARD has ensured its HTML5-based client will replicate all the functionality of its existing Windows client interface in the future. This will enable users to choose between mobile and desktop devices.

10.1 also introduces new mobile apps and add-ons, which widen BOARD’s intrinsic analytics and data management capabilities as well as the solution’s mobile functions and features. The company is also reinforcing the product’s interaction with the Microsoft Office software stack in a continuous effort to help users increase productivity. This will help users conduct BI and EPM analysis more easily, as they will have access to embedded analytics services within standard Office applications such as Word and Excel.

Lastly, 10.1 also includes more features for accessing big data sources and cloud-based technologies, and BOARD has partnered with a leading cloud CRM and business software provider. It’s also worth noting that BOARD is now expanding its North American presence. Specifically, the vendor is increasing its human and material resources to reinforce its marketing and sales efforts and its support and services capabilities.

BOARD 10.1 offers a good balance of analytics and enterprise performance management capabilities. It could be a solution for those looking to start using analytics or enhance their existing analytics capabilities.

(Originally published on TEC's Blog)
2017 Teradata Influencer Summit: Blending In on the New Management Era


A couple of weeks ago I was fortunate to be invited to attend Teradata’s 2017 Influencer Summit in the beautiful venue the company chose in La Jolla, California. Aside from the venue, a great event took place, one that was insightful, interesting, and, well, fun. It confirmed Teradata’s evolution on both the technical and business sides, and confirmed that the IT and software industry has changed radically in the last couple of years.

Since last year’s Partners conference and influencer events, Teradata keeps moving forward with its evolution to adapt to the new business and technical dynamics of the market. This year’s event allowed analysts, pundits, and influencers alike to glimpse what Teradata is doing to deliver value to existing and new customers.

More Analytics, More Integration, More Scale...

Teradata is making sure its offerings are available in all shapes and forms, more precisely in all major cloud and on-premises flavors, as part of its Teradata Everywhere strategy. This includes launching Teradata in the Azure marketplace and increasing geographic coverage for its own Managed Cloud. At the same pace, the company is working to rapidly adjust to business and industry changes to continuously improve solution delivery and services.

Right from the get-go, John Dining, Teradata’s Executive Vice President and Chief Business Officer, gave us a clear overview of how the enterprise analytics and data management software provider is working along different strategic paths to ensure the company remains at the top of its market segment.

John Dining presenting at Teradata's 2017 Influencer Summit Event

One key and noteworthy aspect of this overall strategy is Teradata’s bold approach and continuing effort to match its product development with a coherent business proposal via three areas:

  • Reinforcing its multi-genre analytics strategy, which means widening its analytics capabilities to strengthen users’ abilities in areas such as text, path, and graph analysis, among others.
  • Bolstering Teradata’s power to perform more versatile and flexible data movement and integration operations, supporting an increasing number of sources and complex data operations. This includes increasing Teradata’s ability to incorporate intelligence and automation into data management operations, as well as developing vertical solutions for specific areas such as communications and finance, and for lines of business like marketing and DevOps.
  • Increasing Teradata’s ability to scale according to customers’ needs, especially for those with big data management demands.

One important takeaway, in my view, is Teradata’s clear path from a technical perspective, focusing on real technical challenges faced by a majority of organizations while, at the same time, changing its message to be less technical and more business oriented, providing clarity especially to the enterprise market, a market Teradata knows perfectly well.

Blended Architectures Are the Future. Oh! And Yes, They Need Service

At a time when organizations seem increasingly reluctant to invest in consulting services and keen to look for vanilla deployment solutions, Teradata seems to be taking a more realistic approach.

On one hand, it is putting specific measures in place to reinforce its services business; on the other, it clearly acknowledges that blended architectures and hybrid deployments will be the norm in the coming years, or at least for the time being, which means high-quality consulting and services can be key to success, especially in complex analytics deployment scenarios.

Beyond its incumbent software solutions, by restructuring its service and consulting areas, Teradata aims to be better positioned to act on these complex deployments that require specialized services.

According to Teradata, the company has been consolidating its services areas via important acquisitions, such as Think Big, Claraview, and Big Data Partnership, and working to integrate them into a coherent service model, the Teradata Global Services initiative.

The initiative rests on three main areas:

  • Think Big Analytics, the global analytics consultancy group, with expertise in areas such as data science, solution development, and data visualization for different industries and functions.
  • Enterprise Data Consulting, the technology-enabled group with strong expertise in analytical ecosystems, providing services ranging from architecture to data management and governance, managed services, and security.
  • Customer Services, the group responsible for the value and availability of analytic platforms via change management services, with expertise in systems and software management.

The strategy seems well complemented by a complete business value framework that, aside from a comprehensive analytics strategy for customers and education, includes Teradata’s Rapid Analytic Consulting Engagement (RACE) strategy, aimed at helping customers leverage comprehensive solutions in a matter of weeks and providing “agile” development models for its customers.

Teradata’s approach seems to make perfect sense, enabling the company to grow efficiently on the technology side, especially toward a hybrid cloud approach, while ensuring it offers high-quality consulting services.

Now, can this approach carry challenges for the company?

It is possible. Perhaps one challenge for Teradata will be ensuring successful delivery in areas where being “agile” is a must, particularly big data and data science projects, which more often than not require fast deployment times. Teradata will need to make sure its consulting, education, and other service offerings are fine-tuned and in tune with the evolution of its own software and hardware offerings.

To this end, the company is working to consolidate its technical and business messaging around its strategy: offering hybrid cloud solutions, business analytics solutions, and full-fledged ecosystem architecture consulting.

Part of this strategy includes, aside from reinforcing its go-to-cloud strategy, accelerating its release calendar to offer three major releases a year for its flagship Teradata Database, reinforcing its IntelliFlex data warehouse appliance with new functionality, launching Teradata IntelliBase, the company’s compact environment for data warehousing, and continuing to evolve IntelliCloud, the company’s secure managed cloud offering.

So, on the Big Picture...

Many more things happened and were revealed by Teradata, both publicly and under disclosure. From a personal view, the relevant story that sticks with me is how Teradata is keeping its transformation at a pace and in a form that maintains a fine balance between its more “traditional” data management customers and its new ones, serving both those in the “typical” data warehousing and analytics space and those that require innovation via new advanced analytics and big data ecosystems.

Challenges may still lie ahead for Teradata due to increased and fiercer competition, but the data warehousing company seems to be adapting well to the new data management era.

DomoPalooza 2017: Flare, Stravaganza…and Effective Business Management


Logo courtesy of DOMO, Inc.
When you decide to show up at Domopalooza, Domo’s big user event, you don’t know for sure what you will find, but from the very beginning you can feel that you’ll have a unique experience. From the individual sessions and training, the partner summit, and the concert line-up, to whatever might come from Domo’s rock-star CEO Josh James, who is certainly one of a kind in the software industry, you know you’ll witness a delightful event.

This year, to the strings of Styx, Mr. James kicked off an event that amalgamated business, entertainment, fun, and work in a unique way: a very Domo way.

With no more preambles, here is a summary of what happened during Domo’s 2017 DomoPalooza user conference.

Josh James at DomoPalooza 2017 (Photo courtesy of DOMO)
Key Announcements

Before entering the subjective domain of my opinion about Domo’s event and solutions, let’s take a minute to pinpoint some of the important announcements made before and during the event:
  • The first news came some days before the user event, when Domo announced its new model for rapid-deployment dashboards. This solution consists of a series of tools that accelerate and ease the dashboard deployment process. From its large number of connectors to diverse data sources to a set of pre-installed, easy-to-configure dashboards, this model will enable developers to deploy dashboards quickly and easily so that decision makers can use them effectively.
  • The next important announcement occurred during the conference, when Domo came out with the release of Mr. Roboto, DOMO’s new set of capabilities for machine learning, predictive analytics, and predictive intelligence. According to DOMO, the new offering will be fully integrated within DOMO’s business cloud, aiming for fast and non-disruptive business adoption. Two major features of Mr. Roboto are the Alerts Center, a personalized visual console powered by advanced analytics functionality to provide insights and improve decision making, and a data science interface that enables users to apply predictive analytics, machine learning, and other advanced analytics algorithms to their data sets. This is for sure one product I’m looking forward to analyzing further!
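To make the idea of a predictive alert concrete, here is a toy sketch of my own (the function names, thresholds, and data are invented; this is not Mr. Roboto’s API): fit a straight-line trend to recent values and raise an alert when the forecast crosses a floor before the metric itself does.

```python
# Toy illustration of a predictive alert (invented names, not Mr. Roboto's
# API): fit a least-squares trend line to recent values and warn when the
# forecast for the next period falls below a floor.
def forecast_next(values):
    """Least-squares fit over equally spaced points; predict the next value."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values)) / \
            sum((x - x_mean) ** 2 for x in range(n))
    return y_mean + slope * (n - x_mean)  # extrapolate to x = n

def predictive_alert(metric, values, floor):
    """Return an alert string if the forecast dips below `floor`, else None."""
    forecast = forecast_next(values)
    if forecast < floor:
        return f"ALERT: {metric} trending toward {forecast:.0f}, below floor {floor}"
    return None

# Declining weekly active users: the trend projects below the floor of 900,
# so the alert fires before the metric actually crosses it.
print(predictive_alert("weekly active users", [1000, 970, 935, 900], 900))
```

The point of surfacing this in a console like the Alerts Center is timing: the decision maker hears about the trend while there is still room to act on it.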

The introduction of new features, especially those directed at narrowing the technical-business gap within the C-suite and giving decision makers easier, customized access to insights, will enable business management and monitoring using DOMO. These features include:
  • Annotations, so information workers and decision makers can highlight significant insights on top of a chart or data point.
  • Enhancements to its Analyzer tool, incorporating a visual data lineage tool to enable users to track data from source to visualization.
  • Data slicing within DOMO’s cards to create more guided analysis paths business users and decision makers can take advantage of. 
  • More than 60 chart families to enhance the rich set of visual options already within DOMO’s platform. 

DOMO’s new features seem to fit well within a renewed effort by the company to address bigger enterprise markets and increase its presence in segments traditionally occupied by other enterprise BI contenders.

It may also signal DOMO’s necessary adaptation to a market currently racing to include advanced analytics features that address larger and newer user footprints within organizations, such as data scientists and a new, more tech-savvy generation of information workers.

There is much more behind Domo’s Curtains

Perhaps the one thing I enjoyed most about the conference was having a continuous sense of discovery, different from previous interactions with DOMO, which somehow left me with a sense of incompletion. This time I had the chance to discover that there is much more to DOMO behind the curtains.

Having a luminary CEO such as Josh James can be a two-edged sword. On one side, his glowing personality has served well to enhance DOMO’s presence in a difficult and competitive market. Josh has the type of personality that attracts, creates, and sells the message, and without a doubt drives the business.

On the other, however, if not backed and handled correctly, his strong message can create some scepticism, making some people think a company is all about the message and less about its substance. But this year’s conference helped me discover that DOMO is much more than what can be seen on the surface.

Not surprisingly, Josh and Chris Harrington, savvy businessmen and smart guys, have been keen to develop DOMO’s business intelligence and analytics capabilities to achieve business efficiency, working to translate technical complexity into business-oriented ease of use. To achieve this, DOMO has put together, on the technical side, a very knowledgeable team led by Catherine Wong and Daren Thayne, DOMO’s Chief Product Officer and Chief Technology Officer respectively, both with wide experience ranging from cloud platforms and information management to data visualization and analysis. On the business side, an experienced team that includes tech veterans Jay Heglar and Paul Weiskopf leads strategy and corporate development, respectively.

From a team perspective, this balance between tech experience and business innovation seems to be paying off: according to the company, it has been growing steadily and gaining the favour of big customers such as Target, Univision, and Sephora, some of which were present during the event.

From an enterprise BI/Analytics perspective, it seems DOMO has achieved a good balance in at least two major aspects that ensure BI adoption and consumption:

  • The way BI services can be offered to different user groups— especially to the C-level team— which requires a special degree of simplification, but at the same time an efficiency in the way the data is shown.
  • The way BI services can encapsulate complex data processing problems and hide them from the business user. 

On this topic, we had the chance during the conference to see examples of these aspects, both onstage and offstage. One was with Christel Bouvron, Head of Business Intelligence at Sephora Southeast Asia, who commented the following regarding the adoption and use of DOMO:

“We were able to hook in our data sets really quickly. I had sketched out some charts of what I wanted. They didn’t do that, but what they did was even better. I really liked that it wasn’t simply what I was asking for – they were trying to get at the business problem, the outcomes we were trying to get from it, and think about the bigger picture.”

A good example of the shift DOMO wants to convey is that it is now moving from addressing a business problem from a technical perspective to addressing it from a business perspective, with a technical platform in the background to support it. Of course, this needs to come with the ability to effectively encapsulate technical difficulties in a way that is efficient and consumable for the business.

Christel Bouvron at DomoPalooza 2017 (Photo courtesy of DOMO)

It was also good to hear customers acknowledge that the process wasn’t always smooth, but that it helped trigger an important cultural shift within their organizations.

The takeaway

Attending Domopalooza 2017 was informative and very cool indeed. DOMO’s team showed me a thing or two about the true business of DOMO and its interaction with real customers, including the fact that DOMO is not a monolithic solution. Beyond its already rich set of features, it offers key customization capabilities that give individual customers their own ways to solve their problems. While DOMO is a software rather than a services company, customers expressed satisfaction with the degree of customization and services DOMO provides; this was especially true of large companies.

DOMO has done a great job of simplifying the data consumption process so that data feeds are easily digestible. The solution concentrates on the business problem rather than the technical one, giving many companies the flexibility and time to make the development of business intelligence solutions more agile and effective. Although these results might not be fully achieved in all cases, DOMO’s approach can certainly help organizations benefit from a more agile and faster deployment process and, thus, become more efficient and productive.

Despite being a cloud-based software company, DOMO seems to understand quite well that a great number of companies work, by necessity or by choice, in hybrid cloud/on-premises environments. Its platform enables customers to easily connect to and quickly interact with on-premises systems, whether through a simple connection to a database/table source or through more sophisticated data extraction and transformation specifications.

There is no way that in the BI and analytics market a company such as DOMO, or any other player, gets a free ticket to success. The business intelligence market is diversifying as an increasing number of companies seem to need its services, but DOMO’s offering is, by all means, one to consider when evaluating a new-generation BI solution to meet the increasing demand for insights and data analysis.

Finally, well... what better excuse to watch Styx's Mr. Roboto than this?

(All photos credited to Domo, Inc.)
A D3 Image is Worth a Thousand Words: Interview with Morgane Ciot

A D3 Image is Worth a Thousand Words: Interview with Morgane Ciot

Many things have been said and done in the realm of analytics, but visualizations remain at the forefront of the data analysis process, where intuition and correct interpretation can help us make sense of data.

As an increasing number of tools emerges, current visualizations are far more than mere pictures on a screen, allowing for movement, exploration, and interaction.

One of these tools is D3, an open-source JavaScript data visualization library. D3 is perhaps the most popular tool for developing rich, interactive data visualizations, used by companies small and large, including Google and the New York Times.

With the next Open Data Science Conference in Boston coming soon, we had the opportunity to talk with DataRobot's Morgane Ciot, an ODSC speaker, about her workshop session, “Intro to D3”, the state of data visualization, and her own perspectives on the analytics market.

Morgane Ciot is a data visualization engineer at DataRobot, where she specializes in creating interactive and intuitive D3 visualizations for data analysis and machine learning. Morgane studied computer science and linguistics at McGill University in Montreal. Previously, she worked in the Network Dynamics Lab at McGill, answering questions about social media behavior using predictive models and statistical topic models.

Morgane enjoys studying machine learning (ML), reading, writing, and staging unusual events.

Let's get to know more about Morgane and her views as a data visualization engineer.

Morgane, could you tell us a bit more about yourself, especially about your area of expertise, and what was your motivation to pursue a career in analytics and data science?

I went to school for computer science and linguistics. Those two fields naturally converge in Natural Language Processing (NLP)/Artificial Intelligence (AI), an intersection that was unfortunately not exploited by my program but that nonetheless got me interested in machine learning.

One of the computer science professors at my school was doing what essentially amounted to sociological research on social media behavior using machine learning techniques. Working with him furthered my interest in ML, NLP, and topic modeling, and I began to also explore how to visualize some of the unmanageable amounts of data we had (like, all of Reddit).

I’m probably indebted to that part of my life, and my professor, for my current position as a data viz engineer. Also, machine learning's practical ramifications are going to be game changing. I want to live closest to the eye of the storm when the singularity hits.

Based on your experience, which attributes or skills should every data master have to succeed, and what would be your recommendations for those looking for an opportunity in this career?

Stats, problem-solving skills, and engineering or scripting abilities all converge in the modern data scientist.

You have to be able to understand how to formulate a data science problem, how to approach it, and how to build the ad hoc tools you’ll need to solve it. At least some basic statistical knowledge is crucial. Elements of Statistical Learning by Hastie and Andrew Ng’s Coursera course both provide a solid foundational understanding of machine learning and require some statistical background.

Learn at least one programming language — Python or R are the most popular. R is the de facto language for statisticians, and Python has a thriving community and a ton of data science libraries like scikit-learn and pandas. It’s also great for writing scripts to scrape web data. If you’re feeling more adventurous, maybe look into Julia.
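
As a toy illustration of that Python stack (a sketch of our own, not part of the interview; scikit-learn's bundled iris dataset stands in for a real project), a first predictive model can be just a few lines:

```python
# A minimal first model with scikit-learn: load data, split, fit, score.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Held-out accuracy is the honest measure; training accuracy flatters.
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping the toy dataset for a Kaggle one is a natural next step toward the tangible projects Morgane recommends.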

As usual, don’t just learn the theory. Find a tangible project to work on. Kaggle hosts competitions you can enter and has a community of experts you can learn from.

Finally, start learning about deep learning. Many of the most interesting papers in the last few years have come out of that area and we’re only just beginning to see how the theory that has been around for decades is going to be put into practice.

Talking about data visualization, what is your view of the role it plays within data science? How important is it in the overall data science process?

Data visualization is pretty fundamental to every stage of the data science process. I think how it’s used in data exploration — viewing feature distributions — is fairly obvious and well-practiced, but people often overlook how important visualizations can be even in the modeling process.

Visualizations should accompany not just how we examine our data, but also how we examine our models! There are various metrics that we can use to assess model performance, but what’s really going to convince an end user is a visualization, not a number. That's what's going to instill trust in model decisions.

Standard introductions to machine learning lionize the ROC curve, but there are plenty of other charts out there that can help us understand what and how a model is doing: plotting predicted vs. actuals, lift charts, feature importance, partial dependence, etc. — this was actually the subject of my ODSC talk last year, which should be accessible on their website.

A visualization that rank-orders the features that were most important to the predictive capacity of a model doesn’t just give you insight, it also helps you model better. You can use those top features to build faster and more accurate models. 
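
As a concrete sketch of that idea (our own illustration, not DataRobot's implementation; scikit-learn's wine dataset and a random forest serve as stand-ins), rank-ordering a model's feature importances takes only a few lines:

```python
# Rank-order the features of a fitted model by importance, a
# text-based stand-in for the importance charts discussed above.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

data = load_wine()
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(data.data, data.target)

# Sort features from most to least important to the model's predictions.
ranked = sorted(zip(data.feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name:<30} {importance:.3f}")
```

The top few names are exactly the features one would keep when building the faster, more accurate follow-up models mentioned above.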

What do you think will be the most important data visualization trend in the next couple of years?

Data is becoming ever more important basically everywhere, but popular and even expert understanding hasn’t quite kept up.

Data is slowly consuming us, pressing down from all angles like that Star Wars scene where Luke Skywalker and Princess Leia get crushed by trash. But are people able to actually interpret that data, or are they going to wordlessly nod along to the magical incantations of “data” and “algorithms”?

As decisions and stories become increasingly data-driven, visualizations in the media are going to become more important. Visualizations are sort of inherently democratic.

Everyone who can see can understand a trend; math is an alien language designed to make us feel dumb. I think that in journalism, interactive storytelling — displaying data with a visual and narrative focus — is going to become even more ubiquitous and important than it already is. These visualizations will become even more interactive and possibly even gamified.

The New York Times did a really cool story where you had to draw a line to guess the trend for various statistics, like the employment rate, during the Obama years, before showing you the actual trend. This kind of quasi-gamified interactivity is intuitively more helpful than viewing an array of numbers.

Expert understanding will benefit from visualizations in the same way. Models are being deployed in high-stakes industries, like healthcare and insurance, that need to know precisely why they’re making a decision. They’ll need to either use simplified models that are inherently more intelligible, at the expense of accuracy, or have powerful tools, including visualizations, to persuade their stakeholders that model decisions can be interpreted.

The EU is working on “right to explanation” legislation, which allows any AI-made decision to be challenged by a human. So visualizations focused on model interpretability will become more important.

A few other things… as more and more businesses integrate with machine learning systems, visualizations and dashboards that monitor large-scale ML systems and tell users when models need to be updated will become more prevalent. And of course, we’re generating staggering amounts of new data every day, so visualizations that can accurately summarize that data while also allowing us to explore it in an efficient way — maybe also through unsupervised learning techniques like clustering and topic modeling — will be necessary.

Please tell us a bit about DataRobot, the company you work at.

We’re a machine learning startup that offers a platform data scientists of all stripes can use to build predictive models. I’m equal parts a fan of using the product and working on it, to be honest. The app makes it insanely easy to analyze your data, build dozens of models, use the myriad visualizations and metrics we have to understand which one will be the best for your use case, and then use that one to predict on new data.

The app is essentially an opinionated platform on how to automate your data science project. I say opinionated because it’s a machine that’s been well-oiled by some of the top data scientists in the world, so it’s an opinion you can trust. And as a data scientist, the automation isn’t something to fear. We’re automating the plumbing to allow you to focus on the problem-solving, the detective work. Don’t be a luddite! 

It’s really fun working on the product because you get to learn a ton about machine learning (both the theoretic and real-world applications) almost by osmosis. It’s like putting your textbook under your pillow while you sleep, except it actually works. And since data science is such a protean field, we’re also covering new ground and creating new standards for certain concepts in machine learning. There’s also a huge emphasis, embedded in our culture and our product, on — “democratizing” is abusing the term, but really putting data science into as many hands as possible, through evangelism, teaching, workshops, and the product itself.

Shameless promotional shout-out: we are hiring! If you’re into data or machine learning or python or javascript or d3 or angular or data vis or selling these things or just fast-growing startups with some cool eclectic people, please visit our website and apply!

As a data visualization engineer at DataRobot, what are the key design principles the company applies for development of its visualizations?

The driving design principle is functionality. Above all, will a user be able to derive an insight from this visualization? Will the insight be actionable? Will that insight be delivered immediately, or is the user going to have to bend over backwards scrutinizing the chart for its underlying logic, trying to divine from its welter of hypnotic curves some hidden kernel of truth? We’re not in the business of beautiful, bespoke visualizations, like some of the stuff the NYTimes does.

Data visualization at DataRobot can be tricky because we want to make sure the visualizations are compatible with any sort of data that passes through — and users can build predictive models for virtually any dataset — which means we have to operate at the right level of explanatory and visual abstraction. And we want users of various proficiencies to immediately intuit whether or not a model is performing well, which requires thinking about how a beginner might be able to understand the same charts an expert might expect. So by “functionality” I mean the ability to quickly intuit meaning.

That step is the second in a hierarchy of insight: the first is looking at a single-valued metric, which is only capable of giving you a high-level summary, often an average. This could be obfuscating important truths. A visualization —the second step— exposes these truths a bit further, displaying multiple values at a time over slices of your data, allowing you to see trends and anomalous spots. The third step is actually playing with the visualization. An interactive visualization confirms or denies previous insights by letting you drill down, slice, zoom, project, compare — all ways of reformulating the original view to gain deeper understanding. Interactive functionality is a sub-tenet of our driving design principle. It allows users to better understand what they’re seeing while also engaging them in (admittedly) fun ways. 
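
A classic way to see why step one (a single-valued metric) is not enough is Anscombe's quartet. The sketch below (our own illustration, unrelated to DataRobot's charts) shows two of its series, whose summary statistics agree to two decimals even though their shapes differ completely when plotted:

```python
# Anscombe's quartet, first two series: near-identical summary
# statistics hide very different shapes that a plot reveals instantly.
from statistics import mean, variance

x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

for name, y in (("y1", y1), ("y2", y2)):
    print(f"{name}: mean={mean(y):.2f} variance={variance(y):.2f}")
# Both lines print mean=7.50 variance=4.13, yet plotted against x,
# y1 is a noisy straight line while y2 traces a smooth parabola.
```

Only the second and third steps of the hierarchy, a plot and then an interactive one, expose the difference the averages conceal.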

During the ODSC in Boston, you will be presenting an intro to D3. Can you give us a heads-up? What is D3, and what are its main features and benefits?

D3 is a data visualization library built in JavaScript. It represents data in a browser interface by binding data to a webpage’s DOM elements. It’s very low-level, but there are plenty of wrapper libraries and frameworks built around it that are easier to use, such as C3.js or other much more sophisticated toolkits. If you find a browser-rendered visualization toolkit, it’s probably using D3 under the hood. D3 supports transitions and defines a data update function, so you can create really beautiful custom and dynamic visualizations with it, such as these simulations or this frankly overwrought work of art.

D3 was created by Mike Bostock as a continuation of his graduate work at Stanford. Check out the awesome examples.

Please share with us some details about the session. What will attendees get from it?

Attendees will learn the basics of how D3 works. They’ll come away with a visualization in a static HTML file representing some aspect of a real-world dataset, and a vague sense of having been entertained. I’m hoping the workshop will expose them to the tool and give them a place to start if they want to do more on their own. 

What are the prerequisites attendees should have to take full advantage of your session?

Having already downloaded D3 4.0 (4.0!!!!!) will be useful, but really just a working browser — I’ll be using Chrome — and an IDE or text editor of your choice. And a Positive Attitude™.

Finally, on a more personal tenor, what's the best book you've read recently? 

Story of O: a bildungsroman about a young French girl's spiritual growth. Very inspiring!

Thank you Morgane for your insights and thoughts.

Morgane's “Intro to D3” workshop session will be part of the Open Data Science Conference taking place in Boston, MA, from May 3 to 5.

A good excuse to visit beautiful Boston and have a great data science learning experience!

Cloudera Analyst Event: Facing a New Data Management Era

Cloudera Analyst Event: Facing a New Data Management Era

I have to say that I attended this year’s Cloudera analyst event in San Francisco with a mix of excitement, expectation, and a grain of salt.

My excitement and expectation were fuelled by all that has been said about Cloudera and its close competitors in the last couple of years, and by the fact that I am currently focusing my own research on big data and “New Data Platforms”. As for the grain of salt: when it comes to events hosted by vendors, I always recommend taking their statements with one, because the information might logically be biased.

In the end, however, the event was an enriching learning experience, full of surprises and discoveries. I learnt a lot about a company that is certainly contributing in a big way to the transformation of the enterprise software industry.

The event certainly fulfilled many of my “want-to-know-more” expectations about Cloudera and its offering stack, the path the company has taken, and its view of the enterprise data management market.

Certainly, it looks like Cloudera is leading and strongly paving the way for a new generation of enterprise data software management platforms.

So, let me share with you a brief summary and comments about Cloudera’s 2017 industry analyst gathering.

OK, Machine Learning and Data Science are Hot Today

One of the themes of the event was Cloudera’s keen interest and immersion into Machine Learning and Data Science. Just a few days before the event, the company made two important announcements:

The first one was the beta release of Cloudera Data Science Workbench (Figure 1), the company’s new self-service environment for data science on top of Cloudera Enterprise. This new offering comes directly from the company's smart acquisition of a machine learning and data science startup.

Figure 1. Screencap of Cloudera's Data Science Workbench (Courtesy of Cloudera)
Some of the capabilities of this product allow data scientists to develop in some of the most popular open source languages —R, Python, and Scala— with native Apache Spark and Apache Hadoop integration, which in turn speeds up project deployments, from exploration to production.

In this regard, Charles Zedlewski, senior vice president of Products at Cloudera, mentioned:

“Cloudera is focused on improving the user experience for data science and engineering teams, in particular those who want to scale their analytics using Spark for data processing and machine learning. The acquisition of and its team provided a strong foundation, and Data Science Workbench now puts self-service data science at scale within reach for our customers.”

One key approach Cloudera takes with the Data Science Workbench is that it aims to enable data scientists to work in a truly open space that can reach out to, for example, deep learning frameworks such as TensorFlow, Microsoft Cognitive Toolkit, MXNet, or BigDL, all within a secure and contained environment.

This is certainly a new offering with huge potential for Cloudera to increase its customer base, but also to reaffirm and grow its presence within existing customers, who can now expand their use of the Cloudera platform without needing to look for third-party options to develop on top of it.

The second announcement was the launch of the Cloudera Solution Gallery (Figure 2), which showcases Cloudera's large partner base (more than 2,800 partners globally) and a storefront of more than 100 solutions.

This news should not be taken lightly, as it shows Cloudera's ability to build a complete ecosystem around its robust set of products, which in my view is a defining trait of companies that aim to become a de facto industry standard.

Figure 2. Cloudera Solution Gallery (Courtesy of Cloudera)

Cloudera: Way More than Hadoop

During an intensive two-day event filled with presentations, briefings, and interviews with Cloudera’s executives and customers, a persistent message prevailed. While the company recognizes its origin as a provider of a commercial Hadoop distribution, it is now making it clear that its current offering has expanded well beyond the Hadoop realm to become a full-fledged open source data platform. Hadoop certainly remains at the core of Cloudera's platform as the main data engine but, with support for 25 open source projects, the platform now offers much more than Hadoop distributed storage capabilities.
This is reflected across Cloudera’s offerings, from the full-fledged Cloudera Enterprise Data Hub, its comprehensive platform, to Cloudera’s special configurations.

Cloudera’s executives made it clear that the company strategy is to make sure they are able to provide, via open source offerings, efficient enterprise-ready data management solutions.

However, don’t be surprised if the message from Cloudera changes over time, especially if the company sets its sights on larger organizations, which most of the time rely on providers that can center their IT services on the business and are not necessarily tied to any particular technology.

Cloudera is redefining itself so it can reposition its offering as a complete data management platform. This is a logical step considering that Cloudera wants a bigger piece of the large enterprise market, even though the company’s CEO stated that they “do not want to replace the Netezzas and Oracles of the world”.

Based on these events, it is clear to me that Cloudera will eventually end up competing head-on in specific segments of the data management market — especially with IBM, through IBM BigInsights, and with Teradata, whose products have left and keep leaving a very strong footprint in the data warehouse market. Whether we like it or not, big data incumbents such as Cloudera seem destined to enter the big fight.

The Future, Cloudera and IoT

During the event I also had the chance to attend a couple of sessions specifically devoted to showing Cloudera deployments in the context of IoT projects. Another thing worth noticing is that, even though Cloudera has some really good stories to tell about IoT, the company seems to be in no hurry to jump onto this wagon.

Perhaps it’s better to let this market mature and become consistent enough before devoting larger technical investments to it. It is always important to know when and how to invest in an emerging market.

However, we should be very well aware that Cloudera, and the rest of the big data players, will be vital for the growth and evolution of the IoT market.

Figure 3. Cloudera Architecture for IoT (Courtesy of Cloudera)

It’s Hard to Grow Gracefully

Today it’s very hard, if not impossible, to deny that Hadoop is strongly immersed in the enterprise data management ecosystem of almost every industry. Cloudera’s analyst event was yet another confirmation. Large companies are now increasingly using Cloudera’s different options and configurations for mission-critical functions.

Then, for Cloudera the nub of the issue now is not about how to get to the top, but how to stay there, evolve and leave its footprint at the top.

Cloudera has been very smart and strategic in getting to this position, yet it seems to have reached a point where the tide will get even tougher. From this point on, convincing companies to open the big wallet will take much more than a solid technical justification.

At the time of writing this post, I learnt that Cloudera has filed to go public and will trade on the New York Stock Exchange, and as an article in Fortune mentions:

“Cloudera faces tough competition in the data analytics market and cites in its filing several high-profile rivals, including Amazon Web Services, Google, Microsoft, Hewlett Packard Enterprise, and Oracle.”

It also mentions the case of Hortonworks, which:

“went public in late 2014 with its shares trading at nearly $28 during its height in April 2015. However, Hortonworks’ shares have dropped over 60% to $9.90 on Friday as the company has struggled to be profitable.”

In my opinion, for Cloudera to succeed while taking this critical step, it will have to show that it is well prepared in business, technical, and strategic terms, and also ready for the unexpected, because only then will it be able to grow gracefully and align to play big, with the big guys.

Keep always in mind that, as Benjamin Franklin said:

Without continual growth and progress, such words as improvement,
achievement, and success have no meaning.

Enterprise Performance Management: Not That Popular But Still Bloody Relevant

Enterprise Performance Management: Not That Popular But Still Bloody Relevant

While performing my usual Googling during preparation for one of my latest reports on enterprise performance management (EPM), I noticed a huge difference in popularity between EPM and, for example, big data (Figure 1).

From a market trend perspective, it is fair to acknowledge that the EPM software market has taken a hit from the hype surrounding the emergence of technology trends in the data management space, such as business analytics and, particularly, big data.

Figure 1: Searches for big data, compared with those for EPM (Source: Google Trends)

In the last four years, at least, interest in big data has grown exponentially, making it a huge emerging market in the software industry. The same has happened with other data management related solutions such as analytics.

While this is not that surprising, my initial reaction came with a bit of discomfort. Such a huge difference makes one wonder how many companies have simply jumped onto the big data wagon rather than making a measured and thoughtful decision regarding the best way to deploy their big data initiative to fit within the larger data management infrastructure in place, especially with regards to having the system co-exist and collaborate effectively with EPM and existing analytics solutions.

Now, don’t get me wrong; I’m not against the deployment of big data solutions and all the potential benefits. On the contrary, I think these solutions are changing the data management landscape for good. But I can’t deny that, over the past couple of years, a number of companies, once past the hype and euphoria, have raised valid concerns about the efficiency of their existing big data initiatives and have questioned its value within the overall data management machinery already in place, especially alongside EPM and analytics solutions, which are vital for measuring performance and providing the right tools for strategy and planning.

The Analytics/EPM/Big Data Conundrum
A study published by Iron Mountain and PwC titled How Organizations Can Unlock Value and Insight from the Information they Hold, for which researchers interviewed 1,800 senior business executives in Europe and North America, concluded that:

“Businesses across all sectors are falling short of realizing the information advantage.”

Even more interesting is that, in the same report, when evaluating what they call an Information Value Index, the authors realized that:

“The enterprise sector, scoring 52.6, performs only slightly better than the mid-market (48.8).”

For some, including me, this statement is surprising. One might have imagined that large companies, which commonly have large data management infrastructures, would logically have already mastered, or at least reached an acceptable level of maturity with, their general data management operations. But despite the availability of a greater number of tools and solutions to deal with data, important issues remain as to finding, on one hand, the right way to make existing and new sources of data play a better role within the intrinsic mechanics of the business, and, on the other, how these solutions can play nicely with existing data management solutions such as EPM and business intelligence (BI).

Despite a number of big data success stories—and examples do exist, including Bristol-Myers Squibb, Xerox, and The Weather Company—some information workers, especially those in key areas of the business like finance and other related areas, are:

  • somehow not understanding the potential of big data initiatives within their areas of interest and how to use these to their advantage in the operational, tactical, and strategic execution and planning of their organization, rather than using them only in tangential decisions or for relevant yet siloed management tasks.
  • oftentimes swamped with day-to-day data requests and the pressure to deliver based on the amount of data already at their disposal. This means they have a hard time deciphering exactly how to integrate these projects effectively with their own data management arsenals.

In addition, it seems that for a number of information workers on the financial business planning and execution side, key processes and operations remain isolated from others that are directly related to their areas of concern.

The Job Still Needs to Be Done
On the flip side, despite the extensive growth of and hype for big data and advanced analytics solutions, for certain business professionals, especially those in areas such as finance and operations, interest in the EPM software market has not waned.

In every organization, key people from these important areas of the business understand that improving operations and performance is an essential organizational goal. Companies still need to reduce the cost of their performance management cycles as well as make them increasingly agile to be able to promptly respond to the organization’s needs. Frequently, this implies relying on traditional practices and software capabilities.

Activities such as financial reporting, performance monitoring, and strategy planning still assume a big role in any organization concerned with improving its performance and operational efficiency (Figure 2).

Figure 2: Population’s perception of EPM functional area relevance (%)
(Source: 2016 Enterprise Performance Management Market Landscape Report)

So, as new technologies make their way into the enterprise world, a core fact remains: organizations still have basic business problems to solve, including budget and sales planning, and financial consolidation and reporting.

Not only do many organizations find the basic aspects of EPM relevant to their practices, an increasing number of them are also becoming more conscious of the importance of performing specific tasks with the software. This signals that organizations have a need to continuously improve their operations and business performance and analyze transactional information while also evolving and expanding the analytic power of the organization beyond this limit.

How Can EPM Fit Within the New Data Management Technology Framework?
When confronted with the need for better integration, some companies will find they need to deploy new data technology solutions, while others will need to make existing EPM practices work along with new technologies to increase analytics accuracy and boost business performance.

In both cases, a number of organizations have taken a holistic approach, to balance business needs by taking a series of steps to enable the integration of data management solutions. Some of these steps include:

  • taking a realistic business approach towards technology integration. Understanding the business model and its processes is the starting point. But while technical feasibility is vital, it is equally important to take into account a practical business approach to understand how a company generates value through the use of data. This usually means taking an inside-out approach, by first taking control of data from internal sources and data that might come from structured information channels and/or tangible assets (production, sales, purchase orders, etc.). Only after this is done should potential external data points be identified. In many cases these will come in the form of data from intangible assets (branding, customer experiences) that can directly benefit the specific process, whether new or already in place.

  • identifying how data provided by these new technologies can be exploited. Once you understand the business model and how specific big data points can benefit the existing performance measuring process, it is possible to analyze and understand how these new incoming data sources can be incorporated or integrated into the existing data analysis cycle. This means understanding how it will be collected (period, frequency, level of granularity, etc.) and how it will be prepared, curated, and integrated into the existing process to increase its readiness for the specific business model.
  • recognizing how to amplify the value of data. By recognizing and making one or two of these sources effectively relate and improve the existing analytics portfolio, organizations can build a solid data management foundation. Once organizations can identify where these new sources of information can provide extended insights into common business processes, the value of the data can be amplified to help explain customer behavior and needs; to see how branding affects sales increases or decreases; or even to find out which sales regions need improved manufacturing processes.
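The "amplification" step above can be illustrated with a minimal sketch. Everything here is hypothetical (the field names, regions, figures, and the sentiment signal are invented for illustration and do not come from any specific EPM tool): internal, structured sales records are enriched with an external, intangible-asset signal so that both can be analyzed in one view.

```python
# Hypothetical sketch: enriching internal sales records (structured, internal
# data) with an external brand-sentiment signal (intangible, external data),
# keyed by region. All names and numbers are illustrative.

internal_sales = [
    {"region": "north", "quarter": "Q1", "revenue": 120_000},
    {"region": "south", "quarter": "Q1", "revenue": 95_000},
]

# External data point, e.g. aggregated brand sentiment per region (-1..1).
external_sentiment = {"north": 0.62, "south": -0.15}

def amplify(sales, sentiment):
    """Attach the external signal to each internal record so analysts can
    relate revenue movements to brand perception in a single view."""
    enriched = []
    for row in sales:
        enriched.append({**row, "sentiment": sentiment.get(row["region"])})
    return enriched

for row in amplify(internal_sales, external_sentiment):
    print(row["region"], row["revenue"], row["sentiment"])
```

The point of the sketch is the shape of the work, not the code: one or two well-chosen external sources, joined on a key the business already understands (here, region), extend an existing analytics process rather than replacing it.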

All this may be easier said than done, and the effort required is considerable. But if you are thinking in terms of the overall business strategy, it makes sense to take a business-to-technical approach: one that can have a direct impact on the efficiency, efficacy, and success of EPM/big data projects, while also improving the chances of adoption, understanding, and commitment to these projects.

Companies need to understand how the value of data can be amplified by integrating key big data points with the “traditional” data management cycle, so that it works effectively with the performance management process, from business financial monitoring to planning and strategy.

While enterprise performance management initiatives are alive and kicking, new big data technologies can be put to work alongside them to expand the EPM software’s capabilities and reach.

The full potential of big data for enterprise performance management will only be realized when enterprises are able to fully leverage all available internal and external data sources towards the same business performance management goal to better understand their knowledge-based capital.

(Originally published on TEC's Blog)
(Guest Post) Value And Insights: Yves Mulkers ahead of Big Data World 2017


 John Bensalhia talks to Yves Mulkers, freelance Data Architect and blogger at 7wData, about the benefits, developments and challenges linked with Big Data...

“I'm an explorer on new technologies and Data Visualisation, and keep my finger on what's happening with Big Data from an architecture point of view.”

So says Yves Mulkers, freelance Data Architect and social media influencer. Yves is speaking ahead of the upcoming Big Data World event in London, where he will make an appearance. Listing the key benefits of what Big Data can offer, Yves says that these are:

“Scalability, cost reduction, new products and revenue streams, tailored solutions and targeting, enterprise wide insights, and Smart cities.”

Having worked as a software developer in various branches, Yves developed deep expertise and an object-oriented mindset in both thinking and development.
“Doing the full cycle of software development from analysis, implementation, support and project management in combination with a strong empathy, he positioned himself as a technical expert bridging and listening into the needs of the business and end-users.” 

Yves says that this past year has seen a number of breakthroughs in the development of Big Data such as:
“Integrated platforms, data preparation automation, automating automation, GPU and in-memory databases, Artificial Intelligence, micro services, IoT (Internet Of Things), and self-service analytics.”

Big Data can be used to create a competitive advantage in various ways for businesses. In addition to a 360-degree customer view and narrower segmentation of customers, Yves says that next-generation products, real-time customization, and business models based on data products are the new approaches. Better informed decisions, such as measuring consumer sentiment, are also good gauges of the value Big Data can bring.

Businesses must consider a variety of aspects in order to ensure successful Data implementation. Yves says that businesses must have clear business processes and information state diagrams, and should also ensure that they are on top of their game with respect to training and documentation. Data standards must also be developed and complied with.

For applying data analytics and applications in a business, Yves explains that there are challenges to tackle:
“Creating value from your data products, finding the right talent and tools, maturity of the organisation in information management, and trusting the results of analytics. It's worth noting that Big Data and analytics are not the same as business intelligence.”

In the next five to 10 years, Yves says that:
“Big Data will become the business intelligence of now.”

In addition to businesses and companies, aspects of Big Data will be for everyone to take advantage of:
“Big Data will be embedded in companies' strategy, and analytics will become available to everyone.”
“Data volumes will keep on growing as data products become a commodity and improve our quality of life.”

Looking ahead to the event, Yves says that he expects it to bring a lot of value and insights.
“The combination with the sidetracks around Cloud and others, will bring a broader view on the complete architecture (business, technical and data) needed to be successful in Big Data implementations.”

SAP Leonardo, SAP’s IoT Platform Now Has a Name: Interview with SAP’s Rakesh Gandhi


As the Internet of Things (IoT) market becomes less hype and more reality, German software powerhouse SAP is moving fast, making significant financial and research investments in a bid to become a leader in the IoT field.

One key move is the recent announcement of SAP's Leonardo innovation portfolio, a comprehensive solution offering that enables organizations to plan, design, and deploy IoT solutions.

With these announcements, we felt compelled to reach out to SAP to hear, in their own words, the details of this new IoT portfolio.

As a result, we had the opportunity to speak with Rakesh Gandhi, Vice President for IoT GTM & Solutions at SAP America.

Rakesh is an innovation enthusiast and IoT evangelist, currently responsible for GTM and solutions management for the SAP Leonardo portfolio for IoT innovation. A 12-year veteran at SAP, Rakesh has been involved in incubating new innovations in Mobile, Cloud for Customer, CEC, and now IoT.

Thank you Mr. Gandhi:

Last year SAP announced an ambitious €2 billion investment plan to help companies and government agencies develop their IoT and Big Data initiatives. Could you share with us some details about this program and what it involves in a general sense?

IoT is one of the key pillars of SAP's strategy to enable customers' digital transformation journeys. Over the past several years, SAP has been developing its IoT portfolio working closely with our customers. The recent announcement of the SAP Leonardo brand is a continuation of SAP's commitment and plans in the following key areas:

  • Accelerate innovation of the IoT solution portfolio, both organically and inorganically through acquisitions
  • Create awareness of SAP's IoT innovations, which empower customers to run a live business with smart processes across all lines of business and to reinvent business models
  • Drive customer adoption; scale service, support, and co-innovation
  • Most importantly, grow the ecosystem of partners and startups in the IoT market

To date, key announcements include:

Key acquisitions such as:

  • Fedem: With this acquisition SAP can now build an end-to-end IoT solution in which a digital avatar continuously represents the state of operating assets through feeds from sensors, replacing the need for physical inspection with a “digital inspection.” Additionally, the solution is intended to consider complex forces in play and detect both instantaneous consequences of one-off events and long-term health effects of cyclic loads, making possible accurate monitoring of maintenance requirements and remaining-life prediction for assets.
  • This acquisition helped provide expertise and technology to accelerate the availability of key IoT capabilities in SAP HANA Cloud Platform, such as advanced lifecycle management for IoT devices, broad device connectivity, strong IoT edge capabilities that work seamlessly with a cloud back end, end-to-end role-based security and rapid development tools for IoT applications.
  • Altiscale: This acquisition is helping our customers create business value by harnessing the power of BIG DATA generated by the connected world.
The launch of the SAP Leonardo brand for the IoT innovation portfolio: This was a major step in announcing our brand for IoT-driven innovation.

SAP Leonardo jumpstart program: This is a major step in our commitment to helping our customers drive adoption and rapidly deploy core IoT applications within a short, three-month time frame with fixed scope and price.

A partner ecosystem is critical to our success; we are working closely with partners to create an ecosystem that our customers can leverage to further simplify their deployment projects.

Additionally, SAP is on track in opening up IoT labs to collaborate on Industry 4.0 and IoT innovations with our customers, partners and startups.

Can you share with us some of the details of the new enablement program as well as the general features of the Leonardo IoT Portfolio?

What we are observing in the marketplace is that many organizations are starting with small experimental IoT projects, or may have started to collect and store sensor data with some visualization capabilities.

However, it is still generally believed that IoT as a topic is very low on the maturity curve. SAP now has a very robust portfolio that has been co-innovated with our early-adopter customers and proven to deliver business value.

The second challenge, and a general perception among customers, is that IoT is still in the hype phase and difficult to deploy. We therefore decided it was very important for SAP to support our customers' adoption and show that they can go live in production with a first pilot in a short time frame.

This jumpstart program supports three scenarios as three distinct packages:

  • Vehicle Insights for fleet telematics,
  • Predictive Maintenance & Service with Asset Intelligence Network for connected assets
  • Connected Goods for scenarios such as connected coolers, connected vending machines, and similar mass-market things.
Customers can now deploy one of these scenarios within a three-month time frame. It is a very structured three-step process: first, SAP works with the customer in a half-day design thinking workshop to agree on the pilot deployment scope; second, SAP delivers a rapid prototype to demonstrate the vision and get customer buy-in.

In the final step, towards the end of the three-month engagement, SAP delivers a pilot production system.

Lastly, SAP will continue to engage with customers to help with their IoT roadmap for next processes and business case.

It seems natural to assume SAP has already started working to support IoT projects in key industries and/or lines of business. Could you talk about some of these industry/LoB efforts?

The SAP Leonardo IoT innovation portfolio powers digital processes across lines of business and industries.

As an example, we have released a new value map [here] for supply chain processes, now referred to as the digital supply chain, and this is powered by the SAP Leonardo IoT innovation portfolio.

The same is applicable to other LoBs, e.g., customer service processes to enable predictive and proactive maintenance, and also industry-specific end-to-end solutions powered by SAP Leonardo, e.g., SAP Connected Goods for the CPG and retail industries.

Is this program designed mostly for SAP's existing partners and customers? How can non-SAP customers take advantage of it?

The jumpstart program is designed to support all our customers, both existing and net-new prospects.

This mirrors how the SAP Leonardo portfolio of IoT solutions is designed to work with SAP or non-SAP back ends; it is agnostic in that regard.

Finally, what are the technical and/or business requirements for applicants of this program?

As mentioned above, the SAP Leonardo jumpstart program is initially offered for three packages: SAP Vehicle Insights, SAP Connected Goods, and SAP Predictive Maintenance & Service plus Asset Intelligence Network.

These are cloud solutions, and the use cases covered by each of these packages are applicable across multiple industries.

Thank you again Mr. Gandhi!

You can learn more about SAP Leonardo by visiting its website and/or reading this post by Hans Thalbauer.
In the meantime, you can take a look at the video introduction produced by SAP.

(Guest Post) Winning Solutions: Kirk Borne Discusses the Big Data Concept Ahead of Big Data World London


Looking ahead to 2017's Big Data World event, Booz Allen Hamilton's Principal Data Scientist discusses the Big Data concept, benefits and developments in detail with John Bensalhia...

2017's Big Data World promises plenty in the way of insightful talks and discussions on the subject. One of the unmissable talks to watch out for in March will come from Kirk Borne, Booz Allen Hamilton's Principal Data Scientist, who will look at “The Self-Driving Organisation and Edge Analytics in a Smart IoT World.”

“I will describe the concept of a self-driving organisation that learns, gains actionable insights, discovers next-best move, innovates, and creates value from streaming Big Data through the application of edge analytics on ubiquitous data sources in the IoT-enriched world.”

As part of this discussion, Kirk will also present an Analytics Roadmap for the IoT-enabled Cognitive Organisation.

“In this case, the “self-driving organisation” is modeled after the self-driving automobile, but applicable organisations include individual organisations, and also smart cities, smart farms, smart manufacturing, and smart X (where X can be anything). The critical technologies include machine learning, machine intelligence, embedded sensors, streaming analytics, and intelligence deployed at the edge of the network.”
“Big Data and data science are expanding beyond the boundaries of your data centre, and even beyond the Cloud, to the point of data collection at the point of action! We used to say “data at the speed of business”, but now we say “business at the speed of data.”

Having achieved a Ph.D. in astronomy from Caltech, Kirk focused most of the first 20 years of his career on astrophysics research (“colliding galaxies and other fun stuff”), including a lot of data analysis as well as modelling and simulation.

“My day job for nearly 18 years was supporting large data systems for NASA astronomy missions, including the Hubble Space Telescope. So, I was working around data all of the time.”
“When data set sizes began to grow “astronomically” in the late 1990s, I began to focus more on data mining research and data science. It became apparent to me that the whole world (and every organisation) was experiencing large growth in digital data. From these observations, I was convinced that we needed to train the next-generation workforce in data skills. So, in 2003, I left my NASA job and joined the faculty at George Mason University (GMU) within the graduate Ph.D. program in Computational Science and Informatics (Data Science).”

As a Professor of Astrophysics and Computational Science at GMU, Kirk helped to create the world’s first Data Science undergraduate degree program.

“I taught and advised students in data science until 2015, at which point the management consulting firm Booz Allen Hamilton (BAH) offered me the position as the firm’s first Principal Data Scientist. I have been working at BAH since then.”

Booz Allen Hamilton offers management consulting services to clients in many sectors: government, industry, and non-profit. “Booz Allen Hamilton (BAH) is over 100 years old, but has reinvented itself as an agile leading-edge technology consultant,” says Kirk.

“Our market focus is very broad, including healthcare, medicine, national defense, cyber-security, law enforcement, energy, finance, transportation, professional sports, systems integration, sustainability, business management, and more. We deliver systems, technology strategy, business insights, consultative services, modelling, and support services in many technology areas: digital systems, advanced analytics, data science, Internet of Things, predictive intelligence, emerging technologies, Cloud, engineering, directed energy, unmanned aerial vehicles (drones), human capital, fraud analytics, and data for social good (plus more, I am sure).”

Discussing Big Data, Kirk regards this as a “concept”.

“It is not really about “Big” or “Data”, but it is all about value creation from your data and information assets. Of course, it is data. But the focus should be on big value, not on big volume; and the goal should be to explore and exploit all of your organisation’s data assets for actionable information and insights.”
“I like to say that the key benefits of Big Data are the three D2D’s: Data-to-Discovery (data exploration), Data-to-Decisions (data exploitation), and Data-to-Dividends (or Data-to-Dollars; i.e., data monetisation).”

Looking back over the past year, Kirk says that there have been several significant Big Data-related developments.

“These include the emergence of the citizen data scientist, which has been accompanied by a growth in self-service tools for analytics and data science. We are also seeing maturity in deep learning tools, which are now being applied in many more interesting contexts, including text analytics. Machine intelligence is also being recognised as a significant component of processes, products, and technologies across a broad spectrum of use cases: connected cars, Internet of Things, smart cities, manufacturing, supply chain, prescriptive machine maintenance, and more.”
“But I think the most notable developments are around data and machine learning ethics – this has been evoked in many discussions around privacy and fairness in algorithms, and it has been called out also in some high-profile cases of predictive modelling failures. These developments demand that we be more transparent and explanatory to our clients and to the general public about what we are doing with data, especially their data!”

Much value can be gleaned from the Smart IoT World for businesses, and in a number of ways, as Kirk explains.

“First of all, businesses can learn about the latest products, the newest ideas, and the emerging technologies. Businesses can acquire lessons learned, best practices, and key benefits, as well as find business partners to help them on this journey from digital disruption to digital transformation.”
“The “Smart” in “Smart IoT” is derived from machine learning, data science, cognitive analytics, and technologies for intelligent data understanding. More than ever, businesses need to focus more on the “I” in “IT” – the Information (i.e., the data) is now the fundamental asset, and the Technology is the enabler. IoT is about ubiquitous sensors collecting data and tracking nearly everything in your organisation: People, Processes, and Products. Smart IoT will deliver big value from Big Data.”

Kirk says that the past few years of Big Data have been described as the End of Demographics and the Age of Personalisation. The next five to ten years, on the other hand, will be the Age of Hyper-Personalisation.

“More than ever, people are at the centre of business,” explains Kirk.

“Big Data can and will be used to engage, delight, and enhance employee experience (EX), user experience (UX), and customer experience (CX). The corresponding actionable insights for each of these human experiences will come from “360 view” Big Data collection (IoT), intelligence at the point of data collection (Edge Analytics), and rich models for behavioural insights (Data Science).”
“These developments will be witnessed in Smart Cities and Smart Organisations of all kinds. The fundamental enabler for all of this is Intelligent Data Understanding: bringing Big Data assets and Data Science models together within countless dynamic data-driven application systems.”

With Big Data World only weeks away, Kirk is looking forward to the great opportunities that it will bring.

“I expect Big Data World to be an information-packed learning experience like no other. The breadth, depth, and diversity of useful Smart IoT applications that will be on display at Big Data World will change the course of existing businesses, inspire new businesses, stimulate new markets, and grow existing capabilities to make the world a better place.”
“I look forward to learning from technology leaders about Smart Cities, IoT implementations, practical business case studies, and accelerators of digital transformation. It is not true that whoever has the most data will win; the organisation that wins is the one who acts on the most data! At Big Data World, we can expect to see many such winning solutions, insights, and applications of Big Data and Smart IoT.”

Not Your Father’s Database: Interview with VoltDB’s John Piekos


As organizations deal with challenging times, both technologically and business-wise, managing increasing volumes of data has become a key to success.

As data management rapidly evolves, the main Big Data paradigm has changed from just “big” to “big, fast, reliable, and efficient”.

Now more than ever in the evolution of the big data and database markets, the pressure is on software companies to deliver new and improved database solutions, capable not just of dealing with increasing volumes of data but of doing so faster, better, and more reliably.

A number of companies have taken the market by storm, infusing the industry with new, remarkably advanced database software, for both transactional and non-transactional operations, that is rapidly changing the database software landscape.

One of these companies is VoltDB. This Massachusetts-based company has rapidly become a reference in next-generation database solutions and has won important customers in key industries such as communications, finance, and gaming.

VoltDB was co-founded by none other than world-renowned database expert and 2014 ACM A.M. Turing Award recipient Dr. Michael Stonebraker, who has been key in the development of a new-generation database solution and in assembling the talented team in charge of its development.

With the new VoltDB 7.0 already in the market, we had the opportunity to chat with VoltDB’s John Piekos about VoltDB’s key features and evolution.

John is VoltDB's Vice President of Engineering, where he heads up VoltDB's engineering operations, including product development, QA, technical support, documentation, and field engineering.

John has more than 25 years of experience leading teams and building software, delivering both enterprise and Big Data solutions.

John has held tech leadership positions at several companies, most recently at Progress Software where he led the OpenEdge database, ObjectStore database and Orbix product lines. Previously, John was vice president of Web engineering at EasyAsk, and chief architect at Novera Software, where he led the effort to build the industry’s first Java application server.

John holds an MS in computer science from Worcester Polytechnic Institute and a BS in computer science from the University of Lowell.

Thank you John, please allow me to start with the obvious question:

What’s the idea behind VoltDB, the company, and what makes VoltDB the database, to be different from other database offerings in the market?

What if you could build a database from the ground-up, re-imagine it, re-architect it, to take advantage of modern multi-core hardware and falling RAM prices, with the goal of making it as fast as possible for heavy write use cases like OLTP and the future sensor (IoT) applications?  That was the basis of the research Dr. Stonebraker set out to investigate.

Working with the folks at MIT, Yale, and Brown, they created the H-Store project and proved out the theory that if you eliminated the overhead of traditional databases (logging, latching, buffer management, etc), ran an all in-memory workload, spread that workload across all the available CPUs on the machine and horizontally scaled that workload across multiple machines, you could get orders of magnitude performance out of the database.

The commercial realization of that effort is VoltDB.  VoltDB is fully durable, able to process hundreds of thousands to millions of multi-statement SQL transactions per second, all while producing SQL-driven real-time analytics.

Today an increasing number of emerging databases work partially or totally in-memory while existing ones are changing their design to incorporate this capability. What are in your view the most relevant features users need to look for when trying to choose from an in-memory based database?

First and foremost, users should realize that not all in-memory databases are created equal.  In short, architecture choices require trade-offs.  Some IMDBs are created to process reads (queries) faster and others, like VoltDB, are optimized for fast writes.  It is impractical (impossible) to get both the fastest writes and the fastest reads at the same time on the same data, all while maintaining high consistency because the underlying data organization and architecture is different for writes (row oriented) than it is for reads (columnar).

 It is possible to maintain two separate copies of the data, one in row format, the other in compressed column format, but that reduces the consistency level - data may not agree, or may take a while to agree between the copies.
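The write/read tradeoff John describes can be made concrete with a small sketch. This is purely illustrative (the records and attribute names are invented, and real engines add compression, vectorization, and indexing on top): the same three records are held in a row-oriented layout and in a columnar layout, showing why a write is cheap against rows but touches every column array, while an aggregate read scans one column array but must touch every row object.

```python
# Illustrative only: the same data in row-oriented vs. columnar form.
rows = [
    {"id": 1, "amount": 10.0, "city": "Boston"},
    {"id": 2, "amount": 20.0, "city": "Lowell"},
    {"id": 3, "amount": 30.0, "city": "Boston"},
]

# Columnar layout: one array per attribute.
columns = {
    "id": [1, 2, 3],
    "amount": [10.0, 20.0, 30.0],
    "city": ["Boston", "Lowell", "Boston"],
}

def insert_row(rows, columns, record):
    """A write is one append in the row store, but one append PER COLUMN
    in the column store (and keeping both layouts in sync is exactly the
    consistency cost mentioned above)."""
    rows.append(record)
    for key, value in record.items():
        columns[key].append(value)

# An aggregate read scans one tight array in the column store...
total = sum(columns["amount"])
# ...but must touch every whole record in the row store.
total_from_rows = sum(r["amount"] for r in rows)
```

This is why a write-optimized engine and a read-optimized engine organize the same data differently, and why getting both at top speed on one copy of the data is impractical.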

Legacy databases can be tweaked to run in memory, but realize that, short of a complete re-write, the underlying architecture may still be disk-based, and thus incur significant (needless) processing overhead.

VoltDB defines itself as an in-memory, operational database. What does this mean in the context of Big Data, and in the context of IT's traditional separation between transactional and analytical workloads? How does VoltDB fit into, or reshape, this scheme?

VoltDB supports heavy write workloads - it is capable of ingesting never-ending streams of data at high ingestion rates (100,000+/second per machine, so a cluster of a dozen nodes can process over a million transactions a second).

While processing this workload, VoltDB can calculate (via standard SQL) and deliver strongly consistent real-time analytics, either ad hoc, or optimally, as pre-computed continuous queries via our Materialized View support.
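The "pre-computed continuous query" idea behind a materialized view can be sketched in a few lines. This is not VoltDB's implementation or API, just the concept (the event fields and keys are invented): instead of re-running a GROUP BY on every read, the aggregate is updated incrementally inside the write path as each event is ingested, so a real-time analytic read becomes a lookup rather than a scan.

```python
# Conceptual sketch of a materialized view maintained as a continuous query.
from collections import defaultdict

# Aggregate state keyed by, say, device_id (names are illustrative).
view = defaultdict(lambda: {"count": 0, "total": 0.0})

def ingest(event):
    """Called once per incoming event, as part of the write itself."""
    bucket = view[event["device_id"]]
    bucket["count"] += 1
    bucket["total"] += event["value"]

for e in [{"device_id": "d1", "value": 3.0},
          {"device_id": "d1", "value": 4.5},
          {"device_id": "d2", "value": 1.0}]:
    ingest(e)

# A real-time analytic read is now a dictionary lookup, not a table scan.
print(view["d1"])
```

Because the update happens transactionally with the write, readers always see an aggregate consistent with the data ingested so far, which is the "strongly consistent real-time analytics" property described above.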

These are capabilities simply not possible with traditional relational databases.  In the Big Data space, this places VoltDB at the front end, as the ingestion engine for feeds of data, from telco, digital ad tech, mobile, online gaming, IoT, Finance and numerous other application domains.

Just recently, with version 6.4, VoltDB passed the well-known Jepsen test for verifying the safety of distributed databases. Could you share with us some details of the test, the challenges, and the benefits it brought for VoltDB?

We have a nice landing page with this information, including Kyle’s and VoltDB’s founding engineer John Hugg’s blog.

In summary, distributed systems programming is hard. Implementing the happy path isn’t hard, but doing the correct thing (such as returning the correct answer) when things go wrong (nodes failing, networks dropping), is where most of the engineering work takes place. VoltDB prides itself on strong consistency, which means returning the correct answer at all times (or not returning an answer at all - if, for example, we don’t have all of the data available).

Kyle’s Jepsen test is one of the most stringent tests out there.  And while we hoped that VoltDB would pass on the first go-around, we knew Kyle was good at breaking databases (he’s done it to many before us!).  He found a couple of defects, thankfully finding them before any known customer found them, and we quickly went to work fixing them. Working with Kyle and eventually passing the Jepsen test was one of the 2016 engineering highlights at VoltDB. We’re quite proud of that effort.


One interesting aspect of VoltDB is that it is a relational database that complies fully with ACID and brings native SQL support. What are the differences between this design and, for example, NoSQL and some so-called NewSQL offerings? Advantages, and perhaps tradeoffs?

In general, NoSQL offerings favor availability over consistency - specifically, the database is always available to accept new content and can always provide content when queried, even if that content is not the most recent (i.e., correct) version written.

NoSQL solutions rely on non-standard query languages (some are SQL-like), to compute analytics. Additionally, NoSQL data stores do not offer rich transaction semantics, often providing “transactionality” on single key operations only.

Not all NewSQL databases are created equal. Some favor faster reads (over fast writes). Some favor geo-distributed data sets, often resulting in high, or at least unpredictable, latency for access and update patterns. VoltDB's focus is low and predictable OLTP (write) latency at a scale of high transactions per second, offering rich and strong transaction semantics.

Note that not all databases that claim to provide ACID transactions are equal. The most common place where ACID guarantees are weakened is isolation. VoltDB offers serializable isolation.

Other systems offer multiple levels of isolation, with a performance tradeoff between better performance (weak guarantees) and slower performance (strong guarantees). Isolation models like Read-Committed and Read-Snapshot are examples; many systems default to one of these.

VoltDB’s design trades off complex multi-dimensional (OLAP) style queries for high throughput OLTP-style transactions while maintaining an ACID multi-statement SQL programming interface. The system is capable of surviving single and multi-node failures.

Where failures force a choice between consistency and availability, VoltDB chooses consistency. The database supports transactionally rejoining failed nodes back to a surviving cluster and supports transactionally rebalancing existing data and processing to new nodes.

Real-world VoltDB applications achieve 99.9% latencies under 10ms at throughput exceeding 300,000 transactions per second on commodity Xeon-based 3-node clusters.

How about the handling of non-structured information within VoltDB? Is VoltDB expected to take care of it, or does it integrate with alternative solutions? What is the common architectural scenario in those cases?

VoltDB supports the storage of JSON strings and can index, query and join on fields within those JSON values. Further, VoltDB can process streamed JSON data directly into the database using our Importers (See the answer for question #9) and custom formatters (custom decoding) - this makes it possible for VoltDB to transactionally process data in almost any format, and even to act as an ETL engine.
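The idea of querying a field inside stored JSON strings can be sketched as follows. VoltDB exposes this through SQL field-extraction functions; the sketch below only illustrates the concept in plain Python, and the document shapes and field names are invented for the example.

```python
# Rough sketch: filtering rows by a field buried inside stored JSON strings,
# the way a database evaluates a field-extraction expression in a predicate.
import json

stored = [
    '{"user": "ana", "score": 17}',
    '{"user": "bo", "score": 42}',
]

def field(doc, name):
    """Extract one named field from a JSON-encoded document."""
    return json.loads(doc)[name]

# "SELECT user FROM t WHERE score > 20", expressed over raw JSON strings.
high = [field(d, "user") for d in stored if field(d, "score") > 20]
print(high)
```

An index on such an extracted field is what lets the engine avoid re-parsing every document on every query.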

How does VoltDB interact with players in the Big Data space such as Hadoop, both open source and commercial distributions?

The VoltDB database supports directly exporting data into a downstream data lake. This target could be Hadoop, Vertica, a JDBC target, or even flat files. VoltDB handles the real-time data storage and processing, as it is capable of transactionally ingesting (database “writes”) millions of events per second.

Typically the value of this data decreases with age - it becomes cold or stale - and eventually would be migrated to historical storage such as Hadoop, Spark, Vertica, etc.  Consider applications in the telco or online gaming space - the “hot data” may have a lifespan of one month in telco, or even one hour or less, in the case of game play.

Once the data becomes “historical” and is of less immediate value, it may be removed from VoltDB and stored on disk in the historical archive (such as Hadoop, Vertica, etc).

What capabilities does VoltDB offer, not just for database administration but for development on top of VoltDB with Python, R, or other languages?

While VoltDB offers traditional APIs such as JDBC, ODBC, Java and C++ native bindings, as well as Node.js, Go, Erlang, PHP, Python, etc., I think one of the more exciting next-generation features VoltDB offers is the ability to stream data directly into the database via our in-process Importers. VoltDB is a clustered database, meaning a database comprises one (1) or more processes (usually a machine, VM or container).

A database can be configured to have an “importer,” which is essentially a plug-in that listens to a source, reads incoming messages (events, perhaps) and transactionally processes them. If the VoltDB database is highly available, then the importer is highly available (surviving node failure).  VoltDB supports a Kafka Importer and a socket importer, as well as the ability to create your own custom importer.

Essentially this feature “eliminates the client application” and data can be transactionally streamed directly into VoltDB.  The data streamed can be JSON, CSV, TSV or any custom-defined format.  Further, the importer can choose which transactional behavior to apply to the incoming data.  This is how future applications will be designed: by hooking feeds, streams of data, directly to the database - eliminating much of the work of client application development.
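To make the “custom formatter” idea concrete: a formatter’s job is to decode an incoming raw record into typed column values before the importer applies a transaction to it. Real VoltDB formatters are written in Java; the following is only a Python sketch of that decoding step, with an invented pipe-delimited layout.

```python
# Sketch of the decoding step a custom formatter performs: turning a raw
# delimited line from a stream into typed column values ready for a
# transactional insert. The pipe-delimited layout here is invented.
from datetime import datetime

def decode_event(line, delimiter="|"):
    ts, user, amount = line.strip().split(delimiter)
    return {
        "ts": datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S"),
        "user": user,
        "amount": float(amount),
    }

row = decode_event("2017-01-15T10:30:00|alice|19.99")
print(row["user"], row["amount"])  # alice 19.99
```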

We have one customer who has produced one of the top 10 games in the app store - their application streams in-game events into VoltDB at a rate upwards of 700,000 events per second.  VoltDB hosts a Marketing Optimization application that analyzes these in-game events in an effort to boost revenue.

If you had a crystal ball, how would you visualize the database landscape five years from now? Major advancements?

Specialized databases will continue to carve out significant market share from established vendors.
IoT will be a major market, and will drive storage systems to support two activities:
  • Machine learning (historical analysis) on the Data Lake/Big Data; storage engines will focus on enabling data scientists to capture value from the vast increase in data
  • Real-time processing of streams of data; batch processing is no longer acceptable - real-time becomes a “must have”

Data creation continues to accelerate and capturing value from fresh data in real-time is the new revenue frontier.

Finally, could you tell us a song that is an important part of the soundtrack of your life?

I’m a passionate Bruce Springsteen fan (and also a runner), so it would have to be “Born to Run”.

Springsteen captures that youthful angst so perfectly, challenging us to break out of historic norms and create and experience new things, to challenge ourselves.

This perfectly captures the entrepreneurial spirit both of the personal “self” as well as the “professional self,” and it matches the unbridled spirit of what we’re trying to accomplish with VoltDB: “Together we could break this trap / We’ll run till we drop, baby we’ll never go back.”

Here, There and Everywhere: Interview with Brian Wood on Teradata’s Cloud Strategy


(Image Courtesy of Teradata)
In a post about Teradata’s 2016 Partners event, I wrote about the big effort Teradata is making to ensure its software offerings are now available both on-premises and in the Cloud, in a variety of forms and shapes, with a big push to ensure Teradata’s availability, especially for hybrid cloud configurations.

So, the data management and analytics software giant seems to be sticking to its promise by increasingly bringing its flagship Teradata Database and other solutions to the Cloud, whether in the form of its own Managed Cloud for the Americas and Europe, as a private cloud-ready solution, or via public cloud providers such as AWS and, as most recently announced, Microsoft’s Azure Marketplace.

To chat about this latest news and Teradata’s overall cloud strategy, we sat down with Teradata’s Brian Wood.

Brian Wood is director of cloud marketing at Teradata. He is a results-oriented technology marketing executive with 15+ years of success in digital marketing, lead generation, sales and marketing operations, and team leadership.

Brian has an MS in Engineering Management from Stanford University, a BS in Electrical Engineering from Cornell University, and served as an F-14 Radar Intercept Officer in the US Navy.

Throughout 2016, and especially during its 2016 Partners conference, Teradata made it clear that it is undergoing an important transformation, and a key part of its strategy is its path to the cloud. Offerings such as Teradata Database on different private and public cloud configurations, including AWS, VMware, Teradata Managed Cloud, and of course Microsoft Azure, are available now. Could you share some details about the progress of this strategy so far?

Thanks for asking, Jorge. It’s been a whirlwind because Teradata has advanced tremendously across all aspects of cloud deployment in the past few months; the progress has been rapid and substantial.

To be clear, hybrid cloud is central to Teradata’s strategy and it’s all about giving customers choice. One thing that’s unique to Teradata is that we offer the very same data and analytic software across all modes of deployment – whether managed cloud, public cloud, private cloud, or on-premises.

What this means to customers is that it’s easy for them to transfer data and workloads from one environment to another without hassle or loss of functionality; they can have all the features in any environment and dial it up or down as needed. Customers like this flexibility because nobody wants to be locked in, and it’s also helpful to be able to choose the right tool for the job and not worry about compatibility or consistency of results.

Specific cloud-related advancements in the last few months include:
  • Expanding Teradata Managed Cloud to now include both Americas and Europe
  • Increasing the scalability of Teradata Database on AWS up to 64 nodes
  • Launching Aster Analytics on AWS with support up to 33 nodes
  • Expanding Teradata Database on VMware scalability up to 32 virtual nodes
  • Bolstering our Consulting and Managed Services across all cloud options
  • And announcing upcoming availability of Teradata Database on Azure in Q1
These are just the ones that have been announced; there are many more in the pipeline queued up for release in the near future. Stay tuned!

The latest news is the availability of Teradata Database on Microsoft’s Azure Marketplace. Could you give us the details around the announcement?

We’re very excited about announcing Q1 availability for Teradata Database on Azure because many important Teradata customers have told us that Microsoft Azure is their preferred public cloud environment. We at Teradata are agnostic; whether AWS, Azure, VMware, or other future deployment options, we want what’s best for the customer and listen closely to their needs.

It all ties back to giving customers choice in how they consume Teradata, and offering the same set of capabilities across the board to make experimentation, switching, and augmentation as easy as possible.

Our offerings on Azure Marketplace will be very similar to what we offer on AWS Marketplace, including:
  • Teradata Database 15.10 (our latest version)
  • Teradata ecosystem software (including QueryGrid, Unity, Data Mover, Viewpoint, Ecosystem Manager, and more)
  • Teradata Aster Analytics for multi-genre advanced analytics
  • Teradata Consulting and Managed Services to help customers get the most value from their cloud investment
  • Azure Resource Manager Templates to facilitate the provisioning and configuration process and accelerate ecosystem deployment

What about configuration and licensing options for Teradata Database in Azure?

The configuration and licensing options for Teradata Database on Azure will be similar to what is available on AWS Marketplace. Customers use Azure Marketplace as the medium through which to find and subscribe to Teradata software; they are technically Azure customers but Teradata provides Premier Cloud Support as a bundled part of the software subscription price.

One small difference between what will be available on Azure Marketplace compared to what is now available on AWS Marketplace is subscription duration. Whereas on AWS Marketplace we currently offer both hourly and annual subscription options, on Azure Marketplace we will initially offer just an hourly option.

Most customers choose hourly for their testing phase anyway, so we expect this to be a non-issue. In Q2 we plan to introduce BYOL (Bring Your Own License) capability on both AWS Marketplace and Azure Marketplace which will enable us to create subscription durations of our choosing.

Can we expect technical and functional limitations from this version compared with the on-premises solution?

No, there are no technical or functional limitations of what is available from Teradata in the cloud versus on-premises. In fact, this is one of our key differentiators: customers consume the same best-in-class Teradata software regardless of deployment choice. As a result, customers can have confidence that their existing investment, infrastructure, training, integration, etc., is fully compatible from one environment to another.

One thing to note, of course, is that a node in one environment will likely have a different performance profile than what is experienced with a node in another environment. In other words, depending on the workload, a single node of our flagship Teradata IntelliFlex system may require up to six to ten instances or virtual machines in a public cloud environment to yield the same performance.

There are many variables that can affect performance – such as query complexity, concurrency, cores, I/O, internode bandwidth, and more – so mileage may vary according to the situation. This is why we always recommend a PoC (proof of concept) to determine what is needed to meet specific customer requirements.

Considering a hybrid cloud scenario, what can we expect regarding integration with the rest of the Teradata stack, especially on-premises?

Hybrid cloud is central to Teradata’s strategy; I cannot emphasize this enough. We define hybrid cloud as a customer environment consisting of a mix of managed, public, private, and on-premises resources orchestrated to work together.

We believe that customers should have choice and so we’ve made it easy to move data and workloads in between these deployment modes, all of which use the same Teradata software. As such, customers can fully leverage existing investments, including infrastructure, training, integration, etc. Nothing is stranded or wasted.

Hybrid deployment also introduces the potential for new and interesting use cases that were less economically attractive in an all-on-premises world. For example, three key hybrid cloud use cases we foresee are:
  • Cloud data labs – cloud-based sandboxes that tie back to on-premises systems
  • Cloud disaster recovery – cloud-based passive systems that are quickly brought to life only when needed
  • Cloud bursting – cloud-based augmentation of on-premises capacity to alleviate short-term periods of greater-than-usual utilization

How about migrating from existing Teradata deployments to Azure? What is the level of support Teradata and/or Azure will offer?

Teradata offers more than a dozen cloud-specific packages via our Consulting and Managed Services team to help customers get the most value from their Azure deployments in three main areas: Architecture, Implementation, and Management.

Specific to migration, we first always recommend that customers have a clear strategy and cloud architecture document prior to moving anything so that the plan and expectations are clear and realistic. We can facilitate such discussions and help surface assumptions about what may or may not be true in different deployment environments.

Once the strategy is set, our Consulting and Managed Services team is available to assist customers or completely own the migration process, including backups, transfer, validation, testing, and so on. This includes not only Teradata-to-Teradata migration (e.g., on-premises to the cloud), but also competitor-to-Teradata migrations as well. We especially love the latter ones!

Finally, can you share with us a bit of what is next for Teradata in the Cloud?

Wow, where should I start? We’re operating at a breakneck pace. Seriously, we have many new cloud developments in the works right now, and we’ve been hiring cloud developers like crazy (hint: tell ‘em Brian sent you!).

You’ll see more cloud announcements from us this quarter, and without letting the cat out of the bag, expect advancements in the realm of automation, configuration assistance, and an expansion of managed offerings.

Cloud is a key enabler to our ability to help customers get the most value from their data, so it’s definitely an exciting time to be involved in helping define the future of Teradata.
Thanks for your questions and interest!

Yep, I’m Writing a Book on Modern Data Management Platforms (2017-02 Update)


(Image courtesy of Thomas Skirde)
As I mentioned in my first blog post about the book, I'm now working hard to deliver a piece that will, hopefully, serve as a practical guide for the implementation of a successful modern data management platform.

I'll try to provide frequent updates and, perhaps, share some pains and gains about its development.
For now, here's some additional information, including the general outline and the type of audience it is intended for.

I invite you to be part of the process and leave your comments, observations and encouragement quotes right below, or better yet, to consider:
  • Participating in our Data Management Platforms survey (to obtain a nice discount right off the bat)
  • Pre-ordering the book; soon I’ll provide details on how to pre-order your copy, but in the meantime you can show your interest by signing up to our pre-order list, or
  • Providing us with information about your own successful enterprise use case, which we may use in the book
Needless to say, the information you provide will be kept confidential and used only for the purpose of developing this book.
So here, take a look at the update...

New Data Management Platforms

Discovering Architecture Blueprints

About the Book

What Is This Book About?

This book is the result of a comprehensive study into the improvement, expansion, and modernization of different types of architectures, solutions, and platforms to address the need for better and more effective ways of dealing with increasing and more complex volumes of data.

In conducting his research for the book, the author has made every effort to analyze in detail a number of successful modern data management deployments as well as the different types of solutions proposed by software providers, with the aim of providing guidance and establishing practical blueprints for the adoption and/or modernization of existing data management platforms.
These new platforms have the capability of expanding the ability of enterprises to manage new data sources—from ingestion to exposure—more accurately and efficiently, and with increased speed.

The book is the result of extensive research conducted by the author examining a wide number of real-world, modern data management use cases and the plethora of software solutions offered by various software providers that have been deployed to address them. Taking a software vendor‒agnostic viewpoint, the book analyzes what companies in different business areas and industries have done to achieve success in this endeavor, and infers general architecture footprints that may be useful to those enterprises looking to deploy a new data management platform or improve an already existing one.

Who Is This Book For?

This book is intended for both business and technical professionals in the area of information technology (IT). These roles would include chief information officers (CIOs), chief technology officers (CTOs), chief financial officers (CFOs), data architects, and data management specialists interested in learning, evaluating, or implementing any of the plethora of new technologies at their disposal for modernizing their existing data management frameworks.

The book is also intended for students in the fields of computer sciences and informatics interested in learning about new trends and technologies for deploying data architecture platforms. It is not only relevant for those individuals considering pursuing a big data/data management‒related career, but also for those looking to enrich their analytics/data sciences skills with information about new platform technologies.
This book is also relevant for:

  • Professionals in the IT market who would like to enrich their knowledge and stay abreast of developments in information management.
  • Entrepreneurs who would like to launch a data management platform start-up or consultancy, enhancing their understanding of the market, learning about some start-up ideas and services for consultants, and gaining sample business proposals.
  • Executives looking to assess the value and opportunities of deploying and/or improving their data management platforms. 
  • Finally, the book can also be used by a general audience from both the IT and business areas to learn about the current data management landscape and technologies in order to acquire an informed opinion about how to use these technologies for deploying modern technology data management platforms. 

What Does This Book Cover? 

The book covers a wide variety of topics, from a general exploration of the data management landscape to a more detailed review of specific topics, including the following:

  • The evolution of data management
  • A comprehensive introduction to Big Data, NoSQL, and analytics databases 
  • The emergence of new technologies for faster data processing—such as in-memory databases, data streaming, and real-time technologies—and their role in the new data management landscape
  • The evolution of the data warehouse and its new role within modern data management solutions 
  • New approaches to data management, such as data lakes, enterprise data hubs, and alternative solutions 
  • A revision of the data integration issue—new components, approaches, and solutions 
  • A detailed review of real-world use cases, and a suggested approach to finding the right deployment blueprint 

How Is the Book Structured?

The book is divided into four comprehensive parts that offer a historical perspective, lay the groundwork for the development of data management platforms and associated concepts, and analyze real-world cases of modern data management frameworks, working toward potential deployment blueprints.

  • Part I. A brief history of diverse data management platform architectures, and how their evolution has set the stage for the emergence of new data management technologies. 
  • Part II. The need for and emergence of new data management technologies such as Big Data, NoSQL, data streaming, and real-time systems in reshaping existing data management infrastructures. 
  • Part III. An in-depth exploration of these new technologies and their interaction with existing technologies to reshape and create new data management infrastructures. 
  • Part IV. A study of real-world modern data management infrastructures, along with a proposal of a concrete and plausible blueprint. 

General Outline

The following is a general outline of the book:

Table of Contents
Preface
Acknowledgments
Prologue
Introduction
Part I. Brief History of Data Management Platform Architectures 
          Chapter 1. The Never-Ending Need to Manage Data
          Chapter 2. The Evolution of Structured Data Repositories
          Chapter 3. The Evolution of the Data Warehouse as the Main Data Management Platform
Part II. The Need for and Emergence of New Data Management Technologies 
          Chapter 4. Big Data: A Primer
          Chapter 5. NoSQL: A Primer
          Chapter 6. Need for Speed 1: The Emergence of In-Memory Technologies
          Chapter 7. Need for Speed 2: Events, Streams, and the Real-Time Paradigm
          Chapter 8. The Role of New Technologies in Reshaping the Analytics and Business Intelligence Space
Part III. New Data Management Platforms: A First Exploration 
          Chapter 9. The Data Warehouse, Expanded and Improved
          Chapter 10. Data Lakes: Concept and Approach
          Chapter 11. Data Hub: Concept and Approach
          Chapter 12. Data Lake vs. Data Hub: Key Differences and Considerations
          Chapter 13. Analysis of Alternative Solutions
          Chapter 14. Considerations on Data Ingestion, Integration, and Consolidation
Part IV. Studying Plausible New Data Management Platforms 
          Chapter 15. Methodology
          Chapter 16. Data Lakes
               Sub-Chapter 16.1. Analyzing three real-world use cases
               Sub-Chapter 16.2. Proposing a feasible blueprint
          Chapter 17. Data Hubs
               Sub-Chapter 17.1. Analyzing three real-world use cases
               Sub-Chapter 17.2. Proposing a feasible blueprint
          Chapter 18. Summary and Conclusion
Appendix A. The Cloud Factor: Data Management Platforms in the Cloud
Appendix B. Brief Intro into Analytics and Business Intelligence with Big Data
Appendix C. Brief Intro into Virtualization and Data Integration
Appendix D. Brief Intro into the Role of Data Governance in Big Data & Modern Data Management Strategies

Main Post Image courtesy of Thomas Skirde 

Intelligent Automation for DevOps: An Interview with Rocana’s CTO & Co-Founder Eric Sammer


Recently, Rocana, a software company specialized in developing solutions that bring visibility to IT and DevOps teams, announced a new release of its data platform, Rocana Ops.

It is in this context that we had the chance to interview Eric Sammer, CTO and co-founder of Rocana, who kindly agreed to provide us with insights about the company and its software offering, as well as details of the new version.

Eric has served as a Senior Engineer and Architect at several large-scale, data-driven organizations, including Experian and Conductor. Most recently, he served as an Engineering Manager at Cloudera, where he was responsible for working with hundreds of partners to develop robust solutions and integrate them tightly with Cloudera's Enterprise Data Hub.

He is deeply entrenched in the open source community and has an ambition for solving difficult scaling and processing problems. Passionate about challenging assumptions and showing large, complex enterprises new ways to solve their IT infrastructure challenges, Eric now leads Rocana’s product development and company direction as CTO.

Eric is also the author of Hadoop Operations published by O'Reilly Media and is also a frequent speaker on technology and techniques for large scale data processing, integration, and system management.

Hi Eric, so, what was the motivation behind founding Rocana, the company, and developing Rocana Ops the product?

Rocana was founded directly in response to the growing sophistication of the infrastructure and technology that runs the modern business, and the challenges companies have in understanding those systems. Whether it’s visibility into health and performance, investigating specific issues, or holistically understanding the impact infrastructure health and well-being have on the business, many businesses are struggling with the complexity of their environments.

These issues have been exacerbated by trends in cloud computing, hybrid environments, microservices, and data-driven products and features - such as product recommendations, real-time inventory visibility, and customer account self-management - that rely on data from, and about, the infrastructure and the business. There are a greater number of more varied data sources, producing finer-grained data faster than ever before.

Meanwhile, the existing solutions to understand and manage these environments are not keeping pace. All of them focus on interesting, but limited, slices of the problem - just log search, just dashboards of metrics, just the last 30 minutes of network flow data, only security events - making it almost impossible to understand what’s happening. These tools tend to think of each piece of infrastructure as a special case rather than the data warehousing and advanced analytics problem it is.

Outside of core IT, it’s natural to source feeds of data from many different places, cleanse and normalize that data, and bring it into a central governed repository where it can be analyzed, visualized, or used to augment other applications.

We want to extend that thinking into infrastructure, network, cloud, database, platform, and application management to better run the business, while at the same time, opening up new opportunities to bring operational and business data together. That means all of data, from every data source, in real time, with full retention, on an open platform, with advanced analytics to make sense of that data.

How would you describe what Rocana Ops is?

Rocana Ops is a data warehouse for event-oriented data. That includes log events, infrastructure and application metrics, business transactions, IoT events, security events, or anything else with a time stamp. It includes the collection, transformation and normalization, storage, query, analytics, visualization, and management of all event-oriented data in a single open system that scales horizontally on cost-effective hardware or cloud platforms.

A normal deployment of Rocana Ops for our customers will take in anywhere from 10 to 100TB of new data every day, retaining it for years. Each event captured by the system is typically available for query in less than one second, and always online and query-able, thanks to a fully parallelized storage and query platform.

Rocana is placed in a very interesting segment of the IT industry. What are, in your view, the differences between the common business analytics user and the IT user regarding the use of a data management and analytics solution? Different needs? Different mindsets? Goals?

I think the first thing to consider when talking about business analytics - meaning both custom-built and off-the-shelf BI suites - and IT focused solutions is that there has historically been very little cross-pollination of ideas between them. Business users tend to think about customized views on top of shared repositories, and building data pipelines to feed those repositories.

There tends to be a focus on reusing data assets and pipelines, lineage concerns, governance, and lifecycle management. IT users, on the other hand, think about collection through analytics for each data source as a silo: network performance, application logs, host and process-level performance, and so on each have dedicated collection, storage, and analytics glued together in a tightly coupled package.

Unlike their business counterparts, IT users have very well known data sources and formats (relatively speaking) and analytics they want to perform. So in some ways, IT analytics have a more constrained problem space, but less integration. This is Conway’s Law in serious effect: the notion that software tends to mimic the organizational structures in which it’s developed or designed. These silos lead to target fixation.

IT users can wind up focusing on making sure the operating system is healthy, for example, while the business service it supports is unhealthy. Many tools tend to reinforce that kind of thinking. That extends to diagnostics and troubleshooting which is even worse. Again, we’re talking in generic terms here, but the business users tend to have a holistic focus on an issue relevant to the business rather than limited slices.

We want to open that visibility to the IT side of the house, and hopefully even bring those worlds together.

What are the major pains of IT Ops, and how does Rocana help solve these pains?

Ops is really a combination of both horizontally and vertically focused groups. Some teams are tasked with building and/or running a complete vertical service, like an airline check-in and boarding pass management system. Other teams are focused on providing horizontal services, such as data center infrastructure, with limited knowledge or visibility into what those tens of thousands of boxes do.

Let’s say customers can’t check in and get their boarding passes on their mobile devices. The application ops team finds that a subset of application servers keep losing connections to database servers holding reservations, but there’s no reason why, and nothing has changed. Meanwhile, the networking team may be swapping out some bad optics in a switch that has been flaky, thinking that traffic is being properly routed over another link. Connecting these two dots within a large organization can be maddeningly time consuming - if it even happens at all - leading to some of the high-profile outages we see in the news.

Our focus is really on providing a shared view over all systems under management. Each team still has their focused view on their part of the infrastructure in Rocana Ops, but in this example, the application ops team could also trace the failing connections through to link state changes on switches and correlate that with traffic changes in network flow patterns.

Could you describe Rocana’s main architecture?

Following the data flow through Rocana Ops, data is first collected by one of the included data collection methods. These include native syslog, file and directory tailing, netflow and IPFIX, Windows event log, application and host metrics collection, and native APIs for popular programming languages, as well as REST.

As data is collected, basic parsing is performed turning all data into semi-structured events that can be easily correlated regardless of their source. These events flow into an event data bus forming a real-time stream of the cleansed, normalized events. All of the customer-configurable and extensible transformation, model building and application (for features like anomaly detection), complex event processing, triggering, alerting, and other data services are real time stream-oriented services.
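A toy version of that parse-and-normalize step might look like the following; the event shape and field names are illustrative only, not Rocana's actual schema.

```python
# Sketch: normalizing heterogeneous inputs (a syslog line, a metric sample)
# into one semi-structured event shape so they can be correlated regardless
# of source. Field names are illustrative, not Rocana's actual schema.
import re
import time

SYSLOG = re.compile(r"^<(?P<pri>\d+)>(?P<msg>.*)$")

def normalize_syslog(line):
    m = SYSLOG.match(line)
    pri = int(m.group("pri"))
    return {"ts": time.time(), "source": "syslog",
            "severity": pri % 8, "body": m.group("msg")}

def normalize_metric(name, value, host):
    return {"ts": time.time(), "source": "metrics",
            "body": "%s=%s" % (name, value),
            "attributes": {"metric": name, "host": host, "value": value}}

e1 = normalize_syslog("<13>db03 kernel: connection reset by peer")
e2 = normalize_metric("cpu.idle", 87.5, "db03")
print(e1["severity"], e2["attributes"]["metric"])  # 5 cpu.idle
```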

Rocana's General Architecture (Courtesy of Rocana)

A number of representations of the data are stored in highly optimized data systems for natural language search, query, analysis, and visualization in the Rocana Ops application. Under the hood, Rocana Ops is built on top of a number of popular open source systems, in open formats, that may be used for other applications and systems making lock-in a non-issue for customers.

Every part of Rocana’s architecture, notably the collection, processing, storage, and query systems, is a parallelized, scale-out system with no single point of failure.

What are the basic or general requirements needed for a typical Rocana deployment?

Rocana Ops is really designed for large deployments, as mentioned earlier: tens to hundreds of terabytes per day.

Typically customers start with a half-rack (10 nodes) of servers with 2 x 8+ core CPUs, 12 x 4TB or 8TB SATA II drives, 128 to 256GB of RAM, and a 10Gb network (typical models are the HP DL380 G9 or Dell R730xd), or the cloud equivalent (Amazon d2.4xl or 8xl), for the data warehouse nodes.

A deployment this size easily handles in excess of a few terabytes per day of data coming into the system from tens to hundreds of thousands of sources.

As customers onboard more data sources or want to retain more data, they begin adding nodes to the system. We have a stellar customer success team that helps customers plan, deploy, and service Rocana Ops, so customers don’t need to worry about finding “unicorn” staff.

What are then, the key functional differentiators of Rocana?

Customers pick Rocana for a few reasons: scale, openness, advanced data management features, and cost. We’ve talked a lot about scale already, but openness is equally critical.

Enterprises, frankly, are done with being locked into proprietary formats and vendors holding their data hostage. Once you’re collecting all of this data in one place, customers often want to use Rocana Ops to provide real time streams to other systems without going through expensive translations or extractions.

Another major draw is a set of advanced data management features absent from other systems, such as record-level role-based access control, data lifecycle management, encryption, and auditing facilities. When your log events potentially contain personally identifiable information (PII) or other sensitive data, this is critical.

Finally, operating at scale is both a technology and an economic issue. Rocana Ops’ licensing model is based on users rather than on nodes or data captured by the system, freeing customers to think about how best to solve problems rather than perform license math.

Recently, you've released Rocana Ops 2.0. Could you talk about this release’s new capabilities?

Rocana Ops 2.0 is really exciting for us.

We’ve added Rocana Reflex, which incorporates complex event processing and orchestration features, allowing customers to perform actions in response to patterns in the data. Actions can be almost anything you can think of, including REST API calls to services and sending alerts.

Reflex is paired with a first responder experience designed to help ops teams to quickly triage alerts and anomalies, understand potential causes, collaborate with one another, and spot patterns in the data.
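The pattern-to-action idea behind this kind of complex event processing can be sketched with a toy rule engine. This is an illustration of the general technique, not Rocana's actual API; the class and field names are hypothetical:

```python
from collections import defaultdict, deque

class ThresholdRule:
    """Fire an action when `count` matching events from the same host
    arrive within a sliding window of `window` seconds."""

    def __init__(self, predicate, count, window, action):
        self.predicate = predicate
        self.count = count
        self.window = window
        self.action = action
        self.seen = defaultdict(deque)  # host -> recent event timestamps

    def feed(self, event):
        if not self.predicate(event):
            return
        q = self.seen[event["host"]]
        q.append(event["ts"])
        while q and event["ts"] - q[0] > self.window:
            q.popleft()  # expire timestamps that fell out of the window
        if len(q) >= self.count:
            self.action(event["host"])  # e.g. a REST call or an alert
            q.clear()

alerts = []
rule = ThresholdRule(lambda e: "connection lost" in e["message"],
                     count=3, window=60, action=alerts.append)
for t in (0, 10, 20):
    rule.feed({"host": "app01", "ts": t, "message": "db connection lost"})
# alerts == ["app01"]
```

A production system would evaluate many such rules in parallel over the event bus, but the core loop (match, window, threshold, act) is the same.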

One of the major challenges customers face in deploying dynamic next-generation platforms is operational support, so 2.0 includes first-class support for Pivotal Cloud Foundry instrumentation and visibility. Those are just a small sample of what we’ve done. It’s really a huge release!

How does Rocana interact with the open source community, especially the Apache Hadoop project?

Open source is core to what we do at Rocana, and it’s one of the reasons we’re able to do a lot of what we do in Rocana Ops.

We’re committed to collaborating with the community whenever possible. We’ve open sourced parts of Rocana Ops where we believe there’s a benefit to the community (like Osso, a modern standard for event-oriented data). As we build with projects like Apache Hadoop, Kafka, Spark, Impala, and Lucene, we look closely at places where we can contribute features, insight, feedback, testing, and (most often) fixes.

The vast majority of our engineers, customer success, and sales engineers come from an open source background, so we know how to wear multiple hats.

Foremost is always our customers’ success, but it’s absolutely critical to help advance the community along where we are uniquely positioned to help. This is an exciting space for us, and I think you’ll see us doing some interesting work with the community in the future.

Finally, what is in your opinion the best and geekiest song ever?

Now you’re speaking my language; I studied music theory.
“Lateralus” by Tool, for the way it plays with the Fibonacci sequence and other math without being gimmicky or unnatural.
A close second goes to Aphex Twin’s “Equation,” but I won’t ruin that for you.

DrivenBI Helps Companies Drive Analytics to the Next Level

Privately held company DrivenBI was formed in 2006 by a group of seasoned experts and investors in the business intelligence (BI) market in Taiwan and the United States. Currently based in Pasadena, California, the company has been steadily growing in the ten years since, gaining more than 400 customers in both the English and Chinese markets.

Led by founder and CEO Ben Tai (previously VP of global services with the former BusinessObjects, now part of SAP), DrivenBI would be considered part of what I call a new generation of BI and analytics solutions that is changing the analytics market panorama, especially in the realm of cloud computing.

A couple of weeks ago, I had the opportunity to speak with DrivenBI’s team for a briefing and demonstration, mostly regarding their current analytics offerings, the company’s business strategy, and its industry perspective, all of which I will share with you here.

How DrivenBI Drives BI
DrivenBI’s portfolio is anchored by SRK, DrivenBI’s native cloud self-service BI platform and collaboration hub.

SRK provides a foundation for sourcing and collecting data in real time within a collaborative environment. Being a cloud platform, SRK can combine the benefits of a reduced IT footprint with a wide range of capabilities for efficient data management.

The SRK native cloud-centralized self-service BI solution offers many features, including:
  • the ability to blend and work with structured and unstructured data using industry-standard data formats and protocols;
  • a centralized control architecture providing security and data consistency across the platform;
  • a set of collaboration features to encourage team communication and speed decision making; and
  • agile reporting and well-established data processing logic.
SRK’s collaborative environment, featuring data and information sharing among users within a centralized setting, allows users to maintain control over every aspect and step of the BI and analytics process (figure 1).

Figure 1. DrivenBI’s SRK self-driven and collaborative platform (courtesy of DrivenBI)
DrivenBI: Driving Value throughout Industries, Lines of Business, and Business Roles

One important aspect of the philosophy embraced by DrivenBI has to do with its design approach, providing, within the same platform, valuable services across the multiple functional areas of an organization, including lines of business such as finance and marketing, inventory control, and resource management, as well as across industries such as fashion, gaming, e-commerce, and insurance.

Another element that makes DrivenBI an appealing offering is its strategic partnership with Microsoft Azure, with the ability to integrate tightly with the powerhouse cloud offering.

I had the opportunity to play around a bit with DrivenBI’s platform, and I was impressed with the ease of use and intuitive experience in all stages of the data analytics process, especially for dynamic reporting and dashboard creation (figure 2).

Figure 2. DrivenBI’s SRK dashboard (courtesy of DrivenBI)
Other relevant benefits of the DrivenBI platform that I observed include:
  • elimination/automation of some heavy manual processes;
  • analysis and collaboration capabilities, particularly relevant for companies with organizationally and geographically distributed operations, such as widespread locations, plants, and global customers;
  • support for multiple system data sources, including structured operational data, unstructured social media sources, and others.
As showcased in its business-centered approach and design, DrivenBI is one of a new generation of BI and analytics offerings that reduce the need for IT intervention in comparison to peer solutions like Domo, Tableau, and GoodData. These new-generation solutions are offered through cloud delivery, a method that seems to suit analytics and BI offerings and their holistic take on data collection well. As an alternative to expensive IT-centric BI tools, the DrivenBI cloud platform can replace or minimize the use of complex spreadsheets and difficult analytics processes.

DrivenBI’s Agile Analytics
My experience with DrivenBI was far more than “interesting.” DrivenBI is a BI software solution that is well designed and built, is intuitive, and offers a fast learning curve. Its well-made architecture makes the solution easy to use and versatile. Its approach (no spreadsheets, no programming, no data warehouse) is well suited to those organizations that truly need agile analytics solutions. Still, I wonder how this approach fits with large BI deployments that require robust data services, especially in the realms of merging traditional analytics with big data and Internet of Things (IoT) strategies.

To sample what DrivenBI has to offer, I recommend checking out its SRK demo:

(Originally published on TEC's Blog)

Yep, I’m Writing a Book on Modern Data Management Platforms


Over the past couple of years, I have spent lots of time talking with vendors, users, consultants, and other analysts, as well as plenty of people from the data management community, about the wave of new technologies and continued efforts aimed at finding the best software solutions to address the increasing number of issues associated with managing enterprise data. In this way, I have gathered much insight on ways to exploit the potential value of enterprise data through efficient analysis for the purpose of “gathering important knowledge that informs better decisions.”

Many enterprises have had much success in deriving value from data analysis, but a more significant number of these efforts have failed to achieve many, if any, useful results. And yet other users are still struggling to find the right software solution for their business data analysis needs, perhaps confused by the myriad solutions emerging nearly every day.

It is precisely in this context that I’ve decided to launch this new endeavor and write a book that offers a practical perspective on those new data platform deployments that have been successful, as well as practical use cases and plausible design blueprints for your organization or data management project. The information, insight, and guidance that I will provide is based on lessons I’ve learned through research projects and other efforts examining robust and solid data management platform solutions for many organizations.

In the following months, I will be working hard to deliver a book that serves as a practical guide for the implementation of a successful modern data management platform.
The resources for this project will require crowdfunding efforts, and here is where your collaboration will be extremely valuable.
There are several ways in which you can participate:

  • Participating in our Data Management Platforms survey (to obtain a nice discount right off the bat)
  • Pre-ordering the book (soon, I’ll provide you with details on how to pre-order your copy, but in the meantime, you can show your interest by signing up at the link below)
  • Providing us with information about your own successful enterprise use case, which we may use in the book

To let us know which of these options best fits with your spirit of collaboration, and to receive the latest updates on this book, as well as other interesting news, you just need to sign up to our email list here. Needless to say, the information you provide will be kept confidential and used only for the purpose of developing this book.

In the meantime, I’d like to leave you with a brief synopsis of the contents of this book, with more details to come in the near future:

New Data Management Platforms

Discovering Architecture Blueprints

About the Book

What Is This Book About?

This book is the result of a comprehensive study into the improvement, expansion, and modernization of different types of architectures, solutions, and platforms to address the need for better and more effective ways of dealing with increasing and more complex volumes of data.

In conducting his research for the book, the author has made every effort to analyze in detail a number of successful modern data management deployments as well as the different types of solutions proposed by software providers, with the aim of providing guidance and establishing practical blueprints for the adoption and/or modernization of existing data management platforms.
These new platforms have the capability of expanding the ability of enterprises to manage new data sources—from ingestion to exposure—more accurately and efficiently, and with increased speed.

The book is the result of extensive research conducted by the author examining a wide number of real-world, modern data management use cases and the plethora of software solutions offered by various software providers that have been deployed to address them. Taking a software vendor‒agnostic viewpoint, the book analyzes what companies in different business areas and industries have done to achieve success in this endeavor, and infers general architecture footprints that may be useful to those enterprises looking to deploy a new data management platform or improve an already existing one.

Who Is This Book For?

This book is intended for both business and technical professionals in the area of information technology (IT). These roles would include chief information officers (CIOs), chief technology officers (CTOs), chief financial officers (CFOs), data architects, and data management specialists interested in learning, evaluating, or implementing any of the plethora of new technologies at their disposal for modernizing their existing data management frameworks.

The book is also intended for students in the fields of computer sciences and informatics interested in learning about new trends and technologies for deploying data architecture platforms. It is not only relevant for those individuals considering pursuing a big data/data management‒related career, but also for those looking to enrich their analytics/data sciences skills with information about new platform technologies.
This book is also relevant for:

  • Professionals in the IT market who would like to enrich their knowledge and stay abreast of developments in information management.
  • Entrepreneurs who would like to launch a data management platform start-up or consultancy, enhancing their understanding of the market, learning about some start-up ideas and services for consultants, and gaining sample business proposals.
  • Executives looking to assess the value and opportunities of deploying and/or improving their data management platforms. 
  • Finally, the book can also be used by a general audience from both the IT and business areas to learn about the current data management landscape and technologies in order to acquire an informed opinion about how to use these technologies for deploying modern technology data management platforms. 

What Does This Book Cover? 

The book covers a wide variety of topics, from a general exploration of the data management landscape to a more detailed review of specific topics, including the following:

  • The evolution of data management
  • A comprehensive introduction to Big Data, NoSQL, and analytics databases 
  • The emergence of new technologies for faster data processing—such as in-memory databases, data streaming, and real-time technologies—and their role in the new data management landscape
  • The evolution of the data warehouse and its new role within modern data management solutions 
  • New approaches to data management, such as data lakes, enterprise data hubs, and alternative solutions 
  • A revision of the data integration issue—new components, approaches, and solutions 
  • A detailed review of real-world use cases, and a suggested approach to finding the right deployment blueprint 

How Is the Book Structured?

The book is divided into four comprehensive parts that offer a historical perspective, the groundwork for the development of data management platforms and associated concepts, and an analysis of real-world modern data management frameworks, leading toward potential blueprints for deployment.

  • Part I. A brief history of diverse data management platform architectures, and how their evolution has set the stage for the emergence of new data management technologies. 
  • Part II. The need for and emergence of new data management technologies such as Big Data, NoSQL, data streaming, and real-time systems in reshaping existing data management infrastructures. 
  • Part III. An in-depth exploration of these new technologies and their interaction with existing technologies to reshape and create new data management infrastructures. 
  • Part IV. A study of real-world modern data management infrastructures, along with a proposal of a concrete and plausible blueprint. 

General Outline

The following is a general outline of the book:

Table of Contents
Preface 
Acknowledgments 
Prologue 
Introduction 
Part I. Brief History of Data Management Platform Architectures 
          Chapter 1. The Never-Ending Need to Manage Data
          Chapter 2. The Evolution of Structured Data Repositories
          Chapter 3. The Evolution of the Data Warehouse as the Main Data Management Platform
Part II. The Need for and Emergence of New Data Management Technologies 
          Chapter 4. Big Data: A Primer
          Chapter 5. NoSQL: A Primer
          Chapter 6. Need for Speed 1: The Emergence of In-Memory Technologies
          Chapter 7. Need for Speed 2: Events, Streams, and the Real-Time Paradigm
          Chapter 8. The Role of New Technologies in Reshaping the Analytics and Business Intelligence Space
Part III. New Data Management Platforms: A First Exploration 
          Chapter 9. The Data Warehouse, Expanded and Improved
          Chapter 10. Data Lakes: Concept and Approach
          Chapter 11. Data Hub: Concept and Approach
          Chapter 12. Data Lake vs. Data Hub: Key Differences and Considerations
          Chapter 13. Analysis of Alternative Solutions
          Chapter 14. Considerations on Data Ingestion, Integration, and Consolidation
Part IV. Studying Plausible New Data Management Platforms 
          Chapter 15. Methodology
          Chapter 16. Data Lakes
               Sub-Chapter 16.1. Analyzing Three Real-World Use Cases
               Sub-Chapter 16.2. Proposing a Feasible Blueprint
          Chapter 17. Data Hubs
               Sub-Chapter 17.1. Analyzing Three Real-World Use Cases
               Sub-Chapter 17.2. Proposing a Feasible Blueprint
          Chapter 18. Summary and Conclusion
Summary and Conclusions
Appendix A. The Cloud Factor: Data Management Platforms in the Cloud
Appendix B. Brief Intro into Analytics and Business Intelligence with Big Data
Appendix C. Brief Intro into Virtualization and Data Integration
Appendix D. Brief Intro into the Role of Data Governance in Big Data & Modern Data Management Strategies

About the Author 
Jorge Garcia is an industry analyst in the areas of business intelligence (BI) and data management. He’s currently a principal analyst with Technology Evaluation Centers (TEC).

His experience spans more than 25 years across all phases of application development and of database, data warehouse (DWH), and analytics/BI solution design, including more than 15 years in project management, covering best practices and new technologies in the BI/DWH space.

Prior to joining TEC, he was a senior project manager and senior analyst developing BI, DWH, and data integration applications using solutions from Oracle, SAP, Informatica, IBM, and Teradata, among others. Garcia has also worked on projects implementing data management solutions for the private and public sectors, including banking, insurance, retail, and services.

A proud member of the Boulder BI Brain Trust, Garcia also makes frequent public speaking appearances, and is an educator and influencer on various topics related to data management.

When not busy researching, speaking, consulting, and mingling with people in this industry, Garcia finds solace as an avid reader, music lover, and soccer fan, as well as proud father "trying" to raise his three lovely kids while his wife tries to re-raise him.

Disrupting the data market: Interview with EXASOL’s CEO Aaron Auld

Processing data fast and efficiently has become a never-ending race. With companies’ increasing need to consume data comes a never-ending “need for speed” in data processing, and consequently the emergence of a new generation of database software solutions built to fulfill this demand for high-performance data processing.

These new database management systems incorporate novel technology to provide faster, more efficient access to and processing of large volumes of data.

EXASOL is one of these disruptive “new” database solutions. Headquartered in Nuremberg, Germany, with offices around the globe, EXASOL has worked hard to bring a fresh, new approach to the data analytics market by offering a world-class database solution.

In this interview, we took the opportunity to chat with EXASOL’s Aaron Auld about the company and its innovative database solution.

Aaron Auld is the Chief Executive Officer as well as the Chairman of the Board at EXASOL, positions he has held since July 2013. He was made a board member in 2009.

As CEO and Chairman, Aaron is responsible for the strategic direction and execution of the company, as well as growing the business internationally.

Aaron embarked on his career back in 1996 at MAN Technologie AG, where he worked on large industrial projects and M&A transactions in the aerospace sector. Subsequently, he worked for the law firm Eckner-Bähr & Colleagues in the field of corporate law.

After that, the native Brit joined Océ Printing Systems GmbH as legal counsel for sales, software, R&D and IT. He then moved to Océ Holding Germany and took over the global software business as head of corporate counsel. Aaron was also involved in the IPO (Prime Standard) of Primion Technology AG in a legal capacity, and led investment management and investor relations.

Aaron studied law at the Universities of Munich and St. Gallen. Passionate about nature, Aaron likes nothing more than to relax by walking or sailing and is interested in politics and history.

So, what is EXASOL and what is the story behind it?

EXASOL is a technology vendor that develops a high-performance in-memory analytic database that was built from the ground up to analyze large volumes of data extremely fast and with a high degree of flexibility.
The company was founded back in the early 2000s in Nuremberg, Germany, and went to market with the first version of the analytic database in 2008.

Now in its sixth generation, EXASOL continues to develop and market its in-memory analytic database, working with organizations across the globe to help them derive business insight from their data and drive their businesses forward.

How does the database work? Could you tell us some of the main features?

We have always focused on delivering an analytic database with ultra-fast, massively scalable analytic performance. The database combines in-memory, columnar storage, and massively parallel processing technologies to provide unrivaled performance, flexibility, and scalability.

The database is tuning-free and therefore helps to reduce the total cost of ownership while enabling users to solve analytical tasks instead of having to cope with technical limits and constraints.

With the recently announced version 6, the database now offers a data virtualization and data integration framework that allows users to connect to more data sources than ever before.

Also, alongside out-of-the-box support for R, Lua, Python and Java, users can integrate the analytics programming language of their choice and use it for in-database analytics.

Especially today, speed of data processing is important. I’ve read that EXASOL has taken part in some benchmarks in this regard. Could you tell us more about them?

One of the few truly independent sets of benchmark tests available is offered by the Transaction Processing Performance Council (TPC). A few years ago we decided to take part in the TPC-H benchmark, and ever since we have topped the tables not only in terms of performance (i.e., analytic speeds) but also in terms of price/performance (i.e., cost aligned with speed) when analyzing data volumes ranging from 100GB right up to 100TB. No other database vendor comes close.
The information is available online here.

One of the features of EXASOL is that, if I’m not mistaken, it can be deployed on commodity hardware. How does EXASOL’s design guarantee optimal performance and reliability?

Offering flexible deployment models in terms of how businesses can benefit from EXASOL has always been important to us at EXASOL.

Years ago, the concept of the data warehouse appliance was talked about as the optimum deployment model, but in most cases it meant that vendors were forcing users to run their database on bespoke hardware that could not then be repurposed for any other task. Things have changed since: while the appliance model is still offered, ours is, and always has been, one that uses commodity hardware.

Of course, users are free to download our software and install it on their own hardware too.
It all makes for a more open and transparent framework with no vendor lock-in, and for users that can only be a good thing. What’s more, because the hardware and chip vendors are always innovating, users stand to benefit whenever a new processor or server is released, as they will see even faster performance when they run EXASOL on that new technology.
We recently discussed this in a promotional video for Intel.

Regarding price point, is it intended only for large organizations? What about small and medium ones that need fast data processing?

We work with organizations both large and small.  The common denominator is always that they have an issue with their data analytics or incumbent database technology and that they just cannot get answers to their analytic queries fast enough.

Price-wise, our analytic database is extremely competitively priced, and we allow organizations of all shapes and sizes to use our database software on terms that best fit their own requirements, be that via a perpetual license model, a subscription model, or a bring-your-own-license (BYOL) model, whether on-premises or in the cloud.

What would be a minimal configuration example? Server, user licensing etc.?

Users can get started today with the EXASOL Free Small Business Edition. It is a single-node-only edition of the database software, and users can pin up to 200GB of data into RAM.

Given that we advocate a 1:10 ratio of RAM to raw data volume, this means that users can put 2TB of raw data into their EXASOL database instance and still get unrivaled analytic performance on their data, all for free. There are no limitations in terms of users.
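As a back-of-the-envelope illustration of that 1:10 rule (my own arithmetic sketch, not an official EXASOL sizing tool):

```python
RAM_TO_RAW_RATIO = 10  # the suggested 1:10 ratio of RAM to raw data

def ram_needed_gb(raw_data_gb):
    """RAM (in GB) to pin for a given raw data volume under the 1:10 rule."""
    return raw_data_gb / RAM_TO_RAW_RATIO

# The Free Small Business Edition caps pinned RAM at 200GB, which is
# exactly what 2TB (2000GB) of raw data requires under this ratio.
print(ram_needed_gb(2000))  # 200.0
```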

We believe this is a very compelling advantage for businesses that want to get started with EXASOL.

Later, when data volumes grow and when businesses want to make use of advanced features such as in-database analytics or data virtualization, users can then upgrade to the EXASOL Enterprise Cluster Edition which offers much more in terms of functionality.

Regarding big data requirements, could you tell us some of the possibilities to integrate or connect EXASOL with big data sources/repositories such as Hadoop and others?

EXASOL can be easily integrated into any IT infrastructure. It is SQL-compliant, is compatible with leading BI and ETL products such as Tableau, MicroStrategy, Birst, IBM Cognos, SAP BusinessObjects, Alteryx, Informatica, Talend, Looker, and Pentaho, and provides the most flexible Hadoop connector on the market.

Furthermore, through an extensive data virtualization and integration framework, users can now analyze data from more sources more easily and faster than ever before.

Recently, the company announced that EXASOL is now available on Amazon. Could you tell us a bit more about the news? EXASOL is also available on Azure, right?

As more and more organizations deploy their applications and systems in the cloud, it’s important that we allow them to use EXASOL in the cloud, too. As a result, we are now available on Amazon Web Services as well as Microsoft Azure. What’s more, we continue to offer our own cloud and hosting environment, which we call EXACloud.

Finally, on a more personal topic. Being a Scot who lives in Germany, would you go for a German beer or a Scottish whisky?

That’s an easy one. First, enjoy a nice German beer (ideally, one from a Munich brewery) before dinner, then round the evening off by savoring a nice Scottish whisky. The best of both worlds.

Logging challenges for containerized applications: Interview with Eduardo Silva

Next week, another edition of the CloudNativeCon conference will take place in the great city of Seattle. One of the key topics in this edition is containers, a software technology that is easing the development and deployment of applications by packaging them so they can be deployed through a simple process.

In this installment, we took the opportunity to chat with Eduardo Silva about containers and his upcoming session, Logging for Containers, which will take place during the conference.

Eduardo Silva is a principal open source developer at Treasure Data Inc., where he currently leads efforts to make the logging ecosystem friendlier for embedded, container, and cloud services.

He also directs the Monkey Project organization which is behind the Open Source projects Monkey HTTP Server and Duda I/O.

A well-known speaker, Eduardo has spoken at events across South America and at recent Linux Foundation events in the US, Asia, and Europe.

Thanks so much for your time Eduardo!

What is a container and how is it applied specifically in Linux?

When deploying applications, it is always desirable to have full control over the resources involved; ideally, we would like the application to be isolated as much as possible. Containers are the concept of packaging an application with its entire runtime environment in an isolated way.
To accomplish this at the operating system level, Linux provides us with two features that make it possible to implement containers: cgroups and namespaces.

  • cgroups (control groups) allow us to limit resource usage for one or more processes, so you can define how much CPU or memory a program might use when running.
  • namespaces, on the other hand, allow us to define restricted access to specific resources such as mount points, network devices, and IPC, among others.

In short, if you like programming, you can implement your own containers with a few system calls. Since this would be tedious from an operability perspective, there are libraries and services that abstract away the details and let you focus on what really matters: deployment and monitoring.
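As a rough sketch of how those two kernel features map onto code: the cgroup v1 file paths and the unshare(1) flags below are real Linux interfaces, but the helper functions are hypothetical, and actually applying the writes or running the command requires root.

```python
def cgroup_writes(name, cpu_shares=None, mem_limit_bytes=None):
    """Return the (path, value) pairs that, written as root, would confine
    processes placed in cgroup `name` (cgroup v1 layout)."""
    writes = []
    if cpu_shares is not None:
        writes.append((f"/sys/fs/cgroup/cpu/{name}/cpu.shares",
                       str(cpu_shares)))
    if mem_limit_bytes is not None:
        writes.append((f"/sys/fs/cgroup/memory/{name}/memory.limit_in_bytes",
                       str(mem_limit_bytes)))
    return writes

def unshare_command(argv, hostname="sandbox"):
    """Compose an unshare(1) invocation that runs argv in fresh UTS, PID,
    and mount namespaces (the namespace half of a container)."""
    return ["unshare", "--uts", "--pid", "--mount", "--fork", "sh", "-c",
            f"hostname {hostname} && exec {' '.join(argv)}"]

cmd = unshare_command(["sleep", "30"])
limits = cgroup_writes("demo", cpu_shares=512, mem_limit_bytes=256 * 2**20)
```

Tools like LXC and Docker do essentially this, plus image management and much more careful setup, which is why almost nobody assembles containers from raw system calls by hand.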

So, what is the difference between a Linux container and, for example, a virtual machine?

A container aims to be a granular unit of an application and its dependencies; it is one process or a group of processes. A virtual machine runs a whole operating system, which, as you might guess, is considerably heavier.

So, could you tell us a couple of advantages and disadvantages of containers versus virtualization?

There are many differences, with pros and cons on each side. But taking into account our cloud environment, where you need to deploy applications at scale (and many times just on demand), containers are the best choice: deploying a container takes a small fraction of a second, while deploying a virtual machine may take several seconds, plus a bunch of resources that will most likely be wasted.

Given the opportunities containers bring, there are several container projects and solutions out there, such as LXC, LXD, or LXCFS. Could you share with us the differences between them? Do you have a main choice, and why?

Having the technology to implement containers is the first step, but as I said before, not everybody wants to play with system calls; instead, different technologies exist to create and manage containers. LXC and LXD provide the next level of abstraction for managing containers, while LXCFS is a user-space file system for containers (it works on top of FUSE).
Since I don't work with containers at a low level, I don't have a strong preference.

And what about solutions such as Docker, CoreOS or Vagrant? Any take on them?

Docker is the big player nowadays; it provides good security and mechanisms to manage and deploy containers. CoreOS has a prominent container engine called Rocket (rkt). I have not used it, but it looks promising in terms of design and implementation, and orchestration services like Kubernetes are already providing support for it.

You are also working on quite an interesting project called Fluent Bit. What is the project about?

Let me give you a bit of context. I'm part of the open source engineering team at Treasure Data. Our primary focus is to solve data collection and data delivery for a wide range of use cases and integrations, and Fluentd exists to accomplish this. It's a very successful project that nowadays is solving logging challenges in hundreds of thousands of systems, and we are very proud of it.
A year ago we decided to dig into the embedded Linux space, and as you might know, the capacity of these devices in terms of CPU, memory, and storage is much more restricted than that of a common server machine.
Fluentd is really good, but it also has its technical requirements: it's written in a mix of Ruby and C, and getting Ruby onto most embedded Linux systems can be a real challenge or even a blocker. That's why a new solution was born: Fluent Bit.
Fluent Bit is a data collector and log shipper written 100% in C. It has a strong focus on Linux, but it also works on BSD-based systems, including OS X/macOS. Its architecture has been designed to be very lightweight and to provide high performance from collection to distribution.
Some of its features are:

  • Input / Output plugins
  • Event driven (async I/O operations)
  • Built-in Metrics
  • Security: SSL/TLS
  • Routing
  • Buffering
  • Fluentd Integration

Although it was initially conceived for embedded Linux, it has evolved, gaining features that make it cloud friendly without sacrificing its performance and lightweight goals.
If you are interested in collecting data and delivering it somewhere, Fluent Bit allows you to do that through its built-in plugins, some of which are:

  • Input
    • Forward: a protocol on top of TCP; gets data from Fluentd or Docker containers.
    • Head: reads the initial chunk of bytes from a file.
    • Health: checks whether a remote TCP server is healthy.
    • kmsg: reads kernel log messages.
    • CPU: collects CPU usage metrics, globally and per core.
    • Mem: reports memory usage of the system or of a specific running process.
    • TCP: listens for JSON messages over TCP.
  • Output
    • Elasticsearch database
    • Treasure Data (our cloud analytics platform)
    • NATS Messaging Server
    • HTTP end-point

So, as you can see, with Fluent Bit it is easy to aggregate Docker logs into Elasticsearch, monitor your current OS resource usage, or collect JSON data over the network (TCP) and send it to your own HTTP endpoint.
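For instance, a minimal Fluent Bit configuration for that Elasticsearch scenario could look like the sketch below (the host name is a placeholder, and plugin and key names may differ between Fluent Bit versions, so treat this as an illustration rather than a reference):

```ini
# Collect CPU metrics and receive Fluentd/Docker records via the
# Forward protocol, then ship everything to Elasticsearch.
[INPUT]
    Name   cpu
    Tag    metrics.cpu

[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224

[OUTPUT]
    Name   es
    Match  *
    Host   es.example.com
    Port   9200
    Index  fluent_bit
```

Starting the collector with such a file is then just a matter of pointing the binary at it, e.g. `fluent-bit -c fluent-bit.conf`.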
The use cases are many, and this is a very exciting tool, not just from an end-user perspective but also from a technical implementation point of view.
The project is moving forward pretty quickly and getting exceptional new features, such as support for writing your own plugins in Golang (yes, C -> Go). Isn't that neat?

You will be presenting at the CNCF event CloudNativeCon & KubeCon in November. Can you share with us a bit about what you will present in your session?

I will share our experience with logging in critical environments and dig into common pains and best practices that can be applied to different scenarios.
It will cover everything about logging in the scope of (but not limited to) containers, microservices, distributed logging, aggregation patterns, Kubernetes, and open source logging solutions, with demos.
I'd say that every sysadmin, devops engineer, or developer will definitely benefit from the content of this session; logging is needed everywhere.

Finally, on a personal note. Which do you consider to be the geekiest songs of this century?

That's a difficult question! I am not an expert on geek music, but I would vouch for "Spybreak!" by Propellerheads (from The Matrix).

Teradata Partners Conference 2016: Teradata Everywhere


Our technologized society is becoming opaque.
As technology becomes more ubiquitous and our relationship with digital devices ever
more seamless, our technical infrastructure seems to be increasingly intangible.
- Honor Harger

An idea I could sense in the air during my last meeting with Teradata’s crew in California, at their most recent influencer event, was confirmed and reaffirmed a couple of weeks ago at Teradata’s big partner conference: Teradata is now in full-fledged transformation mode.

Of course, for companies like Teradata that are used to being on the front line of the software industry, particularly in the data management space, transformation has now become much more than a “nice to do”. These days it’s pretty much the life breath of any organization at the top of the software food chain.

If they want to stay at the top, these companies have the complicated mandate of being fast and smart enough to provide the software, the methods, and the means that enable customers to gain technology and business improvements, along with the value that results from these changes.

And while it seems Teradata has taken its time with this transformation, it is also evident that the company is taking it very seriously. Will this be enough to keep pace with peer vendors within a very active, competitive, and transformational market? Well, it’s hard to say, but certainly, with a number of defined steps, Teradata looks like it will be able to meet its goal of remaining a key player in the data management and analytics industry.

Here we take an up-to-date look at Teradata’s business and technology strategy, including its flexible approach to deployment and its ability to deliver consistent and coherent analytics across all types of deployments, platforms, and data sources; we then explore what the changes mean for the company and its current and future customers.

The Sentient Enterprise
As explained in detail in a previous installment, Teradata has developed a new approach towards the adoption of analytics, called the “sentient enterprise.” This approach aims to guide companies to:

  • improve their data agility
  • adopt a behavioral data platform
  • adopt an analytical application platform
  • adopt an autonomous decision platform

While we won’t give a full explanation of the model here (see the video below or my recent article on Teradata for a fuller description of the approach), there is no doubt that this is a crucial pillar for Teradata’s transformational process, as it forms the backbone of Teradata‘s approach to analytics and data management.

Teradata Video: The Sentient Enterprise

As mentioned in the previous post, one aspect of the “sentient enterprise” approach from Teradata that I particularly like is the “methodology before technology” aspect, which focuses on scoping the business problem, then selecting the right analytics methodology, and at the end choosing the right tools and technology (including tools such as automatic creation models and scoring datasets).

Teradata Everywhere
Another core element of the new Teradata approach consists of spreading its database offering wide, i.e., making it available everywhere, especially in the cloud. This movement involves putting Teradata’s powerful analytics to work. Teradata Database will now be available in different delivery modes and via different providers, including on:

  • Amazon Web Services—Teradata Database will be available in a massively parallel processing (MPP) configuration, scalable up to 32 nodes, and will include services such as node failure recovery and backup, as well as restoring and querying data in Amazon’s Simple Storage Service (S3). The system will be available in more than ten geographic regions.
  • Microsoft’s Azure—Teradata Database is expected to be available by Q4 of 2016 in the Microsoft Azure Marketplace. It will be offered with MPP (massively parallel processing) features and scalability for up to 32 nodes.
  • VMware—via the Teradata Virtual Machine Edition (TVME), users have the option of deploying a virtual machine edition of Teradata Database for virtual environments and infrastructures.
  • Teradata Database as a Service—Extended availability for the Teradata Database will be available to customers in Europe through a data center hosted in Germany.
  • IntelliFlex—Teradata’s own on-premises platform.

Availability of Teradata Database on different platforms

Borderless Analytics and Hybrid Clouds
The third element in the new Teradata Database picture involves a comprehensive provision of analytics regardless of the delivery mode chosen, an offering that fits the reality of many organizations—a hybrid environment consisting of both on-premises and cloud offerings.

With a strategy called Borderless Analytics, Teradata allows customers to deploy comprehensive analytics solutions within a single analytics framework. Enabled by Teradata solutions such as QueryGrid, its multi-source SQL and processing engine, and Unity, its orchestration engine for Teradata multi-system environments, this strategy proposes a way to perform consistent and coherent analytics over heterogeneous platforms with multiple systems and sources of data, i.e., in the cloud, on premises, or in virtual environments.

At the same time, this also serves Teradata as a way to lay the basis for its larger strategy of addressing the Internet of Things (IoT) market. Teradata is addressing this goal with the release of a set of new offerings called Analytics of Things Accelerators (AoTAs), comprising technology-agnostic intellectual property that emerged from Teradata’s real-life IoT project engagements.

These accelerators can help organizations determine which IoT analytical techniques and sensors to use and trust. Because the AoTAs are designed to be enterprise ready, companies can deploy them without having to devise an enterprise scaling approach themselves, and without going through time-consuming experimentation phases before deployment to ensure the right analytical techniques have been used. Teradata’s AoTAs accelerate adoption, reduce deployment costs, and help ensure reliability. This is a noteworthy effort to provide IoT projects with an effective enterprise analytics approach.

What Does this Mean for Current and Potential Teradata Customers?
Teradata seems to have a concrete, practical, and well-thought-out strategy regarding the delivery of new generation solutions for analytics, focusing on giving omnipresence, agility, and versatility to its analytics offerings, and providing less product dependency and more business focus to its product stack.

But one thing Teradata needs to consider, given the increasing number of solutions available in its portfolio, is providing clarity and efficiency to customers regarding which blend of solutions to choose. This is especially true when the choice involves increasingly sophisticated big data solutions, a market that is maturing but is certainly still difficult to navigate, especially for those new to big data.

Teradata’s relatively new leadership team seems to have sensed right away that the company is currently at a crucial juncture, both internally and within the insights industry. If its strategy works, Teradata might be able not only to maintain its dominance in this arena but also to increase its footprint in an industry destined to expand with the advent of the Internet of Things.

For Teradata’s existing customer base, these moves could be encouraging, as they could mean being able to expand existing analytics deployments through a single platform, and therefore with less friction and with cost savings.

For those considering Teradata as a new option, it means having even more options for deploying end-to-end data management solutions using a single vendor rather than having a “best of breed” approach. Either way, though, Teradata is pushing towards the future with a new and comprehensive approach to data management and analytics in an effort to remain a key player in this fierce market.

The question is whether Teradata’s strategic moves will resonate effectively within the enterprise market, where it competes with existing software giants such as Oracle, Microsoft, and SAP.

Are you a Teradata user? If so, let us know what you think in the comments section below.

(Originally published on TEC's Blog)
Salesforce Acquires BeyondCore to Enable Analytics . . . and More


In October of 2014, Salesforce announced the launch of Salesforce Wave, the cloud-based company’s analytics cloud platform. By that time, Salesforce had already realized that to be able to compete with the powerful incumbents in the business software arena—the Oracles, SAPs and IBMs of the world—arriving to the cloud at full swing would require it to expand its offerings to the business
IT Sapiens, for Those Who Are Not


Perhaps one of the most refreshing moments in my analyst life is when I get the chance to witness the emergence of new tech companies—innovating and helping small and big organizations alike to solve their problems with data. This is exactly the case with Latvia-based IT Sapiens, an up-and-coming company focused on helping those small or budget-minded companies to solve their basic yet crucial
Influencer Summit 2016—Teradata Reshapes Itself with Analytics and the Cloud


For anyone with even a small amount of understanding regarding current trends in the software industry it will come as no surprise that the great majority of enterprise software companies are focusing on the incorporation of analytics, big data, cloud adoption, and especially the Internet of Things into their software solutions. In fact, these capabilities have become so ubiquitous that for
Zyme: Emergence and Evolution of Channel Data Management Software


Previous to the official launch of the new version of Zyme’s solution, I had the opportunity to chat and be briefed by Ashish Shete, VP of Products and Engineering at Zyme, in regard to version 3.0 of what Zyme describes as its channel data management (CDM) solution platform. This conversation was noteworthy from both the software product and industry perspectives. In particular, the solution
An Interview with Dataiku’s CEO: Florian Douetteau


As an increasing number of organizations look for ways to take their analytics platforms to higher ground, many of them are seriously considering the incorporation of new advanced analytics disciplines; this includes hiring data science specialists and adopting solutions that can enable the delivery of improved data analysis and insights. As a consequence, new companies and offerings are emerging in this area.

Dataiku is one of this new breed of companies. With its Data Science Studio (DSS), Dataiku aims to offer a full data science solution for both experienced and inexperienced data science users.

On this occasion I had the chance to interview Florian Douetteau, Dataiku’s CEO, and to pick up some of his thoughts and interesting views on the data management industry and, of course, his company and software solution.
A Brief Bio of Florian

In 2000, at age 20, he dropped out of the math program at the prestigious École Normale Supérieure and decided to look for the largest dataset he could find, and the hardest related problem he could solve.

That’s how he started working at Exalead, a search engine company that at the time was developing technologies in web mining, search, natural language processing (NLP), and distributed computing. At Exalead, Florian rose to VP of Product and R&D. He stayed with the company until it was acquired in 2010 by Dassault Systèmes for $150M (a pretty large amount by French standards).

In 2010, as the data deluge was pouring into new seas, Florian moved into the social gaming and online advertising industry, where machine learning was already being applied to petabytes of data. Between 2010 and 2013 he held several positions as a consultant and CTO.

In 2013, Florian, along with 3 other co-founders, created Dataiku with the goal of making advanced data technologies accessible to companies that are not digital giants. Since then, one of Florian’s main goals as CEO of Dataiku has been to democratize access to data science.

So, you can watch the video or listen to the podcast in which Florian shares with us some of his views on the fast evolution of data science, analytics, big data and of course, his data science software solution.

 Of course, please feel free to let us know your comments and questions.
Altiscale Delivers Improved Insight and Hindsight to Its Data Cloud Portfolio


Let me just say right off the bat that I consider Altiscale to be a really nice alternative to big data service providers such as Hortonworks, Cloudera, or MapR. The Palo Alto, California–based company offers a full big data platform in the cloud via its Altiscale Data Cloud offering. In my view, Altiscale has dramatically increased the appeal of
Hortonworks’s New Vision for Connected Data Platforms


Courtesy of Hortonworks
On March 1, I had the opportunity to attend this year’s Hortonworks Analyst Summit in San Francisco, where Hortonworks announced several product enhancements and new versions and a new definition for its strategy going forward.

Hortonworks seems to be making a serious attempt to take over the data management space while maintaining a commitment to open source, and especially to the Apache Software Foundation. Thus, as Hortonworks keeps gaining momentum, it is also consolidating its corporate strategy and bringing a new balance to its message (combining both technology and business).

By reinforcing alliances, and at the same time moving further towards the business mainstream with a more concise messaging around enterprise readiness, Hortonworks is declaring itself ready to win the battle for the big data management space.

The big question is whether the company’s strategy will be effective enough to succeed at this goal, especially in a market already overpopulated and fiercely defended by big software providers.

Digesting Hortonworks’s Announcements
The announcements at the Hortonworks Analyst Summit included news on both the product and partner fronts. With regard to products, Hortonworks announced new versions of both its Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF).

HDP—New Release, New Cycle
Alongside specific features to improve performance and reinforce ease of use, the latest release, HDP 2.4 (figure 1), includes the latest generation of Apache’s large-scale data processing framework, Spark 1.6, along with Ambari 2.2, the Apache project for making Hadoop management easier and more efficient.

The inclusion of Ambari seems to be key to providing a solid, centralized management and monitoring tool for Hadoop clusters.

Figure 1. Hortonworks emphasizes enterprise readiness for its HDP version
(Image courtesy of Hortonworks)

Another key announcement with regard to HDP is a new release cycle, which aims to provide users with a consistent product featuring core stability. Via yearly releases, the new cycle will align HDP services such as HDFS, YARN, and MapReduce, as well as Apache ZooKeeper, with a compatible version of Apache Hadoop in the “ODPi Core,” currently at version 2.7.1. This can provide standardization and ensure a stable software base for mission-critical workloads.

On the flip side, the extended services that run on top of the Hadoop core, including Spark, Hive, HBase, Ambari, and others, will be released continually throughout the year to ensure these projects stay up to date.

Last but not least, HDP’s new version also comes with the new SmartSense 1.2, Hortonworks’s issue resolution application, featuring automatic scheduling and uploading, as well as over 250 new recommendations and guidelines.


Growing NiFi to an Enterprise Level
Along with HDP, Hortonworks also announced version 1.2 of HDF, Hortonworks’s offering for managing data in motion by collecting, manipulating, and curating data in real time. The new version includes new streaming analytics capabilities for Apache NiFi, which powers HDF at its core, and support for Apache Storm and Apache Kafka (figure 2).

Another noteworthy feature coming to HDF is its support for integration with Kerberos, a feature which will enable and ease management of centralized authentication across the platform and other applications. According to Hortonworks, HDF 1.2 will be available to customers in Q1 of 2016.

Figure 2. Improved security and control added to Hortonworks new HDF version
(Image courtesy of Hortonworks)

Hortonworks Adds New Partners to its List
The third announcement from Hortonworks at the conference was a partnership with Hewlett Packard Labs, the central research organization of Hewlett Packard Enterprise (HPE).

The collaboration is mainly a joint effort to enhance the performance and capabilities of Apache Spark. According to Hortonworks and HPE, the collaboration will focus on the development and analysis of a new class of analytic workloads that benefit from using large pools of shared memory.

Says Scott Gnau, Hortonworks’s chief technology officer, with regard to the collaboration agreement:

This collaboration indicates our mutual support of and commitment to the growing Spark community and its solutions. We will continue to focus on the integration of Spark into broad data architectures supported by Apache YARN as well as enhancements for performance and functionality and better access points for applications like Apache Zeppelin.

According to both companies, this collaboration has already generated interesting results which include more efficient memory usage and increased performance as well as faster sorting and in-memory computations for improving Spark’s performance.

The results of this collaboration will be contributed to the Apache Spark community as new technology, and will thus benefit this important piece of the Apache Hadoop ecosystem.

Commenting on the new collaborations, Martin Fink, executive vice president and chief technology officer of HPE and board member of Hortonworks, said:

We’re hoping to enable the Spark community to derive insight more rapidly from much larger data sets without having to change a single line of code. We’re very pleased to be able to work with Hortonworks to broaden the range of challenges that Spark can address.

Additionally, Hortonworks signed a partnership with Impetus Technologies, Inc., another solution provider built on open source technology. The agreement includes collaboration around StreamAnalytix™, an application that provides tools for rapid, low-code development of real-time analytics applications using Storm and Spark. Both companies expect that, with HDF and StreamAnalytix used together, companies will gain a complete and stable platform for the efficient development and delivery of real-time analytics applications.

But The Real News Is …
Hortonworks is rapidly evolving its vision of data management and integration, and this was in my opinion the biggest news of the analyst event. Hortonworks’s strategy is to integrate the management of both data at rest (data residing in HDP) and data in motion (data HDF collects and curates in real-time), as being able to manage both can power actionable intelligence. It is in this context that Hortonworks is working to increase integration between them.

Hortonworks is now taking a new go-to-market approach to increase the quality and enterprise readiness of its platforms. Along with ensuring that ease of use removes barriers to end-user adoption, its marketing message is changing. The Hadoop-based company now sees the need to take a step further and convince businesses that open source does more than just do the job; it is in fact becoming the quintessential tool for any important data management initiative—and, of course, that Hortonworks is the best vendor for the job. Along these lines, Hortonworks is taking steps to provide Spark with enterprise-ready governance, security, and operations to ensure readiness for rapid enterprise integration, to be gained with the inclusion of Apache Ambari and other Apache projects.

One additional yet important aspect of this strategy is Hortonworks’s work on enterprise readiness, especially regarding issue tracking (figure 3) and monitoring for mission-critical workloads, as well as security reinforcement.

Figure 3. SmartSense 1.2 includes more than 250 recommendations
(Image courtesy of Hortonworks)

It will be interesting to see how this new strategy works for Hortonworks, especially within the big data market where there is extremely fierce competition and where many other vendors are pushing extremely hard to get a piece of the pie, including important partners of Hortonworks.

Taking its data management strategy to a new level is indeed bringing many opportunities for Hortonworks, but these are not without challenges as the company introduces itself into the bigger enterprise footprint of the data management industry.

What do you think about Hortonworks’s new strategy in data management? If you have any comments, please drop me a line below and I’ll respond as soon as I can.

(Originally published)
Creating a Global Dashboard. The GDELT Project


There is probably no bigger dream for a data geek like myself than creating the ultimate data dashboard or scorecard of the world. One that summarizes and enables the analysis of all the data in the world. Well, for those of you who have also dreamt about this, Kalev H. Leetaru, a senior fellow at the George Washington University Center for Cyber & Homeland Security has tapped into your
Dell Toad’s Big Jump into the BI and Analytics Market


Having a background in software and database development and design, I have a special nostalgia and appreciation for Toad’s set of database solutions, as in my past working life I was a regular user of these and other tools for database development. Of course, Toad’s applications have grown and expanded over the years and now cover the areas within data management that are key to many
TIBCO Spotfire Aims for a TERRific Approach to R


terrific /təˈrɪfɪk/ adjective 1. very great or intense: a terrific noise 2. (informal) very good; excellent: a terrific singer
- The British Dictionary

R is quickly becoming the most important letter in the world of analytics. The open source environment for statistical computing is now at the center of major strategies within many software companies. R is here to stay. As mentioned
Microsoft and the Revolution… Analytics


You say you want a revolution
Well, you know
We all want to change the world
You tell me that it's evolution
Well, you know
We all want to change the world
(Revolution, Lennon & McCartney)

With a recent announcement, Microsoft took another of many steps toward what is now a clear internal and external revolution regarding the future of the company.

By announcing the acquisition of Revolution Analytics, a company that in just a few years has become a leading provider of predictive analytics solutions, Microsoft looks not just to strengthen its already wide analytics portfolio but perhaps also to increase its presence in the open source and data science communities, the latter having huge future potential. An interesting move, no doubt, but was this an acquisition Microsoft needed to boost its analytics strategy against its biggest competitors? Will it really give Microsoft’s revolution a better entrance into the open source space, especially the data science community? Is Microsoft ready for open source, and vice versa?

The Appeal of Revolution Analytics
Without a doubt, Revolution Analytics is quite an interesting company. Founded less than 10 years ago (in 2007), it has become one of the most representative providers of predictive analytics software in the market. The formula has been, if not easy to achieve, simple and practical: Revolution R has been built on top of the increasingly popular programming language called ‘R’.

As a programming language, R is designed especially for the development of statistical and predictive analytics applications. Because it emerged from the trenches of academia, and because of its open source nature, it has grown and expanded into the business market along with a vibrant community that develops and maintains the Comprehensive R Archive Network (CRAN), R’s wide package library.

Revolution Analytics followed the apparently simple yet pretty clever strategy of developing and enhancing its analytics platform on top of R in order to offer a debugged, commercial-ready R distribution. It has also been clever in offering different flavors of its software, ranging from a free version to an enterprise-ready one.

At the same time, Revolution Analytics has maintained its close relationship with both the R and open source communities and has developed a wide range of partnerships with important vendors such as Teradata, HP, IBM, and many others, increasing its market presence, adoption, and continuing technical development.

At first glance, of course, Revolution Analytics is quite an interesting bet, not just for Microsoft but for many other software providers eager to step big into the predictive analytics arena.

Not so fast, Microsoft… Was it a good idea?
In an article published recently on Forbes, Dan Woods states that Microsoft’s acquisition of Revolution Analytics is the wrong way to embrace R. He explains that the acquisition represents a step forward for the R language but will limit what R could bring to Microsoft’s own business. According to Mr. Woods:

It is vital to remember that R is not a piece of software created by software engineers. Like much of the open source world, R was created by those who wanted to use it – statisticians and data scientists. As a result, the architecture of the implementation has weaknesses that show up at scale and in other inconvenient ways. Fixing this architecture requires a major rewrite.


While Microsoft will be able to make its Hadoop offering on Azure better with what Revolution has done, the open source model will inhibit the wider deployment of R throughout the rest of the Microsoft ecosystem.

Both points are absolutely valid, especially considering how the open source code would need to be accommodated within the Microsoft analytics portfolio. However, I would not be surprised if Microsoft had already taken this into account, treating R on Azure as a short-term priority and the immersion of R within the rest of the portfolio as a medium-term priority, considering that it has acquired not just the software but also the expertise of the Revolution Analytics team. It will then be important to maintain team cohesion to pursue these major changes.

Another interesting aspect is Mr. Woods’ comparison of Microsoft’s acquisition with TIBCO’s approach: TIBCO took the radical posture of re-implementing R to make it suitable for high-performance tasks and highly compatible with its complete set of analytics offerings, thus creating TERR.

While TIBCO’s approach is quite outstanding (it deserves its own post), it was somewhat more feasible for TIBCO because of its experience with Bell Labs’ S, a precursor of and similar offering to R, and its longtime expertise in the predictive analytics field. Microsoft, by contrast, needs to close the gap with IBM, SAS, and many others to enter the space with a strong foothold, one R can certainly provide, and also to give the company some room to keep improving an already stable product such as the one provided by Revolution Analytics.

One thing to consider, though, is Microsoft’s ability to enter and keep active a community that at times has proven hostile to the Redmond software giant and, of course, willing to turn its back on it. About this, David Smith, Chief Community Officer at Revolution Analytics, mentioned:

Microsoft might seem like a strange bedfellow for an open-source company, but the company continues to make great strides in the open-source arena recently. Microsoft has embraced Linux as a fully-supported operating system on its Azure cloud service.

While it’s true that Microsoft has increased its presence in the open source community, whether through the inclusion of Linux under Azure, contributions to the Linux kernel, or close partnerships with Hortonworks (big data’s big name), being able to convince and conquer the huge R community may prove difficult, yet it is highly significant for increasing Microsoft’s presence in a market with huge potential.

This is especially the case considering that Microsoft has changed its strategy regarding its development platforms, opening them up to enable free development and community growth, as with .NET, Microsoft’s now open source development platform.

Embracing the revolution
While the road to embracing R can potentially be bumpy for Microsoft, it might still prove to be the way to go, if not the only one, to secure a bright future in the predictive analytics market. Much work will perhaps need to be done, including rewriting and optimizing, but at the end of the day it could be a move that catapults Microsoft into better shape to compete in the predictive analytics market before it is too late.

At this point, Microsoft seems to trust that the open source movement is mature enough to accept it as just another contributor, while Microsoft itself seems ready to take what appears to be a logical step to reposition itself in line with modern times and embrace new tech trends.

Like any new relationship, adjustment and adaptation are needed. Microsoft’s (R)evolution and transformation seem to be underway.
Have a comment? Drop me a line below. I’ll respond as soon as I can.

Machine Learning and Cognitive Systems, Part 3: A ML Vendor Landscape

Machine Learning and Cognitive Systems, Part 3: A ML Vendor Landscape

In Parts One and Two of this series I gave a brief explanation of what machine learning is and some of its potential benefits, uses, and challenges within the scope of business intelligence and analytics.

In this installment of the series, the last devoted to machine learning before we step into cognitive systems, I will attempt to provide a general overview of the machine learning (ML) market landscape, describing some (yes, only some) of the vendors and software products that use ML for analytics and intelligence.

Machine learning: a common guest with no invitation

It is quite surprising to find how vast a presence machine learning has in many of today’s modern analytics applications. Its use is driven by:

  • The increasing need to crunch data that is more complex and more voluminous, at greater speed and with more accuracy—I mean really big data
  • The need to solve increasingly complex business problems that require methods beyond conventional data analysis.

An increasing number of traditional and new software providers, whether forced by market needs to radically evolve their existing solutions or moved by the pure spirit of innovation, have incorporated new data analytics techniques into their analytics stacks, either explicitly or hidden behind the curtains.

For software providers that already offer advanced analytics tools such as data mining, incorporating machine learning functionality into their existing capabilities stack is an opportunity to evolve their current solutions and take analytics to the next level.

So, it is quite possible that if you are using an advanced business analytics application, especially for Big Data, you are already using some machine learning technology, whether you know it or not.

The machine learning software landscape, in brief 

One of the interesting aspects of this seemingly new need to deal with increasingly large and complex sets of information is that many machine learning techniques originally used within pure research labs have already gained entrance to the business world via their incorporation into analytics offerings. New vendors may incorporate machine learning as the core of their analytics offering, or simply as another functional feature in their stack.

Taking this into consideration, we can find a great deal of software products that offer machine learning functionality, to different degrees. Consider the following products, loosely grouped by type:

From the lab to the business

In this group we can find a number of products, most of them based on an open-source licensing model that can help organizations to test machine learning and maybe take their first steps.


Weka

A collection of machine learning algorithms written in Java that can be applied directly to a dataset or called from a custom Java program, Weka is one of the most popular machine learning tools used in research and academia. It is distributed under the GNU General Public License, so it can be downloaded and used freely, as long as you comply with the GNU license terms.

Because of its popularity, a lot of information is available about using and developing with Weka. It can still prove challenging for users not familiar with machine learning, but it’s quite good for those who want to explore the bits and bytes of machine learning analysis on large datasets.


R

Probably the most popular language and environment for statistical computing and graphics, R is a GNU project that comprises a wide variety of statistical and graphical techniques with a high degree of extensibility. No wonder R is one of the most widely used statistical tools among students and researchers.

The R project is designed to work by having a core, or base, system of statistical features and functions that can be extended with a large set of function libraries provided within the Comprehensive R Archive Network (CRAN).

Within the CRAN library, it is possible to download the necessary functions for multivariate analysis, data mining, and machine learning. But it is fair to assume that it takes a bit of effort to put machine learning to work with R.
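To give a flavor of what these clustering and mining functions do, here is a minimal k-means sketch written in plain Python for illustration; an R user would typically just call the built-in kmeans() function or a CRAN package such as cluster. The data points below are invented.

```python
import random

def kmeans(points, k, iterations=20, seed=42):
    """Naive k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # index of the closest centroid (squared Euclidean distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if the cluster emptied
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters

# Two well-separated 2-D blobs; centroids should settle near (0, 0) and (10, 10)
data = [(0.0, 0.1), (0.2, 0.0), (-0.1, 0.2),
        (10.0, 10.1), (10.2, 9.9), (9.8, 10.0)]
centroids, clusters = kmeans(data, k=2)
```

This is only a didactic sketch; production implementations add smarter initialization (such as k-means++) and convergence checks.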

Note: R is also of special interest owing to its increasing popularity and adoption via a commercial offering from Revolution Analytics, which I discuss below.


Jubatus

Jubatus is an online distributed machine learning framework. It is distributed under the GNU Lesser General Public License version 2.1, which makes Jubatus another good option for learning, trying, and, why not, exploiting machine learning techniques on a reduced budget.

The framework can be installed on different flavors of Linux, such as Red Hat and Ubuntu, as well as on Mac OS X. Jubatus includes client libraries for C++, Python, Ruby, and Java. Its functional features include machine learning libraries for applying techniques such as graph mining, anomaly detection, clustering, classification, regression, and recommendation.

Apache Mahout

Mahout is Apache’s machine learning algorithm library. Distributed under a commercially friendly Apache software license, Mahout comprises a core set of algorithms for clustering, classification and collaborative filtering that can be implemented on distributed systems.

Mahout supports three basic types of algorithms or use cases to enable recommendation, clustering and classification tasks.
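To illustrate the recommendation use case Mahout addresses, here is a tiny user-based collaborative filtering sketch in plain Python. Mahout itself implements such recommenders in Java at distributed scale; the users, items, and ratings below are entirely made up for illustration.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data, purely illustrative
ratings = {
    "alice": {"itemA": 5, "itemB": 3, "itemC": 4},
    "bob":   {"itemA": 5, "itemB": 3, "itemD": 5},
    "carol": {"itemA": 1, "itemB": 5, "itemC": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (sqrt(sum(u[i] ** 2 for i in common)) *
                  sqrt(sum(v[i] ** 2 for i in common)))

def recommend(user, k=1):
    """Score items the user has not rated, weighting each neighbor's
    rating by that neighbor's similarity to the user."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their)
        for item, rating in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((s / weights[i], i) for i, s in scores.items()), reverse=True)
    return [item for _, item in ranked[:k]]

print(recommend("alice"))  # bob rates like alice and rated itemD highly
```

The same idea, computed over millions of users with similarity measures such as Pearson correlation or log-likelihood, is what Mahout's recommenders provide.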

One interesting aspect of Mahout is its goal to build a strong community for the development of new and fresh machine learning algorithms.

Apache Spark

Spark is Apache’s general engine for processing large-scale data sets, often deployed alongside Hadoop. The Spark engine is also open source and enables users to write applications in Java, Scala, or Python.

Just like the rest of the Hadoop family, Spark is designed to deal with large amounts of data, both structured and unstructured. The Spark design supports cyclic data flow and in-memory computing, making it ideal for processing large data sets at high speed.

In this scenario, one of the engine’s main components is MLlib, Spark’s machine learning library. MLlib runs on the Spark engine, performing faster than MapReduce, and it can operate in conjunction with NumPy, Python’s core scientific computing package, giving developers a great deal of flexibility in designing new applications.

Some of the algorithms included within MLlib are:

  • K-means clustering with K-means|| initialization
  • L1- and L2-regularized linear regression
  • L1- and L2-regularized logistic regression
  • Alternating least squares collaborative filtering, with explicit ratings or implicit feedback
  • Naïve-Bayes multinomial classification
  • Stochastic gradient descent
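As a rough sketch of how two of those bullet points combine, here is L2-regularized logistic regression trained by stochastic gradient descent, written as self-contained Python. MLlib itself runs this distributed across a Spark cluster; the toy training data here is invented.

```python
import math
import random

def train_logistic_sgd(data, lr=0.1, lam=0.01, epochs=100, seed=0):
    """Stochastic gradient descent for L2-regularized logistic regression.
    data is a list of (features, label) pairs with label in {0, 1}."""
    rng = random.Random(seed)
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in random order each epoch
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
            err = p - y                      # gradient of the log-loss wrt z
            # L2 penalty shrinks the weights; the bias is not regularized
            w = [wi - lr * (err * xi + lam * wi) for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Toy linearly separable data: label is 1 when the coordinates are large
train = [([0.0, 0.0], 0), ([0.2, 0.3], 0), ([1.0, 1.0], 1), ([0.9, 0.8], 1)]
w, b = train_logistic_sgd(train)
```

MLlib’s version adds distributed gradient aggregation, sparse vectors, and tuned step-size schedules, but the core update per example is the same.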

While these tools give users hands-on machine learning at no cost, they can still be somewhat challenging to put to work. Many of them require special skills in the art of machine learning, or in Java or MapReduce, to fully develop a business solution.

Still, these applications can enable new teams to start working on machine learning and experienced ones to develop complex solutions for both small and big data. 

Machine learning by the existing players

As we mentioned earlier in this series, the evolution of Business Intelligence is demanding an increasing incorporation of machine learning techniques into existing BI and Analytics tools.

A number of popular enterprise software applications have already expanded their functional coverage to include machine learning—a useful ally—within their stacks.

Here are just a couple of the vast number of software vendors that have added machine learning either to their core functionality or as an additional feature of their product stack.


IBM

It is no secret that IBM is betting strongly on advanced analytics and cognitive computing, especially with Watson, IBM’s cognitive computing initiative, an offering we will examine in the cognitive computing part of this series. IBM enables users to develop machine learning approaches via its SPSS product stack, which incorporates the ability to apply specific machine learning algorithms via SPSS Modeler.


SAS

SAS is indubitably one of the key players in the advanced analytics arena, with a solid platform for performing mining and predictive analysis for both general and industry-vertical purposes. It has incorporated key machine learning techniques adapted for different uses. Several ML techniques can be found across SAS’ vast analytics platform, from the SAS Enterprise Miner and Text Miner products to its SAS High-Performance Optimization offering.

An interesting fact to consider is SAS’ ability to provide industry and line-of-business approaches for many of its software offerings, encapsulating functionality with prepackaged vertical functionality.

Embedded machine learning

Significantly, machine learning techniques are reaching the core of many of the existing powerhouses as well as the newcomers in the data warehouse and Big Data spaces. Some analytics and data warehouse providers have now embedded machine learning techniques, to varying degrees, directly within their database engines.


1010data

The New York-based company, a provider of Big Data and discovery software solutions, offers what it calls in-database analytics, in which a set of analytics capabilities is built right into 1010data’s database management engine. Machine learning is included along with in-database analytics such as clustering, forecasting, and optimization.


Teradata

Among its multiple offerings for enterprise data warehouse and Big Data environments, Teradata offers Teradata Warehouse Miner, an application that packages a set of data profiling and mining functions, including machine learning algorithms alongside predictive and mining ones. Warehouse Miner can perform analysis directly in the database without a data movement operation, which eases the process of data preparation.


SAP

SAP HANA, which may be SAP’s most important technology initiative ever, now supports almost all (if not actually all) of SAP’s analytics initiatives, and its advanced analytics portfolio is no exception.

Within HANA, SAP originally launched SAP HANA Advanced Analytics, in which a number of functions for performing mining and prediction take place. Under this set of solutions it is possible to find a set of specific algorithms for performing machine learning operations.

Additionally, SAP has expanded its reach into predictive analysis and machine learning via the SAP InfiniteInsight predictive analytics and mining suite, a product developed by KXEN, which SAP recently acquired.

Revolution Analytics

As mentioned previously, the open source R language has become one of the most important resources for statistics and mining in the market. Revolution Analytics, founded in 2007, has been able to foster the work done by the huge R community while developing a commercial offering that exploits R’s benefits, giving R more power and performance via technology that enables its use in data-intensive enterprise applications.

Revolution R Enterprise is Revolution Analytics’ main offering and contains the wide range of libraries provided by R enriched with major technology improvements for enabling the construction of enterprise-ready analytics applications. The application is available for download both as workstation and server versions as well as on demand via the AWS Marketplace.

The new breed of advanced analytics

The advent and hype of Big Data has also become a sweet spot for innovation in many areas of the data management spectrum, especially in the area of providing analytics for large volumes of complex data.

A new wave of fresh and innovative software providers is emerging with solutions that enable businesses to perform advanced analytics over Big Data and using machine learning as a key component or enabler for this analysis.

A couple of interesting aspects of these solutions:

  1. Their unique approach to providing specific solutions to complex problems, adapted for business environments and combining flexibility with ease of use, making it possible for business users with a certain degree of statistical and mathematical preparation to address complex business problems.
  2. Many of them have already, at least partially, configured specific solutions for common line-of-business and industry problems via templates or predefined models, easing the preparation, development, and deployment process.

Here is a sampling of some of these vendors and their solutions:


Skytree

Given that Skytree’s tagline is “The Machine Learning Company,” it’s pretty obvious that the company has machine learning in its veins. Skytree has entered the Big Data analytics space with a machine learning platform for performing mining, prediction, and recommendation, offering what it describes as enterprise-grade machine learning.

Skytree Server is its main offering. A Hadoop-ready machine learning platform with high-performance analytics capabilities, it can also connect to diverse data streams and can compute real-time queries, enabling high-performance analytics services for churn prediction, fraud detection, and lead scoring, among others.

Skytree also offers a series of plug-ins to be connected to the Skytree Server Foundation to improve Skytree’s existing capabilities with specific and more advanced machine learning models and techniques.


BigML

If you Google BigML, you will find that “BigML is Machine Learning for everyone.”

The company, founded in 2011 in Corvallis, Oregon, offers a cloud-based, large-scale machine learning platform centered on business usability, at highly competitive cost, by providing advanced analytics via a subscription-based offering.

The application enables users to prepare complete analytics solutions for a wide range of analysis scenarios, from collecting the data and designing the model to creating special analytics ensembles.

Since it is a cloud-based platform, users can start using BigML services via a number of subscription-based and/or dedicated options, an attractive approach for organizations trying to make the best of advanced analytics with fewer technical and monetary resources.

Yottamine Analytics

Founded in 2009 by Dr. David Huang, Yottamine has taken Dr. Huang’s contributions to the theory of machine learning into practice, reflecting them in the Yottamine Predictive Service (YPS).

YPS is an on-demand advanced analytics solution based on the use of web services, which allows users to build, deploy, and develop advanced big data analytics solutions.

As an on-demand solution it offers a series of subscription models based on clusters and nodes, with payment based on the usage of the service in terms of node hours—a pretty interesting quota approach. 

Machine learning is pervasive

Of course, this is just a sample of the many advanced analytics offerings that exist, and others are emerging. They use machine learning techniques to different degrees and for many different purposes, specific or general. New companies such as BuildingIQ, Emcien, BayNote, Recommind, and others are taking advantage of machine learning to provide unique offerings across a wide range of industry and business sectors.

So what?

One of the interesting effects of companies dealing with increasing volumes of data and, of course, increasingly difficult problems to solve is that techniques such as machine learning and other artificial intelligence and cognitive computing methods are gaining ground in the business world.

Companies and information workers are being forced to learn about these new disciplines and use them to improve the accuracy of analysis and prediction and the ability to react and decide, encouraging the rise of what some call the data science discipline.

Many of the obscure tools for advanced analytics traditionally used in the science lab or at pure research centers are now surprisingly popular within many business organizations—not just within their research and development departments, but within all their lines of business.

On the other hand, new software is increasingly able not only to help in the decision-making process, but also to be proactive: reproducing and automatically improving complex analysis models, recommendations, and scenario analysis to enable early detection, prediction and, potentially, data-based decisions.

Whether measuring social media campaign effectiveness, effectively predicting sales, detecting fraud, or performing churn analysis, these tools are remaking the way data analysis is done within many organizations.

But this might be just the beginning of a major revolution in the way software serves and interacts with humans. An increasing number of Artificial Intelligence disciplines, of which machine learning is a part, are rapidly evolving and reaching mainstream spaces in the business software world in the form of next-generation cognitive computing systems.

Offerings such as IBM’s Watson might be the instigators of a new breed of solutions that go well beyond what we have so far experienced with regard to computers and the analysis process. So, I dare you to stay tuned for my next installment on cognitive systems, and walk with me to discover these new offerings.

Qlik: Newer, Bigger, Better?

Qlik: Newer, Bigger, Better?

Originally published in the TEC Blog

During the second half of last year and the part of this year that has already passed, the Pennsylvania-based software company QlikTech has undergone a number of important adjustments, from its company name to a series of changes allowing it to remain a main force driving the evolution of the business intelligence (BI) and analytics scene. Are these innovations enough for the in-memory software company to retain its success and acceptance within the BI community?

From QlikTech to Qlik

One big shift in the past few months was the company’s name change from QlikTech to Qlik. Though mainly cosmetic, it is still worth taking into account: it will enable the software provider to be more easily identified and better branded, and to reposition its entire product portfolio as well as the company’s services, resources, and communities.

Having a name that is simple to identify, as the biggest umbrella of a product stack that has been growing over time, is a smart move from business, marketing, and even technical perspectives.

Qlik goes Big… Data

A second recent event within Qlik’s realm is the revelation of its big data strategy, something Qlik had been quietly working on for some time. During a very interesting call, John Callan, senior director of global product marketing, took us through some of the details of Qlik’s recently revealed strategy to help users take advantage of the company’s big data initiatives. Two opening statements could not express more clearly the role of Qlik, and of many other BI providers, in the big data space:

QlikView as the catalyst for implementing big data

This certainly is true, as many new big data projects find their motivation in the data analysis and discovery phases, and it’s also true that an offering like QlikView can lower some of the technical and knowledge barriers when implementing a big data initiative.

The second statement was:

QlikView relieves the big data bottleneck.

According to Qlik, it grants access to a wider number of users, augmenting the potential use of big data and providing implicit benefits—access to a wider number of data sources while at the same time having access to QlikView’s in-memory computing power.

True to its goal of bringing BI closer to the business user, the approach from Qlik is to enable the use of big data and offer a new connection and integration with technology provided by some of the most important big data players in the market: Cloudera, Hortonworks, MongoDB, Google BigQuery, Teradata, HP Vertica and Attivio.

What makes QlikView so interesting in the context of big data is that, being a long-time provider of an in-memory architecture for data analysis and having a unique data association model, it can not only ensure a reliable architecture for a big data analysis platform, but it can also add speed to the process. Plus, QlikView’s data association model, along with its business user orientation, can provide an ease-of-use component, often hard to accomplish within a big data initiative.

So, while QlikView provides for its users all the necessary connectors from their big data partners, it also makes an effort to maintain simplicity of use when dealing with information coming from other more common sources.

On this same topic, one key aspect of Qlik’s approach to big data is the vendor’s flexibility regarding data sourcing; Qlik provides users with three possibilities for performing data exploration and discovery from big data sources:

  1. Loading the information within Qlik’s in-memory computing engine;
  2. Performing data discovery directly from big data sources; and
  3. A hybrid approach, which includes the possibility to combine both previous models, configuring which data should be in-memory and which should be based on direct discovery.

This three-pronged approach could prove effective for organizations in the initial phases of big data adoption, especially while undergoing initial tests, as well as for those that already require big data services with a certain degree of functionality. It will be interesting to see, however, whether it creates difficulties for users and organizations in finding the appropriate schema, or in identifying when and where to apply which approach for better performance and effectiveness.

New “Natural” Analytics

Recently, a blog written by Donald Farmer, VP of product management at Qlik, established what Qlik has been up to for some time now: working towards bringing a new generation of analytics to the market. In this sense, two things seem to be particularly interesting.

First, Qlik’s continuous work towards changing and evolving analytics from its traditional role and state and delivering new ways analysis can be performed, thus improving associations and correlations to provide extensive context. As Farmer states:

Consider how we understand customers and clients. What patterns do we see? What do they buy? How are they connected or categorized? Similarly every metric tracking our execution makes a basic comparison.

These artifacts may be carefully prepared and designed to focus on what analysts used to call an "information sweet spot"—the most valuable data for the enterprise, validated, clarified, and presented without ambiguity to support a specific range of decisions.

Second, regarding providing users with the abilities to go beyond prediction and actually anticipate and help discover:

It's not enough to know our quarter's sales numbers. We must compare them to the past, to our plans, and to competitors. Knowing these things, we ask what everyone in business wants to know: What can we anticipate? What does the future hold?

Particularly interesting is how Farmer addresses a core aspect of the decision-making process: anticipating, especially in a modern business world that operates increasingly in real time, ways to move away from traditional operations run in a linear sequence with long latencies.

Of course, little can be said here about Qlik’s future vision, but we can get a glimpse—Qlik has built a prototype showing its new natural analytics approach and much more in QlikView > next, Qlik’s own vision of the future of BI.

This is a vision in which analysis is carried out following five basic themes, to accomplish, according to Qlik, two main objectives: 1) an understanding of what people need, and 2) an understanding of who those people are.

These five themes are:

  • Gorgeous and genius—a user interface that is intuitive and natural to use, but aiming to be productive, improving the user’s visual and analysis experience.
  • Mobility and agility—having access to the Qlik business discovery platform from any device and with seamless user experience.
  • Compulsive collaboration—providing users with more than one way to collaborate, analyze, and solve problems as a team by providing what Qlik calls a “new point of access”.
  • The premier platform—Qlik’s vision for providing users with improved ways to provide new apps quickly and easily.
  • Enabling the new enterprise—Qlik aims to provide IT infrastructure with the necessary resources to offer true self-service for their users while easing the process of scaling and reconfiguring QlikView’s infrastructures to adapt to new requirements.

Qlik, Serving Modern BI with a Look Into the Future

It would be hard not to consider Qlik, from its inception, an in-memory computing pioneer in the business space, and Qlik is keeping that pioneer status two decades later, innovative in both back-end and front-end design and able to wear more than one hat in the business intelligence space. With an end-to-end platform, from storage to analysis and visualization, Qlik is both adapting to the increasingly fast-paced evolution of BI and looking to the future to maintain and gain markets in this disputed space.

However, to maintain its place in the industry, it will be crucial for Qlik to keep pace on the many fronts where QlikView, its flagship product, is front and center: business-ready for small to medium-sized customers, as well as powerful, scalable, and governable for large organizations. These days Qlik is surrounded by other innovation sharks in the BI ocean, so remaining unique, original, and predominant will prove increasingly difficult for Qlik and the rest of the players in the space. As in nature, let those most capable of fulfilling their customers’ needs survive and prosper.

It comes as no surprise that Qlik is already looking to anticipate the next step in the evolution of BI and analytics. Qlik has a brand that stands for innovation, and the company is certainly working to make QlikView newer, better, and bigger. It will be really interesting to see how its innovative vision plays out, and whether it gains the same traction as, or more than, Qlik’s previous innovations in the market.

Have a comment on Qlik or the BI space in general? Let me know by dropping a line or two below. I’ll respond as soon as I can.
The BBBT Sessions: HortonWorks, Big Data and the Data Lake

The BBBT Sessions: HortonWorks, Big Data and the Data Lake

Some of the perks of being an analyst are the opportunities to meet with vendors and hear about their offerings, their insight on the industry and best of all, to be part of great discussions and learn from those that are the players in the industry.

For some time now, I have had the privilege of being a member of the Boulder BI Brain Trust (BBBT), an amazing group consisting of Business Intelligence and Data Management analysts, consultants and practitioners covering various specific and general topics in the area. Almost every week, the BBBT engages a software provider to give us a briefing of their software solution. Aside from being a great occasion to learn about a solution, the session is also a tremendous source for discussion. 

I will be commenting on these sessions here (in no particular order), providing information about the vendor presenting, giving my personal view, and highlighting any other discussion that might arise during the session.

I would like to start with Hortonworks, one of the key players in the Big Data space, and a company that has a strong influence on how Big Data is evolving in the IT industry.

The session

In a session conducted by David McJannet and Jim Walker, Hortonworks’ Marketing VP and Director of Product Marketing respectively, BBBT members had the chance to learn in more detail about Hortonworks’ offerings, strategy, and services aimed at bringing Hadoop to the enterprise, as well as to discuss Big Data and its insertion into the enterprise data management infrastructure especially in relation to data warehousing, analytics, and governance. Here are some of the highlights of the session…

About Hortonworks 

Hortonworks is a young company, but one with a lot of experience in the Big Data space. Founded in 2011, it was formed by the original Hadoop development and operations team from Yahoo! Why is this so relevant? Because Hortonworks lives and breathes Hadoop: the company makes its living by building data solutions on top of Hadoop and many of its derivative projects. And Hadoop is arguably one of the most important open source software projects of all time, perhaps second only to Linux.

Hadoop is described on its Web page as follows:

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers […].
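The “simple programming models” that description refers to are, at their core, MapReduce. The following pure-Python sketch is only an illustration of the model (a real Hadoop job runs distributed across a cluster via the Java API or Hadoop Streaming; the documents and counts here are made up):

```python
from collections import defaultdict

# The MapReduce model in miniature: mappers emit (key, value) pairs,
# the framework groups pairs by key (the "shuffle"), and reducers
# aggregate each group. Hadoop runs these phases across a cluster;
# here they run in-process purely for illustration.

def map_phase(documents):
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {word: sum(values) for word, values in groups.items()}

docs = ["Hadoop scales out", "Hadoop handles failures"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["hadoop"])  # 2
```

Because each mapper and each reduce group is independent, the framework can run them on thousands of machines and simply rerun the pieces that fail—the fault tolerance the description mentions.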

Hortonworks focuses on driving innovation exclusively via the Apache Software Foundation, producing open source–based software that enables organizations to deal with their Big Data initiatives by delivering Apache Hadoop solutions ready for enterprise consumption. As stated in the title of the company’s presentation:

Our mission is to enable your Modern Data Architecture by delivering Enterprise Apache Hadoop.

Hortonworks’ commitment to Hadoop

One of the interesting aspects of Hortonworks is its commitment to Hadoop, in many regards, from the way it handles Hadoop offerings for corporate consumption, to the amount of effort Hortonworks’ team devotes to evolving and enhancing Hadoop’s capabilities. To this point, Hortonworks shared the following graph, in which it’s possible to see the level of contribution of Hortonworks to the famed Apache project in 2013.

Figure 1. List of contributors to Hadoop and number of lines contributed (Courtesy of: Hortonworks)

In the same vein, the contribution of the Hortonworks team to Hadoop extends across its multiple subprojects—HBase (Hadoop’s distributed data store), Pig (Hadoop’s large data set analysis language), and Hive (Hadoop’s data warehouse infrastructure), among others (Figure 2)—making Hortonworks a hub for some of the most important experts in Apache Hadoop, with a strong commitment to its open source nature.

Figure 2. List of contributors to Hadoop and number of lines contributed (Courtesy of: Hortonworks)

Hortonworks’ approach to the business market is quite interesting. While maintaining its commitment to both Hadoop and open source ecosystems, Hortonworks has also been able to:

  1. Package corporate-ready solutions, and
  2. Ensure strong partnerships with important software companies such as Microsoft, Teradata, SAP, HP, Rackspace, and, most recently, Red Hat, extending Hortonworks’ reach and influence in the Big Data space and especially into corporate markets.

So what does Hortonworks offer?

Hortonworks says it clearly: they do Hadoop. What this means is that Hortonworks’ flagship product—the Hortonworks Data Platform (HDP2)—is an enterprise solution based 100% on the open source Apache Hadoop platform. HDP2 uses the core set of Hadoop modules, architected and certified for enterprise use, and includes fully tested and certified versions of Hadoop modules as well as a complete set of professional services provided by Hortonworks for its customers.

Another offering from the company is the Hortonworks Sandbox, a Hadoop environment that includes interactive tutorials and the most recent Hadoop developments for learning and testing.

How does Hortonworks fit into an organization?

One of the main concerns of many organizations trying to embrace Big Data is how their Big Data initiative will fit within their existing data management infrastructure. More importantly, the organization needs to evolve its traditional data management infrastructure (Figure 3) so that Big Data adoption doesn’t generate more problems than solutions. Hortonworks is by no means the only software provider here; vendors such as Cloudera and MapR also embrace Hadoop to solve organizations’ Big Data issues, but with different approaches.

Figure 3. A traditional data management approach (Courtesy of: Hortonworks)

Wayne Eckerson explains in The Battle for the Future of Hadoop:

Last November, Cloudera finally exposed its true sentiments by introducing the Enterprise Data Hub in which Hadoop replaces the data warehouse, among other things, as the center of an organization's data management strategy. In contrast, Hortonworks takes a hybrid approach, partnering with leading commercial data management and analytics vendors to create a data environment that blends the best of Hadoop and commercial software.

During the session, aside from the heated debates about whether or not to replace the data warehouse with new information hubs, both David McJannet and Jim Walker confirmed Hortonworks’ position, which consists of enabling companies to expand their existing data infrastructures (in contrast to Cloudera’s approach)—letting companies evolve without replacing their data management platforms (Figure 4).

Figure 4. Hortonworks expands an organization’s traditional data management capabilities for addressing Big Data (Courtesy of: Hortonworks)

The appealing part of Hortonworks’ schema is that its Hadoop offerings act as an expansion of the rest of the data repository spectrum (relational databases, data warehouses, data marts, and so on). This makes sense in the context of coupling new data management strategies with existing ones; while Hadoop has proven to be effective for certain tasks and types of data, some problems still need to be handled with “traditional” methods and existing tools. According to Mark Madsen (What Hadoop Is. What Hadoop Isn’t.):

What it doesn’t resolve is aspects of a database catalog, strong schema support, robust SQL, interactive response times or reasonable levels of interactive concurrency—all things needed in a data warehouse environment that delivers traditional BI functions. In this type of workload, Hadoop doesn’t come close to what a parallel analytic database can achieve, including scaling this workload into the Petabyte range.

Yet Hadoop offers features the database can’t: extremely low cost storage and retrieval, albeit through a limited SQL interface; easy compatibility with parallel programming models; extreme scalability for storing and retrieving data, provided it isn’t for interactive, concurrent, complex query use; flexible concepts of schema (as in, there is no schema other than what you impose after the fact); processing over the stored data without the limitations of SQL, without any limitations other than the use of the MapReduce model; compatibility with public or private cloud infrastructures; and free, or support-only, so a price point far below that of databases.

Hortonworks’ approach is then to enable expansion and evolution of the existing data management platform by offering an enterprise-ready version of Hadoop, one that can be nicely integrated and fill those gaps between the data warehouse and the analysis of huge amounts of non-structured (polystructured) information.

What is Hortonworks for, anyway?

Despite the hype and eagerness about Big Data, many people still don’t have a clear idea of the contexts and use cases where a Hadoop approach can be useful. Hortonworks showed us a good list of examples of how some of its customers are using its platform. Current deployments run mainly within the financial services, telecom, retail, and manufacturing industries, for applications such as fraud prevention, trading risk, call detail records, and infrastructure investment, as well as assembly-line quality assurance and many other potential uses.

How Hortonworks addresses its customers’ Big Data needs is demonstrated by how a customer typically embraces Hadoop in the context of working with increasing volumes of information.

The graph below shows a diagram correlating data (volume) and the value that it can bring to the organization by enhancing an organization’s capability to derive insight.

Figure 5. Described as a “Common journey to the data lake,” Hortonworks shows the relation between data volume and its potential value in the context of addressing specific problems (Courtesy of: Hortonworks)

Another interesting thing about this is the notion of the data lake. Pentaho CTO James Dixon, who’s credited with coining the term, describes it in the following simple terms:

If you think of a datamart as a store of bottled water—cleansed and packaged and structured for easy consumption—the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.

Hortonworks uses Hadoop as the platform to provide a solution for the two main needs this implies:

  1. A new approach to analytics, expanding from a single query engine and a deterministic list of questions to a schema-on-read basis, enabling information analysis that addresses polystructured as well as real-time and batch data.
  2. A means for data warehouse optimization, expanding the boundaries of strict data schemas.
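The schema-on-read idea in the first point can be sketched in a few lines of Python (the records and field names below are hypothetical, purely for illustration): raw polystructured records are stored untouched, and structure is imposed only when a query asks for it—the opposite of a warehouse’s load-time schema.

```python
import json

# Schema-on-read, sketched: polystructured records are kept as raw
# text, and each query projects them onto only the fields it needs.
raw_events = [
    '{"user": "ana", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "click"}',                # missing a field
    '{"user": "ana", "action": "view", "extra": true}',  # extra field
]

def read_with_schema(lines, fields, default=None):
    """Impose a schema at read time instead of at load time."""
    for line in lines:
        record = json.loads(line)
        yield {f: record.get(f, default) for f in fields}

clicks = [r for r in read_with_schema(raw_events, ["user", "action"])
          if r["action"] == "click"]
print(len(clicks))  # 2
```

Because no schema is enforced at ingestion, new questions (new field projections) can be asked of old data without reloading it—which is why this approach suits polystructured sources.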

The Hortonworks Data Platform uses the full open source Hadoop platform. It provides an enterprise-ready basis for handling Big Data within an organization, and aims to fit and optimize—not disrupt—the existing data platform (Figure 6). Some of the challenges of Hadoop deployments have been coping with an often unfriendly environment and a lack of technical expertise to handle Hadoop projects properly, especially in hybrid and complex environments that mix and interconnect traditional and Hadoop deployments.

The addition of YARN—Hadoop’s new resource, job, and application manager—in Hadoop 2.0, and its inclusion in Hortonworks’ HDP2, has enabled Hortonworks to provide a more robust processing platform, one that can run and manage workloads beyond MapReduce, expanding HDP’s capabilities to manage both MapReduce and external applications and resources more efficiently. The Hortonworks website has a good summary of the use of YARN within HDP.

Figure 6. Hortonworks Data Platform General Architecture (Courtesy of: Hortonworks)

Open source software, especially projects based on Hadoop and Big Data, traditionally has a Linux orientation, so it’s worth mentioning that the HDP2 platform is available on both Linux and Windows operating systems.

Hortonworks Data Platform, enterprise tested

During the session, one thing David McJannet and Jim Walker emphasized was Hortonworks’ testing and quality assurance model, which includes testing HDP directly within Yahoo’s data environment. This provides Hortonworks with a vast and ideal testing platform, with complex and data-flooded scenarios—a good proving ground for any data application.

To conclude

I have no doubt that the new breed of solutions from Hortonworks and others offers impressive and innovative approaches to the analysis and management of complex and big data problems. Clearly, frameworks such as the data warehouse need to adapt to these new conditions or die (I tend to believe they will not die).

Instead, it seems that data warehouse methodologies and platforms potentially have the necessary elements—such as enterprise readiness, methodology, and stability—to evolve and include these new computing paradigms, or at least live within these new ecosystems.

So some of the challenges of deploying Big Data solutions, aside from the natural technological issues, could come from how these new concepts fit within existing infrastructures. They need to avoid task duplication, actually streamline processes and data handling, and fit within complex IT and data governance initiatives, ultimately to procure better results and return on investment for an organization.

Hortonworks takes an approach that should appeal to many organizations by fitting within their current infrastructures and enabling a smooth yet radical evolution of their existing data management platforms, whether via its HDP2 platform or delivered via Hortonworks’ strategic partners. It will be interesting to see what their big competitors have to offer.

But don’t just take my word for it. You can replay the session with Hortonworks—just go to the BBBT web page and subscribe.

Have comments? Feel free to drop me a line and I’ll respond as soon as possible.

BI on the Go: About Functionality and Level of Satisfaction


Originally published on the TEC Blog

TEC recently published its 2014 Mobile BI Buyers Guide and a related blog post in which some results from a survey on mobile business intelligence (BI) usage, needs, and trends were discussed. We thought it would be useful to take another look at what was revealed from the survey regarding what’s important for mobile BI users, and of course, how satisfied they are with the mobile BI solutions they work with. Let’s take a look at some of our findings in this regard. Here we will discuss two additional criteria and how they affect mobile BI practices and decision-making: functionality and level of satisfaction.

General Functionality: What Tops the List?

One of the questions we asked mobile BI users in the survey had to do with the functionality they find most important in their mobile BI application. From the list we provided—including ad hoc querying, alerting, collaboration, data analysis and discovery, and dashboarding (Figure 1)—users were clear that both dashboarding and data analysis/discovery are an essential part of their day-to-day work with a mobile BI application. It is clear that reporting on mobile media is slowly decreasing, leaving space for more data discovery functions.

On the other hand, two things surprised me. The first was the level of importance that users gave to alerting functionality over collaboration abilities. Despite the buzz around the importance of collaboration embedded within all types of enterprise software, the ability of a mobile BI application to alert users quickly about any given emergency or contingency is vital, especially these days, when acting in real time is becoming increasingly important for many organizations.

Second, I was surprised that collaboration was positioned in fifth place, while the top places went to more common BI features such as dashboarding, data analysis, reporting, and alerting. It seems that although collaboration is important, users have their priorities clear: first and foremost, they want analysis capabilities and other key tasks in a BI application.

Figure 1. Top functionality (Source: TEC Mobile BI Survey 2014)

Mobile BI Satisfaction Levels: Still Not There Yet?

Another question we asked in the survey was how satisfied users are with their mobile BI applications. As Figure 2 shows, while the survey did not reveal high levels of dissatisfaction, it did indicate that many respondents are only “somewhat satisfied,” revealing that a high number of users are still not totally impressed with what a mobile BI solution can do for them. Why is this?

Many things can play into these results, from limitations of mobile BI applications to misconceptions about what a mobile BI application should or should not be able to do. But it seems that in the technological world we live in, mobile is synonymous with innovation and user experience, so users in general are paying broad attention not just to the efficiency of mobile BI applications, but increasingly to the degree of innovation of mobile apps.

Figure 2. General satisfaction level (Source: TEC Mobile BI Survey 2014)

According to an article in Enterprise Apps Today, big business intelligence vendors are not quite satisfying users. The article mentions a Gartner study looking at mobile BI based on its ability to

provide devices with reports and other dashboard content.” The study revealed that mobile BI usage “showed the highest growth among all business intelligence features considered, with 6.2 percent of respondents stating they make extensive use of mobile functionality.”

And, according to the article,

small independent vendors continue to lead the way on mobile business intelligence. However, mega-vendors and large independents are beginning to gain some ground. That said, they still have a good amount of ground to cover based on the number of them being graded below average.

While I have also noted that mobile BI has recently enjoyed more popularity than other BI features, our survey gave us a slightly different view, with customer satisfaction located mostly in the middle—the majority of users being very or somewhat satisfied—indicating perhaps that efficiency still makes up a huge portion of what matters for BI users. Of course, many users are hoping for more than that: the real wow factor that delivers the kind of mobile experience they already get from their consumer mobile applications, mobile social platforms, and maybe even other mobile business applications.

Along the same lines, and to make things a bit more interesting, let’s mix these two results together and see what happens (Figure 3).

Figure 3. Satisfaction level vs functionality (Source: TEC Mobile BI Survey 2014)

When looking at top functionality and customer satisfaction together, it is interesting to note several things:

  1. Across the board, dashboarding remains one of the most important features for performing business intelligence with mobile devices. Within this sample of mobile BI users, dashboarding seems to be quite popular even in the “not very satisfied” category, perhaps signaling what we mentioned before: users are waiting to see richer experiences within their mobile BI applications.
  2. For those users who are “completely satisfied” with their current mobile BI solution, alerting plays an important role within their mobile BI criteria, being an essential feature for enabling early issue, risk, or opportunity detection. It is possible that for these organizations, having an effective way to receive alerts is key to ensuring successful operation and planning.
  3. It seems users increasingly expect more features for performing data analysis and discovery; this is somewhat surprising, as I know many business intelligence providers are making big efforts to improve their functionality in this area.

So, it seems users recognize the importance of three main functional features (dashboarding, data discovery, and alerting) for a reliable mobile BI solution, but they still expect further evolution of mobile BI functionality in the future.

Functionality and Organization Size: How do They Relate?

In a final exercise, we segmented our respondents according to the size of their organization and their most relevant functional features (Figure 4) and noted some clear differences among different sized organizations.

Figure 4. Functionality vs company size (Source: TEC Mobile BI Survey 2014)

As the graph shows, for very small organizations, functional interest is distributed relatively evenly across all six main functional features, with data analysis and discovery ranking as the most important. For corporations, on the other hand, it is clear that dashboarding and data analysis/discovery, as well as alerting, all play a major role. This seems to be a good indication that within large corporations, efficiency and fast response are extremely important for the mobile BI users on staff. Meanwhile, for organizations in the middle (from 250 to 1,000 employees), dashboarding is clearly the most important feature, which makes sense, as many of these organizations may have reached a level of BI maturity where dashboarding remains key to the decision-making process.

It is also worth noting that collaboration features, which I personally expected to rank higher, did not display a high level of importance in our survey results, showing that while collaboration is a basic feature for mobile BI applications, other important mobile BI features are a higher priority for end users.

Where Will Mobile BI Go From Here? 

In this final part of our mobile BI mini-series (in the first part we explored who is using mobile BI offerings and which vendors they are selecting) we have found that despite being an important change agent in the business intelligence space, the mobile BI arena still has a lot of potential and a lot of ground to break.

As organizations on one side (and mobile BI products on the other) mature and grow, the adoption and evolution of mobile BI applications will enable both end-users and vendors to incorporate key functionalities into mobile BI solutions, for example, reinforcing collaboration, making mobile BI customization and configuration more flexible and accessible, and enabling mobile BI to continue changing the way traditional users consume and produce business intelligence and analytics solutions.

But what do you think? Tell us your experience with mobile BI. Drop me a line below and I’ll respond as soon as I can.

Further Reading

BI on the Go . . . So, Who’s Using Mobile BI? (February 2014)
TEC 2014 Mobile BI Buyer's Guide (January 2014)
BI on the Go Infographic (January 2014)
VIDEO: Mobile Business Intelligence in the Enterprise (November 2013)
This Week in the DoT, 03/14/2014


As my father used to say, better late than never. So here is a list of things you might want to check out, including news, humor, and more…


In the news:

To read:

To watch:

The Internet of Things: Dr. John Barrett at TEDxCIT

Kinoma Create — The JavaScript-Powered Internet of Things Construction Kit

Influencers on Twitter you certainly need to follow:

  • Cindi Howson (@BIScorecard)
  • Claudia Imhoff (@Claudia_Imhoff)
  • Colin White (@ColinWhite) 
  • Curt Monash (@curtmonash)
  • Howard Dresner (@howarddresner)
  • Jim Harris (@ocdqblog)
  • Josep di Paloantonio (@JAdP)
  • Julie Hunt (@juliehunt)
  • Karen Lopez (@datachick)
  • Marcus Borba (@marcusborba)
  • Merv Adrian  (@merv)
  • Neil Raden (@NeilRaden)
  • Richard Hackathorn (@hackathorn)

Finally, to end your week with a smile:

- Agile Methodology - Applied to Other Fields...
- Big Data Analysis... in the Cloud

Bon weekend!

This Week in the DoT, 03/07


Another week, another month, and the year goes by...

Before heading to your local… place of weekend rest, here’s a list of things I’ve come across this week that you might want to check out.

 Have a tremendous weekend!

In the news:

To read:

To watch:

Big Data and the Rise of Augmented Intelligence: Sean Gourley at TEDxAuckland

Teradata and Big Data - from the CTO's Point of View - Stephen Brobst

Influencers on Twitter you certainly need to follow:

  • Carla Gentry (@data_nerd)
  • Cindi Howson (@BIScorecard)
  • Claudia Imhoff (@Claudia_Imhoff)
  • Colin White (@ColinWhite) 
  • Curt Monash (@curtmonash)
  • Howard Dresner (@howarddresner)
  • Jim Harris (@ocdqblog)
  • Josep di Paloantonio (@JAdP)
  • Julie Hunt (@juliehunt)
  • Karen Lopez (@datachick)
  • Marcus Borba (@marcusborba)
  • Mark Smith (@marksmithvr)
  • Merv Adrian  (@merv)
  • Mike Ferguson (@mikeferguson1)
  • Neil Raden (@NeilRaden)
  • Richard Hackathorn (@hackathorn)

Some Humor:

Machine Learning and Cognitive Systems, Part 2: Big Data Analytics


In the first part of this series, I described a bit of what machine learning is and its potential to become a mainstream technology in the enterprise software industry, serving as the basis for many other advances in the incorporation of technologies related to artificial intelligence and cognitive computing. I also mentioned briefly how machine learning is becoming increasingly important for many companies in the business intelligence and analytics industry.

In this post I will discuss further the importance that machine learning already has and can have in the analytics ecosystem, especially from a Big Data perspective.

Machine learning in the context of BI and Big Data analytics

Just as in the lab and other areas, one of the reasons machine learning has become so important and useful in enterprise software is its potential to deal not just with huge amounts of data and the extraction of knowledge from it—something that can be addressed with disciplines such as data mining or predictive analysis—but also with complex problems in which the algorithms need to adapt to frequently changing conditions. This is the case for successful applications of machine learning in software such as spam detection, Amazon’s automation of employee access control, and Cornell’s work on protecting animals.
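To make the adaptability point concrete, here is a toy, from-scratch naive Bayes spam filter in Python (illustrative only—the training messages are invented, and production spam filters use far richer features and models). Because its behavior is learned from examples rather than hand-coded rules, retraining on new messages lets it adapt as conditions change:

```python
import math
from collections import Counter

# A toy naive Bayes text classifier: behavior is derived from data,
# so feeding it fresh examples updates its decisions -- the kind of
# adaptability rule-based systems lack.
class NaiveBayes:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.label_counts = Counter()

    def train(self, text, label):
        self.label_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def predict(self, text):
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        scores = {}
        for label in ("spam", "ham"):
            total = sum(self.word_counts[label].values())
            score = math.log(self.label_counts[label] / sum(self.label_counts.values()))
            for word in text.lower().split():
                # Laplace smoothing keeps unseen words from zeroing the score
                score += math.log((self.word_counts[label][word] + 1) / (total + vocab))
            scores[label] = score
        return max(scores, key=scores.get)

nb = NaiveBayes()
nb.train("win free money now", "spam")
nb.train("free prize click now", "spam")
nb.train("meeting agenda for monday", "ham")
nb.train("project status report", "ham")
print(nb.predict("free money prize"))  # spam
```

If spammers change vocabulary, calling `train` on newly labeled messages shifts the word statistics and thus the classifier’s decisions—no code changes required.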

But the incorporation of machine learning techniques within enterprise software is rapidly expanding to many other areas of business, especially those related to business intelligence and analytics, or in general to the decision support framework of an organization. As I mentioned in Part 1, as information collection increases in volume, velocity, and variety (the three Vs of Big Data), and as business pressure grows to expedite analysis and decrease its latency, new and existing business software solutions are incorporating improved ways to analyze these large, complex data sets—and, most importantly, extending the reach of what analytics and BI solutions can do.

As data sources become increasingly complex, so do the means of analyzing them, and the maturity model of the BI and analytics platform is forced to accommodate the process and expand to the next level of evolution—and sometimes even revolution—of the decision-making process. The role of a BI and analytics framework is thus changing from being solely a decision support companion to a framework that can trigger decision automation. To show this, I have taken the standard BI maturity model from TEC’s BI Maturity and Software Selection Perspectives report (Figure 1) to illustrate in simple form some of the pressures that this complexity puts on the maturity process. As a consequence, the process is expanded to a two-phase decision-making process, which implies giving the system an increased role in the decision.

Figure 1. Standard BI maturity model is being expanded by complexity of data and processes

The decision phase can happen in two ways: as a supported decision made by users, or by enabling the system to make the decision itself, automating the decision-making process based on previous analysis and letting the system learn and adapt. By delegating the decision to the system, the process extends the reach of analytics to predictive analysis, early warning messaging, and data discovery.
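One minimal way to picture these two decision modes is a confidence-gated router: the system acts on its own when its confidence is high and defers to a human otherwise. The threshold, labels, and function names below are purely illustrative assumptions, not a real product API:

```python
# Sketch of the two decision modes: automated when the system is
# confident enough, assisted (human in the loop) otherwise.
AUTO_THRESHOLD = 0.90  # illustrative cutoff; real systems tune this

def route_decision(prediction, confidence):
    if confidence >= AUTO_THRESHOLD:
        return ("automated", prediction)        # system acts on its own
    return ("assisted", "escalate to analyst")  # decision support only

print(route_decision("approve", 0.97))  # ('automated', 'approve')
print(route_decision("approve", 0.55))  # ('assisted', 'escalate to analyst')
```

In practice an organization would start with a conservative threshold (most decisions assisted) and lower it as the system’s learned decisions prove trustworthy—mirroring the gradual maturity expansion described above.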

At this stage we might find more permutations of analytics platforms and frameworks that combine both assisted and automated decisions, ideally increasing the effectiveness of the process and streamlining it (Figure 2).

Figure 2. Standard BI maturity model expands to be able to automate decisions

In this context, due to new requirements coming from different directions, especially from Big Data sources in which systems deal with greater and more complex sets of data, BI and analytics platforms become, most of the time, hubs containing dynamic information that changes in volume, structure, and value over time.

In many cases decisions are still made by humans, but with software assistance to different degrees. In some more advanced cases, decisions are made by the system with no human intervention, triggering the evolution of analytics systems, especially in areas such as decision management, and closing the gap between analytics and operations, which can mean boosting tighter relations between the operations, management, and strategy of an organization.

Opportunities and challenges

The opportunities for implementing machine learning within the context of Big Data, and especially Big Data analytics, are enormous. From the point of view of decision support, it can enhance the complete decision management cycle by

  1. Enhancing existing business analytics capabilities such as data mining and predictive analytics, enabling organizations to address more complex problems and improve the precision of the analysis process.
  2. Enhancing the level of support for decisions by providing increased system abilities for performing adaptable data discovery features such as detecting patterns, enabling more advanced search capabilities, reinforcing knowledge discovery by identifying correlations, and many other things, much along the same line of what data mining and predictive analytics can do.
  3. Boosting the incorporation of early detection capabilities within traditional or new BI and analytics systems, a key component of modern organizations that want to anticipate or detect short-term trends that might have great impact on an organization.
  4. Enabling a system to perform autonomous decisions, at least at early stages, to optimize the decision process in cases where the application can decide by itself.

Many organizations that already use machine learning can be considered to be exploiting the first level of this list—improving and enabling the analysis of large volumes of complex data. A smaller number of organizations can be considered to be transitioning to the subsequent levels of Big Data analysis using machine learning.

At this point in time, much of the case for the application of machine learning is based on reinforcing the first point of the list. But aside from its intrinsic relevance, it is, in my view, in the area of early detection and automation of decisions where machine learning has a great deal of potential to help boost BI and analytics to the next level. Of course this will occur most probably alongside other new information technologies in artificial intelligence and other fields.

Many organizations that already have robust analytics infrastructures need to take steps to incorporate machine learning within their existing BI and analytics platforms, for example, by building it into their analytics strategies. But organizations that wish to leverage machine learning's potential may encounter some challenges:

  1. The complexity of applying machine learning requires a great deal of expertise. This in turn leads to the challenge of gaining the expertise to interpret the right patterns for the right causes.
  2. There may be a shortage of people who can take care of a proper deployment. Intrinsically, the challenge is to find the best people in this discipline.
  3. As an emerging technology, for some organizations it is still a challenge to measure the value of applying these types of advanced analytics disciplines, especially if they don’t have sufficiently mature BI and Big Data analytics platforms.
  4. Vendors need to make these technologies increasingly suitable for the business world, easing both deployment and development processes.

Despite these challenges, there is little doubt that over time an increasing number of organizations will implement machine learning techniques to enhance their analytics potential and, consequently, mature their analytics offerings.

Some real-life use cases

As we mentioned earlier, there are a number of cases where machine learning is being used to boost an organization’s ability to satisfy analytics needs, especially for analytics applied to Big Data platforms. Following are a couple of examples of what some organizations are doing with machine learning applied to Big Data analytics, which, perhaps surprisingly, are tied not to complex scientific projects but to more business-oriented ones. These cases were taken from existing machine learning and Big Data analytics vendors, which we will describe in more detail in the next post of this series:

Improving and optimizing energy consumption

  • NV Energy, the electricity utility in northern Nevada, is now using software from Big Data analytics company BuildingIQ for an energy-efficiency pilot project using machine learning at its headquarters building in Las Vegas. The 270,000-square-foot building uses BuildingIQ to reduce energy consumption by feeding large sets of data such as weather forecasts, energy costs and tariffs, and other datasets into proprietary algorithms that continuously optimize the building’s energy consumption.
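This kind of energy use case has the flavor of a regression problem: learn the relationship between drivers such as weather and consumption, then predict and adjust. As a purely illustrative sketch (this is not BuildingIQ's proprietary algorithm, and the numbers are made up), a one-variable least-squares fit in Python:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Hypothetical readings: outdoor temperature (°C) vs. kWh consumed
temps = [18, 22, 26, 30, 34]
kwh   = [210, 250, 290, 330, 370]
a, b = fit_line(temps, kwh)
print(a, b)        # slope 10.0 kWh per °C, intercept 30.0
print(a * 28 + b)  # forecast for a 28 °C day: 310.0
```

A production system would use many more variables and a far richer model, but the learn-then-predict loop is the same.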

Optimizing revenue for online advertising

  • Adconion Media Group, an important Media Company with international reach, uses software from machine learning and Big Data analytics provider Skytree for ad arbitrage, improving predictions for finding the best match between buyers and sellers of web advertising.

Finding the right partner

  • eHarmony, the well-known matchmaking site, uses advanced analytics provided by Skytree to find the best possible matches for prospective relationship seekers. Skytree’s machine learning finds the best possible matching scenarios for each customer, using profile data and website behavior along with specific algorithms.

This is just a small sample of real use cases of machine learning in the context of Big Data analytics. This is new but fertile ground for machine learning to take root in and grow.

So what?

Well, in the context of analytics, and specifically Big Data analytics, the application of machine learning has a lot of potential for boosting the use of analytics to higher levels and extending its use alongside other disciplines, such as artificial intelligence and cognition. But these applications need to be approached with machine learning as an enabler and enhancer, and must be integrated within an organizational analytics strategy.

As with other disciplines, the success of a machine learning implementation and its evolution to higher stages depends on an organization’s ability to adapt it extensively to its business needs, operations, and processes.

One of the most interesting trends in analytics is its increasing pervasiveness and tighter relation with all levels of an organization. As the adoption of new features increases the power of analytics, it also closes the gap between two traditionally separate worlds within the IT space, the transactional and the non-transactional, enabling analytics to be consumed and used in ways that just a decade ago were unimaginable. The line between business operations and analysis is blurrier than ever, and disappearing. The new IT space will live within these colliding worlds, with analytics being performed at each level of an organization, from operations to strategy.

In upcoming posts in this series, we will address the machine learning market landscape and look at some vendors that currently use machine learning to perform Big Data analytics. And we will go a step further, into the space of cognitive systems.

In the meantime, please feel free to drop me a line with your comment. I’ll respond as soon as I can.

This Week in the DoT, 02/28

Yep, thank God it’s Friday.

And before you go home and hopefully have a relaxing weekend, here is a list of some interesting things that happened in the Data of Things during this week: news, events, tweets and a bit of humor.

These are some relevant things you might want to check...

Snow Boarding in Fernie, Canada by Chris Barton

In the news:

Interesting readings:

Interesting to watch:

Live from Strata 2014: In-Hadoop analytics and IBM's Watson

What Does Collaboration Among Humans and Machines Actually Look Like? Structure:Data 2013

Influencers on Twitter you certainly need to follow:

  • Cindi Howson (@BIScorecard)
  • Claudia Imhoff (@Claudia_Imhoff)
  • Colin White (@ColinWhite) 
  • Curt Monash (@curtmonash)
  • Howard Dresner (@howarddresner)
  • Jim Harris (@ocdqblog)
  • Joseph di Paolantonio (@JAdP)
  • Julie Hunt (@juliehunt)
  • Karen Lopez (@datachick)
  • Marcus Borba (@marcusborba)
  • Merv Adrian  (@merv)
  • Neil Raden (@NeilRaden)
  • Richard Hackathorn (@hackathorn)

And finally, to end your week with a smile:


This Week in the DoT (Data of Things)

Every Friday, starting today, I will try to post some of what in my view were the relevant events during the week for the Data of Things, including news, videos, etc.

For today, I have a short list of influencers on Twitter — in no particular order — that you might want to follow for all data-related topics. I’m sure you will enjoy their tweets as much as I do:

  • Claudia Imhoff (@Claudia_Imhoff)
  • Merv Adrian (@merv)
  • Neil Raden (@NeilRaden)
  • Marcus Borba (@marcusborba)
  • Howard Dresner (@howarddresner)
  • Curt Monash (@curtmonash)
  • Cindi Howson (@BIScorecard)
  • Jim Harris (@ocdqblog)
  • Julie Hunt (@juliebhunt)

Of course, the list will grow in time. For now, enjoy following this group of great data experts.

Bon weekend!

BI on the Go . . . So, Who’s Using Mobile BI?

Piggybacking on the success of the most recent TEC Buyer’s Guide, the 2014 Buyer’s Guide on Mobile BI applications, we took the opportunity to survey users of mobile business intelligence (BI) applications and collect their impressions of these tools. Most of the results of this survey, which drew more than 250 respondents, were captured in an infographic. Additional information garnered from the survey, while not conclusive, may provide a glimpse into the sentiment of the respondents on their use of mobile BI applications. In this post, I’ll describe some of those results, which may be useful for organizations evaluating a new mobile BI solution for their business needs.

Most popular mobile BI apps

The top 10 mobile BI apps in use are depicted in figure 1. Microsoft takes a clear lead, followed by the other big software powerhouses SAP, Oracle, and IBM. These results are in line with what we would expect considering that most of these vendors already have large sets of BI implementations worldwide and that customers tend to choose mobile BI offerings from their existing BI provider.
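For readers curious about the mechanics, a tally like the one behind figure 1 can be reproduced in a few lines of Python; the responses below are hypothetical, not the actual survey data:

```python
from collections import Counter

# Hypothetical raw answers: one vendor name (or "None") per respondent
responses = ["Microsoft", "SAP", "Oracle", "Microsoft", "IBM",
             "Microsoft", "SAP", "None", "Tableau", "Microsoft"]

tally = Counter(responses)
print(tally.most_common(2))  # [('Microsoft', 4), ('SAP', 2)]
print(f"{tally['None'] / len(responses):.0%} not using mobile BI")  # 10% ...
```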

Figure 1 shows that more than 8 percent of respondents either still have not implemented or are not using a mobile BI application. This is a relatively large segment of potential BI users, especially considering that most of the existing BI software providers now have mobile BI offerings, suggesting that it’s relatively effortless to put them in place. This apparent avoidance of mobile BI offerings within some organizations may stem from the following:

  • Lack of use case for mobile BI apps
  • Technical limitations to implementing a mobile BI app
  • Budget restrictions

Figure 1. Top 10 Mobile Apps Used by Respondents to TEC’s 2014 Mobile BI Survey

Other mobile BI offerings used in the organizations of the survey respondents come from QlikView, MicroStrategy, and Tableau—all great promoters of mobile offerings in the BI space, which are rapidly increasing their footprint not only in mobile BI, but in the mobile space overall. The remaining mobile BI offerings cited in our survey come from the long-time and well-known BI player Information Builders; Infor, a traditional player in the enterprise resource planning (ERP) space that has been growing its BI presence; and, last but not least, Pentaho, an experienced BI player from the open-source community that now has a robust mobile BI solution covering most, if not all, aspects of an enterprise BI solution.

Who’s using mobile BI and how?

We also wanted to determine who is using mobile BI solutions, and which ones. When we segmented our top 10 provider list by the size of the company our survey respondents work for, some results became immediately apparent. Microsoft was by far the most widely used mobile BI solution among companies with 1 to 50 employees, while SAP was the most popular solution among companies with 51 to 100 employees (see figure 2). On the other hand, there seems to be healthy competition between the big four (IBM, SAP, Microsoft, and Oracle) in the large enterprise segment, with increased presence of other players such as Information Builders, MicroStrategy, QlikView, and Tableau.

Figure 2. Top 10 Mobile BI Apps Used by Respondents According to their Company Size (TEC’s 2014 Mobile BI Survey)

Figure 2 shows that the most widely used mobile BI offerings are from Microsoft and SAP regardless of respondent’s company size—from small companies with 1 to 50 employees to large enterprises with more than 10,000 employees. These results may reflect the intense efforts both these vendors have undertaken to evolve their enterprise BI solutions with new mobile technologies and capabilities to enable customers to use mobile more seamlessly.

If we look at the type of industry the respondents’ companies belong to, we can see the top 10 industries in figure 3. The computer, IT, and software industry takes the lead in usage of mobile BI solutions, a field that of course typically spurs trends in technology adoption. Business services and consulting and manufacturing are in second and third place, respectively, followed by finance and banking in fourth place. All these industries are, in my opinion, justified in their need for mobile services, as are many of their lines of business. I have to admit that I was surprised to find hotels and restaurants in the top 10 industries using mobile BI offerings, not because there is no use case for mobile BI in that industry, but because other industries, such as utilities and services, appear, according to previous research, to be more amenable to the adoption of mobile BI solutions.

Figure 3. Top 10 Industries Using Mobile BI Apps (TEC’s 2014 Mobile BI Survey)

If we dig a little deeper and look at which mobile BI apps are used by the top 10 industries, we see that Microsoft still leads the pack, with a dominant presence in the hotels and restaurants industry and in finance and banking. SAP, for its part, shows a strong presence in hotels and restaurants, as well as in the finance and banking and manufacturing industries.

It is worth mentioning that QlikView, among other vendors, has a strong presence in electronics and high-tech. Organizations in these areas typically have great technical expertise, attesting to QlikView’s technical and functional capabilities.

Figure 4. Mobile BI Apps Used by the Top 10 Industries (TEC’s 2014 Mobile BI Survey)

Additionally, Oracle shows mobile BI presence nearly across the board, from the software industry to electronics and banking. Furthermore, the three powerhouses Oracle, SAP, and Microsoft dominate the mobile BI usage in the insurance and finance and banking industries.


Based on the results of our survey on mobile BI usage, we can see that the four main players—Microsoft, SAP, IBM and Oracle—are well positioned in the mobile BI market, pretty much inheriting success from their high-profile BI solutions.

Other vendors such as Tableau, QlikView, MicroStrategy, and Information Builders are rapidly establishing themselves as major BI providers and making their presence known on the mobile BI stage.

Though the information presented in this post can be considered neither conclusive nor extensive, it can serve as a good starting point or basic point of reference for gauging which mobile BI solutions your peers in companies of a similar size and in the same industry are using. This information may be useful before you embark upon the venture of acquiring a new mobile BI solution or replacing the one you already have.

Stay tuned for a second post on the survey, where I will present the most requested mobile BI functionality and the users’ level of satisfaction with their mobile BI offerings.

Link to original article

Machine Learning and Cognitive Systems, Part 1: A Primer

IBM’s recent announcements of three new services based on Watson technology make it clear that there is pressure in the enterprise software space to incorporate new technologies, both in hardware and software, in order to keep pace with modern business. It seems we are approaching another turning point in technology where many concepts that were previously limited to academic research or very narrow industry niches are now being considered for mainstream enterprise software applications.

Image by Penn State

Machine learning, along with many other disciplines within the field of artificial intelligence and cognitive systems, is gaining popularity, and it may in the not so distant future have a colossal impact on the software industry. This first part of my series on machine learning explores some basic concepts of the discipline and its potential for transforming the business intelligence and analytics space.

So, what is machine learning anyway?

In simple terms, machine learning is a branch of the larger discipline of artificial intelligence that involves the design and construction of computer applications or systems that are able to learn based on their data inputs and/or outputs. Basically, a machine learning system learns by experience; that is, based on specific training, the system will be able to make generalizations from its exposure to a number of cases, and will then be able to act on new or unforeseen events.

The discipline of machine learning also incorporates other data analysis disciplines, ranging from predictive analytics and data mining to pattern recognition. A variety of specific algorithms are used for this purpose, frequently organized in taxonomies; which algorithm to apply depends on the type of input required (a list of algorithms organized by type can be found on Wikipedia).

As a discipline, machine learning is not new. Initial documents and references can be traced back to the early fifties with the work of Alan Turing and Arthur Samuel, with later contributions from Tom M. Mitchell. And the field has undergone extensive development since that time.

One of the more important applications of machine learning is automating the acquisition of the knowledge bases used by so-called expert systems, which aim to emulate the decision-making process of human experts in a field. But the scope of its application has been growing. In Applications of Machine Learning and Rule Induction, Langley and Simon review some major paradigms for machine learning scenarios, all based on a very important premise:

Learning serves to improve performance on some task, and the general approach involves finding and exploiting regularities in training data.

The major approaches include neural networks, case-based learning, genetic algorithms, rule induction, and analytic learning. While in the past they were applied independently, in recent times these paradigms or models have been used in a hybrid fashion, blurring the boundaries between them and enabling the development of more effective models. The combination of analytic methods can ensure effective, repeatable, and reliable results, a required component for practical use in mainstream business and industry solutions.

According to A Few Useful Things to Know about Machine Learning, while the discipline by itself is far from simple, it is based on a simple (but not simplistic) principle:

learning = representation + evaluation + optimization, where:
  • representation means the use of a classifier element represented in a formal language that a computer can handle and interpret; 
  • evaluation consists of a function needed to distinguish or evaluate the good and bad classifiers; and
  • optimization represents the method used to search among these classifiers within the language to find the highest scoring ones.

As the paper states:

The fundamental goal of machine learning is to generalize beyond the examples in the training set.

This way the system can infer new decisions or correct answers that then serve to increase learning and optimize accuracy and performance.

Also, each component of the machine learning process comprises a good mix of mathematical techniques, algorithms, and methodologies that can be applied (Figure 1).

Figure 1. The Three components of learning algorithms. Source: A Few Useful Things to Know about Machine Learning
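These three components can be made concrete with a toy example (my own illustration, not taken from the paper): the representation is the family of one-dimensional threshold rules, evaluation is training accuracy, and optimization is an exhaustive search over candidate thresholds.

```python
def evaluate(threshold, points, labels):
    """Evaluation: accuracy of the rule 'predict 1 when x >= threshold'."""
    preds = [1 if x >= threshold else 0 for x in points]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def optimize(points, labels):
    """Optimization: search candidate thresholds for the highest-scoring rule."""
    candidates = sorted(set(points))
    return max(candidates, key=lambda t: evaluate(t, points, labels))

# Representation: the hypothesis space is the family of threshold rules.
xs = [1, 2, 3, 6, 7, 8]
ys = [0, 0, 0, 1, 1, 1]
best = optimize(xs, ys)
print(best, evaluate(best, xs, ys))  # threshold 6 classifies the training set perfectly
```

Real learners use far richer representations (trees, networks) and smarter search, but they still decompose into these same three pieces.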

In this context, machine learning can be done by applying specific learning strategies, such as:

  • a supervised strategy, to map the data inputs and model them against desired outputs, and
  • an unsupervised strategy, to map the inputs and model them to find new trends.

Derivative approaches that combine these, such as semi-supervised learning, can also be used. This opens the door to a multitude of applications for which machine learning can be used, in many areas, to describe, prescribe, and discover what is going on within large volumes of diverse data.
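The two strategies can be contrasted with a small sketch (illustrative only): a nearest-neighbor rule uses labeled examples (supervised), while a two-center clustering discovers groups with no labels at all (unsupervised).

```python
def nearest_neighbor(train, labels, query):
    """Supervised: predict the label of the closest training example."""
    i = min(range(len(train)), key=lambda j: abs(train[j] - query))
    return labels[i]

def two_means(points, iterations=10):
    """Unsupervised: split 1-D points into two groups around moving centers.

    Toy version: assumes the data really does form two separated groups.
    """
    c1, c2 = min(points), max(points)
    for _ in range(iterations):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
print(nearest_neighbor(data, ["low", "low", "low", "high", "high", "high"], 1.1))  # low
print(two_means(data))  # ([0.8, 1.0, 1.2], [8.7, 9.0, 9.5])
```

Note that the supervised rule needed the "low"/"high" labels up front, while the clustering recovered the same grouping from the raw values alone.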

The increasing presence of machine learning in business, especially for analytics

Thanks to the success of machine learning applications in disciplines such as speech recognition, computer vision, bio-surveillance, and robot control, interest in and adoption of machine learning technologies has grown, particularly over the last decade. Interesting also is how, in many fields, machine learning is escaping the confines of science labs to reach commercial and business applications.

There are several scenarios where machine learning can have a key role: in those systems that are so complex that algorithms are very hard to design, or when an application requires the software to adapt to an operational environment, or with complex systems that need to work with extensive and complex data sets. In this way, machine learning methods play an increasing role not just in general in the field of computer science, but also in terms of enterprise software applications, especially for those types of applications that need in-depth data analysis and adaptability. These areas include analytics, business intelligence, and Big Data.

Why business intelligence and Big Data?

In 1958, H. P. Luhn wrote what is perhaps the first document on business intelligence. The abstract begins:

An automatic system is being developed to disseminate information to the various sections of any industrial, scientific or government organization. This intelligence system will utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the “action points” in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points. This paper shows the flexibility of such a system in identifying known information, in finding who needs to know it and in disseminating it efficiently either in abstract form or as a complete document.

The premise of BI systems has remained pretty much the same: to collect an organization’s data from disparate sources and process it in the best possible way to produce useful information to—and perhaps this is the most important part—help decision makers to make the best informed decision for the benefit of an organization. A simple definition, not a simple task.

In this regard, business intelligence has been adapting and evolving, with greater or lesser degrees of success, to provide information workers with the ability to make these decisions, and has played a very important role in the decision support platforms of many organizations.

This evolution has changed the role of BI systems: they now not only provide high-level decision support at the strategic level, but also inform an increasing number of areas involved with middle management and operations. It has also increased the need for BI systems and initiatives to evolve so that they are able to deal with increasingly complex data analysis problems. Applications need to be boosted so that they can deal with larger and more complex amounts of data, and can not only report current status, but also predict, play with hypothetical scenarios, and finally learn to make accurate suggestions, a green field for machine learning (Figure 2).

Figure 2. Some factors triggering the need for faster, better, and improved ways for decision support, analytics, and BI systems

A good model for understanding the evolution of BI systems is D. J. Power’s history of decision support systems, of which BI is of course an important part. According to Mr. Power, decision support systems and applications have evolved through the following stages:

  1. Model Driven. Emphasizes access to and manipulation of financial, optimization, and/or simulation models. Simple quantitative models provide the most elementary level of functionality. Model-driven DSS use limited data and parameters provided by decision makers to aid them in analyzing a situation; in general, large databases are not needed for model-driven DSS.
  2. Data Driven. In general, a data-driven DSS emphasizes access to and manipulation of a time-series of internal company data and sometimes external and real-time data. Simple file systems accessed by query and retrieval tools provide the most elementary level of functionality. Data warehouse systems that allow the manipulation of data by computerized tools tailored to a specific task and setting, or by more general tools and operators, provide additional functionality. Data-driven DSS with On-line Analytical Processing (OLAP) provide the highest level of functionality and decision support that is linked to analysis of large collections of historical data.
  3. Communications Driven. Communications-driven DSS use network and communications technologies to facilitate decision-relevant collaboration and communication. In these systems, communication technologies are the dominant architectural component. Tools used include groupware, video conferencing and computer-based bulletin boards.
  4. Document Driven. Uses computer storage and processing technologies to provide document retrieval and analysis. Large document databases may include scanned documents, hypertext documents, images, sounds and video. Examples of documents that might be accessed by a document-driven DSS are policies and procedures, product specifications, catalogs, and corporate historical documents, including minutes of meetings and correspondence. A search engine is a primary decision-aiding tool associated with a document-driven DSS. These systems have also been called text-oriented DSS.
  5. Knowledge Driven. Knowledge-driven DSS can suggest or recommend actions to managers. These DSS are person-computer systems with specialized problem-solving expertise. The "expertise" consists of knowledge about a particular domain, understanding of problems within that domain, and "skill" at solving some of these problems.

Within these descriptions there are clear elements in place to boost the adoption of technologies and methodologies such as machine learning: collaboration, intensive data management, and the increase of non-traditional (non-relational) data. The need for systems to solve complexity coincides with the advent of phenomena such as Big Data and advanced analytics in business, giving machine learning a natural entry point to help crunch big sets of complex data and to become part of the increasingly complex machinery in place for data analysis and decision making.

Along with disciplines like data mining and natural language processing, machine learning is being seen in business as a tool of choice for transforming what used to be a business intelligence application approach into a wider enterprise intelligence or analytics platform or ecosystem, one that goes beyond the traditional scope of BI—focused on answering “what is going on with my business?”—to give all possible answers to “why are we doing what we’re doing?”, “how can we do it better?”, and even “what should we do?”.

As business models become more complex and produce massive amounts of data to be handled with less and less latency, decision support and BI systems are required to grow in complexity and in their ability to handle those volumes of data. This demand is boosting the growth of more sophisticated solutions to address specific business and industry problems; it’s not enough to spit out a straightforward result, systems need to provide business guidance.

Some scenarios where machine learning is gaining increased popularity in the context of analytics and BI can be found in applications for risk analysis, marketing analytics, and advanced analytics for Big Data sources.

Machine learning is a reality for business

As Tim Negris states in Getting ready for machine learning:

Despite what many business people might guess, machine learning is not in its infancy. It has come to be used very effectively across a wide array of applications.

And it’s being increasingly adopted within many analytics, Big Data, and business intelligence initiatives, either as a component lying side by side with other analytics solutions, or packaged within a solution that has already adopted it as part of its functional stack.

In either case, machine learning is preparing to be part of the next evolution of enterprise intelligence business offerings.

In the next part of this series on machine learning, I will address some specifics of the use of machine learning as part of Big Data and advanced analytics, as well as its role in the formation of the new so-called area of cognitive systems. In the meantime, please share your comments below and let me know your thoughts.

Hello World!

There is a first time for everything... at least, that’s what my father used to say, and sometimes he was right. As I have been blogging for quite some time for my employers or through other channels, I think the time has come for me to have a personal blog that allows me a bit more freedom to explore what might be closer to my personal interest, where I can let go a bit, and include a deeper (or not) and personal view on topics concerning data:

Data in its several forms, with multiple layers, and from many perspectives. From traditional databases to new databases, from small to big data, simple to complex events. Intelligent and not so intelligent data.

Hello to the Data of Things!

I want to start with the iconic Hello World! phrase because it marked one of the most important moments in my career in IT. The phenomenal book written by Brian W. Kernighan and Dennis Ritchie called “The C programming language” was my introduction to the world of C and UNIX, which led, eventually, via a software programming career, to the challenging and awesome experience of data mingling.

Brian Kernighan paying tribute to Dennis Ritchie at Bell Labs

Data has become a fundamental material for almost all human activities in our lives, and as this presumably will not change, and on the contrary will be reinforced, we need to think about data as a key driver of current and future human life. This blog will be devoted to talking about data, the technology, and the people who work with it, from its source, its processing, and its movement, to its destination. People are changing our lives by using data in unique and special ways.

So, dearest reader, this blog is devoted to the Data of Things, from data sources and targets, the technologies involved, and those who produce it, use it, and manage it, … and maybe more.

A huge chunk to bite off, I know, but a delicious one, too. :)

Of course, do not hesitate to comment, discuss, and make this blog live… You just need to use the comment space below to start the conversation.
