The BBBT Sessions: Outlier, and the Importance of Being One


It has been some time since my last write-up about my briefings with the Boulder Business Intelligence Brain Trust (BBBT); multiple business engagements and, yes, perhaps a bit of laziness can be blamed for the delay.

Now I firmly intend to resume covering this great series of analyst sessions on a more regular basis, hoping, of course, that my hectic life will not stand in my way.

So, to resume my coverage of this great series of sessions with software vendors and analysts, I have picked one that, while not that recent, was especially significant for both the BBBT group and the vendor itself: a new addition to the analytics and BI landscape called Outlier.

Members of the BBBT, myself included, had the pleasure of witnessing the official launch of this new analytics and business intelligence (BI) company and its solution to the market.

Outlier presented its solution to our analyst gathering in an appealing session. Here, then, is a summary of the session and some information about this newcomer to the BI and analytics space.

About Outlier

Outlier, the company, was founded by seasoned tech entrepreneur Sean Byrnes (CEO) and experienced data scientist Mike Kim (CTO) in 2015 in Oakland, CA, with funding from First Round Capital, Homebrew, and Susa Ventures.

After devoting more than a year to developing the new solution, Outlier kept it in beta through most of 2016 before finally releasing it in February 2017, aiming to offer users a unique approach to BI and analytics.

With its product named after the company, Outlier aims to be, well, precisely that, by offering a different approach to analytics, one that:

“Monitors your business data and notifies you when unexpected changes occur.”

This means that rather than taking a reactive approach, in which the system waits for the business user to launch the analytics process, the system takes a proactive approach, signaling or alerting when these changes occur and triggering action from analysts.

Now, to be honest, this is not the first time I have heard this claim from a vendor. Frankly, as many modern BI solutions incorporate increasingly sophisticated alerting mechanisms and functionality, I'm less concerned with hearing the claim and more with discovering how each software provider addresses the challenge of making analytics and BI solutions truly proactive.

During the session, Sean Byrnes and Doug Mitarotonda, CEO and Head of Customer Development respectively, gave us a great overview of Outlier's new approach to BI and analytics.

Outlier AI and a New Analytics Value Chain

Being data scientists themselves, Outlier's team understands the hardships, complexities, and pains data scientists and business analysts endure to design, prepare, and deploy BI and analytics solutions. Outlier was born from this understanding, aiming to provide a fresh approach to business intelligence.

The approach developed by Outlier ―as opposed to creating dashboards or running queries against business data― is to consistently and automatically watch business data and alert users when unexpected changes occur.

To do this, Outlier connects directly to a number of business data sources, such as Google Analytics, Adobe Cloud, Salesforce, Stripe, SQL databases, and many others, and then automatically monitors the data and alerts users of unexpected behavior.

Along with the ability to proactively monitor business data and alert users of changes, Outlier can sift through metrics and dimensions, aiming to understand and identify business cycles, trends, and patterns. This automates the business analysis process and, consequently, positions Outlier within a new generation of BI solutions (Figure 1).

Figure 1. Outlier positioning itself as new-generation BI (Courtesy of Outlier)

During the BBBT session with Outlier, one key point Sean Byrnes brought up was that the company's leadership understands the analytics and business intelligence (BI) market is changing, and yet many companies are still struggling, not with the availability of data but with the questions themselves, as the analytics formulation process becomes increasingly complex.

According to the company, as part of a process aimed at automating monitoring and analysis, once deployed, Outlier can provide daily headlines about key business dimensions. This lets users ask critical questions knowing there will be a regular answer, while still being able to formulate new questions to keep discovering what is important (Figure 2).

Figure 2. Outlier's daily headlines (Courtesy of Outlier)

Interestingly, I find this process useful, especially to:
  • Carry on with common data analysis and reporting tasks and, above all, truly automate the analytics process so it can detect when a significant change occurs.
  • Take a proactive approach that encapsulates the complexities of data management and presents insights in a way that lets users make business decisions ―act on data.
  • Filter data down to what is important to know when making a decision.
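Outlier has not disclosed the algorithms behind its change detection, but the general idea of automatically flagging unexpected changes in a business metric can be sketched with a simple statistical rule. The function and the sample data below are hypothetical illustrations of the technique, not Outlier's actual method:

```python
from statistics import mean, stdev

def detect_unexpected_changes(series, window=7, threshold=3.0):
    """Flag points deviating from the trailing window by more than
    `threshold` standard deviations (a classic z-score rule)."""
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts

# A stable daily metric with one sudden spike on the last day:
daily_signups = [100, 102, 98, 101, 99, 103, 100, 97, 101, 250]
print(detect_unexpected_changes(daily_signups))  # → [9]
```

A production system would layer seasonality handling and trend models on top of such a baseline, but the principle is the same: the system, not the analyst, decides when something is worth looking at.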

Outlier: It is not Just About the Common, but the Uncommon

Today many organizations know how much they sold last month or how much they spent last quarter. Those are relevant yet common questions that can be answered with relative ease. Today, however, it is also about discovering not just answers but new questions that can unveil key insights, opportunities, and risks.

Outlier identified this as a key need and acted on it, knowing that building the infrastructure to achieve it can be far from trivial, as it often forces organizations to radically modify existing traditional BI platforms to accommodate new or additional analytics capabilities ―predictive, mining, etc.― that may or may not fit with an organization's existing BI solutions.

Outlier aims to automate this process by connecting directly to the various sources a business analyst takes data from and guiding the analyst through automation of the monitoring process.

One key aspect of Outlier worth mentioning is how the company strives to augment, rather than replace, the capabilities of existing analytics and data management solutions, fitting within a specific point of what the company calls the analytics value chain (Figure 3).

Figure 3. Outlier’s Analytics Value Chain Proposition (Courtesy of Outlier)

During the demo session, other relevant aspects of Outlier stood out, including new and useful functional elements such as headlines and dashboards or scorecards that nicely combine graphical and textual information (Figure 4), and a large set of connectors for different data sources, including traditional databases and social media sources.

Also worth mentioning is the effort Outlier is making to educate potential users in the field of BI and analytics, and of course on the potential use of Outlier in different industries and lines of business, through a section on its portal with helpful information ranging from how to analyze customer acquisition cost to performing customer segmentation.

Figure 4. Outlier’s Screencap (Courtesy of Outlier)

Outlier and a New Generation of BI and Analytics Solutions

As part of a new wave of solutions providing analytics and BI services, Outlier is constantly working on introducing new technologies and techniques into the common portfolio of data analysis tasks, and it seems to have plenty of appealing functions and features to modernize the current state of analytics.

Of course, Outlier will face significant competition from incumbents already in the market, such as Yellowfin, Board, AtScale, and Pyramid Analytics. But if you are searching for, or just curious about, new analytics and BI offerings, it might be a good idea to check out this new solution, especially if your organization requires an innovative and agile approach to analytics with full monitoring and alerting capabilities.

Finally, you can start by checking, aside from its website, some additional information right from the BBBT, including a nice podcast and the session's video trailer.
Book Commentary: Predictive Analytics by Eric Siegel


As much as we'd like to imagine that the deployment and use of predictive analytics has become a commodity for every organization and is in use in every “modern” business, the reality is that many small, medium, and even large organizations are still not using predictive analytics and data mining solutions as part of their core business software stack.

Reasons can be plenty: insufficient time, budget, or human resources, as well as a dose of inexperience and ignorance of its real potential benefits. These and other reasons came to mind when I had the opportunity to read Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, written by former Columbia University professor and founder of the Predictive Analytics World conference series, Eric Siegel.

Aside from being a clear, well-written book filled with examples and bits of humor to make it enjoyable, what makes this book stand out, in my view, is that it is written for a general audience, in plain English, which makes it a great option for those new to the field to fully understand what predictive analytics is and its potential effects and benefits for any organization.

With plenty of industry examples and use cases, Mr. Siegel neatly introduces the reader to the world of predictive analytics: what it is, and how this discipline and its tools are currently helping an increasing number of organizations ―Facebook, HP, Google, Pfizer, and other big players in their fields― discover hidden trends, predict outcomes, and make better decisions with data.

Another great aspect of the book is its clear, easy explanation of important current topics, including data mining and machine learning, as stepping stones to more advanced topics such as artificial intelligence and deep learning. It also does a good job of pointing out some caveats and the dangers of making wrong assumptions when using predictive analytics.

I especially enjoyed the central section of the book, filled with examples and use cases of predictive analytics in different industries and lines of business ―healthcare, finance, and law enforcement, among others― as well as the list of resources at the end of the book.

Of course, for me, having been a practitioner for many years, there was a small sense of wanting a bit more technical and theoretical detail. Still, the book is a great introductory reference both for novices who need to grasp the full potential of predictive analytics and for those familiar with the topic who want to know what their peers are doing, expanding their view of how predictive analytics can be applied in their organizations.

If you are still struggling to understand what predictive analytics is, what benefits it can offer your organization, and what it can do to improve your decision-making and planning abilities, or if you want a fresh view of the new use cases for this discipline and its software solutions, Predictive Analytics by Eric Siegel is certainly a reference you should consider having on your physical or virtual bookshelf.

Have you read the book? About to do it? Don’t be shy, share your comments right below…

BOARD International: Cognitive, Mobile, and Collaborative


Business Intelligence (BI) and Enterprise Performance Management (EPM) software provider BOARD International recently released version 10.1 of its all-in-one BI and EPM solution. This release includes new user experience, collaboration, and cognitive capabilities, which will enable BOARD to enter the cognitive computing field.

By incorporating all these new capabilities into its single BI/EPM offering, BOARD continues to uphold its philosophy of offering powerful capabilities within a single platform.

With version 10.1, BOARD aims to improve the way users interact with data significantly. The new version’s interface introduces new user interaction functionality in areas such as user experience and storytelling and is a major improvement on that of the previous version.

BOARD gave me an exclusive overview of the main features of version 10.1 and the company's product roadmap. Read on for details.

Getting On Board with Cognitive Technologies

With version 10.1, BOARD seems to be making its solution fit for a new era centered on machine learning. The solution uses natural language recognition (NLR) and natural language generation (NLG) capabilities to offer users new ways to interact with data (see Figure 1).

Figure 1. BOARD’s assistant (image courtesy of Board International)

For instance, users can now create an entire report in a drag-and-drop interface. They can also directly ‘talk’ to the system through spoken and written language: the system takes search-like strings and automatically translates human speech into words, words into queries, and queries into reports that include the most important insights from the source information.

One key aspect of these features is that users can create a report by simply writing a search string or request. Specifically, BOARD uses a fuzzy search mechanism that matches character sequences that are not only identical but also merely similar to the query term, transforming the request into a machine-generated report (Figure 2).
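BOARD has not published the internals of its fuzzy search, but matching character sequences that are similar rather than identical is classically built on edit distance. The sketch below is a generic, hypothetical illustration of that technique ―not BOARD's implementation― with made-up report names:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def fuzzy_match(query, candidates, max_distance=2):
    """Return candidates within `max_distance` edits of the query."""
    return [c for c in candidates if edit_distance(query, c) <= max_distance]

reports = ["revenue by region", "revenue by month", "churn by cohort"]
print(fuzzy_match("revenu by region", reports))  # → ['revenue by region']
```

A query with a typo ("revenu") still finds the intended report because it is only one edit away from "revenue by region"; exact matching would have returned nothing.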

Figure 2. BOARD’s machine-generated report analysis (image courtesy of Board International)

BOARD can also identify, recover, and list reports that match the search criteria, such as reports generated by other users. This capability speeds up the solution development process by enabling users to identify existing work that can be used for a new purpose.

In-context Collaboration

BOARD has also improved its collaboration strategy, specifically by facilitating communication between users. The vendor has introduced an in-context collaboration feature that enables users to share their analyses, communicate via live chat, and lets multiple users edit and author reports in a single interface. Embedded security (Figure 3) ensures users have the right level of access and defines user groups. This enables users to share analytics securely and seems to improve the overall analysis of data and the development of analytics apps.

Figure 3. BOARD’s embedded collaboration features (Courtesy of Board International)

User Experience and Storytelling

BOARD is also continuing to focus heavily on customer experience and functional efficiency.

The latest version of BOARD’s BI and EPM platform has redesigned user interfaces, including a color-coded tile menu with icons to improve hierarchy management and touchscreen usability. In addition, the configuration panel now offers more time and analytics functions.

10.1 also introduces Presentations—a new storytelling capability that enables users to personalize their reports and save them as a live presentation. This enables users to share presentations that incorporate live information rather than static images and graphs with other users and groups, improving user collaboration.

This new feature lets BOARD stay up to date with current trends in BI and compete with other players in the field that already offer similar capabilities, such as Tableau and Yellowfin.

Mobility, Cognitive Capabilities, and Collaboration: BOARD’s Bet for the Future

BOARD also explained that it's paving the way for medium- and long-term product advancements.

In its latest release, BOARD has ensured its HTML 5-based client will replicate all the functionality of its existing Windows client interface in the future. This will enable users to choose between mobile and desktop devices.

10.1 also introduces new mobile apps and add-ons, which widen BOARD's intrinsic analytics and data management capabilities as well as the solution's mobile functions and features. The company is also currently reinforcing the product's interaction with the Microsoft Office software stack in a continuous effort to help users increase productivity. This will help users conduct BI and EPM analysis more easily, as they will have access to embedded analytics services within standard Office applications such as Word and Excel.

Lastly, 10.1 also includes more features for accessing big data sources and cloud-based technologies, and BOARD has partnered with a cloud CRM and business software leader. It's also worth noting that BOARD is now expanding its North American presence. Specifically, the vendor is increasing its human and material resources to reinforce its marketing and sales efforts as well as its support and services capabilities.

BOARD 10.1 offers a good balance of analytics and enterprise performance management capabilities. It could be a solution for those looking to start using analytics or enhance their existing analytics capabilities.

(Originally published on TEC's Blog)
2017 Teradata Influencer Summit: Blending In on the New Management Era


A couple of weeks ago I was fortunate to be invited to attend the 2017 Influencer Summit at the beautiful venue chosen by Teradata in La Jolla, California. Aside from the venue, a great event took place, one that was insightful, interesting and, well, fun: a confirmation of Teradata's evolution on both the technical and business sides, and a confirmation that the IT and software industry has changed radically in the last couple of years.

Since last year's Partners Conference and influencer events, Teradata has kept moving forward with its evolution, adapting to the new business and technical dynamics of the market. This year's event allowed analysts, pundits, and influencers alike to glimpse what Teradata is doing to deliver value to existing and new customers.

More Analytics, More Integration, More Scale...

In a continuous effort, Teradata is making sure its offerings are available in all shapes and forms, more precisely, in all major cloud and on-premises flavors, as part of its Teradata Everywhere strategy. This includes launching Teradata in the Azure Marketplace and increasing geographic coverage for its own Managed Cloud. At the same pace, the company is also working to rapidly adjust to business and industry changes to continuously improve solution delivery and services.

Right from the get-go, John Dining, Teradata's Executive Vice President and Chief Business Officer, gave us a clear overview of how the enterprise analytics and data management software provider is working on different strategic paths to ensure the company remains at the top of its market segment.

John Dining presenting at Teradata's 2017 Influencer Summit event

One key and noteworthy aspect of this overall strategy is Teradata's bold approach and continuing effort to match its product development with a coherent business proposal via three areas:

  • Reinforcing its multi-genre analytics strategy, which means widening its offering of analytics capabilities to strengthen users' abilities in areas such as text, path, and graph analysis, among others.
  • Bolstering Teradata's power to perform more versatile and flexible data movement and integration operations, supporting an increasing number of sources and complex data operations. This includes increasing Teradata's ability to incorporate intelligence and automation into data management operations, as well as developing vertical solutions for specific industries such as communications and finance, or lines of business like marketing and DevOps.
  • Increasing Teradata's ability to scale according to customers' needs, especially for those with big data management demands.

One important takeaway, in my view, is Teradata's clear path from a technical perspective, focusing on real technical challenges faced by the majority of organizations while, at the same time, shifting its message to be less technical and more business oriented, providing clarity especially for the enterprise market, a market Teradata knows perfectly well.

Blended Architectures are in the Future Oh! and Yes, they Need Service

In a time where organizations seem to be increasingly reluctant to invest in consulting services and keen to look for vanilla deployment solutions, Teradata seems to be taking a more realistic approach.

On one hand, it is putting in place specific measures to reinforce its services business; on the other, it clearly acknowledges that blended architectures and hybrid deployments will be the norm in the coming years, or at least for the time being, which means high-quality consulting and services can be key to ensuring success, especially in complex analytics deployment scenarios.

Aside from its incumbent software solutions, by aiming to restructure its service and consulting areas, Teradata seeks a better position to act on these complex deployments that require specialized services.

According to Teradata, the company has been working to consolidate its services areas via important acquisitions such as Think Big, Claraview, and Big Data Partnership, and to integrate them into a coherent service model: its Teradata Global Services initiative.

The initiative is built around three main areas:

  • Think Big Analytics, the global analytics consultancy group, with expertise in areas such as data science, solution development, and data visualization for different industries and functions.
  • Enterprise Data Consulting, the technology-enabled group with strong expertise in analytical ecosystems, providing services ranging from architecture to data management and governance, managed services, and security.
  • Customer Services, the group responsible for providing value and availability of analytic platforms via change management services, with expertise in systems and software management.

The strategy seems well complemented by a complete business value framework that, aside from a comprehensive analytics strategy for customers and education, includes Teradata's Rapid Consulting Engagement (RACE) offering, aimed at helping customers leverage comprehensive solutions in a matter of weeks and providing “agile” development models for its customers.

Teradata's approach seems to make perfect sense, enabling the company to grow efficiently on the technology side, especially toward a hybrid cloud approach, while ensuring it offers high-quality consulting services.

Now, can this approach carry challenges for the company?

It is possible. Perhaps one challenge for Teradata will be ensuring successful delivery, especially in areas where being “agile” is a must, notably big data and data science projects, which more often than not require fast deployment times. Teradata will need to make sure its consulting, educational, and other service offerings are fine-tuned, and in tune with the evolution of its own software and hardware offerings.

To this end, the company is working to consolidate its technical and business messaging around its strategy: offering hybrid cloud solutions, business analytics solutions, and full-fledged ecosystem architecture consulting.

Part of this strategy includes, aside from reinforcing its go-to-cloud push, accelerating its existing release calendar to offer three major releases a year for its flagship product, Teradata Database; reinforcing its IntelliFlex data warehouse appliance with new functionality; launching Teradata IntelliBase, Teradata's compact environment for data warehousing; and continuing the evolution of IntelliCloud, the company's secure managed cloud offering.

So, on the Big Picture...

Many more things happened and were revealed by Teradata, both publicly and under disclosure, but from a personal view, what sticks with me as the relevant story is how Teradata is managing its transformation at a pace and in a form that maintains a fine balance between its more “traditional” data management customers and its new customers, ensuring offerings both in the “typical” data warehousing and analytics space and for those requiring innovation via new advanced analytics and big data ecosystems.

Challenges may still lie ahead for Teradata due to increased and fiercer competition, but the data warehousing company seems to be adapting well to the new data management era.

DomoPalooza 2017: Flare, Stravaganza…and Effective Business Management


Logo courtesy of DOMO, Inc.
When you decide to show up at Domopalooza, Domo’s big user event, you don’t know for sure what you will find, but from the very beginning you can feel that you’ll have a unique experience. From the individual sessions and training, the partner summit and the concert line-up, to what might come from Domo’s CEO/rock-star Josh James, who certainly is one of a kind in the software industry; you know that you’ll witness a delightful event.

This year, under the strings of Styx, Mr. James kicked off an event that amalgamated business, entertainment, fun, and work in a unique way —a very Domo way.

With no more preambles, here is a summary of what happened during Domo’s 2017 DomoPalooza user conference.

Josh James at DomoPalooza 2017 (Photo courtesy of DOMO)
Key Announcements

Before entering the subjective domain of my opinion about Domo's event and solutions, let's take a minute to pinpoint some of the important announcements made before and during the event:
  • The first news came some days before the user event, when Domo announced its new model for rapid-deployment dashboards. This solution consists of a series of tools that accelerate and ease the dashboard deployment process, from a large number of connectors to diverse data sources to a set of pre-installed, easy-to-configure dashboards. The model will enable developers to deploy dashboards quickly and easily so that decision makers can use them effectively.
  • The next important announcement occurred during the conference, when Domo came out with the release of Mr. Roboto —DOMO's new set of capabilities for machine learning, predictive analytics, and predictive intelligence. According to DOMO, the new offering will be fully integrated within DOMO's business cloud, aiming for fast and non-disruptive business adoption. Two major features of Mr. Roboto are the Alerts Center, a personalized visual console powered by advanced analytics functionality to provide insights and improve decision making, and a data science interface that enables users to apply predictive analytics, machine learning, and other advanced analytics algorithms to their data sets. This is for sure one product I'm looking forward to analyzing further!

The introduction of new features, especially those aimed at narrowing the technical-business gap within an organization's C-suite and at giving decision makers easier, more customized access to insights, will enable business management and monitoring using DOMO. These features include the introduction of:
  • Annotations, so information workers and decision makers can highlight significant insights directly on top of a chart or data point.
  • Enhancements to its Analyzer tool, incorporating a visual data lineage tool that enables users to track data from source to visualization.
  • Data slicing within DOMO's cards, to create more guided analysis paths that business users and decision makers can take advantage of.
  • More than 60 chart families to enhance the rich set of visual options already within DOMO’s platform. 

DOMO's new features seem to fit well within a renewed effort by the company to address bigger enterprise markets and increase its presence in segments traditionally occupied by other enterprise BI contenders.

It may also signal DOMO's necessary adaptation to a market currently racing to include advanced analytics features that address larger and newer user footprints within organizations, such as data scientists and a new, more tech-savvy generation of information workers.

There is much more behind Domo’s Curtains

Perhaps the one thing I did enjoy the most about the conference was having a continuous sense of discovery —different from previous interactions with DOMO, which somehow left me with a sense of incompletion. This time I had the chance to discover that there is much more about DOMO behind the curtains.

Having a luminary CEO such as Josh James can be a two-edged sword. On one side, his glowing personality has served well to enhance DOMO's presence in a difficult and competitive market. Josh has the type of personality that attracts, creates, and sells the message and, without doubt, drives the business.

On the other, if not backed and handled correctly, such a strong message can create some skepticism, making some people think a company is all about the message and less about substance. This year's conference, though, helped me discover that DOMO is much more than what can be seen on the surface.

Not surprisingly, Josh and Chris Harrington —savvy businessmen and smart guys— have been keen to develop DOMO's business intelligence and analytics capabilities for business efficiency, working to translate technical complexity into business-oriented ease of use. To achieve this, DOMO has put together, on the technical side, a very knowledgeable team led by Catherine Wong and Daren Thayne, DOMO's Chief Product Officer and Chief Technology Officer respectively, both with wide experience ranging from cloud platforms and information management to data visualization and analysis. On the business side, an experienced team includes tech veterans like Jay Heglar and Paul Weiskopf, leading strategy and corporate development, respectively.

From a team perspective, this balance between tech experience and business innovation seems to be paying off: according to the company, it has been growing steadily and gaining the favor of big customers such as TARGET, Univision, and Sephora, some of which were present during the event.

From an enterprise BI/Analytics perspective, it seems DOMO has achieved a good balance in at least two major aspects that ensure BI adoption and consumption:

  • The way BI services can be offered to different user groups —especially the C-level team— which requires a special degree of simplification but, at the same time, efficiency in the way data is shown.
  • The way BI services can encapsulate complex data processing problems and hide them from the business user. 

On this topic, during the conference we had the chance to see examples of the aforementioned aspects, both onstage and offstage. One came from Christel Bouvron, Head of Business Intelligence at Sephora Southeast Asia, who commented the following regarding the adoption and use of DOMO:

“We were able to hook in our data sets really quickly. I had sketched out some charts of what I wanted. They didn’t do that, but what they did was even better. I really liked that it wasn’t simply what I was asking for – they were trying to get at the business problem, the outcomes we were trying to get from it, and think about the bigger picture.”

A good example of the shift DOMO wants to convey is that it is now changing its approach: from addressing a business problem from a technical perspective to addressing it from a business perspective, with a technical platform in the background to support it. Of course, this needs to come with the ability to effectively encapsulate technical difficulties in a way that is efficient and consumable for the business.

Christel Bouvron at DomoPalooza 2017 (Photo courtesy of DOMO)

It was also good to hear customers acknowledge that the process wasn’t always smooth, but that it helped trigger an important cultural shift within their organizations.

The takeaway

Attending Domopalooza 2017 was informative and very cool indeed. DOMO’s team showed me a thing or two about the true business of DOMO and its interaction with real customers, including the fact that DOMO is not a monolithic solution. Besides its already rich set of features, it enables key customization so that unique customers can solve their problems in unique ways. While DOMO is a software rather than a services company, customers expressed satisfaction with the degree of customization and services DOMO provides; this was especially true of large companies.

DOMO has done a great job of simplifying the data consumption process so that data feeds are digestible. The solution concentrates more on the business problem than the technical one, giving many companies the flexibility and time to make the development of business intelligence solutions more agile and effective. Although these results might not be fully achieved in all cases, DOMO’s approach can certainly help organizations benefit from a more agile and fast deployment process, and thus become more efficient and productive.

Despite being a cloud-based software company, DOMO seems to understand quite well that a great number of companies are working, by necessity or by choice, in hybrid cloud/on-premises environments, so it enables customers to easily connect and quickly interact with on-premises systems, whether through a simple connection to a database/table source or via more sophisticated data extraction and transformation specifications.

In the BI and analytics market, no company, DOMO or any other player, has a free ticket to success. The business intelligence market is diversifying as an increasing number of companies seem to need its services, but DOMO’s offering is, by all means, one to consider when evaluating a new-generation BI solution to meet the increasing demand for insights and data analysis.

Finally, well... what better excuse to watch Styx's Mr. Roboto than this?

(All photos credited to Domo, Inc.)
A D3 Image is Worth a Thousand Words: Interview with Morgane Ciot


Many things have been said and done in the realm of analytics, but visualizations remain at the forefront of the data analysis process, where intuition and correct interpretation can help us make sense of data.

As an increasing number of tools emerge, current visualizations are far more than mere pictures on a screen, allowing for movement, exploration and interaction.

One of these tools is D3, an open-source JavaScript data visualization library. D3 is perhaps the most popular tool for developing rich and interactive data visualizations, used by small and large companies alike, such as Google and the New York Times.

With the next Open Data Science Conference in Boston coming soon, we had the opportunity to talk with DataRobot’s Morgane Ciot, an ODSC speaker, about her workshop session “Intro to D3”, the state of data visualization, and her very own perspectives on the analytics market.

Morgane Ciot is a data visualization engineer at DataRobot, where she specializes in creating interactive and intuitive D3 visualizations for data analysis and machine learning. Morgane studied computer science and linguistics at McGill University in Montreal. Previously, she worked in the Network Dynamics Lab at McGill, answering questions about social media behavior using predictive models and statistical topic models.

Morgane enjoys studying machine learning (ML), reading, writing, and staging unusual events.

Let's get to know more about Morgane and her views as a data visualization engineer.

Morgane, could you tell us a bit more about yourself, especially about your area of expertise, and what was your motivation to pursue a career in analytics and data science?

I went to school for computer science and linguistics. Those two fields naturally converge in Natural Language Processing (NLP)/Artificial Intelligence (AI), an intersection that was unfortunately not exploited by my program but that nonetheless got me interested in machine learning.

One of the computer science professors at my school was doing what essentially amounted to sociological research on social media behavior using machine learning techniques. Working with him furthered my interest in ML, NLP, and topic modeling, and I began to also explore how to visualize some of the unmanageable amounts of data we had (like, all of Reddit).

I’m probably indebted to that part of my life, and my professor, for my current position as a data viz engineer. Also, machine learning's practical ramifications are going to be game changing. I want to live closest to the eye of the storm when the singularity hits.

Based on your experience, which attributes or skills should every data master have if he/she wants to succeed, and what would be your recommendations for those looking for an opportunity at this career?

Stats, problem-solving skills, and engineering or scripting abilities all converge in the modern data scientist.

You have to be able to understand how to formulate a data science problem, how to approach it, and how to build the ad hoc tools you’ll need to solve it. At least some basic statistical knowledge is crucial. Elements of Statistical Learning by Hastie and Andrew Ng’s Coursera course both provide a solid foundational understanding of machine learning and require some statistical background.

Learn at least one programming language — Python or R are the most popular. R is the de facto language for statisticians, and Python has a thriving community and a ton of data science libraries like scikit-learn and pandas. It’s also great for writing scripts to scrape web data. If you’re feeling more adventurous, maybe look into Julia.

As usual, don’t just learn the theory. Find a tangible project to work on. Kaggle hosts competitions you can enter and has a community of experts you can learn from.

Finally, start learning about deep learning. Many of the most interesting papers in the last few years have come out of that area and we’re only just beginning to see how the theory that has been around for decades is going to be put into practice.

Talking about data visualization, what is your view of the role it plays within data science? How important is it in the overall data science process?

Data visualization is pretty fundamental to every stage of the data science process. I think how it’s used in data exploration — viewing feature distributions — is fairly obvious and well-practiced, but people often overlook how important visualizations can be even in the modeling process.

Visualizations should accompany not just how we examine our data, but also how we examine our models! There are various metrics that we can use to assess model performance, but what’s really going to convince an end user is a visualization, not a number. That's what's going to instill trust in model decisions.

Standard introductions to machine learning lionize the ROC curve, but there are plenty of other charts out there that can help us understand what and how a model is doing: plotting predicted vs. actuals, lift charts, feature importance, partial dependence, etc. — this was actually the subject of my ODSC talk last year, which should be accessible on their website.

A visualization that rank-orders the features that were most important to the predictive capacity of a model doesn’t just give you insight, it also helps you model better. You can use those top features to build faster and more accurate models. 
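To make the rank-ordering idea concrete, here is a minimal, library-free Python sketch that ranks features by the absolute Pearson correlation of each feature with the target. This is only a crude proxy for the importance measures a product like DataRobot computes; every feature name and number below is invented toy data.

```python
# Crude feature-importance proxy: |Pearson correlation| of feature vs. target.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy churn dataset: two informative features, one deliberately uninformative
features = {
    "tenure":        [1, 34, 2, 45, 8, 22, 3, 40],
    "monthly_spend": [70, 56, 95, 42, 99, 53, 89, 45],
    "zip_digit":     [2, 9, 4, 3, 8, 1, 7, 6],
}
churned = [1, 0, 1, 0, 1, 0, 1, 0]

importance = {name: abs(pearson(vals, churned)) for name, vals in features.items()}

# Rank-ordered view: the top features are candidates for a faster, simpler model
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:13s} {score:.2f}")
```

A real importance chart would be computed model-side (e.g. by permutation), but even this crude ranking shows how a chart-ready ordering falls out of the data.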

What do you think will be the most important data visualization trend in the next couple of years?

Data is becoming ever more important basically everywhere, but popular and even expert understanding hasn’t quite kept up.

Data is slowly consuming us, pressing down from all angles like that Star Wars scene where Luke Skywalker and Princess Leia get crushed by trash. But are people able to actually interpret that data, or are they going to wordlessly nod along to the magical incantations of “data” and “algorithms”?

As decisions and stories become increasingly data-driven, visualizations in the media are going to become more important. Visualizations are sort of inherently democratic.

Everyone who can see can understand a trend; math is an alien language designed to make us feel dumb. I think that in journalism, interactive storytelling — displaying data with a visual and narrative focus — is going to become even more ubiquitous and important than it already is. These visualizations will become even more interactive and possibly even gamified.

The New York Times did a really cool story where you had to draw a line to guess the trend for various statistics, like the employment rate, during the Obama years, before showing you the actual trend. This kind of quasi-gamified interactivity is intuitively more helpful than viewing an array of numbers.

Expert understanding will benefit from visualizations in the same way. Models are being deployed in high-stakes industries, like healthcare and insurance, that need to know precisely why they’re making a decision. They’ll need to either use simplified models that are inherently more intelligible, at the expense of accuracy, or have powerful tools, including visualizations, to persuade their stakeholders that model decisions can be interpreted.

The EU is working on legislation called “right of explanation” laws, which allow any AI-made decision to be challenged by a human. So visualizations focused on model interpretability will become more important. 

A few other things: as more and more businesses integrate machine learning systems, visualizations and dashboards that monitor large-scale ML systems and tell users when models need to be updated will become more prevalent. And of course, we’re generating staggering amounts of new data every day, so visualizations that can accurately summarize that data while also allowing us to explore it efficiently — maybe through unsupervised learning techniques like clustering and topic modeling — will be necessary. 
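As a toy illustration of the clustering idea mentioned above, here is a minimal 1-D k-means sketch that collapses many points into a few representative centers, the kind of unsupervised summarization a visualization could then display. The data, starting centers, and iteration count are all invented for illustration.

```python
# Toy 1-D k-means: summarize many points with a few representative centers.
def kmeans_1d(points, centers, iterations=20):
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        centers = [sum(v) / len(v) if v else c for c, v in clusters.items()]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.7, 10.1, 10.4, 5.0, 5.2]
print(kmeans_1d(data, centers=[0.0, 5.0, 10.0]))
```

Eight points collapse to three centers near 1, 5, and 10; a summary chart would plot the centers (and cluster sizes) instead of every raw point.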

Please tell us a bit about DataRobot, the company you work at.

We’re a machine learning startup that offers a platform data scientists of all stripes can use to build predictive models. I’m equal parts a fan of using the product and working on it, to be honest. The app makes it insanely easy to analyze your data, build dozens of models, use the myriad visualizations and metrics we have to understand which one will be the best for your use case, and then use that one to predict on new data.

The app is essentially an opinionated platform on how to automate your data science project. I say opinionated because it’s a machine that’s been well-oiled by some of the top data scientists in the world, so it’s an opinion you can trust. And as a data scientist, the automation isn’t something to fear. We’re automating the plumbing to allow you to focus on the problem-solving, the detective work. Don’t be a luddite! 

It’s really fun working on the product because you get to learn a ton about machine learning (both the theoretic and real-world applications) almost by osmosis. It’s like putting your textbook under your pillow while you sleep, except it actually works. And since data science is such a protean field, we’re also covering new ground and creating new standards for certain concepts in machine learning. There’s also a huge emphasis, embedded in our culture and our product, on — “democratizing” is abusing the term, but really putting data science into as many hands as possible, through evangelism, teaching, workshops, and the product itself.

Shameless promotional shout-out: we are hiring! If you’re into data or machine learning or python or javascript or d3 or angular or data vis or selling these things or just fast-growing startups with some cool eclectic people, please visit our website and apply!

As a data visualization engineer at DataRobot, what are the key design principles the company applies for development of its visualizations?

The driving design principle is functionality. Above all, will a user be able to derive an insight from this visualization? Will the insight be actionable? Will that insight be delivered immediately, or is the user going to have to bend over backwards scrutinizing the chart for its underlying logic, trying to divine from its welter of hypnotic curves some hidden kernel of truth? We’re not in the business of beautiful, bespoke visualizations,  like some of the stuff the NYTimes does.

Data visualization at DataRobot can be tricky because we want to make sure the visualizations are compatible with any sort of data that passes through — and users can build predictive models for virtually any dataset — which means we have to operate at the right level of explanatory and visual abstraction. And we want users of various proficiencies to immediately intuit whether or not a model is performing well, which requires thinking about how a beginner might be able to understand the same charts an expert might expect. So by “functionality” I mean the ability to quickly intuit meaning.

That step is the second in a hierarchy of insight: the first is looking at a single-valued metric, which is only capable of giving you a high-level summary, often an average. This could be obfuscating important truths. A visualization —the second step— exposes these truths a bit further, displaying multiple values at a time over slices of your data, allowing you to see trends and anomalous spots. The third step is actually playing with the visualization. An interactive visualization confirms or denies previous insights by letting you drill down, slice, zoom, project, compare — all ways of reformulating the original view to gain deeper understanding. Interactive functionality is a sub-tenet of our driving design principle. It allows users to better understand what they’re seeing while also engaging them in (admittedly) fun ways. 
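The first two steps of that hierarchy can be sketched in a few lines of Python: a single metric that averages away the interesting part, followed by a per-slice view (a text-mode stand-in for a chart) that exposes it. The metric name and all numbers are invented.

```python
# Step 1 of the hierarchy: one summary number. Step 2: the same data sliced,
# so the anomalous slice becomes visible. Toy latency figures by weekday.
daily_latency_ms = {
    "Mon": [110, 120, 115], "Tue": [118, 122, 119], "Wed": [117, 121, 116],
    "Thu": [118, 120, 122], "Fri": [290, 310, 305],   # the anomaly
}

all_values = [v for day in daily_latency_ms.values() for v in day]
overall_mean = sum(all_values) / len(all_values)
print(f"single metric: {overall_mean:.0f} ms")  # elevated, but hides *where*

# The "visualization" step, reduced to text bars: per-slice means expose Friday
for day, values in daily_latency_ms.items():
    mean = sum(values) / len(values)
    print(f"{day}: {'#' * int(mean // 10)} {mean:.0f} ms")
```

The third step, interactivity, would then let you drill into Friday itself, which is exactly the reformulate-and-drill loop described above.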

During the ODSC in Boston, you will be presenting an intro to D3, can you give us a heads up? What is D3 and what are its main features and benefits?

D3 is a data visualization library built in JavaScript. It represents data in a browser interface by binding data to a webpage’s DOM elements. It’s very low-level, but there are plenty of wrapper libraries and frameworks built around it that are easier to use, such as C3.js, or much more sophisticated frameworks. If you find a browser-rendered visualization toolkit, it’s probably using D3 under the hood. D3 supports transitions and defines a data-update pattern, so you can create really beautiful custom and dynamic visualizations with it, such as these simulations or this frankly overwrought work of art.

D3 was created by Mike Bostock as a continuation of his graduate work at Stanford. Check out the awesome examples.

Please share with us some details about the session. What will attendees get from it?

Attendees will learn the basics of how D3 works. They’ll come away with a visualization in a static HTML file representing some aspect of a real-world dataset, and a vague sense of having been entertained. I’m hoping the workshop will expose them to the tool and give them a place to start if they want to do more on their own. 

What are the prerequisites attendees should have to take full advantage of your session?

Having already downloaded D3 4.0 (4.0!!!!!) will be useful, but really just a working browser — I’ll be using Chrome — and an IDE or text editor of your choice. And a Positive Attitude™. 

Finally, on a more personal tenor, what's the best book you've read recently? 

Story of O: a bildungsroman about a young French girl's spiritual growth. Very inspiring!

Thank you Morgane for your insights and thoughts.

Morgane's “Intro to D3” workshop session will be part of the Open Data Science Conference taking place in Boston, MA, from May 3 to 5.

A good excuse to visit beautiful Boston and have a great data science learning experience!

Cloudera Analyst Event: Facing a New Data Management Era


I have to say that I attended this year’s Cloudera analyst event in San Francisco with a mix of excitement, expectation, and a grain of salt.

My excitement and expectation were fuelled by all that has been said about Cloudera and its close competitors over the last couple of years, and by the fact that I am currently focusing my own research on big data and “New Data Platforms”. Moreover, when it comes to events hosted by vendors, I always recommend taking their statements with a grain of salt, because the information might logically be biased.

However, in the end, the event was an enriching learning experience, full of surprises and discoveries. I learnt a lot about a company that is certainly contributing in a big way to the transformation of the enterprise software industry.

The event certainly fulfilled many of my “want-to-know-more” expectations about Cloudera and its offering stack, the path the company has taken, and its view of the enterprise data management market.

Certainly, it looks like Cloudera is leading and strongly paving the way for a new generation of enterprise data software management platforms.

So, let me share with you a brief summary and comments about Cloudera’s 2017 industry analyst gathering.

OK, Machine Learning and Data Science are Hot Today

One of the themes of the event was Cloudera’s keen interest and immersion into Machine Learning and Data Science. Just a few days before the event, the company made two important announcements:

The first one was about the beta release of Cloudera Data Science Workbench (Figure 1), the company’s new self-service environment for data science on top of Cloudera Enterprise. This new offering comes directly from the company’s smart acquisition of a machine learning and data science startup.

Figure 1. Screencap of Cloudera's Data Science Workbench (Courtesy of Cloudera)
Some of the capabilities of this product allow data scientists to develop in some of the most popular open-source languages —R, Python, and Scala— with native Apache Spark and Apache Hadoop integration, which in turn speeds up project deployment, from exploration to production.

In this regard, Charles Zedlewski, senior vice president of Products at Cloudera, mentioned that:

“Cloudera is focused on improving the user experience for data science and engineering teams, in particular those who want to scale their analytics using Spark for data processing and machine learning. The acquisition of and its team provided a strong foundation, and Data Science Workbench now puts self-service data science at scale within reach for our customers.”

One key approach Cloudera takes with the Data Science Workbench is to enable data scientists to work in a truly open space that can expand its reach to use, for example, deep learning frameworks such as TensorFlow, Microsoft Cognitive Toolkit, MXNet, or BigDL, but within a secure and contained environment.

Certainly a new offering with huge potential for Cloudera to increase its customer base, but also to reaffirm and grow its presence within existing customers, which can now expand their use of the Cloudera platform without needing to look for third-party options to develop on top of.

The second announcement was the launch of the Cloudera Solution Gallery (Figure 2), which lets Cloudera showcase its large partner base —more than 2,800 partners globally— and a storefront of more than 100 solutions.

This news should not be taken lightly, as it shows Cloudera’s capability to start building a complete ecosystem around its robust set of products, which in my view is a defining trait of companies that aim to become a de facto industry standard.

Figure 2. Cloudera Solution Gallery (Courtesy of Cloudera)

Cloudera: Way More than Hadoop

During an intensive two-day event filled with presentations, briefings and interviews with Cloudera’s executives and customers, a persistent message prevailed. While the company recognizes its origin as a provider of a commercial Hadoop distribution, it is now making it clear that its current offering has expanded well beyond the Hadoop realm to become a full-fledged open-source data platform. Hadoop is certainly at the core of Cloudera’s platform as the main data engine but, with support for 25 open-source projects, the platform currently offers much more than Hadoop distributed storage capabilities.
This is reflected across Cloudera’s offerings, from the full-fledged Cloudera Enterprise Data Hub, its most comprehensive platform, to one of Cloudera’s special configurations.

Cloudera’s executives made it clear that the company strategy is to make sure they are able to provide, via open source offerings, efficient enterprise-ready data management solutions.

However, don’t be surprised if the message from Cloudera changes over time, especially if the company wants to set its sights on larger organizations, which most of the time rely on providers that can center their IT services on the business and are not necessarily tied to any particular technology.

Cloudera is redefining itself so it can reposition its offering as a complete data management platform. This is a logical step considering that Cloudera wants to take a bigger piece of the large enterprise market, even when the company’s CEO stated that they “do not want to replace the Netezzas and Oracles of the world”.

Based on these events, it is clear to me that Cloudera will eventually end up competing head-on in specific segments of the data management market, especially with IBM, through its BigInsights offering, and with Teradata, whose products have left and keep leaving a very strong footprint in the data warehouse market. Whether we like it or not, big data incumbents such as Cloudera seem destined to enter the big fight.

The Future, Cloudera and IoT

During the event I also had the chance to attend a couple of sessions specifically devoted to showing Cloudera deployments in the context of IoT projects. Another thing worth noticing is that, even though Cloudera has some really good stories to tell about IoT, the company seems to be in no hurry to jump directly onto this wagon.

Perhaps it’s better to let this market mature and become consistent enough before devoting larger technical investments to it. It is always very important to know when and how to invest in an emerging market.

However, we should be very well aware that Cloudera, and the rest of the big data players, will be vital for the growth and evolution of the IoT market.

Figure 3. Cloudera Architecture for IoT (Courtesy of Cloudera)

It’s Hard to Grow Gracefully

Today it’s very hard, if not impossible, to deny that Hadoop is deeply immersed in the enterprise data management ecosystem of almost every industry. Cloudera’s analyst event was yet another confirmation. Large companies are now increasingly using Cloudera’s different options and configurations for mission-critical functions.

For Cloudera, then, the nub of the issue is no longer how to get to the top, but how to stay there, evolve, and leave its footprint at the top.

Cloudera has been very smart and strategic to get to this position, yet it seems it has gotten to a place where the tide will get even tougher. From this point on, convincing companies to open the big wallet will take much more than a solid technical justification.

At the time of writing this post, I learnt that Cloudera has filed to go public and will trade on the New York Stock Exchange, and as an article in Fortune mentions:

“Cloudera faces tough competition in the data analytics market and cites in its filing several high-profile rivals, including Amazon Web Services, Google, Microsoft, Hewlett Packard Enterprise, and Oracle.”

It also mentions the case of Hortonworks, which:

“went public in late 2014 with its shares trading at nearly $28 during its height in April 2015. However, Hortonworks’ shares have dropped over 60% to $9.90 on Friday as the company has struggled to be profitable.”

In my opinion, for Cloudera to succeed while taking this critical step, it will have to show that it is well prepared in business, technical, and strategic terms, and also ready for the unexpected, because only then will it be able to grow gracefully and play big, with the big guys.

Always keep in mind that, as Benjamin Franklin said:

Without continual growth and progress, such words as improvement,
achievement, and success have no meaning.

Enterprise Performance Management: Not That Popular But Still Bloody Relevant


While performing my usual Googling during preparation for one of my latest reports on enterprise performance management (EPM), I noticed a huge difference in popularity between EPM and, for example, big data (Figure 1).

From a market trend perspective, it is fair to acknowledge that the EPM software market has taken a hit from the hype surrounding the emergence of technology trends in the data management space, such as business analytics and, particularly, big data.

Figure 1: Searches for big data, compared with those for EPM (Source: Google Trends)

In the last four years, at least, interest in big data has grown exponentially, making it a huge emerging market in the software industry. The same has happened with other data management related solutions such as analytics.

While this is not that surprising, my initial reaction came with a bit of discomfort. Such a huge difference makes one wonder how many companies have simply jumped onto the big data wagon rather than making a measured and thoughtful decision about how best to deploy their big data initiative within the larger data management infrastructure in place, especially with regard to having the system co-exist and collaborate effectively with EPM and existing analytics solutions.

Now, don’t get me wrong; I’m not against the deployment of big data solutions and all their potential benefits. On the contrary, I think these solutions are changing the data management landscape for good. But I can’t deny that, over the past couple of years, a number of companies, once past the hype and euphoria, have raised valid concerns about the efficiency of their existing big data initiatives and have questioned their value within the overall data management machinery already in place, especially alongside EPM and analytics solutions, which are vital for measuring performance and providing the right tools for strategy and planning.

The Analytics/EPM/Big Data Conundrum
A study published by Iron Mountain and PwC titled How Organizations Can Unlock Value and Insight from the Information they Hold, for which researchers interviewed 1,800 senior business executives in Europe and North America, concluded that:

“Businesses across all sectors are falling short of realizing the information advantage.”

Even more interesting is that, in the same report, when evaluating what they call an Information Value Index, the authors realized that:

“The enterprise sector, scoring 52.6, performs only slightly better than the mid-market (48.8).”

For some, including me, this statement is surprising. One might have imagined that large companies, which commonly have large data management infrastructures, would logically have already mastered, or at least reached an acceptable level of maturity with, their general data management operations. But despite the availability of a greater number of tools and solutions to deal with data, important issues remain as to finding, on one hand, the right way to make existing and new sources of data play a better role within the intrinsic mechanics of the business, and, on the other, how these solutions can play nicely with existing data management solutions such as EPM and business intelligence (BI).

Despite a number of big data success stories—and examples do exist, including Bristol-Myers Squibb, Xerox, and The Weather Company—some information workers, especially those in key areas of the business like finance, are:

  • somehow not understanding the potential of big data initiatives within their areas of interest and how to use these to their advantage in the operational, tactical, and strategic execution and planning of their organization, rather than using them in tangential decisions or for relevant yet siloed management tasks.
  • oftentimes swamped with day-to-day data requests and the pressure to deliver based on the amount of data already at their disposal. This means they have a hard time deciphering exactly how to integrate these projects effectively with their own data management arsenals.

In addition, it seems that for a number of information workers on the financial business planning and execution side, key processes and operations remain isolated from others that are directly related to their areas of concern.

The Job Still Needs to Be Done
On the flip side, despite the extensive growth of and hype for big data and advanced analytics solutions, for certain business professionals, especially those in areas such as finance and operations, interest in the EPM software market has not waned.

In every organization, key people from these important areas of the business understand that improving operations and performance is an essential organizational goal. Companies still need to reduce the cost of their performance management cycles as well as make them increasingly agile to be able to promptly respond to the organization’s needs. Frequently, this implies relying on traditional practices and software capabilities.

Activities such as financial reporting, performance monitoring, and strategy planning still assume a big role in any organization concerned with improving its performance and operational efficiency (Figure 2).

Figure 2: Population’s perception of EPM functional area relevance (%)
(Source: 2016 Enterprise Performance Management Market Landscape Report)

So, as new technologies make their way into the enterprise world, a core fact remains: organizations still have basic business problems to solve, including budget and sales planning, and financial consolidation and reporting.

Not only do many organizations find the basic aspects of EPM relevant to their practices, but an increasing number of them are also becoming more conscious of the importance of performing specific tasks with the software. This signals that organizations need to continuously improve their operations and business performance and analyze transactional information, while also evolving and expanding the analytic power of the organization beyond this limit.
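As a toy illustration of one of these basic tasks, budget-versus-actual variance reporting can be reduced to a few lines of Python. All department names and figures below are invented; real EPM suites add consolidation rules, currencies, and hierarchies on top of this core arithmetic.

```python
# Toy budget-vs-actual variance report, the core arithmetic behind one of the
# basic EPM tasks mentioned above. All names and figures are invented.
budget = {"marketing": 50_000, "r_and_d": 120_000, "operations": 200_000}
actual = {"marketing": 61_500, "r_and_d": 115_000, "operations": 198_000}

variance = {
    dept: {
        "variance": actual[dept] - budget[dept],
        "variance_pct": 100 * (actual[dept] - budget[dept]) / budget[dept],
    }
    for dept in budget
}

for dept, v in variance.items():
    flag = "OVER" if v["variance"] > 0 else "under"
    print(f"{dept:12s} {v['variance']:+10,d} ({v['variance_pct']:+.1f}%) {flag}")
```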

How Can EPM Fit Within the New Data Management Technology Framework?
When confronted with the need for better integration, some companies will find they need to deploy new data technology solutions, while others will need to make existing EPM practices work along with new technologies to increase analytics accuracy and boost business performance.

In both cases, a number of organizations have taken a holistic approach, balancing business needs through a series of steps that enable the integration of data management solutions. Some of these steps include:

  • taking a realistic business approach towards technology integration. Understanding the business model and its processes is the starting point. While technical feasibility is vital, it is equally important to take a practical business view of how the company generates value through the use of data. This usually means taking an inside-out approach: first taking control of data from internal sources, whether from structured information channels and/or tangible assets (production, sales, purchase orders, etc.). Only after this is done should potential external data points be identified. In many cases these will come in the form of data from intangible assets (branding, customer experiences) that can directly benefit the specific process, whether new or already in place.

  • identifying how data provided by these new technologies can be exploited. Once you understand the business model and how specific big data points can benefit the existing performance measuring process, it is possible to analyze and understand how these new incoming data sources can be incorporated or integrated into the existing data analysis cycle. This means understanding how it will be collected (period, frequency, level of granularity, etc.) and how it will be prepared, curated, and integrated into the existing process to increase its readiness for the specific business model.
  • recognizing how to amplify the value of data. By making one or two of these sources effectively relate to and improve the existing analytics portfolio, organizations can build a solid data management foundation. Once organizations can identify where these new sources of information provide extended insights into common business processes, the value of the data can be amplified to help explain customer behavior and needs; to see how branding affects sales; or even to find out which sales regions need improved manufacturing processes.

All this may be easier said than done, and the effort required is considerable. But if you are thinking in terms of the overall business strategy, it makes sense to take a business-to-technical approach that can have a direct impact on the efficiency, efficacy, and success of EPM/big data projects while also improving understanding of, and commitment to, these projects.

Companies need to understand how the value of data can be amplified by integrating key big data points with the “traditional” data management cycle so it effectively collaborates with the performance management process, from business financial monitoring to planning and strategy.

While enterprise performance management initiatives are alive and kicking, new big data technologies can be put to work alongside them to expand the EPM software’s capabilities and reach.

The full potential of big data for enterprise performance management will only be realized when enterprises are able to fully leverage all available internal and external data sources towards the same business performance management goal to better understand their knowledge-based capital.

(Originally published on TEC's Blog)
(Guest Post) Value And Insights: Yves Mulkers ahead of Big Data World 2017


 John Bensalhia talks to Yves Mulkers, freelance Data Architect and blogger at 7wData, about the benefits, developments and challenges linked with Big Data...

“I'm an explorer on new technologies and Data Visualisation, and keep my finger on what's happening with Big Data from an architecture point of view.”

So says Yves Mulkers, freelance Data Architect and social media influencer. Yves is speaking ahead of the upcoming Big Data World event in London, where he will be making an appearance. Listing the key benefits of what Big Data can offer, Yves says that these are:

“Scalability, cost reduction, new products and revenue streams, tailored solutions and targeting, enterprise wide insights, and Smart cities.”

Having worked as a software developer in various branches, Yves built deep expertise in object-oriented thinking and development.
“Doing the full cycle of software development from analysis, implementation, support and project management in combination with a strong empathy, he positioned himself as a technical expert bridging and listening into the needs of the business and end-users.” 

Yves says that this past year has seen a number of breakthroughs in the development of Big Data such as:
“Integrated platforms, data preparation automation, automating automation, GPU and in-memory databases, Artificial Intelligence, micro services, IoT (Internet Of Things), and self-service analytics.”

Big Data can be used to create a competitive advantage in various ways for businesses. In addition to a 360° customer view and narrower segmentation of customers, Yves says that next-generation products, real-time customization, and business models based on data products are the new approaches. Better-informed decisions, such as measuring consumer sentiment, are also good gauges of the value Big Data can bring.

Businesses must consider a variety of aspects in order to ensure successful Data implementation. Yves says that businesses must have clear business processes and information state diagrams, and should also ensure that they are on top of their game with respect to training and documentation. Data standards must also be developed and complied with.

For applying data analytics and applications in a business, Yves explains that there are challenges to tackle:
“Creating value from your data products, finding the right talent and tools, maturity of the organisation in information management, and trusting the results of analytics. It's worth noting that Big Data and analytics are not the same as business intelligence.”

In the next five to 10 years, Yves says that:
“Big Data will become the business intelligence of now.”

In addition to businesses and companies, aspects of Big Data will be for everyone to take advantage of:
“Big Data will be embedded in companies’ strategy, and analytics will become available to everyone.”
“Data volumes will keep on growing as data products will become a commodity and improve our quality of life.”

Looking ahead to the event, Yves says that he expects it to bring a lot of value and insights.
“The combination with the sidetracks around Cloud and others, will bring a broader view on the complete architecture (business, technical and data) needed to be successful in Big Data implementations.”

SAP Leonardo, SAP’s IoT Platform Now Has a Name: Interview with SAP’s Rakesh Gandhi


As the “Internet of Things (IoT)” market becomes less hype and more reality, German software powerhouse SAP is moving fast, with significant economic and research investments aimed at becoming a leader in the IoT field.

One key move is the recent announcement of SAP’s Leonardo Innovation Portfolio, a comprehensive solution offering to enable organizations to plan, design, and deploy IoT solutions.

Of course, with these announcements we felt compelled to reach out to SAP and learn, in their own words, the details of the company’s new IoT portfolio.

As a result we had the opportunity to speak with Rakesh Gandhi, Vice President for IOT GTM & Solutions at SAP America.

Rakesh is an innovation enthusiast and IoT evangelist, currently responsible for GTM and solutions management for the SAP Leonardo IoT innovation portfolio. A 12-year veteran at SAP, Rakesh has been involved in incubating new innovations in mobile, Cloud for Customer, CEC, and now IoT.

Thank you Mr. Gandhi:

Last year SAP announced an ambitious €2 billion investment plan to help companies and government agencies develop their IoT and big data initiatives. Could you share with us some details about this program and what it involves in a general sense?

IoT is one of the key pillars of SAP’s strategy to enable customers’ digital transformation journeys. Over the past several years SAP has been developing its IoT portfolio working closely with our customers. The recent announcement of the SAP Leonardo brand is a continuation of SAP’s commitment and plans in the following key areas:

  • Accelerate innovation of the IoT solution portfolio, both organically and inorganically through acquisitions.
  • Create awareness of SAP’s IoT innovations that empower customers to run live businesses with smart processes across all lines of business and reinvent business models.
  • Drive customer adoption; scale services, support, and co-innovation.
  • Most importantly, grow its ecosystem of partners and startups in the IoT market.

To date, the key announcements include:

Key acquisitions such as:

  • Fedem: With this acquisition SAP can now build an end-to-end IoT solution in which a digital avatar continuously represents the state of operating assets through feeds from sensors, replacing the need for physical inspection with a “digital inspection.” Additionally, the solution is intended to consider complex forces in play and detect both instantaneous consequences of one-off events and long-term health effects of cyclic loads, making possible accurate monitoring of maintenance requirements and remaining-life prediction for assets.
  • This acquisition helped provide expertise and technology to accelerate the availability of key IoT capabilities in SAP HANA Cloud Platform, such as advanced lifecycle management for IoT devices, broad device connectivity, strong IoT edge capabilities that work seamlessly with a cloud back end, end-to-end role-based security and rapid development tools for IoT applications.
  • Altiscale: This acquisition is helping our customers create business value by harnessing the power of BIG DATA generated by the connected world.
The launch of the SAP Leonardo brand for the IoT innovation portfolio: This was a major step in announcing our brand for IoT-driven innovation.

SAP Leonardo jumpstart program: This is a major step in our commitment to helping our customers drive adoption and rapidly deploy core IoT applications in a short time frame of three months, with fixed scope and price.

The partner ecosystem is critical to our success; we are working closely with partners to create an ecosystem that our customers can leverage to further simplify their deployment projects.

Additionally, SAP is on track in opening up IoT labs to collaborate on Industry 4.0 and IoT innovations with our customers, partners and startups.

Can you share with us some of the details of the new enablement program as well as the general features of the Leonardo IoT Portfolio?

What we are observing in the marketplace is that many organizations are starting with small experimental IoT projects, or may have started to collect and store sensor data with some visualization capabilities.

However, it is still generally believed that IoT as a topic is very low on the maturity curve. SAP now has a very robust portfolio that has been co-innovated with our early adopter customers and proven to deliver business value.

The second challenge is the general perception among customers that IoT is still in the hype phase and difficult to deploy. We decided it was very important for SAP to support our customers’ adoption and show that they can go live in production in a short time frame for a first pilot.

This jumpstart program supports three scenarios as three distinct packages:

  • Vehicle Insights for fleet telematics,
  • Predictive Maintenance & Service with Asset Intelligence Network for connected assets, and
  • Connected Goods for scenarios such as connected coolers, connected vending machines, and other mass-market things.
Customers can now deploy one of these scenarios in a three-month time frame. It is a very structured three-step process: first, SAP works with the customer in a half-day design thinking workshop to agree on the pilot deployment scope; second, SAP delivers a rapid prototype to demonstrate the vision and win customer buy-in.

In the final step, towards the end of the three-month engagement, SAP delivers a productive pilot system.

Lastly, SAP will continue to engage with customers to help with their IoT roadmap for next processes and business case.

It seems natural to assume SAP has already started working to support IoT projects in key industries and/or lines of business. Could you talk about some of these industry/LoB efforts?

The SAP Leonardo IoT innovation portfolio powers digital processes across lines of business and industries.

As an example, we have released a new value map of supply chain processes, now referred to as the digital supply chain, which is powered by the SAP Leonardo IoT innovation portfolio.

The same is applicable to other LoBs, e.g., customer service processes that enable predictive and proactive maintenance, and also to industry-specific end-to-end solutions powered by SAP Leonardo, e.g., SAP Connected Goods for the CPG and retail industries.

Is this program designed mostly for SAP’s existing partners and customers? How could non-SAP customers take advantage of it?

The jumpstart program is designed to support all our customers, both existing customers and net-new prospects.

This mirrors how the SAP Leonardo portfolio of IoT solutions is designed: it works with SAP or non-SAP back ends and is agnostic in that regard.

Finally, what are the technical and/or business requirements for applicants of this program?

As mentioned above, the SAP Leonardo jumpstart program is initially offered for three packages: SAP Vehicle Insights, SAP Connected Goods, and SAP Predictive Maintenance & Service with Asset Intelligence Network.

These are cloud solutions, and the use cases covered by each of these packages are applicable across multiple industries.

Thank you again Mr. Gandhi!

You can learn more about SAP Leonardo by visiting its website and/or reading this post by Hans Thalbauer.
In the meantime, you can take a look at the video introduction produced by SAP.

(Guest Post) Winning Solutions: Kirk Borne Discusses the Big Data Concept Ahead of Big Data World London


Looking ahead to 2017's Big Data World event, Booz Allen Hamilton's Principal Data Scientist discusses the Big Data concept, benefits and developments in detail with John Bensalhia...

2017's Big Data World promises plenty in the way of insightful talks and discussions on the subject. One of the unmissable talks to watch out for in March will come from Kirk Borne, Booz Allen Hamilton's Principal Data Scientist, who will look at “The Self-Driving Organisation and Edge Analytics in a Smart IoT World.”

“I will describe the concept of a self-driving organisation that learns, gains actionable insights, discovers next-best move, innovates, and creates value from streaming Big Data through the application of edge analytics on ubiquitous data sources in the IoT-enriched world.”

As part of this discussion, Kirk will also present an Analytics Roadmap for the IoT-enabled Cognitive Organisation.

“In this case, the “self-driving organisation” is modeled after the self-driving automobile, but applicable organisations include individual organisations, and also smart cities, smart farms, smart manufacturing, and smart X (where X can be anything). The critical technologies include machine learning, machine intelligence, embedded sensors, streaming analytics, and intelligence deployed at the edge of the network.”
“Big Data and data science are expanding beyond the boundaries of your data centre, and even beyond the Cloud, to the point of data collection at the point of action! We used to say “data at the speed of business”, but now we say “business at the speed of data.”

Having achieved a Ph.D. in astronomy from Caltech, Kirk focused most of the first 20 years of his career on astrophysics research (“colliding galaxies and other fun stuff”), including a lot of data analysis as well as modelling and simulation.

“My day job for nearly 18 years was supporting large data systems for NASA astronomy missions, including the Hubble Space Telescope. So, I was working around data all of the time.”
“When data set sizes began to grow “astronomically” in the late 1990s, I began to focus more on data mining research and data science. It became apparent to me that the whole world (and every organisation) was experiencing large growth in digital data. From these observations, I was convinced that we needed to train the next-generation workforce in data skills. So, in 2003, I left my NASA job and joined the faculty at George Mason University (GMU) within the graduate Ph.D. program in Computational Science and Informatics (Data Science).”

As a Professor of Astrophysics and Computational Science at GMU, Kirk helped to create the world’s first Data Science undergraduate degree program.

“I taught and advised students in data science until 2015, at which point the management consulting firm Booz Allen Hamilton (BAH) offered me the position as the firm’s first Principal Data Scientist. I have been working at BAH since then.”

Booz Allen Hamilton offers management consulting services to clients in many sectors: government, industry, and non-profit. “Booz Allen Hamilton (BAH) is over 100 years old, but has reinvented itself as an agile leading-edge technology consultant,” says Kirk.

“Our market focus is very broad, including healthcare, medicine, national defense, cyber-security, law enforcement, energy, finance, transportation, professional sports, systems integration, sustainability, business management, and more. We deliver systems, technology strategy, business insights, consultative services, modelling, and support services in many technology areas: digital systems, advanced analytics, data science, Internet of Things, predictive intelligence, emerging technologies, Cloud, engineering, directed energy, unmanned aerial vehicles (drones), human capital, fraud analytics, and data for social good (plus more, I am sure).”

Discussing Big Data, Kirk regards this as a “concept”.

“It is not really about “Big” or “Data”, but it is all about value creation from your data and information assets. Of course, it is data. But the focus should be on big value, not on big volume; and the goal should be to explore and exploit all of your organisation’s data assets for actionable information and insights.”
“I like to say that the key benefits of Big Data are the three D2D’s: Data-to-Discovery (data exploration), Data-to-Decisions (data exploitation), and Data-to-Dividends (or Data-to-Dollars; i.e., data monetisation).”

Looking back over the past year, Kirk says that there have been several significant Big Data-related developments.

“These include the emergence of the citizen data scientist, which has been accompanied by a growth in self-service tools for analytics and data science. We are also seeing maturity in deep learning tools, which are now being applied in many more interesting contexts, including text analytics. Machine intelligence is also being recognised as a significant component of processes, products, and technologies across a broad spectrum of use cases: connected cars, Internet of Things, smart cities, manufacturing, supply chain, prescriptive machine maintenance, and more.”
“But I think the most notable developments are around data and machine learning ethics – this has been evoked in many discussions around privacy and fairness in algorithms, and it has been called out also in some high-profile cases of predictive modelling failures. These developments demand that we be more transparent and explanatory to our clients and to the general public about what we are doing with data, especially their data!”

Much value can be gleaned from the Smart IoT World for businesses, and in a number of ways, as Kirk explains.

“First of all, businesses can learn about the latest products, the newest ideas, and the emerging technologies. Businesses can acquire lessons learned, best practices, and key benefits, as well as find business partners to help them on this journey from digital disruption to digital transformation.”
“The “Smart” in “Smart IoT” is derived from machine learning, data science, cognitive analytics, and technologies for intelligent data understanding. More than ever, businesses need to focus more on the “I” in “IT” – the Information (i.e., the data) is now the fundamental asset, and the Technology is the enabler. IoT is about ubiquitous sensors collecting data and tracking nearly everything in your organisation: People, Processes, and Products. Smart IoT will deliver big value from Big Data.”

Kirk says that the past few years of Big Data have been described as the End of Demographics and the Age of Personalisation. The next five to ten years, on the other hand, will be the Age of Hyper-Personalisation.

“More than ever, people are at the centre of business,”

 explains Kirk.

“Big Data can and will be used to engage, delight, and enhance employee experience (EX), user experience (UX), and customer experience (CX). The corresponding actionable insights for each of these human experiences will come from “360 view” Big Data collection (IoT), intelligence at the point of data collection (Edge Analytics), and rich models for behavioural insights (Data Science).”
“These developments will be witnessed in Smart Cities and Smart Organisations of all kinds. The fundamental enabler for all of this is Intelligent Data Understanding: bringing Big Data assets and Data Science models together within countless dynamic data-driven application systems.”

With Big Data World only weeks away, Kirk is looking forward to the great opportunities that it will bring.

“I expect Big Data World to be an information-packed learning experience like no other. The breadth, depth, and diversity of useful Smart IoT applications that will be on display at Big Data World will change the course of existing businesses, inspire new businesses, stimulate new markets, and grow existing capabilities to make the world a better place.”
“I look forward to learning from technology leaders about Smart Cities, IoT implementations, practical business case studies, and accelerators of digital transformation. It is not true that whoever has the most data will win; the organisation that wins is the one who acts on the most data! At Big Data World, we can expect to see many such winning solutions, insights, and applications of Big Data and Smart IoT.”

Not Your Father’s Database: Interview with VoltDB’s John Piekos


As organizations deal with challenging times, both technologically and business-wise, managing increasing volumes of data has become key to success.

As data management rapidly evolves, the main Big Data paradigm has changed from just “big” to “big, fast, reliable, and efficient”.

Now more than ever in the evolution of the big data and database markets, the pressure is on for software companies to deliver new and improved database solutions capable not only of dealing with increasing volumes of data but of doing so faster, better, and more reliably.

A number of companies have taken the market by storm, infusing the industry with new and spectacularly advanced database software —for both transactional and non-transactional operations— that are rapidly changing the database software landscape.

One of these companies is VoltDB. This New England (Massachusetts) based company has rapidly become a reference in next-generation database solutions and has gained the favor of important customers in key industries such as communications, finance, and gaming.

VoltDB was co-founded by none other than world-renowned database expert and 2014 ACM A.M. Turing Award recipient Dr. Michael Stonebraker, who has been key in the development of a new-generation database solution and in forming the talented team in charge of its development.

With the new VoltDB 7.0 already in the market, we had the opportunity to chat with VoltDB’s John Piekos about VoltDB’s key features and evolution.

John is Vice President of Engineering at VoltDB, where he heads up VoltDB’s engineering operations, including product development, QA, technical support, documentation, and field engineering.

John has more than 25 years of experience leading teams and building software, delivering both enterprise and Big Data solutions.

John has held tech leadership positions at several companies, most recently at Progress Software where he led the OpenEdge database, ObjectStore database and Orbix product lines. Previously, John was vice president of Web engineering at EasyAsk, and chief architect at Novera Software, where he led the effort to build the industry’s first Java application server.

John holds an MS in computer science from Worcester Polytechnic Institute and a BS in computer science from the University of Lowell.

Thank you John, please allow me to start with the obvious question:

What’s the idea behind VoltDB, the company, and what makes VoltDB the database, to be different from other database offerings in the market?

What if you could build a database from the ground-up, re-imagine it, re-architect it, to take advantage of modern multi-core hardware and falling RAM prices, with the goal of making it as fast as possible for heavy write use cases like OLTP and the future sensor (IoT) applications?  That was the basis of the research Dr. Stonebraker set out to investigate.

Working with folks at MIT, Yale, and Brown, they created the H-Store project and proved the theory that if you eliminated the overhead of traditional databases (logging, latching, buffer management, etc.), ran an all in-memory workload, spread that workload across all the available CPUs on the machine, and horizontally scaled it across multiple machines, you could get orders of magnitude more performance out of the database.
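
The partitioning idea described here can be illustrated with a toy sketch (plain Python, not VoltDB or H-Store code; all names are invented for illustration): rows are hash-partitioned by key so that each partition is owned by a single worker, which is what removes the need for locks and latches on the hot path.

```python
class Partition:
    """One single-threaded partition: no locks are needed because only
    its owning worker ever touches its data."""
    def __init__(self):
        self.rows = {}

    def upsert(self, key, value):
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


class PartitionedStore:
    """Routes every key deterministically to one partition."""
    def __init__(self, n_partitions=4):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def _route(self, key):
        # The same key always lands on the same partition, so a
        # single-key transaction never crosses partition boundaries.
        return self.partitions[hash(key) % len(self.partitions)]

    def upsert(self, key, value):
        self._route(key).upsert(key, value)

    def read(self, key):
        return self._route(key).read(key)


store = PartitionedStore()
store.upsert("call-123", {"duration_ms": 4200})
print(store.read("call-123"))  # {'duration_ms': 4200}
```

In a real system each partition would run on its own CPU core, and scaling out simply adds more partitions on more machines.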

The commercial realization of that effort is VoltDB.  VoltDB is fully durable, able to process hundreds of thousands to millions of multi-statement SQL transactions per second, all while producing SQL-driven real-time analytics.

Today an increasing number of emerging databases work partially or totally in-memory while existing ones are changing their design to incorporate this capability. What are in your view the most relevant features users need to look for when trying to choose from an in-memory based database?

First and foremost, users should realize that not all in-memory databases are created equal.  In short, architecture choices require trade-offs.  Some IMDBs are created to process reads (queries) faster and others, like VoltDB, are optimized for fast writes.  It is impractical (impossible) to get both the fastest writes and the fastest reads at the same time on the same data, all while maintaining high consistency because the underlying data organization and architecture is different for writes (row oriented) than it is for reads (columnar).

 It is possible to maintain two separate copies of the data, one in row format, the other in compressed column format, but that reduces the consistency level - data may not agree, or may take a while to agree between the copies.
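
The row-versus-column trade-off can be made concrete with a small sketch (illustrative Python only, not any database's internals): the same records are kept row-oriented, where a write is one cheap append, and column-oriented, where an analytic scan is cheap but every write must touch every column.

```python
rows = []                                # row store: one dict per record
columns = {"user": [], "amount": []}     # column store: one list per field

def write(record):
    # Row store: a single append per transaction.
    rows.append(record)
    # Column store: one append per column; a real columnar store would
    # also re-compress, which is why fast writes and fast scans conflict.
    for field, value in record.items():
        columns[field].append(value)

for r in [{"user": "a", "amount": 10}, {"user": "b", "amount": 32}]:
    write(r)

# An analytic aggregate is natural on the columnar copy:
print(sum(columns["amount"]))  # 42
```

Keeping both copies, as the answer notes, means the two layouts can disagree until the columnar side catches up, which is exactly the consistency cost described above.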

Legacy databases can be tweaked to run in memory, but realize that, short of a complete re-write, the underlying architecture may still be disk-based, and thus incur significant (needless) processing overhead.

VoltDB defines itself as an in-memory, operational database. What does this mean in the context of Big Data, and in the context of IT’s traditional separation between transactional and analytical workloads? How does VoltDB fit into, or reshape, these schemes?

VoltDB supports heavy write workloads - it is capable of ingesting never-ending streams of data at high ingestion rates (100,000+/second per machine, so a cluster of a dozen nodes can process over a million transactions a second).

While processing this workload, VoltDB can calculate (via standard SQL) and deliver strongly consistent real-time analytics, either ad hoc, or optimally, as pre-computed continuous queries via our Materialized View support.
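
The "pre-computed continuous query" idea can be sketched in a few lines (plain Python, not VoltDB's SQL materialized-view syntax): instead of scanning all events at query time, an aggregate is updated incrementally as each event is ingested, so real-time analytics become constant-time lookups.

```python
from collections import defaultdict

# The "view": per-key count and running total, maintained on ingest.
view = defaultdict(lambda: {"count": 0, "total": 0})

def ingest(event):
    # Each write also updates the pre-computed aggregate, the way a
    # materialized view is maintained alongside the base table.
    agg = view[event["region"]]
    agg["count"] += 1
    agg["total"] += event["revenue"]

for e in [{"region": "emea", "revenue": 5},
          {"region": "emea", "revenue": 7},
          {"region": "apac", "revenue": 3}]:
    ingest(e)

# Real-time analytics are now O(1) lookups, not full scans:
print(view["emea"])  # {'count': 2, 'total': 12}
```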

These are capabilities simply not possible with traditional relational databases.  In the Big Data space, this places VoltDB at the front end, as the ingestion engine for feeds of data, from telco, digital ad tech, mobile, online gaming, IoT, Finance and numerous other application domains.

Just recently, VoltDB passed the famous Jepsen testing for improving the safety of distributed databases with VoltDB 6.4. Could you share with us some details of the test, the challenges, and the benefits it brought for VoltDB?

We have a nice landing page with this information, including Kyle’s and VoltDB’s founding engineer John Hugg’s blog.

In summary, distributed systems programming is hard. Implementing the happy path isn’t hard, but doing the correct thing (such as returning the correct answer) when things go wrong (nodes failing, networks dropping), is where most of the engineering work takes place. VoltDB prides itself on strong consistency, which means returning the correct answer at all times (or not returning an answer at all - if, for example, we don’t have all of the data available).

Kyle’s Jepsen test is one of the most stringent tests out there.  And while we hoped that VoltDB would pass on the first go-around, we knew Kyle was good at breaking databases (he’s done it to many before us!).  He found a couple of defects, thankfully finding them before any known customer found them, and we quickly went to work fixing them. Working with Kyle and eventually passing the Jepsen test was one of the 2016 engineering highlights at VoltDB. We’re quite proud of that effort.


One interesting aspect of VoltDB is that it’s a relational database that complies fully with ACID and brings native SQL support. What are the differences of this design compared to, for example, NoSQL and some so-called NewSQL offerings? Advantages, trade-offs perhaps?

In general, NoSQL offerings favor availability over consistency - specifically, the database is always available to accept new content and can always provide content when queried, even if that content is not the most recent (i.e., correct) version written.

NoSQL solutions rely on non-standard query languages (some are SQL-like), to compute analytics. Additionally, NoSQL data stores do not offer rich transaction semantics, often providing “transactionality” on single key operations only.

Not all NewSQL databases are created equal. Some favor faster reads (over fast writes). Some favor geo-distributed data sets, often resulting in high latency, or at least unpredictable latency in access and update patterns. VoltDB’s focus is low and predictable OLTP (write) latency at a scale of high transactions per second, offering rich and strong transaction semantics.

Note that not all databases that claim to provide ACID transactions are equal. The most common place where ACID guarantees are weakened is isolation. VoltDB offers serializable isolation.

Other systems offer multiple levels of isolation, with a performance tradeoff between better performance (weak guarantees) and slower performance (strong guarantees). Isolation models like Read-Committed and Read-Snapshot are examples; many systems default to one of these.

VoltDB’s design trades off complex multi-dimensional (OLAP) style queries for high throughput OLTP-style transactions while maintaining an ACID multi-statement SQL programming interface. The system is capable of surviving single and multi-node failures.

Where failures force a choice between consistency and availability, VoltDB chooses consistency. The database supports transactionally rejoining failed nodes back to a surviving cluster and supports transactionally rebalancing existing data and processing to new nodes.

Real-world VoltDB applications achieve 99.9th-percentile latencies under 10 ms at throughput exceeding 300,000 transactions per second on commodity Xeon-based 3-node clusters.

How about the handling of unstructured information within VoltDB? Is VoltDB expected to take care of it, or does it integrate with alternative solutions? What is the common architectural scenario in those cases?

VoltDB supports the storage of JSON strings and can index, query and join on fields within those JSON values. Further, VoltDB can process streamed JSON data directly into the database using our Importers (see the discussion of importers below) and custom formatters (custom decoding) - this makes it possible for VoltDB to transactionally process data in almost any format, and even to act as an ETL engine.
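As a loose illustration of the concept in plain Python (the event shape and field names are invented here, and this is not VoltDB's actual JSON SQL syntax), treating JSON payloads as structured, queryable values looks like this:

```python
import json

# Hypothetical stream of JSON payloads, similar to what an importer
# with a JSON formatter might receive (the event shape is invented).
stream = [
    '{"type": "login",    "user": "ana",  "ms": 12}',
    '{"type": "purchase", "user": "ben",  "ms": 48}',
    '{"type": "login",    "user": "carl", "ms": 9}',
]

# Decode each payload and index the events by a field inside the JSON,
# analogous to a database indexing and querying fields within JSON values.
by_type = {}
for payload in stream:
    event = json.loads(payload)
    by_type.setdefault(event["type"], []).append(event)

# "Query" on the indexed field.
print([e["user"] for e in by_type["login"]])  # ['ana', 'carl']
```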

How does VoltDB interact with players in the Big Data space such as Hadoop, both open source and commercial distributions?

The VoltDB database supports directly exporting data into a downstream data lake.  This target could be Hadoop, Vertica, a JDBC target, or even flat files.  VoltDB handles the real-time data storage and processing, as it is capable of transactionally ingesting (database “writes”) millions of events per second.

Typically the value of this data decreases with age - it becomes cold or stale - and eventually would be migrated to historical storage such as Hadoop, Spark, Vertica, etc.  Consider applications in the telco or online gaming space - the “hot data” may have a lifespan of one month in telco, or even one hour or less, in the case of game play.

Once the data becomes “historical” and is of less immediate value, it may be removed from VoltDB and stored on disk in the historical archive (such as Hadoop, Vertica, etc).

What capabilities does VoltDB offer, not just for database administration but also for development on top of VoltDB with Python, R, or other languages?

While VoltDB offers traditional APIs such as JDBC, ODBC, Java and C++ native bindings, as well as Node.js, Go, Erlang, PHP, Python, etc., I think one of the more exciting next-generation features VoltDB offers is the ability to stream data directly into the database via our in-process Importers. VoltDB is a clustered database, meaning a database comprises one or more processes (usually one per machine, VM or container).

A database can be configured to have an “importer,” which is essentially a plug-in that listens to a source, reads incoming messages (events, perhaps) and transactionally processes them. If the VoltDB database is highly available, then the importer is highly available (surviving node failure).  VoltDB supports a Kafka Importer and a socket importer, as well as the ability to create your own custom importer.

Essentially this feature “eliminates the client application” and data can be transactionally streamed directly into VoltDB.  The data streamed can be JSON, CSV, TSV or any custom-defined format.  Further, the importer can choose which transactional behavior to apply to the incoming data.  This is how future applications will be designed: by hooking feeds, streams of data, directly to the database - eliminating much of the work of client application development.
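The shape of that design can be sketched in a few lines of Python; the names here (`run_importer`, `csv_formatter`, `record_event`) are invented for illustration and are not VoltDB APIs:

```python
import csv
import io

# Toy sketch of the importer idea; run_importer, csv_formatter and
# record_event are invented names, not VoltDB APIs.

def csv_formatter(message):
    # Custom formatter: decode one raw CSV message into a row.
    return next(csv.reader(io.StringIO(message)))

def record_event(db, row):
    # Stand-in for a transactional stored procedure applied per message.
    user, score = row[0], int(row[1])
    db[user] = db.get(user, 0) + score

def run_importer(feed, formatter, procedure, db):
    # The importer wires the feed straight to the database:
    # no client application in between.
    for message in feed:
        procedure(db, formatter(message))

db = {}
run_importer(["ana,10", "ben,5", "ana,3"], csv_formatter, record_event, db)
print(db)  # {'ana': 13, 'ben': 5}
```

The point of the sketch is the division of labor: the listener owns the feed, the formatter owns decoding, and the transactional procedure owns what happens to each message.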

We have one customer who has produced one of the top 10 games in the app store - their application streams in-game events into VoltDB at a rate upwards of 700,000 events per second.  VoltDB hosts a Marketing Optimization application that analyzes these in-game events in an effort to boost revenue.

If you had a crystal ball, how would you visualize the database landscape in 5 years from now? Major advancements?

  • Specialized databases will continue to carve out significant market share from established vendors.
  • IoT will be a major market, and will drive storage systems to support two activities: 1) machine learning (historical analysis) on the data lake/Big Data, where storage engines will focus on enabling data scientists to capture value from the vast increases in data, and 2) real-time processing of streams of data. Batch processing of data is no longer acceptable; real-time becomes a “must have”.

Data creation continues to accelerate and capturing value from fresh data in real-time is the new revenue frontier.

Finally, could you tell us a song that is an important part of the soundtrack of your life?

I’m a passionate Bruce Springsteen fan (and also a runner), so it would have to be “Born to Run”.

Springsteen captures that youthful angst so perfectly, challenging us to break out of historic norms and create and experience new things, to challenge ourselves.

This perfectly captures the entrepreneurial spirit of both the personal “self” and the “professional self,” and it matches the unbridled spirit of what we’re trying to accomplish with VoltDB: “Together we could break this trap / We'll run till we drop, baby we'll never go back.”

Here, There and Everywhere: Interview with Brian Wood on Teradata’s Cloud Strategy


(Image Courtesy of Teradata)
In a post about Teradata’s 2016 Partners event I wrote about the big effort Teradata is making to ensure its software offerings are available both on-premises and in the cloud, in a variety of forms and shapes, with a big push to ensure Teradata’s availability, especially for hybrid cloud configurations.

So, the data management and analytics software giant seems to be sticking to its promise by increasingly bringing its flagship Teradata Database and other solutions to the cloud, whether in the form of its own Managed Cloud for the Americas and Europe, as a private cloud-ready solution, or via public cloud providers such as AWS and, most recently announced, Microsoft’s Azure Marketplace.

To chat about this latest news and Teradata’s overall cloud strategy, we sat down with Teradata’s Brian Wood.

Brian Wood is director of cloud marketing at Teradata. He is a results-oriented technology marketing executive with 15+ years of digital, lead gen, sales / marketing operations & team leadership success.

Brian has an MS in Engineering Management from Stanford University, a BS in Electrical Engineering from Cornell University, and served as an F-14 Radar Intercept Officer in the US Navy.

Throughout 2016, and especially during its 2016 Partners conference, Teradata made it clear that it is undergoing an important transformation process, and a key part of that strategy is its path to the cloud. Offerings such as Teradata Database on different private and public cloud configurations, including AWS, VMware, Teradata Managed Cloud, and of course Microsoft Azure, are available now. Could you share some details about the progress of this strategy so far?

Thanks for asking, Jorge. It’s been a whirlwind because Teradata has advanced tremendously across all aspects of cloud deployment in the past few months; the progress has been rapid and substantial.

To be clear, hybrid cloud is central to Teradata’s strategy and it’s all about giving customers choice. One thing that’s unique to Teradata is that we offer the very same data and analytic software across all modes of deployment – whether managed cloud, public cloud, private cloud, or on-premises.

What this means to customers is that it’s easy for them to transfer data and workloads from one environment to another without hassle or loss of functionality; they can have all the features in any environment and dial it up or down as needed. Customers like this flexibility because nobody wants to be locked in, and it’s also helpful to be able to choose the right tool for the job without worrying about compatibility or consistency of results.

Specific cloud-related advancements in the last few months include:
  • Expanding Teradata Managed Cloud to now include both Americas and Europe
  • Increasing the scalability of Teradata Database on AWS up to 64 nodes
  • Launching Aster Analytics on AWS with support up to 33 nodes
  • Expanding Teradata Database on VMware scalability up to 32 virtual nodes
  • Bolstering our Consulting and Managed Services across all cloud options
  • And announcing upcoming availability of Teradata Database on Azure in Q1
These are just the ones that have been announced; there are many more in the pipeline queued up for release in the near future. Stay tuned!

The latest news is the availability of Teradata Database on Microsoft’s Azure Marketplace. Could you give us the details around the announcement?

We’re very excited about announcing Q1 availability for Teradata Database on Azure because many important Teradata customers have told us that Microsoft Azure is their preferred public cloud environment. We at Teradata are agnostic; whether AWS, Azure, VMware, or other future deployment options, we want what’s best for the customer and listen closely to their needs.

It all ties back to giving customers choice in how they consume Teradata, and offering the same set of capabilities across the board to make experimentation, switching, and augmentation as easy as possible.

Our offerings on Azure Marketplace will be very similar to what we offer on AWS Marketplace, including:
  • Teradata Database 15.10 (our latest version)
  • Teradata ecosystem software (including QueryGrid, Unity, Data Mover, Viewpoint, Ecosystem Manager, and more)
  • Teradata Aster Analytics for multi-genre advanced analytics
  • Teradata Consulting and Managed Services to help customers get the most value from their cloud investment
  • Azure Resource Manager Templates to facilitate the provisioning and configuration process and accelerate ecosystem deployment

What about configuration and licensing options for Teradata Database in Azure?

The configuration and licensing options for Teradata Database on Azure will be similar to what is available on AWS Marketplace. Customers use Azure Marketplace as the medium through which to find and subscribe to Teradata software; they are technically Azure customers but Teradata provides Premier Cloud Support as a bundled part of the software subscription price.

One small difference between what will be available on Azure Marketplace compared to what is now available on AWS Marketplace is subscription duration. Whereas on AWS Marketplace we currently offer both hourly and annual subscription options, on Azure Marketplace we will initially offer just an hourly option.

Most customers choose hourly for their testing phase anyway, so we expect this to be a non-issue. In Q2 we plan to introduce BYOL (Bring Your Own License) capability on both AWS Marketplace and Azure Marketplace which will enable us to create subscription durations of our choosing.

Can we expect technical and functional limitations from this version compared with the on-premises solution?

No, there are no technical or functional limitations of what is available from Teradata in the cloud versus on-premises. In fact, this is one of our key differentiators: customers consume the same best-in-class Teradata software regardless of deployment choice. As a result, customers can have confidence that their existing investment, infrastructure, training, integration, etc., is fully compatible from one environment to another.

One thing to note, of course, is that a node in one environment will likely have a different performance profile than what is experienced with a node in another environment. In other words, depending on the workload, a single node of our flagship Teradata IntelliFlex system may require up to six to ten instances or virtual machines in a public cloud environment to yield the same performance.

There are many variables that can affect performance – such as query complexity, concurrency, cores, I/O, internode bandwidth, and more – so mileage may vary according to the situation. This is why we always recommend a PoC (proof of concept) to determine what is needed to meet specific customer requirements.

Considering a hybrid cloud scenario. What can we expect in regards to the integration with the rest of the Teradata stack, especially on-premises?

Hybrid cloud is central to Teradata’s strategy; I cannot emphasize this enough. We define hybrid cloud as a customer environment consisting of a mix of managed, public, private, and on-premises resources orchestrated to work together.

We believe that customers should have choice and so we’ve made it easy to move data and workloads in between these deployment modes, all of which use the same Teradata software. As such, customers can fully leverage existing investments, including infrastructure, training, integration, etc. Nothing is stranded or wasted.

Hybrid deployment also introduces the potential for new and interesting use cases that were less economically attractive in an all-on-premises world. For example, three key hybrid cloud use cases we foresee are:
  • Cloud data labs – cloud-based sandboxes that tie back to on-premises systems
  • Cloud disaster recovery – cloud-based passive systems that are quickly brought to life only when needed
  • Cloud bursting – cloud-based augmentation of on-premises capacity to alleviate short-term periods of greater-than-usual utilization

How about migrating from existing Teradata deployments to Azure? What is the level of support Teradata and/or Azure will offer?

Teradata offers more than a dozen cloud-specific packages via our Consulting and Managed Services team to help customers get the most value from their Azure deployments in three main areas: Architecture, Implementation, and Management.

Specific to migration, we first always recommend that customers have a clear strategy and cloud architecture document prior to moving anything so that the plan and expectations are clear and realistic. We can facilitate such discussions and help surface assumptions about what may or may not be true in different deployment environments.

Once the strategy is set, our Consulting and Managed Services team is available to assist customers or completely own the migration process, including backups, transfer, validation, testing, and so on. This includes not only Teradata-to-Teradata migration (e.g., on-premises to the cloud), but also competitor-to-Teradata migrations as well. We especially love the latter ones!

Finally, can you share with us a bit of what is next for Teradata in the Cloud?

Wow, where should I start? We’re operating at breakneck pace. Seriously, we have many new cloud developments in the works right now, and we’ve been hiring cloud developers like crazy (hint: tell ‘em Brian sent you!).

You’ll see more cloud announcements from us this quarter, and without letting the cat out of the bag, expect advancements in the realm of automation, configuration assistance, and an expansion of managed offers.

Cloud is a key enabler to our ability to help customers get the most value from their data, so it’s definitely an exciting time to be involved in helping define the future of Teradata.
Thanks for your questions and interest!

Yep, I’m Writing a Book on Modern Data Management Platforms (2017-02 Update)


(Image courtesy of Thomas Skirde)
As I mentioned in my first blog post about the book, I'm now working hard to deliver a piece that will, hopefully, serve as a practical guide for the implementation of a successful modern data management platform.

I'll try to provide frequent updates and, perhaps, share some pains and gains from its development.
For now, here's some additional information, including the general outline and the type of audience it is intended for.

I invite you to be part of the process and leave your comments, observations, and words of encouragement right below, or better yet, to consider:
  • Participating in our Data Management Platforms survey (to obtain a nice discount right off the bat)
  • Pre-ordering the book: soon I’ll provide details on how to pre-order your copy, but in the meantime you can show your interest by signing up to our pre-order list, or
  • Providing us with information about your own successful enterprise use case, which we may use in the book
Needless to say, the information you provide will be kept confidential and used only for the purpose of developing this book.
So here, take a look at the update...

New Data Management Platforms

Discovering Architecture Blueprints

About the Book

What Is This Book About?

This book is the result of a comprehensive study into the improvement, expansion, and modernization of different types of architectures, solutions, and platforms to address the need for better and more effective ways of dealing with increasing and more complex volumes of data.

In conducting his research for the book, the author has made every effort to analyze in detail a number of successful modern data management deployments as well as the different types of solutions proposed by software providers, with the aim of providing guidance and establishing practical blueprints for the adoption and/or modernization of existing data management platforms.
These new platforms have the capability of expanding the ability of enterprises to manage new data sources—from ingestion to exposure—more accurately and efficiently, and with increased speed.

The book is the result of extensive research conducted by the author examining a wide number of real-world, modern data management use cases and the plethora of software solutions offered by various software providers that have been deployed to address them. Taking a software vendor‒agnostic viewpoint, the book analyzes what companies in different business areas and industries have done to achieve success in this endeavor, and infers general architecture footprints that may be useful to those enterprises looking to deploy a new data management platform or improve an already existing one.

Who Is This Book For?

This book is intended for both business and technical professionals in the area of information technology (IT). These roles would include chief information officers (CIOs), chief technology officers (CTOs), chief financial officers (CFOs), data architects, and data management specialists interested in learning, evaluating, or implementing any of the plethora of new technologies at their disposal for modernizing their existing data management frameworks.

The book is also intended for students in the fields of computer sciences and informatics interested in learning about new trends and technologies for deploying data architecture platforms. It is not only relevant for those individuals considering pursuing a big data/data management‒related career, but also for those looking to enrich their analytics/data sciences skills with information about new platform technologies.
This book is also relevant for:

  • Professionals in the IT market who would like to enrich their knowledge and stay abreast of developments in information management.
  • Entrepreneurs who would like to launch a data management platform start-up or consultancy, enhancing their understanding of the market, learning about some start-up ideas and services for consultants, and gaining sample business proposals.
  • Executives looking to assess the value and opportunities of deploying and/or improving their data management platforms. 
  • Finally, the book can also be used by a general audience from both the IT and business areas to learn about the current data management landscape and technologies in order to acquire an informed opinion about how to use these technologies for deploying modern technology data management platforms. 

What Does This Book Cover? 

The book covers a wide variety of topics, from a general exploration of the data management landscape to a more detailed review of specific areas, including the following:

  • The evolution of data management
  • A comprehensive introduction to Big Data, NoSQL, and analytics databases 
  • The emergence of new technologies for faster data processing—such as in-memory databases, data streaming, and real-time technologies—and their role in the new data management landscape
  • The evolution of the data warehouse and its new role within modern data management solutions 
  • New approaches to data management, such as data lakes, enterprise data hubs, and alternative solutions 
  • A review of the data integration issue: new components, approaches, and solutions
  • A detailed review of real-world use cases, and a suggested approach to finding the right deployment blueprint 

How Is the Book Structured?

The book is divided into four comprehensive parts that offer a historical perspective, the groundwork for the development of modern data management platforms and associated concepts, and an analysis of real-world modern data management frameworks toward the establishment of potential deployment blueprints.

  • Part I. A brief history of diverse data management platform architectures, and how their evolution has set the stage for the emergence of new data management technologies. 
  • Part II. The need for and emergence of new data management technologies such as Big Data, NoSQL, data streaming, and real-time systems in reshaping existing data management infrastructures. 
  • Part III. An in-depth exploration of these new technologies and their interaction with existing technologies to reshape and create new data management infrastructures. 
  • Part IV. A study of real-world modern data management infrastructures, along with a proposal of a concrete and plausible blueprint. 

General Outline

The following is a general outline of the book:

<Table of Contents>
Preface
Acknowledgments
Prologue
Introduction
Part I. Brief History of Data Management Platform Architectures 
          Chapter 1. The Never-Ending Need to Manage Data
          Chapter 2. The Evolution of Structured Data Repositories
          Chapter 3. The Evolution of the Data Warehouse as the Main Data Management Platform
Part II. The Need for and Emergence of New Data Management Technologies 
          Chapter 4. Big Data: A Primer
          Chapter 5. NoSQL: A Primer
          Chapter 6. Need for Speed 1: The Emergence of In-Memory Technologies
          Chapter 7. Need for Speed 2: Events, Streams, and the Real-Time Paradigm
          Chapter 8. The Role of New Technologies in Reshaping the Analytics and Business Intelligence Space
Part III. New Data Management Platforms: A First Exploration 
          Chapter 9. The Data Warehouse, Expanded and Improved
          Chapter 10. Data Lakes: Concept and Approach
          Chapter 11. Data Hub: Concept and Approach
          Chapter 12. Data Lake vs. Data Hub: Key Differences and Considerations
          Chapter 13. Analysis of Alternative Solutions
          Chapter 14. Considerations on Data Ingestion, Integration, and Consolidation
Part IV. Studying Plausible New Data Management Platforms 
          Chapter 15. Methodology
          Chapter 16. Data Lakes
               Sub-Chapter 16.1. Analyzing three real-world use cases
               Sub-Chapter 16.2. Proposing a feasible blueprint
          Chapter 17. Data Hubs
               Sub-Chapter 17.1. Analyzing three real-world use cases
               Sub-Chapter 17.2. Proposing a feasible blueprint
          Chapter 18. Summary and Conclusion
Appendix A. The Cloud Factor: Data Management Platforms in the Cloud
Appendix B. Brief Intro into Analytics and Business Intelligence with Big Data
Appendix C. Brief Intro into Virtualization and Data Integration
Appendix D. Brief Intro into the Role of Data Governance in Big Data & Modern Data Management Strategies

Main Post Image courtesy of Thomas Skirde 

Intelligent Automation for DevOps: An Interview with Rocana’s CTO & Co-Founder Eric Sammer


Recently, Rocana, a big data and analytics software company specialized in developing solutions that bring visibility to IT and DevOps teams, announced a new release of its data platform, Rocana Ops.

It is in this context that we had the chance to interview Eric Sammer, CTO and co-founder of Rocana, who kindly agreed to share insights into the company and its software offering, as well as details of the new version.

Eric has served as a senior engineer and architect at several large-scale, data-driven organizations, including Experian and Conductor. Most recently, he served as an engineering manager at Cloudera, where he was responsible for working with hundreds of partners to develop robust solutions and integrate them tightly with Cloudera's Enterprise Data Hub.

He is deeply entrenched in the open source community and has an ambition for solving difficult scaling and processing problems. Passionate about challenging assumptions and showing large, complex enterprises new ways to solve large, complex IT infrastructure challenges, Eric now leads Rocana’s product development and company direction as CTO.

Eric is also the author of Hadoop Operations published by O'Reilly Media and is also a frequent speaker on technology and techniques for large scale data processing, integration, and system management.

Hi Eric, so, what was the motivation behind founding Rocana, the company, and developing Rocana Ops the product?

Rocana was founded directly in response to the growing sophistication of the infrastructure and technology that run the modern business, and the challenges companies have in understanding those systems. Whether it’s visibility into health and performance, investigating specific issues, or holistically understanding the impact infrastructure health and well-being have on the business, many businesses are struggling with the complexity of their environments.

These issues have been exacerbated by trends in cloud computing, hybrid environments, microservices, and data-driven products and features - such as product recommendations, real-time inventory visibility, and customer account self-management - that rely on data from, and about, the infrastructure and the business. There are a greater number of more varied data sources, producing finer-grained data faster than ever before.

Meanwhile, the existing solutions to understand and manage these environments are not keeping pace. All of them focus on interesting, but limited, slices of the problem - just log search, just dashboards of metrics, just the last 30 minutes of network flow data, only security events - making it almost impossible to understand what’s happening. These tools tend to think of each piece of infrastructure as a special case rather than the data warehousing and advanced analytics problem it is.

Outside of core IT, it’s natural to source feeds of data from many different places, cleanse and normalize that data, and bring it into a central governed repository where it can be analyzed, visualized, or used to augment other applications.

We want to extend that thinking into infrastructure, network, cloud, database, platform, and application management to better run the business, while at the same time, opening up new opportunities to bring operational and business data together. That means all of data, from every data source, in real time, with full retention, on an open platform, with advanced analytics to make sense of that data.

How would you describe what Rocana Ops is?

Rocana Ops is a data warehouse for event-oriented data. That includes log events, infrastructure and application metrics, business transactions, IoT events, security events, or anything else with a time stamp. It includes the collection, transformation and normalization, storage, query, analytics, visualization, and management of all event-oriented data in a single open system that scales horizontally on cost-effective hardware or cloud platforms.

A normal deployment of Rocana Ops for our customers will take in anywhere from 10 to 100 TB of new data every day, retaining it for years. Each event captured by the system is typically available for query in less than one second, and is always online and queryable thanks to a fully parallelized storage and query platform.

Rocana is placed in a very interesting segment of the IT industry. What are, in your view, the differences between the common business analytics user and the IT user regarding the use of a data management and analytics solution? Different needs? Different mindsets? Goals?

I think the first thing to consider when talking about business analytics - meaning both custom-built and off-the-shelf BI suites - and IT focused solutions is that there has historically been very little cross-pollination of ideas between them. Business users tend to think about customized views on top of shared repositories, and building data pipelines to feed those repositories.

There tends to be a focus on reusing data assets and pipelines, lineage concerns, governance, and lifecycle management. IT users on the other hand, think about collection through analytics for each data source as a silo: network performance, application logs, host and process-level performance, and so on each have dedicated collection, storage, and analytics glued together in a tightly coupled package.

Unlike their business counterparts, IT users have very well known data sources and formats (relatively speaking) and analytics they want to perform. So in some ways, IT analytics have a more constrained problem space, but less integration. This is Conway’s Law in serious effect: the notion that software tends to mimic the organizational structures in which it’s developed or designed. These silos lead to target fixation.

IT users can wind up focusing on making sure the operating system is healthy, for example, while the business service it supports is unhealthy. Many tools tend to reinforce that kind of thinking. That extends to diagnostics and troubleshooting which is even worse. Again, we’re talking in generic terms here, but the business users tend to have a holistic focus on an issue relevant to the business rather than limited slices.

We want to open that visibility to the IT side of the house, and hopefully even bring those worlds together.

What are the major pains of IT Ops, and how does Rocana help solve them?

Ops is really a combination of both horizontal and vertically focused groups. Some teams are tasked with building and/or running a complete vertical service like an airline check-in and boarding pass management system. Other teams are focused on providing horizontal services such as data center infrastructure with limited knowledge or visibility into what those tens of thousands of boxes do.

Let’s say customers can’t check in and get their boarding passes on their mobile devices. The application ops team finds that a subset of application servers keeps losing connections to the database servers holding reservations, but there’s no clear reason why, and nothing has changed. Meanwhile, the networking team may be swapping out some bad optics in a switch that has been flaky, thinking that traffic is being properly routed over another link. Connecting these two dots within a large organization can be maddeningly time consuming - if it even happens at all - leading to some of the high-profile outages we see in the news.

Our focus is really on providing a shared view over all systems under management. Each team still has their focused view on their part of the infrastructure in Rocana Ops, but in this example, the application ops team could also trace the failing connections through to link state changes on switches and correlate that with traffic changes in network flow patterns.

Could you describe Rocana’s main architecture?

Following the data flow through Rocana Ops, data is first collected by one of the included data collection methods. These include native syslog, file and directory tailing, netflow and IPFIX, Windows event log, application and host metrics collection, and native APIs for popular programming languages, as well as REST.

As data is collected, basic parsing is performed turning all data into semi-structured events that can be easily correlated regardless of their source. These events flow into an event data bus forming a real-time stream of the cleansed, normalized events. All of the customer-configurable and extensible transformation, model building and application (for features like anomaly detection), complex event processing, triggering, alerting, and other data services are real time stream-oriented services.
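As a rough illustration of that parsing step, and not Rocana’s actual code, a sketch of turning a raw syslog-style line into a normalized, semi-structured event might look like this in Python (the field names and regex here are invented for the example):

```python
import re
from datetime import datetime, timezone

# Hypothetical sketch of the normalization step described above: turn a raw
# syslog-style line into a semi-structured event that can be correlated with
# events from other sources regardless of format.
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<app>[\w\-/]+)(\[(?P<pid>\d+)\])?:\s(?P<msg>.*)$"
)

def normalize(raw_line, source="syslog", year=2017):
    """Parse a raw log line into a normalized event dict."""
    m = SYSLOG_RE.match(raw_line)
    if not m:
        # Unparseable lines are still kept as events so no data is lost.
        return {"source": source, "message": raw_line, "parsed": False}
    # Classic syslog timestamps omit the year, so one must be supplied.
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
    return {
        "source": source,
        "timestamp": ts.replace(tzinfo=timezone.utc).isoformat(),
        "host": m.group("host"),
        "service": m.group("app"),
        "message": m.group("msg"),
        "parsed": True,
    }

event = normalize("Feb  3 14:22:01 app-07 nginx[311]: upstream timed out")
```

Once every source is reduced to a common event shape like this, downstream services (anomaly detection, triggering, alerting) can operate on one stream instead of dozens of formats.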

Rocana's General Architecture (Courtesy of Rocana)

A number of representations of the data are stored in highly optimized data systems for natural language search, query, analysis, and visualization in the Rocana Ops application. Under the hood, Rocana Ops is built on top of a number of popular open source systems, in open formats, that may be used for other applications and systems making lock-in a non-issue for customers.

Every part of Rocana’s architecture - but notably the collection, processing, storage, and query systems - is a parallelized, scale-out system, with no single point of failure.

What are the basic or general requirements needed for a typical Rocana deployment?

Rocana Ops is really designed for large deployments as mentioned earlier - 10s to 100s of terabytes per day.

Typically customers start with a half-rack (10) of 2 x 8+ core CPUs, 12 x 4 or 8TB SATA II drives, 128 to 256GB RAM, and a 10Gb network (typical models are the HP DL380 G9 or Dell R730xd) or the cloud-equivalent (Amazon d2.4xl or 8xl) for the data warehouse nodes.

A deployment this size easily handles in excess of a few terabytes per day of data coming into the system from tens to hundreds of thousands of sources.

As customers onboard more data sources or want to retain more data, they begin adding nodes to the system. We have a stellar customer success team that helps customers plan, deploy, and service Rocana Ops, so customers don’t need to worry about finding “unicorn” staff.

What, then, are the key functional differentiators of Rocana?

Customers pick Rocana for a few reasons: scale, openness, advanced data management features, and cost. We’ve talked a lot about scale already, but openness is equally critical.

Enterprises, frankly, are done with being locked into proprietary formats and vendors holding their data hostage. Once you’re collecting all of this data in one place, customers often want to use Rocana Ops to provide real time streams to other systems without going through expensive translations or extractions.

Another major draw is the absence of advanced data management features in other systems, such as record-level role-based access control, data lifecycle management, encryption, and auditing facilities. When your log events potentially contain personally identifiable information (PII) or other sensitive data, this is critical.

Finally, operating at scale is both a technology and economic issue. Rocana Ops’ licensing model is based on users rather than nodes or data captured by the system freeing customers to think about how best to solve problems rather than perform license math.

Recently, you've released Rocana Ops 2.0. Could you talk about this release’s new capabilities?

Rocana Ops 2.0 is really exciting for us.

We’ve added Rocana Reflex, which incorporates complex event processing and orchestration features allowing customers to perform actions in response to patterns in the data. Actions can be almost anything you can think of including REST API calls to services and sending alerts.

Reflex is paired with a first responder experience designed to help ops teams to quickly triage alerts and anomalies, understand potential causes, collaborate with one another, and spot patterns in the data.
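To illustrate the general idea of triggering actions when patterns appear in a stream, here is a hypothetical Python sketch; this is not Rocana Reflex’s actual API, and every name in it is invented for the example:

```python
from collections import deque

# Illustrative complex-event-processing rule: fire an action when a matching
# event repeats `threshold` times within a sliding `window` of seconds.
class ThresholdRule:
    def __init__(self, predicate, threshold, window, action):
        self.predicate = predicate    # which events count as a "hit"
        self.threshold = threshold    # how many hits trigger the action
        self.window = window          # sliding window length, in seconds
        self.action = action          # what to do when the rule fires
        self.hits = deque()

    def observe(self, event):
        if not self.predicate(event):
            return
        self.hits.append(event["time"])
        # Discard hits that have fallen outside the sliding window.
        while self.hits and event["time"] - self.hits[0] > self.window:
            self.hits.popleft()
        if len(self.hits) >= self.threshold:
            self.action(event)

alerts = []
rule = ThresholdRule(
    predicate=lambda e: e["type"] == "db_conn_lost",
    threshold=3,
    window=60,
    action=lambda e: alerts.append(f"alert: repeated db_conn_lost on {e['host']}"),
)
# Three lost-connection events within a minute trip the rule.
for t in (0, 10, 20):
    rule.observe({"type": "db_conn_lost", "host": "app-07", "time": t})
```

In a real system the action would be a REST call or a page to an on-call engineer rather than an in-memory list append, but the shape of the rule is the same.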

One of the major challenges customers face in deploying dynamic next-generation platforms is operational support, so 2.0 includes first-class support for Pivotal CloudFoundry instrumentation and visibility. Those are just a small sample of what we’ve done. It’s really a huge release!

How does Rocana interact with the open source community, especially the Apache Hadoop project?

Open source is core to what we do at Rocana, and it’s one of the reasons we’re able to do a lot of what we do in Rocana Ops.

We’re committed to collaborating with the community whenever possible. We’ve open sourced parts of Rocana Ops where we believe there’s a benefit to the community (like Osso - A modern standard for event-oriented data). As we build with projects like Apache Hadoop, Kafka, Spark, Impala, and Lucene, we look closely at places where we can contribute features, insight, feedback, testing, and (most often) fixes.

The vast majority of our engineers, customer success, and sales engineers come from an open source background, so we know how to wear multiple hats.

Foremost is always our customers’ success, but it’s absolutely critical to help advance the community along where we are uniquely positioned to help. This is an exciting space for us, and I think you’ll see us doing some interesting work with the community in the future.

Finally, what is in your opinion the best and geekiest song ever?

Now you’re speaking my language; I studied music theory.
Lateralus by Tool, for the way it plays with the Fibonacci sequence and other math without being gimmicky or unnatural.
A close second goes to Aphex Twin’s Equation, but I won’t ruin that for you.

DrivenBI Helps Companies Drive Analytics to the Next Level


Privately held company DrivenBI was formed in 2006 by a group of seasoned experts and investors in the business intelligence (BI) market in Taiwan and the United States. Currently based in Pasadena, California, the company has been steadily growing in the ten years since, gaining more than 400 customers in both the English and Chinese markets.

Led by founder and CEO Ben Tai (previously VP of global services with the former BusinessObjects, now part of SAP), DrivenBI would be considered part of what I call a new generation of BI and analytics solutions that is changing the analytics market panorama, especially in the realm of cloud computing.

A couple of weeks ago, I had the opportunity to speak with DrivenBI’s team and to have a briefing and demonstration, most of it in regards to their current analytics offerings and the company’s business strategy and industry perspective, all of which I will share with you here.

How DrivenBI Drives BI
DrivenBI’s portfolio is anchored by SRK, DrivenBI’s native cloud self-service BI platform and collaboration hub.

SRK provides a foundation for sourcing and collecting data in real time within a collaborative environment. Being a cloud platform, SRK can combine the benefits of a reduced IT footprint with a wide range of capabilities for efficient data management.

The SRK native cloud-centralized self-service BI solution offers many features, including:
  • the ability to blend and work with structured and unstructured data using industry-standard data formats and protocols;
  • a centralized control architecture providing security and data consistency across the platform;
  • a set of collaboration features to encourage team communication and speed decision making; and
  • agile reporting and well-established data processing logic.
SRK’s collaborative environment featuring data and information sharing between users within a centralized setting allows users to maintain control over every aspect and step of the BI and analytics process (figure 1).

Figure 1. DrivenBI’s SRK self-driven and collaborative platform (courtesy of DrivenBI)
DrivenBI: Driving Value throughout Industries, Lines of Business, and Business Roles

One important aspect of the philosophy embraced by DrivenBI has to do with its design approach, providing, within the same platform, valuable services across the multiple functional areas of an organization, including lines of business such as finance and marketing, inventory control, and resource management, as well as across industries such as fashion, gaming, e-commerce, and insurance.

Another element that makes DrivenBI an appealing offering is its strategic partnerships, notably with Microsoft Azure, and its ability to integrate with such powerhouse cloud offerings.

I had the opportunity to play around a bit with DrivenBI’s platform, and I was impressed with the ease of use and intuitive experience in all stages of the data analytics process, especially for dynamic reporting and dashboard creation (figure 2).

Figure 2. DrivenBI’s SRK dashboard (courtesy of DrivenBI)
Other relevant benefits of the DrivenBI platform that I observed include:
  • elimination/automation of some heavy manual processes;
  • analysis and collaboration capabilities, particularly relevant for companies with organizational and geographically distributed operations, such as widespread locations, plants, and global customers;
  • support for multiple system data sources, including structured operational data, unstructured social media sources, and others.
As showcased in its business-centered approach and design, DrivenBI is one of a new generation of BI and analytics offerings that reduce the need for IT intervention compared with peer solutions such as Domo, Tableau, and GoodData. These new-generation solutions are offered through cloud delivery, a method that seems to suit analytics and BI offerings and their holistic take on data collection well. Beyond replacing expensive IT-centric BI tools, the DrivenBI cloud platform can replace or minimize the use of complex spreadsheets and unwieldy analytics processes.

DrivenBI’s Agile Analytics
My experience with DrivenBI was far more than “interesting.” DrivenBI is a BI software solution that is well designed and built, intuitive, and offers a fast learning curve. Its well-made architecture makes the solution easy to use and versatile. Its approach—no spreadsheets, no programming, no data warehouse—is well-suited to those organizations that truly need agile analytics solutions. Still, I wonder how this approach fits with large BI deployments that require robust data services, especially in the realms of merging traditional analytics with big data and Internet of Things (IoT) strategies.

To sample what DrivenBI has to offer, I recommend checking out its SRK demo:

(Originally published on TEC's Blog)

Yep, I’m Writing a Book on Modern Data Management Platforms



Over the past couple of years, I have spent lots of time talking with vendors, users, consultants, and other analysts, as well as plenty of people from the data management community, about the wave of new technologies and continued efforts aimed at finding the best software solutions to address the growing number of issues associated with managing enterprise data. In this way, I have gathered much insight on ways to exploit the potential value of enterprise data through efficient analysis for the purpose of “gathering important knowledge that informs better decisions.”

Many enterprises have had much success in deriving value from data analysis, but a more significant number of these efforts have failed to achieve much, if anything, in the way of useful results. Yet other users are still struggling to find the right software solution for their business data analysis needs, perhaps confused by the myriad of solutions emerging nearly every single day.

It is precisely in this context that I’ve decided to launch this new endeavor and write a book that offers a practical perspective on those new data platform deployments that have been successful, as well as practical use cases and plausible design blueprints for your organization or data management project. The information, insight, and guidance that I will provide is based on lessons I’ve learned through research projects and other efforts examining robust and solid data management platform solutions for many organizations.

In the following months, I will be working hard to deliver a book that serves as a practical guide for the implementation of a successful modern data management platform.
The resources for this project will require crowdfunding efforts, and here is where your collaboration will be extremely valuable.
There are several ways in which you can participate:

  • Participating in our Data Management Platforms survey (to obtain a nice discount right off the bat)
  • Pre-ordering the book (soon, I’ll provide you with details on how to pre-order your copy, but in the meantime, you can show your interest by signing up at the link below)
  • Providing us with information about your own successful enterprise use case, which we may use in the book

To let us know which of these options best fits with your spirit of collaboration, and to receive the latest updates on this book, as well as other interesting news, you just need to sign up to our email list here. Needless to say, the information you provide will be kept confidential and used only for the purpose of developing this book.

In the meantime, I’d like to leave you with a brief synopsis of the contents of this book, with more details to come in the near future:

New Data Management Platforms

Discovering Architecture Blueprints

About the Book

What Is This Book About?

This book is the result of a comprehensive study into the improvement, expansion, and modernization of different types of architectures, solutions, and platforms to address the need for better and more effective ways of dealing with increasing and more complex volumes of data.

In conducting his research for the book, the author has made every effort to analyze in detail a number of successful modern data management deployments as well as the different types of solutions proposed by software providers, with the aim of providing guidance and establishing practical blueprints for the adoption and/or modernization of existing data management platforms.
These new platforms have the capability of expanding the ability of enterprises to manage new data sources—from ingestion to exposure—more accurately and efficiently, and with increased speed.

The book is the result of extensive research conducted by the author examining a wide number of real-world, modern data management use cases and the plethora of software solutions offered by various software providers that have been deployed to address them. Taking a software vendor‒agnostic viewpoint, the book analyzes what companies in different business areas and industries have done to achieve success in this endeavor, and infers general architecture footprints that may be useful to those enterprises looking to deploy a new data management platform or improve an already existing one.

Who Is This Book For?

This book is intended for both business and technical professionals in the area of information technology (IT). These roles would include chief information officers (CIOs), chief technology officers (CTOs), chief financial officers (CFOs), data architects, and data management specialists interested in learning, evaluating, or implementing any of the plethora of new technologies at their disposal for modernizing their existing data management frameworks.

The book is also intended for students in the fields of computer sciences and informatics interested in learning about new trends and technologies for deploying data architecture platforms. It is not only relevant for those individuals considering pursuing a big data/data management‒related career, but also for those looking to enrich their analytics/data sciences skills with information about new platform technologies.
This book is also relevant for:

  • Professionals in the IT market who would like to enrich their knowledge and stay abreast of developments in information management.
  • Entrepreneurs who would like to launch a data management platform start-up or consultancy, enhancing their understanding of the market, learning about some start-up ideas and services for consultants, and gaining sample business proposals.
  • Executives looking to assess the value and opportunities of deploying and/or improving their data management platforms. 
  • Finally, the book can also be used by a general audience from both the IT and business areas to learn about the current data management landscape and technologies in order to acquire an informed opinion about how to use these technologies for deploying modern technology data management platforms. 

What Does This Book Cover? 

The book covers a wide variety of topics, ranging from a general exploration of the data management landscape to a more detailed review of specific topics, including the following:

  • The evolution of data management
  • A comprehensive introduction to Big Data, NoSQL, and analytics databases 
  • The emergence of new technologies for faster data processing—such as in-memory databases, data streaming, and real-time technologies—and their role in the new data management landscape
  • The evolution of the data warehouse and its new role within modern data management solutions 
  • New approaches to data management, such as data lakes, enterprise data hubs, and alternative solutions 
  • A revision of the data integration issue—new components, approaches, and solutions 
  • A detailed review of real-world use cases, and a suggested approach to finding the right deployment blueprint 

How Is the Book Structured?

The book is divided into four comprehensive parts that offer a historical perspective, lay the groundwork for the development of data management platforms and associated concepts, and analyze real-world cases of modern data management frameworks toward establishing potential deployment blueprints.

  • Part I. A brief history of diverse data management platform architectures, and how their evolution has set the stage for the emergence of new data management technologies. 
  • Part II. The need for and emergence of new data management technologies such as Big Data, NoSQL, data streaming, and real-time systems in reshaping existing data management infrastructures. 
  • Part III. An in-depth exploration of these new technologies and their interaction with existing technologies to reshape and create new data management infrastructures. 
  • Part IV. A study of real-world modern data management infrastructures, along with a proposal of a concrete and plausible blueprint. 

General Outline

The following is a general outline of the book:

Table of Contents
Preface
Acknowledgments
Prologue
Introduction
Part I. Brief History of Data Management Platform Architectures 
          Chapter 1. The Never-Ending Need to Manage Data
          Chapter 2. The Evolution of Structured Data Repositories
          Chapter 3. The Evolution of the Data Warehouse as the Main Data Management Platform
Part II. The Need for and Emergence of New Data Management Technologies 
          Chapter 4. Big Data: A Primer
          Chapter 5. NoSQL: A Primer
          Chapter 6. Need for Speed 1: The Emergence of In-Memory Technologies
          Chapter 7: Need for Speed 2: Events, Streams, and the Real-Time Paradigm
          Chapter 8. The Role of New Technologies in Reshaping the Analytics and Business Intelligence Space
Part III. New Data Management Platforms: A First Exploration 
          Chapter 9. The Data Warehouse, Expanded and Improved
          Chapter 10. Data Lakes: Concept and Approach
          Chapter 11. Data Hub: Concept and Approach
          Chapter 12. Data Lake vs. Data Hub: Key Differences and Considerations
          Chapter 13. Analysis of Alternative Solutions
          Chapter 14. Considerations on Data Ingestion, Integration, and Consolidation
Part IV. Studying Plausible New Data Management Platforms 
          Chapter 15. Methodology
          Chapter 16. Data Lakes
               Sub-Chapter 16.1. Analyzing three real-world use cases
               Sub-Chapter 16.2. Proposing a feasible blueprint
          Chapter 17. Data Hubs
               Sub-Chapter 17.1. Analyzing three real-world use cases
               Sub-Chapter 17.2. Proposing a feasible blueprint
          Chapter 18. Summary and Conclusions
Appendix A. The Cloud Factor: Data Management Platforms in the Cloud
Appendix B. Brief Intro into Analytics and Business Intelligence with Big Data
Appendix C. Brief Intro into Virtualization and Data Integration
Appendix D. Brief Intro into the Role of Data Governance in Big Data & Modern Data Management Strategies

About the Author 
Jorge Garcia is an industry analyst in the areas of business intelligence (BI) and data management. He’s currently a principal analyst with Technology Evaluation Centers (TEC).

His experience spans more than 25 years across all phases of application development and of database, data warehouse (DWH), and analytics/BI solution design, including more than 15 years in project management, covering best practices and new technologies in the BI/DWH space.

Prior to joining TEC, he was a senior project manager and senior analyst developing BI, DWH, and data integration applications using solutions from Oracle, SAP, Informatica, IBM, and Teradata, among others. Garcia has also worked on projects involving the implementation of data management solutions for the private and public sectors, including banking, insurance, retail, and services.

A proud member of the Boulder BI Brain Trust, Garcia also makes frequent public speaking appearances, and is an educator and influencer on various topics related to data management.

When not busy researching, speaking, consulting, and mingling with people in this industry, Garcia finds solace as an avid reader, music lover, and soccer fan, as well as proud father "trying" to raise his three lovely kids while his wife tries to re-raise him.

Disrupting the data market: Interview with EXASOL’s CEO Aaron Auld


Processing data fast and efficiently has become a never-ending race. Companies’ increasing appetite for data brings with it a never-ending “need for speed” in data processing and, consequently, the emergence of a new generation of database software solutions built to fulfill this need for high-performance data processing.

These new database management systems incorporate novel technology to provide high-speed, more efficient access to and processing of large volumes of data.

EXASOL is one of these disruptive "new" database solutions. Headquartered in Nuremberg, Germany, with offices around the globe, EXASOL has worked hard to bring a fresh, new approach to the data analytics market by offering a world-class database solution.

In this interview, we took the opportunity to chat with EXASOL’s Aaron Auld about the company and its innovative database solution.

Aaron Auld is the Chief Executive Officer as well as the Chairman of the Board at EXASOL, positions he has held since July 2013. He was made a board member in 2009.

As CEO and Chairman, Aaron is responsible for the strategic direction and execution of the company, as well as growing the business internationally.

Aaron embarked on his career back in 1996 at MAN Technologie AG, where he worked on large industrial projects and M&A transactions in the aerospace sector. Subsequently, he worked for the law firm Eckner-Bähr & Colleagues in the field of corporate law.

After that, the native Brit joined Océ Printing Systems GmbH as legal counsel for sales, software, R&D and IT. He then moved to Océ Holding Germany and took over the global software business as head of corporate counsel. Aaron was also involved in the IPO (Prime Standard) of Primion Technology AG in a legal capacity, and led investment management and investor relations.

Aaron studied law at the Universities of Munich and St. Gallen. Passionate about nature, Aaron likes nothing more than to relax by walking or sailing and is interested in politics and history.

So, what is EXASOL and what is the story behind it?

EXASOL is a technology vendor that develops a high-performance in-memory analytic database that was built from the ground up to analyze large volumes of data extremely fast and with a high degree of flexibility.
The company was founded back in the early 2000s in Nuremberg, Germany, and went to market with the first version of the analytic database in 2008.

Now in its sixth generation, EXASOL continues to develop and market the in-memory analytic database working with organizations across the globe to help them derive business insight from their data that helps them to drive their businesses forward.

How does the database work? Could you tell us some of the main features?

We have always focused on delivering an analytic database with ultra-fast, massively scalable analytic performance. The database combines in-memory, columnar storage, and massively parallel processing technologies to provide unrivaled performance, flexibility, and scalability.

The database is tuning-free and therefore helps to reduce the total cost of ownership while enabling users to solve analytical tasks instead of having to cope with technical limits and constraints.

With the recently-announced version 6, the database now offers a data virtualization and data integration framework which allows users to connect to more data sources than ever before.

Also, alongside out-of-the-box support for R, Lua, Python and Java, users can integrate the analytics programming language of their choice and use it for in-database analytics.

Especially today, speed of data processing is important. I’ve read that EXASOL has taken part in some benchmarks in this regard. Could you tell us more about that?

One of the few truly independent sets of benchmark tests available is offered by the Transaction Processing Performance Council (TPC). A few years ago we decided to take part in the TPC-H benchmark, and ever since we have topped the tables not only in terms of performance (i.e., analytic speed) but also in terms of price/performance (i.e., cost aligned with speed) when analyzing data volumes ranging from 100GB right up to 100TB. No other database vendor comes close.
The information is available online here.

One of the features of EXASOL, if I’m not mistaken, is that it is deployed on commodity hardware. How does EXASOL’s design guarantee optimal performance and reliability?

Offering flexible deployment models in terms of how businesses can benefit from EXASOL has always been important to us at EXASOL.

Years ago, the concept of the data warehouse appliance was talked about as the optimum deployment model, but in most cases it meant that vendors were forcing users to run their database on bespoke hardware that could not then be repurposed for any other task. Things have changed since: while the appliance model is still offered, ours is, and always has been, one that uses commodity hardware.

Of course, users are free to download our software and install it on their own hardware too.
It all makes for a more open and transparent framework with no vendor lock-in, and for users that can only be a good thing. What’s more, because the hardware and chip vendors are always innovating, when a new processor or server is released, users only stand to benefit, as they will see even faster performance when they run EXASOL on that new technology.
We recently discussed this in a promotional video for Intel.

Regarding price point, is it intended only for large organizations, or also for small and medium ones that need fast data processing?

We work with organizations both large and small.  The common denominator is always that they have an issue with their data analytics or incumbent database technology and that they just cannot get answers to their analytic queries fast enough.

Price-wise, our analytic database is extremely competitively priced, and we allow organizations of all shapes and sizes to use our database software on terms that best fit their own requirements, be that a perpetual license model, a subscription model, or a bring-your-own-license (BYOL) model, whether on-premises or in the cloud.

What would be a minimal configuration example? Server, user licensing etc.?

Users can get started today with the EXASOL Free Small Business Edition.  It is a single-node only edition of the database software and users can pin up to 200GB of data into RAM.

Given that we advocate a 1:10 ratio of RAM vs raw data volume, this means that users can put 2TB of raw data into their EXASOL database instance and still get unrivaled analytic performance on their data – all for free. There are no limitations in terms of users.

We believe this is a very compelling advantage for businesses that want to get started with EXASOL.

Later, when data volumes grow and when businesses want to make use of advanced features such as in-database analytics or data virtualization, users can then upgrade to the EXASOL Enterprise Cluster Edition which offers much more in terms of functionality.

Regarding big data requirements, could you tell us some of the possibilities to integrate or connect EXASOL with big data sources/repositories such as Hadoop and others?

EXASOL can be easily integrated into any IT infrastructure. It is SQL-compliant, is compatible with leading BI and ETL products such as Tableau, MicroStrategy, Birst, IBM Cognos, SAP BusinessObjects, Alteryx, Informatica, Talend, Looker, and Pentaho, and provides the most flexible Hadoop connector on the market.

Furthermore, through an extensive data virtualization and integration framework, users can now analyze data from more sources more easily and faster than ever before.

Recently, the company announced that EXASOL is now available on Amazon. Could you tell us a bit more about the news? EXASOL is also available on Azure, right?

As more and more organizations deploy their applications and systems in the cloud, it’s important that we allow them to use EXASOL in the cloud, too. As a result, we are now available on Amazon Web Services as well as Microsoft Azure. What’s more, we continue to offer our own cloud and hosting environment, which we call EXACloud.

Finally, on a more personal topic. Being a Scot who lives in Germany, would you go for a German beer or a Scottish whisky?

That’s an easy one. First, enjoy a nice German beer (ideally, one from a Munich brewery) before dinner, then round off the evening by savoring a nice Scottish whisky. The best of both worlds.

Logging challenges for containerized applications: Interview with Eduardo Silva


Next week, another edition of the CloudNativeCon conference will take place in the great city of Seattle. One of the key topics of this edition is containers, a software technology that enables and eases the development and deployment of applications by encapsulating them so they can be deployed through a single, simple process.

In this installment, we took the opportunity to chat with Eduardo Silva about containers and his upcoming session, Logging for Containers, which will take place during the conference.

Eduardo Silva is a principal open source developer at Treasure Data Inc., where he currently leads efforts to make the logging ecosystem friendlier for embedded, container, and cloud services.

He also directs the Monkey Project organization which is behind the Open Source projects Monkey HTTP Server and Duda I/O.

A well-known speaker, Eduardo has spoken at events across South America and at recent Linux Foundation events in the US, Asia, and Europe.

Thanks so much for your time Eduardo!

What is a container and how is it applied specifically in Linux?

When deploying applications, it is always desirable to have full control over the resources involved, and ideally we want the application to be as isolated as possible. A container packages an application with its entire runtime environment in an isolated way.
To accomplish this at the operating system level, Linux provides two features that make it possible to implement containers: cgroups and namespaces.

  • cgroups (control groups) allow us to limit resource usage for one or more processes, so you can define how much CPU or memory a program (or group of programs) may use when running.
  • namespaces, on the other hand (associated with users and groups), allow us to define restricted access to specific resources such as mount points, network devices, and IPC, among others.

In short, if you like programming, you can implement your own containers with a few system calls. Since this can be tedious work from an operability perspective, there are libraries and services that abstract away the details and let you focus on what really matters: deployment and monitoring.
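To make the cgroup side a bit more concrete: on Linux, a process's control-group membership is exposed through /proc/&lt;pid&gt;/cgroup, one line per hierarchy in the form hierarchy-id:controller-list:cgroup-path. The sketch below (plain Python; the helper name and the sample data are purely illustrative) just parses that format:

```python
# Sketch: parsing the /proc/<pid>/cgroup format the kernel uses to report
# a process's control-group membership. Each line is
# "hierarchy-id:controller-list:cgroup-path". Names here are illustrative.

def parse_cgroup_lines(text):
    """Return a mapping of controller name -> cgroup path."""
    memberships = {}
    for line in text.strip().splitlines():
        hierarchy_id, controllers, path = line.split(":", 2)
        # The controller field may list several controllers ("cpu,cpuacct"),
        # or be empty for the cgroup v2 unified hierarchy (hierarchy id 0).
        names = controllers.split(",") if controllers else ["unified"]
        for name in names:
            memberships[name] = path
    return memberships

# Hypothetical content in the cgroup v1 style described above:
sample = """\
4:memory:/docker/0a1b2c
3:cpu,cpuacct:/docker/0a1b2c
0::/docker/0a1b2c"""

print(parse_cgroup_lines(sample))
```

Container runtimes build on exactly this plumbing: they create the cgroups and namespaces, then place the application's processes inside them.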

So, what is the difference between a Linux container and, for example, a virtual machine?

A container aims to be a granular unit: an application and its dependencies, running as one process or a group of processes. A virtual machine runs a whole operating system, which, as you might guess, is quite a bit heavier.

So, if we were to weigh containers against virtualization, could you tell us a couple of advantages and disadvantages of each?

There are many differences, pros and cons. Taking into account our cloud world, when you need to deploy applications at scale (and many times on demand), containers are the best choice: deploying a container takes a small fraction of a second, while deploying a virtual machine may take a few seconds plus a bunch of resources that will most likely be wasted.

Because of the opportunities containers bring, there are several container projects and solutions out there, such as LXC, LXD, and LXCFS. Could you share with us the differences between them? Do you have one you consider your main choice, and why?

Having the technology to implement containers is the first step, but as I said before, not everybody wants to play with system calls; instead, different technologies exist to create and manage containers. LXC and LXD provide the next level of abstraction for managing containers, while LXCFS is a user-space file system for containers (it works on top of FUSE).
Since I don't play with containers at a low level, I don't have a strong preference.

And what about solutions such as Docker, CoreOS or Vagrant? Any take on them?

Docker is the big player nowadays; it provides good security and mechanisms to manage and deploy containers. CoreOS has a prominent container engine called Rocket (rkt). I have not used it, but it looks promising in terms of design and implementation, and orchestration services like Kubernetes are already providing support for it.

You are also working on a quite interesting project called Fluent-Bit. What is the project about?

I will give you a bit of context. I'm part of the open source engineering team at Treasure Data; our primary focus is to solve data collection and data delivery for a wide range of use cases and integrations. To accomplish this, Fluentd exists. It's a very successful project which nowadays is solving logging challenges in hundreds of thousands of systems, and we are very proud of it.
A year ago we decided to dig into the embedded Linux space, and as you might know, the capacity of these devices in terms of CPU, memory, and storage is usually more restricted than that of a common server machine.
Fluentd is really good, but it also has its technical requirements: it's written in a mix of Ruby + C, and having Ruby on most embedded Linux systems could be a real challenge or a blocker. That's why a new solution was born: Fluent Bit.
Fluent Bit is a data collector and log shipper written 100% in C. It has a strong focus on Linux but also works on BSD-based systems, including OSX/macOS. Its architecture has been designed to be very lightweight and to provide high performance from collection to distribution.
Some of its features are:

  • Input / Output plugins
  • Event driven (async I/O operations)
  • Built-in Metrics
  • Security: SSL/TLS
  • Routing
  • Buffering
  • Fluentd Integration

Although it was initially conceived for embedded Linux, it has evolved, gaining features that make it cloud friendly without sacrificing its performance and lightweight goals.
If you are interested in collecting data and delivering it somewhere, Fluent Bit allows you to do that through its built-in plugins, some of which are:

  • Input
    • Forward: a protocol on top of TCP; gets data from Fluentd or Docker containers.
    • Head: reads the initial chunk of bytes from a file.
    • Health: checks whether a remote TCP server is healthy.
    • kmsg: reads kernel log messages.
    • CPU: collects CPU usage metrics, globally and per core.
    • Mem: memory usage of the system or of a specific running process.
    • TCP: expects JSON messages over TCP.
  • Output
    • Elasticsearch database
    • Treasure Data (our cloud analytics platform)
    • NATS Messaging Server
    • HTTP end-point

So as you can see, with Fluent Bit it is easy to aggregate Docker logs into Elasticsearch, monitor your current OS resource usage, or collect JSON data over the network (TCP) and send it to your own HTTP endpoint.
The use cases are many, and this is a very exciting tool, not just from an end-user perspective but also from a technical implementation point of view.
The project is moving forward pretty quickly and getting exceptional new features, such as support for writing your own plugins in Golang! (yes, C -> Go). Isn't it neat?
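As a rough illustration of the TCP-to-HTTP use case Eduardo mentions, a Fluent Bit pipeline is declared through input and output sections in its configuration file. The fragment below is only a sketch: directive names and defaults vary by Fluent Bit version, and the host and path are placeholders, so check the project documentation before relying on it.

```ini
# Sketch: accept JSON records over TCP and ship them to an HTTP endpoint.
[INPUT]
    Name   tcp
    Listen 0.0.0.0
    Port   5170

[OUTPUT]
    Name   http
    Match  *
    Host   logs.example.com
    Port   80
    URI    /ingest
```

Here `logs.example.com` and `/ingest` stand in for whatever HTTP endpoint you actually run; the Match rule routes every collected record to that output.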

You will be presenting at CNCF event CloudNativeCon & KubeCon in November. Can you share with us a bit of what you will be presenting about in your session?

I will share our experience with logging in critical environments and dig into common pains and best practices that can be applied to different scenarios.
It will cover everything about logging in the scope of (but not limited to) containers: microservices, distributed logging, aggregation patterns, Kubernetes, open source solutions for logging, and demos.
I'd say that everyone who's a sysadmin, devops engineer, or developer will definitely benefit from the content of this session; logging is used, and required, everywhere.

Finally, on a personal note. Which do you consider to be the geekiest songs of this century?

That's a difficult question! I am not an expert on geek music, but I would vouch for “Spybreak!” by Propellerheads (from The Matrix).

Teradata Partners Conference 2016: Teradata Everywhere


Our technologized society is becoming opaque. As technology becomes more ubiquitous and our relationship with digital devices ever more seamless, our technical infrastructure seems to be increasingly intangible.
- Honor Harger

An idea that I could sense was in the air during my last meeting with Teradata’s crew in California, during their last influencer event, was confirmed and reaffirmed a couple of weeks ago during Teradata’s big partner conference: Teradata is now in full-fledged transformational mode.

Of course, for companies like Teradata that are used to being on the front line of the software industry, particularly in the data management space, transformation has now become much more than a “nice to do”. These days it’s pretty much the life breath of any organization at the top of the software food chain.

If they want to stay at the top, these companies have the complicated mandate of being fast and smart enough to provide the software, the methods, and the means that enable customers to achieve technology and business improvements, and the value that results from those changes.

And while it seems Teradata has taken its time with this transformation, it is also evident that the company is taking it very seriously. Will this be enough to keep pace with peer vendors within a very active, competitive, and transformational market? Well, it’s hard to say, but certainly with a number of defined steps, Teradata looks like it will be able to meet its goal of remaining a key player in the data management and analytics industry.

Here we take an up-to-date look at Teradata’s business and technology strategy, including its flexible approach to deployment and ability for consistent and coherent analytics over all types of deployment, platforms, and sources of data; and then explore what the changes mean for the company and its current and future customers.

The Sentient Enterprise
As explained in detail in a previous installment, Teradata has developed a new approach towards the adoption of analytics, called the “sentient enterprise.” This approach aims to guide companies to:

  • improve their data agility
  • adopt a behavioral data platform
  • adopt an analytical application platform
  • adopt an autonomous decision platform

While we won’t give a full explanation of the model here (see the video below or my recent article on Teradata for a fuller description of the approach), there is no doubt that this is a crucial pillar for Teradata’s transformational process, as it forms the backbone of Teradata‘s approach to analytics and data management.

Teradata Video: The Sentient Enterprise

As mentioned in the previous post, one aspect of the “sentient enterprise” approach from Teradata that I particularly like is the “methodology before technology” aspect, which focuses on scoping the business problem, then selecting the right analytics methodology, and at the end choosing the right tools and technology (including tools such as automatic creation models and scoring datasets).

Teradata Everywhere
Another core element of the new Teradata approach consists of spreading its database offering wide, i.e., making it available everywhere, especially in the cloud. This movement involves putting Teradata’s powerful analytics to work. Teradata Database will now be available in different delivery modes and via different providers, including on:

  • Amazon Web Services—Teradata Database will be available in a massively parallel processing (MPP) configuration, scalable up to 32 nodes, including services such as node failure recovery and backup, as well as restoring and querying data in Amazon’s Simple Storage Service (S3). The system will be available in more than ten geographic regions.
  • Microsoft’s Azure—Teradata Database is expected to be available by Q4 of 2016 in the Microsoft Azure Marketplace. It will be offered with MPP (massively parallel processing) features and scalability for up to 32 nodes.
  • VMware—via the Teradata Virtual Machine Edition (TVME), users have the option of deploying a virtual machine edition of Teradata Database for virtual environments and infrastructures.
  • Teradata Database as a Service—Extended availability for the Teradata Database will be available to customers in Europe through a data center hosted in Germany.
  • Teradata IntelliFlex—Teradata’s own on-premises platform.

Availability of Teradata Database on different platforms

Borderless Analytics and Hybrid Clouds
The third element in the new Teradata Database picture involves a comprehensive provision of analytics despite the delivery mode chosen, an offering which fits the reality of many organizations—a hybrid environment consisting of both on-premises and cloud offerings.

With a strategy called Borderless Analytics, Teradata allows customers to deploy comprehensive analytics solutions within a single analytics framework. Enabled by Teradata solutions such as QueryGrid, its multi-source SQL and processing engine, and Unity, its orchestration engine for Teradata multi-system environments, this strategy proposes a way to perform consistent and coherent analytics over heterogeneous platforms with multiple systems and sources of data, i.e., in the cloud, on premises, or in virtual environments.

At the same time, this also serves Teradata as a way to lay the groundwork for its larger strategy of addressing the Internet of Things (IoT) market. Teradata is pursuing this goal with the release of a set of new offerings called Analytics of Things Accelerators (AoTAs), comprising technology-agnostic intellectual property that emerged from Teradata’s real-life IoT project engagements.

These accelerators can help organizations determine which IoT analytical techniques and sensors to use and trust. Because AoTAs are designed for enterprise readiness, companies can deploy them without having to devise an enterprise scaling approach themselves, and without going through time-consuming experimentation phases before deployment to ensure the right analytical techniques have been used. Teradata’s AoTAs accelerate adoption, reduce deployment costs, and help ensure reliability. This is a noteworthy effort to provide IoT projects with an effective enterprise analytics approach.

What Does this Mean for Current and Potential Teradata Customers?
Teradata seems to have a concrete, practical, and well-thought-out strategy regarding the delivery of new generation solutions for analytics, focusing on giving omnipresence, agility, and versatility to its analytics offerings, and providing less product dependency and more business focus to its product stack.

But one thing Teradata needs to consider, given the increasing number of solutions in its portfolio, is providing clarity and efficiency to customers regarding which blend of solutions to choose. This is especially true when the choice involves increasingly sophisticated big data solutions, a market that is maturing but is certainly still difficult to navigate, especially for those new to big data.

Teradata’s relatively new leadership team seems to have sensed right away that the company is at a crucial point, both internally and within the insights industry. If its strategy works, Teradata might be able not only to maintain its dominance in this arena but also to increase its footprint in an industry destined to expand with the advent of the Internet of Things.

For Teradata’s existing customer base, these moves could be encouraging, as they could mean being able to expand the company’s existing analytics platforms on a single platform, and therefore without friction and with cost savings.

For those considering Teradata as a new option, it means having even more options for deploying end-to-end data management solutions using a single vendor rather than having a “best of breed” approach. Either way though, Teradata is pushing towards the future with a new and comprehensive approach to data management and analytics in an effort to remain a key player in this fierce market.

The question is whether Teradata’s strategic moves will resonate effectively within the enterprise market, allowing it to compete with existing software monsters such as Oracle, Microsoft, and SAP.

Are you a Teradata user? If so, let us know what you think in the comments section below.

(Originally published on TEC's Blog)
Salesforce Acquires BeyondCore to Enable Analytics . . . and More


In October of 2014, Salesforce announced the launch of Salesforce Wave, the cloud-based company’s analytics cloud platform. By that time, Salesforce had already realized that to be able to compete with the powerful incumbents in the business software arena—the Oracles, SAPs and IBMs of the world—arriving to the cloud at full swing would require it to expand its offerings to the business
IT Sapiens, for Those Who Are Not


Perhaps one of the most refreshing moments in my analyst life is when I get the chance to witness the emergence of new tech companies—innovating and helping small and big organizations alike to solve their problems with data. This is exactly the case with Latvia-based IT Sapiens, an up-and-coming company focused on helping those small or budget-minded companies to solve their basic yet crucial
Influencer Summit 2016—Teradata Reshapes Itself with Analytics and the Cloud


For anyone with even a small amount of understanding regarding current trends in the software industry it will come as no surprise that the great majority of enterprise software companies are focusing on the incorporation of analytics, big data, cloud adoption, and especially the Internet of Things into their software solutions. In fact, these capabilities have become so ubiquitous that for
Zyme: Emergence and Evolution of Channel Data Management Software


Previous to the official launch of the new version of Zyme’s solution, I had the opportunity to chat and be briefed by Ashish Shete, VP of Products and Engineering at Zyme, in regard to version 3.0 of what Zyme describes as its channel data management (CDM) solution platform. This conversation was noteworthy from both the software product and industry perspectives. In particular, the solution
An Interview with Dataiku’s CEO: Florian Douetteau


As an increasing number of organizations look for ways to take their analytics platforms to higher ground, many of them are seriously considering the incorporation of new advanced analytics disciplines; this includes hiring data science specialists and adopting solutions that can enable the delivery of improved data analysis and insights. As a consequence, this is also triggering the emergence of new companies and offerings in this area.

Dataiku is one of this new breed of companies. With its Data Science Studio (DSS) solution, Dataiku aims to offer a full data science solution for both experienced and inexperienced data science users.

In this opportunity I had the chance to interview Florian Douetteau, Dataiku’s CEO, and pick some of his thoughts and interesting views regarding the data management industry and, of course, his company and software solution.
A Brief Bio of Florian

In 2000, at age 20, he dropped out of the math courses at the prestigious École Normale Supérieure and decided to look for the largest dataset he could find, and the hardest related problem he could solve.

That’s how he started working at Exalead, a search engine company that at the time was developing technologies in web mining, search, natural language processing (NLP), and distributed computing. At Exalead, Florian rose to VP of Product and R&D. He stayed with the company until it was acquired in 2010 by Dassault Systèmes for $150M (a pretty large amount by French standards).

In 2010, as the data deluge was pouring into new seas, Florian moved into the social gaming and online advertising industry, where machine learning was already being applied to petabytes of data. Between 2010 and 2013 he held several positions as a consultant and CTO.

In 2013, Florian, along with three other co-founders, created Dataiku with the goal of making advanced data technologies accessible to companies that are not digital giants. Since then, one of Florian’s main goals as CEO of Dataiku has been to democratize access to data science.

So, you can watch the video or listen to the podcast in which Florian shares some of his views on the fast evolution of data science, analytics, and big data, and of course, his data science software solution.

 Of course, please feel free to let us know your comments and questions.
Altiscale Delivers Improved Insight and Hindsight to Its Data Cloud Portfolio


Logo courtesy of Altiscale
Let me just say right off the bat that I consider Altiscale to be a really nice alternative to Big Data service providers such as Hortonworks, Cloudera, or MapR. The Palo Alto, California–based company offers a full cloud-based Big Data platform via its Altiscale Data Cloud offering. In my view, Altiscale has dramatically increased the appeal of
Hortonworks’s New Vision for Connected Data Platforms


Courtesy of Hortonworks
On March 1, I had the opportunity to attend this year’s Hortonworks Analyst Summit in San Francisco, where Hortonworks announced several product enhancements, new versions, and a new definition of its strategy going forward.

Hortonworks seems to be making a serious attempt to take over the data management space while maintaining its commitment to open source, and especially to the Apache Foundation. Thus, as Hortonworks keeps gaining momentum, it is also consolidating its corporate strategy and bringing a new balance to its message (combining both technology and business).

By reinforcing alliances, and at the same time moving further towards the business mainstream with a more concise messaging around enterprise readiness, Hortonworks is declaring itself ready to win the battle for the big data management space.

The big question is whether the company’s strategy will be effective enough to succeed at this goal, especially in a market already overpopulated and fiercely defended by big software providers.

Digesting Hortonworks’s Announcements
The announcements at the Hortonworks Analyst Summit included news on both the product and partner fronts. With regard to products, Hortonworks announced new versions of both its Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF).

HDP—New Release, New Cycle
Alongside specific features to improve performance and reinforce ease of use, the latest release, HDP 2.4 (figure 1), includes Spark 1.6, the latest generation of Apache’s large-scale data processing framework, along with Ambari 2.2, the Apache project for making Hadoop management easier and more efficient.

The inclusion of Ambari seems to be key to providing a solid, centralized management and monitoring tool for Hadoop clusters.

Figure 1. Hortonworks emphasizes enterprise readiness for its HDP version
(Image courtesy of Hortonworks)

Another key announcement regarding HDP is its new release cycle, which aims to provide users with a consistent product built on a stable core. Under the new cycle, yearly releases will align HDP core services such as HDFS, YARN, and MapReduce, as well as Apache ZooKeeper, with the version of Apache Hadoop in the “ODPi Core,” currently 2.7.1. This provides standardization and ensures a stable software base for mission-critical workloads.

On the flip side, the extended services that run on top of the Hadoop core, including Spark, Hive, HBase, Ambari, and others, will be released continually throughout the year to ensure these projects stay up to date.

Last but not least, HDP’s new version also comes with the new SmartSense 1.2, Hortonworks’s issue resolution application, featuring automatic scheduling and uploading, as well as over 250 new recommendations and guidelines.


Growing NiFi to an Enterprise Level
Along with HDP, Hortonworks also announced version 1.2 of HDF, Hortonworks’s offering for managing data in motion by collecting, manipulating, and curating data in real time. The new version includes new streaming analytics capabilities for Apache NiFi, which powers HDF at its core, and support for Apache Storm and Apache Kafka (figure 2).

Another noteworthy feature coming to HDF is its support for integration with Kerberos, a feature which will enable and ease management of centralized authentication across the platform and other applications. According to Hortonworks, HDF 1.2 will be available to customers in Q1 of 2016.

Figure 2. Improved security and control added to Hortonworks new HDF version
(Image courtesy of Hortonworks)

Hortonworks Adds New Partners to its List
The third announcement from Hortonworks at the conference was a partnership with Hewlett Packard Labs, the central research organization of Hewlett Packard Enterprise (HPE).

The collaboration is mainly a joint effort to enhance the performance and capabilities of Apache Spark. According to Hortonworks and HPE, it will focus on the development and analysis of a new class of analytic workloads that benefit from using large pools of shared memory.

Says Scott Gnau, Hortonworks’s chief technology officer, with regard to the collaboration agreement:

This collaboration indicates our mutual support of and commitment to the growing Spark community and its solutions. We will continue to focus on the integration of Spark into broad data architectures supported by Apache YARN as well as enhancements for performance and functionality and better access points for applications like Apache Zeppelin.

According to both companies, this collaboration has already generated interesting results which include more efficient memory usage and increased performance as well as faster sorting and in-memory computations for improving Spark’s performance.

The results of this collaboration will be contributed back to the Apache Spark community as new technology, carrying beneficial impacts for this important piece of the Apache Hadoop ecosystem.

Commenting on the new collaborations, Martin Fink, executive vice president and chief technology officer of HPE and board member of Hortonworks, said:

We’re hoping to enable the Spark community to derive insight more rapidly from much larger data sets without having to change a single line of code. We’re very pleased to be able to work with Hortonworks to broaden the range of challenges that Spark can address.

Additionally, Hortonworks signed a partnership with Impetus Technologies, Inc., another solution provider built on open source technology. The agreement includes collaboration around StreamAnalytix™, an application that provides tools for rapid, low-code development of real-time analytics applications using Storm and Spark. Both companies intend that, with HDF and StreamAnalytix used together, companies will gain a complete and stable platform for the efficient development and delivery of real-time analytics applications.

But The Real News Is …
Hortonworks is rapidly evolving its vision of data management and integration, and this was, in my opinion, the biggest news of the analyst event. Hortonworks’s strategy is to integrate the management of both data at rest (data residing in HDP) and data in motion (data that HDF collects and curates in real time), as being able to manage both can power actionable intelligence. It is in this context that Hortonworks is working to increase the integration between the two.

Hortonworks is now taking a new go-to-market approach to increase the quality and enterprise readiness of its platforms. Along with working to ensure that ease of use removes barriers to end-user adoption, its marketing message is changing. Now the Hadoop-based company sees the need to take a step further and convince businesses that open source does more than just do the job; it is in fact becoming the quintessential tool for any important data management initiative, and, of course, Hortonworks is the best vendor for the job. Along these lines, Hortonworks is taking steps to provide Spark with enterprise-ready governance, security, and operations to ensure readiness for rapid enterprise integration. This is to be achieved through the inclusion of Apache Ambari and other Apache projects.

One additional yet important aspect within this strategy has to do with Hortonworks’s work done around enterprise readiness, especially regarding issue tracking (figure 3) and monitoring for mission critical workloads and security reinforcement.

Figure 3. SmartSense 1.2 includes more than 250 recommendations
(Image courtesy of Hortonworks)

It will be interesting to see how this new strategy works for Hortonworks, especially within the big data market where there is extremely fierce competition and where many other vendors are pushing extremely hard to get a piece of the pie, including important partners of Hortonworks.

Taking its data management strategy to a new level is indeed bringing many opportunities for Hortonworks, but these are not without challenges as the company introduces itself into the bigger enterprise footprint of the data management industry.

What do you think about Hortonworks’s new strategy in data management? If you have any comments, please drop me a line below and I’ll respond as soon as I can.

(Originally published)
Creating a Global Dashboard. The GDELT Project


There is probably no bigger dream for a data geek like myself than creating the ultimate data dashboard or scorecard of the world. One that summarizes and enables the analysis of all the data in the world. Well, for those of you who have also dreamt about this, Kalev H. Leetaru, a senior fellow at the George Washington University Center for Cyber & Homeland Security has tapped into your
Dell Toad’s Big Jump into the BI and Analytics Market


Having a background in software and database development and design, I have a special nostalgia and appreciation for Toad’s set of database solutions, as in my past working life I was a regular user of these and other tools for database development. Of course, Toad’s applications have grown and expanded over the years and now cover the areas within data management that are key to many
TIBCO Spotfire Aims for a TERRific Approach to R


terrific /təˈrɪfɪk/ adjective 1. very great or intense: a terrific noise 2. (informal) very good; excellent: a terrific singer (The British Dictionary)

R is quickly becoming the most important letter in the world of analytics. The open source environment for statistical computing is now at the center of major strategies within many software companies. R is here to stay. As mentioned
Microsoft and the Revolution… Analytics


You say you want a revolution
Well, you know
We all want to change the world
You tell me that it's evolution
Well, you know
We all want to change the world
(Revolution, Lennon & McCartney)

With a recent announcement, Microsoft took another of many steps toward what is now a clear internal and external revolution regarding the future of the company.

By announcing the acquisition of Revolution Analytics, a company that in just a few years has become a leading provider of predictive analytics solutions, Microsoft looks not just to strengthen its already wide analytics portfolio but perhaps also to increase its presence in the open source and data science communities, the latter being one with huge future potential. An interesting move, no doubt, but… was this acquisition one that Microsoft needed to boost its analytics strategy against its biggest competitors? Will this move really give Microsoft’s revolution a better entrance into the open source space, especially within the data science community? Is Microsoft ready for open source, and vice versa?

The Appeal of Revolution Analytics
Without a doubt, Revolution Analytics is quite an interesting company. Founded less than 10 years ago (in 2007), it has become one of the most representative providers of predictive analytics software on the market. The formula has been, if not easy to achieve, simple and practical: Revolution R has been built on top of the increasingly popular programming language ‘R’.

As a programming language, R is designed especially for the development of statistical and predictive analytics applications. Because this is a language that emerged from the trenches of academia and because of its open source nature, it has grown and expanded to the business market along with a vibrant community which develops and maintains its Comprehensive R Archive Network (CRAN), R’s wide library of functions.

Revolution Analytics had the apparently simple yet clever strategy of developing and enhancing its analytics platform on top of R in order to offer a debugged, commercial-ready R offering. It has also been clever in offering different flavors of its software, ranging from a free version to one ready for the enterprise.

At the same time, Revolution Analytics has maintained its close relationship with both the R and open source communities and has developed a wide range of partnerships with important vendors such as Teradata, HP, IBM, and many others, increasing its market presence, adoption, and continuing technical development.

At first glance, of course, Revolution Analytics is quite an interesting bet, not just for Microsoft but for many other software providers eager to step big into the predictive analytics arena. But…

Not so fast Microsoft…Was it a good idea?
In an article published recently on Forbes, Dan Woods states that Microsoft’s acquisition of Revolution Analytics is the wrong way to embrace R. He explains that the acquisition represents a step forward for the R language but will limit what R could bring to Microsoft’s own business. According to Mr. Woods:

It is vital to remember that R is not a piece of software created by software engineers. Like much of the open source world, R was created by those who wanted to use it – statisticians and data scientists. As a result, the architecture of the implementation has weaknesses that show up at scale and in other inconvenient ways. Fixing this architecture requires a major rewrite.


While Microsoft will be able to make its Hadoop offering on Azure better with what Revolution has done, the open source model will inhibit the wider deployment of R throughout the rest of the Microsoft ecosystem.

Both points are absolutely valid, especially considering how the open source code would need to be accommodated within the Microsoft analytics portfolio. However, I would not be surprised if Microsoft had already taken this into account, treating R on Azure as a short-term priority and the immersion of R into the rest of the portfolio as a medium-term one, considering that it has acquired not just the software but also the expertise of the Revolution Analytics team. It will then be important to maintain the team’s cohesion in order to pursue these major changes.

Another interesting aspect is Mr. Woods’ comparison of Microsoft’s acquisition with TIBCO’s approach: TIBCO took the radical posture of re-implementing R to make it suitable for high-performance tasks and highly compatible with its complete set of analytics offerings, thus creating TERR.

While TIBCO’s approach is quite outstanding (it deserves its own post), it was somewhat more feasible for TIBCO due to its experience with Bell Labs’ S, a precursor of and similar offering to R, and its longtime expertise in the predictive analytics field. Microsoft, on the contrary, needs to shorten the distance to IBM, SAS, and many others and enter the space with a strong foothold, one R can certainly provide, while also giving the company some air and space to keep working on an already stable product such as the one provided by Revolution Analytics.

One thing to consider, though, is Microsoft’s ability to enter and keep active a community that at times has proven hostile to the Seattle software giant and, of course, willing to turn its back on it. About this, David Smith, Chief Community Officer at Revolution Analytics, mentioned:

Microsoft might seem like a strange bedfellow for an open-source company, but the company continues to make great strides in the open-source arena recently. Microsoft has embraced Linux as a fully-supported operating system on its Azure cloud service.

While it’s true that Microsoft has increased its presence in the open source community, whether through including Linux on Azure, contributing to the Linux kernel, or maintaining close partnerships with Hortonworks (big data’s big name), being able to convince and conquer the huge R community can prove difficult, yet highly significant for increasing its presence in a market with huge potential.

This, of course, considering that Microsoft has changed its strategy regarding its development platforms, opening them up to enable free development and community growth, as with .NET, Microsoft’s now open source development platform.

Embracing the revolution
While for Microsoft the road to embracing R can potentially be bumpy, it might still prove to be the way to go, if not the only one, toward a bright future in the predictive analytics market. Much work will perhaps need to be done, including rewriting and optimizing, but at the end of the day it is a move that could catapult Microsoft into better shape to compete in the predictive analytics market before it is too late.

At this point, Microsoft seems to trust that the open source movement is mature enough to accept Microsoft as just another contributor, while Microsoft seems ready to take what appears to be a logical step to reposition itself in line with modern times and embrace new tech trends.

Like any new relationship, adjustment and adaptation are needed. Microsoft’s (R)evolution and transformation seem to be underway.
Have a comment? Drop me a line below. I’ll respond as soon as I can.

Machine Learning and Cognitive Systems, Part 3: A ML Vendor Landscape

In parts One and Two of this series I gave a little explanation about what Machine Learning is and some of its potential benefits, uses, and challenges within the scope of Business Intelligence and Analytics.

In this installment of the series, the last devoted to machine learning before we step into cognitive systems, I will attempt to provide a general overview of the machine learning (ML) market landscape, describing some (yes, only some) of the vendors and software products that use ML to perform analytics and intelligence. So, here is a brief overview of the market landscape.

Machine learning: a common guest with no invitation

It is quite surprising to find that machine learning has a vast presence in many of today’s modern analytics applications. Its use is driven by:

  • The increasing need to crunch data that is more complex and more voluminous, at greater speed and with more accuracy—I mean really big data
  • The need to solve increasingly complex business problems that require methods beyond conventional data analysis.

An increasing number of traditional and new software providers, whether forced by specific market needs to radically evolve their existing solutions or moved by the pure spirit of innovation, have followed the path of incorporating new data analytics techniques into their analytics stacks, either explicitly or simply hidden behind the curtains.

For software providers that already offer advanced analytics tools such as data mining, incorporating machine learning functionality into their existing capabilities stack is an opportunity to evolve their current solutions and take analytics to the next level.

So, it is quite possible that if you are using an advanced business analytics application, especially for Big Data, you are already using some machine learning technology, whether you know it or not.

The machine learning software landscape, in brief 

One of the interesting aspects of this seemingly new need to deal with increasingly large and complex sets of information is that many machine learning techniques originally used within pure research labs have already gained entrance to the business world via their incorporation into analytics offerings. New vendors may incorporate machine learning as the core of their analytics offering, or just as another of the functional features available in their stack.

Taking this into consideration, we can find a great deal of software products that offer machine learning functionality, to different degrees. Consider the following products, loosely grouped by type:

From the lab to the business

In this group we can find a number of products, most of them under open-source licenses, that can help organizations test machine learning and take their first steps with it.

Weka
A collection of machine learning algorithms written in Java that can be applied directly to a dataset or called from a custom Java program, Weka is one of the most popular machine learning tools in research and academia. It is released under the GNU General Public License, so it can be downloaded and used freely, as long as you comply with the license terms.

Because of its popularity, a lot of information is available about using and developing with Weka. It can still prove challenging for users not familiar with machine learning, but it’s quite good for those who want to explore the bits and bytes of applying machine learning analysis to large datasets.

R
Probably the most popular language and environment for statistical computing and graphics, R is a GNU project that comprises a wide variety of statistical and graphical techniques with a high degree of extensibility. No wonder R is one of the statistical tools most widely used by students.

The R project is designed to work with a core, or base, system of statistical features and functions that can be extended with a large set of function libraries provided through the Comprehensive R Archive Network (CRAN).

Within the CRAN library, it is possible to download the necessary functions for multivariate analysis, data mining, and machine learning. But it is fair to assume that it takes a bit of effort to put machine learning to work with R.
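To make the base-plus-packages workflow a bit more concrete, here is a minimal sketch in Python/NumPy of ordinary least-squares fitting, the kind of core statistical task R handles out of the box with its built-in lm() function (this is my own illustrative analogue, not R code):

```python
import numpy as np

# Toy data following an exact linear relationship: y = 2*x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with an intercept column, then ordinary least squares,
# which is the core of what a linear-model fit computes.
X = np.column_stack([np.ones_like(x), x])
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)

intercept, slope = coef
print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # → intercept=1.00, slope=2.00
```

In R the equivalent one-liner would be a call to lm(y ~ x); the point is that this class of model fitting is part of the base system, while more specialized mining and machine learning functions come from CRAN packages.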

Note: R is also of special interest owing to its increasing popularity and adoption via Revolution R, the commercial offering from Revolution Analytics, which I discuss below.

Jubatus
Jubatus is an online distributed machine learning framework. It is distributed under the GNU Lesser General Public License version 2.1, which makes Jubatus another good option for learning, trying out, and, why not, exploiting machine learning techniques on a reduced budget.

The framework can be installed in different flavors of Linux, such as Red Hat, Ubuntu, and others, as well as within the Mac OS X. Jubatus includes client libraries for C++, Python, Ruby, and Java. Some of its functional features include a list of machine learning libraries for applying different techniques such as graph mining, anomaly detection, clustering, classification, regression, recommendation, etc.

Apache Mahout

Mahout is Apache’s machine learning algorithm library. Distributed under a commercially friendly Apache software license, Mahout comprises a core set of algorithms for clustering, classification and collaborative filtering that can be implemented on distributed systems.

Mahout supports three basic types of algorithms or use cases to enable recommendation, clustering and classification tasks.
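Setting Mahout’s actual APIs aside, the recommendation use case mentioned above can be sketched in a few lines of plain Python: score the items a user has not yet rated using the ratings of similar users (user-based collaborative filtering). The data and function names below are hypothetical, purely for illustration:

```python
import math

# Hypothetical user -> item ratings (not Mahout data structures).
ratings = {
    "alice": {"item1": 5.0, "item2": 3.0},
    "bob":   {"item1": 4.0, "item2": 3.5, "item3": 5.0},
    "carol": {"item1": 1.0, "item3": 4.5},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    items = set(u) | set(v)
    dot = sum(u.get(i, 0.0) * v.get(i, 0.0) for i in items)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(user, k=1):
    """Rank items the user has not rated by similarity-weighted neighbor ratings."""
    seen = ratings[user]
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, their)
        for item, r in their.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("alice"))  # → ['item3'] (the only item alice has not rated)
```

Mahout’s value over a toy like this is precisely that its implementations of recommendation, clustering, and classification are designed to run distributed over large datasets.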

One interesting aspect of Mahout is its goal to build a strong community for the development of new and fresh machine learning algorithms.

Apache Spark

Spark is Apache’s general-purpose engine for processing large-scale data sets, commonly deployed alongside Hadoop. Spark is also open source and enables users to build applications in Java, Scala, or Python.

Just like the rest of the Hadoop family, Spark is designed to deal with large amounts of data, both structured and unstructured. The Spark design supports cyclic data flow and in-memory computing, making it ideal for processing large data sets at high speed.

In this scenario, one of the engine’s main components is the MLlib, which is Spark’s machine learning library. The library works using the Spark engine to perform faster than MapReduce and can operate in conjunction with NumPy, Python’s core scientific computing package, giving MLlib a great deal of flexibility to design new applications in these languages.

Some of the algorithms included within MLlib are:

  • K-means clustering with K-means|| initialization
  • L1- and L2-regularized linear regression
  • L1- and L2-regularized logistic regression
  • Alternating least squares collaborative filtering, with explicit ratings or implicit feedback
  • Naïve-Bayes multinomial classification
  • Stochastic gradient descent
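To make the list above concrete, here is a small NumPy sketch combining two of its items: L2-regularized logistic regression trained by stochastic gradient descent. This is my own illustration of the underlying technique, not MLlib’s implementation (which runs distributed over Spark):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny linearly separable toy set: label is 1 when x0 + x1 > 0.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr, lam = 0.1, 0.01  # learning rate and L2 penalty strength

for epoch in range(20):
    for i in rng.permutation(len(X)):              # stochastic: one example at a time
        p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))  # sigmoid prediction
        g = p - y[i]                               # gradient of the log loss
        w -= lr * (g * X[i] + lam * w)             # L2 term shrinks the weights
        b -= lr * g

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
print("train accuracy:", (preds == y).mean())
```

On separable data like this the model reaches near-perfect training accuracy; MLlib applies the same kind of update, but partitioned across a cluster.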

While this set of tools gives users hands-on machine learning at no cost, they can still be somewhat challenging to put to work. Many of them require special skills in the art of machine learning, or in Java or MapReduce, to fully develop a business solution.

Still, these applications can enable new teams to start working on machine learning and experienced ones to develop complex solutions for both small and big data. 

Machine learning by the existing players

As we mentioned earlier in this series, the evolution of Business Intelligence is demanding an increasing incorporation of machine learning techniques into existing BI and Analytics tools.

A number of popular enterprise software applications have already expanded their functional coverage to include machine learning—a useful ally—within their stacks.

Here are just a couple of the vast number of software vendors that have added machine learning either to their core functionality or as an additional feature or product in their stack.

IBM
It is no secret that IBM is betting big in the areas of advanced analytics and cognitive computing, especially with Watson, IBM’s cognitive computing initiative and an offering we will examine in the cognitive computing part of this series. IBM can enable users to develop machine learning approaches via its SPSS product stack, which incorporates the ability to build some specific machine learning algorithms via the SPSS Modeler.

SAS
Indubitably, SAS is one of the key players in the advanced analytics arena, with a solid platform for performing mining and predictive analysis, for both general and industry-vertical purposes. It has incorporated key machine learning techniques for different uses. Several ML techniques can be found across SAS’ vast analytics platform, from the SAS Enterprise Miner and Text Miner products to its SAS High-Performance Optimization offering.

An interesting fact to consider is SAS’ ability to provide industry and line-of-business approaches for many of its software offerings, encapsulating functionality with prepackaged vertical functionality.

Embedded machine learning

Significantly, machine learning techniques are reaching the core of many existing powerhouses as well as newcomers in the data warehouse and Big Data spaces. Some analytic and data warehouse providers have now incorporated machine learning techniques, to varying degrees, as embedded technologies within their database engines.

1010data
The New York-based company, a provider of Big Data and discovery software solutions, offers a set of what it calls in-database analytics in which a set of analytics capabilities is built right into 1010Data’s database management engine. Machine learning is included along with a set of in-database analytics such as clustering, forecasting, optimization, and others.

Teradata
Among its multiple offerings for enterprise data warehouse and Big Data environments, Teradata offers Teradata Warehouse Miner, an application that packages a set of data profiling and mining functions, including machine learning algorithms alongside predictive and mining ones. Warehouse Miner is able to perform analysis directly in the database without a data movement operation, which eases the process of data preparation.

SAP
SAP HANA, which may be SAP’s most important technology initiative ever, will now support almost all (if not actually all) of SAP’s analytics initiatives, and its advanced analytics portfolio is no exception.

Within HANA, SAP originally launched SAP HANA Advanced Analytics, which provides a number of functions for performing mining and prediction. Within this set of solutions it is possible to find specific algorithms for performing machine learning operations.

Additionally, SAP has expanded its reach into predictive analysis and machine learning via the SAP InfiniteInsight predictive analytics and mining suite, a product developed by KXEN, which SAP recently acquired.

Revolution Analytics

As mentioned previously, the open source R language is becoming one of the most important resources for statistics and mining available in the market. Revolution Analytics, a company founded in 2007, has been able to foster the work done by the huge R community while developing a commercial offering that exploits R’s benefits, giving R more power and performance via technology that enables its use in data-intensive enterprise applications.

Revolution R Enterprise is Revolution Analytics’ main offering and contains the wide range of libraries provided by R enriched with major technology improvements for enabling the construction of enterprise-ready analytics applications. The application is available for download both as workstation and server versions as well as on demand via the AWS Marketplace.

The new breed of advanced analytics

The advent and hype of Big Data has also become a sweet spot for innovation in many areas of the data management spectrum, especially in the area of providing analytics for large volumes of complex data.

A new wave of fresh and innovative software providers is emerging with solutions that enable businesses to perform advanced analytics over Big Data and using machine learning as a key component or enabler for this analysis.

A couple of interesting aspects of these solutions:

  1. Their unique approach to providing specific solutions to complex problems, especially adapted for business environments, combining flexibility and ease of use to make it possible for business users with a certain degree of statistical and mathematical preparation to address complex problems in the business.
  2. Many of them have already, at least partially, configured and prepared specific solutions for common business problems within lines of business and industries via templates or predefined models, easing the preparation, development, and deployment process.

Here is a sampling of some of these vendors and their solutions:

Skytree
Given that Skytree’s tagline is “The Machine Learning Company,” it’s pretty obvious that the company has machine learning in its veins. Skytree has entered the Big Data analytics space with what is, according to Skytree, an enterprise-grade machine learning platform for performing mining, prediction, and recommendations.

Skytree Server is its main offering. A Hadoop-ready machine learning platform with high-performance analytics capabilities, it can also connect to diverse data streams and can compute real-time queries, enabling high-performance analytics services for churn prediction, fraud detection, and lead scoring, among others.

Skytree also offers a series of plug-ins to be connected to the Skytree Server Foundation to improve Skytree’s existing capabilities with specific and more advanced machine learning models and techniques.

BigML
If you Google BigML, you will find that “BigML is Machine Learning for everyone.”

The company, founded in 2011 in Corvallis, Oregon, offers a cloud-based large-scale machine learning platform centered on business usability and at highly competitive costs by providing advanced analytics via a subscription-based offering.

The application enables users to prepare complete analytics solutions for a wide range of analysis scenarios, from collecting the data and designing the model to creating special analytics ensembles.

Since it is a cloud-based platform, users can start using BigML’s services via a number of subscription-based and/or dedicated options: an attractive approach for organizations trying to make the best of advanced analytics with fewer technical and monetary resources.

Yottamine Analytics

Founded in 2009 by Dr. David Huang, Yottamine has put Dr. Huang’s contributions to the theory of machine learning into practice, reflecting them in the Yottamine Predictive Service (YPS).

YPS is an on-demand advanced analytics solution based on the use of web services, which allows users to build, deploy, and develop advanced big data analytics solutions.

As an on-demand solution it offers a series of subscription models based on clusters and nodes, with payment based on the usage of the service in terms of node hours—a pretty interesting quota approach. 

Machine learning is pervasive

Of course, this is just a sample of the many advanced analytics offerings that exist. Others are emerging. They use machine learning techniques to different degrees and for many different purposes, specific or general. New companies such as BuildingIQ, emcien, BayNote,  Recommind, and others are taking advantage of the use of machine learning to provide unique offerings in a wide range of industry and business sectors.

So what?

One of the interesting effects of companies dealing with increasing volumes of data and, of course, increasingly difficult problems to solve is that techniques such as machine learning and other artificial intelligence and cognitive computing methods are gaining ground in the business world.

Companies and information workers are being forced to learn about these new disciplines and use them to find ways to improve analysis accuracy, the ability to react and decide, and prediction, encouraging the rise of what some call the data science discipline.

Many of the obscure tools for advanced analytics traditionally used in the science lab or at pure research centers are now surprisingly popular within many business organizations—not just within their research and development departments, but within all their lines of business.

But on the other hand, new software is increasingly able not only to help in the decision-making process, but also to be proactive: reproducing and automatically improving complex analysis models, recommendations, and complex scenario analysis to enable early detection, prediction, and, potentially, data-based decisions.

Whether measuring social media campaign effectiveness, effectively predicting sales, detecting fraud, or performing churn analysis, these tools are remaking the way data analysis is done within many organizations.

But this might be just the beginning of a major revolution in the way software serves and interacts with humans. An increasing number of Artificial Intelligence disciplines, of which machine learning is a part, are rapidly evolving and reaching mainstream spaces in the business software world in the form of next-generation cognitive computing systems.

Offerings such as IBM’s Watson might be the instigators of a new breed of solutions that go well beyond what we have so far experienced with computers and the analysis process. So, I dare you to stay tuned for my next installment on cognitive systems and walk with me to discover these new offerings.

Qlik: Newer, Bigger, Better?

Originally published in the TEC Blog

During the second half of last year and what has already passed of this year, the Pennsylvania-based software company QlikTech has undergone a number of important adjustments, from its company name to a series of changes allowing the company to remain a main force driving the evolution of the business intelligence (BI) and analytics scene. Are these innovations enough for the in-memory software company to retain its success and acceptance within the BI community?

From QlikTech to Qlik

One big shift in the past few months was the company’s name change, from QlikTech to Qlik. Though mainly cosmetic, it is still worth taking into account, as it will enable the software provider to be more easily identified and better branded, and to reposition its entire product portfolio as well as the company’s services, resources, and communities.

Having a name that is simple to identify, as the biggest umbrella of a product stack that has been growing over time, is a smart move from business, marketing, and even technical perspectives.

Qlik goes Big… Data

A second recent event within Qlik’s realm is the revelation of its big data strategy, something Qlik had been quietly working on for some time. During a very interesting call, John Callan, senior director of global product marketing, took us through some of the details of Qlik’s recently revealed strategy to help users make use of the company’s big data initiatives. Two starting statements could not have stated more clearly the role of Qlik, and of many other BI providers, in the big data space:

QlikView as the catalyst for implementing big data

This certainly is true, as many new big data projects find their motivation in the data analysis and discovery phases, and it’s also true that an offering like QlikView can lower some of the technical and knowledge barriers when implementing a big data initiative.

The second statement was:

QlikView relieves the big data bottleneck.

According to Qlik, QlikView grants access to a wider number of users, augmenting the potential use of big data and providing implicit benefits: access to a wider number of data sources while at the same time having access to QlikView’s in-memory computing power.

True to its goal of bringing BI closer to the business user, the approach from Qlik is to enable the use of big data and offer a new connection and integration with technology provided by some of the most important big data players in the market: Cloudera, Hortonworks, MongoDB, Google BigQuery, Teradata, HP Vertica and Attivio.

What makes QlikView so interesting in the context of big data is that, being a long-time provider of an in-memory architecture for data analysis and having a unique data association model, it can not only ensure a reliable architecture for a big data analysis platform, but it can also add speed to the process. Plus, QlikView’s data association model, along with its business user orientation, can provide an ease-of-use component, often hard to accomplish within a big data initiative.

So, while QlikView provides for its users all the necessary connectors from their big data partners, it also makes an effort to maintain simplicity of use when dealing with information coming from other more common sources.

On this same topic, one key aspect of Qlik’s approach to big data is the vendor’s flexibility regarding data sourcing; Qlik provides users with three possibilities for performing data exploration and discovery from big data sources:

  1. Loading the information within Qlik’s in-memory computing engine;
  2. Performing data discovery directly from big data sources; and
  3. A hybrid approach, which includes the possibility to combine both previous models, configuring which data should be in-memory and which should be based on direct discovery.
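The three options above amount to a routing decision per data source. The sketch below is purely hypothetical (the names and logic are mine, not Qlik’s API) and simply illustrates how a hybrid configuration might mark some tables for in-memory loading while leaving the rest to direct discovery against the big data source:

```python
# Hypothetical hybrid configuration: which tables are held in memory
# versus queried directly at the source. Not Qlik's actual API.
IN_MEMORY_TABLES = {"sales_summary", "customers"}   # small, frequently used data
cache = {}  # stands in for the in-memory engine

def load_in_memory(table):
    """Simulate loading a table into the in-memory engine."""
    cache[table] = f"rows of {table}"  # placeholder for real rows

def query(table):
    """Route a request: serve hot tables from memory, others directly from the source."""
    if table in IN_MEMORY_TABLES:
        if table not in cache:
            load_in_memory(table)
        return ("in-memory", cache[table])
    return ("direct", f"query pushed down to big data source for {table}")

print(query("sales_summary")[0])    # → in-memory
print(query("clickstream_raw")[0])  # → direct
```

The interesting design question, as noted below, is who decides this split and on what basis: data volume, query latency requirements, or freshness.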

This three-pronged approach could prove effective for organizations in the initial phases of big data adoption, especially while running initial tests, or for those that already require big data services with a certain degree of functionality. However, it would be interesting to see whether it creates difficulties for users and organizations in finding the appropriate schema, or in identifying when and where to apply which approach for better performance and effectiveness.

New “Natural” Analytics

Recently, a blog written by Donald Farmer, VP of product management at Qlik, established what Qlik has been up to for some time now: working towards bringing a new generation of analytics to the market. In this sense, two things seem to be particularly interesting.

First, Qlik’s continuous work towards changing and evolving analytics from its traditional role and state and delivering new ways analysis can be performed, thus improving associations and correlations to provide extensive context. As Farmer states:

Consider how we understand customers and clients. What patterns do we see? What do they buy? How are they connected or categorized? Similarly every metric tracking our execution makes a basic comparison.

These artifacts may be carefully prepared and designed to focus on what analysts used to call an "information sweet spot"—the most valuable data for the enterprise, validated, clarified, and presented without ambiguity to support a specific range of decisions.

Second, Qlik aims to provide users with the ability not just to predict, but to actually anticipate and help discover:

It's not enough to know our quarter's sales numbers. We must compare them to the past, to our plans, and to competitors. Knowing these things, we ask what everyone in business wants to know: What can we anticipate? What does the future hold?

Particularly interesting is how Farmer addresses a core aspect of the decision-making process, anticipation, especially in our modern business world, which operates increasingly in real time and needs ways to move away from traditional operations performed in a linear sequence with long latencies.

Of course, little can be said here about Qlik’s future vision, but we can get a glimpse—Qlik has built a prototype showing its new natural analytics approach and much more in QlikView > next, Qlik’s own vision of the future of BI.

This is a vision in which analysis is carried out following five basic themes to accomplish, according to Qlik, two main objectives: 1) an understanding of what people need, and 2) an understanding of who those people are.

These five themes are:

  • Gorgeous and genius—a user interface that is intuitive and natural to use, but aiming to be productive, improving the user’s visual and analysis experience.
  • Mobility and agility—having access to the Qlik business discovery platform from any device and with seamless user experience.
  • Compulsive collaboration—providing users with more than one way to collaborate, analyze, and solve problems as a team by providing what Qlik calls a “new point of access”.
  • The premier platform—Qlik’s vision for providing users with improved ways to provide new apps quickly and easily.
  • Enabling the new enterprise—Qlik aims to provide IT infrastructure with the necessary resources to offer true self-service for their users while easing the process of scaling and reconfiguring QlikView’s infrastructures to adapt to new requirements.

Qlik, Serving Modern BI with a Look Into the Future

It would be hard not to consider Qlik, from its inception, an in-memory computing pioneer in the business space, and Qlik is keeping that pioneer status two decades later: innovative in both back-end and front-end design, and able to wear more than one hat in the business intelligence space. An end-to-end platform, from storage to analysis and visualization, Qlik is both adapting to the increasingly fast-paced evolution of BI and looking into the future to maintain and gain markets in this disputed space.

However, to maintain its place in the industry it will be crucial for Qlik to keep pace on the many fronts where QlikView, its flagship product, is front and center: business-ready for small to medium-sized customers, as well as powerful, scalable, and governable for large organizations. These days Qlik is surrounded by other innovation sharks in the BI ocean, so remaining unique, original, and predominant will prove increasingly difficult for Qlik and the rest of the players in the space. As in nature, let those most capable of fulfilling their customers’ needs survive and prosper.

It comes as no surprise that Qlik is already looking forward to anticipating the next step in the evolution of BI and analytics. Qlik has a brand that stands for innovation, and certainly, Qlik is working to make QlikView newer, better, and bigger. It will be really interesting to see how the company’s innovative vision plays out, and whether it gains as much or more traction than Qlik’s previous innovations in the market.

Have a comment on Qlik or the BI space in general? Let me know by dropping a line or two below. I’ll respond as soon as I can.
The BBBT Sessions: Hortonworks, Big Data and the Data Lake


Some of the perks of being an analyst are the opportunities to meet with vendors and hear about their offerings, their insight on the industry and best of all, to be part of great discussions and learn from those that are the players in the industry.

For some time now, I have had the privilege of being a member of the Boulder BI Brain Trust (BBBT), an amazing group consisting of Business Intelligence and Data Management analysts, consultants and practitioners covering various specific and general topics in the area. Almost every week, the BBBT engages a software provider to give us a briefing of their software solution. Aside from being a great occasion to learn about a solution, the session is also a tremendous source for discussion. 

I will be commenting on these sessions here (in no particular order), providing information about the vendor presenting, giving my personal view, and highlighting any other discussion that might arise during the session.

I would like to start with Hortonworks, one of the key players in the Big Data space, and a company that has a strong influence on how Big Data is evolving in the IT industry.

The session

In a session conducted by David McJannet and Jim Walker, Hortonworks’ Marketing VP and Director of Product Marketing respectively, BBBT members had the chance to learn in more detail about Hortonworks’ offerings, strategy, and services aimed at bringing Hadoop to the enterprise, as well as to discuss Big Data and its insertion into the enterprise data management infrastructure especially in relation to data warehousing, analytics, and governance. Here are some of the highlights of the session…

About Hortonworks 

Hortonworks is a recently emerged company, but with a lot of experience in the Big Data space. Founded in 2011, it was formed by the original Hadoop development and operations team from Yahoo! Why is this so relevant? Well, because Hortonworks lives and breathes Hadoop, and the company makes a living by building its data solutions on top of Hadoop and many of its derivative projects. And Hadoop is arguably the most important open source software project of all time—or at least second only to Linux.

Hadoop is described on its Web page as follows:

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers […].
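To make the “simple programming models” in that description concrete, here is a toy, single-process sketch of the classic MapReduce word count in Python. All names here are illustrative, not Hadoop’s actual API; in a real Hadoop cluster, the map and reduce phases run distributed across many nodes, and the framework handles data placement, shuffling, and failure recovery.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one chunk of input.
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    # Reduce: sum the counts emitted for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# In a real cluster, each chunk would live on a different node.
chunks = ["big data needs big clusters", "clusters process big data"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(pairs)
print(word_counts)  # → {'big': 3, 'data': 2, 'needs': 1, 'clusters': 2, 'process': 1}
```

The appeal is that the programmer only writes the two small functions; the distribution, retries, and aggregation across thousands of machines are the framework’s job.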

Hortonworks focuses on driving innovation exclusively via the Apache Software Foundation, producing open source–based software that enables organizations to deal with their Big Data initiatives by delivering Apache Hadoop solutions ready for enterprise consumption. Hortonworks’ mission, as stated in Hortonworks’ presentation title:

Our mission is to enable your Modern Data Architecture by delivering Enterprise Apache Hadoop.

Hortonworks’ commitment to Hadoop

One of the interesting aspects of Hortonworks is its commitment to Hadoop, in many regards, from the way it handles Hadoop offerings for corporate consumption, to the amount of effort Hortonworks’ team devotes to evolving and enhancing Hadoop’s capabilities. To this point, Hortonworks shared the following graph, in which it’s possible to see the level of contribution of Hortonworks to the famed Apache project in 2013.

Figure 1. List of contributors for Hadoop and number of lines contributed (Source: Hortonworks)

In the same vein, the contribution of the Hortonworks team to Hadoop extends across its multiple subprojects—HBase (Hadoop’s distributed data store), Pig (Hadoop’s large data set analysis language), and Hive (Hadoop’s data warehouse infrastructure), among others (Figure 2)—making Hortonworks a hub with some of the most important experts in Apache Hadoop and a strong commitment to its open source nature.

Figure 2. List of contributors to Hadoop and number of lines contributed (Courtesy of: Hortonworks)

Hortonworks’ approach to the business market is quite interesting. While maintaining its commitment to both Hadoop and open source ecosystems, Hortonworks has also been able to:

  1. Package corporate-ready solutions, and
  2. Ensure strong partnerships with some important software companies such as Microsoft, Teradata, SAP, HP, RackSpace, and, most recently, Red Hat, extending Hortonworks’ reach and influence in the Big Data space and especially into corporate markets.

So what does Hortonworks offer?

Hortonworks says it clearly: they do Hadoop. What this means is that Hortonworks’ flagship product—the Hortonworks Data Platform (HDP2)—is an enterprise solution based 100% on the open source Apache Hadoop platform. HDP2 uses the core set of Hadoop modules, architected and certified for enterprise use, and includes fully tested and certified versions of related Hadoop modules as well as a complete set of professional services provided by Hortonworks for its customers.

Another offering from the company is the Hortonworks Sandbox, a Hadoop environment that includes interactive tutorials and the most recent Hadoop developments for learning and testing.

How does Hortonworks fit into an organization?

One of the main concerns of many organizations trying to embrace Big Data is how their Big Data initiative will fit within their existing data management infrastructure. More importantly, the organization needs to evolve its traditional data management infrastructure (Figure 3) so that Big Data adoption doesn’t generate more problems than solutions. Hortonworks is by no means the only software provider here; vendors such as Cloudera and MapR also embrace Hadoop to solve an organization’s Big Data issues, but with different approaches.

Figure 3. A traditional data management approach (Courtesy of: Hortonworks)

Wayne Eckerson explains in The Battle for the Future of Hadoop:

Last November, Cloudera finally exposed its true sentiments by introducing the Enterprise Data Hub in which Hadoop replaces the data warehouse, among other things, as the center of an organization's data management strategy. In contrast, Hortonworks takes a hybrid approach, partnering with leading commercial data management and analytics vendors to create a data environment that blends the best of Hadoop and commercial software.

During the session, aside from the heated debates about whether or not to replace the data warehouse with new information hubs, both David McJannet and Jim Walker confirmed Hortonworks’ position, which consists of enabling companies to expand their existing data infrastructures (in contrast to Cloudera’s approach)—letting companies evolve without replacing their data management platforms (Figure 4).

Figure 4. Hortonworks expands an organization’s traditional data management capabilities for addressing Big Data (Courtesy of: Hortonworks)

The appealing part of Hortonworks’ approach is that its Hadoop offerings act as an expansion of the rest of the data repository spectrum (relational databases, data warehouses, data marts, and so on). This makes sense in the context of coupling new data management strategies with existing ones; while Hadoop has proven effective for certain tasks and types of data, some problems still need to be handled with “traditional” methods and existing tools. According to Mark Madsen (What Hadoop Is. What Hadoop Isn’t.):

What it doesn’t resolve is aspects of a database catalog, strong schema support, robust SQL, interactive response times or reasonable levels of interactive concurrency—all things needed in a data warehouse environment that delivers traditional BI functions. In this type of workload, Hadoop doesn’t come close to what a parallel analytic database can achieve, including scaling this workload into the Petabyte range.

Yet Hadoop offers features the database can’t: extremely low cost storage and retrieval, albeit through a limited SQL interface; easy compatibility with parallel programming models; extreme scalability for storing and retrieving data, provided it isn’t for interactive, concurrent, complex query use; flexible concepts of schema (as in, there is no schema other than what you impose after the fact); processing over the stored data without the limitations of SQL, without any limitations other than the use of the MapReduce model; compatibility with public or private cloud infrastructures; and free, or support-only, so a price point far below that of databases.

Hortonworks’ approach is then to enable expansion and evolution of the existing data management platform by offering an enterprise-ready version of Hadoop, one that can be nicely integrated and fill those gaps between the data warehouse and the analysis of huge amounts of non-structured (polystructured) information.

What is Hortonworks for, anyway?

Despite the hype and eagerness about Big Data, many people still don’t have a clear idea about the context and use cases where a Hadoop approach can be useful. Hortonworks showed us a good list of examples of how some of their customers are using Hortonworks. Their current deployments run mainly within the financial services, telecom, retail, and manufacturing industries and expand for applications such as fraud prevention, trading risk, call detail records, and infrastructure investment as well as for assembly-line quality assurance and many other potential uses.

How Hortonworks addresses its customers’ Big Data needs is demonstrated by how a customer typically embraces Hadoop in the context of working with increasing volumes of information.

The graph below shows a diagram correlating data (volume) and the value that it can bring to the organization by enhancing an organization’s capability to derive insight.

Figure 5. Described as a “Common journey to the data lake,” Hortonworks shows the relation between data volume and its potential value in the context of addressing specific problems (Courtesy of: Hortonworks)

Another interesting thing about this is the notion of the data lake. Pentaho CTO James Dixon, who’s credited with coining the term, describes it in the following simple terms:

If you think of a datamart as a store of bottled water—cleansed and packaged and structured for easy consumption—the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.

Hortonworks uses Hadoop as the platform to provide a solution for the two main issues that this implies:

  1. A new approach to analytics by enabling the expansion from a single query engine and a deterministic list of questions to a schema-on-read basis, enabling information analysis that addresses polystructured as well as real-time and batch data.
  2. A means for data warehouse optimization, expanding the boundaries of strict data schemas.
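The schema-on-read idea in the first point can be sketched in a few lines of Python (record shapes and field names are invented for illustration): the raw store accepts records of any shape, and a schema is imposed only at query time, so different questions can carve different views out of the same polystructured data.

```python
import json

# Raw, polystructured records as they might land in the lake: no upfront schema.
raw_lines = [
    '{"user": "ana", "clicks": 12}',
    '{"user": "ben", "clicks": 7, "referrer": "ad"}',
    '{"sensor": "t-01", "temp_c": 21.5}',
]

def read_with_schema(lines, fields):
    # Schema-on-read: keep only records that satisfy the requested shape.
    for line in lines:
        record = json.loads(line)
        if all(f in record for f in fields):
            yield {f: record[f] for f in fields}

# Two different "schemas" read from the same raw store.
click_view = list(read_with_schema(raw_lines, ["user", "clicks"]))
sensor_view = list(read_with_schema(raw_lines, ["sensor", "temp_c"]))
print(click_view)
print(sensor_view)
```

A schema-on-write warehouse would have rejected (or forced a remodel for) the sensor record; here it simply waits in the lake until someone asks a question it can answer.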

The Hortonworks Data Platform uses the full open source Hadoop platform. It provides an enterprise-ready basis for handling Big Data within an organization, and aims to fit and optimize—not disrupt—the existing data platform (Figure 6). Some of the challenges of Hadoop deployments have been coping with an often unfriendly environment and a lack of technical expertise to handle Hadoop projects properly, especially for hybrid and complex environments mixing and interconnecting both traditional and Hadoop deployments.

The recent addition of YARN—Hadoop’s resource, job, and application manager—in Hadoop 2.0, and its inclusion in Hortonworks’ HDP2, has enabled Hortonworks to provide a more robust processing platform, one that can manage process loads beyond MapReduce, expanding HDP’s capabilities to handle both MapReduce and external applications and resources more efficiently. The Hortonworks website has a good summary of the use of YARN within HDP.

Figure 6. Hortonworks Data Platform General Architecture (Courtesy of: Hortonworks)

Open source software, especially projects based on Hadoop and Big Data, traditionally has a Linux orientation, so it’s worth mentioning that the HDP2 platform is available on both Linux and Windows operating systems.

Hortonworks Data Platform, enterprise tested

During the session, one thing David McJannet and Jim Walker emphasized was Hortonworks’ testing and quality assurance model, which includes testing HDP directly within Yahoo’s data environment, providing Hortonworks with a vast and ideal testing environment of complex, data-flooded scenarios—a good proving ground for any data application.

To conclude

I have no doubt that the new breed of solutions such as Hortonworks and others offer impressive and innovative approaches to the analysis and management of complex and big data problems. Clearly, frameworks such as the data warehouse need to adapt to these new conditions or die (I tend to believe they will not die).

Instead, it seems that data warehouse methodologies and platforms potentially have the necessary elements—such as enterprise readiness, methodology, and stability—to evolve and include these new computing paradigms, or at least live within these new ecosystems.

So some of the challenges of deploying Big Data solutions, aside from the natural technological issues, could come from how these new concepts fit within existing infrastructures. They need to avoid task duplication, actually streamline processes and data handling, and fit within complex IT and data governance initiatives, ultimately to procure better results and return on investment for an organization.

Hortonworks takes an approach that should appeal to many organizations by fitting within their current infrastructures and enabling a smooth yet radical evolution of their existing data management platforms, whether via its HDP2 platform or delivered via Hortonworks’ strategic partners. It will be interesting to see what their big competitors have to offer.

But don’t just take my word for it. You can replay the session with Hortonworks—just go to the BBBT web page and subscribe.

Have comments? Feel free to drop me a line and I’ll respond as soon as possible.

BI on the Go: About Functionality and Level of Satisfaction


Originally published on the TEC Blog

TEC recently published its 2014 Mobile BI Buyers Guide and a related blog post in which some results from a survey on mobile business intelligence (BI) usage, needs, and trends were discussed. We thought it would be useful to take another look at what was revealed from the survey regarding what’s important for mobile BI users, and of course, how satisfied they are with the mobile BI solutions they work with. Let’s take a look at some of our findings in this regard. Here we will discuss two additional criteria and how they affect mobile BI practices and decision-making: functionality and level of satisfaction.

General Functionality: What Tops the List?

One of the questions we asked mobile BI users in the survey had to do with the functionality they find most important when using their mobile BI application. From the list we provided, including ad hoc querying, alerting, collaboration, data analysis and discovery, and dashboarding (Figure 1), users were clear that both dashboarding and data analysis/discovery are an essential part of their day-to-day lives with a mobile BI application. It is clear that reporting on mobile media is slowly declining, leaving space for more data discovery functions.

On the other hand, two things surprised me. First, the level of importance that users gave to the alerting functionality over collaboration abilities. Despite the buzz around the importance of collaboration embedded within all types of enterprise software, the ability of a mobile BI application to alert users quickly about any given emergency or contingency is vital, especially these days, when acting in real time is becoming increasingly important for many organizations.

Second, I was surprised that collaboration was positioned in fifth place, while the top places went to more common BI functionality features such as dashboarding, data analysis, and reporting and alerting. It seems that although collaboration is important, users have their priorities clear, and they first and foremost want analysis capabilities and other key tasks in a BI application.

Figure 1. Top functionality (Source: TEC Mobile BI Survey 2014)

Mobile BI Satisfaction Levels: Still Not There Yet?

Another question we asked in the survey refers to how satisfied users are with their mobile BI applications. As Figure 2 shows, despite not showing high levels of dissatisfaction, the survey did indicate a lot of respondents are only “somewhat satisfied,” revealing there’s still a high number of users who are not totally impressed with what a mobile BI solution can do for them. Why is this?

Many things can play into these results, from the limitations of mobile BI applications to misconceptions about what a mobile BI application should or should not be able to do. But in this technological world, mobile is synonymous with innovation and user experience, so users are paying attention not just to the efficiency of mobile BI applications, but increasingly to the degree of innovation of mobile apps.

Figure 2. General satisfaction level (Source: TEC Mobile BI Survey 2014)

According to an article in Enterprise Apps Today, big business intelligence vendors are not quite satisfying users. The article mentions a Gartner study looking at mobile BI based on its ability to

“provide devices with reports and other dashboard content.” The study revealed that mobile BI usage “showed the highest growth among all business intelligence features considered, with 6.2 percent of respondents stating they make extensive use of mobile functionality.”

And, according to the article,

small independent vendors continue to lead the way on mobile business intelligence. However, mega-vendors and large independents are beginning to gain some ground. That said, they still have a good amount of ground to cover based on the number of them being graded below average.

While I have also noted that mobile BI has been gaining popularity over other BI features in recent times, our survey gave us a slightly different view, in which customer satisfaction sits mostly in the middle, with the majority of users being very or somewhat satisfied, indicating perhaps that efficiency still makes up a huge portion of what matters for BI users. Of course, many users are waiting for more than that, hoping for the real wow factor that mobile experiences may already be giving them through their consumer mobile applications, mobile social platforms, and perhaps even other mobile business applications.

Along the same lines, and to make things a bit more interesting, let’s mix these two results together and see what happens (Figure 3).

Figure 3. Satisfaction level vs functionality (Source: TEC Mobile BI Survey 2014)

When looking at top functionality and customer satisfaction together, it is interesting to note several things:

  1. Across the board, dashboarding remains one of the most important features for performing business intelligence with mobile devices. Across this sample of mobile BI users, in the “not very satisfied” category of users dashboarding seems to be quite popular, perhaps signaling what we mentioned before: users are waiting to see more enriching experiences within their mobile BI applications.
  2. For those users that are “completely satisfied” with their current mobile BI solution, alerting plays an important role within their mobile BI criteria, an essential feature for enabling early issue, risk, or opportunity detection. It is possible that for these organizations having an effective way to receive alerts is key to ensuring successful operation and planning.
  3. It seems users increasingly expect more features for performing data analysis and discovery; this is somewhat surprising, as I know many business intelligence providers are making big efforts to improve their functionality in this area.

So, it seems users recognize the importance of three main functional features (dashboarding, data discovery, and alerting) for a reliable mobile BI solution but, still, they have expectations of further evolution of mobile BI functionality in the future.

Functionality and Organization Size: How do They Relate?

In a final exercise, we segmented our respondents according to the size of their organization and their most relevant functional features (Figure 4) and noted some clear differences among different sized organizations.

Figure 4. Functionality vs company size (Source: TEC Mobile BI Survey 2014)

As the graph shows, for very small organizations, the functional interest is distributed relatively evenly throughout all six main functional features, with data analysis and discovery ranking as the most important feature. On the other hand, for corporations it is clear that dashboarding and data analysis/discovery, as well as alerting, all play a major role. This seems to be a good indication that within large corporations efficiency and strong responsiveness are extremely important for mobile BI users on staff. On the other hand, for those organizations sitting in the middle (from 250 to 1000 employees), dashboarding is clearly the most important feature, which seems to make sense, as many of these organizations might have a certain level of BI maturity where dashboarding remains key for the decision-making process.

It is also worth noting that collaboration features, which I personally expected to rank higher, did not display a high level of importance in our survey results, showing that while collaboration is a basic feature for mobile BI applications, other features are a higher priority for end users.

Where Will Mobile BI Go From Here? 

In this final part of our mobile BI mini-series (in the first part we explored who is using mobile BI offerings and which vendors they are selecting) we have found that despite being an important change agent in the business intelligence space, the mobile BI arena still has a lot of potential and a lot of ground to break.

As organizations on one side (and mobile BI products on the other) mature and grow, the adoption and evolution of mobile BI applications will enable both end-users and vendors to incorporate key functionalities into mobile BI solutions, for example, reinforcing collaboration, making mobile BI customization and configuration more flexible and accessible, and enabling mobile BI to continue changing the way traditional users consume and produce business intelligence and analytics solutions.

But what do you think? Tell us your experience with mobile BI. Drop me a line below and I’ll respond as soon as I can.

Further Reading

BI on the Go . . . So, Who’s Using Mobile BI? (February 2014)
TEC 2014 Mobile BI Buyer's Guide (January 2014)
BI on the Go Infographic (January 2014)
VIDEO: Mobile Business Intelligence in the Enterprise (November 2013)
This Week in the DoT, 03/14/2014


As my father used to say, better late than never. So, here is a list of things you might want to check out, including news, humor, and more…


In the news:

To read:

To watch:

The Internet of Things: Dr. John Barrett at TEDxCIT

Kinoma Create — The JavaScript-Powered Internet of Things Construction Kit

Influencers on Twitter you certainly need to follow:

  • Cindi Howson (@BIScorecard)
  • Claudia Imhoff (@Claudia_Imhoff)
  • Colin White (@ColinWhite) 
  • Curt Monash (@curtmonash)
  • Howard Dresner (@howarddresner)
  • Jim Harris (@ocdqblog)
  • Joseph di Paolantonio (@JAdP)
  • Julie Hunt (@juliehunt)
  • Karen Lopez (@datachick)
  • Marcus Borba (@marcusborba)
  • Merv Adrian  (@merv)
  • Neil Raden (@NeilRaden)
  • Richard Hackathorn (@hackathorn)

Finally, to end your week with a smile:

- Agile Methodology - Applied to Other Fields...
- Big Data Analysis... in the Cloud

Bon weekend!

This Week in the DoT, 03/07


Another week, another month, and the year goes by...

Before heading to your local… place of weekend rest, here’s a list of things I’ve come across this week that you might want to check out.

 Have a tremendous weekend!

In the news:

To read:

To watch:

Big Data and the Rise of Augmented Intelligence: Sean Gourley at TEDxAuckland

Teradata and Big Data - from the CTO's Point of View - Stephen Brobst

Influencers on Twitter you certainly need to follow:

  • Carla Gentry (@data_nerd)
  • Cindi Howson (@BIScorecard)
  • Claudia Imhoff (@Claudia_Imhoff)
  • Colin White (@ColinWhite) 
  • Curt Monash (@curtmonash)
  • Howard Dresner (@howarddresner)
  • Jim Harris (@ocdqblog)
  • Joseph di Paolantonio (@JAdP)
  • Julie Hunt (@juliehunt)
  • Karen Lopez (@datachick)
  • Marcus Borba (@marcusborba)
  • Mark Smith (@marksmithvr)
  • Merv Adrian  (@merv)
  • Mike Ferguson (@mikeferguson1)
  • Neil Raden (@NeilRaden)
  • Richard Hackathorn (@hackathorn)

Some Humor:

Machine Learning and Cognitive Systems, Part 2: Big Data Analytics


In the first part of this series, I described a bit of what machine learning is and its potential to become a mainstream technology in the enterprise software industry, serving as the basis for many other advances in the incorporation of other technologies related to artificial intelligence and cognitive computing. I also mentioned briefly how machine learning is becoming increasingly important for many companies in the business intelligence and analytics industry.

In this post I will discuss further the importance that machine learning already has and can have in the analytics ecosystem, especially from a Big Data perspective.

Machine learning in the context of BI and Big Data analytics

Just as in the lab and other areas, one of the reasons machine learning has become extremely important and useful in enterprise software is its potential to deal not just with huge amounts of data and extract knowledge from it—which can to some extent be addressed with disciplines such as data mining or predictive analysis—but also with complex problems in which the algorithms need to adapt to frequently changing conditions. This is the case for successful applications of machine learning in software such as spam detection, Amazon’s automation of employee access control, or Cornell’s work on protecting animals.
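To make the adaptability point concrete, here is a deliberately minimal, hand-rolled word-frequency spam filter in Python. This is not any vendor’s actual algorithm, and the class and message texts are invented for illustration; the point is that retraining it on newly labeled messages is what lets it adapt as spam tactics change.

```python
from collections import Counter

class AdaptiveSpamFilter:
    # Minimal word-frequency filter; retraining on new messages is
    # what lets it adapt as spammers change their wording.
    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()

    def train(self, message, is_spam):
        words = message.lower().split()
        (self.spam_words if is_spam else self.ham_words).update(words)

    def is_spam(self, message):
        # Score each word by how much more often it appeared in spam.
        score = sum(self.spam_words[w] - self.ham_words[w]
                    for w in message.lower().split())
        return score > 0

f = AdaptiveSpamFilter()
f.train("win free money now", True)
f.train("meeting notes for monday", False)
print(f.is_spam("free money offer"))  # → True
```

A production system would use a proper statistical model (naive Bayes, logistic regression, and so on), but the adaptive loop—observe, retrain, reclassify—is the same.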

But the incorporation of machine learning techniques within enterprise software is rapidly expanding to many other areas of business, especially those related to business intelligence and analytics, or in general the decision support framework of an organization. As I mentioned in Part 1, as information collection increases in volume, velocity, and variety (the three Vs of Big Data), and as business pressure grows to expedite analysis and decrease its latency, new and existing business software solutions are incorporating improved ways to analyze these large and complex data sets and, most importantly, furthering the reach of what analytics and BI solutions can do.

As data sources become increasingly complex, so do the means of analyzing them, and the maturity model of the BI and analytics platform is forced to accommodate the process and expand to the next level of evolution—and sometimes even revolution—of the decision-making process. So the role of a BI and analytics framework is changing from being solely a decision support companion to a framework that can trigger decision automation. To show this, I have taken the standard BI maturity model from TEC’s BI Maturity and Software Selection Perspectives report (Figure 1) to show in simple form some of the pressures that this complexity puts on the maturity process. As a consequence, the process expands into a double-phase decision-making process, which implies giving the system an increased role in the decision.

Figure 1. Standard BI maturity model is being expanded by complexity of data and processes

The decision phase can happen in two ways: as a supported decision made by users, or by the system delegating the ability to decide to itself, automating the decision-making process based on previous analysis and letting the system learn and adapt. By delegating the decision to the system, the process extends the reach of analytics to predictive analysis, early warning messaging, and data discovery.
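A minimal sketch of how such a double-phase process might be wired, in Python: a hypothetical confidence threshold (the names, values, and routing rule are all illustrative assumptions, not any product’s logic) decides whether the system acts on its own or falls back to the classic decision-support path.

```python
def decide(score, confidence, threshold=0.9):
    # Automate only when the model is confident enough; otherwise
    # route the case to a human analyst (the assisted path).
    if confidence >= threshold:
        action = "approve" if score > 0.5 else "reject"
        return ("automated", action)
    return ("assisted", "route to analyst")

print(decide(0.8, 0.95))  # confident → the system decides
print(decide(0.8, 0.60))  # uncertain → a human decides
```

Tuning the threshold is exactly the dial between a pure decision-support framework (threshold above 1.0, nothing automated) and full decision automation (threshold at 0).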

At this stage we might find more permutations of analytics platforms and frameworks that combine both assisted and automated decisions, ideally increasing the effectiveness of the process and streamlining it (Figure 2).

Figure 2. Standard BI maturity model expands to be able to automate decisions

In this context, due to new requirements coming from different directions, especially from Big Data sources in which systems deal with greater and more complex sets of data, BI and analytics platforms become, most of the time, hubs containing dynamic information that changes in volume, structure, and value over time.

In many cases decisions are still made by humans, but with software assistance to different degrees. In some more advanced cases, decisions are made by the system with no human intervention, triggering the evolution of analytics systems, especially in areas such as decision management, and closing the gap between analytics and operations, which can mean boosting tighter relations between the operations, management, and strategy of an organization.

Opportunities and challenges

The opportunities for implementing machine learning within the context of Big Data, and especially Big Data analytics, are enormous. From the point of view of decision support, it can enhance the complete decision management cycle by

  1. Enhancing existing business analytics capabilities such as mining and predictive which enable organizations to address more complex problems and enhance precision of the analysis process.
  2. Enhancing the level of support for decisions by providing increased system abilities for performing adaptable data discovery features such as detecting patterns, enabling more advanced search capabilities, reinforcing knowledge discovery by identifying correlations, and many other things, much along the same line of what data mining and predictive analytics can do.
  3. Boosting the incorporation of early detection capabilities within traditional or new BI and analytics systems, a key component of modern organizations that want to anticipate or detect short-term trends that might have great impact on an organization.
  4. Enabling a system to perform autonomous decisions, at least at early stages, to optimize the decision process in cases where the application can decide by itself.
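
The early-detection idea in point 3 can be sketched with a simple rolling-baseline check: flag any observation that deviates sharply from the recent past. The following is a minimal illustration in Python; the window size, threshold, and sample data are my own assumptions, and production systems use far richer models.

```python
# Illustrative sketch only: a rolling-baseline check for "unexpected changes".
# Window size, threshold, and sample data are assumptions for this example.
from statistics import mean, stdev

def detect_anomalies(series, window=5, threshold=3.0):
    """Return indices of points deviating more than `threshold` standard
    deviations from the mean of the preceding `window` observations."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Steady daily sales with one unexpected spike at index 10.
sales = [100, 102, 98, 101, 99, 100, 103, 97, 101, 100, 180, 99, 102]
print(detect_anomalies(sales))  # [10]: the spike is flagged for attention
```

A real early-detection system would also account for seasonality and trend, but the principle is the same: alert on deviations from an expected baseline rather than wait for a user to run a report.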

Many organizations that already use machine learning can be considered to be exploiting the first level of this list—improving and enabling the analysis of large volumes of complex data. A smaller number of organizations can be considered to be transitioning to the subsequent levels of Big Data analysis using machine learning.

At this point in time, much of the case for applying machine learning rests on the first point of the list. But aside from its intrinsic relevance, it is, in my view, in the areas of early detection and automation of decisions that machine learning has the greatest potential to boost BI and analytics to the next level. Of course, this will most probably occur alongside other new information technologies in artificial intelligence and other fields.

Many organizations that already have robust analytics infrastructures are taking steps to incorporate machine learning within their existing BI and analytics platforms, for example by building it into their analytics strategies. But organizations that wish to leverage machine learning's potential may encounter some challenges:

  1. The complexity of applying machine learning requires a great deal of expertise. This in turn leads to the challenge of gaining the expertise to interpret the right patterns for the right causes.
  2. There may be a shortage of people who can take care of a proper deployment. Intrinsically, the challenge is to find the best people in this discipline.
  3. As machine learning is an emerging technology, some organizations still find it a challenge to measure the value of applying these advanced analytics disciplines, especially if they don't have sufficiently mature BI and Big Data analytics platforms.
  4. Vendors need to make these technologies increasingly suitable for the business world, easing both deployment and development processes.

Despite these challenges, there is little doubt that over time an increasing number of organizations will continue to implement machine learning techniques, all in order to enhance their analytics potential and consequently mature their analytics offerings.

Some real-life use cases

As we mentioned earlier, there are a number of cases where machine learning is being used to boost an organization's ability to satisfy analytics needs, especially for analytics applied to Big Data platforms. Following are a couple of examples of what some organizations are doing with machine learning applied to Big Data analytics, which, perhaps surprisingly, are tied not to complex scientific projects but to more business-oriented ones. These cases were taken from existing machine learning and Big Data analytics vendors, which we will describe in more detail in the next post of this series:

Improving and optimizing energy consumption

  • NV Energy, the electricity utility in northern Nevada, is now using software from Big Data analytics company BuildingIQ for an energy-efficiency pilot project using machine learning at its headquarters building in Las Vegas. The 270,000-square-foot building uses BuildingIQ to reduce energy consumption by feeding large sets of data, such as weather forecasts, energy costs and tariffs, and other datasets, into proprietary algorithms that continuously improve the building's energy consumption.

Optimizing revenue for online advertising

  • Adconion Media Group, an important media company with international reach, uses software from machine learning and Big Data analytics provider Skytree for ad arbitrage, improving predictions for finding the best match between buyers and sellers of web advertising.

Finding the right partner

  • eHarmony, the well-known matchmaking site, uses advanced analytics provided by Skytree to find the best possible matches for prospective relationship seekers. Skytree's machine learning finds the best possible matching scenarios for each customer, using profile data and website behavior along with specific algorithms.

This is just a small sample of real use cases of machine learning in the context of Big Data analytics. There is new but fertile ground for machine learning to take root in and grow.

So what?

Well, in the context of analytics, and specifically Big Data analytics, the application of machine learning has a lot of potential for boosting the use of analytics to higher levels and extending its use alongside other disciplines, such as artificial intelligence and cognition. But these applications need to be approached with machine learning as an enabler and enhancer, and must be integrated within an organizational analytics strategy.

As with other disciplines, the success of the implementation of machine learning and its evolution to higher stages needs to be ensured by an organization’s extensive adaptability to business needs, operations, and processes.

One of the most interesting trends in analytics is its increasing pervasiveness and its tighter relation with all levels of an organization. As the adoption of new features increases the power of analytics, it also closes the gap between two traditionally separate worlds within the IT space, the transactional and the non-transactional, enabling analytics to be consumed and used in ways that just a decade ago were unimaginable. The line between business operations and analysis is blurrier than ever, and disappearing. The new IT space will live within these colliding worlds, with analytics being performed at every level of an organization, from operations to strategy.

In upcoming posts in this series, we will address the machine learning market landscape and look at some vendors that currently use machine learning to perform Big Data analytics. And we will go a step further, into the space of cognitive systems.

In the meantime, please feel free to drop me a line with your comment. I’ll respond as soon as I can.

This Week in the DoT, 02/28

Yep, Thank God it's Friday.

And before you go home and hopefully have a relaxing weekend, here is a list of some interesting things that happened in the Data of Things during this week: news, events, tweets and a bit of humor.

These are some relevant things you might want to check...

Snow Boarding in Fernie, Canada by Chris Barton

In the news:

With interesting readings:

Interesting to watch:

Live from Strata 2014: In-Hadoop analytics and IBM's Watson

What Does Collaboration Among Humans and Machines Actually Look Like? Structure:Data 2013

Influencers on Twitter you certainly need to follow:

  • Cindi Howson (@BIScorecard)
  • Claudia Imhoff (@Claudia_Imhoff)
  • Colin White (@ColinWhite) 
  • Curt Monash (@curtmonash)
  • Howard Dresner (@howarddresner)
  • Jim Harris (@ocdqblog)
  • Joseph di Paolantonio (@JAdP)
  • Julie Hunt (@juliebhunt)
  • Karen Lopez (@datachick)
  • Marcus Borba (@marcusborba)
  • Merv Adrian  (@merv)
  • Neil Raden (@NeilRaden)
  • Richard Hackathorn (@hackathorn)

And finally, to end your week with a smile:


This Week in the DoT (Data of Things)

Every Friday, starting today, I will try to post some of what in my view were the relevant events during the week for the Data of Things, including news, videos, etc.

For today, I have a short list of influencers on Twitter — in no particular order — that you might want to follow for all data-related topics. I’m sure you will enjoy their tweets as much as I do:

  • Claudia Imhoff (@Claudia_Imhoff)
  • Merv Adrian (@merv)
  • Neil Raden (@NeilRaden)
  • Marcus Borba (@marcusborba)
  • Howard Dresner (@howarddresner)
  • Curt Monash (@curtmonash)
  • Cindi Howson (@BIScorecard)
  • Jim Harris (@ocdqblog)
  • Julie Hunt (@juliebhunt)

Of course, the list will grow in time. For now, enjoy following this group of great data experts.

Bon weekend!

BI on the Go . . . So, Who’s Using Mobile BI?

Piggybacking on the success of the most recent TEC Buyer’s Guide, the 2014 Buyer's Guide on Mobile BI applications, we took the opportunity to survey users of mobile business intelligence (BI) applications and collect their impressions of these tools. Most of the results of this survey, which drew more than 250 respondents, were captured in an infographic. Additional information garnered from the survey, while not conclusive, may provide a glimpse into the sentiment of the respondents on their use of mobile BI applications. In this post, I’ll describe some of those results, which may be useful for organizations evaluating a new mobile BI solution for their business needs.

Most popular mobile BI apps

The top 10 mobile BI apps in use are depicted in figure 1. Microsoft takes a clear lead, followed by the other big software powerhouses SAP, Oracle, and IBM. These results are in line with what we would expect considering that most of these vendors already have large sets of BI implementations worldwide and that customers tend to choose mobile BI offerings from their existing BI provider.

Figure 1 shows that more than 8 percent of respondents either have not yet implemented or are not using a mobile BI application. This is a relatively large segment of potential mobile BI users, especially considering that most existing BI software providers now have mobile BI offerings, suggesting that it is relatively effortless to put them in place. This apparent avoidance of mobile BI offerings within some organizations may stem from the following:

  • Lack of use case for mobile BI apps
  • Technical limitations to implementing a mobile BI app
  • Budget restrictions

Figure 1. Top 10 Mobile Apps Used by Respondents to TEC’s 2014 Mobile BI Survey

Other mobile BI offerings used in the organizations of the survey respondents come from QlikView, MicroStrategy, and Tableau—all great promoters of mobile offerings in the BI space, which are rapidly increasing their footprint not only in mobile BI, but in the mobile space overall. The remaining mobile BI offerings cited in our survey come from the long-time and well-known BI player Information Builders; Infor, a traditional player in the enterprise resource planning (ERP) space that has been growing its BI presence; and, last but not least, an experienced BI player from the open-source community, Pentaho, which now has a robust mobile BI solution covering most, if not all, of the aspects of an enterprise BI solution.

Who’s using mobile BI and how?

We also wanted to determine who is using mobile BI solutions, and which mobile BI applications they are using. When we segmented our top 10 provider list by the size of the company our survey respondents work for, some results became immediately apparent. Microsoft was by far the most widely used mobile BI solution in companies with 1 to 50 employees, while SAP was the most popular solution among companies with 51 to 100 employees (see figure 2). On the other hand, there seems to be healthy competition between the big four (IBM, SAP, Microsoft, and Oracle) in the large enterprise segment, with increased presence of other players such as Information Builders, MicroStrategy, QlikView, and Tableau.

Figure 2. Top 10 Mobile BI Apps Used by Respondents According to their Company Size (TEC’s 2014 Mobile BI Survey)

Figure 2 shows that the most widely used mobile BI offerings are from Microsoft and SAP regardless of respondent’s company size—from small companies with 1 to 50 employees to large enterprises with more than 10,000 employees. These results may reflect the intense efforts both these vendors have undertaken to evolve their enterprise BI solutions with new mobile technologies and capabilities to enable customers to use mobile more seamlessly.

If we look at the type of industry the respondents’ companies belong to, we can see the top 10 industries in figure 3. The computer, IT, and software industry takes the lead in usage of mobile BI solutions, a field of course that typically spurs trends in technology adoption. Business services and consulting and manufacturing are in second and third place, respectively, followed by finance and banking in fourth place. All these industries are in my opinion justified in their need for mobile services, as are many of their lines of business. I have to admit that I was surprised to find hotels and restaurants in the top 10 industries using mobile BI offerings, not because there is no use case for mobile BI in that industry, but because previous research suggests other industries, such as utilities and services, are more amenable to the adoption of mobile BI solutions.

Figure 3. Top 10 Industries Using Mobile BI Apps (TEC’s 2014 Mobile BI Survey)

If we dig a little deeper and look at which mobile BI apps are used by the top 10 industries, we see that Microsoft still leads the pack, with a dominant presence in the hotels and restaurants industry and in finance and banking. SAP, for its part, also shows a strong presence in hotels and restaurants, and in the finance and banking and manufacturing industries.

It is noteworthy to mention that QlikView, among other vendors, has a strong presence in electronics and high-tech. Organizations in these areas typically have great technical expertise, attesting to QlikView’s technical and functional capabilities.

Figure 4. Mobile BI Apps Used by the Top 10 Industries (TEC’s 2014 Mobile BI Survey)

Additionally, Oracle shows mobile BI presence nearly across the board, from the software industry to electronics and banking. Furthermore, the three powerhouses Oracle, SAP, and Microsoft dominate the mobile BI usage in the insurance and finance and banking industries.


Based on the results of our survey on mobile BI usage, we can see that the four main players—Microsoft, SAP, IBM and Oracle—are well positioned in the mobile BI market, pretty much inheriting success from their high-profile BI solutions.

Other vendors such as Tableau, QlikView, MicroStrategy, and Information Builders are rapidly establishing themselves as major BI providers and making their presence known on the mobile BI stage.

Though the information presented in this post can be considered neither conclusive nor extensive, it can serve as a good starting point or basic point of reference for gauging which mobile BI solutions your peers, in companies of a similar size and in the same industry, are using. This information may be useful before you embark upon the venture of acquiring a new mobile BI solution or replacing the one you already have.

Stay tuned for a second post on the survey, where I will present the most requested mobile BI functionality and the users’ level of satisfaction with their mobile BI offerings.

Link to original article

Machine Learning and Cognitive Systems, Part 1: A Primer

IBM’s recent announcements of three new services based on Watson technology make it clear that there is pressure in the enterprise software space to incorporate new technologies, both in hardware and software, in order to keep pace with modern business. It seems we are approaching another turning point in technology, where many concepts previously limited to academic research or very narrow industry niches are now being considered for mainstream enterprise software applications.

Image by Penn State

Machine learning, along with many other disciplines within the field of artificial intelligence and cognitive systems, is gaining popularity, and it may in the not so distant future have a colossal impact on the software industry. This first part of my series on machine learning explores some basic concepts of the discipline and its potential for transforming the business intelligence and analytics space.

So, what is machine learning anyway?

In simple terms, machine learning is a branch of the larger discipline of artificial intelligence that involves the design and construction of computer applications or systems that are able to learn based on their data inputs and/or outputs. Basically, a machine learning system learns by experience; that is, based on specific training, the system will be able to make generalizations from its exposure to a number of cases and then perform actions on new or unforeseen events.

The discipline of machine learning also incorporates other data analysis disciplines, ranging from predictive analytics and data mining to pattern recognition. A variety of specific algorithms, frequently organized in taxonomies, are used for this purpose, chosen depending on the type of input required (Wikipedia has a list of machine learning algorithms organized by type).

As a discipline, machine learning is not new. Initial documents and references can be traced back to the early fifties with the work of Alan Turing, Arthur Samuel, and Tom M. Mitchell. And the field has undergone extensive development since that time.

One of the more important applications of machine learning is automating the acquisition of the knowledge bases used by so-called expert systems, systems that aim to emulate the decision-making process of human experts in a field. But the scope of its application has been growing. In Applications of Machine Learning and Rule Induction, Langley and Simon review some major paradigms for machine learning scenarios, all based on a very important premise:

The goal is to improve performance on some task, and the general approach involves finding and exploiting regularities in training data.

The major approaches include neural networks, case-based learning, genetic algorithms, rule induction, and analytic learning. While in the past they were applied independently, in recent times these paradigms or models are being used in a hybrid fashion, blurring the boundaries between them and enabling the development of more effective models. The combination of analytic methods can ensure effective, repeatable, and reliable results, a required component for practical usage in mainstream business and industry solutions.

According to A Few Useful Things to Know about Machine Learning, while the discipline by itself is far from simple, it is based on a simple (but not simplistic) principle:

learning = representation + evaluation + optimization

In this formulation:
  • representation means the use of a classifier element represented in a formal language that a computer can handle and interpret; 
  • evaluation consists of a function needed to distinguish or evaluate the good and bad classifiers; and
  • optimization represents the method used to search among these classifiers within the language to find the highest scoring ones.

As the paper states:

The fundamental goal of machine learning is to generalize beyond the examples in the training set.

This way the system can infer new decisions or correct answers that then serve to increase learning and optimize accuracy and performance.
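
The three components listed above can be made concrete with a deliberately tiny Python sketch of my own (an illustration, not an example from the paper): the representation is a one-feature threshold rule, the evaluation function is training accuracy, and the optimization is an exhaustive search over candidate thresholds.

```python
# Illustrative sketch only: a one-feature "decision stump" learner.
# Representation: the hypothesis space is all rules of the form x >= t.

def evaluate(threshold, xs, ys):
    """Evaluation: accuracy of the rule `x >= threshold -> class 1`."""
    predictions = [1 if x >= threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(predictions, ys)) / len(ys)

def optimize(xs, ys):
    """Optimization: exhaustive search for the highest-scoring threshold."""
    return max(xs, key=lambda t: evaluate(t, xs, ys))

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
best = optimize(xs, ys)
print(best, evaluate(best, xs, ys))  # 10 1.0
```

Even this toy learner generalizes in the sense the paper describes: the learned rule classifies values it never saw during training, such as 7 or 20.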

Also, each component of the machine learning process comprises a good mix of mathematical techniques, algorithms, and methodologies that can be applied (Figure 1).

Figure 1. The Three components of learning algorithms. Source: A Few Useful Things to Know about Machine Learning

In this context, machine learning can be done by applying specific learning strategies, such as:

  • a supervised strategy, to map the data inputs and model them against desired outputs, and
  • an unsupervised strategy, to map the inputs and model them to find new trends.

Derivative strategies that combine these, such as semi-supervised approaches, are also used. This opens the door to a multitude of applications for which machine learning can be used, in many areas, to describe, prescribe, and discover what is going on within large volumes of diverse data.
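
A toy contrast between the two strategies, sketched in plain Python (the data and helper names here are my illustrative assumptions): the supervised learner fits labeled examples, while the unsupervised one finds structure in unlabeled points.

```python
# Illustrative sketch only; a real project would use a library such as
# scikit-learn rather than these hand-rolled helpers.

def fit_centroids(points, labels):
    """Supervised: learn one centroid per class from labeled 1-D points."""
    centroids = {}
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        centroids[label] = sum(members) / len(members)
    return centroids

def predict(centroids, point):
    """Classify a new point by its nearest class centroid."""
    return min(centroids, key=lambda label: abs(centroids[label] - point))

def kmeans_1d(points, k=2, iters=10):
    """Unsupervised: group unlabeled 1-D points around k centers."""
    centers = sorted(points)[::max(1, len(points) // k)][:k]
    for _ in range(iters):
        groups = {c: [] for c in centers}
        for p in points:
            groups[min(centers, key=lambda c: abs(c - p))].append(p)
        centers = [sum(g) / len(g) for g in groups.values() if g]
    return sorted(centers)

train_x = [1.0, 1.2, 0.8, 9.0, 9.5, 8.8]
train_y = ["low", "low", "low", "high", "high", "high"]
model = fit_centroids(train_x, train_y)  # uses the labels (supervised)
print(predict(model, 1.1))               # low
print(kmeans_1d(train_x))                # finds two groups with no labels
```

Note the difference in inputs: the supervised path needs the desired outputs (`train_y`), while the unsupervised path discovers the two groupings from `train_x` alone.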

The increasing presence of machine learning in business, especially for analytics

Thanks to the success of the application of machine learning within certain disciplines such as speech recognition, computer vision, bio-surveillance, and robot control, the interest in and adoption of machine learning technologies has grown, particularly over the last decade. It is also interesting how, in many fields, machine learning is escaping the confines of science labs to reach commercial and business applications.

There are several scenarios where machine learning can have a key role: in those systems that are so complex that algorithms are very hard to design, or when an application requires the software to adapt to an operational environment, or with complex systems that need to work with extensive and complex data sets. In this way, machine learning methods play an increasing role not just in general in the field of computer science, but also in terms of enterprise software applications, especially for those types of applications that need in-depth data analysis and adaptability. These areas include analytics, business intelligence, and Big Data.

Why business intelligence and Big Data?

In 1958, H. P. Luhn wrote what is perhaps the first document on business intelligence. The abstract begins:

An automatic system is being developed to disseminate information to the various sections of any industrial, scientific or government organization. This intelligence system will utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the “action points” in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points. This paper shows the flexibility of such a system in identifying known information, in finding who needs to know it and in disseminating it efficiently either in abstract form or as a complete document.

The premise of BI systems has remained pretty much the same: to collect an organization’s data from disparate sources and process it in the best possible way to produce useful information to—and perhaps this is the most important part—help decision makers to make the best informed decision for the benefit of an organization. A simple definition, not a simple task.

In this regard, business intelligence has been adapting and evolving, with greater or lesser degrees of success, to provide information workers with the ability to make these decisions, and has played a very important role in the decision support platform of many organizations.

This evolution has changed the role of BI systems: they no longer just provide high-level decision support at a strategic level, but inform an increasing number of areas involved with middle management and operations. It has also increased the need for BI systems and initiatives to evolve so that they are able to deal with increasingly complex data analysis problems. Applications need to be boosted so that they can deal with larger and more complex amounts of data, and can not only report current status, but also predict, play with hypothetical scenarios, and eventually learn to make accurate suggestions—a green field for machine learning (Figure 2).

Figure 2. Some factors triggering the need for faster, better, and improved ways for decision support, analytics, and BI systems

A good model for understanding the evolution of BI systems is D. J. Power’s history of decision support systems, of which BI is of course an important part. According to Mr. Power, decision support systems and applications have evolved through the following stages:

  1. Model Driven. Emphasizes access to and manipulation of financial, optimization, and/or simulation models. Simple quantitative models provide the most elementary level of functionality. These systems use limited data and parameters provided by decision makers to aid in analyzing a situation; in general, large databases are not needed for model-driven DSS.
  2. Data Driven. In general, a data-driven DSS emphasizes access to and manipulation of a time series of internal company data and sometimes external and real-time data. Simple file systems accessed by query and retrieval tools provide the most elementary level of functionality. Data warehouse systems that allow the manipulation of data by computerized tools tailored to a specific task and setting, or by more general tools and operators, provide additional functionality. Data-driven DSS with On-line Analytical Processing (OLAP) provide the highest level of functionality and decision support, linked to the analysis of large collections of historical data.
  3. Communications Driven. Communications-driven DSS use network and communications technologies to facilitate decision-relevant collaboration and communication. In these systems, communication technologies are the dominant architectural component. Tools used include groupware, video conferencing and computer-based bulletin boards.
  4. Document Driven. Uses computer storage and processing technologies to provide document retrieval and analysis. Large document databases may include scanned documents, hypertext documents, images, sounds and video. Examples of documents that might be accessed by a document-driven DSS are policies and procedures, product specifications, catalogs, and corporate historical documents, including minutes of meetings and correspondence. A search engine is a primary decision-aiding tool associated with a document-driven DSS. These systems have also been called text-oriented DSS.
  5. Knowledge Driven. Knowledge-driven DSS can suggest or recommend actions to managers. These DSS are person-computer systems with specialized problem-solving expertise. The "expertise" consists of knowledge about a particular domain, understanding of problems within that domain, and "skill" at solving some of these problems.

Within these descriptions there are clear elements in place to boost the adoption of technologies and methodologies such as machine learning: collaboration, intensive data management, and the increase of non-traditional (non-relational) data. The need for systems to solve complexity coincides with the advent of phenomena such as Big Data and advanced analytics in business, giving machine learning a natural entrance to help crunch big sets of complex data and to become part of the increasingly complex machinery in place for data analysis and decision making.

Along with disciplines like data mining and natural language processing, machine learning is being seen in business as a tool of choice for transforming what used to be a business intelligence application approach into a wider enterprise intelligence or analytics platform or ecosystem. This goes beyond the traditional scope of BI—focused on answering “what is going on with my business?”—to give all possible answers to “why are we doing what we’re doing?”, “how can we do it better?”, and even “what should we do?”.

As business models become more complex and produce massive amounts of data to be handled with less and less latency, decision support and BI systems are required to grow in complexity and in their ability to handle those volumes of data. This demand is boosting the growth of more sophisticated solutions that address specific business and industry problems; it’s not enough to spit out a straightforward result, systems need to provide business guidance.

Some scenarios where machine learning is gaining increased popularity in the context of analytics and BI can be found in applications for risk analysis, marketing analytics, and advanced analytics for Big Data sources.

Machine learning is a reality for business

As Tim Negris states in Getting ready for machine learning:

Despite what many business people might guess, machine learning is not in its infancy. It has come to be used very effectively across a wide array of applications.

And it’s being increasingly adopted within many analytics, Big Data, and business intelligence initiatives, either as a component lying side by side with other analytics solutions, or packaged within a solution that has already adopted it as part of its functional stack.

In either case, machine learning is preparing to be part of the next evolution of enterprise intelligence business offerings.

In the next part of this series on machine learning, I will address some specifics of the usage of machine learning as part of Big Data and advanced analytics, as well as its role in the formation of the new so-called area of cognitive systems. In the meantime, please share comments below and let me know your thoughts.

Hello World!

There is a first time for everything... at least, that’s what my father used to say, and sometimes he was right. As I have been blogging for quite some time for my employers or through other channels, I think the time has come for me to have a personal blog that allows me a bit more freedom to explore what might be closer to my personal interest, where I can let go a bit, and include a deeper (or not) and personal view on topics concerning data:

Data in its several forms, with multiple layers, and from many perspectives. From traditional databases to new databases, from small to big data, simple to complex events. Intelligent and not so intelligent data.

Hello to the Data of Things!

I want to start with the iconic Hello World! phrase because it marked one of the most important moments in my career in IT. The phenomenal book written by Brian W. Kernighan and Dennis Ritchie called “The C programming language” was my introduction to the world of C and UNIX, which led, eventually, via a software programming career, to the challenging and awesome experience of data mingling.

Brian Kernighan paying tribute to Dennis Ritchie at Bell Labs

Data has become a fundamental material for almost all human activities in our lives, and as this presumably will not change and on the contrary will be reinforced, we need to think about data as a key driver of current and future human life. This blog will be devoted to talking about data, the technology, and the people who work with it, from its source, through its processing and movement, to its destination. People are changing our lives by using data in unique and special ways.

So, dearest reader, this blog is devoted to the Data of Things, from data sources and targets, the technologies involved, and those who produce it, use it, and manage it, … and maybe more.

A huge chunk to bite off, I know, but a delicious one, too. :)

Of course, do not hesitate to comment, discuss, and make this blog live… You just need to use the comment space below to start the conversation.

Copyright © 2017 BBBT - All Rights Reserved