Jorge Garcia posted this in #BigData, analytic database, Analytics, Data Management, database, databases, in-memory, inmemory, interviews, Member Via RSS, rdbms, relational, voltdb on February 27th, 2017
As organizations navigate challenging times, both technologically and in business terms, managing increasing volumes of data has become key to success.
As data management rapidly evolves, the main Big Data paradigm has shifted from just “big” to “big, fast, reliable and efficient”.
Now more than ever in the evolution of the big data and database markets, the pressure is on software companies to deliver new and improved database solutions, capable not just of handling increasing volumes of data but of doing so faster, better, and more reliably.
A number of companies have taken the market by storm, infusing the industry with advanced new database software, for both transactional and non-transactional operations, that is rapidly changing the database software landscape.
One of these companies is VoltDB. This New England (Massachusetts) based company has rapidly become a reference point for next-generation database solutions and has won the favor of important customers in key industries such as communications, finance, and gaming.
VoltDB was co-founded by none other than world-renowned database expert and 2014 ACM A.M. Turing Award recipient Dr. Michael Stonebraker, who has been key both to the development of a new-generation database solution and to the formation of the talented team in charge of its development.
With the new VoltDB 7.0 already on the market, we had the opportunity to chat with VoltDB’s John Piekos about the product’s key features and evolution.
John is VoltDB’s Vice President of Engineering, where he heads up the company’s engineering operations, including product development, QA, technical support, documentation, and field engineering.
John has more than 25 years of experience leading teams and building software, delivering both enterprise and Big Data solutions.
John has held tech leadership positions at several companies, most recently at Progress Software where he led the OpenEdge database, ObjectStore database and Orbix product lines. Previously, John was vice president of Web engineering at EasyAsk, and chief architect at Novera Software, where he led the effort to build the industry’s first Java application server.
John holds an MS in computer science from Worcester Polytechnic Institute and a BS in computer science from the University of Lowell.
Thank you John, please allow me to start with the obvious question:
What’s the idea behind VoltDB, the company, and what makes VoltDB, the database, different from other database offerings in the market?
What if you could build a database from the ground-up, re-imagine it, re-architect it, to take advantage of modern multi-core hardware and falling RAM prices, with the goal of making it as fast as possible for heavy write use cases like OLTP and the future sensor (IoT) applications? That was the basis of the research Dr. Stonebraker set out to investigate.
Working with the folks at MIT, Yale, and Brown, they created the H-Store project and proved the theory that if you eliminated the overhead of traditional databases (logging, latching, buffer management, etc.), ran an all-in-memory workload, spread that workload across all the available CPUs on the machine, and horizontally scaled it across multiple machines, you could get orders-of-magnitude better performance out of the database.
The commercial realization of that effort is VoltDB. VoltDB is fully durable, able to process hundreds of thousands to millions of multi-statement SQL transactions per second, all while producing SQL-driven real-time analytics.
Today an increasing number of emerging databases work partially or totally in-memory, while existing ones are changing their designs to incorporate this capability. In your view, what are the most relevant features users need to look for when choosing an in-memory database?
First and foremost, users should realize that not all in-memory databases are created equal. In short, architecture choices require trade-offs. Some IMDBs are created to process reads (queries) faster and others, like VoltDB, are optimized for fast writes. It is impractical (impossible) to get both the fastest writes and the fastest reads at the same time on the same data, all while maintaining high consistency, because the underlying data organization and architecture are different for writes (row-oriented) than for reads (columnar).
It is possible to maintain two separate copies of the data, one in row format, the other in compressed column format, but that reduces the consistency level - data may not agree, or may take a while to agree between the copies.
Legacy databases can be tweaked to run in memory, but realize that, short of a complete re-write, the underlying architecture may still be disk-based, and thus incur significant (needless) processing overhead.
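To make the trade-off John describes concrete, here is a toy Python sketch (not VoltDB code, just an illustration) of why appending a record is cheap in a row layout while a single-column scan favors a columnar layout:

```python
# Row-oriented: one append per incoming record -> cheap writes.
row_store = []          # list of (user_id, amount, region) tuples

# Column-oriented: one list per column -> cheap single-column scans.
col_store = {"user_id": [], "amount": [], "region": []}

def write_row(user_id, amount, region):
    row_store.append((user_id, amount, region))      # 1 append per record

def write_columnar(user_id, amount, region):
    col_store["user_id"].append(user_id)             # 3 appends per record,
    col_store["amount"].append(amount)               # one per column
    col_store["region"].append(region)

for i in range(1000):
    write_row(i, i * 2.0, "EU")
    write_columnar(i, i * 2.0, "EU")

# Analytic scan: sum one column.
sum_row = sum(r[1] for r in row_store)   # must touch every whole tuple
sum_col = sum(col_store["amount"])       # touches only the one column
assert sum_row == sum_col
```

The write path does less work per record in the row layout; the scan reads less data in the columnar layout. A real engine adds indexes, compression, and persistence on top, but the asymmetry is the same one that makes a single store struggle to be fastest at both.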
VoltDB defines itself as an in-memory, operational database. What does this mean in the context of Big Data, and in the context of IT’s traditional separation between transactional and analytical workloads? How does VoltDB fit into, or reshape, these schemes?
VoltDB supports heavy write workloads - it is capable of ingesting never-ending streams of data at high ingestion rates (100,000+/second per machine, so a cluster of a dozen nodes can process over a million transactions a second).
While processing this workload, VoltDB can calculate (via standard SQL) and deliver strongly consistent real-time analytics, either ad hoc, or optimally, as pre-computed continuous queries via our Materialized View support.
These are capabilities simply not possible with traditional relational databases. In the Big Data space, this places VoltDB at the front end, as the ingestion engine for feeds of data, from telco, digital ad tech, mobile, online gaming, IoT, Finance and numerous other application domains.
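The continuous-query pattern behind that materialized-view support can be sketched in a few lines of Python. This is an illustration of the idea, not VoltDB's implementation: the "view" aggregates are updated incrementally inside each ingest transaction, so reading the analytic is a constant-time lookup rather than a table scan:

```python
from collections import defaultdict

events = []                        # the base "table"
view_count = defaultdict(int)      # like: SELECT region, COUNT(*), SUM(amount)
view_sum = defaultdict(float)      #       FROM events GROUP BY region

def ingest(region, amount):
    """One 'transaction': base table and view are updated together."""
    events.append((region, amount))
    view_count[region] += 1
    view_sum[region] += amount

for i in range(10):
    ingest("EU" if i % 2 == 0 else "US", float(i))

# Real-time analytic: no scan of `events` needed at query time.
assert view_count["EU"] == 5
assert view_sum["US"] == 1 + 3 + 5 + 7 + 9
```

Because the aggregate is maintained at write time, the cost of the analytic is paid incrementally per event instead of all at once per query, which is what makes it viable at high ingest rates.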
Just recently, with VoltDB 6.4, VoltDB passed the famous Jepsen test for the safety of distributed databases. Could you share with us some details of the test, the challenges, and the benefits it brought to VoltDB?
We have a nice landing page with this information, including Kyle’s blog and that of VoltDB’s founding engineer, John Hugg.
In summary, distributed systems programming is hard. Implementing the happy path isn’t hard, but doing the correct thing (such as returning the correct answer) when things go wrong (nodes failing, networks dropping), is where most of the engineering work takes place. VoltDB prides itself on strong consistency, which means returning the correct answer at all times (or not returning an answer at all - if, for example, we don’t have all of the data available).
Kyle’s Jepsen test is one of the most stringent tests out there. And while we hoped that VoltDB would pass on the first go-around, we knew Kyle was good at breaking databases (he’s done it to many before us!). He found a couple of defects, thankfully finding them before any known customer found them, and we quickly went to work fixing them. Working with Kyle and eventually passing the Jepsen test was one of the 2016 engineering highlights at VoltDB. We’re quite proud of that effort.
One interesting aspect of VoltDB is that it’s a relational database that fully complies with ACID and brings native SQL support. How does this design differ from, for example, NoSQL and some so-called NewSQL offerings? What are the advantages and trade-offs?
In general, NoSQL offerings favor availability over consistency - specifically, the database is always available to accept new content and can always provide content when queried, even if that content is not the most recent (i.e., correct) version written.
NoSQL solutions rely on non-standard query languages (some are SQL-like), to compute analytics. Additionally, NoSQL data stores do not offer rich transaction semantics, often providing “transactionality” on single key operations only.
Not all NewSQL databases are created equal. Some favor faster reads (over fast writes). Some favor geo-distributed data sets, often resulting in high-latency, or at least unpredictable-latency, access and update patterns. VoltDB’s focus is low and predictable OLTP (write) latency at high transactions-per-second scale, offering rich and strong transaction semantics.
Note that not all databases that claim to provide ACID transactions are equal. The most common place where ACID guarantees are weakened is isolation. VoltDB offers serializable isolation.
Other systems offer multiple levels of isolation, with a performance tradeoff between better performance (weak guarantees) and slower performance (strong guarantees). Isolation models like Read-Committed and Read-Snapshot are examples; many systems default to one of these.
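VoltDB's route to serializable isolation is to execute transactions strictly one at a time per partition, so no interleaving is possible. The following Python sketch illustrates the idea (it is a simplification, not VoltDB internals): transactions queued to a partition and drained serially are, by construction, equivalent to a serial order:

```python
from queue import Queue

balance = {"acct": 100}   # state owned by a single partition
txn_queue = Queue()       # per-partition transaction queue

def transfer(delta):
    def txn():
        # All reads and writes inside a txn run without any
        # concurrent interleaving -- no locks or latches needed.
        balance["acct"] += delta
    return txn

for d in (+50, -30, +10):
    txn_queue.put(transfer(d))

# The single "site" thread for this partition drains the queue serially.
while not txn_queue.empty():
    txn_queue.get()()

assert balance["acct"] == 130
```

Serial execution gives the strongest isolation guarantee for free; the engineering challenge (which the Jepsen discussion above touches on) is keeping that guarantee when the data is partitioned and replicated across machines.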
VoltDB’s design trades off complex multi-dimensional (OLAP) style queries for high throughput OLTP-style transactions while maintaining an ACID multi-statement SQL programming interface. The system is capable of surviving single and multi-node failures.
Where failures force a choice between consistency and availability, VoltDB chooses consistency. The database supports transactionally rejoining failed nodes back to a surviving cluster and supports transactionally rebalancing existing data and processing to new nodes.
Real-world VoltDB applications achieve 99.9% latencies under 10ms at throughput exceeding 300,000 transactions per second on commodity Xeon-based 3-node clusters.
How about the handling of unstructured information within VoltDB? Is VoltDB expected to take care of it, or does it integrate with alternative solutions? What’s the common architectural scenario in those cases?
VoltDB supports the storage of JSON strings and can index, query and join on fields within those JSON values. Further, VoltDB can process streamed JSON data directly into the database using our Importers (See the answer for question #9) and custom formatters (custom decoding) - this makes it possible for VoltDB to transactionally process data in almost any format, and even to act as an ETL engine.
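As a rough illustration of that JSON capability (plain Python here, rather than VoltDB's SQL JSON functions), this is the shape of indexing and querying a field inside stored JSON strings:

```python
import json

# Stored rows, each holding a JSON string value.
rows = [
    '{"user": "ana",  "score": 12}',
    '{"user": "bob",  "score": 40}',
    '{"user": "carl", "score": 7}',
]

# "Index" on an extracted field: field value -> row positions.
score_index = {}
for pos, doc in enumerate(rows):
    score_index.setdefault(json.loads(doc)["score"], []).append(pos)

# Query: users with score > 10, answered via the extracted field.
hits = [json.loads(rows[p])["user"]
        for s, plist in score_index.items() if s > 10
        for p in plist]
assert sorted(hits) == ["ana", "bob"]
```

In VoltDB the extraction and indexing happen declaratively in SQL, but the principle is the same: fields pulled out of the JSON become first-class, indexable query targets.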
How does VoltDB interact with players in the Big Data space such as Hadoop, both open source and commercial distributions?
The VoltDB database supports directly exporting data into a downstream data lake. This target could be Hadoop, Vertica, a JDBC target, or even flat files. VoltDB handles the real-time data storage and processing, as it is capable of transactionally ingesting (database “writes”) millions of events per second.
Typically the value of this data decreases with age - it becomes cold or stale - and eventually would be migrated to historical storage such as Hadoop, Spark, Vertica, etc. Consider applications in the telco or online gaming space - the “hot data” may have a lifespan of one month in telco, or even one hour or less, in the case of game play.
Once the data becomes “historical” and is of less immediate value, it may be removed from VoltDB and stored on disk in the historical archive (such as Hadoop, Vertica, etc).
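The hot/cold life cycle described here can be sketched as a TTL-driven aging pass. The `archive` list below is a stand-in for a downstream store such as Hadoop or Vertica, and the details are illustrative assumptions, not VoltDB's actual export mechanism:

```python
TTL_SECONDS = 3600    # "hot" lifespan, e.g. one hour for game-play data

hot_store = []        # (timestamp, event) rows kept in memory
archive = []          # stand-in for Hadoop / Vertica / flat files

def ingest(event, now):
    hot_store.append((now, event))

def age_out(now):
    """Move rows older than the TTL from the hot store to the archive."""
    global hot_store
    keep, expire = [], []
    for ts, ev in hot_store:
        (keep if now - ts < TTL_SECONDS else expire).append((ts, ev))
    archive.extend(expire)
    hot_store = keep

t0 = 0
ingest("old-event", t0)
ingest("new-event", t0 + 4000)
age_out(now=t0 + 4000)

assert [ev for _, ev in hot_store] == ["new-event"]
assert [ev for _, ev in archive] == ["old-event"]
```

The fast in-memory store stays small and predictable, while the full history accumulates in cheap storage where batch analytics can reach it.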
What capabilities does VoltDB offer, not just for database administration but for development on top of VoltDB with Python, R, or other languages?
While VoltDB offers traditional APIs such as JDBC, ODBC, Java and C++ native bindings, as well as Node.js, Go, Erlang, PHP, Python, etc., I think one of the more exciting next-generation features VoltDB offers is the ability to stream data directly into the database via our in-process Importers. VoltDB is a clustered database, meaning a database comprises one or more processes (usually one per machine, VM, or container).
A database can be configured to have an “importer,” which is essentially a plug-in that listens to a source, reads incoming messages (events, perhaps) and transactionally processes them. If the VoltDB database is highly available, then the importer is highly available (surviving node failure). VoltDB supports a Kafka Importer and a socket importer, as well as the ability to create your own custom importer.
Essentially this feature “eliminates the client application” and data can be transactionally streamed directly into VoltDB. The data streamed can be JSON, CSV, TSV or any custom-defined format. Further, the importer can choose which transactional behavior to apply to the incoming data. This is how future applications will be designed: by hooking feeds, streams of data, directly to the database - eliminating much of the work of client application development.
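A minimal sketch of that importer pattern, with a custom "formatter" decoding CSV lines and a per-message transactional procedure (illustrative Python, not VoltDB's importer API):

```python
import csv
import io

table = {}   # key -> accumulated value; stand-in for the database

def formatter(line):
    """Custom decode step: one CSV line -> a structured record."""
    key, value = next(csv.reader(io.StringIO(line)))
    return key, int(value)

def upsert_procedure(key, value):
    """The 'transaction' applied to each decoded message."""
    table[key] = table.get(key, 0) + value

def importer(stream):
    """Listener loop: read messages, decode, apply transactionally."""
    for line in stream:
        upsert_procedure(*formatter(line))

# The stream here is a list; in practice it would be a Kafka topic
# or a socket the importer listens to.
importer(["sensor-1,5", "sensor-2,3", "sensor-1,2"])
assert table == {"sensor-1": 7, "sensor-2": 3}
```

The point of running this loop inside the database process, as John describes, is that the decode-and-apply step inherits the database's availability and transactional guarantees, with no separate client application to build or operate.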
We have one customer who has produced one of the top 10 games in the app store - their application streams in-game events into VoltDB at a rate upwards of 700,000 events per second. VoltDB hosts a Marketing Optimization application that analyzes these in-game events in an effort to boost revenue.
If you had a crystal ball, how would you visualize the database landscape in 5 years from now? Major advancements?
Specialized databases will continue to carve out significant market share from established vendors. IoT will be a major market, and will drive storage systems to support two activities: 1) Machine learning (historical analysis) on the Data Lake/Big Data; storage engines will focus on enabling data scientists to capture value from the vast increases of data, and 2) Real-time processing of streams of data. Batch processing of data is no longer acceptable - real-time becomes a “must have”.
Data creation continues to accelerate and capturing value from fresh data in real-time is the new revenue frontier.
Finally, could you tell us a song that is an important part of the soundtrack of your life?
I’m a passionate Bruce Springsteen fan (and also a runner), so it would have to be “Born to Run”.
Springsteen captures that youthful angst so perfectly, challenging us to break out of historic norms and create and experience new things, to challenge ourselves.
This perfectly captures the entrepreneurial spirit, both of the personal self and the professional self, and it matches the unbridled spirit of what we’re trying to accomplish with VoltDB. “Together we could break this trap / We’ll run till we drop, baby we’ll never go back.”
IDOL-powered appliance delivers better decisions via comprehensive business information searches
Dana Gardner posted this in BriefingsDirect, Dana Gardner, David Meyer, Hewlett Packard Enterprise, HPE, HPE Vertica, Interarbor Solutions, Member Via RSS, SEC 1.01 AG on February 23rd, 2017
The next BriefingsDirect digital transformation case study highlights how a Swiss engineering firm created an appliance that quickly deploys to index and deliver comprehensive business information.
By scouring thousands of formats and hundreds of languages, the approach then provides via a simple search interface unprecedented access to trends, leads, and the makings of highly informed business decisions.
We will now explore how SEC 1.01 AG delivers a truly intelligent services solution -- one that returns new information to ongoing queries and combines internal and external information on all sorts of resources to produce a 360-degree view of end users’ areas of intense interest.
Gardner: What are some of the trends driving the need for what you've developed, the i5 appliance?
Meyer: The most important thing is that we can provide instant access to company-relevant information. This is one of today’s biggest challenges that we address with our i5 appliance.
Decisions are only as good as the information they are based on. The i5 provides access to more complete information bases for making substantiated decisions. Also, you don’t want to search all the time; you want to be proactively informed. We do that with our agents, automated programs that search for new information that you're interested in.
Gardner: As an organization, you've been around for quite a while and involved with large packaged applications -- SAP R/3, for example -- but over time, more data sources and more ability to gather information came on board, and you saw the need in the market for this appliance. Tell us a little bit about what led you to create it.
Accelerating the journey
Meyer: We started to dive into big data about the time that HPE acquired Autonomy, December 2011, and we saw that it’s very hard for companies to start to become a data-driven organization. With the i5 appliance, we would like to help companies accelerate their journey to become such a company.
Gardner: Tell us what you mean by a 360-degree view? What does that really mean in terms of getting the right information to the right people at the right time?
Meyer: In a company's information scope, you don’t just talk about internal information, but you also have external information like news feeds, social media feeds, or even governmental or legal information that you need and don’t have to time to search for every day.
So, you need to have a search appliance that can proactively inform you about things that happen outside. For example, if there's a legal issue with your customer or if you're in a contract discussion and your partner loses his signature authority to sign that contract, how would you get this information if you don't have support from your search engine?
Gardner: And search has become such a popular paradigm for acquiring information, asking a question, and getting great results. Those results are only as good as the data and content they can access. Tell us a little bit about your company SEC 1.01 AG, your size and your scope or your market. Give us a little bit of background about your company.
Meyer: We've been an HPE partner for 26 years, and we build business-critical platforms based on HPE hardware and also the HPE operating system, HP-UX. Since the merger of Autonomy and HPE in 2011, we started to build solutions based on HPE's big-data software, particularly IDOL and Vertica.
Gardner: What was it about the environment that prevented people from doing this on their own? Why wouldn't you go and just do this yourself in your own IT shop?
Meyer: The HPE IDOL software ecosystem is really an ecosystem of different software components, and these parts need to be packaged together into something that can be installed very quickly and can provide very quick results. That’s what we did with the i5 appliance.
We put all this good software from HPE IDOL together into one simple appliance, which is simple to install. We want to shorten the time needed to get started with big data, to get results from it, to begin the analytical part of using your data, and to gain value from it.

Multiple formats
Gardner: As we mentioned earlier, getting the best access to the best data is essential. There are a lot of APIs and a lot of tools that come with the IDOL ecosystem, as you described it, but you were able to dive into a thousand or more file formats, support 150 languages, and connect to 400 data sources. That's very impressive. Tell us how that came about.
Meyer: When you start to work with unstructured data, you need some important functionality. For example, you need support for a lot of languages. Imagine all these social media feeds in different languages. How do you track them if you don't support sentiment analysis on these messages?
On the other hand, you also need to understand any unstructured format. For example, if you have video broadcasts or radio broadcasts and you want to search for the content inside these broadcasts, you need to have a tool to translate the speech to text. HPE IDOL brings all the functionality that is needed to work with unstructured data, and we packed that together in our i5 appliance.
Gardner: That includes digging into PDFs and using OCR. It's quite impressive how deep and comprehensive you can be in terms of all the types of content within your organization.
How do you physically do this? If it's an appliance, you're installing it on-premises, you're able to access data sources from outside your organization, if you choose to do that, but how do you actually implement this and then get at those data sources internally? How would an IT person think about deploying this?
Meyer: We've prepared installable packages. Mainly, you need to have connectors to connect to repositories, to data ports. For example, if you have a Microsoft Exchange Server, you have a connector that understands very well how the Exchange server can communicate to that connector. So, you have the ability to connect to that data source and get any content including the metadata.
Take the metadata for an e-mail, for example: the “From,” “To,” and “Subject” fields, and so on. You have the ability to put all that content and metadata into a centralized index, and then you're able to search and refine that information. Then, you have a reference to your original document.
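The connector-to-index flow Meyer describes can be sketched with a toy inverted index (illustrative Python, not IDOL itself): documents and their metadata land in a central index, and a search returns references back to the originals:

```python
# Documents as a connector might deliver them: content plus metadata.
docs = [
    {"id": 1, "from": "alice", "subject": "contract draft",
     "body": "please review the contract terms"},
    {"id": 2, "from": "bob", "subject": "lunch",
     "body": "pizza on friday"},
]

# Centralized inverted index: term -> set of document ids.
inverted = {}
for d in docs:
    for term in (d["subject"] + " " + d["body"]).lower().split():
        inverted.setdefault(term, set()).add(d["id"])

def search(term):
    """Return references (id plus metadata) to the matching originals."""
    ids = inverted.get(term.lower(), set())
    return [(d["id"], d["from"]) for d in docs if d["id"] in ids]

assert search("contract") == [(1, "alice")]
```

A production engine adds language detection, format conversion, and relevance ranking on top, but the core contract is the same: index once centrally, search everywhere, and follow the reference back to the source repository.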
When you want to enrich the information that you have in your company with external information, we developed the so-called SECWebConnector, which can capture any information from the Internet. For example, you just need to enter an RSS feed or a webpage, and then you can capture the content and the metadata you want to search for or that is important to your company.
Gardner: So, it’s actually quite easy to tailor this specifically to an industry focus, if you wish, to a geographic focus. It’s quite easy to develop an index that’s specific to your organization, your needs, and your people.
Meyer: Exactly. In our crowded informational system that we have with the Internet and everything, it’s important that companies can choose where they want to have the information that is important for them. Do I need legal information, do I need news information, do I need social media information, and do I need broadcasting information? It’s very important to build your own informational scope that you want to be informed about, news that you want to be able to search for.
Gardner: And because of the way you structured and engineered this appliance, you're not only able to proactively go out and request things, but you can have a programmatic benefit, where you can tell it to deliver to you results when they arise or when they're discovered. Tell us a little bit how that works.
Meyer: We call them agents. You can define which topics you're interested in, and when some new documents are found by that search or by that topic, then you get informed, with an email or with a push notification on the mobile app.
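A minimal sketch of that agent idea (illustrative, not the i5 implementation): each user's saved topics are matched against every newly indexed document, and matching users get a notification queued for delivery by email or push:

```python
# Saved agents: user -> topics of interest.
agents = {"ana": ["merger"], "bob": ["contract", "lawsuit"]}
notifications = []   # (user, matched topic) pairs awaiting delivery

def on_new_document(doc_text):
    """Called whenever the index receives a new document."""
    words = set(doc_text.lower().split())
    for user, topics in agents.items():
        for topic in topics:
            if topic in words:
                notifications.append((user, topic))

on_new_document("Court dismisses lawsuit over failed merger")
assert ("ana", "merger") in notifications
assert ("bob", "lawsuit") in notifications
```

The inversion is the point: instead of users polling the index, the index pushes to the users, which is what turns a search appliance into a proactive information service.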
Gardner: Let’s dig into a little bit of this concept of an appliance. You're using IDOL and you're using Vertica, the column-based or high-performance analytics engine, also part of HPE, but soon to be part of Micro Focus. You're also using 3PAR StoreServ and ProLiant DL380 servers. Tell us how that integration happened and why you actually call this an appliance, rather than some other name?
Meyer: Appliance means that all the software is packaged together. Every component can talk to the others, speaks the same language, and can be configured the same way. We preconfigure a lot, we standardize a lot, and that’s the appliance part.
And it’s not bound on hardware. So, it doesn’t need to be this DL380 or whatever. It also depends on how big your environment will be. It can also be a c7000 Blade Chassis or whatever.
When we install an appliance, we have one or two days until it’s installed, and then it starts the initial indexing program, and this takes a while until you have all the data in the index. So, the initial load is big, but after two or three days, you're able to search for information.
You mentioned the HPE Vertica part. We use Vertica to log every action that happens on the appliance. On one hand, this is a security feature: you need to be able to prove that nobody has found the salary list, for example, and so you need to log it.
On the other hand, you can analyze what users are doing. For example, if they don’t find something and it’s always the same thing that people are searching in the company and can't find, perhaps there's some information you need to implement into the appliance.
Gardner: You mentioned security and privileges. How does the IT organization allow the right people to access the right information? Are you going to use some other policy engine? How does that work?
Meyer: It's included. It's called mapped security. The connector takes the security information with the document and indexes that security information within the index. So, you will never be able to find a document that you don't have access to in your environment. It's important that this security is given by default.
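Mapped security can be sketched as storing the access-control list alongside each indexed document and filtering every result set against the caller's groups (an illustration of the concept, not IDOL's implementation):

```python
# Each index entry carries the ACL captured by the connector.
index = [
    {"doc": "salary-list.xlsx", "allowed": {"hr"}},
    {"doc": "handbook.pdf", "allowed": {"hr", "staff"}},
]

def search(user_groups):
    """Return only documents whose ACL intersects the caller's groups."""
    return [e["doc"] for e in index if e["allowed"] & user_groups]

assert search({"staff"}) == ["handbook.pdf"]
assert search({"hr"}) == ["salary-list.xlsx", "handbook.pdf"]
```

Because the filter is applied inside the search engine, a restricted document never appears in any result list, hit count, or preview, which is what "secure by default" means here.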
Gardner: It sounds to me, David, like we're, in a sense, democratizing big data. By gathering and indexing all the unstructured data that you can possibly want to point at and connect to, you're allowing anybody in a company to run queries without having to go through a data scientist or a SQL query author. It seems to me that you're really opening up the power of data analysis to many more people, on their terms, which are basic search queries. What does that get an organization? Do you have any examples of the ways that people are benefiting from this democratization, this larger pool of people able to use these very powerful tools?
Meyer: Everything is more data-driven. The i5 appliance can give you access to all of that information. The appliance is here to simplify the beginning of becoming a data-driven organization and to find out what power is in the organization's data.
For example, we enabled a Swiss company called Smartinfo to become a proactive news provider. That means they put lots of public information, newspapers, online newspapers, TV broadcasts, radio broadcasts into that index. The customers can then define the topics they're interested in and they're proactively informed about new articles about their interests.
Gardner: In what other ways do you think this will become popular? I'm guessing that a marketing organization would really benefit from finding relationships within their internal organization, between product and service, go-to market, and research and development. The parts of a large distributed organization don't always know what the other part is doing, the unknown unknowns, if you will. Any other examples of how this is a business benefit?
Meyer: You mentioned the marketing organization. How could a marketing organization listen to what customers are saying? For example, they're communicating on social media, and when you have an engine like i5, you can capture these social media feeds, do sentiment analysis on them, and see an analyzed view of what's being said about your products, company, or competitors.
You can detect, for example, a shitstorm about your company, a shitstorm about your competitor, or whatever. You need to have an analytic platform to see that, to visualize that, and this is a big benefit.
On the other hand, it's also this proactive information you get from it, where you can see that your competitor has a new campaign and you get that information right now because you have an agent with the customer's name. You can see that there is something happening and you can act on that information.
Gardner: When you think about future capabilities, are there other aspects that you can add on? It seems extensible to me. What would we be talking about a year from now, for example?
Meyer: It's pretty much extensible. I think about all these different verticals. You can expand it for the health sector, for the transportation sector, whatever. It doesn't really matter.
We do network analysis. That means that when you prepare yourself to visit a company, you can have a network picture: what relationships this company has, what employees work there, who its shareholders are, and which other companies it has contracts with.
This is a new way to get a holistic image of a company, a person, or of something that you want to know. It's thinking how to visualize things, how to visualize information, and that's the main part we are focusing on. How can we visualize or bring new visualizations to the customer?
Gardner: In the marketplace, because it's an ecosystem, we're seeing new APIs coming online all the time. Many of them are very low cost and, in many cases, open source or free. We're also seeing the ability to connect more adequately to LinkedIn and Salesforce, if you have your license for that of course. So, this really seems to me a focal point, a single pane of glass to get a single view of a customer, a market, or a competitor, and at the same time, at an affordable price.
Let's focus on that for a moment. When you have an appliance approach, what we're talking about used to be only possible at very high cost, and many people would need to be involved -- labor, resources, customization. Now, we've eliminated a lot of the labor, a lot of the customization, and the component costs have come down.
We've talked about all the great qualitative benefits, but can we talk about the cost differential between what used to be possible five years ago with data analysis, unstructured data gathering, and indexing, and what you can do now with the i5?
Meyer: You mentioned the price. We have an OEM contract, and that's something that makes us competitive in the market. Companies can build their own intelligence service. It's affordable for small and medium businesses, too; it doesn't need to be a huge company with its own engineering and IT staff. It's affordable, it's automated, it's packaged together, and it's simple to install.
Companies can increase workplace performance and shorten their processes. Everybody has access to all the information they need in their daily work, and they can focus more on their core business. They don't lose time searching for information and not finding it, and so on.
Gardner: For those folks who have been listening or reading, are intrigued by this, and want to learn more, where would you point them? How can they get more information on the i5 appliance and some of the concepts we have been discussing?
Meyer: That's our company website, sec101.ch. There you can find any information you would like to have. And this is available now.
Andrew Deen posted this in Big Data, Member Via RSS on February 23rd, 2017
Long before capitalism and big corporations ruled the world, we used to survive using different tactics: sharing. Bartering, sharing skills, and helping one another in order to feed ourselves and keep sheltered was simply a necessity. These days, we’re starting to shift back to those roots with the “sharing economy”—a concept that would not be possible in modern times without the help of big data’s ability to bring us together. Whether you know it or not, you’re probably participating in the new trend of the sharing economy—but what is it, exactly, and how does big data make this new (old) way of life possible?
What is the Sharing Economy?
Essentially, the sharing economy is the concept of crowdsourcing goods and services from others. This is usually done off of an online or mobile platform that fields requests and provides “matchmaking” services to facilitate sharing. This might mean calling a ride from someone with a car and a few extra hours to spare, buying a meal your neighbor cooked, or renting out someone’s apartment when you’re visiting a new city. You can even start a business by leveraging peer-to-peer lending—a process that bypasses banks and allows individuals to invest in a business via ...
Ronald van Loon posted this in Internet of things, Member Via RSS on February 23rd, 2017
The Internet of Things (IoT) is changing our world. This may seem like a bold statement, but consider the impact this revolutionary technology has already had on communications, education, manufacturing, science, business, and many other areas of life. Clearly, the IoT is moving fast from concept to reality and transforming how industries operate and create value.
As the IoT creeps toward mass adoption, IT giants experiment and innovate with the technology to explore new opportunities and create new revenue streams. I was invited to the Genius of Things Summit as a Futurist by Watson IoT and WIRED Insider, and attended the long-awaited grand opening of IBM’s headquarters for Watson Internet of Things in Munich. The two-day event gave me insight into what IBM is doing to constantly push the boundaries of what’s possible with the IoT.
In this article, I discuss the major developments that caught my interest and that, in my opinion, will substantially improve customer experience.
IoT capabilities become an integral part of our lifestyle
According to IBM, the number of connected devices is expected to rise to as many as 30 billion in the next three years. This increasingly connected culture presents businesses with an opportunity to harness digital ...
Dana Gardner posted this in BriefingsDirect, computer security, continuous intelligence, Dana Gardner, DevOps, Interarbor Solutions, machine learning, Member Via RSS, Ramin Sayar, Sumo Logic on February 22nd, 2017
The next BriefingsDirect applications health monitoring interview explores how a new breed of continuous intelligence emerges by gaining data from systems infrastructure logs -- either on-premises or in the cloud -- and then cross-referencing that with intrinsic business metrics information.
We’ll now explore how these new levels of insight and intelligence into what really goes on underneath the covers of modern applications help ensure that apps are built, deployed, and operated properly.
Today, more than ever, how a company's applications perform equates with how the company itself performs and is perceived. From airlines to retail, from finding cabs to gaming, how the applications work deeply impacts how the business processes and business outcomes work, too.
We’re joined by an executive from Sumo Logic to learn why modern applications are different, what's needed to make them robust and agile, and how the right mix of data, metrics and machine learning provides the means to make and keep apps operating better than ever.
Gardner: There’s no doubt that the apps make the company, but what is it about modern applications that makes them so difficult to really know? How is that different from the applications we were using 10 years ago?
Sayar: You hit it on the head a little bit earlier. This notion of always-on, always-available, always-accessible applications -- delivered through rich web and mobile interfaces, or through traditional mechanisms such as laptops, other access points, and point-of-sale systems -- is driving the next wave of technology architecture supporting these apps.
These modern apps are around a modern stack, and so they’re using new platform services that are created by public-cloud providers, they’re using new development processes such as agile or continuous delivery, and they’re expected to constantly be learning and iterating so they can improve not only the user experience -- but the business outcomes.
Gardner: Of course, developers and business leaders are under pressure, more than ever before, to put new apps out more quickly, and to then update and refine them on a continuous basis. So this is a never-ending process.
Sayar: You’re spot on. The obvious benefit of always-on is centered on the rich user interaction and user experience. So, while a lot of the conversation around modern apps tends to focus on the technology and the components, there are actually fundamental challenges in the process of how these new apps are built and managed on an ongoing basis, and in what implications that has for security. A lot of times, those two aspects are left out when people are discussing modern apps.
Gardner: That's right. We’re now talking so much about DevOps these days, but in the same breath, we’re talking about SecOps -- security and operations. They’re really joined at the hip.
Sayar: Yes, they’re starting to blend. You’re seeing the technology decisions around public cloud, around Docker and containers, and microservices and APIs, and not only led by developers or DevOps teams. They’re heavily influenced and partnering with the SecOps and security teams and CISOs, because the data is distributed. Now there needs to be better visibility instrumentation, not just for the access logs, but for the business process and holistic view of the service and service-level agreements (SLAs).
Gardner: What’s different from say 10 years ago? Distributed used to mean that I had, under my own data-center roof, an application that would be drawing from a database, using an application server, perhaps a couple of services, but mostly all under my control. Now, it’s much more complex, with many more moving parts.
Sayar: We like to look at the evolution of these modern apps. For example, a lot of our customers have traditional monolithic apps that follow the more traditional waterfall approach for iterating and release. Often, those are run on bare-metal physical servers, or possibly virtual machines (VMs). They are simple, three-tier web apps.
We see one of two things happening. The first is a need to replace the front end of those apps; we refer to those as brownfield. They start to change from waterfall to agile and they start to have more of an N-tier feel. It's really more around the front end -- maybe your web properties are a good example of that. And they start to componentize pieces of their apps, either on VMs or in private clouds, and that's often good for existing types of workloads.
The other big trend is this new way of building apps, what we call greenfield workloads, versus the brownfield workloads, and those take a fundamentally different approach.
Often it's centered on new technology: a stack built entirely on microservices, an API-first development methodology, modern container platforms like Docker, Mesosphere, and CoreOS, and public-cloud infrastructure and services from Amazon Web Services (AWS) or Microsoft Azure. As a result, the technology decisions made there require different skill sets and teams to come together to deliver on the DevOps and SecOps processes that we just mentioned.
Gardner: Ramin, it’s important to point out that we’re not just talking about public-facing business-to-consumer (B2C) apps, not that those aren't important, but we’re also talking about all those very important business-to-business (B2B) and business-to-employee (B2E) apps. I can't tell you how frustrating it is when you get on the phone with somebody and they say, “Well, I’ll help you, but my app is down,” or the data isn’t available. So this is not just for the public facing apps, it's all apps, right?
It's a data problem
Sayar: Absolutely. Regardless of whether it's enterprise or consumer, if it's mid-market small and medium business (SMB) or enterprise that you are building these apps for, what we see from our customers is that they all have a similar challenge, and they’re really trying to deal with the volume, the velocity, and the variety of the data around these new architectures and how they grapple and get their hands around it. At the end of day, it becomes a data problem, not just a process or technology problem.
Gardner: Let's talk about the challenges then. If we have many moving parts, if we need to do things faster, if we need to consider the development lifecycle and processes as well as ongoing security, if we’re dealing with outside third-party cloud providers, where do we go to find the common thread of insight, even though we have more complexity across more organizational boundaries?
Sayar: From a Sumo Logic perspective, we’re trying to provide full-stack visibility, not only from code and your repositories like GitHub or Jenkins, but all the way through the components of your code, to API calls, to what your deployment tools are used for in terms of provisioning and performance.
We spend a lot of effort to integrate to the various DevOps tool chain vendors, as well as provide the holistic view of what users are doing in terms of access to those applications and services. We know who has checked in which code or which branch and which build created potential issues for the performance, latency, or outage. So we give you that 360-view by providing that full stack set of capabilities.
Gardner: So, the more information the better, no matter where in the process, no matter where in the lifecycle. But then, that adds its own level of complexity. I wonder is this a fire-hose approach or boiling-the-ocean approach? How do you make that manageable and then actionable?
Sayar: We’ve invested quite a bit of our intellectual property (IP) on not only providing integration with these various sources of data, but also a lot in the machine learning and algorithms, so that we can take advantage of the architecture of being a true cloud native multitenant fast and simple solution.
So, unlike others that are out there and available for you, Sumo Logic's architecture is truly cloud native and multitenant, but it's centered on the principle of near real-time data streaming.
As the data is coming in, our data-streaming engine allows developers, IT ops administrators, sysadmins, and security professionals to have their own view, coarse-grained or fine-grained, through the role-based access controls we have in the system, and to leverage the same data for different purposes -- versus having to wait for someone to create a dashboard or a view, or to get access to a system when something breaks.
Gardner: That’s interesting. Having been in the industry long enough, I remember when logs basically meant batch. You'd get a log dump, and then you would do something with it. That would generate a report, many times with manual steps involved. So what's the big step to going to streaming? Why is that an essential part of making this so actionable?
Sayar: It’s driven based on the architectures and the applications. No longer is it acceptable to look at samples of data that span 5 or 15 minutes. You need the real-time data, sub-second, millisecond latency to be able to understand causality, and be able to understand when you’re having a potential threat, risk, or security concern, versus code-quality issues that are causing potential performance outages and therefore business impact.
The old way -- deploying code, then hoping and praying you'd find out about a problem only when a user complained -- is no longer acceptable. You lose business and credibility, and at the end of the day, there’s no real way to hold developers, operations folks, or security folks accountable because of the legacy tools and process approach.
Center of the business
Those expectations have changed, because of the consumerization of IT and the fact that apps are the center of the business, as we’ve talked about. What we really do is provide a simple way for us to analyze the metadata coming in and provide very simple access through APIs or through our user interfaces based on your role to be able to address issues proactively.
Conceptually, there’s this notion of wartime and peacetime as we’re building and delivering our service. We look at the problems that users -- customers of Sumo Logic and internally here at Sumo Logic -- are used to and then we break that down into this lifecycle -- centered on this concept of peacetime and wartime.
Peacetime is when nothing is wrong, but you want to stay ahead of issues and you want to be able to proactively assess the health of your service, your application, your operational level agreements, your SLAs, and be notified when something is trending the wrong way.
Then, there's this notion of wartime, and wartime is all hands on deck. Instead of being alerted 15 minutes or an hour after an outage has happened or security risk and threat implication has been discovered, the real-time data-streaming engine is notifying people instantly, and you're getting PagerDuty alerts, you're getting Slack notifications. It's no longer the traditional helpdesk notification process when people are getting on bridge lines.
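As a concrete illustration of the wartime flow Sayar describes, a streaming pipeline might push an alert straight to a Slack incoming webhook the moment a threshold is breached. This is a minimal sketch, not Sumo Logic's implementation; the service name, metric, and webhook URL are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical endpoint -- replace with your own Slack incoming-webhook URL.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

def build_alert(service: str, metric: str, value: float, threshold: float) -> dict:
    """Format a wartime alert as a Slack incoming-webhook payload."""
    return {
        "text": (
            f":rotating_light: {service}: {metric}={value:.1f} "
            f"breached threshold {threshold:.1f}"
        )
    }

def send_alert(payload: dict) -> None:
    """POST the alert to Slack as it happens -- no batch delay, no helpdesk hop."""
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire-and-forget for brevity

payload = build_alert("checkout-api", "p99_latency_ms", 1840.0, 500.0)
# send_alert(payload)  # uncomment once SLACK_WEBHOOK_URL is real
```

The same payload-building step could just as easily target PagerDuty or another notification system; the point is that the alert fires from the streaming engine itself, not from a human reading a dashboard.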
Because the teams are often distributed and it’s shared responsibility and ownership for identifying an issue in wartime, we're enabling collaboration and new ways of collaboration by leveraging the integrations to things like Slack, PagerDuty notification systems through the real-time platform we've built.
So, the always-on application expectations that customers and consumers have, have now been transformed to always-on available development and security resources to be able to address problems proactively.
Gardner: It sounds like we're able to not only take the data and information in real time from the applications to understand what’s going on with the applications, but we can take that same information and start applying it to other business metrics, other business environmental impacts that then give us an even greater insight into how to manage the business and the processes. Am I overstating that or is that where we are heading here?
Sayar: That’s exactly right. The essence of what we provide in terms of the service is a platform that leverages the machine logs and time-series data from a single platform or service that eliminates a lot of the complexity that exists in traditional processes and tools. No longer do you need to do “swivel-chair” correlation, because we're looking at multiple UIs and tools and products. No longer do you have to wait for the helpdesk person to notify you. We're trying to provide that instant knowledge and collaboration through the real-time data-streaming platform we've built to bring teams together versus divided.
Gardner: That sounds terrific if I'm the IT guy or gal, but why should this be of interest to somebody higher up in the organization, at a business process, even at a C-table level? What is it about continuous intelligence that cannot only help apps run on time and well, but help my business run on time and well?
Need for agility
So, we're able to help the developers, the DevOps teams, and ultimately, line of business deliver on the speed and agility needs for these new modes. We do that through a single comprehensive platform, as I mentioned.
At the same time, what’s interesting here is that no longer is security an afterthought. No longer is security in the back room trying to figure out when a threat or an attack has happened. Security has a seat at the table in a lot of boardrooms, and more importantly, in a lot of strategic initiatives for enterprise companies today.
At the same time we're helping with agility, we're also helping with prevention. And so a lot of our customers often start with the security teams that are looking for a new way to be able to inspect this volume of data that’s coming in -- not at the infrastructure level or only the end-user level -- but at the application and code level. What we're really able to do, as I mentioned earlier, is provide a unifying approach to bring these disparate teams together.
Gardner: And yet individuals can extract the intelligence view that best suits what their needs are in that moment.
Sayar: Yes. And ultimately what we're able to do is improve customer experience, increase revenue-generating services, increase efficiencies and agility of actually delivering code that’s quality and therefore the applications, and lastly, improve collaboration and communication.
Gardner: I’d really like to hear some real world examples of how this works, but before we go there, I’m still interested in the how. As to this idea of machine learning, we're hearing an awful lot today about bots, artificial intelligence (AI), and machine learning. Parse this out a bit for me. What is it that you're using machine learning for when it comes to this volume and variety in understanding apps and making that useable in the context of a business metric of some kind?
Sayar: This is an interesting topic, because of a lot of noise in the market around big data, machine learning, and advanced analytics. Since Sumo Logic was started six years ago, we built this platform to ensure not only that we have best-in-class security and encryption capabilities, but that it was centered on the fundamental purpose of democratizing analytics -- making it simpler to allow more than just a subset of folks to get access to information for their roles and responsibilities, whether you're on the security, ops, or development teams.
To answer your question a little bit more succinctly, our platform is predicated on multiple levels of machine-learning and analytics capabilities. Starting at the lowest level, something that we refer to as LogReduce is meant to separate signal from noise. Ultimately, it helps a lot of our users and customers reduce mean time to identification by upwards of 90 percent, because they're not searching the irrelevant data. They're searching the relevant data -- the infrequent or previously unknown events -- versus what’s constantly occurring in their environment.
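LogReduce itself is proprietary, but the underlying idea -- collapsing log lines into recurring signatures by masking variable tokens, then surfacing the rare signatures first -- can be sketched in a few lines. This toy version, with made-up log lines, is only an illustration of the signal-versus-noise principle Sayar describes.

```python
import re
from collections import Counter

def signature(line: str) -> str:
    """Collapse a log line to a pattern by masking variable numeric tokens."""
    return re.sub(r"\d+", "<NUM>", line)

def log_reduce(lines):
    """Group lines by signature and sort rare patterns first --
    the infrequent signatures are usually the interesting signal."""
    counts = Counter(signature(line) for line in lines)
    return sorted(counts.items(), key=lambda kv: kv[1])

logs = [
    "GET /api/user/1001 200 12ms",
    "GET /api/user/1002 200 9ms",
    "GET /api/user/1003 200 15ms",
    "OutOfMemoryError in worker 7",
]
for sig, n in log_reduce(logs):
    print(n, sig)
# The lone OutOfMemoryError pattern sorts ahead of the 3 routine GET lines.
```

A production system would mask far more token types (IPs, hashes, timestamps) and cluster approximately, but the payoff is the same: engineers read a handful of patterns instead of millions of raw lines.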
In doing so, it’s not just about mean time to identification, but it’s also how quickly we're able to respond and repair. We've seen customers using LogReduce reduce the mean time to resolution by upwards of 50 percent.
Our core analytics, at the lowest level, help solve for operational metrics and value. Then, we start to become less reactive. When you've had an outage or a security threat, you start to leverage some of our other predictive capabilities in our stack.
For example, I mentioned this concept of peacetime and wartime. In peacetime, you're looking at changes over time as you deploy code and/or applications to various geographies and locations. A lot of times, developers and ops folks that use Sumo want to use the log-compare or outlier-predictor operators that are part of our machine-learning capabilities to compare differences between branches of code, and to relate code quality to the performance and availability of the service and app.
We allow them, with a click of a button, to compare this window for these events and these metrics for the last hour, last day, last week, last month, and compare them to other time slices of data and show how much better or worse it is. This is before deploying to production. When they look at production, we're able to allow them to use predictive analytics to look at anomalies and abnormal behavior to get more proactive.
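The window-comparison idea behind those operators can be illustrated simply: take a baseline window of a metric, then flag points in the current window that deviate beyond some number of standard deviations. This is a generic z-score sketch with invented numbers, not Sumo Logic's actual algorithm.

```python
from statistics import mean, stdev

def compare_windows(baseline, current, z_threshold=3.0):
    """Flag current-window points that deviate from the baseline window
    by more than z_threshold sample standard deviations."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [x for x in current if abs(x - mu) > z_threshold * sigma]

# Requests/minute for the same hour last week (baseline) vs. today (current).
baseline = [120, 118, 125, 122, 119, 121, 124, 120]
current  = [123, 119, 410, 122]   # 410 is a spike worth surfacing

print(compare_windows(baseline, current))  # → [410]
```

Real systems would account for seasonality and trend rather than assume a stationary baseline, but the one-click "compare this hour to last week" experience reduces to exactly this kind of window-versus-window test.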
So, reactive, to proactive, all the way to predictive is the philosophy that we've been trying to build in terms of our analytics stack and capabilities.
Gardner: How are some actual customers using this and what are they getting back for their investment?
Sayar: We have customers that span retail and e-commerce, high-tech, media, entertainment, travel, and insurance. We're well north of 1,200 unique paying customers, and they span anyone from Airbnb, Anheuser-Busch, Adobe, Metadata, Marriott, Twitter, Telstra, Xora -- modern companies as well as traditional companies.
What do they all have in common? Often, what we see is a digital transformation project or initiative. They either have to build greenfield or brownfield apps and they need a new approach and a new service, and that's where they start leveraging Sumo Logic.
Second, what we see is that it's not always a digital transformation; it's often a cost-reduction and/or consolidation project. Consolidation could mean tools or infrastructure and data center, or it could mean migration to co-los or public-cloud infrastructures.
The nice thing about Sumo Logic is that we can connect anything from your top of rack switch, to your discrete storage arrays, to network devices, to operating system, and middleware, through to your content-delivery network (CDN) providers and your public-cloud infrastructures.
As it’s a migration or consolidation project, we’re able to help them compare performance and availability, SLAs that they have associated with those, as well as differences in terms of delivery of infrastructure services to the developers or users.
So whether it's agility-driven or cost-driven, Sumo Logic is very relevant for all these customers that are spanning the data-center infrastructure consolidation to new workload projects that they may be building in private-cloud or public-cloud endpoints.
Gardner: Ramin, how about a couple of concrete examples of what you were just referring to.
Sayar: One good example is in the media space or media and entertainment space, for example, Hearst Media. They, like a lot of our other customers, were undergoing a digital-transformation project and a cloud-migration project. They were moving about 36 apps to AWS and they needed a single platform that provided machine-learning analytics to be able to recognize and quickly identify performance issues prior to making the migration and updates to any of the apps rolling over to AWS. They were able to really improve cycle times, as well as efficiency, with respect to identifying and resolving issues fast.
Another example would be JetBlue. We do a lot in the travel space. JetBlue is also another AWS and cloud customer. They provide a lot of in-flight entertainment to their customers. They wanted to be able to look at the service quality for the revenue model for the in-flight entertainment system and be able to ascertain what movies are being watched, what’s the quality of service, whether that’s being degraded or having to charge customers more than once for any type of service outages. That’s how they're using Sumo Logic to better assess and manage customer experience. It's not too dissimilar from Alaska Airlines or others that are also providing in-flight notification and wireless type of services.
The last one is someone that we're all pretty familiar with and that’s Airbnb. We're seeing a fundamental disruption in the travel space and how we reserve hotels or apartments or homes, and Airbnb has led the charge, like Uber in the transportation space. In their case, they're taking a lot of credit-card and payment-processing information. They're using Sumo Logic for payment-card industry (PCI) audit and security, as well as operational visibility in terms of their websites and presence.
Gardner: It’s interesting. Not only are you giving them benefits along insight lines, but it sounds to me like you're giving them a green light to go ahead and experiment and then learn very quickly whether that experiment worked or not, so that they can refine it. That’s so important in our digital business and agility drive these days.
Sayar: Absolutely. And if I were to think of another interesting example, Anheuser-Busch is another one of our customers. In this case, the CISO wanted to have a new approach to security and not one that was centered on guarding the data and access to the data, but providing a single platform for all constituents within Anheuser-Busch, whether security teams, operations teams, developers, or support teams.
We did a pilot for them, and as they're modernizing a lot of their apps, as they start to look at the next generation of security analytics, the adoption of Sumo started to become instant inside AB InBev. Now, they're looking at not just their existing real estate of infrastructure and apps for all these teams, but they're going to connect it to future projects such as the Connected Path, so they can understand what the yield is from each pour in a particular keg in a location and figure out whether that’s optimized or when they can replace the keg.
So, you're going from a reactive approach for security and processes around deployment and operations to next-gen connected Internet of Things (IoT) and devices to understand business performance and yield. That's a great example of an innovative company doing something unique and different with Sumo Logic.
Gardner: So, what happens as these companies modernize and they start to avail themselves of more public-cloud infrastructure services, ultimately more-and-more of their apps are going to be of, by, and for somebody else’s public cloud? Where do you fit in that scenario?
Data source and location
Sayar: Whether you’re running on-prem, in co-los, through CDN providers like Akamai, on AWS, Azure, or Heroku, or on SaaS platforms, you're renting a single platform that can manage and ingest all that data for you. Interestingly enough, about half our customers’ workloads run on-premises and half of them run in the cloud.
We’re agnostic to where the data is or where their applications or workloads reside. The benefit we provide is the single ubiquitous platform for managing the data streams that are coming in from devices, from applications, from infrastructure, from mobile to you, in a simple, real-time way through a multitenant cloud service.
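Being agnostic to source and location implies normalizing very different inputs -- plain syslog-style lines from on-prem gear, structured JSON events from cloud services -- into one common envelope before a single pipeline processes them. This sketch shows one plausible shape for that step; the field names and example events are invented, not Sumo Logic's schema.

```python
import json
from datetime import datetime, timezone

def normalize(raw, source):
    """Map heterogeneous inputs (plain log lines, parsed JSON events)
    onto one common envelope so a single pipeline can process them."""
    if isinstance(raw, str):
        message, fields = raw, {}
    else:  # assume an already-parsed JSON dict
        message = raw.get("message", json.dumps(raw))
        fields = {k: v for k, v in raw.items() if k != "message"}
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "message": message,
        "fields": fields,
    }

on_prem = normalize("disk /dev/sda1 at 91% capacity", source="datacenter-1")
cloud = normalize({"message": "Lambda timeout", "region": "us-east-1"},
                  source="aws")
print(on_prem["source"], cloud["fields"]["region"])
```

Once everything shares an envelope, the same search, alerting, and analytics operators apply regardless of whether the event originated in a rack switch or a public-cloud service.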
Gardner: This reminds me of what I heard, 10 or 15 years ago about business intelligence (BI), drawing data, analyzing it, making it close to being proactive in its ability to help the organization. How is continuous intelligence different, or even better, and something that would replace what we refer to as BI?
Sayar: The issue that we faced with the first generation of BI was that it was very rear-view-mirror-centric, meaning that it was looking at data and things in the past. Where we're at today, with this need for speed and the necessity to be always on, always available, the expectation is sub-millisecond latency to understand what's going on, from a security, operational, or user-experience point of view.
I'd say that we're on V2, the next generation of what was traditionally called BI, and we refer to that as continuous intelligence, because you're continuously adapting and learning. It's not based only on what humans know and the rules and correlations they presuppose, with alarms and filters built around those. It’s what machines and machine intelligence need to supplement that with to provide a best-in-class capability, which is what we refer to as continuous intelligence.
Gardner: We’re almost out of time, but I wanted to look to the future a little bit. Obviously, there's a lot of investing going on now around big data and analytics as it pertains to many different elements of many different businesses, depending on their verticals. Then, we're talking about some of the logic benefit and continuous intelligence as it applies to applications and their lifecycle.
Where do we start to see crossover between those? How do I leverage what I’m doing in big data generally in my organization and more specifically, what I can do with continuous intelligence from my systems, from my applications?
Sayar: We touched a little bit on that in terms of the types of data that we integrate and ingest. At the end of the day, when we talk about full-stack visibility, it's from everything with respect to providing business insights to operational insights, to security insights.
We have some customers that are in credit-card payment processing, and they actually use us to understand activations for credit cards, so they're extracting value from the data coming into Sumo Logic to understand and predict business impact and relevant revenue associated with these services that they're managing; in this case, a set of apps that run on a CDN.
At the same time, the fraud and risk team are using us for threat and prevention. The operations team is using us for understanding identification of issues proactively to be able to address any application or infrastructure issues, and that’s what we refer to as full stack.
Full stack isn’t just the technology; it's providing business visibility insights to line-of-business users, or users looking at metrics around user experience and service quality; operational-level insights that help you become more proactive or, in some cases, reactive to wartime issues, as we've talked about; and lastly, helping the security team take a different security posture -- reactive and proactive -- around threat detection and risk.
In a nutshell, where we see these things starting to converge is what we refer to as full stack visibility around our strategy for continuous intelligence, and that is technology to business to users.
Daniel Matthews posted this in AR / VR, Member Via RSS on February 22nd, 2017
This is the companion piece to my last article on mobile data, The dark (and not-so-dark) side of mobile data.
The stage is set, we’re just not quite there yet. When 5G internet becomes a reality, when the mobile data supply is unlimited, and when Virtual Reality and Augmented Reality applications proliferate, it won’t be that odd to see someone walking down the street with a headset on.
First, the mobile web and data. If a mobile web provider limits the amount of data you can use, it’s undoubtedly a money-making scheme. The data itself is not a commodity. This is evidenced by a provider like T-Mobile offering two smartphone lines of unlimited data and including video streaming and mobile hotspots for no extra charge. To offer streaming data and hotspot data as add-ons is to imply that these data are fundamentally different from cellular data -- that they're bonus data, a special kind of data the provider could potentially charge extra for.
But data is data. There’s no limit to it, and increasingly, providers are making unlimited data plans the standard. The difference between streaming video data and audio data, say, is that a packet of video data is much larger, because it includes both ...
Graeme Caldwell posted this in artificial intelligence, Member Via RSS on February 22nd, 2017
In 2016, the artificial intelligence and machine learning hype cycle reached fever pitch. We’ve been here before; it seems as if revolutionary artificial intelligence has been on the cusp of reality for decades. It’s understandable that many in the eCommerce world are skeptical, particularly smaller eCommerce merchants who have to be careful where they invest resources.
Although the benefits of AI / ML are often overblown, it’s clear that it will make — and is already making — a real difference. The contrast between this and other “AI changes everything” hype cycles is significant. The technology in the limited domains relevant to eCommerce has advanced enormously: we’re better at data analytics on huge scales, we’re better at natural language processing and pattern recognition, we’re better at building cheap and scalable infrastructure, and the market has matured: many companies offer products and services that use AI / ML to provide real-world benefits to retailers without requiring a PhD in computer science.
Artificial intelligence is the application of automated systems to decision making, the discovery of solutions, and the delivery of insights. That sounds great, but it doesn’t mean anything without practical applications, and, in 2017, there’s no shortage of eCommerce companies and solution ...
Matt Reaney posted this in Big Data, Member Via RSS, strategy on February 22nd, 2017
People have an innate suspicion of numbers.
We understand that the answer to life, the universe, and everything is too complex to be boiled down to the number 42 (according to Hitchhiker’s Guide to the Galaxy), but in the search to quantify our existence, we do allow our lives to be ruled by numbers. We count the calories in our food, we count the minutes on our daily commute, and we definitely count the number of emails in our inboxes. We use our experience to decide which numbers are good and which numbers are bad. If we didn’t manage our lives by numbers, we would be obese, late and overwhelmed.
However, there are increasing amounts of data in our lives where we are not certain of the origin. The failure of opinion polling has already been widely debated, and if such a “fine art” can get it wrong, who is to say that Big Data is any different? Isn’t polling Big Data by another name? We haven’t really got a clue how these opinion pollsters got their figures, and most corporate people are equally unsure where all their stats are conjured up from.
Akash Deb posted this in artificial intelligence, Member Via RSS on February 21st, 2017
Artificial Intelligence, or AI, is defined as the capacity of machines and software to exhibit or imitate cognitive intelligence. The idea behind AI holds great potential, yet it also raises many concerns. In fiction, scientific and otherwise, developing AI tools and applications capable of thinking, learning, and reacting like living beings doesn’t end very well. In the real world the concerns are not that grave, but AI does pose the threat of a revolutionary disruption similar to that of cloud accounting.
There is no denying that AI can become an invaluable business enhancement tool in professions where significant training is a necessary prerequisite. Professions that demand technical precision and sound judgment, such as accounting, also offer great scope for AI applications. In fact, a report from Deloitte states that in the very near future AI could help develop a whole new paradigm of services and products, creating a whole new market with huge investor profits.
Niche business areas like customer service, research and development, logistics, sales, and marketing also have great potential for AI applications. Although AI as a technology is still in its infancy, a report from the European Commission states that the global market for AI and AI applications grew from €700 million in 2013 to €27 billion by the end of 2015.
Similar to the cloud accounting disruption in the industry a few years earlier, accountants, and the accounting industry in general, will ...
Prithvijit Roy posted this in Big Data, Member Via RSS, strategy on February 21st, 2017
If you are on the verge of joining the Analytics industry, your vision of an ideal day in the life of a practitioner may be someone who effortlessly dives into data to produce that perfect beer-and-nappy Eureka moment (technically termed market basket analysis): the unlikely insight that beer and diapers get bought together on game nights, a solution that gets implemented immediately in the retail store design and in turn leads to a dramatic increase in revenue.
The truth is, the journey from data to revenue growth is more like a well-crafted drama that unfolds across many different stages. Various actors need to enact their roles successfully on each of these makeshift stages for the journey to be completed.
Intrigued? Let’s see how:
“However beautiful the strategy, you should occasionally look at the results” – Sir Winston Churchill
Script: Let’s say the business faces a profitability problem or an issue of increased credit risk. The strategist must work with the business to understand the context and convert that into an analytics problem which can be addressed by data. Data then becomes an enabler to understand patterns and help solve the problem.
Prince Kapoor posted this in Member Via RSS, Technical on February 21st, 2017
Federated SSO and SSO may look similar to many people, and you can’t blame them: users see only the upper crust of the process. They log in with their credentials and enjoy different applications or multiple systems without repeating the login process. It’s a snap! But under the hood, the two techniques work differently. So, do you want to know how federated SSO differs from SSO? Are you perplexed by federated SSO, or is your organization struggling to choose between federated SSO and SSO? This article offers insight into federated SSO and shows how SSO and federated SSO are on quite different pages. Please read on.
What is federated SSO?
To understand federated SSO, you first need to understand federation. Federation is a trust relationship maintained between organizations, under which users from each organization get access across the others’ web properties. Federated SSO therefore provides the user with an authentication token that is trusted across organizations, so the user does not need a separate account with every organization in the federation to access its web properties and applications.
Note: the use of SAML is common in federation protocols.
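The token-trust idea above can be sketched in a few lines of Python. This is a minimal illustration, not a real SAML flow: the identity provider, the service providers, and the HMAC-based signature are all hypothetical stand-ins for the signed assertions and metadata a federation protocol actually exchanges.

```python
import hashlib
import hmac
import json

# Hypothetical shared trust anchor: in a real federation this would be the
# identity provider's public key, published via SAML metadata.
IDP_SECRET = b"hypothetical-idp-signing-key"

def idp_issue_token(user):
    """The identity provider signs an assertion about the user."""
    payload = json.dumps({"sub": user})
    sig = hmac.new(IDP_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "." + sig

def sp_accept_token(token):
    """Any service provider in the federation verifies the same signature,
    so the user never has to create a local account with it."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(IDP_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(sig, expected):
        return json.loads(payload)["sub"]
    return None  # token was not issued (or was tampered with) by the trusted IdP

# Two unrelated organizations' service providers both accept the one token:
token = idp_issue_token("alice")
print(sp_accept_token(token))        # alice
print(sp_accept_token(token + "x"))  # None
```

The point is that trust lives in the federation-wide signing key, not in per-site password databases; that is the difference between federated SSO and a single organization's SSO.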
Linda Gimmeson posted this in Big Data, Member Via RSS on February 20th, 2017
Big data is quickly expanding to a number of industries, and healthcare is no exception. With the use of big data, all kinds of medical records and studies can be digitized and easily analyzed. However, using big data in the healthcare space comes with its own set of challenges that anyone involved in the industry should be aware of.
The U.S. government and other private organizations have poured billions of dollars into digitizing medical records, but so far the data has basically just stayed where it is. Next to nothing has been done to analyze and actually use that data, in large part because the data is incredibly difficult to use and interpret. Medical data is often stored in separate databases that tend not to be compatible with one another. Some of the best and most useful information is added to records as freeform notes, which can be hard to digitize and interpret. Medical records also pass through many hands, from nurses to techs to doctors, before reaching the digital world, so it is relatively easy for errors or discrepancies to creep into someone’s personal information. One of the biggest pushes for big data in healthcare is ...
Rick Delgado posted this in Big Data, Member Via RSS, strategy on February 20th, 2017
While businesses of all sizes have challenges to overcome, small businesses often deal with growing pains. It can be hard to keep an operation profitable, especially when resources are limited. Most small-business managers want to concentrate on only the most vital aspects of their organization.
Though certain things in the world of business were once reserved for only the largest and most successful organizations, technological breakthroughs are changing this fact. Big data is becoming more user-friendly than ever.
This type of change has big implications for small businesses. While some people would assume that businesses would only analyze and consider data relevant to their size, this isn’t always the case. In fact, analyzing large scale data and information can help a small business grow effectively.
Why Analyzing Data is a Good Move for Businesses
While organizations of all sizes will experience their share of successes and failures if their organization has been going long enough, both of these instances can prove valuable. Businesses can learn what works (and what doesn’t) by analyzing data.
Everything from search engine results to sales records can help organizations find out what steps they should be taking next. For smaller organizations, this is especially valuable. These organizations can’t afford to ...
Pat Fredshaw posted this in Big Data, Member Via RSS, strategy on February 20th, 2017
Let’s face it, data analysis reports, whether you’re writing them for universities or for big data, are intimidating. They’re also not a great deal of fun to write. I asked some people where they’d rank writing one and it came in just above going to the dentist. That’s not a good place to find yourself (and here I’m talking both about the list and the chair).
You know what the crazy thing is? They’re actually not that hard to write! Like so many things in life you just need to know where to start. For that reason, I thought I’d write you up a quick article so that the next time you at least know how to get it over with as quickly as possible. And then, maybe you’ll start to enjoy it more and it will become only as unpleasant as being woken by your neighbor drilling holes in the wall. We can only hope, right?
There is no one right way
The first thing you’ve got to realize is that there is not yet one way to present your data. Admittedly, that’s unfortunate. It would probably be helpful for everybody if there was some standard way to do these things. That’s ...
Melissa Crooks posted this in AR / VR, Member Via RSS on February 20th, 2017
Virtual reality is a technology that creates an artificial environment. With a few clicks and a headset, users can build a new world. The innovation is useful in education, medicine, film, engineering, mechanics, and other fields. Virtual reality app development is also a revenue-generating opportunity. Most existing VR apps are in the gaming area, but some app developers have ventured into other fields. The benefits and detriments of VR apps are discussed below.
Virtual reality apps are used for training purposes in business, education, medicine, the armed forces, and construction. In the healthcare sector, VR apps are used particularly in surgery; for instance, robotic surgery simulation has been adopted in medical schools for teaching future surgeons. The benefits of this technology for training purposes are listed below.
VR apps provide realistic scenarios ideal for teaching.
They present little or no risk to the students and instructors.
These applications are safe to use and can be remotely controlled.
Virtual reality applications simplify complex situations during training.
They are innovative and ideal for different methods of teaching.
These apps are engaging and fun to use.
VR apps are cost-effective. They reduce the cost of creating different prototypes for training purposes. ...
Francisco Maroto posted this in Blockchain, Internet of things, Member Via RSS on February 19th, 2017
It was only a matter of time before I ended up writing an article about the connection between the Internet of Things (IoT) and the technology (arguably still in the infancy of its development) that may have the greatest power to transform our world: Blockchain.
In a future planet interconnected not just by devices, but by the events taking place across it, with billions of devices talking to one another in real time, the Internet of Things will require a secure and efficient way to track all interactions, transactions, and activities of every “thing” in the network.
Blockchain’s role could be a coordination layer across devices and the enabler of the IoT to securely facilitate interactions and transactions between devices, and may also support certain processes related to architecture scalability, data sharing, and advancements in encryption and private key technology, enhanced security, and potentially even privacy.
With blockchain, securing the Achilles’ heel of the IoT, a world of heterogeneous OEM devices, now becomes viable. I wonder, however, whether it is feasible for this decentralized IoT network to co-exist with IoT sub-networks or centralized cloud-based IoT models.
But let's face it, blockchain is still a nascent and controversial technology (experts estimate that it might take 5-10 years for the ...
Ramesh Dontha posted this in Big Data, Member Via RSS on February 17th, 2017
When I first heard the term Big Data a few years ago, I didn’t think much of it. Soon after, Big Data started appearing in many conversations with my tech friends, so I started asking a very simple question: 'What is Big Data?'. I kept asking various folks and never got the same answer twice. ‘Oh, it’s a lot of data’. ‘It’s the variety of data’. ‘It’s how fast the data is piling up’. Really? I thought to myself, but I was afraid to ask more questions. As none of it made much sense to me, I decided to dig into it myself. Obviously, my first stop was Google.
When I typed ‘Big Data’ at that time, this showed up.
Ahh, It all made sense right away. None of the people I was talking to really knew much about Big Data but were talking about it anyway as everyone else was talking about it.
In this series of articles on Big Data, my target audience is those people who come across the term Big Data but don’t live and breathe it on a daily basis in their regular jobs. ...
Shahid Mansuri posted this in artificial intelligence, Member Via RSS, Technical on February 17th, 2017
Artificial Intelligence is nothing new; the renewed interest in it is, though. The biggest tech companies in the world are dedicating efforts to the field. Whether it’s making smart devices, smart vehicles, voice assistants, or robots, every company has its own take on AI. Facebook, Google, Microsoft, Samsung, and others are all working on AI in some form.
The developments in AI have been rapid, and the momentum is not poised to slow down anytime soon. Google introduced Google Now and voice search, Apple had Siri, and Microsoft introduced Cortana. Digital assistants allowed users to interact with devices through voice and to have conversational help always available. We could talk to our devices just as we would to any other human; that is how AI chatbot technology changed the human-machine bond. That was AI going mainstream.
We had glimpses of robots learning quickly and adapting to human needs in recent movies like Her and Ex Machina. We even saw the pitfalls of them outsmarting their own creators, and theories that a war between robots and humans is not too far away.
There has even been a controversial conference on Love and Sex with Robots, held on 19-20 December in London, where as ...
Blake Davies posted this in artificial intelligence, Member Via RSS on February 16th, 2017
Google is always up to something new and fascinating, and one of the things that they have been working on is the evolution of Google Search from machine learning to “something more”. Originally, a group of Google’s engineers worked on the search engine’s recognition of synonyms. With users inputting different words interchangeable with one another, Google would implement its knowledge to understand better what they were searching for.
The next step came with the translation of websites, where the engineers fed the system with a large number of translated documents, “teaching” Google how one language is mapped to another. This way, Google was able to translate sites into languages that none of the engineers spoke. And now, the ultimate step in this evolution is deep learning.
What is deep learning?
Deep learning is based on the notion of digital neurons, which are organized into layers. Each layer extracts higher-level features from the data it receives and passes them on to the next layer. As a result, higher layers can understand more abstract notions in the input data. Take images, for example.
The first layer receives an input of pixels and is “taught” to recognize shapes from them. Higher layers ...
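The layered idea above can be sketched in plain Python. This is a toy forward pass, not a trained network: the layer sizes, the random weights, and the "edges/shapes/objects" labels are invented for illustration, and a real deep learning system learns its weights from data rather than drawing them at random.

```python
import math
import random

random.seed(0)

def dense_layer(inputs, weights):
    """One layer of 'digital neurons': each neuron combines every input
    and applies a nonlinearity, yielding one higher-level feature."""
    return [math.tanh(sum(w * x for w, x in zip(neuron, inputs)))
            for neuron in weights]

def forward(pixels, layers):
    activations = pixels
    for weights in layers:  # features grow more abstract at each layer
        activations = dense_layer(activations, weights)
    return activations

def random_layer(n_in, n_out):
    return [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

# A hypothetical 4-pixel "image" flowing through three stacked layers:
layers = [random_layer(4, 8),   # pixels -> edge-like features
          random_layer(8, 4),   # edges  -> shape-like features
          random_layer(4, 2)]   # shapes -> object scores
print(forward([0.1, 0.9, 0.9, 0.1], layers))
```

Each call to `dense_layer` plays the role of one layer in the text: it receives the previous layer's outputs and hands a smaller, more abstract summary to the next.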
Larry Alton posted this in Big Data, Member Via RSS on February 14th, 2017
For decades, stock traders and other investors have been on the cutting edge of technology. They’re always looking for the slightest advantage that will allow them to be successful. And in recent years, the savviest traders have been relying on big data.
Big Data Improves Technical Analysis
Analysis has always played a significant role in the evaluation of stocks, bonds, and options. Specifically, technical analysis has played a key part. As RJO Futures explains, “Technical analysis is the study of price action and volume through the careful analysis of various different chart types. Modern-day technical analysis looks to expand upon such principles as price trends, moving averages, volume and open interest, support and resistance levels, as well as momentum indicators.”
Until recently, technical analysis has relied on outdated tools – like spreadsheets and rudimentary equations – to provide traders with insights into which trades make sense under a specific set of circumstances. When big data entered the picture, everything changed.
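For a concrete sense of what those spreadsheet-era tools computed, here is a simple moving average, one of the indicators named in the quote above, done programmatically. The closing prices are made up for illustration:

```python
def moving_average(prices, window):
    """Simple moving average: the mean of the last `window` closing prices."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

closes = [10, 11, 12, 11, 13, 14, 13, 15]  # hypothetical closing prices
sma3 = moving_average(closes, 3)
print([round(v, 2) for v in sma3])  # [11.0, 11.33, 12.0, 12.67, 13.33, 14.0]
```

Big data changes the scale, not the arithmetic: the same indicator run over tick-level feeds across thousands of instruments is where modern platforms differ from a spreadsheet.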
3 Ways Big Data Intersects Trading
Walk through offices on Wall Street today and you’ll notice some stark differences from what was happening 10 or 15 years ago. Let’s take a look at some of the changes that have been brought on by big data:
Raj Dalal posted this in Big Data, Member Via RSS, strategy on February 14th, 2017
There’s almost no one left in the world, except those perhaps residing in the farthest corners of the earth, who has not yet heard of Uber and the Uber business model.
BigInsights Principal Raj Dalal met up with Uber’s Chief Data Architect M C Srivas on a recent visit to San Francisco, where, in the course of the hour-long conversation, Srivas spoke of what data analytics meant for Uber, and how data innovation was being used to further, what is now popularly known around the world as “the Uber model.”
In the first part of this three-part series, we had looked at how data analytics was the key to Uber’s success.
In this, second part, we now look at Uber’s future plans based on data analytics.
Raj: As chief data architect here at Uber what are some breakthroughs you hope to make to take Uber to the next phase?
Srivas: Unlike Google or Facebook or Twitter, every bit of Uber's data is actually a monetary transaction. Livelihoods depend on getting that exactly right. At Uber, every piece of data impacts somebody's pocket, so the need for quality is much higher than for any standard website. That's a whole different set of challenges.
This article is Sponsored by Search Strategy Solutions, experts in offering your data scientists high-quality, reliable human judgments and support.
The quality of your data determines the quality of your insights from that data. Of course, the quality of your data models and algorithms has an impact on your results as well, but in general it is garbage in, garbage out. That is why (Total) Data Quality Management (DQM) and Master Data Management (MDM) have been around for a very long time, and why they should be a vital aspect of your data governance policies.
Data governance can offer many benefits for organizations, including reduced head count, higher quality of data, better data analysis and time savings. As such, those companies that can maintain a balance of value creation and risk exposure in relation to data can create competitive advantage.
Human judgments and Data Quality
Garbage in, garbage out. Especially with the hype around artificial intelligence and machine learning, that has become more important than ever. Any organization that takes itself seriously and employs data scientists to develop artificial intelligence and machine learning solutions should take the quality of its data very seriously. Data that is used to develop, test, and train algorithms should be of high quality, ...
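A first line of defense can be as simple as automated checks that run before any record reaches a model. The sketch below is illustrative only: the field names and the plausible age range are assumptions, and real DQM tooling covers far more (deduplication, lineage, and the human judgments mentioned above).

```python
def quality_report(records, required_fields):
    """Flag records with missing or implausible fields before they reach
    any model: a tiny garbage-in, garbage-out gatekeeper."""
    issues = []
    for i, rec in enumerate(records):
        for field in required_fields:
            if rec.get(field) in (None, ""):
                issues.append((i, field, "missing"))
        age = rec.get("age")
        if age is not None and not (0 <= age <= 120):
            issues.append((i, "age", "out of range"))
    return issues

records = [{"name": "Ada", "age": 36},
           {"name": "", "age": 200}]
print(quality_report(records, ["name", "age"]))
# [(1, 'name', 'missing'), (1, 'age', 'out of range')]
```

Checks like these are cheap to automate; deciding what counts as "plausible" for each field is exactly where the human judgments come in.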
Audrey Willis posted this in artificial intelligence, Member Via RSS on February 13th, 2017
Big data is constantly making our lives better by helping businesses to serve us better while becoming more efficient, pinpointing trends in large datasets, and even improving healthcare. It’s also definitely making criminals’ lives harder in many ways, making our communities safer overall. From helping police to identify crime patterns in cities to scanning large databases for specific information, big data is revolutionizing criminal justice, and making it a lot harder for people to get away with breaking the law. Fingerprint analysis has long been an important part of forensic evidence to build a case, but thanks to big data and artificial intelligence, there’s a newer tool that can help identify suspects: facial recognition.
How it Works
If you’re out and about these days, there’s a good chance you’re getting caught on video somewhere. Surveillance cameras aren’t just for gas stations these days, and they can be helpful witnesses in criminal activity. With enhancement technology available, it’s often an easy way to identify a suspect. However, calling on the public to identify individuals in these photos often wastes precious time—time for criminals to skip town or change their appearance.
Facial recognition in criminal justice relies on a database—a big data dataset that contains ...
Brooke Campbell posted this in Member Via RSS, strategy on February 13th, 2017
When selected with care, enterprise software can be a strategic differentiator that drives efficiency and cost savings throughout your organization. Furthermore, as the great crew change occurs in any industry, the younger generation has certain implicit expectations about how you will leverage that technology, from automation for repetitive processes to the usability of application interfaces, to remote access anywhere on the devices of their preference. If your organization finds itself labeled a technology dinosaur, then you will neither attract nor retain the best young talent in a competitive employment landscape.
Beyond the human resources aspect, the bar that defines baseline efficiency in the business keeps rising. If your organization fails to leverage technology effectively, your business will ultimately suffer and fall behind as your competition leapfrogs ahead. If you accumulate enough technical debt in your IT organization, you will find it difficult to catch up to the competition.
Selecting the right software product can be a complicated and daunting process, especially when the costs to purchase, implement, and maintain the product can be in the hundreds of thousands to millions of dollars. Improper selection of a software product results in a direct impact to information management ...
Ronald van Loon posted this in Big Data, Infographics, Member Via RSS on February 13th, 2017
Today, more than ever before, organisations realise the strategic importance of data and consider it a corporate asset that must be managed and protected just like any other asset. Accordingly, a growing number of farsighted organisations are investing in the tools, skills, and infrastructure required to capture, store, manage, and analyse data.
More organisations are now viewing data management as a holistic activity that requires enterprise-wide collaboration and coordination to share data across the organisation, extract insights, and rapidly convert them into action before opportunities are lost. However, despite the increasing investment in data management infrastructure, there are not many organisations that spend time and effort on anticipating the future events that may impact their data management practices.
From upcoming rules and regulations to the need to create better customer experiences in order to discover hidden value in customer journeys, there are a number of factors that demand a more proactive approach from organisational leaders and decision makers when it comes to the planning and design of an enterprise’s data management infrastructure.
Breaking Down the Data Silos
When it comes to efficient data management, the biggest challenge that enterprises need to overcome is the elimination of the silos ...
Bill Franks posted this in Big Data, Member Via RSS on February 10th, 2017
Last month, I wrote about why simply making predictions isn’t enough to drive value with analytics. I made the case that behind stories of failed analytic initiatives, there is often a lack of action to take the predictions and turn them into something valuable. It ends up that identifying and then taking the right action often leads to additional requirements for even more complex analyses beyond the initial effort to get to the predictions! Let’s explore what that means.
Identifying The Action Is The Next Step
Once I have a prediction, simulation, or forecast, the next step is to identify what action is required to realize the potential value uncovered. Let’s consider the example of using sensor data for predictive or condition-based maintenance. In this type of analysis, sensor data is captured and analyzed to identify when a problem for a piece of equipment is likely. For example, an increase in friction and temperature within a gear might point to the need to replace certain components before the entire assembly fails.
Identifying the problem ahead of time sounds great. All we have to do is to identify when something is going to break and then fix it before it breaks. Doing so saves ...
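The gear example above can be reduced to a sketch: watch each component's latest sensor sample and raise an alert once friction and temperature both cross their limits. The thresholds, readings, and gear names here are invented for illustration; a production system would model trends statistically rather than apply fixed limits.

```python
def maintenance_alerts(readings, friction_limit=0.8, temp_limit=90.0):
    """Flag components whose latest (friction, temperature) sample exceeds
    both limits, so parts can be replaced before the assembly fails."""
    alerts = []
    for gear, series in readings.items():
        friction, temp = series[-1]  # most recent sensor sample
        if friction > friction_limit and temp > temp_limit:
            alerts.append(gear)
    return alerts

readings = {"gear-A": [(0.4, 70.0), (0.9, 95.0)],   # friction and temp rising
            "gear-B": [(0.3, 65.0), (0.4, 68.0)]}   # healthy
print(maintenance_alerts(readings))  # ['gear-A']
```

The alert itself is the easy part; as the article argues, deciding what action to take on `gear-A`, and when, is where the additional analysis begins.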
Ronald van Loon posted this in Big Data, Infographics, Member Via RSS on February 10th, 2017
The world is increasingly digital, and this means big data is here to stay. In fact, the importance of big data and data analytics is only going to continue growing in the coming years. It is a fantastic career move and it could be just the type of career you have been trying to find.
Professionals who are working in this field can expect an impressive salary, with the median salary for data scientists being $116,000. Even those who are at the entry level will find high salaries, with average earnings of $92,000. As more and more companies realize the need for specialists in big data and analytics, the number of these jobs will continue to grow. Close to 80% of the data scientists say there is currently a shortage of professionals working in the field.
What Type of Education Is Needed?
Most data scientists - 92% - have an advanced degree: 44% have a master’s degree and 48% have a Ph.D., while only 8% have just a bachelor’s degree. Therefore, it stands to reason that those who want the best chance at a long and fruitful career with great compensation will work toward higher education.
There is no doubt that industries are ablaze with the huge eruption of data. No sector has remained untouched by this drastic change over the past decade. Technology has crept into every business arena and has become an essential part of every processing unit. In the IT industry specifically, software and automation are bare essentials, used in each and every phase of a process cycle.
Businesses are focusing more on agility and innovation than on stability, and adopting big data technologies helps companies achieve that in no time. Big data analytics has not only allowed firms to stay abreast of changing dynamics but has also let them predict future trends, giving them a competitive edge.
What is driving the widespread adoption of big data across the industries?
Let's find out the reasons behind all the hype around big data:
Firms witnessing surprising growth
Needless to say, Big Data is taking the world by storm with its countless benefits. It is allowing leading firms like IBM and Amazon to develop cutting-edge technologies that provide high-end services to their customers.
Rick Delgado posted this in Big Data, Member Via RSS on February 9th, 2017
Technology has now infiltrated nearly every facet of human life, making the world more connected than it’s ever been. Think about it - in just a few short years a completely new tech vernacular has emerged, making commonplace inventive terms like trolling, cookies, memes and overgrams. One of the most descriptive, and now relevant phrases that has come into being as a result of the digital revolution is “data mining.” Data mining is exactly what it sounds like - digging for data. The idea that data is a valuable commodity is implicit in this phrase. And it is. In fact, recently data has become such a critical part of the business world and the world in general that the question has been raised: is data becoming a form of currency?
The intrinsic, quantifiable value of data is without question. Think about all the ways that data provides monetary value to companies. First, there is the obvious fact that data-informed insights are more likely to pay off. Eliminating “gut-based” decision-making allows companies to grow more consistently and methodically. Secondly, consider how the wealth of consumer insights available through data mining has allowed marketers and brand ambassadors to connect with their customers like ...
Tatsiana Levdikova posted this in artificial intelligence, Member Via RSS on February 9th, 2017
Artificial Intelligence (AI) is not just a buzzword used by scientists, science fiction authors, and filmmakers. The future has already arrived. AI has a large number of application fields, many of them robotics-related (e.g. Google Home and Amazon Echo). But let's take a journey through the most interesting AI use cases.
If you are accustomed to robot-like chatbots ineffectively simulating human behavior, you will be able to forget about them very soon. AI is believed to be the future of customer service. Many companies are working on AI projects, and platforms such as Init.ai, motion.ai, and Octane AI already help businesses create feature-rich chatbots. Technology solutions empowered by AI can do much more than just conduct a conversation; they can be personal assistants like Siri, Microsoft’s Cortana, and Google Now.
What about a new crop? Developers offer AI apps like Mezi and Claire that can help people to manage their trips. Virtual humans, such as 3D intelligent virtual assistants developed by iDAvatars are emotionally expressive and can speak customers' language, thus delivering superior customer experience.
Food and Drinks Industry
You may have heard about some "boring" probable uses of AI like "intelligent machines" for food sorting. What about this one? The ...
Andrew Deen posted this in Member Via RSS, Robotics on February 8th, 2017
Data is essential to the way we live our lives here on Earth. But it’s also proven to be very helpful in exploring the mysteries of the universe beyond our tiny planet. Because there are so many phenomena observed in space that are difficult to measure, many mysteries remain about how the universe functions, changes, and evolves. Dark matter has been of particular interest to researchers, but it has been notoriously difficult to study—because all that has been detected of the elusive matter is its gravitational pull. New research has revealed that there may be a way to find this matter, however, using a common technology that we use every day: GPS satellites and data.
What is Dark Matter?
Dark matter is one of the most elusive and mysterious substances that exist. It’s the stuff that is thought to be responsible for the formation of galaxies, but very little is known about its nature, aside from the hypothesis that it makes up 85% of the universe’s matter and that its gravity keeps galaxies together. Scientists have been hunting for answers about dark matter for years, with few solid answers. Even extra-sensitive sensors haven’t been able to detect dark matter ...
Erik Kangas posted this in Member Via RSS, strategy on February 8th, 2017
Compared to previous types of marketing, digital marketing is a data goldmine. There’s no 100% verifiable way to measure how many people read a newspaper ad, but Google Analytics can quickly tell you how many people visited each page on your website, how long they visited for, and so much more.
In the realm of digital marketing, email marketing takes the lead as the most effective — with websites coming in at a close second. On average, every dollar spent on email marketing generates $38 in return. If you’re ignoring email marketing, you’re throwing away money.
To make the most of email marketing, it is important to understand the wealth of data every email campaign generates. By digging into the data, you can improve your email marketing for even better results for your company.
Fortunately, many email marketing services make it easy to understand important email marketing data — no degree in research methods or mathematics required.
While your particular marketing goals will influence which email marketing data is most important for you to examine, there are five types of data that are always worth tracking.
Part of what makes email marketing so amazing is how effective it is at putting your brand in ...
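As a purely hypothetical illustration of the kind of campaign data discussed above, the basic rates most email marketing services report can be derived from a handful of raw counts. The metric names and numbers below are invented for the example, not taken from the article:

```python
def email_metrics(sent, delivered, opened, clicked, unsubscribed):
    """Compute the standard email-campaign rates from raw counts."""
    return {
        "delivery_rate": delivered / sent,
        "open_rate": opened / delivered,            # opens per delivered email
        "click_through_rate": clicked / delivered,  # clicks per delivered email
        "click_to_open_rate": clicked / opened,     # how engaging the content is
        "unsubscribe_rate": unsubscribed / delivered,
    }

# Made-up campaign: 1,000 sent, 980 delivered, 245 opened, 49 clicked
m = email_metrics(sent=1000, delivered=980, opened=245,
                  clicked=49, unsubscribed=2)
# m["open_rate"] == 0.25, m["click_through_rate"] == 0.05
```

Most platforms surface these same ratios in their dashboards; computing them yourself is mainly useful for comparing campaigns across different services.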
Daniel Matthews posted this in Big Data, Member Via RSS, security on February 8th, 2017
For information consumers, constant communicators and internet junkies everywhere, unlimited mobile data is a pass to unlimited playtime. It’s like unlimited crack for the addict. For data scientists, analysts and marketers, the words ‘unlimited mobile data’ spark raised eyebrows. Unlimited data, by its very nature, makes for limitless possibilities. How its use will pan out is, of course, a different story.
One way companies such as Target and Macy’s use smartphone data is movement tracking. When a customer arrives, a camera gets a shot of their face or license plate, then the retailer tracks where they go in the store through WiFi. Although these stores aren’t malicious, and are ostensibly doing this to build user profiles and personalize marketing, it seems ominous: 80% of shoppers feel in-store tracking is unacceptable; 68.5% are concerned their data will not be kept safe, and 67% say “tracking feels like spying.”
This is the grey area of mobile data usage, because many people might not be okay with tracking, but they might be happy to take advantage of a special discount the store offers as a result of tracking. Customers with Wal-Mart’s app, which uses geolocation tracking, spend nearly 40% more per month. This benefits Wal-Mart, ...
Brigg Patten posted this in Big Data, Member Via RSS on February 8th, 2017
Big data is, well, big business. It's believed that over the next four years, more than two billion people will be using personal cloud storage at a rate of 1.7 gigabytes per user per month. The cloud will gradually become the standard method of storage for both personal and professional needs, minimizing the use of everything from physical servers to flash drives.
The cloud is expected to take the forefront in a lot more than storage. Here are five major big data predictions that will be with us in the workplace long beyond 2017.
eLearning companies were among the first industries to take full advantage of the cloud. Through mobile learning management systems, they provide exceptional tools that have transformed onboarding and training for companies and employees in every way. The cloud has given every employer great opportunities to create and customize training to fit specific needs and goals. The tech lets the employer design training unique to each individual employee. Employees can log in, pause sessions, pick up during their commute, and take the exam at home.
The cloud has significantly altered the way we create, edit, share, and present content. The once mighty Microsoft and Corel office staples are taking a step back ...
Prince Kapoor posted this in Member Via RSS, security on February 8th, 2017
For any business, in any niche, cost is one of the most discussed and closely watched factors. And why not? More often than not, cost is the deal maker or the deal breaker when it comes to decision-making. Need I say that everybody tries to optimize it as much as possible? But many times, in the rush to cut costs, people end up making the wrong decisions and failing in their original quest.
For example, with the rise in data breaches, the implementation of two-factor authentication is at its peak. If you are dealing with sensitive customer data, you have probably either implemented this solution already or will soon. And obviously, the idea of building two-factor authentication on your own must have crossed your mind more than once. After all, it would reduce costs (or so it seems), you would have full control over your data, and it is exciting, right? So let's do it!
Well, hold your horses! If you do some analysis, building 2FA on your own is not that wise once you look at other factors such as the upfront investment, the after-effects, and so on. Don't worry, we'll soon dive into ...
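For context on what "building 2FA on your own" actually involves, here is a minimal sketch of the TOTP algorithm (RFC 6238) that most authenticator apps implement, using only the Python standard library. This illustrates the core math only; it is not a production-ready implementation (no rate limiting, clock-drift windows, or secret management, which are exactly the hidden costs the article alludes to):

```python
import hashlib
import hmac
import struct
import time

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over an 8-byte big-endian counter."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                        # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(key: bytes, for_time=None, step: int = 30) -> str:
    """RFC 6238 TOTP: HOTP where the counter is the 30-second time step."""
    t = time.time() if for_time is None else for_time
    return hotp(key, int(t // step))

# RFC test vectors for the ASCII secret "12345678901234567890":
# HOTP(counter=0) -> "755224"; TOTP at t=59s (counter=1) -> "287082"
```

The algorithm itself fits in twenty lines; the expensive parts of DIY 2FA are everything around it.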
Matt Reaney posted this in Big Data, Member Via RSS, strategy on February 8th, 2017
Data science might not be seen as the most creative of pursuits.
You add a load of data into a repository, and you crunch it at the other end to draw your conclusions. Data in, data out: where is the scope for creativity there? It is not as if you are working with a blank canvas.
For me, the definition of creativity is when you are able to make something out of nothing. This requires an incredible amount of imagination, and seeing past the obvious headline statistics to reach a deeper conclusion is the hallmark of a great Big Data professional.
Many of the most successful people come across as innovative thinkers when they interview with us. They have no choice: moulding the data in unique and unexpected ways is their job. Just as Einstein found inspiration in his violin playing, many leading data scientists find that when their creative juices are flowing, they arrive at the most elegant solutions to the challenges they face. These Data Creatives are some of the hardest candidates to find, mainly due to the subjectivity involved. (See also a previous blog for more on Data Creatives.)
It is actually one of my favourite interview questions:
Johnny Morgan posted this in Big Data, Member Via RSS on February 8th, 2017
Is Your Big Data Vulnerable?
The year 2017 started on a dramatic, Hollywood-style note for a few database-service providers. In a series of events, several databases fell victim to ransomware. This raised questions about the security of cloud-based databases, which has been the biggest drawback of the cloud so far.
MongoDB and Elasticsearch Fell Victim
In what seemed to be a hacking attempt by an individual or a group, thousands of MongoDB-based databases fell victim to a cyber-attack. The hacker claimed to have access to many databases and threatened to delete or encrypt the data if a ransom was not paid.
While many were still recovering from the MongoDB incident, news about Elasticsearch clusters started making the rounds. In this case, data was deleted from the cluster and a message asking for ransom was left behind. Experts are working on establishing whether this is an isolated event or whether it is related to the MongoDB attack.
Hadoop and CouchDB Instances
When Hadoop and CouchDB instances started facing similar issues, with data deleted from them as well, the first thought that crossed many people's minds was that these had also been hit by ransomware. But the reality turned out to be different. The hackers targeting Hadoop ...
Josh McAllister posted this in Big Data, Member Via RSS, strategy on February 8th, 2017
The idea of "Big Data" has generated a lot of hype, especially in recent years as the variety of data sources increases and affordable new data tools appear. While success with big data analytics is not guaranteed, the chance for discovering operational improvements from your own data still makes it attractive. It's been estimated that a large retailer can increase margins by over 60 percent through process insights. Increasingly in the social media era, big data is shaping consumer-brand relationships.
For many companies, however, big data is not providing the anticipated returns. The value in data-driven decisions seems to be elusive. Colin Strong, the author of "Humanizing Big Data", advises looking to the human rather than the technical side to obtain positive results.
Companies need to re-examine big data operations in terms of the humans involved. This can mean not only the consumers feeding a flood of data points, but the data analysts themselves.
The Humans Behind the Data
Treating data analysis as a purely statistical exercise misses the human circumstances that generate the information. Human social environments can shape consumer preferences rapidly and profoundly. Basing decisions solely on the numbers means that potential changes are missed, and brands could miss spotting ...
Jorge Garcia posted this in Analytics, AWS, azure, Big Data, cloud computing, Data Management, Data Warehouse, interviews, Member Via RSS, microsoft azure, Microsoft Azure Marketplace, teradata, Teradata Database, Teradata Everywhere, Teradata in the Cloud on February 7th, 2017
(Image Courtesy of Teradata)
In a post about Teradata's 2016 Partners event, I wrote about the big effort Teradata is making to ensure its software offerings are available both on-premises and in the cloud, in a variety of shapes and forms, with a big push toward hybrid cloud configurations.
The data management and analytics software giant seems to be sticking to its promise, increasingly bringing its flagship Teradata Database and other solutions to the cloud: through its own Managed Cloud for the Americas and Europe, in private-cloud-ready configurations, and via public cloud providers such as AWS and, most recently announced, Microsoft's Azure Marketplace.
To chat about this latest news and Teradata's overall cloud strategy, we sat down with Teradata's Brian Wood.
Brian Wood is director of cloud marketing at Teradata. He is a results-oriented technology marketing executive with more than 15 years of success in digital marketing, lead generation, sales and marketing operations, and team leadership.
Brian has an MS in Engineering Management from Stanford University, a BS in Electrical Engineering from Cornell University, and served as an F-14 Radar Intercept Officer in the US Navy.
Throughout 2016, and especially during its 2016 Partners conference, Teradata made it clear that it is undergoing an important transformation, and a key part of that strategy is its path to the cloud. Offerings such as Teradata Database on different private and public cloud configurations, including AWS, VMware, Teradata Managed Cloud, and of course Microsoft Azure, are available now. Could you share some details about the progress of this strategy so far?
Thanks for asking, Jorge. It’s been a whirlwind because Teradata has advanced tremendously across all aspects of cloud deployment in the past few months; the progress has been rapid and substantial.
To be clear, hybrid cloud is central to Teradata’s strategy and it’s all about giving customers choice. One thing that’s unique to Teradata is that we offer the very same data and analytic software across all modes of deployment – whether managed cloud, public cloud, private cloud, or on-premises.
What this means for customers is that it's easy to transfer data and workloads from one environment to another without hassle or loss of functionality; they can have all the features in any environment and dial it up or down as needed. Customers like this flexibility because nobody wants to be locked in, and it's also helpful to be able to choose the right tool for the job without worrying about compatibility or consistency of results.
Specific cloud-related advancements in the last few months include:
Expanding Teradata Managed Cloud to now include both Americas and Europe
Increasing the scalability of Teradata Database on AWS up to 64 nodes
Launching Aster Analytics on AWS with support up to 33 nodes
Expanding Teradata Database on VMware scalability up to 32 virtual nodes
Bolstering our Consulting and Managed Services across all cloud options
And announcing upcoming availability of Teradata Database on Azure in Q1
These are just the ones that have been announced; there are many more in the pipeline queued up for release in the near future. Stay tuned!
The latest news is the availability of Teradata Database on Microsoft’s Azure Marketplace. Could you give us the details around the announcement?
We’re very excited about announcing Q1 availability for Teradata Database on Azure because many important Teradata customers have told us that Microsoft Azure is their preferred public cloud environment. We at Teradata are agnostic; whether AWS, Azure, VMware, or other future deployment options, we want what’s best for the customer and listen closely to their needs.
It all ties back to giving customers choice in how they consume Teradata, and offering the same set of capabilities across the board to make experimentation, switching, and augmentation as easy as possible.
Our offerings on Azure Marketplace will be very similar to what we offer on AWS Marketplace, including:
Teradata Consulting and Managed Services to help customers get the most value from their cloud investment
Azure Resource Manager Templates to facilitate the provisioning and configuration process and accelerate ecosystem deployment
What about configuration and licensing options for Teradata Database in Azure?
The configuration and licensing options for Teradata Database on Azure will be similar to what is available on AWS Marketplace. Customers use Azure Marketplace as the medium through which to find and subscribe to Teradata software; they are technically Azure customers but Teradata provides Premier Cloud Support as a bundled part of the software subscription price.
One small difference between what will be available on Azure Marketplace compared to what is now available on AWS Marketplace is subscription duration. Whereas on AWS Marketplace we currently offer both hourly and annual subscription options, on Azure Marketplace we will initially offer just an hourly option.
Most customers choose hourly for their testing phase anyway, so we expect this to be a non-issue. In Q2 we plan to introduce BYOL (Bring Your Own License) capability on both AWS Marketplace and Azure Marketplace which will enable us to create subscription durations of our choosing.
Can we expect technical and functional limitations from this version compared with the on-premises solution?
No, there are no technical or functional limitations of what is available from Teradata in the cloud versus on-premises. In fact, this is one of our key differentiators: customers consume the same best-in-class Teradata software regardless of deployment choice. As a result, customers can have confidence that their existing investment, infrastructure, training, integration, etc., is fully compatible from one environment to another.
One thing to note, of course, is that a node in one environment will likely have a different performance profile than what is experienced with a node in another environment. In other words, depending on the workload, a single node of our flagship Teradata IntelliFlex system may require up to six to ten instances or virtual machines in a public cloud environment to yield the same performance.
There are many variables that can affect performance – such as query complexity, concurrency, cores, I/O, internode bandwidth, and more – so mileage may vary according to the situation. This is why we always recommend a PoC (proof of concept) to determine what is needed to meet specific customer requirements.
Considering a hybrid cloud scenario. What can we expect in regards to the integration with the rest of the Teradata stack, especially on-premises?
Hybrid cloud is central to Teradata's strategy; I cannot emphasize this enough. We define hybrid cloud as a customer environment consisting of a mix of managed, public, private, and on-premises resources orchestrated to work together.
We believe that customers should have choice and so we’ve made it easy to move data and workloads in between these deployment modes, all of which use the same Teradata software. As such, customers can fully leverage existing investments, including infrastructure, training, integration, etc. Nothing is stranded or wasted.
Hybrid deployment also introduces the potential for new and interesting use cases that were less economically attractive in an all-on-premises world. For example, three key hybrid cloud use cases we foresee are:
Cloud data labs – cloud-based sandboxes that tie back to on-premises systems
Cloud disaster recovery – cloud-based passive systems that are quickly brought to life only when needed
Cloud bursting – cloud-based augmentation of on-premises capacity to alleviate short-term periods of greater-than-usual utilization
How about migrating from existing Teradata deployments to Azure? What is the level of support Teradata and/or Azure will offer?
Teradata offers more than a dozen cloud-specific packages via our Consulting and Managed Services team to help customers get the most value from their Azure deployments in three main areas: Architecture, Implementation, and Management.
Specific to migration, we first always recommend that customers have a clear strategy and cloud architecture document prior to moving anything so that the plan and expectations are clear and realistic. We can facilitate such discussions and help surface assumptions about what may or may not be true in different deployment environments.
Once the strategy is set, our Consulting and Managed Services team is available to assist customers or completely own the migration process, including backups, transfer, validation, testing, and so on. This includes not only Teradata-to-Teradata migration (e.g., on-premises to the cloud), but also competitor-to-Teradata migrations as well. We especially love the latter ones!
Finally, can you share with us a bit of what is next for Teradata in the Cloud?
Wow, where should I start? We’re operating at breakneck pace. Seriously, we have many new cloud developments in the works right now, and we’ve been hiring cloud developers like crazy (hint: tell ‘em Brian sent you!).
You'll see more cloud announcements from us this quarter, and without letting the cat out of the bag, expect advancements in the realm of automation, configuration assistance, and an expansion of managed offers.
Cloud is a key enabler to our ability to help customers get the most value from their data, so it’s definitely an exciting time to be involved in helping define the future of Teradata. Thanks for your questions and interest!
Aston Calvin posted this in Internet of things, Member Via RSS on February 7th, 2017
The Internet of Things (IoT) is a hot topic in business circles, and with current innovations in the field, it is easy to see why. Related technologies like sensor-embedded connected systems have the potential to transform the way we live and even how businesses operate. To illustrate the impact of the trend, here are the top ten applications that showcase the Internet of Things:
1 - Wearables
When it comes to the technologies that made the Internet of Things possible, wearables deserve special mention. Wearables are often sensor-embedded objects driven by software, designed to collect data about users, send it, and sometimes even receive it. Depending on what a device is for, the data received about the user can be used to glean insights on anything from the user's health to bank transactions.
Since their inception, demand for wearable devices has kept increasing, and brands like Google and Apple are already cashing in on it. Cases in point: the Apple Watch and Google Glass.
2 - Smart Homes
If wearables like smartwatches aren't enough to convince you that IoT is ...
Issam Hijazi posted this in Member Via RSS, Technical on February 7th, 2017
Streaming data is becoming an essential part of every data integration project nowadays; if not an explicit requirement, then second nature. The advantages of real-time data streaming are many. To name a few: real-time analytics and decision making, better resource utilization, data pipelining, facilitation of microservices, and much more.
Python has many modules that are used heavily by data engineers and scientists to achieve different goals. While Scala is gaining a great deal of attention, Python is still favored by many, including myself. Apache Spark has a Python API, PySpark, which exposes the Spark programming model to Python, allowing fellow "pythoners" to make use of Python on the highly distributed and scalable Spark framework.
Often, persisting real-time data streams is essential, and ingesting MapR Streams / Kafka data into MapR-DB / HBase is a very common use case. Both Kafka and HBase are built with two very important goals in mind: scalability and performance. In this blog post, I'm going to show you how to integrate both technologies using Python code that runs on Apache Spark (via PySpark). I've already searched for such a combination on the internet with no luck; I found Scala examples but not ...
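The post's own code is truncated here, so the following is only a hedged sketch of one way such a pipeline could look: a PySpark direct stream from a Kafka/MapR Streams topic, with each micro-batch written to HBase/MapR-DB through the third-party happybase client. The topic, table, broker address, and column layout are all invented for illustration, and the original post's actual approach may differ:

```python
def to_hbase_row(message: str):
    """Turn a 'key,value' CSV message into an HBase (row_key, columns) pair."""
    key, _, value = message.partition(",")
    return (key, {"cf:val": value})   # column family 'cf', qualifier 'val'

def run_stream():
    # Imports kept inside the function so the pure helper above can be
    # tested without a Spark installation; submit this via spark-submit.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils   # Spark 1.x/2.x API

    sc = SparkContext(appName="KafkaToHBase")
    ssc = StreamingContext(sc, batchDuration=5)      # 5-second micro-batches

    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["sensor-readings"],                           # hypothetical topic
        kafkaParams={"metadata.broker.list": "broker:9092"})  # hypothetical broker

    rows = stream.map(lambda kv: to_hbase_row(kv[1]))  # kv = (key, value)

    def save_partition(partition):
        import happybase                      # HBase/MapR-DB Thrift client
        conn = happybase.Connection("hbase-host")
        table = conn.table("readings")
        for row_key, columns in partition:
            table.put(row_key, columns)
        conn.close()

    rows.foreachRDD(lambda rdd: rdd.foreachPartition(save_partition))
    ssc.start()
    ssc.awaitTermination()
```

Opening the happybase connection per partition (rather than per record, or at driver level) is the usual compromise: connections are not serializable, so they must be created on the executors.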
Ronald van Loon posted this in Big Data, Infographics, Member Via RSS on February 6th, 2017
Today's customers are socially driven and more value conscious than ever before. Believe it or not, everyday customer interactions create a whopping 2.5 exabytes of data (an exabyte equals 1,000,000 terabytes), and this figure has been predicted to grow by 40 percent with every passing year. As organisations face the mounting challenge of coping with the surge in the amount of data and the number of customer interactions, it has become extremely difficult to manage the huge quantities of information whilst providing a satisfying customer experience. It is imperative for businesses and corporations to create a customer-centric experience by adopting a data-driven approach based on predictive analytics.
Integrating an advanced self-service analytics (SSA) environment to strengthen your analytics and data handling strategy can prove beneficial for your business, regardless of the type and size of your enterprise. A corporate SSA environment can dramatically improve your operational capabilities, as it provides an in-depth understanding of consumer data. This, in turn, enables your workforce to take a more responsive, nimble approach to analyzing data, and fosters decision making based on facts rather than on predictions and guesswork. Self-service analytics offers a wealth of intelligence and insights into how ...
Janet Anthony posted this in Big Data, Member Via RSS on February 6th, 2017
There is no doubt about it. Millennials, unlike previous generations, love self-service options. They want the freedom to customize orders, change account information, and find answers to their questions without intervention from a middle man. Think of them as the first demographic that prefers navigating automated phone systems over speaking to a real person, at least most of the time.
This is even true within organizations. Users would much rather access and query data themselves. It’s faster and more customized than relying on standard reports or submitting a request for an ad hoc report.
This trend is only going to grow as 2017 continues. Check out six ways that data analytics will trend over the next twelve months.
1. Data Gets Smarter
The Vs of big data are veracity, volume, velocity, and variety. Smart data involves removing volume, variety, and velocity from the picture and focusing on veracity. The idea is that by doing so, the value of that data to people and organizations increases in a meaningful way.
Smart data is useful and actionable. It involves weeding out the fluff and providing information that people can use to make decisions and predict trends. In 2017, brands will increasingly use artificial intelligence when analyzing data ...
Alena Lysiakova posted this in Big Data, Member Via RSS on February 6th, 2017
Everybody has heard about Big Data, but relatively few people know what it means and how we can actually use it. Those who do can build their business around Big Data to optimize spending and forecast the behavior of their clients.
The meaning behind Big Data
The amount of digital data is growing every year. According to IBS, the total amount of data stored in 2015 was more than 6.5 zettabytes, and it is constantly growing. These huge masses of data are called Big Data. In Russia, we also fold the tools for its analysis into the term, but the main idea stays the same. Only 1.5% of this data is actually useful, and we need great analytics to extract the information needed from the whole data pool.
Big Data over the world
Nowadays the USA is the pioneer of Big Data practices, with the majority of companies being at least interested in the subject. In 2014, according to IDC, more countries began to take an interest in data: Europe, Asia (excluding Japan), and Africa took a 45% share of big data software and hardware.
Where Big Data is used
Big Data has a wide range of uses. With clever use of data analytics, you can find out the effectiveness ...
Amir Rasool posted this in Big Data, Member Via RSS, strategy on February 6th, 2017
We recently spoke with Daniel Cantorna, Head of Data Products at ICLP, to give us his insights with regards to the future of big data in 2017 and his specialty which is data driven marketing. Daniel has worked on marketing automation programmes and has architected solutions to collect, store and understand customer data, as well as having created several data products that allow marketers to drive better decisions. Here are some of his thoughts:
The Difference Between Data-Driven and Data-Informed Marketing. There is a big difference between data-driven and data-informed marketing, but the two approaches can strengthen each other.
Essentially, data-driven marketing is the direct use of insights or outcomes from data analysis to inform a marketing decision which then impacts on a customer experience in some way. For example, offering special discounts to customers who click on your ad, visit your web page, or sign up for your mail list.
On the other hand, data-informed marketing describes a process whereby the outcome of some data analysis may inform or direct a future decision e.g. we have noticed that there is a dip in sales when products are advertised using these (x, y, z) words, so let’s try using a different product ...
Issam Hijazi posted this in Member Via RSS, Technical on February 6th, 2017
"Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods. This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data. The two view outputs may be joined before presentation. The rise of lambda architecture is correlated with the growth of big data, real-time analytics, and the drive to mitigate the latencies of map-reduce." - Wikipedia
Previously, I've written some blogs covering many use cases for using Oracle Data Integrator (ODI) for batch processing on top of the MapR distribution, and for using Oracle GoldenGate (OGG) to stream transactional data into MapR Streams and other Hadoop components. While combining both products is already a perfect fit for the lambda architecture, the latest release of ODI has many great new features, including the ability to deal with Kafka streams as source and target from ODI itself. This feature has tremendous advantages for anyone who already has, or is planning, a lambda architecture, because it simplifies the way we process and handle both batch and fast data within the same logical design, under one product. Now if we combine OGG's streaming capabilities and ODI's batch/streaming capabilities, ...
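To make the quoted definition concrete, here is a minimal, framework-free Python sketch of the three lambda-architecture layers. The function names and toy page-view data are illustrative only, and have nothing to do with ODI or OGG specifically:

```python
from collections import Counter

def batch_view(events):
    """Batch layer: periodically recompute counts over the full history."""
    return Counter(e["page"] for e in events)

def speed_view(recent_events):
    """Speed layer: counts only for events newer than the last batch run."""
    return Counter(e["page"] for e in recent_events)

def query(batch, speed):
    """Serving layer: merge both views at query time."""
    return batch + speed

# Toy data: the batch view is complete but stale; the speed view covers
# only what arrived since the last batch recomputation.
history = [{"page": "/home"}, {"page": "/home"}, {"page": "/pricing"}]
recent = [{"page": "/home"}]

merged = query(batch_view(history), speed_view(recent))
# merged["/home"] == 3, merged["/pricing"] == 1
```

The trade-off the Wikipedia quote describes lives in these three functions: the batch layer is accurate but slow to refresh, the speed layer is approximate but immediate, and the serving layer hides the seam from the consumer.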
Ronald van Loon posted this in Big Data, Infographics, Member Via RSS on February 3rd, 2017
Businesses today need to do more than merely acknowledge big data. They need to embrace data and analytics and make them an integral part of their company. Of course, this will require building a quality team of data scientists to handle the data and analytics for the company. Choosing the right members for the team can be difficult, mainly because the field is so new and many companies are still trying to learn exactly what a good data scientist should offer. Putting together an entire team has the potential to be more difficult. The following information should help to make the process easier.
The Right People
What roles need to be filled for a data science team? You will need to have data scientists who can work on large datasets and who understand the theory behind the science. They should also be capable of developing predictive models. Data engineers and data software developers are important, too. They need to understand architecture, infrastructure, and distributed programming.
Some of the other roles to fill in a data science team include the data solutions architect, data platform administrator, full-stack developer, and designer. Those companies that have teams focusing on building data products will also likely want ...
Technology has played a major role in the evolution of the Banking & Finance industry over the last one to two decades. Services and the way banks operate have advanced to make life easier for both customers and banking professionals. When the Big Data revolution hit various industries, the banking industry realised the opportunities associated with it. This article provides insight into the impact of Big Data on the banking sector.
How Big is Big Data in the Banking Sector?
Banking firms have always had a huge amount of information stored in their databases, but were clueless about what to do with it. Big Data has unlocked the doors, converting this huge amount of data into meaningful benefits for banks and their customers.
According to a report by Alacer, banks in the US currently hold around 1 exabyte of stored data, equal to about 273 billion MP3s. Typically, information in the banking industry comes from sources such as customer branch visits, credit/debit card histories, banking volumes, account transactions, call logs, and web interactions.
Role and Impact of Big Data
As mentioned in the above paragraphs, there are a lot of areas which have been or can ...
Larry Alton posted this in Cloud, Member Via RSS on February 2nd, 2017
You've surely heard the term "cloud computing" by now. Even parents and grandparents know they can use their iPhone to back up their family photos in the cloud, although they may not know exactly what that means.
In basic terms, cloud computing is the act of storing and accessing your data and programs over the internet, as opposed to a stand-alone desktop computer’s hard drive. While the term “cloud computing” is fairly new, the concept is not. People have been using cloud computing since the 1960s, even prior to the internet as we know it today.
When Salesforce.com made its debut in 1999, it delivered the first enterprise application over a basic website. It wasn't long before other companies followed suit, and in 2002, Amazon Web Services was launched.
Why it’s called the cloud
Referring to this technology as “the cloud” is a genius marketing strategy that paints a picture in the user’s mind of an all-powerful network where data is accessible at all times, from all devices. It’s much like a puffy, white cloud that hovers overhead and follows you wherever you go.
Cloud computing is important because it allows people to collaborate in real time from opposite ends of the world. It’s also ...
Labdhi Shah posted this in Member Via RSS, security on February 2nd, 2017
A security system is a means by which something is secured through a set of interworking components and devices. When it comes to information, security is defined as the protection of information to minimize its exposure to unauthorized personnel.
Cybercrime covers a wide range of malicious activities, including the illegal interception of data, system interference that compromises network integrity and availability, and copyright infringement. These offenses are committed using telecommunication networks such as the internet and mobile phones. The crimes may be committed by individuals or small groups, as well as by criminal organizations that are often spread around the world, committing crimes on an unprecedented scale with a motive to intentionally harm the reputation of the victim. The offenses can cause physical or mental harm, or loss, to the victim directly or indirectly. These crimes threaten a nation’s security and financial health. Cybercriminals often choose to operate in countries with weak or nonexistent cybercrime laws.
How Do Hackers Harm Your Website?
A hacker is a highly skilled computer expert capable of breaking into computer systems and networks by exploiting bugs to gain unauthorized access to data. Below are the most common types of attacks hackers launch against websites:
Jorge Garcia posted this in #BigData, analytic database, Analytics, Big Data, Business Intelligence, cloud computing, data lake, Data Management, data platform, Data Warehouse, database, Member Via RSS, nosql on February 1st, 2017
(Image courtesy of Thomas Skirde)
As I mentioned in my first blog post about the book, I'm now working hard to deliver a piece that will hopefully serve as a practical guide for the implementation of a successful modern data management platform.
I'll try to provide frequent updates and, perhaps, share some pains and gains from its development. For now, here's some additional information, including the general outline and the type of audience it is intended for.
I invite you to be part of the process and leave your comments, observations and words of encouragement right below, or better yet, to consider:
Pre-ordering the book. Soon I’ll provide details on how to pre-order your copy, but in the meantime you can show your interest by signing up to our pre-order list, or
Providing us with information about your own successful enterprise use case, which we may use in the book
Needless to say, the information you provide will be kept confidential and used only for the purpose of developing this book. So here, take a look at the update...
New Data Management Platforms
Discovering Architecture Blueprints
About the Book
What Is This Book About?
This book is the result of a comprehensive study into the improvement, expansion, and modernization of different types of architectures, solutions, and platforms to address the need for better and more effective ways of dealing with increasing and more complex volumes of data.
In conducting his research for the book, the author has made every effort to analyze in detail a number of successful modern data management deployments as well as the different types of solutions proposed by software providers, with the aim of providing guidance and establishing practical blueprints for the adoption and/or modernization of existing data management platforms. These new platforms have the capability of expanding the ability of enterprises to manage new data sources—from ingestion to exposure—more accurately and efficiently, and with increased speed.
The book is the result of extensive research conducted by the author, examining a wide range of real-world, modern data management use cases and the plethora of software solutions that various software providers have deployed to address them. Taking a software vendor‒agnostic viewpoint, the book analyzes what companies in different business areas and industries have done to achieve success in this endeavor, and infers general architecture footprints that may be useful to enterprises looking to deploy a new data management platform or improve an existing one.
Who Is This Book For?
This book is intended for both business and technical professionals in the area of information technology (IT). These roles would include chief information officers (CIOs), chief technology officers (CTOs), chief financial officers (CFOs), data architects, and data management specialists interested in learning, evaluating, or implementing any of the plethora of new technologies at their disposal for modernizing their existing data management frameworks.
The book is also intended for students in the fields of computer science and informatics interested in learning about new trends and technologies for deploying data architecture platforms. It is relevant not only for those considering a big data or data management related career, but also for those looking to enrich their analytics and data science skills with information about new platform technologies. This book is also relevant for:
Professionals in the IT market who would like to enrich their knowledge and stay abreast of developments in information management.
Entrepreneurs who would like to launch a data management platform start-up or consultancy, enhancing their understanding of the market, learning about some start-up ideas and services for consultants, and gaining sample business proposals.
Executives looking to assess the value and opportunities of deploying and/or improving their data management platforms.
Finally, the book can also be used by a general audience from both the IT and business areas to learn about the current data management landscape and technologies, in order to form an informed opinion about how to use these technologies for deploying modern data management platforms.
What Does This Book Cover?
The book covers a wide variety of topics, from a general exploration of the data management landscape to a more detailed review of specific topics, including the following:
The evolution of data management
A comprehensive introduction to Big Data, NoSQL, and analytics databases
The emergence of new technologies for faster data processing—such as in-memory databases, data streaming, and real-time technologies—and their role in the new data management landscape
The evolution of the data warehouse and its new role within modern data management solutions
New approaches to data management, such as data lakes, enterprise data hubs, and alternative solutions
A review of the data integration problem: new components, approaches, and solutions
A detailed review of real-world use cases, and a suggested approach to finding the right deployment blueprint
How Is the Book Structured?
The book is divided into four comprehensive parts that offer a historical perspective, the groundwork for the development of data management platforms and associated concepts, and an analysis of real-world modern data management frameworks, working toward potential blueprints for deployment.
Part I. A brief history of diverse data management platform architectures, and how their evolution has set the stage for the emergence of new data management technologies.
Part II. The need for and emergence of new data management technologies such as Big Data, NoSQL, data streaming, and real-time systems in reshaping existing data management infrastructures.
Part III. An in-depth exploration of these new technologies and their interaction with existing technologies to reshape and create new data management infrastructures.
Part IV. A study of real-world modern data management infrastructures, along with a proposal of a concrete and plausible blueprint.
The following is a general outline of the book:
Table of Contents

Preface
Acknowledgment
Prologue
Introduction

Part I. Brief History of Data Management Platform Architectures
Chapter 1. The Never-Ending Need to Manage Data
Chapter 2. The Evolution of Structured Data Repositories
Chapter 3. The Evolution of the Data Warehouse as the Main Data Management Platform

Part II. The Need for and Emergence of New Data Management Technologies
Chapter 4. Big Data: A Primer
Chapter 5. NoSQL: A Primer
Chapter 6. Need for Speed 1: The Emergence of In-Memory Technologies
Chapter 7. Need for Speed 2: Events, Streams, and the Real-Time Paradigm
Chapter 8. The Role of New Technologies in Reshaping the Analytics and Business Intelligence Space

Part III. New Data Management Platforms: A First Exploration
Chapter 9. The Data Warehouse, Expanded and Improved
Chapter 10. Data Lakes: Concept and Approach
Chapter 11. Data Hub: Concept and Approach
Chapter 12. Analysis of Alternative Solutions
Chapter 13. Data Lake vs. Data Hub: Key Differences and Considerations
Chapter 14. Considerations on Data Ingestion, Integration, and Consolidation

Part IV. Studying Plausible New Data Management Platforms
Chapter 15. Methodology
Chapter 16. Data Lakes
16.1. Analyzing Three Real-World Use Cases
16.2. Proposing a Feasible Blueprint
Chapter 17. Data Hubs
17.1. Analyzing Three Real-World Use Cases
17.2. Proposing a Feasible Blueprint
Chapter 18. Summary and Conclusions

Appendix A. The Cloud Factor: Data Management Platforms in the Cloud
Appendix B. Brief Intro into Analytics and Business Intelligence with Big Data
Appendix C. Brief Intro into Virtualization and Data Integration
Appendix D. Brief Intro into the Role of Data Governance in Big Data & Modern Data Management Strategies

Glossary
Bibliography
Index
István Nagy posted this in BME, Member Via RSS, oktatás, python, rapidminer on February 1st, 2017
Continuing the tradition, we would like to draw your attention to our data science course running in the spring semester at the Műegyetem (BME), in which those interested can get an insight into the world of data analysis through theoretical and practical classes. Classes are held weekly on Tuesdays from 10:15, and every other Friday from 10:15. The first session starts on Tuesday, February 7th, at 10:15.
In terms of topics, we cover the fundamentals of data analysis: data models, CRISP-DM, supervised and unsupervised learning methods, data mining modeling, and many application examples such as churn prediction, risk estimation, segmentation, and time series forecasting. In the first weeks we work with RapidMiner, and then we learn the basics of data analysis in Python during the practical sessions. We ask everyone to bring their own computer to the practical classes, with the appropriate software packages installed (we work with the free versions).
The course also includes a homework assignment, which will be a supervised learning task on a real dataset. In fact, the homework solutions will compete in a closed data mining competition run through the kaggle.com platform. We will inform you about the venue and the exact schedule after your application has been submitted and accepted.
It is already clear that quite a lot of external applicants are expected, so we will have to impose some limits: we can admit as many external participants as there are enrolled students, which is currently 24. Therefore, if you are interested in the course and would like to join us, please apply by filling out the short questionnaire below.
We will notify everyone within a few days as to whether we are able to accept their application. This will depend on the order of responses as well as on motivation. We will only accept a large number (3+) of applications from the same company in exceptional cases.
Irine Papuc posted this in Member Via RSS, Technical on February 1st, 2017
Any set of data that needs a history of events can be considered stateful. Managing state can be hard and error-prone, but working with immutable data (rather than mutable data) and certain supporting technologies (namely Redux, for the purposes of this article) can help significantly.
Immutable data has restrictions, namely that it can’t be changed once it’s created, but it also has many benefits, particularly in reference versus value equality, which can greatly speed up applications that rely on frequently comparing data (checking if something needs to update, for example).
Using immutable states allows us to write code that can quickly tell if the state has changed, without needing to do a recursive comparison on the data, which is usually much, much faster.
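To make the reference-equality point concrete, here is a minimal sketch in plain TypeScript, with no Redux dependency; the reducer shape simply mirrors Redux's contract, and all names are illustrative:

```typescript
// Minimal sketch of immutable state updates in a Redux-style reducer.
type State = { readonly count: number; readonly items: readonly string[] };
type Action = { type: string; payload?: string };

const initialState: State = { count: 0, items: [] };

function reducer(state: State = initialState, action: Action): State {
  switch (action.type) {
    case "increment":
      // Never mutate: copy the old state into a new object.
      return { ...state, count: state.count + 1 };
    case "addItem":
      return { ...state, items: [...state.items, action.payload ?? ""] };
    default:
      // Unhandled action: return the SAME reference.
      return state;
  }
}

const s0 = initialState;
const s1 = reducer(s0, { type: "increment" });
const s2 = reducer(s1, { type: "unknown" });

// A cheap identity check replaces a deep recursive comparison:
console.log(s1 === s0); // false - something changed, so re-render
console.log(s2 === s1); // true  - nothing changed, so skip work
```

Because every change produces a new object, consumers (such as React components) only need the `===` check, rather than walking the whole state tree, to decide whether to update.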
This article will cover the practical applications of Redux when managing state through action creators, pure functions, composed reducers, impure actions with Redux-saga and Redux Thunk and, finally, use of Redux with React. That said, there are a lot of alternatives to Redux, such as MobX, Relay, and Flux-based libraries.
Ajith Nayar posted this in Big Data, Internet of things, Member Via RSS on January 31st, 2017
Throughout history, the business world has been driven forward by new technologies, with these developments rapidly redefining what companies are capable of doing. As early business adopters take advantage of these technological opportunities, it often becomes apparent to industry observers that a mutually beneficial relationship exists between new technologies, with each driving the growth of the other.
We’re seeing one example of this phenomenon today, with the growing importance of both the Internet of Things (IoT) and analytics technologies in businesses worldwide.
Most industry observers already understand that each of these technologies will play a tremendously important role in the business world of the future, but it is by looking at the symbiotic relationship between the two that we can best understand what each has to offer.
What is the Internet of Things?
The Internet of Things is a collective term for all smart devices in the world capable of generating and delivering data without human intervention. The Internet of Things is set to redefine our understanding of how our industrial systems operate, and the potential upside for the business world is massive.
How does Analytics fit in?
The development of analytics technology closely mirrored the beginning of the big data era. As organizations saw the exponential growth of ...
Marcus Jensen posted this in Big Data, Member Via RSS on January 31st, 2017
With Super Bowl LI rapidly approaching, there are tons of questions that need answering. They range from questions about the halftime show – why did Adele turn it down, what will it look like this year, and will it be more spectacular than the infamous Super Bowl XXXVIII with its wardrobe malfunction incident? – to questions about the actual game.
Namely, is Tom Brady going to be allowed anywhere near the ball before the game starts and, most importantly, how did the Atlanta Falcons manage to get here? After having more or less a horrible decade ever since Michael Vick left – except the 2012 season which was quite good, actually – almost nobody saw this coming. And while the New England Patriots have been consistently good season after season and even won four championship titles in the last 15 years, the Falcons are rather a surprise. Maybe it’s their way to say farewell and show respect to the Georgia Dome?
However, these aren’t the only questions since lots of people are going to be focusing on ads and social media reports. So, if you’re interested in the tech side of the Super Bowl more than in the sports aspect of it, here ...
Preetish Panda posted this in AR / VR, Member Via RSS on January 31st, 2017
Imagine meeting a friend at a coffee shop far from home by projecting your digital self: your friend interacts with you as you sip coffee, while a virtual kitten that’s indistinguishable from a real one plays on the table. Exciting, isn’t it? Well, that’s what Mixed Reality has in store for you. Before diving deep, let’s first understand how it differs from Virtual Reality (VR) and Augmented Reality (AR).
VR completely overrides the physical world and creates its own digital world. It takes the user to another immersive environment and allows the viewer to experience something that would not otherwise be feasible. For example, VR lets you have a 3D experience of the Rio de Janeiro Carnival, the FIFA World Cup, your favorite band, sea surfing and much more, just by wearing a headset (irrespective of your physical location). Here is a video from Google’s Artist in Residence (AIR) program showing artists producing creative pieces using the HTC Vive and Google Tilt Brush:
Liked it? Now don’t miss Katie Rodgers create a garment using a virtual mannequin.
Unlike VR, AR works in tandem with the real world by superimposing digital objects on the real-world environment. It functions by augmenting the physical ...
Mark van Rijmenam posted this in artificial intelligence, Member Via RSS on January 30th, 2017
Artificial Intelligence offers a lot of advantages for organisations by creating better and more efficient organisations, improving customer services with conversational AI and reducing a wide variety of risks in different industries. Although we are only at the beginning of the AI revolution that is upon us, we can already see that artificial intelligence will have a profound effect on our lives. As a result, AI governance is also becoming increasingly important, if we want to reap the benefits of artificial intelligence.
Data governance and ethics have always been important and a few years ago, I developed ethical guidelines for organisations to follow, if they want to get started with big data. Such ethical guidelines are becoming more important, especially now since algorithms are taking over more and more decisions. Automated decision-making is great until it has a negative outcome for you and you can’t change that decision or, at least, understand the rationale behind that decision. In addition, algorithms offer tremendous opportunities, but they have two major flaws:
Algorithms are extremely literal; they pursue their (ultimate) goal literally and do exactly what they are told, while ignoring any other important consideration;
Algorithms are black boxes; whatever happens inside an algorithm is only known ...
Audrey Willis posted this in Big Data, Member Via RSS on January 30th, 2017
Intuition. That’s what most recruiters and hiring managers use to decide which applicants get offers. In a world that is moving toward more data- and logic-driven decisions at every turn, recruiting is a surprising change of pace. Or is it? Big data has touched nearly every facet of business, and recruiting is no exception. Large companies have been using data to choose their new employees and decrease turnover for several years. But how much better is predictive analysis than human intuition when it comes to recruiting?
Can Big Data Save Companies Money on Hiring?
It costs a lot to hire and train an employee. Figures vary widely based on the type of position, but some studies estimate that onboarding and training cost six to nine months of the new employee’s salary. With costs like that, minimizing turnover is a key money-saving goal for any business.
Data points gleaned from applicants’ online information have helped hiring managers devise new ways of evaluating potential employees, including complex personality tests that can help predict culture fit. But big data only truly enters recruiting when large data sets are used to predict specific outcomes, like retention rates.
Dirk Lerner posted this in Data Modeling, Member Via RSS on January 30th, 2017
Our 25th anniversary roundtable in Frankfurt with FCO-IM was a great success. Almost 90 registrations and more than 60 attendees is an unexpected outcome for a topic that is almost unknown in Germany. If you want to know what happened at the roundtable, read about it in my previous blog post.
Ronald Damhof posted this in Member Via RSS on January 29th, 2017
Years ago my son came home with some geography homework. He had to learn the countries and capitals of the European continent. While I was practicing with him, I encountered the country Yugoslavia... Now, this was 2010; Yugoslavia had split up into (among others) Serbia and Montenegro. I honestly thought this was a mistake (old material used by the teacher), so I went to the teacher and asked her whether we could get an update on the material. This was her response:
"The method has not been updated yet"
This is, in my opinion, a small symptom of what is very wrong with education (which is completely saturated with this ‘method-illness’), with society as a whole, and, more particularly, with Enterprise Architecture. History tells us that this religion-like belief in a method, blindly following it and demanding that others do too, will not end well. Methods used in this way kill critical thinking, creativity and innovation.
Methods, best practices, checklists and other attempts to mask (especially to management and society) the inherent uncertainty of the work and world we’re in are extremely damaging to people, organisations and society.
Only ‘systems’ (using a very broad definition of ‘system’ here) that are simple, linear, predictable and have a generic context might benefit from this method-approach. In my world of data-enterprise-IT-business-blabla-architecture, such systems simply do not exist.
The moment a system gets complicated, complex or even chaotic*, it all breaks down, and it becomes dangerous on many levels when the design participants of these systems still act as if it were a simple, linear, method-based system. The inherent complexity and uncertainty of these systems require a deep dive into the context (the real world) surrounding the system. They require architects to experiment, to try something, to rinse and repeat, to be full members of the operationalisation (!) and to hang in there (!), learning and discussing. And yes, for many architects this is scary.
I understand the temptation: the phrase ‘we use TOGAF’ or ‘we have written a PSA1’ sends a vibe of trust and safety (‘hey, I followed the method, not my fault’) and is highly rewarded by senior management. What is not rewarded is stating the uncertainty (‘What do you mean, you don’t know? That’s why I hired you’). Make senior management part of the uncertainty; figure out together how to approach it, and how to design a learning-by-doing mentality, as well as mutual respect for one another and for emerging insights (!).
We ‘the architects’ should stand firm in stating the message ‘this is complicated, we can do it, but I am not sure how’.
How do we need to change? We need to go back to the fundamentals, steer away from the various technical hypes, and distance ourselves from architectural methods. We should separate concerns fiercely, isolate, abstract, collaborate with many experts in the field, communicate, and above all, honour the context of the domain we are working in and how it affects the real world. Remember, in complex systems there is no ‘one architecture’ that you can design beforehand. And if you tell that fairy tale, you are deluding yourself, your colleagues, your organisation and, ultimately, society.
Back to my opening story: if teachers are not evolving into educators truly interested in the kids they need to educate, and if they keep relying on methods to educate our kids, then I say: let’s automate the teachers; long live the MOOCs and YouTube. The sad thing is that this teaching by methods trains future architects to do the same; the method-illness is deeply rooted in our society. Instead, teach them how to think critically, to think outside the box using fundamental skills. So throw away your methods, burn them. Educating kids is about connecting with kids and parents and their concerns; it is a unique and proud profession, something you need to learn and train hard for; it is a valuable, hard-to-learn and respected skill.
I leave it to the reader to convert this analogy to Enterprise Architecture and the state we are in.
*Referring to the Cynefin framework of Dave Snowden
Amit Kuntal posted this in Member Via RSS, security on January 27th, 2017
As ubiquitous computing continues to become the focal point of our daily lives, it is more important than ever to make decisions about the scope of data people unknowingly share. With wearable technology, users are enjoying the comfort that comes through ambient intelligence. However, they also risk potentially exposing their private data to nefarious actors. For example, a hacker may gauge the best time to rob victims while they sleep based on their leaked heartbeat data. Hackers can also use data to discover medical conditions that can be exploited for illegal gains.
Security in wearable technology is different from the precautions people take in other settings, due to the increased attack surface such devices expose. In 2015, a vulnerability in the Fitbit wristband was disclosed that allowed an attacker in close range to upload malicious code to the device. The code could then be transferred to any connected computer, making other devices vulnerable as well.
Bluetooth is becoming the premier connectivity option for a majority of wearable devices. Unfortunately, Bluetooth is not secure, and it continues to be a weak link in the security chain. Freely available tools, such as Crackle, can be used to crack Bluetooth encryption. ...
Raman Sharma posted this in Cloud, Member Via RSS on January 27th, 2017
Does cloud analytics really help in the overall scope of Information Technology? This is a question asked by many IT businesses. As the cloud continues to gain prominence in both the technology and business worlds, questions such as these arise more frequently. Before we delve into the significance of cloud analytics for IT businesses, it’s important to first define it. Cloud analytics is a service model in which the data analytics process is provided through private and public clouds under a subscription-based model. In this blog post, we take a look at how cloud analytics can have a positive impact on the way IT businesses operate.
1. Streamlined Business Operations
One of the major benefits of cloud computing is the web-based services that host all the programs users need. This covers all the elements of analytics: rather than investing in multiple programs, the cloud offers a single place for all your hosting needs. Cloud analytics is a streamlined model for IT businesses, especially call centers, which previously had separate software to run specific applications. The cloud simplifies things by providing a single platform.
Apart from being simplified, the option of subscription-based services allows businesses to opt for the pay-per-use ...
Francesco Corea posted this in Big Data, Member Via RSS, security on January 26th, 2017
I. The problem(s)
Data security represents one of the main problems of this data-rich generation, since a higher magnitude of data is correlated with looser control and a higher probability of fraud, a higher likelihood of losing one’s privacy, and of becoming the target of illicit or unethical activities. Today more than ever, a universal data regulation is needed, and some steps have already been taken toward one (OECD, 2013). This is especially true because everyone complains about privacy leaks, yet no one wants to give up the extra services and customized products that companies are developing based on our personal data.
It is essential to protect individual privacy without erasing companies’ capacity to use data for driving businesses in a heterogeneous but harmonized way. Any fragment of data has to be collected with prior explicit consent, and guaranteed and controlled against manipulation and fallacies. A privacy assessment to understand how people would be affected by the use of data is crucial as well.
II. Fairness and Data Minimization
There are two important concepts to be considered from a data protection point of view: fairness and minimization.
Fairness concerns how data are obtained, and the transparency needed from organizations that are collecting them, especially about their future potential uses.
Data minimization, instead, regards the ability of gathering ...
Andrew Deen posted this in Internet of things, Member Via RSS on January 25th, 2017
Today, we take GPS (Global Positioning System) technology for granted. The days of navigating with paper maps seem like a long time ago, even though modern GPS systems are fairly new. That’s because the technology has been so revolutionary to the way we live our lives that we can’t really imagine life without it. It helps us navigate while driving, helps us track our steps, and even crowdsources crime analysis maps to help keep us safe. Technology is always evolving, however, and there may be something new on the horizon that could make GPS obsolete: synthetic diamonds.
How Does GPS Work?
Currently, GPS relies on three kinds of devices: satellites, ground stations, and receivers. Satellites orbit the Earth and relay positioning data back to the ground stations, which then use radar to confirm the satellites’ positions. Our receivers are the devices we use on an everyday basis: our phones, GPS units in the car, smart watches, and any other GPS-enabled instrument. While this system works perfectly well for most of our needs, there are some limitations on the accuracy and speed of GPS. For some technology, like driverless cars, we’ll need ...
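The positioning step itself can be illustrated with a simplified 2D trilateration sketch. This is an illustration, not actual GPS code: real receivers solve the 3D problem, plus a clock-bias term, using signals from at least four satellites, and all names here are made up for the example.

```typescript
// 2D trilateration: given three beacons at known positions and the
// measured distance to each, recover the receiver's position.
type Point = { x: number; y: number };

function trilaterate(
  p1: Point, d1: number,
  p2: Point, d2: number,
  p3: Point, d3: number
): Point {
  // Subtracting the three circle equations pairwise cancels the
  // quadratic terms and leaves two linear equations in (x, y).
  const a1 = 2 * (p2.x - p1.x), b1 = 2 * (p2.y - p1.y);
  const c1 = d1 * d1 - d2 * d2
    - p1.x * p1.x + p2.x * p2.x - p1.y * p1.y + p2.y * p2.y;
  const a2 = 2 * (p3.x - p2.x), b2 = 2 * (p3.y - p2.y);
  const c2 = d2 * d2 - d3 * d3
    - p2.x * p2.x + p3.x * p3.x - p2.y * p2.y + p3.y * p3.y;
  // Solve the 2x2 linear system by Cramer's rule.
  const det = a1 * b2 - a2 * b1;
  return { x: (c1 * b2 - c2 * b1) / det, y: (a1 * c2 - a2 * c1) / det };
}

// A receiver at (1, 2) measured against beacons at (0,0), (4,0), (0,4):
const pos = trilaterate(
  { x: 0, y: 0 }, Math.sqrt(5),
  { x: 4, y: 0 }, Math.sqrt(13),
  { x: 0, y: 4 }, Math.sqrt(5)
);
console.log(pos.x.toFixed(3), pos.y.toFixed(3)); // 1.000 2.000
```

The same idea, with distances derived from signal travel times, is what lets a receiver fix its position from satellite broadcasts.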
Rick Delgado posted this in Member Via RSS, security on January 25th, 2017
Big data environments are now common in companies; nearly every industry has its hand in the cookie jar. Because of this, companies are generating more data today than at any other point in history. Vast “silos” of information are being structured and filled every day, giving companies a competitive advantage in tailoring their products to their customers’ needs. But alongside this growth of big data and big data environments, we are also seeing growth in cybercrime. Security for big data has become a great concern in the last few years. Much of the data that is gleaned is sensitive; we know it, you know it, and cybercriminals absolutely know it and love it.
For companies that run big data environments, securing their data warehouses and modes of deployment is vital to not only their success but also to their customers’, clients’ and business partners’ privacy. Knowing how to secure your big data isn’t as difficult as it might sound.
Security strategies can be found through asking specific questions such as “Who is running specific big data requests,” “What analytics requests are users running” or “Are users trying to download sensitive data or is the request part of a job ...
Paul te Braak posted this in Data Vault, Data Vault data warehousing, Data Warehouse, Member Via RSS, Other Tools on January 24th, 2017
If you’re looking to get some great Data Vault training in Australia in March, the Genesee Academy is running the CDVDM course in Brisbane, Melbourne and Sydney. You can get the schedule dates at the Genesee site. It’s a great course and well worth the investment for anyone looking to get into Data Vault […]
Journey Science in Telecom: Take Customer Experience to the Next Level
Ronald van Loon posted this in Big Data, Member Via RSS, strategy on January 24th, 2017
Journey Science, being derived from connected data from different customer activities, has become pivotal for the telecommunications industry, providing the means to drastically improve the customer experience and retention. It has the ability to link together scattered pieces of data, and enhance a telco business’s objectives. Siloed approaches are becoming obsolete – take call centers as an example – there is only so much that you can do with data from only one system.
By using insights from customer journey analytics, telco businesses can better measure the user experience and make informed decisions about refining it. The data not only allows them to take a proactive approach to customer satisfaction, but enables the prediction of future failures as well. With customer journey analytics, you can evaluate the touchpoints along journeys and revamp your strategies to better cater to customers’ needs.
In the telecom industry, it is difficult for a business to effectively manage the massive volume of data with existing systems and technology. There are several areas where telecom companies need to make improvements, such as reducing costs, improving customer experience, increasing conversion rates, and many more. To do so, they need to derive meaning from the collected data by finding ...
Dana Gardner posted this in BriefingsDirect, cloud computing, Dana Gardner, Hewlett Packard Enterprise, HPE, hybrid cloud, Interarbor Solutions, Mark Skelton, Member Via RSS, OCSL on January 24th, 2017
The next BriefingsDirect digital transformation case study explores how UK IT consultancy OCSL has set its sights on the holy grail of hybrid IT -- helping its clients to find and attain the right mix of hybrid cloud.
Gardner: People increasingly want to have some IT on premises, and they want public cloud -- with some available continuum between them. But deciding the right mix is difficult and probably something that’s going to change over time. What drivers are you seeing now as organizations make this determination?
Skelton: It’s a blend of a lot of things. We've been working with enterprises for a long time on their hybrid and cloud messaging. Our clients have been struggling just to understand what hybrid really means, but also how to make hybrid a reality and how to get started, because it really is a minefield. You look at what Microsoft is doing, what AWS is doing, and what HPE is doing with their technologies. There's so much out there. How do they get started?
We've been struggling over the last 18 months to get customers on that journey and get started. But now, because the technology is advancing, we're seeing customers starting to embrace it, evolve, and transform. And we've matured our models and frameworks as well to help customer adoption.
Gardner: Do you see the rationale for hybrid IT shaking down to an economic equation? Is it to try to take advantage of technologies that are available? Is it about compliance and security? You're probably tempted to say all of the above, but I'm looking for what's driving the top-of-mind decision-making now.
Start with the economics
Skelton: The initial decision-making process begins with the economics. I think everyone has bought into the marketing messages from the public cloud providers saying, "We can reduce your costs, we can reduce your overhead -- and not just from a culture perspective, but from a management perspective, a personnel perspective, and a technology solutions perspective."
CIOs, and even financial officers, are seeing economics as the tipping point they need to go into a hybrid cloud, or even all-in on a public cloud. But it’s not always cheap to put everything into a public cloud. When we look at business cases with clients, it’s the long-term investment we look at, and over time public cloud isn't always the cheaper option. That’s where hybrid has started to come back to the front of people’s minds.
We can use public cloud for the right workloads, where clients want to be flexible, burst, and be a bit more agile, or even gain global reach for large global businesses, but then keep the crown jewels back inside secured data centers where they're known and trusted and closer to some of the key, critical systems.
So, it starts with the finance side of the things, but quickly evolves beyond that, and financial decisions aren't the only reasons why people are going to public or hybrid cloud.
Gardner: In a more perfect world, we'd be able to move things back and forth with ease and simplicity, where we could take the A/B testing-type of approach to a public and private cloud decision. We're not quite there yet, but do you see a day where that choice about public and private will be dynamic -- and perhaps among multiple clouds or multi-cloud hybrid environment?
Skelton: Absolutely. I think multi-cloud is the Nirvana for every organization, just because there isn't a one-size-fits-all for every type of workload. We've been talking about it for quite a long time. The technology hasn't really been there to underpin multi-cloud and truly make it easy to move from on-premises to public or vice versa. But I think now we're getting there with the technology.
Are we there yet? No, there are still a few big releases coming, things that we're waiting to be released to market, which will help simplify that multi-cloud and the ability to migrate up and back, but we're just not there yet, in my opinion.
Gardner: We might be tempted to break this out between applications and data. Application workloads might be a bit more flexible across a continuum of hybrid cloud, but other considerations are brought to the data. That can be security, regulation, control, compliance, data sovereignty, GDPR, and so forth. Are you seeing your customers looking at this divide between applications and data, and how they are able to rationalize one versus the other?
Skelton: Applications, as you have just mentioned, are the simpler things to move into a cloud model, but the data is really the crown jewels of the business, and people are nervous about putting that into the public cloud. So what we're seeing a lot of is applications going into the public cloud for the agility, elasticity, and global reach, while data stays on-premises because businesses are nervous about breaches in the service providers’ data centers.
That's what we are seeing, but we are also seeing the rise of things like object storage. We're working with Scality, for example, and they have a unique solution for blending public and on-premises storage, so we can pin things to certain platforms in a secure data center and then, where the data is not as critical, move it into a public cloud environment.
Gardner: It sounds like you've been quite busy. Please tell us about OCSL, an overview of your company and where you're focusing most of your efforts in terms of hybrid computing.
Rebrand and refresh
Skelton: OCSL has been around for 26 years as a business. Recently, we've been through a rebrand and a refresh of what we're focusing on, and we're moving more toward being a services organization, leading with our people and our consultants.
We're focusing on transforming customers and clients into the cloud environment, whether that's applications, the data center, cloud, or hybrid cloud. We're trying to get customers on that journey of transformation, engaging with business-level people and business requirements, and working out how we make cloud a reality, rather than just handing over a product and saying go and do whatever you want with it. We're finding out what those businesses want, what the key requirements are, and then finding the right cloud models to fit them.
Gardner: So many organizations are facing not just a retrofit or a rethinking around IT, but truly a digital transformation for the entire organization. There are many cases of sloughing off business lines, and other cases of acquiring. It's an interesting time in terms of a mass reconfiguration of businesses and how they identify themselves.
Skelton: What's changed for me is that when I go and speak to a customer, I'm no longer just speaking to the IT guys; I'm actually engaging with the finance officers, the marketing officers, the digital officers -- that's the common one that is creeping up now. And it's a very different conversation.
We're looking at business outcomes now, rather than focusing on, "I need this disk, this product." It's more: "I need to deliver this service back to the business." That's how we're changing as a business. It's doing that business consultancy, engaging with that, and then finding the right solutions to fit requirements and truly transform the business.
Gardner: Of course, HPE has been going through transformations itself for the past several years, and that doesn't seem to be slowing up much. Tell us about the alliance between OCSL and HPE. How do you come together as a whole greater than the sum of the parts?
Skelton: HPE is transforming and becoming a more agile organization, with some of the spinoffs that we've had recently aiding that agility. OCSL has worked in partnership with HPE for many years, and it's all about going to market together and working together to engage with the customers at right level and find the right solutions. We've had great success with that over many years.
Gardner: Now, let’s go to the "show rather than tell" part of our discussion. Are there some examples that you can look to, clients that you work with, that have progressed through a transition to hybrid computing, hybrid cloud, and enjoyed certain benefits or found unintended consequences that we can learn from?
Skelton: We've had a lot of successes in the last 12 months taking clients on the journey to hybrid cloud. One of the key ones that resonates with me is a legal firm we've been working with. They were in a bit of a state: an infrastructure that was aging, unstable, and not delivering quality service back to the lawyers who were trying to embrace technology -- mobile devices, dictation software, those kinds of things.
We came in with a fresh perspective on how we would actually address some of those problems. We challenged them, and said that we needed to go through a stabilization phase: public cloud was not going to be the immediate answer. They were being courted by the big vendors, as everyone is, about public cloud, and they were saying it was the Nirvana for them.
We challenged that and we got them to a stable platform first, built on HPE hardware. We got instant stability for them. So, the business saw immediate returns and delivery of service. It’s all about getting that impactful thing back to the business, first and foremost.
Building cloud model
Now, we're working through each of their service lines, looking at how we can break them up and transform them into a cloud model. That involves deconstructing the apps and thinking about how we can use pockets of public cloud alongside the on-premises data-center infrastructure in a hybrid model.
They've now started to see real innovative solutions taking that business forward, but they got instant stability.
Gardner: Were there any situations where organizations were very high-minded and fanciful about what they were going to get from cloud that may have led to some disappointment -- so unintended consequences. Maybe others might benefit from hindsight. What do you look out for, now that you have been doing this for a while in terms of hybrid cloud adoption?
Skelton: One of the things I've seen a lot of with cloud is that people have bought into the messaging from the big public cloud vendors about how they can just turn on services and keep consuming, consuming, consuming. A lot of people have gotten themselves into a state where bills have been rising and rising, and the economics are looking ridiculous. The finance officers are now coming back and saying they need to rein that back in. How do they put some control around that?
That’s where hybrid is helping, because you can hook some of those workloads back into your own data center and move them back. But the key for me is that it comes down to putting some thought into what you're putting into the cloud. Think through how you can transform and use the services properly. Don't just turn everything on because it’s there and it’s a click of a button away; actually put some design and planning into adopting cloud.
Gardner: It also sounds like the IT people might need to go out and have a pint with the procurement people and learn a few basics about good contract writing, terms and conditions, and putting in clauses that allow you to back out, if needed. Is that something that we should be mindful of -- IT being in the procurement mode as well as specifying technology mode?
Skelton: Procurement definitely needs to be involved in the initial set-up with the cloud whenever they're committing to a consumption number, but then once that’s done, it’s IT’s responsibility in terms of how they are consuming that. Procurement needs to be involved all the way through in keeping constant track of what’s going on; and that’s not happening.
The IT guys don’t really care about the cost; they care about the widgets, turning things on, and playing around with them. I don’t think they really realize how much this is going to cost. So yeah, there is a bit of a disconnect in lots of organizations: procurement handles the upfront piece, then it goes away, and then IT comes in and spends all of the money.
Gardner: In the complex service delivery environment, that procurement function probably should be constant and vigilant.
Big change in procurement
Skelton: Procurement departments are going to change. We're starting to see that in some of the bigger organizations. They're closer to the IT departments. They need to understand that technology and what’s being used, but that’s quite rare at the moment. I think that probably over the next 12 months, that’s going to be a big change in the larger organizations.
Gardner: Before we close, let's take a look to the future. A year or two from now, if we sit down again, I imagine that more micro services will be involved and containerization will have an effect, where the complexity of services and what we even think of as an application could be quite different, more of an API-driven environment perhaps.
So the complexity about managing your cloud and hybrid cloud to find the right mix, and pricing that, and being vigilant about whether you're getting your money’s worth or not, seems to be something where we should start thinking about applying artificial intelligence (AI), machine learning, what I like to call BotOps, something that is going to be there for you automatically without human intervention.
Does that sound on track to you, and do you think that we need to start looking to advanced automation and even AI-driven automation to manage this complex divide between organizations and cloud providers?
Skelton: You hit a lot of key points there in terms of where the future is going. I think we are still in the phase of trying to build the right platforms to be ready for the future. We see the recent releases of HPE Synergy, for example, being able to support these modern platforms, and that’s really allowing us to then embrace things like microservices. Docker and Mesosphere are two platforms that will disrupt organizations and the way we do things, but you need to find the right platform first.
Hopefully, in 12 months, we'll have those platforms and we can then start to embrace some of this great new technology and really rethink our applications. And it’s a challenge to the ISVs. They've got to work out how they can take advantage of some of these technologies.
We're seeing a lot of talk about serverless computing. It's where nothing is provisioned up front and you spin up resources as and when you need them. The classic use case for that is Uber; they have built a whole business on that serverless-type model. I think that in 12 months' time, we're going to see a lot more of that in enterprise-type organizations.
I don’t think we have it quite clear in our minds how we're going to embrace that, but it’s the ISV community that really needs to start driving it. Beyond that, it's absolutely AI and bots. We're all going to be talking to computers, and they're going to be responding with very human sorts of reactions. That's the next wave.
I'm bringing that into enterprise organizations to see how we can solve some business challenges. Service desk management is one of the use cases where we're seeing, in some of our clients, whether they can get immediate responses from bots to common queries, so they don’t need as many support staff. It’s already starting to happen.
Larry Alton posted this in Member Via RSS, security on January 24th, 2017
A blog is one of the most precious digital assets a company has in 2017. It provides SEO value, engages customers, and pushes site visitors through the conversion funnel. But a blog can also be a magnet for malicious behavior and external threats, which is why businesses really need to hunker down on blog security in the coming months.
Here’s How You Can Protect Your Blog
Protecting your blog is a lot like protecting your home. There are many different entry points through which hackers can enter your blog and you’ll have to account for all of them in order to fully protect your investment.
Here are some of the most important things you can do to get started.
1. Choose the Right CMS Platform
Blogging security all starts with the CMS platform you’re using to run your blog. If you’re using a platform with a questionable history of website security, then don’t be surprised when you discover that your blog isn’t as safe as you’d like.
By far, the most secure platform is WordPress. When you set up your blog with WordPress, you can rest assured knowing that you’re relying on a platform that millions of people use. As a result, security updates are continually ...
Kayla Matthews posted this in Cloud, Member Via RSS on January 23rd, 2017
Cloud computing has had a pretty slow start over the past few years. It makes sense, because there are many security and privacy concerns weighing down the technology.
It looks like that sentiment is finally waning, however. Cloud adoption in 2016 was higher than it’s ever been, and that trend continues as we move into 2017. Most recently, worldwide software company Epicor acquired the cloud-based enterprise content management company, docSTAR, providing just one example of a trend we can expect to see more of throughout the coming year.
Surveys indicate that 17% of enterprises have over 1,000 virtual machines deployed in the cloud, a big increase from 13% in 2015. Furthermore, 95% of survey respondents indicated they utilized cloud services.
As cloud interest — and adoption — rises, enterprises are forced to jump on the bandwagon or risk getting left behind. There are roadblocks and obstacles you may encounter along the way, but more importantly, the benefits far outweigh the risks.
Embracing the Cloud
Cloud solutions now provide benefits like rapid elasticity, proper scaling support, broad network access and on-demand self-service software. And these are quickly becoming necessary in today’s hyper-efficient, technology-oriented world.
This, coupled with the fact that local hardware is aging fast, will push ...
Aditya Rana posted this in Member Via RSS, security on January 23rd, 2017
In 2016 alone, more than 2.2 billion records were exposed in data breaches. Using password-based authentication with a good hashing scheme like bcrypt is perfectly fine as long as you can guarantee that your users won’t use easy-to-guess passwords or reuse passwords across many portals. Both of these assumptions turn out to be wrong in a substantial number of cases, however, so there are ample reasons to require multi-factor authentication. Other than asking your employees and users to use a password manager, here are five strategies a business can use to protect itself (note that SMS as a second factor is not recommended because of some serious security implications):
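As a rough illustration of the salted, slow password hashing the excerpt alludes to, here is a minimal sketch. It uses Python's standard-library scrypt as a stand-in for bcrypt (which needs a third-party package); the function names are illustrative:

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Hash a password with a fresh per-user random salt using scrypt, a memory-hard KDF."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt,
                            n=2**14, r=8, p=1, maxmem=2**25, dklen=32)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash and compare in constant time to avoid timing leaks."""
    candidate = hashlib.scrypt(password.encode(), salt=salt,
                               n=2**14, r=8, p=1, maxmem=2**25, dklen=32)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("Tr0ub4dor&3", salt, digest))                   # False
```

The random salt defeats precomputed rainbow tables, and the tunable cost parameters (n, r, p) slow brute-force attempts; neither helps against users who reuse a breached password elsewhere, which is exactly why the article pushes multi-factor authentication.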
One-time passwords (HOTP and TOTP) are the second most common way of providing multi-factor authentication (after SMS). Both strategies generate a short code to be entered by the user at login; with TOTP the code has a time-based expiry, while HOTP advances a counter instead. The code is generated by an already-enrolled application installed on the user’s mobile device, with Google Authenticator being one example. Note that any application that supports HOTP/TOTP can be used to log in to multiple services; you do not need a new application for each service.
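For the curious, HOTP and TOTP are small enough to sketch directly from the RFCs (4226 and 6238). This minimal Python version reproduces the RFCs' published test vectors; a production system should use a vetted library (pyotp is one option) rather than hand-rolled code:

```python
import hashlib
import hmac
import struct
import time

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over a big-endian counter, then dynamic truncation."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # low nibble of last byte picks the 4-byte window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(key: bytes, for_time=None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with the counter derived from the current 30-second window."""
    if for_time is None:
        for_time = int(time.time())
    return hotp(key, for_time // step, digits)

# The RFC test-vector secret is the ASCII string "12345678901234567890".
print(totp(b"12345678901234567890", for_time=59))  # prints 287082
```

Because both sides derive the code from the same shared secret, the server simply recomputes the expected value (usually allowing a window of plus or minus one step for clock drift) and compares.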
Rehan Ijaz posted this in Member Via RSS, security, Technical on January 23rd, 2017
The vast majority of developers are self-taught. This makes sense, considering how quickly technology evolves and adapts to changing market conditions. It just isn’t feasible to return to universities and coding boot camps every time something changes. But the most shocking statistic to come out of a recent survey of software developers is that approximately 13% of coders are completely self-taught, entirely bypassing traditional education programs.
For the self-taught coder out there, or the intrepid high schooler with a laptop and a passion for creating digital magic, security is something that is usually lacking. I’ve worked on projects with dev teams consisting of individuals with a variety of backgrounds. The common thread, in my experience, is that self-taught coders oftentimes lack the foundational concepts behind securing the content they produce.
Granted, some of these guys started out as hackers, but the majority of white-hat, self-taught developers need the help of formally trained developers in completely securing their project for release into the wild.
1. Collaborate with Your Peers and Build a Reputation the Hard Way
If you’ve learned to code on your own, don’t limit yourself by remaining a solo operator. Teaming up with a group of seasoned experts can dramatically increase your knowledge, ...
Raj Dalal posted this in Big Data, Member Via RSS on January 23rd, 2017
From a simple limo-hailing app for friends to the world’s go-to taxi app, Uber’s growth over its roughly seven years of existence can be described in one word: phenomenal.
But there’s another way to define Uber, one that not many have given thought to: Uber is a Big Data company, along the lines of Google and Amazon. It not only uses the existing data in its banks effectively for its business operations; the process of gathering data – data from drivers, data about drivers, data from passengers, data about passengers, data from traffic systems around the world, transactional data – and analyzing all of it in real time continues.
BigInsights Principal Raj Dalal caught up with Uber’s Chief Data Architect M C Srivas on his recent trip to San Francisco. In the course of the hour-long conversation, among many things, Srivas spoke of what data analytics means for Uber, and how innovation in data is being used to further what is now popularly known around the world as “the Uber model.”
Raj: I have been tracking for a while now how data can be used to drive “extreme customer service”. Uber has done some exciting stuff, matching supply and demand and estimating ...
Ronald Damhof posted this in Data Architecture, Data Governance, Data Management, Data Quadrant Model, Datascience, Member Via RSS on January 21st, 2017
People familiar with my thinking know that I am a bit of a 'fundamentalist' when it comes to 'data'. I am the guy that pushes the non-sexy part of data; data quality, data governance, metadata, data protection, data integration, semantics, rules, etc..
It is hard to stand your ground in a time when short-termism, technology fetishism and data populism are thriving. I see ‘data architectures’ in my industry that boil down to superdooper databases, ultrafast massively parallel hardware and of course huge amounts of software, all seeming to glorify ‘coding’ your way to the promised kingdom.
Call me old school, but I want (data) architectures to separate concerns on various levels (conceptual, logical and technical), dimensions (process, data, interaction) and aspects (law & regulation, people, organisation, governance, culture). Architectures should enable businesses to reach certain goals that (preferably) serve the customer, civilian, patient, student, etc..
Lately I have been studying the ‘data science’ community, attempting to comprehend how they think, act & serve the common goals of an organisation. I have abandoned (just a bit :-)) my declarative nature of data modelling, semantics, data quality and governance, and I have drowned myself in Coursera courses, learning to code in Python, Julia and R, dusting off the old statistics books from uni, installing Anaconda, Jupyter Notebook, RStudio, Git, etc.
And oh my, it is soooo cool. Give me some data, I don’t care what, where it comes from and what it exactly means, but I can do something cool with it. Promise!
Now my problem…
(1) It seems to me that the ‘science’ in ‘data science’ is on average extremely low to non-existent. Example: I have heard of ‘data science’ labs/environments where the code is not versioned at all and the data is not temporally frozen; ergo, reproducibility is next to zero. Discovering a relationship between variables does not mean it is a proven fact; more is needed. Data science is not equal to data analysis with R (or whatever), is it?
(2) There seems to be a huge trust in relevance and quality of data wherever it comes from, whatever its context and however it is tortured. Information sits at the fabric of our universe [1], it’s life, it’s the real world. Data is the ‘retarded little brother’ of this ‘information’, it is an attempt of humankind to capture information in a very poor way. Huge amounts of contexts are lost in this capturing. Attempting to retrofit ‘information’ from this ‘retarded brother’ called ‘data’ is dangerous and should be done with great care. Having these conversations with data scientists is hard and we simply seem to completely disconnect.
(3) There seems to be little business focus, bottom-line focus. Data scientists love to ‘play’; they call it experimenting or innovating. I call it ‘play’ (if you are lucky they call their environment a ‘sandbox’, wtf?). Playing on company resources should be killed. Experiments (or innovations) start with a hypothesis, something you want to prove or investigate. You can fail, you can succeed, but you serve the bottom line (and yes, failing serves the bottom line!) and the purpose/mission of an organisation. Data scientists seem to think they are done when they’ve made some fascinating machine-learning, predictive or whatever model in their sandbox or other lab-like environment. Getting that model deployed at scale in a production environment for everyone to use, affecting the real world: that is where the bottom-line value really shines. You are not done until this is achieved.
(4) There seems to be little regard for data protection aspects. The new GDPR (General Data Protection Regulation) is also highly relevant for datascience. Your ‘sandbox’ or your separated research environment needs to be compliant as well! The penalties for non-compliance are huge.
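The reproducibility gap in point (1), unversioned code and unfrozen data, can be narrowed even with something minimal: fingerprint the exact input snapshot and store the manifest alongside the versioned analysis code. A hypothetical sketch (the function name and manifest fields are illustrative):

```python
import hashlib
import json
import os
import tempfile
import time

def freeze_dataset(path: str) -> str:
    """Fingerprint an input file so an analysis can be pinned to an exact snapshot."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    manifest = {"file": path, "sha256": h.hexdigest(), "frozen_at": int(time.time())}
    return json.dumps(manifest)

# Demo with a throwaway file standing in for a dataset export:
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"sample,data\n1,2\n")
print(freeze_dataset(path))
os.remove(path)
```

Committing the manifest next to the notebook means that, months later, anyone can verify they are rerunning the analysis against byte-identical inputs, which is the cheapest first step toward the reproducibility the ‘science’ label implies.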
There is huge value in datascience, its potential is staggering and it is soo much fun. But please, stop fooling around. This is serious business with serious consequences and opportunities for everyone and for probably every domain you can think of, whether it be automotive, banking, healthcare, poverty, climate control, energy, education, etc…
The ‘science of data’ and ‘data science’ are the yin & yang of fulfilling the promise of being truly data-driven. Both are needed.
For my Data Quadrant Model followers; it is the marriage between quadrant I & II versus quadrant III & IV.
[1] Increasingly, there is more and more ‘evidence’ originating from theoretical physics that this statement holds some truth, link [Dutch]. I would also like to draw attention to this blog post by John O'Gorman, very insightful, putting DIKW to rest....
Have you ever had any interest in computer, system or software programming? Have you experienced challenges that might be holding you back or almost making you give up? Many people think that programming is a challenging task, but that is wrong. Here is a list of best practices you can follow to perfect your programming skills.
Use a real world problem
Programs are developed to solve a given real world problem or make improvements to existing solutions. One of the best ways to perfect your programming skills is to apply them in real life. Take a given problem in the society and use it as your personal learning idea. Apply the skills you have learned to create a solution to that problem. At the end of it all, you'll have learned something new as well as perfecting what you had already learned.
Take your time, don't rush
When you start understanding a given task, a certain kind of joy comes into your mind and you're tempted to move on to more complicated tasks without taking time to think them through. This is bad practice, since you'll end up wasting more time than if you handled one small bit ...
Audrey Willis posted this in Internet of things, Member Via RSS on January 19th, 2017
There’s been a lot of buzz about driverless cars since the first prototypes were announced. However, aside from a few grim stories about accidents (like the fatal crash involving a driverless Tesla in 2016 due to a combination of sensor and driver failure) and sporadic updates on the progress of driverless cars, most people haven’t seen them on the road—because there aren’t that many out there yet. So what’s really going on with autonomous vehicles in 2017? Let’s find out.
Almost Ready for the Showrooms
This year’s CES (formerly the Consumer Electronics Show) is hosting a special guest: Delphi Automotive Plc, which is showing off its new Audis with self-driving systems that consumers will eventually be able to purchase. This shows that driverless car manufacturers are shifting their focus from proving that the technology works to actually selling the new cars. Showrooms will soon feature these vehicles with autonomous features, and the goal for manufacturers of these systems is to get them in the hands of automakers and public and private transit services. Projections are that these cars will be on the road within about 5 years.
What Features Will They Have?
With self-driving cars coming to a road near you (Uber and Lyft ...
Martin Doyle posted this in Big Data, Member Via RSS on January 19th, 2017
People change jobs, get promoted and move home. Companies go out of business, expand and relocate. Every one of these changes contributes to data decay. It’s been said that business databases degrade by around 30% per year, but why?
A report by IDG states that companies with effective data grow 35% faster year-on-year. However, for this to happen your data needs to have a high level of accuracy, consistency and completeness. Yet for many businesses, data quality is seen as an abstract concept – let’s examine why…
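The 30%-per-year decay figure cited above compounds quickly. A minimal sketch (the rate itself is an industry rule of thumb, not a measured constant):

```python
# Rough illustration of compounding data decay at ~30% per year.
def records_still_valid(total: int, annual_decay: float, years: int) -> int:
    """Return how many records are still accurate after compounding decay."""
    remaining = total * (1 - annual_decay) ** years
    return round(remaining)

# A 100,000-record database decaying at 30% per year:
print(records_still_valid(100_000, 0.30, 1))  # 70000
print(records_still_valid(100_000, 0.30, 2))  # 49000
print(records_still_valid(100_000, 0.30, 3))  # 34300
```

After three years, only about a third of the original records remain accurate, which is why ongoing cleansing matters more than one-off clean-ups.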
What is data decay?
Data decay refers to the gradual loss of data quality within a system, including key company information, personal details and most importantly, accurate contact information. As a result, the data becomes outdated and often invalid.
Why does data decay so quickly?
The world is constantly changing and sadly data is not immune to that change. From the moment you capture information, your data is at the mercy of processes and systems, as well as a number of human factors:
Collecting data across multiple systems can often lead to inaccuracies, including typos, incomplete information or duplicate records.
If you integrate your systems without a cleansing exercise you are only bringing across your “dirty data”. As a result, ...
Nate Vickery posted this in Big Data, Member Via RSS on January 19th, 2017
Many people think the internet in the US is the fastest in the world. The truth is that the internet network in the United States is fundamentally broken. It requires huge investments in order to compete with the nationwide networks of Norway, Japan, Singapore, South Korea and many other developed countries.
Such a slow internet often obstructs the implementation of various advanced systems, processes and technologies. Since big data requires a complex infrastructure and extremely high computer performance, storing, structuring and analysis processes are sometimes directly affected by the low internet speeds and various other forms of outdated technology.
Why is U.S. internet so slow?
While Tier 1 and Tier 2 networks perform relatively well, the so-called 'last mile' of the network drastically reduces overall internet speed. This is the final stretch of infrastructure that connects individual homes and corporate offices to the rest of the network and brings worldwide data directly to our modems.
A huge part of this 'last mile' infrastructure is made of outdated copper cabling that has connected our phones to the network since the days of Alexander Graham Bell. These cables have many 'bottlenecks' that slow the flow of data. Although data easily travels thousands of miles ...
Enterprises are often characterized as averse to open source software, in both usage and offerings. Building enterprise software is not only about building software, but also about building processes. Open source software usually lacks the long-lasting patrons who can support such processes, which makes it harder for it to present itself as a serious contender to proprietary offerings. Enterprises are therefore usually hesitant to adopt open source solutions and, as a result, end up not releasing their own offerings as OSS either, creating an impasse.
A new line of thought has emerged recently, however, in which enterprises are offering their solutions under OSS licenses. This shift is mostly fueled by the success of products like Android, MongoDB, Elasticsearch and many more. Here are five reasons why enterprises are increasingly choosing to open-source their products:
Builds User Base and Trust
When users can evaluate a product before committing to it, they adopt it with greater confidence. An important point to note is that OSS does not mean the product is free of cost. Enterprises usually have two parallel offerings: a Community Edition that is free of cost, and an Enterprise Edition that has additional features, plus other important add-ons like priority support, ...
Ajith Nayar posted this in Big Data, Member Via RSS on January 18th, 2017
In a former life, I worked for a cloud-based retail technology solutions provider looking to bring a retail merchandise planning application to the US market. As with all IT vendors, the actual marketing of the solution preceded the finished product.
In the solution pitch, we talked a lot about top-down and bottom-up planning processes. Top-down planning includes the strategic objectives mandated by management based on a number of inputs, including last year's actual company performance, growth objectives, forecasts and general market indicators. For most, top-down planning was only as granular as the department/category level (most retailers employ at least a four-level merchandise hierarchy, often starting with department and/or category). The merchandisers then work the plan to devise a bottom-up plan. Oftentimes, these plans can go as granular as the SKU-by-location level, but might sit higher in the overall product/store hierarchy.
OK, enough background. This is what I found interesting: when we were building the messaging for the merchandise financial planning solution, the VP of Product Management, our resident retail expert, insisted that the following message be included in the sales pitch: "The bottom-up plan always wins." I remember the conversation like it was yesterday. I looked at her inquisitively and ...
Kent Graziano posted this in #BuiltForTheCloud, Business Benefits, Cloud, cloud data warehouse, Cloud Data Warehousing for Dummies, customer story, customer success, Data Warehouse, Member Via RSS, Snowflake Cloud Data Warehouse, SnowflakeDB on January 18th, 2017
Our industry is full of hype and hyped terms. Big Data. NoSQL. The Cloud. Self-service <whatever>. And Cloud Data Warehousing. Some of the offerings and solutions are real. Some less so. Newest on the scene is cloud data warehousing (or data warehousing in the cloud). As with all new tech, there are a variety […]
How to Protect the Online Reputation of a Small Business
Delia Taylor posted this in Member Via RSS, strategy on January 18th, 2017
When customers conduct online research before spending their money, they want to see excellent reviews, high ratings and competitive prices. According to a Bright Local survey, nearly 92 percent of customers search for online reviews, and an average customer reads two to six reviews before trusting a business. Additionally, 88 percent of customers trust online reviews more than personal recommendations.
Customer reviews play a significant role in determining the online reputation of your small business, and not paying attention to them could result in a loss of reputation, revenue and customers. With so many people looking at your brand's online reputation, it's important to make sure your customer reviews are as positive as possible.
Here are some tips for managing the online reputation of a small business:
1. Build an online presence
You need a business website in order to appear in search results and to provide accurate, relevant information to your visitors. In addition to your business website, you should run an active blog for your small business.
2. Take charge of social media
To support and promote your business activities, you must have an active presence on popular social media platforms such as Facebook, Twitter, LinkedIn, ...
Larry Alton posted this in Member Via RSS, strategy on January 18th, 2017
Technology exists to make our lives easier. Even simple machines and basic tools saved our ancestors countless hours of manual labor; these days, the latest gadgets promise to shave minutes off our already-lightning-fast tasks, help us communicate more efficiently, or even fully automate tasks that once populated our to-do lists.
When you purchase a new device, upgrade to new software, or phase out some antiquated technology, your instinct tells you that your team will be more productive—but what do you have to back that up? Some companies, like Dialpad, have run general studies; one confirmed that eliminating desk phones can save a company more than $1 million over the course of six months. But how do they calculate this figure? And more importantly, how can you make these calculations for your own technology investments, before you even pull the trigger on them?
Potential Values of Upgrades
First, make a list of all the ways that your planned purchase would benefit your company. These are some of the most common ways:
Reduced direct costs. Technology could help you eliminate direct costs altogether. For example, if you’re paying hundreds of dollars a month for a subscription service that could be ...
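The calculation the article asks about can start very simply: total the benefits, then see how long savings take to cover the cost. A minimal payback-period sketch, where all the figures are hypothetical placeholders rather than data from the article:

```python
# Back-of-the-envelope payback period for a technology purchase.
def payback_months(upfront_cost: float, monthly_savings: float) -> float:
    """Months until cumulative savings cover the upfront cost."""
    if monthly_savings <= 0:
        raise ValueError("purchase never pays for itself")
    return upfront_cost / monthly_savings

# e.g. a $12,000 software rollout saving $2,000/month in direct costs:
print(payback_months(12_000, 2_000))  # 6.0
```

A real evaluation would also discount future savings and include indirect benefits like time saved, but even this crude version forces you to put numbers on your assumptions before buying.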
Andrew Deen posted this in Internet of things, Member Via RSS, Robotics on January 17th, 2017
For some, drones are nothing but a punch line—Amazon’s next attempt at replacing humans in their day-to-day operations, or a device that crashes at the drop of a hat. To others, they’re a sign of an uneasy future—a future they’re not comfortable with. Despite these misgivings, there are a lot of exciting uses for drones, and they’re earning a spot in our world. Mapping is something we all take for granted, but it takes a lot of work to keep those maps up-to-date and secure. It can be extremely frustrating when they’re not. The big players—Google and Apple—still use trucks to provide map data—but that might be changing in the future.
Bring in the Drones
Apple has had a hard time competing with Google’s dominance in the mapping space, but it may have found the answer: drones. The company has already gotten approval from the Federal Aviation Administration (FAA) to move forward with the plan, though drone laws are always evolving, and may not fall in Apple’s favor. The drones may not be coming right away, but it’s an option that will help Apple compete with Google’s massively popular Maps application. The drones, in addition to the other features the company is building ...
Ken Mafli posted this in Internet of things, Member Via RSS, security on January 17th, 2017
In a 2016 study, Gartner estimated that 6.4 billion connected devices were in use worldwide. By 2020, that number will top 20 billion. The increasing use of IoT devices in our corporate networks brings the promise of unparalleled coordination and productivity. But it also brings multiplied threat vectors and vulnerabilities.
Companies must react to this changing environment. The old methods of data protection relied on defensible perimeters around large, corporate data centers. More companies, however, are adopting mobile, IoT, and cloud technologies that live outside the data center. The perimeter is now becoming a thing of the past.
Below are seven strategies to help the enterprise meet the challenges of the current (and future) technology landscape. This list is by no means exhaustive; rather, these are stepping stones toward a defense-in-depth model for securing your sensitive data.
The First Thing to Realize: There is No Perimeter
Back in 2014, Google laid the groundwork for looking at data protection in a whole new way with BeyondCorp, their internal data security initiative, co-authored by Rory Ward and Betsy Beyer. Here is part of their opening statement:
The perimeter security model works well enough when all employees work exclusively in buildings owned by an ...
Kali Geldis posted this in Member Via RSS, security on January 17th, 2017
For the first time in history, shoppers are spending more money online than in brick-and-mortar retail stores. According to non-adjusted estimates from the U.S. Department of Commerce, online sales continue to show steady year-on-year growth - last year’s web sales were in excess of $342 billion, accounting for more than a third of the total retail sales growth that year.
Considering the figures, it's unsurprising that more retailers and small businesses are looking to break into online sales. The potential for increased income from a thriving virtual marketplace is not without new threats, however. Because even a single breach of a company’s data security has the potential to ruin a thriving business, adopting cybersecurity measures is critical.
Here are five steps to help you get started.
1. Use a Secure Ecommerce Platform
Any business owner looking to start an online store requires an ecommerce platform, which is a software technology solution for creating and hosting a digital storefront — a means of offering goods for sale and accepting payment via the internet. Outdated platforms are security risks that business owners can combat via the use of a reliable platform that they monitor regularly. These platforms aren’t necessarily cheap, so do your research and make ...
JT Ripton posted this in AR / VR, Member Via RSS, strategy on January 17th, 2017
There’s no consensus on when business became all about technology, but it probably happened somewhere around the advent of the cloud. Business went from using technology to being about technology; if your business isn’t taking advantage of the latest technology advances, you’re at an almost irreparable competitive disadvantage. This is the case not just for technology companies, but for all businesses in 2017.
Because technology matters so much, especially to small businesses, which must always act more nimbly than larger enterprises, business technology trends matter. A lot. So whether or not you consider your business to be about technology, you should keep these five emerging technology trends in mind.
1. Interactive Broadcasting for SMB Grows
Facebook Live and Twitter’s Periscope introduced the Western world to interactive broadcasting, also known as livestreaming. But even though this trend is just now picking up steam in the U.S., China and other markets in Asia already are showing the power of this new technology.
“There’s some pretty original innovation going on in Asia right now around interactive broadcasting,” notes Tony Zhao, CEO of real-time communications platform for SMBs, Agora.io. “Companies in China have figured out that they can create compelling content by letting their customers drive these live broadcasts, and they’ve ...
Dana Gardner posted this in artificial intelligence, Big Data, bots, BriefingsDirect, Dana Gardner, data analysis, Hewlett Packard Enterprise, HPE, HPE IDOL, Interarbor Solutions, LogitBot, machine learning, Member Via RSS, Michael Bishop, Mutsiya Ndunda on January 17th, 2017
The next BriefingsDirect Voice of the Customer digital transformation case study highlights how high-performing big-data analysis powers an innovative artificial intelligence (AI)-based investment opportunity and evaluation tool. We'll learn how LogitBot in New York identifies, manages, and contextually categorizes truly massive and diverse data sources.
By leveraging entity recognition APIs, LogitBot not only provides investment evaluations from across these data sets, it delivers the analysis as natural-language information directly into spreadsheets as the delivery endpoint. This is a prime example of how complex cloud-to-core-to-edge processes and benefits can be managed and exploited using the most responsive big-data APIs and services.
Gardner: Let’s look at some of the trends driving your need to do what you're doing with AI and bots, bringing together data, and then delivering it in the format that people want most. What’s the driver in the market for doing this?
Ndunda: LogitBot is all about trying to eliminate friction between people who have very high-value jobs and some of the more mundane things that could be automated by AI.
Today, the finance industry generally searches for investment opportunities using techniques that have been around for over 30 years. What tends to happen is that the people doing this should be spending more time on strategic thinking, ideation, and managing risk. But without AI tools, they tend to get bogged down in the data and in the day-to-day. So, we've decided to help them tackle that problem.
Gardner: Let the machines do what the machines do best. But how do we decide where the demarcation is between what the machines do well and what the people do well, Michael?
Bishop: We believe in empowering the user and not replacing the user. So, the machine is able to go in-depth and do what a high-performing analyst or researcher would do at scale, and it does that every day, instead of once a quarter, for instance, when research analysts would revisit an equity or a sector. We can do that constantly, react to events as they happen, and replicate what a high-performing analyst is able to do.
Gardner: It’s interesting to me that you're not only taking a vast amount of data and putting it into a useful format and qualitative type, but you're delivering it in a way that’s demanded in the market, that people want and use. Tell me about this core value and then the edge value and how you came to decide on doing it the way you do?
Ndunda: It’s an evolutionary process that we're going through. The industry is used to doing things in a very specific way, and AI isn't something that a lot of people in financial services are necessarily familiar with. We decided to wrap it in things that are extremely intuitive to an end user who doesn't have the time to learn the technology.
So, we said that we'll try to leverage as many things as possible in the back via APIs and all kinds of other things, but the delivery mechanism in the front needs to be as simple or as friction-less as possible to the end-user. That’s our core principle.
Bishop: Finance professionals generally don't like black boxes and mystery, and obviously, when you're dealing with money, you don’t want to get an answer out of a machine you can’t understand. Even though we're crunching a lot of information and making a lot of inferences, at the end of the day, they could unwind it themselves if they wanted to verify the inferences that we have made.
We're wrapping up an incredibly complicated amount of information, but it still makes sense at the end of the day. It’s still intuitive to someone. There's not a sense that this is voodoo under the covers.
Gardner: Well, let’s pause there. We'll go back to the data issues and the user-experience issues, but tell us about LogitBot. You're a startup, you're in New York, and you're focused on Wall Street. Tell us how you came to be and what you do, in a more general sense.
Ndunda: Our professional background has always been in financial services. Personally, I've spent over 15 years in financial services, and my career led me to what I'm doing today.
In the 2006-2007 timeframe, I left Merrill Lynch to join a large proprietary market-making business called Susquehanna International Group. They're one of the largest providers of liquidity around the world. Chances are whenever you buy or sell a stock, you're buying from or selling to Susquehanna or one of its competitors.
What had happened in that industry was that people were embracing technology, but it was algorithmic trading, what has become known today as high-frequency trading. At Susquehanna, we resisted that notion, because we said machines don't necessarily make decisions well, and this was before AI had been born.
Internally, we went through this period where we had a lot of discussions around, are we losing out to the competition, should we really go pure bot, more or less? Then, 2008 hit and our intuition of allowing our traders to focus on the risky things and then setting up machines to trade riskless or small orders paid off a lot for the firm; it was the best year the firm ever had, when everyone else was falling apart.
That was the first piece that got me to understand or to start thinking about how you can empower people and financial professionals to do what they really do well and then not get bogged down in the details.
Then, I joined Bloomberg and I spent five years there as the head of strategy and business development. The company has an amazing business, but it's built around the notion of static data. What had happened in that business was that, over a period of time, we began to see the marketplace valuing analytics more and more.
Make a distinction
Part of the role that I was brought in to do was to help them unwind that and decouple the two things -- to make a distinction within the company between static information and analytical or valuable information. The trend we saw was that hedge funds, especially the ones employing systematic investment strategies, were beginning to do two things: embrace AI and technology to empower their traders, and look deeper into analytics versus static data.
That was what brought me to LogitBot. I thought we could do it really well, because the players themselves don't have the time to do it and some of the vendors are very stuck in their traditional business models.
Bishop: We're seeing a kind of renaissance here, or we're at a pivotal moment, where we're moving away from analytics in the sense of business reporting tools or understanding yesterday. We're now able to mine data, get insightful, actionable information out of it, and then move into predictive analytics. And it's not just statistical correlations. I don’t want to offend any quants, but a lot of technology [to further analyze information] has come online recently, and more is coming online every day.
For us, Google had released TensorFlow, and that made a substantial difference in our ability to reason about natural language. Had it not been for that, it would have been very difficult one year ago.
At the moment, technology is really taking off in a lot of areas at once. That enabled us to move from static analysis of what's happened in the past and move to insightful and actionable information.
Ndunda: What Michael kind of touched on there is really important. A lot of traditional ways of looking at financial investment opportunities is to say that historically, this has happened. So, history should repeat itself. We're in markets where nothing that's happening today has really happened in the past. So, relying on a backward-looking mechanism of trying to interpret the future is kind of really dangerous, versus having a more grounded approach that can actually incorporate things that are nontraditional in many different ways.
So, unstructured data, what investors are thinking, what central bankers are saying -- all of those are really important inputs that weren't part of any model 10 or 20 years ago. Without machine learning and some of the things that we are doing today, it's very difficult to incorporate any of that and make sense of it in a structured way.
Gardner: So, if the goal is to make outlier events your friend and not your enemy, what data do you go to to close the gap between what's happened and what the reaction should be, and how do you best get that data and make it manageable for your AI and machine-learning capabilities to exploit?
Ndunda: Michael can probably add to this as well. We do not discriminate as far as data goes. What we like to do is have no opinion on data ahead of time. We want to get as much information as possible and then let a scientific process lead us to decide what data is actually useful for the task that we want to deploy it on.
As an example, we're very opportunistic about acquiring information on who the most important people at companies are and how they're connected to each other. Does this person sit on a board with that one; how do they know each other? It may not have any application at that very moment, but over the course of time you end up building models that are actually really interesting.
We scan over 70,000 financial news sources. We capture news information across the world. We don't necessarily use all of that information on a day-to-day basis, but at least we have it and we can decide how to use it in the future.
We also monitor anything that companies file and what management teams talk about at investor conferences or on phone conversations with investors.
Bishop: Conference calls, videos, interviews.
Audio to text
Ndunda: HPE has a really interesting technology that they have recently put out. You can transcribe audio to text, and then we can apply our text processing on top of that to understand what management is saying in a structural, machine-based way. Instead of 50 people listening to 50 conference calls you could just have a machine do it for you.
Gardner: Something you can do there that you couldn't do before is apply something like sentiment analysis, which wouldn't have been possible if it were only a written document, and that can be very valuable.
Bishop: Yes, even tonal analysis. There are a few theories on that, that may or may not pan out, but there are studies around tone and cadence. We're looking at it and we will see if it actually pans out.
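As an aside, the simplest form of the sentiment scoring mentioned here is lexicon-based: count positive and negative words in the transcribed text. This toy sketch is purely illustrative; production systems (including the HPE Haven APIs discussed in this interview) use far richer models, and these word lists are invented:

```python
# Toy lexicon-based sentiment scorer over transcribed call text.
POSITIVE = {"growth", "beat", "strong", "upgrade", "record"}
NEGATIVE = {"miss", "weak", "downgrade", "lawsuit", "decline"}

def sentiment_score(text: str) -> int:
    """Positive score = bullish tone, negative = bearish, 0 = neutral."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment_score("strong quarter with record growth"))  # 3
print(sentiment_score("analysts miss as sales decline"))     # -2
```

The appeal of running this over audio-derived text at scale is exactly what Ndunda describes: one pipeline can "listen" to 50 calls while an analyst attends one.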
Gardner: And so, do you put this all into your own on-premises data center or warehouse, or do you take advantage of the cloud in a variety of different ways to corral and then analyze this data? How do you take this fire hose and make it manageable?
Bishop: We do take advantage of the cloud quite aggressively. We're split between SoftLayer and Google. At SoftLayer we have bare-metal hardware machines and some power machines with high-power GPUs.
On the Google side, we take advantage of Bigtable and BigQuery and some of their infrastructure tools. And we have good, old PostgreSQL in there, as well as DataStax, Cassandra, and their Graph as the graph engine. We make liberal use of HPE Haven APIs as well and TensorFlow, as I mentioned before. So, it’s a smorgasbord of things you need to corral in order to get the job done. We found it very hard to find all of that wrapped in a bow with one provider.
We're big proponents of Kubernetes and Docker as well, and we leverage that to avoid lock-in where we can. Our workload can migrate between Google and the SoftLayer Kubernetes cluster. So, we can migrate between hardware or virtual machines (VMs), depending on the horsepower that’s needed at the moment. That's how we handle it.
Gardner: So, maybe 10 years ago you would have been in a systems-integration capacity, but now you're in a services-integration capacity. You're doing some very powerful things at a clip and probably at a cost that would have been impossible before.
Bishop: I certainly remember placing an order for a server, waiting six months, and then setting up the RAID drives. It's amazing that you can just flick a switch and you get a very high-powered machine that would have taken six months to order previously. In Google, you spin up a VM in seconds. Again, that's of a horsepower that would have taken six months to get.
Gardner: So, unprecedented innovation is now at our fingertips when it comes to the IT side of things, unprecedented machine intelligence, now that the algorithms and APIs are driving the opportunity to take advantage of that data.
Let's go back to thinking about what you're outputting and who uses that. Is the investment result that you're generating something that goes to a retail type of investor? Is this something you're selling to investment houses or a still undetermined market? How do you bring this to market?
Natural language interface
Ndunda: Roboto, which is the natural-language interface into our analytical tools, can be custom tailored to respond, based on the user's level of financial sophistication.
At present, we're trying them out on a semiprofessional investment platform, where people are professional traders, but not part of a major brokerage house. They obviously want to get trade ideas, they want to do analytics, and they're a little bit more sophisticated than people who are looking at investments for their retirement account. Rob can be tailored for that specific use case.
He can also respond to somebody who is managing a portfolio at a hedge fund. The level of depth that he needs to consider is the only differential between those two things.
In the back, he may do an extra five steps if the person asking the question worked at a hedge fund, versus if the person was just asking about why is Apple up today. If you're a retail investor, you don’t want to do a lot of in-depth analysis.
Bishop: You couldn’t take the app and do anything with it or understand it.
Ndunda: Rob is an interface, but the analytics are available via multiple venues. So, you can access the same analytics via an API, a chat interface, the web, or a feed that streams into you. It just depends on how your systems are set up within your organization. But, the data always will be available to you.
Gardner: Going out to that edge equation, that user experience, we've talked about how you deliver this to the endpoints, customary spreadsheets, cells, pivots, whatever. But it also sounds like you are going toward more natural language, so that you could query, rather than a deep SQL environment, like what we get with a Siri or the Amazon Echo. Is that where we're heading?
Bishop: When we started this, we found that trying to parameterize everything you could ask into enough checkboxes and forms pollutes the screen. The system has access to an enormous amount of data that you can't create a parameterized screen for. It was a bit of a breakthrough when we were able to start using natural language.
TensorFlow made a huge difference here in natural language understanding, understanding the intent of the questioner, and being able to parameterize a query from that. If our initial findings here pan out or continue to pan out, it's going to be a very powerful interface.
I can't imagine having to go back to a SQL query if you're able to do it natural language, and it really pans out this time, because we’ve had a few turns of the handle of alleged natural-language querying.
Gardner: And always a moving target. Tell us specifically about SentryWatch and Precog. How do these shake out in terms of your go-to-market strategy?
How everything relates
Ndunda: One of the things that we have to do to be able to answer a lot of questions that our customers may have is to monitor financial markets and what's impacting them on a continuous basis. SentryWatch is literally a byproduct of that process where, because we're monitoring over 70,000 financial news sources, we're analyzing the sentiment, we're doing deep text analysis on it, we're identifying entities and how they're related to each other, in all of these news events, and we're sticking that into a knowledge graph of how everything relates to everything else.
It ends up being a really valuable tool, not only for us but for other people, because while we're building models, there are also a lot of hedge funds with proprietary models or proprietary processes that could benefit from that very same organized, relational store of news. That's what SentryWatch is and how it's evolved. It started off as something we were doing internally as an input, and it's now a valuable standalone product.
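The knowledge-graph idea described here can be sketched minimally as entities linked whenever they appear in the same news item, with edge weights counting co-mentions. This is a hypothetical illustration of the concept only; the entity extraction step is assumed to have happened upstream:

```python
# Minimal entity co-mention graph built from extracted news entities.
from collections import defaultdict
from itertools import combinations

graph = defaultdict(int)  # (entity_a, entity_b) -> co-mention count

def ingest(entities_in_article: list[str]) -> None:
    """Link every pair of entities mentioned together in one article."""
    for a, b in combinations(sorted(set(entities_in_article)), 2):
        graph[(a, b)] += 1

ingest(["Apple", "EU Commission", "Tim Cook"])
ingest(["Apple", "EU Commission"])
print(graph[("Apple", "EU Commission")])  # 2
```

Repeated co-mentions accumulate into strong edges, which is the raw material for the "how everything relates to everything else" queries the interview describes.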
Precog is a way for us to showcase the ability of a machine to be predictive and not be backward looking. Again, when people are making investment decisions or allocation of capital across different investment opportunities, you really care about your forward return on your investments. If I invested a dollar today, am I likely to make 20 cents in profit tomorrow or 30 cents in profit tomorrow?
We're using pretty sophisticated machine-learning models that can take into account unstructured data sources as part of the modeling process. That will give you these forward expectations about stock returns in a very easy-to-use format, where you don't need to have a PhD in physics or mathematics.
You just ask, "What is the likely return of Apple over the next six months," taking into account what's going on in the economy. Apple was fined $14 billion. That can be quickly added into a model and reflect a new view in a matter of seconds versus sitting down in a spreadsheet and trying to figure out how it all works out.
Gardner: Even for Apple, that's a chunk of change.
Bishop: It's a lot of money, and you can imagine that there were quite a few analysts on Wall Street in Excel, updating their models around this so that they could have an answer by the end of the day, whereas we already had an answer.
Gardner: How do the HPE Haven OnDemand APIs help Precog when it comes to deciding on those sources and getting them in the right format, so that you can exploit them?
Ndunda: The beauty of the platform is that it simplifies a lot of development processes that an organization of our size would have to take on themselves.
The nice thing about it is that a drag-and-drop interface is really intuitive; you don't need to be specialized in Java, Python, or whatever it is. You can set up your intent in a graphical way, and then test it out, build it, and expand it as you go along. The Lego-block structure is really useful, because if you want to try things out, it's drag and drop, connect the dots, and then see what you get on the other end.
For us, that's an innovation that we haven't seen with anybody else in the marketplace and it cuts development time for us significantly.
Gardner: Michael, anything more to add on how this makes your life a little easier?
Bishop: For us, lowering the cost in time to run an experiment is very important when you're running a lot of experiments, and the Combinations product enables us to run a lot of varied experiments using a variety of the HPE Haven APIs in different combinations very quickly. You're able to get your development time down from the week or two it would otherwise take to wire up a single API.
In the same amount of time, you're able to wire the initial connection and then you have access to pretty much everything in Haven. You turn it over to either a business user, a data scientist, or a machine-learning person, and they can drag and drop the connectors themselves. It makes my life easier and it makes the developers’ lives easier because it gets back time for us.
Gardner: So, not only have we been able to democratize the querying, moving from SQL to natural language, for example, but we’re also democratizing the choice on sources and combinations of sources in real time, more or less for different types of analyses, not just the query, but the actual source of the data.
Ndunda: Again, the power of a lot of this stuff is in the unstructured world, because valuable information typically tends to be hidden in documents. In the past, you'd have to have a team of people to scour through text, extract what they thought was valuable, and summarize it for you. You could miss out on 90 percent of the other valuable stuff that's in the document.
With this ability to drag and drop and then go through a document in five different iterations just by tweaking a parameter, it's really useful.
Gardner: So those will be IDOL-backed APIs that you are referring to.
Bishop: It’s something that would be hard for an investment bank, even a few years ago, to process. Everyone is on the same playing field here or starting from the same base, but dealing with unstructured data has been traditionally a very difficult problem. You have a lot of technologies coming online as APIs; at the same time, they're also coming out as traditional on-premises [software and appliance] solutions.
We're all starting from the same gate here. Some folks are a little ahead, but I'd say that Facebook is further ahead than an investment bank in their ability to reason over unstructured data. In our world, I feel like we're starting basically at the same place that Goldman or Morgan would be.
Gardner: It's a very interesting reset that we’re going through. It's also interesting that we talked earlier about the divide between where the machine and the individual knowledge worker begins or ends, and that's going to be a moving target. Do you have any sense of how that changes its characterization of what the right combination is of machine intelligence and the best of human intelligence?
Ndunda: I don’t foresee machines replacing humans, per se. I see them empowering humans, and to the extent that your role is not completely based on a task, if it's based on something where you actually manage a process that goes from one end to another, those particular positions will be there, and the machines will free our people to focus on that.
But, in the case where you have somebody who is really responsible for something that can be automated, then obviously that will go away. Machines don't eat, they don’t need to take vacation, and if it’s a task where you don't need to reason about it, obviously you can have a computer do it.
What we're seeing now is that if you have a machine sitting side by side with a human, and the machine can pick up on how the human reasons with some of the new technologies, then the machine can do a lot of the grunt work, and I think that’s the future of all of this stuff.
Bishop: What we're delivering is that we distill a lot of information, so that a knowledge worker or decision-maker can make an informed decision, instead of watching CNBC and being a single-source reader. We can go out and scour the best of all the information, distill it down, and present it, and they can choose to act on it.
Our goal here is not to make the next jump and make the decision. Our job is to present the information to a decision-maker.
Gardner: It certainly seems to me that the organization, big or small, retail or commercial, that can make the best use of this technology, of machine learning, in the end, will win.
Ndunda: Absolutely. It is a transformational technology, because for the first time in a really long time, the reasoning piece of it is within grasp of machines. These machines can operate in the gray area, which is where the world lives.
Gardner: And that gray area can almost have unlimited variables applied to it.
Technological advancements have reduced global poverty significantly in the past 100 years. Although many people have been able to leave poverty as a result, more than 1.3 billion people still live in extreme poverty. Extreme poverty is defined as having less than $1.25 to spend every day. There is a wide variety of causes of poverty, which differ per country, but in general they include: lack of education, environmental problems, lack of access to banking facilities, lack of legal ownership of property, lack of rule of law, overpopulation, epidemic diseases, or changing trends in a country’s economy.
Overcoming poverty is vital if we want to create a world that is peaceful and fair for everyone. Moreover, in 2015 the United Nations adopted the Sustainable Development Goals, which challenge global leaders to help end poverty in all its forms, everywhere, by 2030. In this article, I will argue why and how Blockchain could help achieve this goal. Breaking the cycle of poverty begins with investing in children and providing them with quality education, knowledge and skills to enable them to realise their full potential. Next to education comes access to affordable proper health care, access to clean water ...
Jorge Garcia posted this in Analytics, automation, Big Data, Data Management, DevOps, digital transformation, Hadoop, interviews, IT, IT operations, machine learning, Member Via RSS, open-source, rocana on January 16th, 2017
It is in this context that we had the chance to have an excellent interview with Eric Sammer, CTO and Co-Founder of Rocana, who kindly agreed to share insights about the company, its software offering, and details of the new version.
Eric has served as a Senior Engineer and Architect at several large-scale data-driven organizations, including Experian and Conductor. Most recently, he served as an Engineering Manager at Cloudera, where he was responsible for working with hundreds of partners to develop robust solutions and integrate them tightly with Cloudera's Enterprise Data Hub.
He is deeply entrenched in the open-source community and has an ambition for solving difficult scaling and processing problems. Passionate about challenging assumptions and showing large, complex enterprises new ways to solve large, complex IT infrastructure challenges, Eric now leads Rocana’s product development and company direction as CTO.
Eric is also the author of Hadoop Operations published by O'Reilly Media and is also a frequent speaker on technology and techniques for large scale data processing, integration, and system management.
Hi Eric, so, what was the motivation behind founding Rocana, the company, and developing Rocana Ops the product?
Rocana was founded directly in response to the growing sophistication of the infrastructure and technology that runs the modern business, and the challenges companies have in understanding those systems. Whether it's visibility into health and performance, investigating specific issues, or holistically understanding the impact infrastructure health and well-being have on the business, many businesses are struggling with the complexity of their environments.
These issues have been exacerbated by trends in cloud computing, hybrid environments, microservices, and data-driven products and features such as product recommendations, real-time inventory visibility, and customer account self-management that rely on data from, and about, the infrastructure and the business. There are a greater number of more varied data sources, producing finer-grained data faster than ever before.
Meanwhile, the existing solutions to understand and manage these environments are not keeping pace. All of them focus on interesting, but limited, slices of the problem - just log search, just dashboards of metrics, just the last 30 minutes of network flow data, only security events - making it almost impossible to understand what’s happening. These tools tend to think of each piece of infrastructure as a special case rather than the data warehousing and advanced analytics problem it is.
Outside of core IT, it’s natural to source feeds of data from many different places, cleanse and normalize that data, and bring it into a central governed repository where it can be analyzed, visualized, or used to augment other applications.
We want to extend that thinking into infrastructure, network, cloud, database, platform, and application management to better run the business, while at the same time opening up new opportunities to bring operational and business data together. That means all of the data, from every data source, in real time, with full retention, on an open platform, with advanced analytics to make sense of that data.
How would you describe what Rocana Ops is?
Rocana Ops is a data warehouse for event-oriented data. That includes log events, infrastructure and application metrics, business transactions, IoT events, security events, or anything else with a time stamp. It includes the collection, transformation and normalization, storage, query, analytics, visualization, and management of all event-oriented data in a single open system that scales horizontally on cost-effective hardware or cloud platforms.
A normal deployment of Rocana Ops for our customers will take in anywhere from 10 to 100TB of new data every day, retaining it for years. Each event captured by the system is typically available for query in less than one second, and always online and query-able, thanks to a fully parallelized storage and query platform.
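What "event-oriented data" means in practice can be sketched as a common envelope wrapped around heterogeneous records so they can be correlated on shared fields. The field names below are illustrative assumptions, not Rocana's actual schema:

```python
import time

def normalize(source, raw, attrs):
    """Wrap any incoming record -- a log line, a metric sample, an IoT
    reading -- in a common envelope so events from different sources
    can be correlated on shared fields like timestamp and host."""
    return {
        "ts": time.time(),      # event time, seconds since the epoch
        "source": source,       # e.g. "syslog", "netflow", "app-metrics"
        "body": raw,            # original payload, kept verbatim
        "attributes": attrs,    # parsed key/value fields
    }

event = normalize("syslog",
                  "<34>Oct 11 22:14:15 host1 su: 'su root' failed",
                  {"host": "host1", "service": "su"})
```

Because every event carries the same envelope, a single query layer can search logs, metrics, and business transactions together.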
Rocana is placed in a very interesting segment of the IT industry. What are, in your view, the differences between the common business analytics user and the IT user regarding the use of a data management and analytics solution? Different needs? Different mindsets? Goals?
I think the first thing to consider when talking about business analytics - meaning both custom-built and off-the-shelf BI suites - and IT focused solutions is that there has historically been very little cross-pollination of ideas between them. Business users tend to think about customized views on top of shared repositories, and building data pipelines to feed those repositories.
There tends to be a focus on reusing data assets and pipelines, lineage concerns, governance, and lifecycle management. IT users, on the other hand, think about collection through analytics for each data source as a silo: network performance, application logs, host and process-level performance, and so on, each with dedicated collection, storage, and analytics glued together in a tightly coupled package.
Unlike their business counterparts, IT users have very well known data sources and formats (relatively speaking) and analytics they want to perform. So in some ways, IT analytics have a more constrained problem space, but less integration. This is Conway’s Law in serious effect: the notion that software tends to mimic the organizational structures in which it’s developed or designed. These silos lead to target fixation.
IT users can wind up focusing on making sure the operating system is healthy, for example, while the business service it supports is unhealthy. Many tools tend to reinforce that kind of thinking. That extends to diagnostics and troubleshooting which is even worse. Again, we’re talking in generic terms here, but the business users tend to have a holistic focus on an issue relevant to the business rather than limited slices.
We want to open that visibility to the IT side of the house, and hopefully even bring those worlds together.
What are the major pains of IT Ops, and how does Rocana help solve them?
Ops is really a combination of both horizontal and vertically focused groups. Some teams are tasked with building and/or running a complete vertical service like an airline check-in and boarding pass management system. Other teams are focused on providing horizontal services such as data center infrastructure with limited knowledge or visibility into what those tens of thousands of boxes do.
Let’s say customers can’t check in and get their boarding passes on their mobile devices. The application ops team finds that a subset of application servers keep losing connections to database servers holding reservations, but there’s no reason why, and nothing has changed. Meanwhile, the networking team may be swapping out some bad optics in a switch that has been flaky, thinking that traffic is being properly routed over another link. Connecting these two dots within a large organization can be maddeningly time consuming - if it even happens at all - leading to some of the high-profile outages we see in the news.
Our focus is really on providing a shared view over all systems under management. Each team still has their focused view on their part of the infrastructure in Rocana Ops, but in this example, the application ops team could also trace the failing connections through to link state changes on switches and correlate that with traffic changes in network flow patterns.
Could you describe Rocana’s main architecture?
Following the data flow through Rocana Ops, data is first collected by one of the included data collection methods. These include native syslog, file and directory tailing, netflow and IPFIX, Windows event log, application and host metrics collection, and native APIs for popular programming languages, as well as REST.
As data is collected, basic parsing is performed turning all data into semi-structured events that can be easily correlated regardless of their source. These events flow into an event data bus forming a real-time stream of the cleansed, normalized events. All of the customer-configurable and extensible transformation, model building and application (for features like anomaly detection), complex event processing, triggering, alerting, and other data services are real time stream-oriented services.
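The flow described above (collect, parse into semi-structured events, publish onto an event bus for stream services to consume) can be sketched in miniature. The log format, regex, and in-memory bus below are illustrative assumptions, not Rocana's actual code:

```python
import re
import time

def collect(lines):
    """Stand-in for a collection method (file tailing, syslog, etc.)."""
    for line in lines:
        yield line

LOG_RE = re.compile(r"(?P<level>\w+) (?P<service>[\w.-]+): (?P<msg>.*)")

def parse(raw):
    """Basic parsing: turn a raw line into a semi-structured event."""
    m = LOG_RE.match(raw)
    fields = m.groupdict() if m else {"msg": raw}
    return {"ts": time.time(), **fields}

def publish(bus, event):
    """Append to an in-memory stand-in for the event data bus."""
    bus.append(event)

bus = []
for raw in collect(["ERROR checkout-api: db connection lost",
                    "INFO checkout-api: retrying"]):
    publish(bus, parse(raw))

# Downstream stream services (alerting, anomaly detection, CEP)
# would consume `bus` as a real-time stream.
```

In a production system the bus would be a durable, distributed log rather than a Python list, but the shape of the pipeline is the same.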
Rocana's General Architecture (Courtesy of Rocana)
A number of representations of the data are stored in highly optimized data systems for natural language search, query, analysis, and visualization in the Rocana Ops application. Under the hood, Rocana Ops is built on top of a number of popular open source systems, in open formats, that may be used for other applications and systems making lock-in a non-issue for customers.
Every part of Rocana’s architecture - but notably the collection, processing, storage, and query systems - is a parallelized, scale-out system, with no single point of failure.
What are the basic or general requirements needed for a typical Rocana deployment?
Rocana Ops is really designed for large deployments as mentioned earlier - 10s to 100s of terabytes per day.
Typically, customers start with a half-rack (10 nodes), each with 2 x 8+ core CPUs, 12 x 4 or 8TB SATA II drives, 128 to 256GB RAM, and a 10Gb network (typical models are the HP DL380 G9 or Dell R730xd), or the cloud equivalent (Amazon d2.4xl or 8xl), for the data warehouse nodes.
A deployment this size easily handles in excess of a few terabytes per day of data coming into the system from tens to hundreds of thousands of sources.
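As a back-of-the-envelope check of that sizing, assuming 3x replication and no compression (both are my assumptions for illustration; real deployments typically compress data substantially, stretching retention much further):

```python
# Capacity sketch for the half-rack configuration described above.
nodes = 10
drives_per_node = 12
drive_tb = 4            # using the smaller 4TB drive option
replication = 3         # assumed replication factor

raw_tb = nodes * drives_per_node * drive_tb    # total raw capacity
usable_tb = raw_tb / replication               # capacity after replication

daily_ingest_tb = 3                            # "a few terabytes per day"
retention_days = usable_tb / daily_ingest_tb   # days of retention, uncompressed
```

Under these assumptions the half-rack holds 480 TB raw, 160 TB usable, roughly 53 days of uncompressed retention at 3 TB/day, which is why compression and adding nodes matter for multi-year retention.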
As customers onboard more data sources or want to retain more data, they begin adding nodes to the system. We have a stellar customer success team that helps customers plan, deploy, and service Rocana Ops, so customers don’t need to worry about finding “unicorn” staff.
What are then, the key functional differentiators of Rocana?
Customers pick Rocana for a few reasons: scale, openness, advanced data management features, and cost. We’ve talked a lot about scale already, but openness is equally critical.
Enterprises, frankly, are done with being locked into proprietary formats and vendors holding their data hostage. Once you’re collecting all of this data in one place, customers often want to use Rocana Ops to provide real time streams to other systems without going through expensive translations or extractions.
Another major draw is the absence of advanced data management features in other systems such as record-level role-based access control, data lifecycle management, encryption, and auditing facilities. When your log events potentially contain personally identifiable information (PII) or other sensitive data, this is critical.
Finally, operating at scale is both a technology and an economic issue. Rocana Ops’ licensing model is based on users rather than nodes or data captured by the system, freeing customers to think about how best to solve problems rather than perform license math.
Recently, you've released Rocana Ops 2.0. Could you talk about this release’s new capabilities?
Rocana Ops 2.0 is really exciting for us.
We’ve added Rocana Reflex, which incorporates complex event processing and orchestration features allowing customers to perform actions in response to patterns in the data. Actions can be almost anything you can think of including REST API calls to services and sending alerts.
Reflex is paired with a first responder experience designed to help ops teams to quickly triage alerts and anomalies, understand potential causes, collaborate with one another, and spot patterns in the data.
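The pattern-then-action idea behind complex event processing can be sketched as rules that pair a predicate over an event with an action. The rule predicates, field names, and action below are illustrative, not Rocana Reflex's actual API:

```python
# Minimal sketch of CEP-style rules: each rule is (predicate, action).

def alert(event):
    """A stand-in action; real actions could be REST calls or pages."""
    return f"ALERT {event['service']}: {event['msg']}"

RULES = [
    (lambda e: e.get("level") == "ERROR" and "connection" in e.get("msg", ""),
     alert),
]

def react(event):
    """Run every matching rule's action against an incoming event."""
    return [action(event) for predicate, action in RULES if predicate(event)]

fired = react({"level": "ERROR",
               "service": "checkout-api",
               "msg": "db connection lost"})
```

Real CEP engines also match patterns *across* sequences of events (e.g. "five failures within one minute"), but the per-event rule form above is the basic building block.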
One of the major challenges customers face in deploying dynamic next-generation platforms is operational support, so 2.0 includes first-class support for Pivotal CloudFoundry instrumentation and visibility. Those are just a small sample of what we’ve done. It’s really a huge release!
How does Rocana interact with the open source community, especially the Apache Hadoop project?
Open source is core to what we do at Rocana, and it’s one of the reasons we’re able to do a lot of what we do in Rocana Ops.
The vast majority of our engineers, customer success, and sales engineers come from an open source background, so we know how to wear multiple hats.
Foremost is always our customers’ success, but it’s absolutely critical to help advance the community along where we are uniquely positioned to help. This is an exciting space for us, and I think you’ll see us doing some interesting work with the community in the future.
Finally, what is in your opinion the best and geekiest song ever?
Now you’re speaking my language; I studied music theory. Lateralus by Tool, for the way it plays with the Fibonacci sequence and other math without being gimmicky or unnatural. A close second goes to Aphex Twin’s Equation, but I won’t ruin that for you.
Bill Franks posted this in Big Data, Member Via RSS on January 16th, 2017
In recent times, I have read a number of articles lamenting the frequent lack of value resulting from large-scale analytics and data science initiatives. While I have seen substantial value driven from many efforts, I have also seen examples where the results were very poor. My belief is that oftentimes the problems can be boiled down to one basic mistake. Namely, thinking that generating predictions, forecasts, or simulations is enough. It is not.
Predictions Are The Starting Point…
Almost by definition, advanced analytics or data science initiatives involve applying some type of algorithm to data in order to find patterns. These algorithms are typically then used to generate one or more of the following:
Predictions about future events. For example, who is most likely to respond to a given offer?
Forecasts of future results. For example, what sales can we expect from the upcoming promotion?
Simulations of various scenarios. For example, what will happen if I shift some of my budget from paid search to television advertising?
There are other uses of algorithms and nuances between different types of predictions, but for our purposes here these three examples suffice and illustrate the point.
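The first case (predicting who is most likely to respond to an offer) can be illustrated with a toy propensity score. The features and coefficients below are invented for illustration; a real model would be fit on historical response data:

```python
import math

def respond_probability(recency_days, past_purchases):
    """Toy logistic response model: recent, frequent buyers score higher.
    Coefficients are hand-set for illustration, not fit to any data."""
    score = -1.0 - 0.05 * recency_days + 0.8 * past_purchases
    return 1 / (1 + math.exp(-score))   # squash score into (0, 1)

customers = {"alice": (5, 3),    # bought 5 days ago, 3 past purchases
             "bob": (90, 1)}     # bought 90 days ago, 1 past purchase

# Rank customers by predicted likelihood of responding to the offer.
ranked = sorted(customers,
                key=lambda c: respond_probability(*customers[c]),
                reverse=True)
```

The point of the article follows directly: the ranked list is only the starting point; value comes from what the business actually does with it.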
In each case, the output is information about what might be expected in the ...
Big Data processing and analysis are penetrating the healthcare industry further and further. With telemedicine, EHR, wearables, and the Internet of Things getting more and more popular, the medical sphere is accumulating more and more healthcare records. To make this data useful, practitioners need convenient access to the information, as well as the ability to easily interpret and interact with it. So, let's see how medical specialists use Big Data approaches.
Electronic Health Records
EHR is real-time, electronically stored information about a patient (demographics, allergies, test results, medical history, etc.) in a digital record. The benefit it brings is clear: less paperwork, more control over data, instant authorized access for all the clinicians involved, and automated workflow.
Wearables have already conquered the consumer electronics market: they are used by millions of people. IDTechEX expects the wearables market to grow to $40 billion in 2018. The devices can not only track the number of steps a person takes per day but also monitor chronic diseases (Parkinson's, diabetes, heart disease) and send the information directly to medical specialists. In this way, loads of healthcare information is accumulated, and doctors get large healthcare databases that can be used in treatment ...
Francesco Corea posted this in artificial intelligence, Member Via RSS on January 16th, 2017
It seems to me that the hype about AI makes it really difficult for experienced investors to understand where the real value and innovation are. I would then like to humbly try to bring some clarity to what is happening on the investment side of the artificial intelligence industry.
We have seen how, in the past, the development of AI was stalled by the absence of funding, and thus studying the current investment market is crucial to identifying where AI is going. First of all, it should be clear that investing in AI is extremely cumbersome: the level of technical complexity goes beyond the purely commercial scope, and not all venture capitalists are able to fully comprehend the functional details of machine learning. This is why the figures of the “Advisor” and “Scientist-in-Residence” are becoming extremely important nowadays. Those roles also help in setting the right level of expectations, and in figuring out what is possible and what is not.
AI investors are also slightly different from other investors: they should have a deep capital base (it is still not clear what approach will pay off), and a higher than usual risk tolerance: investing in AI is a marathon, and it might take ten years ...
Jaymin Dangi posted this in Big Data, Member Via RSS on January 13th, 2017
You can't make good decisions unless you have good information to support them. For instance, you can't justify hiring new workers unless you know that orders are up or customer traffic is higher in your store. Data can also help you decide whether or not to accept credit cards or whether to cater more to your online customers as consumer shopping preferences evolve. What are some other ways that big data can influence your business?
Big Data May Influence Your Return or Refund Policy
Many companies have limited return policies or strict refund policies that dictate whether a customer may receive cash or store credit when a product is brought back after purchase. When determining how to handle a customer return, you should consider how much money you stand to lose if you honor the return compared to how much you stand to lose if you stick to your policy.
For instance, let's say a customer comes to your customer service desk and says that he or she wants to return a hair dryer purchased last week. However, that person doesn't have a receipt, or the receipt was clearly from another purchase. Let's also say that the hair dryer retailed for $20. ...
Chris Low posted this in Member Via RSS, strategy on January 12th, 2017
Businesses that work on an omnichannel marketing strategy rely on what is called a ‘Single Customer View’ to monitor customer behavior patterns. Essentially, a Single Customer View (or SCV for short) refers to an aggregated view of a customer that ties in their visits from multiple channels and avoids duplication or dilution of customer data. For instance, SCV ensures that a customer who visits your website to check your products and then calls up your office to place an order gets tracked as one unique customer and not two.
Setting up an SCV is challenging for more than one reason. Firstly, it is not always possible to track down visitor credentials across multiple channels. A customer calling up your sales team could do so from their landline while they sign up on your website with their mobile number. It may hence not always be possible to consolidate these details into a SCV pattern. Secondly, the quality of data extracted from channels like TV or radio could be insufficient to map a clear picture of your customer.
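One common way to consolidate channel records into an SCV, despite those gaps, is to chain records that share *any* identifier, so a record carrying both an email and a phone number links records that each carry only one. A minimal sketch, with illustrative field names:

```python
def build_scv(records):
    """Group records into customers by chaining shared identifiers.
    Returns a list of (identifier set, merged record list) groups."""
    groups = []
    for rec in records:
        ids = {v for k, v in rec.items() if k in ("email", "phone") and v}
        # Find every existing group sharing an identifier with this record.
        merged = [g for g in groups if g[0] & ids]
        for g in merged:
            groups.remove(g)
            ids |= g[0]
        groups.append((ids, sum((g[1] for g in merged), []) + [rec]))
    return groups

records = [
    {"channel": "web",   "email": "a@x.com", "phone": None},
    {"channel": "phone", "email": None,      "phone": "555-0101"},
    {"channel": "crm",   "email": "a@x.com", "phone": "555-0101"},
]
scv = build_scv(records)
# The CRM record carries both identifiers, so all three records
# collapse into a single customer view.
```

This is exactly the web-visit-plus-phone-order case from the paragraph above: without the linking record, the two touchpoints would be counted as two customers.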
Businesses typically make use of a Data Management Platform (DMP) to handle customer data. This is essentially a data warehouse software that pulls in data from ...
Rick Delgado posted this in Big Data, Member Via RSS on January 11th, 2017
Fields within data science have continued to prove their worth as in-demand, rewarding careers. Unlike broader subjects like information science, data science is about utilizing data in the search for trends and prevailing tendencies, which can help produce worthwhile concepts and insight.
As Big Data continues to accumulate, an even greater amount of research and focus has been placed on assessing the collective of shared information across the world. As a result, data science is about more than just analyzing and understanding the past. It is now becoming an essential component for developing predictive trends to better understand the best course of action across all industries.
Valuable Skills with Impressive Salaries
Data scientists are charged with integrating a variety of skills – including database compilation and access, AI, mathematics, machine-based learning, and more. The development of these predictive trends yields highly valuable insight that companies are willing to pay handsomely to acquire and curate.
Take a look at these statistics on typical data scientist salaries:
The average salary rests at $117,000 per year – with $89,200 as the low and $242,000 the impressive high-end.
Data scientists can also expect perks such as an average $24,500 of annual bonus, equity levels above $50,000, and signing bonuses which ...
Charlotte McKee posted this in artificial intelligence, Member Via RSS on January 11th, 2017
Life after death is a topic that has kept some of the greatest minds occupied for millennia. It’s something we tend to shy away from talking about, but unfortunately, death and grieving are inevitable realities of being human. With ground-breaking research happening in Natural Language Processing and Artificial Intelligence, could technology hold the answer we’ve been looking for?
When you suffer loss, there is a longing to speak to the deceased again – and technology is making this possible. Death today comes with the uncanny nature of leaving behind a digital footprint, a legacy of social media posts, videos, pictures and text messages – what are the living meant to do with these? Feed them into an artificial neural network and create a chatbot version of the deceased, obviously.
These controversial projects are looking at how we can leverage deep learning to imitate an extension of life through a chatbot, or ‘griefbot’ – yes, you’re right, this is very Black Mirror.
We’ve come a long way since ELIZA, Joseph Weizenbaum’s computer program from the mid-1960s that responded to questions like a psychotherapist by using predetermined phrases, simply repeating back the user’s questions in different formats to form an ...
Tech has come a long way over the years. The world ten years ago was radically different from the world of today. Despite their best efforts, tech experts failed to predict many of the innovations we take for granted today. Here are the biggest tech gadgets nobody predicted a decade ago.
The Smartphone Revolution
Tech experts knew that smartphones were going to be big. But what they never would have expected is for smartphones to overtake conventional desktop computers and laptops. Smartphones were expected to be the natural evolution of the mobile phone, but nobody expected them to take over like they did.
These days a smartphone is so much more than a device for SMS messaging and making calls. It’s used for practically everything. Most people couldn’t live without a smartphone.
It’s true that the idea of virtual reality has existed for decades, but a decade ago nobody expected it to become a reality. Today virtual reality is at the forefront of gaming, retail, and a variety of other industries. It’s yet to hit the mainstream, but it’s advancing at an incredible rate. Virtual reality is set to disrupt multiple industries and nobody predicted it would have arrived so soon.