Data Supply Framework 3.0 – ETL Patterns
This article is the first in a series discussing the use of architectural patterns in the Cambriano Information Supply Framework 3.0.
The term architectural pattern may sound grand, misleading or daunting, but it’s really quite a simple concept. It’s like writing a function in a programming language to log in to a database, check that the connection is alive and working, and report back the success of the connection request. If that function can be reused, whether in the same application, in the same IT shop or in IT in general (e.g. Java code to connect and test a connection to SQL Server), then it’s well on its way to becoming an architectural pattern. Of course, there are much more sophisticated architectural patterns, but generally a pattern is a simplified and generic template for addressing a commonly occurring problem. As with much in architecture, less usually turns out to be more.
In this article I will be looking at patterns for the process known as ETL (Extract, Transform and Load), which is the typical mechanism used to take data from source systems (which may be systems of record) through transformation (to put the data into ...Read More on Datafloq
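To make the ETL shape concrete, here is a minimal, hypothetical pipeline sketch. The function and field names are my own illustrations, not part of the Cambriano framework:

```python
# Minimal ETL sketch: extract rows from a source, transform them into a
# conformed shape, and load them into a target store. All names here are
# illustrative, not taken from any real framework.

def extract(source):
    """Pull raw rows from a source system (here, simply a list of dicts)."""
    return list(source)

def transform(rows):
    """Conform raw rows: normalise names, cast fields, drop bad records."""
    out = []
    for row in rows:
        if row.get("amount") is None:   # reject incomplete records
            continue
        out.append({
            "customer": row["name"].strip().title(),
            "amount_eur": round(float(row["amount"]), 2),
        })
    return out

def load(rows, target):
    """Append conformed rows to the target store (here, a list)."""
    target.extend(rows)
    return len(rows)

source = [{"name": " ada lovelace ", "amount": "120.00"},
          {"name": "charles babbage", "amount": None}]
warehouse = []
loaded = load(transform(extract(source)), warehouse)
print(loaded, warehouse)
```

In a real pipeline the source would be a database or file feed and the target a warehouse table, but the extract–transform–load separation stays the same.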
Seven Magnificent Big Data Success Stories
Big Data has arrived. Big Data is here for keeps. Big Data is the future.
Despite some of the malicious, mendacious and malodorous words of naysayers, sceptics and contrarians, the world of big data and big data analytics is replete with totally amazing and fabulous success stories.
Big Data gurus are often accused of not delivering coherent, cohesive and verifiable accounts of Big Data successes. Which is understandable but at the same time a pity. So here, to illustrate this miraculous and remarkable turnaround, I give you not three but seven of the many Big Data success stories that I could have casually grabbed out of the ether.
First, we take a trip to Glasgow to discover the leveraging of Big Data in alternative investments. Then we pass over to Boston to explore the magic of Big Data at Universal Legal.
The Richy Rich Student Debt Mega Alpha Fund – Big Data and Corporate Welfare
Govan based Hedge Fund operators RCN are proudly leveraging Big Data to the max. Their Student Debt Mega Alpha Fund is one of the most imaginative schemes in the whole of the financial industry landscape, from Singapore, through Soho, to Stateside.
RCN use Big Data in innovative, unique and inventive ways. ...Read More on Datafloq
Big Data on the Roof of the World
Once upon a time, there was a mountain known as Peak 15. Very little was known about it. Then in 1852, surveyors found it was the highest in the world, and in 1865 they named it Everest.
As with other significant challenges in life, many people have been driven by a passionate desire to conquer peaks all around the world. This is just one illustration of how those who can identify their significant challenges rise to them. This sharp focus, determination and courage turns ordinary citizens into people who are invariably on a mission. People who know what they want.
When we know what we want to accomplish, almost anything flowing from that can be driven by a single unified goal, objective and mission.
From the outset, having a clear idea of the significant challenges and why we want to address those significant challenges is far more important than knowing how we are going to go about addressing that challenge.
Of course, we should also have an idea of how we might go about meeting our challenges, whilst at the same time recognising that few things waylay legitimate ambitions more than being held hostage to the means. This requires that we ...Read More on Datafloq
Free Business Analytics Content – Part 5
Why buy when you can get it for free?
Back at you! Here is the fifth fantastic delivery of an amazing and fabulous selection of free and widely available business analytics learning content, which has been prepared… just for you.
Corporate Social Responsibility (CSR) Analytics – Sometimes seen as mere window dressing, CSR is ostensibly corporate self-regulation that seeks to ensure that a business not only actively complies with the letter of the law but also with the spirit of the law. “CSR strategies encourage the company to make a positive impact on the environment and stakeholders including consumers, employees, investors, communities, and others.” http://en.wikipedia.org/wiki/Corporate_social_responsibility
CSR Analytics can be used to measure and influence the management of a number of key business facets related to CSR, including how CSR can affect and influence an organisation’s performance with regards to:
Its competitive situation and advantage
Its ability to attract, acquire, retain and nurture workers, associates, customers, clients and users
The morale, commitment and productivity of employees and associates
The views of stakeholders, investors, owners, donors, sponsors and the financial community
Its relationship with companies, governments, political parties, unions, pressure groups, peers and industry governing and advisory bodies, the media, suppliers, customers and the communities in which it operates
So, clearly, ...Read More on Datafloq
Free Business Analytics Content – Part 4
Why buy when you can get it for free?
Back at you! Here is the fourth fantastic delivery of an amazing and fabulous selection of free and widely available business analytics learning content, which has been prepared… just for you.
Data Mining – Data mining is an interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets (“big data”) involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. Data mining is the analysis step of the “knowledge discovery in databases” process, or KDD. https://en.wikipedia.org/wiki/Data_mining
Another interesting article on the subject is Raymond Li’s piece titled ‘Top 10 Data Mining Algorithms, Explained’ which is available on Gregory Piatetsky-Shapiro’s great KDNuggets site: http://www.kdnuggets.com/2015/05/top-10-data-mining-algorithms-explained.html
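As a toy illustration of “discovering patterns in large data sets”, here is a minimal frequent-pair count over shopping baskets, a drastically simplified cousin of association-rule mining; the data and threshold are invented:

```python
from collections import Counter
from itertools import combinations

# Toy pattern discovery: count which pairs of items co-occur in baskets,
# then keep only the pairs that meet a minimum support threshold.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

min_support = 3
frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
print(frequent)
```

Real data mining adds pruning, statistics and scale, but the shape, count candidate patterns and filter by support, is recognisably the same.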
Decision analysis (DA) is the discipline comprising the philosophy, theory, methodology, and professional practice necessary to address important decisions in a formal manner. Decision ...Read More on Datafloq
Free Business Analytics Content – Part 3
Why buy when you can get it for free?
Back at you! Here is the third fantastic delivery of an amazing and fabulous selection of free and widely available business analytics learning content, which has been prepared… just for you. You can find part 1 here and part 2 here.
Unmet needs analytics – aspects of needs analysis. A needs assessment is a systematic process for determining and addressing needs, or “gaps” between current conditions and desired conditions or “wants”. The discrepancy between the current condition and wanted condition must be measured to appropriately identify the need. The need can be a desire to improve current performance or to correct a deficiency. https://en.wikipedia.org/wiki/Needs_assessment
Need theory, also known as Three Needs Theory, proposed by psychologist David McClelland, is a motivational model that attempts to explain how the needs for achievement, power, and affiliation affect the actions of people from a managerial context. https://en.wikipedia.org/wiki/Need_theory
The concept of information need is seldom, if ever, mentioned in the general literature about needs, but it is a common term in the literature of information science. According to Hjørland (1997) it is closely related to the concept of relevance: if something is relevant for a person in relation to a given task, we might say that the person needs the information for that task. https://en.wikipedia.org/wiki/Information_needs
Market size analytics. ...Read More on Datafloq
Free Business Analytics Content – Part 2
Why buy when you can get it for free?
Back at you! Here is the second fantastic delivery of an amazing and fabulous selection of free and widely available business analytics learning content, which has been prepared… just for you.
Electronic data capture. An Electronic Data Capture (EDC) system is a computerized system designed for the collection of clinical data in electronic format for use mainly in human clinical trials. EDC replaces the traditional paper-based data collection methodology to streamline data collection and expedite the time to market for drugs and medical devices. EDC solutions are widely adopted by pharmaceutical companies and clinical research organizations (CRO). https://en.wikipedia.org/wiki/Electronic_data_capture
You may also be interested in this video from REDCap Saudi Arabia – Electronic Data Capture Fundamentals 1: https://www.youtube.com/watch?v=Z5LK16mrvW8
Sensor fusion. Sensor fusion is the combining of sensory data, or data derived from disparate sources, such that the resulting information has less uncertainty than would be possible if these sources were used individually. Uncertainty reduction in this case can mean more accurate, more complete, or more dependable, or refer to the result of an emerging view, such as stereoscopic vision (calculation of depth information by combining two-dimensional images from two cameras at slightly different viewpoints). https://en.wikipedia.org/wiki/Sensor_fusion
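The uncertainty-reduction idea can be sketched with inverse-variance weighting of two noisy readings of the same quantity; the readings and variances below are invented:

```python
def fuse(x1, var1, x2, var2):
    """Inverse-variance weighted fusion of two estimates of one quantity.

    The fused variance is always smaller than either input variance,
    which is the 'less uncertainty' property sensor fusion promises.
    """
    w1, w2 = 1.0 / var1, 1.0 / var2
    fused = (w1 * x1 + w2 * x2) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)
    return fused, fused_var

# Two sensors measure the same distance with different noise levels.
estimate, variance = fuse(10.2, 4.0, 9.8, 1.0)
print(estimate, variance)
```

Note how the fused estimate leans towards the less noisy sensor, and the fused variance (0.8) is below the better sensor's variance (1.0).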
You may also be vaguely interested in this video from Google titled Sensor Fusion on ...Read More on Datafloq
Free Business Analytics Content – Thanks to Wikipedia – Part 1
Why buy when you can get it for free?
Here is the first fantastic delivery of an amazing and fabulous selection of free and widely available business analytics learning content, which has been prepared… just for you.
A/B testing is a way to compare two versions of a single variable typically by testing a subject’s response to variable A against variable B, and determining which of the two variables is more effective. https://en.wikipedia.org/wiki/A/B_testing
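As a sketch of how such a comparison might be judged, here is a two-proportion z-test in plain Python; the conversion counts are invented, and a real experiment would use a proper statistics library:

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic and two-sided p-value for a difference in proportions."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via the error function; two-sided tail probability.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Variant B converts 120/1000 against A's 100/1000 -- signal or noise?
z, p = two_proportion_z(100, 1000, 120, 1000)
print(round(z, 3), round(p, 3))
```

Here the p-value comes out around 0.15, a reminder that an apparently better variant can easily be chance at these sample sizes.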
Choice modelling attempts to model the decision process of an individual or segment via Revealed preferences or stated preferences made in a particular context or contexts. Typically, it attempts to use discrete choices (A over B; B over A, B & C) in order to infer positions of the items (A, B and C) on some relevant latent scale (typically “utility” in economics and various related fields). https://en.wikipedia.org/wiki/Choice_modelling
Adaptive control is the control method used by a controller which must adapt to a controlled system with parameters which vary, or are initially uncertain. For example, as an aircraft flies, its mass will slowly decrease as a result of fuel consumption; a control law is needed that adapts itself to such changing conditions. https://en.wikipedia.org/wiki/Adaptive_control
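The adaptation idea can be sketched as a controller tracking an unknown, slowly drifting plant gain with an LMS-style update; the drift stands in for the fuel-burn example, and all numbers are illustrative:

```python
# Toy adaptive estimation: track an unknown plant gain that drifts,
# the way an aircraft's mass drifts as fuel burns.
def run(steps=200, lr=0.1):
    theta_true = 2.0    # unknown plant gain
    theta_hat = 0.0     # controller's running estimate
    for _ in range(steps):
        u = 1.0                                     # probe input
        y = theta_true * u                          # plant response
        theta_hat += lr * (y - theta_hat * u) * u   # LMS update
        theta_true -= 0.001                         # slow drift (fuel burn)
    return theta_true, theta_hat

true_gain, estimate = run()
print(true_gain, estimate)
```

A fixed controller tuned for the initial gain would slowly degrade; the adaptive one keeps its estimate within a small lag of the moving target.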
Multivariate Testing. In marketing, multivariate testing or multi-variable testing techniques apply statistical hypothesis testing on multi-variable systems, typically consumers on websites. ...Read More on Datafloq
How Hadoop Has Truly Revolutionised IT
This is the story of how the amazing Hadoop ecosphere revolutionised IT. If you enjoy it then consider joining The Big Data Contrarians.
Before the advent of Hadoop and its ecosphere, IT was a desperate wasteland of failed opportunities, archaic technology and broken promises.
In the dark Cambrian days of bits, mercury delay lines and ferrite core, we knew nothing about digital. The age of big iron did little to change matters, and vendors made huge profits selling systems that nobody could use and even fewer people could understand.
Then along came Jurassic IT park, in the form of UNIX, and suddenly it was far cheaper to provide systems that nobody could use and even fewer people could understand.
The sad, desperate and depressing scenario that typified IT, on all levels, spanned forty years. It would have continued had it not been for Google, whose GFS paper inspired HDFS (the Hadoop Distributed File System).
Before Hadoop, we were as dumb as rocks. With Hadoop, we were led into the Promised Land of milk and honey, digital freedom and limitless opportunities, sexy jobs and big bucks, immortality and designer drugs.
Hadoop and its attendant ecosphere changed the Information Technology world overnight, providing as it did, technology and techniques never ...Read More on Datafloq
12 Amazing Big Data Success Stories for 2016
Every year I ask myself the same question. Will there be any tangible, coherent and verifiable Big Data success stories in the coming year? Every year I come up with nothing. Nothing at all. “Sorry, no rooms at the Big Data Success Inn, as we are closed for vacations.”
However, this year things are different. More positive, more alive and more fantastic.
As you can probably guess, I am well excited to be able to reach out and tell you about the twelve amazingly fab Big-Data stories that will appear during the course of 2016. The year of the incredible, startling and awesome Big Data monkey.
To this end, and as this is a magically special occasion, I have made an extra-special effort to deliver the goods, to do full justice to the task, and to go that extra Big Data kilometer for my demanding readership.
So, I gazed into Madame Frufru’s crystal ball, I opened up the kimono with the Ouija spirits of Von Neumann, Babbage and Jobs, and I pushed the envelope in the vast disruptive solution-spaces habited by Ada Augusta, Audrey Tautou and Jennifer Saunders… and, I came back with the best of the best.
I only hope it was all worth ...Read More on Datafloq
Why Are There So Many ‘Fake’ Big Data Gurus?
Why so many ‘fake’ Big Data Gurus?
Where do you all come from?
Where do you all come from?
All your integrity’s gone
Now tell me, where do you all come from?
From ‘Where Do You All Come From‘ by Mott the Hoople
And now for something completely different…
You may have noticed the massive relative growth in the number of people who are describing themselves as Big Data gurus, data science Kaisers or analytics evangelists. Okay, I exaggerate to evidence the trend, but you’ll hopefully get the gist. Your fellow comrade on the picket line, that sweetie you met at the Pitt Club — even your darling masseuse has had their carte de visite transformed according to the prevailing désir du jour.
Many people out there in the big data world suddenly call themselves ‘Big Data gurus’ simply because it is the latest vogue. The Caerfyrddin Good Pub Guide even went so far as to say that, “adding Big Data to your job title was the equivalent of sexing up a dodgy dossier”. They also later suggested that boosting your resume with the judicious incorporation of a titillating title, such as Big Data analytic pole-dancer, may get you a few chuckles, even if most people don’t understand and ...Read More on Datafloq
What Makes a Great Data Architect, Truly Great
As a child, I had a great love of stories of Spain, of the idea of travelling through the Iberian Peninsula and of mastering, and not just learning, the classical Spanish guitar. One of the phrases that stuck with me from those days was the underivable quote: “amateurs practice until they get it right; professionals practice until they can’t get it wrong.”
In my professional working life, I have striven to identify those things that I want to be sufficiently competent at doing and those things that I consider a fundamental part of my professional competence, and then in making a clear distinction between the two.
As many of those who know me will know, a significant part of my professional life has been dedicated to the architecture and management of data, information and structured intellectual capital. Therefore, in the light of this fact and with reference to the previous bit of whimsical fancy, I will address the following question posed to me some time ago: What makes a great Data Architect, truly great?
What follows is by no means an exhaustive list of essential elements, but it should give you a flavour of what a great Data Architect is.
ONE – ...Read More on Datafloq
The Hadoop Honeymoon is Over
Listen up Big Data playmates! The ubiquitous Big Data gurus, tied up in their regular chores of astroturfing mega-volumes, velocities and varieties of superficial flim flam, may not have noticed this, but, Hadoop is getting set up for one mighty fall – or a fast-tracked and vertiginous black run descent. Why do I say that? Well, let’s check the market.
According to Gartner there is “continuing enthusiasm for the big data phenomenon”. However, “demand for Hadoop specifically is not accelerating.” So what’s up doc?
The Hadoop companies have been spending big-time on marketing, and I mean big, and lo and behold, the sales are dipping as fast as the prospects evaporate. You might even have noticed this. Of course, this can’t go on forever. You cannot maximise your marketing spend when your revenues are heading south, and not even the most naïve of angels or other types of investors will keep up with misguided corporate-welfare for too long. Even if you’re flogging a moribund yellow elephant, you make bank or you die.
So what went wrong?
First of all the Hadoop people got well ahead of themselves. Their own enthusiasm for the toys they helped to ‘create’ got the better of them. But first, ...Read More on Datafloq
Taming Big Data: 3 Reasons to Reduce the Data Footprint
Simply stated, the best application of Big Data is in systems and methods that will significantly reduce the data footprint.
Why would we want to reduce the data footprint?
Years of knowledge and experience in information management strongly suggest that more data does not necessarily lead to better data.
The more data there is to generate, move and manage, the greater the development and administrative overheads.
The more data we generate, store, replicate, move and transform, the bigger the data, energy and carbon footprints will become.
How can Big Data reduce Big Data?
We can use it in profiling, in order to identify the data that could be useful.
We can use it to identify immaterial, surplus and redundant data.
We can use it to catalogue, categorise and classify certain high-volume data sources.
What can we do with the Big Data profile data?
We can use it to audit, analyse and review the generation, storage and transmission of data.
We can use the data to parameterise data generators and filters.
We can use it to generate ‘Big-Data-by-exception’ discrimination rules and as the basis for data discrimination based on directed machine-learning approaches.
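The profiling uses listed above might be sketched as a crude data-reduction pass; the field names, thresholds and rules are invented for illustration:

```python
# Crude 'Big-Data-by-exception' sketch: profile a stream and keep only
# records that pass an exception rule, discarding redundant duplicates.
def reduce_footprint(records, keep_if):
    seen = set()
    kept = []
    for rec in records:
        key = (rec["sensor"], rec["value"])
        if key in seen:        # redundant duplicate -> drop
            continue
        seen.add(key)
        if keep_if(rec):       # exception rule -> worth keeping
            kept.append(rec)
    return kept

records = [
    {"sensor": "t1", "value": 20},
    {"sensor": "t1", "value": 20},   # duplicate
    {"sensor": "t1", "value": 95},   # out-of-range exception
    {"sensor": "t2", "value": 21},
]
kept = reduce_footprint(records, keep_if=lambda r: r["value"] > 90)
print(kept)
```

Four records in, one record out: the footprint shrinks while the interesting exception survives.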
So why would we do all of this?
We hear that Big Data represents a significant challenge.
The best way of dealing with significant challenges is ...Read More on Datafloq
The Big Data Workout: Five Examples of a Data Governors Approach
To begin at the beginning
Miss Piggy said, “Never eat more than you can lift”. That statement is no less true today, especially when it comes to Big Data.
The biggest disadvantage of Big Data is that there is so much of it, and one of the biggest problems with Big Data is that few people can agree on what it is. Overcoming the disadvantage of size is possible; overcoming the problem of understanding may take some time.
As I mentioned in my piece Taming Big Data, “the best application of Big Data is in systems and methods that will significantly reduce the data footprint.” In that piece I also outlined three conclusions:
Taming Big Data is a business, management and technical imperative.
The best approach to taming the data avalanche is to ensure there is no data avalanche – this approach is moving the problem upstream.
The use of smart ‘data governors’ will provide a practical way to control the flow of high volumes of data.
“Data Governors”, I hear you ask, “What are Data Governors?”
Let me address that question.
Simply stated, the Data Governor approach to Big Data obtuseness is this:
The Big Data Governor’s role is to help in the purposeful and meaningful reduction of the ...Read More on Datafloq
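The article does not prescribe an implementation, but a “data governor” that controls flow upstream could be sketched as a simple per-source volume cap, purely for illustration:

```python
def governed(stream, max_per_key):
    """Pass records through, capping the volume emitted per source key."""
    counts = {}
    for rec in stream:
        key = rec["source"]
        counts[key] = counts.get(key, 0) + 1
        if counts[key] <= max_per_key:   # under quota -> let it flow
            yield rec                    # over quota -> silently governed

stream = [{"source": "web", "id": i} for i in range(5)] + \
         [{"source": "iot", "id": i} for i in range(2)]
out = list(governed(stream, max_per_key=3))
print(len(out))
```

The point is where the control sits: the cap is applied before the data reaches storage, moving the problem upstream rather than taming an avalanche after the fact.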
What is all the fuss about Dark Data? Big Data’s New Best Friend
What is Dark Data?
Dark data, what is it and why all the fuss?
First, I’ll give you the short answer. The right dark data, just like its brother right Big Data, can be monetised – honest, guv! There’s loadsa money to be made from dark data by ‘them that want to’, and as value propositions go, seriously, what could be more attractive?
Let’s take a look at the market.
Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes” (IT Glossary – Gartner)
Techopedia describes dark data as being data that is “found in log files and data archives stored within large enterprise class data storage locations. It includes all data objects and types that have yet to be analyzed for any business or competitive intelligence or aid in business decision making.” (Techopedia – Cory Jannsen)
Cory also wrote that “IDC, a research firm, stated that up to 90 percent of big data is dark data.”
In an interesting whitepaper from C2C Systems it was noted that “PST files and ZIP files account for nearly 90% of dark data by IDC Estimates.” and that dark data is “Very simply, all those ...Read More on Datafloq
Consider this: Big Data is not Data Warehousing
Hold this thought: To paraphrase the great Bob Hoffman, just when you think that if the Big Data babblers were to generate one more ounce of bull**** the entire f****** solar system would explode, what do they do? Exceed expectations.
I am a mild mannered person, but if there is one thing that irks me, it is when I hear variations on the theme of “Data Warehousing is Big Data”, “Big Data is in many ways an evolution of data warehousing” and “with Big Data you no longer need a Data Warehouse”.
Big Data is not Data Warehousing, it is not the evolution of Data Warehousing and it is not a sensible and coherent alternative to Data Warehousing. No matter what certain vendors will put in their marketing brochures or stick up their noses.
In spite of all of the high-visibility screw-ups that have carried the name of Data Warehousing, even when they were not Data Warehouse projects at all, the definition, strategy, benefits and success stories of data warehousing are known, they are in the public domain and they are tangible.
Data Warehousing is a practical, rational and coherent way of providing information needed for strategic and tactical option-formulation and decision-making.
Data Warehousing is ...Read More on Datafloq
The Amazing World of Fred’s Big Data
Hold this thought: There are real golden nuggets of data that many organisations are oblivious to. But first let’s look at business process management.
Business process management – what is there not to like? It has revolutionised the way we organise and do business. Right?
Your business is applying, measuring and managing the process, right?
The process is king, it’s a great process, and your organisation loves it, right?
There is no process other than the known, approved, socialised and proven process, right?
Well, we’ll see about that.
You may think that your best data is in the enterprise data warehouse (Bill or Ralph). I will not argue with that.
You may consider that the gold rush starts and ends in the exploitation of Big Data (what I like to call All Data). You may have a point, or not.
You may speculate that your best data may come from external social media sources? Well, that’s a commonly held view, and who am I to challenge a commonly held view, however erroneous and popularly acclaimed it may be.
But, if you thought all this were true, would you be right?
Now, consider this…
In companies, good, mediocre and bad, people follow processes, they follow these processes as best as they ...Read More on Datafloq
Big Data, a Promised Land Where the Big Bucks grow
Consider this. Many people come up to me in the street, and, apropos of nothing, they ask me how they can make money from Big Data.
Normally I would send such people to see a specialist – no, not a guru, but a sort of health specialist, but because this has happened to me so many times now, I eventually decided to put pen to paper, push the envelope, open up the kimono, and to record my advice for posterity and the great grandchildren.
So, here are my top seven tips for cashing in quick on the new big thing on the block.
1 – A Business Opportunity For Faith
Like every new religion, trend or fad, Big Data has its own founding myths, theology and liturgy, and there is money to be made in it; loadsa lovely jubbly money. By predicating and evangelising Big Data you will be welcomed with open arms into the Big Data faith, and will receive all the attendant benefits that will miraculously and mysteriously fall upon you and your devout friends. Go on, I dare you. Be a Big Data guru, a shepherd to a flock of sheep, and enjoy the wealth, health and happiness that most surely ...Read More on Datafloq
Why Social Media Big Data killed Ad-land
Hold this thought: Big Data is the future of online business and interactive advertising is its profit.
Much is being made of Big Data and its role in social media and online interactive advertising. The advertising industry itself has a “big crush” on Big Data, and it fuels the elevated revenues, profits and share prices of a number of online companies.
In 2011, a prominent social media company, in preparation for its IPO, announced that 85% of its $3.5B revenue came from advertising.
In the same year another high profile internet company made 96% of their revenue of $37B from advertising.
There are quite a few examples of internet companies that make the overwhelming bulk of their revenue from online advertising.
Much of the attraction of these companies can be found in the tremendous volumes of ‘social media’ data that they gather, mine and use to target the interactive advertising of brands, products and services.
Now consider this: advertising legend Dave Trott reckons that 89% of advertising is ignored, and who would argue with Dave Trott?
What does that mean?
Out of every $1M spent on advertising, $890K is money down the drain.
Let’s raise the stakes a little. If $37B is spent on advertising, and our ...Read More on Datafloq
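Trott’s 89% figure makes for grim but easily checked arithmetic:

```python
# If 89% of advertising spend is ignored, the waste per budget is:
ignored = 0.89
for budget in (1_000_000, 37_000_000_000):
    wasted = budget * ignored
    print(f"${budget:,} spent -> ${wasted:,.0f} down the drain")
```

On a $1M budget that is $890K; on $37B of ad-funded revenue, the implied waste runs to roughly $32.9B.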
Consider this: Big Data and the Analytics Data Store
To begin at the beginning
Hold this thought: If Data Warehousing were Tesco then Big Data would be the “try something different”.
Since the publication of the article Aligning Big Data, which basically laid out a draft view of DW 3.0 Information Supply Framework and placed Big Data within a larger framework, I have been asked on a number of occasions recently to go into a little more detail with regards to the Analytics Data Store (ADS) component. This is an initial response to those requests.
To recap, the overall architecture consists of three major components: Data Sources; Core Data Warehousing; and Core Statistics.
Data Sources – This element covers all the current sources, varieties and volumes of data available which may be used to support the processes of ‘challenge identification’, ‘option definition’ and decision making, including statistical analysis and scenario generation.
Core Data Warehousing – This is a suggested evolution path of the DW 2.0 model. It faithfully extends the Inmon paradigm to not only include unstructured and complex data but also the information and outcomes derived from statistical analysis performed outside of the Core Data Warehousing landscape.
Core Statistics – This element covers the core body of statistical competence, especially but not only with regards to evolving data volumes, ...Read More on Datafloq
Consider this: Aligning Big Data
In order to bring some semblance of simplicity, coherence and integrity to the Big Data debate I am sharing an evolving model for pervasive information architecture and management.
This is an overview of the realignment and placement of Big Data into a more generalized architectural framework, an architecture that integrates data warehousing (DW 2.0), business intelligence and statistical analysis.
The model is currently referred to as the DW 3.0 Information Supply Framework, or DW 3.0 for short.
In a previous piece with the name of Data Made Simple – Even ‘Big Data’, I looked at three broad-brush classes of data: Enterprise Operational Data; Enterprise Process Data; and, Enterprise Information Data. The following is a diagram taken from that piece:
Fig. 1 – Data Made Simple
In simple terms the classes of data can be defined in the following terms:
Enterprise Operational Data – This is data that is used in applications that support the day to day running of an organisation’s operations.
Enterprise Process Data – This is measurement and management data collected to show how the operational systems are performing.
Enterprise Information Data – This is primarily data which is collected from internal and external data sources, the most significant source being typically Enterprise Operational Data.
These ...Read More on Datafloq
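The three data classes above could be sketched as a simple routing rule; the matching logic is invented for illustration, and a real classification would be far richer:

```python
# Illustrative router for the three data classes described above.
def classify(dataset):
    """Map a dataset description to one of the three DW 3.0 data classes."""
    if dataset.get("role") == "operations":
        return "Enterprise Operational Data"
    if dataset.get("role") == "measurement":
        return "Enterprise Process Data"
    # Everything collected for analysis and reporting falls here.
    return "Enterprise Information Data"

for role in ("operations", "measurement", "analysis"):
    print(role, "->", classify({"role": role}))
```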