It is all about the people (10th Peter Drucker Forum)

Last week I attended the 10th Peter Drucker Forum in Vienna. It was the second time I attended this forum, and I consider it ‘food for the soul’. This year the theme was ‘The Human Dimension’. The greatest thinkers on management, leadership and organisation came together and discussed the state of things.

And in my humble opinion, we are in dire straits.


Let’s face it: organisations are killing people or - as Gary Hamel described it - organisations have become inhuman by design. Huge bureaucracies are ‘emotional dead zones’, creativity and innovation are impossible, ideas are often approached in a hostile manner, and greed and testosterone have become the main drivers.

Organisations are not a safe place to engage in a truly meaningful conversation about improving oneself, the organisation and society. Management and leadership are increasingly distancing themselves from the problems and challenges that we face; they put no skin in the game. Blaming others when something goes wrong, celebrating their great leadership when something goes right. Or, according to Mintzberg: “we got too much disconnected leadership in organisations”.

In government this is even worse. Government has been derailed for decades by bureaucratic mindsets, power-based scoundrels and the very definition of silos. Talent management in government is a joke until it fixes its culture, leadership habits and processes (Dan Pontefract).

Government is broken and we need to fix it.

It is a challenge to change this, but I believe we can. And this message was heard during the two days as well. Just start by acknowledging this great injustice to humans and the devastating effect it has on productivity, quality of service, the ability to innovate and the ability of organisations to be socially responsible (dear organisation, answer this question: in terms of purpose, why do you exist, other than to make more money?).

We can change, but it all starts with leadership being selfless, mindful and compassionate (Rasmus Hougaard). Listen!, be curious, ask questions (Hal Gregersen), facilitate decision-making in small teams of people (resist making the decisions yourself), and reward initiative and competency. Power is not given to you by hierarchy; it is given to you if you take initiative (Xavier Huillard).

Or to put it more succinctly; just start giving a shit.

Self-reflection is vital in all of this. Don’t just act selfless, be it. Don’t just act socially responsible, prove it. Don’t just act compassionate, let me feel it.

And I am not writing this plea only for management or leaders; I am writing it to myself and others as well, because I truly believe that we are all leaders.

“Have some skin in the game”: stop rewarding yourself 500 times (or more) the average wage. Stop accumulating obscene amounts of wealth while underpaying your employees (Jeff Bezos, looking at you). Stop striving for global monopolies and crushing human rights and values in the process (Amazon, Google, Facebook, Uber, ....). Stop grabbing, start giving. Oh my, this is starting to sound like a sermon now.

I was impressed by Paul Polman, CEO of Unilever. What a passionate man; I felt his anger, driven by purpose, the common good and the conviction that his company can change something for the better. Not only talking about it, but acting on it! Unilever is losing a great leader (after 10 years he will retire as CEO), and I really hope Paul remains on the stage. We need people like him.

Stop talking, start walking........Humans first.

Datascience; more bang for the buck


'I made some Python code that really rocks, Ronald'

'It extracts data from various sources, validates it, does some cleansing, codes xxx business rules, lots of integration of course, executes some kind of predictive model, outputs the data and visualizes it'.

And then the magical words are uttered: let’s deploy it to production……

Alas, the magic of datascience ends abruptly. IT is probably blamed for not being agile, and architects are scorned for being too restrictive and even killing innovation in the process.

Datascience has a problem: it fails to operationalize its brilliant models, and it therefore fails to deliver value to the business. There, I said it. I know, pretty polarizing, but I encounter it on a daily basis now. Datascience needs to grow up....

It’s all about: (1) the required quality of services and (2) separating concerns. Neither seems to be that important in datascience. They should be!

Let me clarify;

Quality of services

Two use cases;

(a) deploying a risk model at scale (let’s say 500K transactions per day) that evaluates a transaction (from a customer) in real time based on contra-information, determining in the process the level of supervision needed. Oh, and by the way: one has to take into account ‘equality of rights’, since the organization is publicly owned.

(b) doing a one-time analysis on various sources, using advanced machine learning, where the output is used for a one-time policy-influencing decision.

The quality of services between (a) and (b) is like night and day. (a) needs to run at scale, in real time (direct feedback), using contra-information; provenance is hugely important; it is subject-based, so there are privacy concerns; it is an automated decision (there is heavy legal shit here); equality of rights requires metadata (what model did we use on what transaction, what data did we evaluate, …); and many more.

(b) is a one-off… its output influences new policy or contributes to some insight. Your quality of services might be that the model is versioned and properly annotated, and that the dataset is archived properly to ensure repeatability.

My point is that, whenever you start on an analytic journey, you should establish the quality of services you require beforehand, as much as possible. And for that to happen you need a clear, explicit statement of how the required information product contributes to the bottom line. So yes: a proper portfolio management process, a risk-based impact assessment (!) and deployment patterns (architecture!) that are designed in advance!

With regard to datascience it is vital to make a conscious choice, before you start, about the required quality of services. If those requirements are high, you might want to work closely with system engineers, data modelling experts, rule experts, legal experts, etc. Only then might you be able to deploy stuff and generate the value the field of datascience promises us.
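
To make this concrete, here is a minimal sketch - my own illustration, not a standard - of what ‘establishing the quality of services beforehand’ could look like if you write it down explicitly. Every field name is an assumption of mine; extend the list with whatever your risk-based impact assessment demands.

```python
# A minimal sketch of making quality-of-service requirements explicit before
# an analytics project starts. The field names are illustrative, not a standard.
from dataclasses import dataclass

@dataclass
class QualityOfService:
    throughput_per_day: int    # how many transactions must be handled
    realtime: bool             # is direct feedback needed?
    provenance_required: bool  # must we trace where data came from?
    subject_based: bool        # personal data involved, so privacy concerns
    automated_decision: bool   # legal constraints on automated decisions
    decision_metadata: bool    # log which model/data produced each decision
    model_versioned: bool      # can we reproduce the model later?
    dataset_archived: bool     # can we reproduce the analysis later?

# Use case (a): a risk model at scale, evaluated in real time.
qos_risk_model = QualityOfService(500_000, True, True, True, True, True, True, True)

# Use case (b): a one-off analysis feeding a single policy decision.
qos_one_off = QualityOfService(0, False, False, False, False, False, True, True)
```

Even a crude checklist like this forces the conversation about deployment patterns before the first line of model code is written.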

 

Separation of concerns

For those who do not know what ‘separation of concerns’ means, start with Wikipedia or google Edsger Dijkstra, one of the greatest (Dutch) computer scientists…..

Anything IT related suffers from the ‘overloading of concerns’ issue. Some examples:

  • XBRL is a great standard, but suffers from overloading; integrity, validation, structure, meaning and presentation concerns are all bundled into one technical exchange format.
  • Datavault is a great technical modeling paradigm, but it does not capture logical, linguistic or semantic concerns - and yet the data modelling community keeps trying.
  • Archimate is a great modeling notation in the Enterprise Architecture arena, so why is it overloaded with process concerns? BPMN is such a better choice.

And of course we see it in code, and we have seen it for ages in all programming languages: the human tendency to solve all challenges/problems with the tool one is dominantly familiar/trained with. Datascience is no different. Failing to separate concerns lies at the root of many software-related problems: maintainability, transparency, changeability, performance, scalability and many many more.

Take the example I started this blog with:

A brilliant Python script in which a staggering number of concerns have all been dealt with. This might not be a problem when the required quality of services is low. But when the required quality of services is high, it becomes painfully clear that ‘deploying’ this code to production is a fantasy.

Extraction, validation, cleansing and integration concerns might be better dealt with by making use of tools and techniques in the information (modeling) arena.

Business rules might be better off designed (for example) by means of RuleSpeak, making them more transparent for legal people and domain experts (which is, btw - especially in AI - a huge concern!).

Visualization and presentation might be better off using tools your organization has already purchased, be it Tableau, SAS Visual Analytics, Qlik, Tibco or whatever.
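
To illustrate - a sketch of my own, not the script from the anecdote - this is roughly what the same pipeline looks like once the concerns are pulled apart. Each stage has one responsibility and a narrow interface, so a stage can later be replaced by a dedicated tool (an ETL suite, a rule engine, a BI tool) without touching the rest:

```python
# A self-contained toy pipeline; the data, rules and model are stand-ins.

def extract() -> list[dict]:
    # Extraction concern: in reality, read from files/APIs/databases.
    return [{"id": 1, "amount": 250.0}, {"id": 2, "amount": -10.0}]

def validate(records: list[dict]) -> list[dict]:
    # Validation concern: enforce the logical model (here: amount >= 0).
    return [r for r in records if r["amount"] >= 0]

def apply_rules(records: list[dict]) -> list[dict]:
    # Business-rule concern: ideally generated from declarative rules
    # (e.g. RuleSpeak) so legal people and domain experts can read them.
    return [dict(r, review=r["amount"] > 100.0) for r in records]

def score(records: list[dict]) -> list[dict]:
    # Model concern: a stand-in for a separately versioned predictive model.
    return [dict(r, risk=min(1.0, r["amount"] / 1000.0)) for r in records]

def publish(records: list[dict]) -> None:
    # Presentation concern: hand results over to a proper visualization tool.
    for r in records:
        print(r)

if __name__ == "__main__":
    publish(score(apply_rules(validate(extract()))))
```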

 

Finally

Dear datascientist, stop blaming IT, architects or whoever else for your brilliant code not being deployed to production. Instead, reflect on your own role and your own part in the total supply chain of stuff that needs to happen to actually get things working in production at the quality of services that is required.

Dear organization, stop blaming datascientists for not delivering the value that was promised. Start organizing, consciously, the operationalization of datascience. It is not a walk in the park; it is an assignment that requires an extremely broad skillset, a holistic view, cooperation and of course attention to human behavior/nature.

And the majority of these challenges fall to management!

Starting a datascience team or department without organizing the operationalisation is a waste of resources. 

Operationalization of datascience IS NOT a technical problem!

For my dataquadrant fans; it is all about transitioning from quadrant IV to II.


Musings about Gartner Data & Analytics in London

I spent the last three days at the Gartner Data & Analytics conference in London.

It has been a mixed experience. In terms of data strategy, organization and culture I can honestly say that some sessions were really inspiring. I caught myself re-thinking my own convictions and extending my scope to areas that matter as well, and that is always a good thing. I liked the keynote with the four central themes: Diversity, Trust, Complexity and (data) Literacy. These themes abstract - as they should - from technology and explicitly introduce the human factor and aspects of uncertainty. In data, I think Gartner kinda nailed it with these themes. I recognized them all in my daily work as a Data Architect.

The second keynote, by David Rowan - founding editor-in-chief of WIRED - was simply brilliant and scary at the same time. His central theme was that we stand at the tipping point of finally having the ethical discussion. What excellent timing, considering the Facebook privacy scandal.

I also enjoyed the sessions of Alan Duncan, about the Chief Data Officer, and of Nick Heudecker. I have followed the latter analyst for some time now, and he certainly didn’t let me down: excellent, down-to-earth sessions on blockchain and datalakes.

What I also liked were the datascience platform vendors: really cool stuff that lets you leverage open source interpreted languages (R, Python), solves a great deal of governance pain (e.g. versioning, deployment, collaboration, protection, etc.), gives you options as to where the processing is done, abstracts (drag-and-drop interfaces) - if you desire - from the code, etc.

Gartner is also very aware of the operationalization issue with datascience. In their terms, going from mode 2 to mode 1. In our terms, going from quadrant III/IV to quadrant I/II (see figure). In my opinion this is the holy grail in data: leveraging datascience brilliance for the whole organization and serving the bottom line.

To be honest, this holy grail has not been attained, not by Gartner and not by me. On Twitter some excellent discussions unfolded in this regard.

But now the not-so-good stuff. In terms of data architecture, Gartner is stuck. They define data very narrowly, as data-for-analytics. And therein lies a fundamental problem. We have to move on and view data holistically, from conception to retention, where data is used for many use cases - not only analytics - and is subject to many constraints (like privacy, transparency, etc.). Furthermore, lots (the majority) of the problems in data (meaning, (time) consistency, validation, integration, transparency, etc.) are created upstream, but Gartner seems to have given up and decided that we need to deal with them downstream. A first example is their logical datawarehouse concept. Now, I have to be honest, I do not like this concept at all. I think it is deeply flawed. Why?

First, think of this logical datawarehouse as a beautiful patio: green grass, a nice swimming pool, birds singing. Underground, however, right beneath the surface, the ground is highly contaminated. Would you want your children to play on this patio?

Second, Gartner seems to be stuck in metanarratives like the datalake, the datawarehouse and even the datahub. The latter is completely vague to me; it sounds like a datalake-warehouse-wannabe. These metanarratives are still very much technology-bound and ambiguously defined, and calling them logical is just a façade. Let’s abstract from these technically bound concepts and talk about an organisational capability to systematically ingest, create, connect, validate, integrate and disseminate data, taking into account contextual aspects like transparency, protection, consistency, etc.

Third, Gartner seems to be oblivious of vertical data architecture. This is the architecture of data at rest, as opposed to horizontal data architecture, where data is in flow. Gartner seems hugely biased towards flow/logistics: getting data from a to b. A vertical data architecture would start (depending on the concerns you want to address) with natural language, translating it into ontologies, fact-based models and finally truly logical models that drive the technical models (often automated). Such an architecture would persist data only once, virtualize the logical domain model and result in logically consistent data - not a logical data warehouse!
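
As a tiny illustration of ‘logical models drive the technical models (often automated)’, here is a sketch with a toy model format and type mapping of my own invention; real fact-based modelling toolchains are far richer:

```python
# The logical model is the single declarative source; the technical (SQL)
# model is generated from it. Both the model format and the type mapping
# below are illustrative assumptions.
LOGICAL_MODEL = {
    "Customer": {"customer_id": "identifier", "name": "text", "birth_date": "date"},
    "Order": {"order_id": "identifier", "customer_id": "identifier", "amount": "money"},
}

SQL_TYPES = {"identifier": "BIGINT", "text": "VARCHAR(200)",
             "date": "DATE", "money": "NUMERIC(12,2)"}

def to_ddl(model: dict) -> str:
    """Derive the technical model (DDL) from the logical model."""
    statements = []
    for entity, attributes in model.items():
        cols = ",\n  ".join(f"{name} {SQL_TYPES[kind]}"
                            for name, kind in attributes.items())
        statements.append(f"CREATE TABLE {entity.lower()} (\n  {cols}\n);")
    return "\n\n".join(statements)

print(to_ddl(LOGICAL_MODEL))
```

Note the shared concept: Customer appears in both entities through customer_id, a future integration point.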

Fourth, the logical datawarehouse concept seems to be very much associated with datavirtualisation technology, which makes it a lot less logical.

A second after-the-fact solution Gartner seems to be very fond of is Master Data Management (MDM). In my view MDM is a legacy problem that stems from a lack of holistic (vertical) data architecture. A solid vertical data architecture solves many masterdata problems at their core and prevents the constant remediation of symptoms.

A third after-the-fact solution is the infatuation with the availability of data. Let’s get all the data!! Why? Because we (technically) can…. There seems to be no conscious trade-off between availability and consistency, two concerns that lie at the heart of data and where the business needs to weigh in, not the technology. Furthermore, with GDPR coming, it deeply violates the data minimisation principle.

Getting to the root of things: it is like modern medicine. We keep on curing the symptoms, with nasty side effects, but we never engage in prevention. It is vital, in my opinion, that a holistic data strategy and architecture is adopted! It is like Tuesday’s brilliant closing keynote on sleep deprivation by Professor Matt Walker. If you sleep enough you will age healthier; the scientific research is overwhelming. If you don’t, your chances of getting sick - cancer, heart disease, psychological disorders - increase fast! We are now hooked on hugely expensive medicines that cure or alleviate only the symptoms…..

Why not try to sleep a bit longer…..


Data architecture in a digital world; empowering the data driven enterprise

Several months ago, Daan Rijsenbrij approached me (Ronald Damhof) and Martijn Evers to assist him in researching data architecture as it manifests itself in organisations. 

Daan: "In the Netherlands an international research is performed about data architecture. The purpose of this research is to investigate the maturity in the thinking about & working with data in modern enterprises. You can download a survey from drop box: https://lnkd.in/ghVCqSx The survey consists of a short intro, a scoring list and a list with 12 open questions. Answer only those items that are relevant according to you in your situation. Please return the filled in survey to Daan as an attachment to an email: daan@rijsenbrij.eu." 

The survey was written by me and Martijn and reviewed by a dozen or so fellow data architects. The intro of the survey is copied 1:1 in the next section. We would like to urge architects affiliated with data architecture to complete the survey, return it to Daan and help us advance data architecture.

Architecture

Nowadays, one frequently hears senior executives, management consultants or strategists proclaim the phrase: “Data is an asset.” While not necessarily incorrect, it is usually a hollow phrase because it is often misunderstood and seldom operationalized. With data comes the need and responsibility to manage it in a dedicated and professional manner, both from a liability point-of-view as well as to create the necessary conditions to truly leverage its potential to add value.

Data is not an asset like financial capital, which can be spent. Nor is it like human capital, which walks out of the door when it sees a better opportunity. Data assets are different: they are uniquely yours, closely tied to your business language and processes, full of nuance, always defined in a specific context, and they provide the ability to generate new data (assets).

Data is also unique in that it depreciates in a nonphysical and often undetectable way, losing its meaning, accuracy and/or relevance as time goes by. Every organization can use its data in its own way to differentiate itself. When data is consumed in your organization, it will not be depleted nor will it expire (like a patent). No other type of asset has these particular characteristics.

Data defines other assets. For example, it reveals your financial state and holds a reliable record of your employees’ or customers’ behavior. Arguably, data is the ultimate proprietary asset. And unlike technology, it cannot be commoditized. Data uniquely defines the state and meaning of an organization and its intrinsic value, which cannot be transferred to another organisation.

While all other assets are managed consciously by entire departments with ample resources to do so, data is often considered a by-product of information systems, something often perceived as technological. Or as Frank Buytendijk of Gartner put it: “Most companies manage their parking-lot better than their data.”

In all the hype and buzz surrounding #BigData #InternetOfThings #Datascience #MachineLearning #Digitization and the like, technology is perceived as a primary differentiator. Well, it's not.

While technology is essential for any business, it is usually of lesser importance in terms of competitive advantage. Technological innovations may give a company a competitive advantage, but the effect is only temporary. Sooner or later, innovations will be copied by competitors. Technology tends to become commoditized over time.

Ignoring the hype that surrounds new data-related technology, the common (success) factor is always data that is relevant, reliable, consistent and timely. Investments that neglect data quality requirements are doomed to fail.

The technological evolution continues to accelerate, yet ultimately it’s not about the technology. “IT does not matter”. What does matter is architecture. We need to understand the data, how it’s used, and to build a supporting data infrastructure. This requires fundamental thinking, not buying. It requires a holistic approach, not a siloed approach. You cannot buy your way out of the data misery you are in - it takes blood, sweat and tears. Or in other words: stamina, discipline, trust and courage, especially by senior management.

How well an organization is able and willing to incorporate this approach ultimately determines how well it can leverage its data assets for the benefit of its customers and operational excellence. It is this approach that we need to focus our energy on, instead of indiscriminately following the latest fad, promise or glossy brochure, with an unfounded belief that this new technology will solve our data issues.

There is no quick fix!

It is imperative to realize that data is a foundational element that drives and leverages innovation, business transformations, acquisitions and other organization-critical events; data architecture is the guardian of this foundation.

Curing method-illness in Enterprise Architecture

Years ago my son came home with some geography homework. He had to learn the countries and capitals of the European continent. While I was practicing with him I encountered the country Yugoslavia... Now, this was 2010 I am talking about; Yugoslavia had split up into (among others) Serbia and Montenegro. I honestly thought this was a mistake (old material used by the teacher), so I went to the teacher and asked her whether we could get an update of the material. This was her response:

"The method has not been updated yet"

This is – in my opinion – a small symptom of what is very wrong with education (which is completely saturated with this ‘method-illness’), with society as a whole and – more particularly – with Enterprise Architecture. History tells us that such beliefs - following a method blindly, like a religion, and demanding that others do too - will not end well. Methods used in this way kill critical thinking, creativity and innovation.

Methods, best practices, checklists and other attempts to mask (especially to management and society) the inherent uncertainty of the work/world we’re in are extremely damaging to people, organisations and society.

Only ‘systems’ (using a very broad definition of ‘system’ here) that are simple, linear, predictable and have a generic context might benefit from this method-approach. In my world of data-enterprise-IT-Business-Blabla-architecture, such systems simply do not exist.

The moment a system gets complicated, complex or even chaotic*, it all breaks down, and it becomes dangerous on so many levels when design participants of these systems still act as if it were a simple, linear, method-based system. The inherent complexity and uncertainty of these systems require a deep dive into the context (real world) surrounding the system. It requires architects to experiment, try something, rinse and repeat, to be a full member of the operationalisation (!) and to hang in there (!), learning, discussing. And yeah, for many architects this is scary….

I understand the temptation: the phrase ‘we use TOGAF’ or ‘we have written a PSA1’ sends a vibe of trust and safety (‘hey, I followed the method, not my fault’) and is highly rewarded by senior management. What is not rewarded is stating the uncertainty (‘What do you mean, you don’t know? That’s why I hired you’). Make senior management part of the uncertainty; figure out together how to approach it, how to design a learning-by-doing mentality as well as a mutual respect for one another and for emerging insights (!).

We ‘the architects’ should stand firm in stating the message ‘this is complicated, we can do it, but I am not sure how’.

How do we need to change? We need to go back to the fundamentals, steer away from the various technical hypes and distance ourselves from architectural methods. We should separate concerns fiercely, isolate, abstract, collaborate with many experts in the field, communicate and, above all, honour the context of the domain we are working in and how it affects the real world. Remember, there is no ‘one architecture’ in complex systems that you can design in advance. And if you tell that fairy-tale, you are deluding yourself, your colleagues, your organisation and, ultimately, society.

Back to my opening story: if teachers are not evolving into educators truly interested in the kids they need to educate, and if they keep relying on methods to educate our kids, I say… let’s automate the teachers; long live the MOOCs and YouTube. And the sad thing is, this teaching in methods is educating future architects to do the same; this method-illness is deeply rooted in our society. Instead, teach kids how to think critically, to think outside the box using fundamental skills. So, throw away your methods, burn them. Educating kids is about connecting with kids, parents and their concerns; it is a unique and proud profession, something you need to learn and train hard for; it is a valuable, hard-to-learn and respected skill.

I leave it to the reader to convert this analogy to Enterprise Architecture and the state we are in.

*referring to the Cynefin framework of Dave Snowden

1 Project Start Architecture

 

The Yin & Yang of data: ‘Science of data’ & Datascience


People familiar with my thinking know that I am a bit of a 'fundamentalist' when it comes to 'data'. I am the guy that pushes the non-sexy part of data; data quality, data governance, metadata, data protection, data integration, semantics, rules, etc..

It is hard to stand your ground in a time when short-termism, technology-fetishism and data-populism are thriving. I see ‘data architectures’ in my industry that boil down to superdooper databases, ultrafast massively parallel hardware and of course huge amounts of software that seem to glorify ‘coding’ your way to the promised kingdom.

Call me old school, but I want (data) architectures to separate concerns on various levels (conceptual, logical and technical), dimensions (process, data, interaction) and aspects (law & regulation, people, organisation, governance, culture). Architectures should enable businesses to reach certain goals that (preferably) serve the customer, civilian, patient, student, etc..

Lately I have been studying the ‘datascience’ community, attempting to comprehend how they think, act & serve the common goals of an organisation. I have abandoned (just a bit :-)) my declarative nature of data modelling, semantics, dataquality and governance, and I have drowned myself in Coursera courses, learning to code in Python, Julia and R. I dusted off the old statistics books from uni and installed Anaconda, Jupyter Notebook, RStudio, Git, etc.

And oh my, it is soooo cool. Give me some data, I don’t care what, where it comes from and what it exactly means, but I can do something cool with it. Promise!

Now my problem…

  • (1) It seems to me that the ‘science’ in ‘datascience’ is on average extremely low to non-existent. Example: I have heard of ‘datascience’ labs/environments where the code is not versioned at all & the data is not temporally frozen; ergo, reproducibility is next to zero (see the sketch after this list). Discovering a relationship between variables does not make it a proven fact; more is needed. Datascience is not equal to data-analysis with R (or whatever), is it?
  • (2) There seems to be a huge trust in the relevance and quality of data, wherever it comes from, whatever its context and however it is tortured. Information sits at the fabric of our universe1, it’s life, it’s the real world. Data is the ‘retarded little brother’ of this ‘information’: an attempt of humankind to capture information in a very poor way. Huge amounts of context are lost in this capturing. Attempting to retrofit ‘information’ from this ‘retarded brother’ called ‘data’ is dangerous and should be done with great care. Having these conversations with data scientists is hard, and we often seem to completely disconnect.
  • (3) There seems to be little business focus, bottom-line focus. Datascientists love to ‘play’; they call it experimenting or innovating. I call it ‘play’ (if you are lucky they call their environment a ‘sandbox’, wtf?). Playing on company resources should be killed. Experiments (or innovations) start with a hypothesis, something you want to prove or investigate. You can fail, you can succeed, but you serve the bottom line (and yes, failing serves the bottom line too!) and the purpose/mission of the organisation. Datascientists seem to think they are done when they have made some fascinating machine-learning, predictive or whatever model in their sandbox or other lab-kind-of environment. Getting that model deployed at scale in a production environment, for ‘everyone’ to use, affecting the real world… that is where the bottom-line value really shines; you are not done until this is achieved.
  • (4) There seems to be little regard for data protection. The new GDPR (General Data Protection Regulation) is highly relevant for datascience too. Your ‘sandbox’ or your separate research environment needs to be compliant as well! The penalties for non-compliance are huge.
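
The sketch below - a minimal illustration of my own, nothing more - shows the reproducibility basics asked for in point (1): pin the exact code version and freeze the dataset by recording a content hash, so an analysis can be re-run and verified later.

```python
# Record what was run on what: the git commit of the code and a SHA-256
# fingerprint of the input data, written to a small manifest file.
import hashlib
import json
import subprocess
from datetime import datetime, timezone

def dataset_fingerprint(path: str) -> str:
    """Content hash of the frozen input file; changes if the data changes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def code_version() -> str:
    """The exact git commit the analysis was run with."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

def record_run(data_path: str, manifest_path: str = "run_manifest.json") -> None:
    manifest = {
        "ran_at": datetime.now(timezone.utc).isoformat(),
        "code_commit": code_version(),
        "dataset_sha256": dataset_fingerprint(data_path),
    }
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)

# record_run("inputs/some_dataset.csv")  # hypothetical path
```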

There is huge value in datascience; its potential is staggering and it is so much fun. But please, stop fooling around. This is serious business, with serious consequences and opportunities for everyone and for probably every domain you can think of, whether it be automotive, banking, healthcare, poverty, climate control, energy, education, etc…

The ‘science of data’ and ‘datascience’ are the yin & yang of fulfilling the promise of being truly datadriven. Both are needed.

For my Data Quadrant Model followers: it is the marriage between quadrants I & II on the one hand and quadrants III & IV on the other.

1 Increasingly, there is ‘evidence’ originating from theoretical physics that this statement holds some truth, link [Dutch]. I would also like to draw attention to this blogpost by John O'Gorman; very insightful, putting DIKW to rest....


Data: consistency vs. availability

There is a fundamental choice to be made when data is to be 'processed':

  • a choice between consistency vs. availability
    or
  • a choice between work upstream vs. work downstream
    or
  • a choice between a sustainable (long term) view vs. an opportunistic (short term) view on data

Cryptic, I know. Let me explain myself a bit.

Let me take you on a short journey. Suppose we receive a dataset from <XXX>. In the agreement with <XXX> we state the structure, semantics and rules regarding the dataset. This might be shaped as a logical data model or - for communication’s sake - it might just be natural language; I don’t care, as long as the level of ambiguity is low. On the spectrum of consistency vs. availability, I choose a position skewed towards consistency.

If I choose consistency, I need to validate the data before it is processed, right? We have an agreement, and I am honoring this agreement by validating whether the 'goods' are delivered as agreed. In data, I like to validate the data against the logical data model. So, when the data violates the logical model, the data does not adhere to the agreement, agreed?

"But, but, but...we need the data....",  data scientists, BI developers, dashboard magicians are becoming worried. Oh my, now we don't have data. 

What are the options? Simple, you give feedback to <XXX> that they need to solve this issue fast, it's a violation of the agreement. 

"But, but, but...we can't ask them that, they never change it, we gotta deal with the data as we receive it". 

Ah, so you want to process the data despite the fact that it does not adhere to the logical model? So what you are saying is that you want to slide to the right on the spectrum of consistency vs. availability? Fine, no problem: we will 'weaken' the logical model a bit, so the data survives validation, will be processed and will be made available.

But what happened here is crucial for any data engineer to fully comprehend. We chose to weaken the logical model, a model that correctly reflects the business. We chose to accept something that is broken. The burden of this acceptance shifts downstream, towards the users of the data. They need to cope with the problem now. What they cannot say is 'oh crap, data dudes, you gotta fix this'; it is their problem now!

There is another aspect at play here, which might well be the most important one: data integration. The logical model ideally stems from a conceptual model (either formal or informal) that states the major concepts of the business - in other words, the future integration points. Logical models re-use these concepts to assure proper integration of any data coming in. Suppose I have a dataset coming in where the data is so bad that the only unique key we can identify is the row number. We basically get a 'data lake' strategy: the logical model is one entity (maybe two) and the data is loaded as it was received. We are way down the spectrum of consistency vs. availability. You probably guessed it: data integration (if at all possible) is pushed downstream.

Pushing the slider towards consistency offers great value to the business, since we can relieve (e.g.) the data scientists of the burden of endlessly prepping, mangling and cleaning data before they can actually do the work they were hired to do. But we also have to acknowledge that sometimes (often) we do not have a choice and are forced to slide towards availability. That is not a bad thing! Be conscious about it, though: you are introducing 'data-debt' and somebody is paying the price. My advice: communicate that 'price' as clearly as you can....

And if you are forced into the realm of availability, set up your data governance accordingly. Are you able to set up policies in such a way that the slider - in time - gradually moves towards consistency? A great option is to keep the agreed-upon logical model (highly skewed towards consistency) but agree on a weakened version (so you can process the data and make it available), while reporting back to <XXX> on the validation errors against the original logical model.
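
A minimal sketch of this governance option (my own illustration, with toy rules): records are validated against the weakened model so they can be processed and made available, while violations of the original, strict model are collected and reported back to the supplier.

```python
# Toy example: the strict model reflects the agreement; the weakened model is
# the bare minimum needed to process the data at all.
def strict_violations(record: dict) -> list[str]:
    """The agreed-upon logical model: what the data *should* look like."""
    problems = []
    if not record.get("customer_id"):
        problems.append("customer_id is mandatory")
    if not 0 <= record.get("wheel_count", -1) <= 4:
        problems.append("a car has at most four wheels")
    return problems

def weak_violations(record: dict) -> list[str]:
    """The weakened model: accept more, so the data can flow."""
    return [] if record.get("customer_id") else ["customer_id is mandatory"]

def process(records: list[dict]) -> tuple[list[dict], list[str]]:
    accepted, report = [], []
    for i, record in enumerate(records):
        for problem in strict_violations(record):
            report.append(f"row {i}: {problem}")   # fed back to <XXX>
        if not weak_violations(record):
            accepted.append(record)                # made available downstream
    return accepted, report

loaded, feedback_for_supplier = process([
    {"customer_id": "C1", "wheel_count": 4},
    {"customer_id": "C2", "wheel_count": 5},   # violates the strict model
])
print(feedback_for_supplier)
```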

Final thought: let's be honest, we tend to go for the easy way out - process all we receive and deal with it later. Our product owners and management are happy: yeah, we got data. But then reality kicks in: other sources have to be combined, and the data is very hard to use because of its inconsistencies ("we seem to own cars with five wheels, is that correct?"). So let's buy some data-mangling tools and let each data scientist code his or her way out of the problem, increasing the data-debt even more. My suggestion: make a conscious choice between consistency and availability - all the work done upstream in the name of consistency will, in the end, pay high returns. Resist the opportunistic urges....;-)

This post tries to describe the very subtle orchestration that goes on between data architecture, data governance, data processing and data quality.

 


[Dutch] Data: van Innovatie naar Waarde

For those who do not have a subscription to Automatiseringsgids, here is an opportunity to read my article anyway.

The article describes what it takes to turn innovations in the area of data science/data discovery into repeatable, scalable, robust and secure solutions.

Here is the link.

 


Trust, Safety & Management Accounting; Innovating is hard

It is a nasty cocktail: making a business case upfront and executing it 'change-driven'.

Why? Because the majority of business cases are grounded in a kind of 'plan-driven, we-can-mold-the-world-as-we-see-it' thinking. A business case is unfortunately often used - let's be frank, guys - as a management fetish: a stick that can be used when things go differently and somebody needs to be blamed.

I work in environments where the IT component is often pretty big, and if we have learned one thing, it's that execution of anything should be change-driven. Why? Because we know one thing for sure: we know very little when we start, and we know nothing of what the future will bring.

Development of software is often situated in the complex domain, and when something completely new needs to be done it can even be situated in the chaotic domain (see figure). In these domains we do not have BEST PRACTICES! Yes, we might have good practices, maybe emergent practices (stuff we need to discover) or even novel practices (we need to experiment...). Context is leading; no cookbooks, methods, frameworks, blueprints or protocols.

And still we write our business plan: "If we do <x> it will cost <y> and the benefits will be <z>" 

And shit happens:

1) The business case is followed to the letter and something is created that is totally useless. We have seen this happen over and over again with large ICT developments in government. 'But we executed the business plan.' And now the weird part: senior management is satisfied.

2) We go 'change-driven': discover, experiment, learn, improve and eventually end up with something successful - but unfortunately something different from what was stated in the original business plan. And now the weird part: senior management is dissatisfied.

What is happening here? Two things.

1. TRUST/SAFETY is missing:
a) TRUST of senior management in its executioners, and a genuine interest in their work
b) Executioners who need to feel SAFE giving feedback to senior management

2. Management Accounting sucks
There is another dynamic going on, rooted in the school of management accounting and in the DNA of many executives. Suppose I am the manager of business unit X and I build a highly innovative solution for a specific purpose. This innovative solution turns out to be efficient and effective for business units Y and Z as well, so they decide to use it. Hell, you even adapt your solution here and there to match their requirements.

What happens in old-school management accounting? I can hear their creepy bureaucratic voices: "your solution was way over budget and you did not realize your benefits". You are dumbfounded...

NASA went to the moon and in doing so they revolutionised healthcare and saved countless lives. I'll bet ya, old school management accounting would regard NASA's trip to the moon as a failure.....

Old-school management accounting is much too prominent and its decisive power is far too high. I was at the Drucker conference in Vienna last year and heard Clayton Christensen say that 'the finance role in the average Board of Directors should be downgraded in order for innovation to thrive'.

I get him now...


It’s not about winning

My two sons have been playing badminton for four years now. They are fourteen and eleven. I vividly remember the first year: boldly, they entered the championships. They got beaten, by big numbers, very big numbers. No chance whatsoever.


There were two possible reactions: 1) hmm... I want to try soccer, or 2) I want to train more.

They chose the second one and began training a lot more. They entered the badminton school, where they got in contact with like-minded kids and passionate, skilled trainers.

So, the next year, another championship. The nice thing about youth championships is that you keep on battling the same guys every year - they are in the same age class. So we had a baseline from the first year. The second year, same guys, and they still got beaten. But the numbers were not that high.

And you know what? My sons noticed that..... "I did not win, but I am closing in, I am improving myself! Let's keep on training." My sons (14 and 11) are both playing in the highest youth league (Under-19) now, and the coming championships will be thrilling. The rate of improvement is staggering. And you know what my eldest son said to me last week when we were driving with the team to our next game in the competition?

"I so hope I get to play a good guy and we can make it three sets (the whole nine yards..), I want to be in the field as long as I can and play my best. I don't care what the result is."

And you know what? I believed him, he was not just saying that to cover his ass. He lost his single (21-19, 19-21, 19-21) and he was smiling.....

It was a life lesson he gave me at that very moment. It is not about winning; it is about improving oneself. I am still privileged that both of my sons allow me to coach them, and I believe wholeheartedly that a coach should not aim for the kill/win; he should focus on progress and on growing self-awareness. That is something my son has taught me.

My sons (and I) learned something that I think is a bit lost in society: it is not about winning, profit or maximizing shareholder value (!). Those are consequences (very nice consequences, btw). Just play the game, do your job, give it your best, try to attain an ever-deeper understanding of the field you work in. The result? That is just a consequence.

 


(Enterprise) Data Architecture 101

I loathe the misuse of the term 'Data Architecture' by most software/consultancy/technology firms.
I loathe the misuse of the term 'Data Architecture' by the average Enterprise Architect.

This blogpost attempts to clarify the term 'Data Architecture'. And I do not want to complicate things, so let's just quote DAMA and its Body of Knowledge, which extends the term to Data Architecture Management:

Data Architecture Management is the process of defining and maintaining specifications that:

    • Provide a standard common business vocabulary,
    • Express strategic data requirements,
    • Outline high level integrated designs to meet these requirements, and
    • Align with enterprise strategy and related business architecture.

      Data architecture is an integrated set of specification artifacts used to define data requirements, guide integration and control of data assets, and align data investments with business strategy. It is also an integrated collection of master blueprints at different levels of abstraction

The DAMA Body of Knowledge also defines Enterprise Data Architecture. Aye, dear Enterprise Architect, do you have an Enterprise Data Architecture?

Enterprise data architecture is an integrated set of specifications and documents. It includes three major categories of specifications:

    1. The enterprise data model: The heart and soul of enterprise data architecture,
    2. The information value chain analysis: Aligns data with business processes and other enterprise architecture components, and
    3. Related data delivery architecture: Including database architecture, data integration architecture, data warehousing / business intelligence architecture, document content architecture, and meta-data architecture.

      Enterprise data architecture is really a misnomer. It is about more than just data; it is also about terminology. Enterprise data architecture defines standard terms for the things that are important to the organization–things important enough to the business that data about these things is necessary to run the business. These things are business entities. Perhaps the most important and beneficial aspect of enterprise data architecture is establishing a common business vocabulary of business entities and the data attributes (characteristics) that matter about these entities. Enterprise data architecture defines the semantics of an enterprise. 

I would like to point out two additions to these descriptions: 1) business rules management, although that comes with the territory if you model information (but loads of people do not get that); 2) (Enterprise) Data Architecture needs to be clearly aligned with data management and data governance. Getting them aligned results in relevant (& acceptable) data quality.

Please note that my definition of quality is: quality is value to some person. This definition deviates from the general (ISO) definition. If you want to know more, please read this blogpost I wrote a year ago.


Please note that my little 101-data-triangle deviates from the DAMA management functions. I think that the quality of data should play the lead part; all the others (management, architecture, governance) are supportive.

 
And to those that think that in a world of increasing datafication, Enterprise Data Architecture is not a part of Enterprise Architecture:

  • Your organisation will miss out on opportunities that will never be discovered;
  • Your organisation will have a hard time leveraging exciting innovative technology in the data-space;
  • Your organisation will see an ever-increasing technical debt;
  • Your organisation will suffer operational risks and privacy violations;
  • Your organisation will have increasing costs associated with getting data cleansed and integrated downstream (ETL, data muddling, master data management blabla suites, business glossaries that are not aligned with actual data processes, data quality, loads of crappy data warehouses, etc..).

And to those that think that Enterprise Data Architecture is not part of Enterprise Architecture: brush up on your TOGAF, Zachman or whatever you're using, because you're lagging.

My final message to those who think that Enterprise Data Architecture is not part of Enterprise Architecture: you're nuts.

 


 

Data: have we given up?

“As long as we persevere and endure, we can get anything we want” – Mike Tyson


Human beings are terrible at seeing the bigger picture, strategizing, taking responsibility for it and overcoming huge amounts of opposition. Add to that a management with an innate propensity towards short-termism, risk aversion and lack of trust, and you have yourself a rather explosive cocktail that results in chaotic organisations.

 Chaotic organisations will have chaotic data…

The quality of the data in organisations is terrible. Let's not beat around the bush, it is. Michael Stonebraker, winner of the A.M. Turing Award in 2015, confirmed it. According to Stonebraker, the average organisation owns 5,000 silos of data; bigger organisations are estimated to have 10,000. It boggles the mind why organisations keep on persisting in cleaning this data downstream. Stonebraker – founder of Tamr – wonders why you don't clean your data before it enters your downstream systems. If you don't, he continues: “systems like Tamr will consume all your profits”.

 I love this guy. Blatantly honest. But the fact is that the visualisation and “data muddling” technology is thriving. And I get that…

Cleaning up the garbage is ‘easier’ when the actual garbage can be seen. You can hire cleaners, buy machines/software to do the cleaning, etc. It falls perfectly in line with management's innate propensity towards short-termism, risk aversion and lack of trust.

Prevention of (data) garbage requires a long-term view, vision, strategy, trust (!) and perseverance. It requires an informational perspective on your organisation. A perspective where data (and process) – the raw ingredients – are at the center of the information strategy. Applications, functions and technology are derived from it, not vice versa. Organisations keep on buying or building applications/technology where the data is a by-product, and subsequently keep investing in downstream tools and people to clean the garbage. Crazy.
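
To make 'preventing the garbage' a little more concrete: a minimal sketch (Python; the record layout and the rules are invented for illustration, this is not Tamr or any specific product) of a validation gate that keeps garbage out of the downstream systems instead of paying to mop it up later:

```python
from datetime import date

# Hypothetical ingestion gate: validate records *before* they reach
# downstream systems, rather than cleansing them afterwards.
REQUIRED_FIELDS = {"customer_id", "order_date", "amount"}

def validate(record: dict) -> list[str]:
    """Return a list of violations; an empty list means the record may pass."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    if "amount" in record and not isinstance(record["amount"], (int, float)):
        errors.append("amount must be numeric")
    if "order_date" in record and not isinstance(record["order_date"], date):
        errors.append("order_date must be a date")
    return errors

def ingest(records, accept, quarantine):
    for record in records:
        (accept if not validate(record) else quarantine)(record)

good, bad = [], []
ingest([{"customer_id": 1, "order_date": date(2015, 5, 15), "amount": 100.0},
        {"customer_id": 2}],
       accept=good.append, quarantine=bad.append)
# 'good' holds the clean record; 'bad' holds the garbage, kept upstream.
```

Trivial, yes, but it is exactly this kind of check, placed at the point of entry, that a long-term informational perspective pays for.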

 It is what I call ‘high heels architecture’ – It looks sexy, but walking hurts like hell.

We need to look at data with a holistic perspective – across the divisions and often even across organisations. We need to conceptualize the information, logically model the data and choose the appropriate technology.

Very old school? Hell yes!! We need to educate our young people again in the craft of information: the ability to think in (logical) abstractions and separation of concerns. It is a craft! Nowadays, our education systems are hugely biased towards programming languages, applications, technology, etc.

Finally, the biggest problem, in my view, is how we run organisations and how we educate our managers. Appreciate bad news from your employees, trust the craftsmanship, take responsibility for the path chosen, trust teams to organize themselves, practice radical transparency (!), etc.

Back on the subject. Many organisations seem to have given up on data. In a future where datafication will only increase, huge amounts of their profits will indeed be spent on getting the data right. Risks in terms of operational failures, but also privacy breaches, will increase ever more. Ultimately, the chances of delighting your customers, patients, tax-payers or students will diminish drastically.

Let's not give up on data…

 


 

Make data management a live issue for discussion in your organization


Last month an interview with me was published in the SAS Future Bright Magazine. The quarterly theme was 'A Data Driven Reality' [Link to Dutch version] - right up my alley. This magazine was published in French, English and Dutch.


At the event that launched this quarterly magazine I had the honour to speak about the core of my interview: the Data Quadrant Model.

Thanks SAS Institute for having me, for publishing a great magazine and for giving data management the attention it so desperately needs. 

For those interested in the interview that attempts to explain the Data Quadrant Model in layman's terms - here are the downloads:

Download Dutch - Het Data Kwadranten Model, Interview Ronald Damhof

Download English - The Data Quadrant Model, interview Ronald Damhof

Download French - modèle de quadrant de données, Ronald Damhof

 

 


 

 

Data Quadrant Model: how to organise the data domain

The Data Quadrant Model (DQM) is a model that an organisation can use to make sense of its data domain. One of the sense-making aspects is 'how to organise', and although every organisation should translate the DQM to an organisation model that fits its context (culture, maturity, timing, people, system landscape, etc.), there are some heuristics that may help you on your way.

In general the DQM - if travelled from I to II, IV to III (see figure) - will be characterised by increasing entropy, something I have tried to explain in an earlier blogpost. For the sake of simplicity, in this blogpost the question regarding 'how to organise' is translated into the degree of centralisation versus decentralisation.

Heuristic: systems with lower entropy are more prone to centralisation, as opposed to systems with high entropy, which are prone to decentralisation.

So, quadrant I and quadrant II have the highest propensity to be organised centrally and quadrant IV and quadrant III have the highest propensity to be organised decentrally. 

Easy peasy? So we need one central quadrant I organisational entity. Let's go overboard and give it a name: a Data Service Center (DSC). The DSC comprises data and information modellers, engineers and architects that model, validate and process the data, and give access to the data to serve application development (II), BI professionals (II), data scientists (IV), etc.

But how is the DSC organised in different operating models? Let's refresh our memory a bit with the four-quadrant model developed by Ross, Weill and Robertson in their landmark book 'Enterprise Architecture as Strategy'. They identified four types of operating models: diversification, replication, coordination and unification. Yes, my dear data architect, this is stuff you need to know. This is stuff you need to be able to translate to the data domain and advise your management on the consequences (a small sketch of the resulting heuristics follows the list below).

I know....I am preaching....sorry

  1. In the Diversification quadrant there is low integration as well as low standardisation. There is no shared data and business processes in the different business units are unique. Every business unit will have its own DSC. Hell, every business unit will have its own unique implementation of a DQM. 
  2. In the Replication quadrant there are similar business processes across business units but there is no shared data. Every business unit will have its own DSC. Hell, every business unit will have its own DQM. BUT it makes sense to replicate the artifacts (for example, the data models, definitions, tools, etc..).
  3. In the Coordination quadrant, business processes in each business unit are unique but several data components (often master data) need to be shared (customers, products, etc..). I would strongly consider one central DSC (one DQM strategy/architecture). The organisational entities on the pull/demand side of the DQM I would tend to decentralise though.  
  4. In the Unification quadrant we have globally integrated business processes with the support of enterprise systems; data sharing is crucial. This is a no-brainer: a central DSC (one DQM strategy/architecture), and even the pull/demand side has a strong tendency to be fairly centralised.
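
A minimal sketch of the heuristics above as decision logic (Python, purely illustrative; the labels simply mirror the four cases in the list):

```python
# Heuristic from the list above: how the Data Service Center (DSC) and the
# pull/demand side tend to be organised per Ross/Weill/Robertson operating model.
DSC_HEURISTIC = {
    "diversification": {"dsc": "one per business unit", "pull_side": "decentral",
                        "share_artifacts": False},
    "replication":     {"dsc": "one per business unit", "pull_side": "decentral",
                        "share_artifacts": True},   # replicate models, definitions, tools
    "coordination":    {"dsc": "central",             "pull_side": "decentral",
                        "share_artifacts": True},
    "unification":     {"dsc": "central",             "pull_side": "fairly central",
                        "share_artifacts": True},
}

def advise(operating_model: str) -> dict:
    """Look up the organisational tendency for a given operating model."""
    return DSC_HEURISTIC[operating_model.lower()]

print(advise("Coordination"))  # {'dsc': 'central', 'pull_side': 'decentral', ...}
```

A lookup table, not a law of nature: the point is merely that the operating model is an input to how you organise the data domain.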

This all sounds pretty dandy, but the operating model is not the only independent variable that determines how you organise the data domain. Another independent variable is the scarcity of resources. In the data domain we need expertise that might be pretty scarce, and an organisation might be forced into centralising/sharing these skills even though the operating model is not in sync with it.

Yesterday I spoke at the SAS Insight session in the Netherlands, and I was asked several times who owns/manages the DQM as a whole. Surely the data domain needs central coordination? A good question, which of course (?!) got the ultimate consultancy answer: it depends.

It depends on the operating model and the scarcity of resources. With Coordination and Unification operating models you need one DQM, and I would suggest a high-level executive that is accountable: either a CIO, a CDO (Chief Data Officer)1 or whatever you might call him or her.

1 I wrote a blogpost on the CIO versus the CDO last year. 

 

 


 

 

The fifth quadrant: disorder

A small 'encore'... the fifth quadrant.

Where quadrant I and quadrant II are (simplistically stated) degrees of order and quadrant III and IV are degrees of un-order, there is a fifth quadrant. The quadrant of disorder.

This quadrant comes into play when one does not have a clue what to do, where they are or what decision to make. If you are not aware that you are in a disorderly state, you will fall back to your comfort level. Do the stuff you always did....

  • Data scientists (quadrant IV) need to write down their 'requirements' because somebody needs to make a data-mart....
  • Regulatory compliance reports (quadrant II) to authorities consist of non-validated, inconsistent Excel worksheets (quadrant III)....
  • Nah, data governance did not work very well with us....
  • Data scientists (quadrant IV) need to deploy changes from development, test, acceptance to production...
  • Let's hire 12 ETL developers.....
  • This data we are receiving is so bad, let's model it....
  • Data scientists (quadrant IV) are not allowed to use data from quadrant I, because it might compromise the reliability of the systems...
  • The BICC needs to improve the data quality, we can't use it...
  • We need a data lake for our management information, or (even worse) for our regulatory compliance reports....
  • We need data virtualisation (or any other <tech>)....
  • Yes, we dimensionally model everything...it always works!
  • Yes, we use data vault for all our data....it always works! ;-)

Sound familiar?

If you are in a disorderly state regarding the data domain, no problem! Form a team with people from all the different quadrants, discuss.... 

It's ok to start with a quadrant III to IV project!
It's ok to start with a quadrant I to IV project!

Remember: the Data Quadrant Model is not imposing any values. There is no right or wrong way to start; just be aware of where you start and where you are going.

 

Photo by: James Steidl

 


 

Data Quadrant Model: decreasing entropy

With regard to 'order' and the 2nd law of thermodynamics I have discussed the four quadrants of my Data Quadrant Model:

  1. The order of quadrant I
  2. The chaos of quadrant III
  3. The un-order of quadrant IV
  4. The "order" of quadrant II

Entropy in the Data Quadrant Model is lowest in quadrant I, higher in II, even higher in IV and highest in III. If we do not actively decrease it, we tend to lose the value of (the) data(-platform), and we will find ourselves investing huge amounts of euros to do the same thing over and over again (like Groundhog Day), or spending huge amounts of euros to control an 'out-of-control beast'. Unfortunately, I have seen and still see this a lot.

In an isolated system1, entropy cannot decrease; in the universe as a whole, entropy will only increase (and eventually we all die). However, the Data Quadrant Model is not an isolated system - it exchanges energy (read: effort and money) with its surroundings - and in such a system we can decrease entropy locally. How?

There are roughly three important directions in which entropy is to be decreased actively: 

  1. Decreasing entropy from III to I
  2. Decreasing entropy from IV to II
  3. Decreasing entropy from II to I
    (I describe these in the details of this post. Warning: it is not for the faint-hearted.)

Important message: like in physics, decreasing entropy costs energy. The higher the difference in entropy between two systems/quadrants, the higher the energy needed. And yes, you can replace 'energy' with 'costs'. 
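
For the physics-minded, the analogy leans on the second law (my loose summary, for illustration only):

    ΔS ≥ 0   (for an isolated system)

A subsystem can only lower its entropy when work is done on it, exporting entropy to its surroundings. Translate 'work' into 'effort and money' and you have the data management story in one line.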

We now enter the field of Data Management. A prime directive of Data Management is to reduce entropy in data and to keep the data-platform in a sustainable modus where it serves the data-driven and data-centric organisation. It is hard, not cheap and still mostly unknown territory, but if you can make it work, the rewards - in the era of datafication - are huge. 

------------------------------

ad1. Decreasing entropy from III to I
Un-managed, un-governed data of unknown quality that is marked as important for the organisation will need to be promoted to quadrant I.

ad2. Decreasing entropy from IV to II
The brilliant insights of double PhDs who have constructed and tested analytical models or discovered interesting patterns (e.g. fraud, data quality, etc.) need to be promoted to quadrant II in order to be productised. This means using these products in a system where they can be scaled, changes are managed and funds are allocated to improve them. Furthermore, we need these brilliant people to discover new stuff, not to maintain the stuff that has already been proven. That needs to be automated. We need to free this scarce resource of the burden of maintenance, version control, etc.

ad3. Decreasing entropy from II to I
This is a tough one, but a most vital one. Data that travels from quadrant I to quadrant II is transformed from basic facts to a context that is needed by someone or something. Rules are executed on the facts and a context is born. A context that is used (and designed) by the various stakeholders of the organisation.

A very simple example might clarify (a code sketch of the same example follows the list):

  • Fact: on May 15, 2015 I order a bicycle for 100 USD in Seattle, US
  • Fact: on May 15, 2015 the exchange rate USD to EUR is 0.9
  • Fact: on May 15, 2014 the exchange rate USD to EUR is 0.6
  • Context: bikes I ordered in EUR with the exchange rate of 15/5/2015: the bike is 90 EUR
  • Context: bikes I ordered in EUR with the exchange rate of 15/5/2014: the bike is 60 EUR
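
The same example as a small code sketch (Python; it only restates the numbers above): the facts are immutable, and every context is nothing more than a rule executed on those facts.

```python
from datetime import date

# Immutable facts (quadrant I): appended, never overwritten.
orders = [{"item": "bicycle", "amount_usd": 100.0,
           "order_date": date(2015, 5, 15), "place": "Seattle, US"}]
usd_to_eur = {date(2015, 5, 15): 0.9, date(2014, 5, 15): 0.6}

def orders_in_eur(rate_date: date) -> list[dict]:
    """Context (quadrant II): a rule executed on the facts."""
    rate = usd_to_eur[rate_date]
    return [{"item": o["item"], "amount_eur": o["amount_usd"] * rate}
            for o in orders]

print(orders_in_eur(date(2015, 5, 15)))  # the bike is 90.0 EUR
print(orders_in_eur(date(2014, 5, 15)))  # the bike is 60.0 EUR
```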

What if we could solidify/persist this context and give it the same treatment as facts? Ordered, automated, managed and governed (low entropy)? Still with me?

For the abstract thinker or my fellow data architects: I mix two levels of abstraction now and I need to differentiate between the two. The Data Quadrant Model on the datamodel level (where we differentiate between facts and context) and the Data Quadrant Model on a data-processing level, where we push back data to the system of quadrant I (don't be fooled, we still enforce the separation between fact and context!). The latter deserves a whole chapter....

Working on it....

 

 

1  A system is a part of the universe being studied, while the surroundings are the rest of the universe that interacts with the system.


Data Quadrant II: the battle against entropy

A word of warning: after writing this post I realise that truly understanding it requires more than a basic understanding of the Data Quadrant Model. And of course, if you still don't understand where I am going with it, then it's safe to say that I failed you; my apologies. :-)

A while ago I wrote a blog about the Data Quadrant Model I developed. I use this model in my consultancy and speaking engagements. Increasingly I receive great feedback from organisations that are applying it, which is great.

A week ago I wrote about the un-order of quadrant IV. A few days ago I posted the blog about the order of quadrant I as well as the chaos of quadrant III. It's time to describe the battle against the ever-increasing entropy of quadrant II.

Imagine quadrant I as data where the molecules are set in a fixed position, neatly managed, ordered & governed. Now, in quadrant II the molecules are beginning to move. They can be rearranged in a vast number of different positions; the number of positions can be huge, infinite even. Now replace these molecules with data and take into account that data can be copied and re-used over and over again. There is a vast array of possible positions for data.....

Back to the Data Quadrant Model...

Quadrant II reflects the contexts1 in which data is used; like I said before, in quadrant II we have multiple versions of the truth.

Let's, for argument's sake, assume that the number of contexts/truths is potentially large. It is safe to say that the number of ways in which data can (and probably must) be re-arranged, aggregated, calculated, inferred or otherwise manipulated to serve a vast array of contexts will be equally (exponentially) large.

What quadrant II aims to achieve is to control the entropy as long as possible against justifiable costs. It is known that entropy, by definition, will only increase. But there are all kinds of tools, technologies and mitigation measures to extend the time before entropy reaches an un-manageable (or non-cost-effective) state. If the latter happens, we can always return to the 'zero-state' - still present in quadrant I - and start again.
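
As a hedged sketch of that 'reset' (Python; the function and rule names are invented): because every quadrant II context is a derivation of the quadrant I facts, the derived stores can be thrown away and recomputed at any time.

```python
def rebuild_quadrant_ii(facts, rules):
    """Recompute every quadrant II context from scratch.

    Contexts are functions of the quadrant I facts, so dropping them
    loses nothing: this is the 'zero-state' reset described above.
    """
    return {name: rule(facts) for name, rule in rules.items()}

# Hypothetical usage: drop all derived stores, then
#   contexts = rebuild_quadrant_ii(load_facts(), registered_rules)
# where load_facts() and registered_rules are whatever your platform provides.
```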

That's the cool thing about data. Entropy in the universe is ever increasing and we can never reset (unless we break some fundamental rules of physics, or with divine intervention); with data we can....

To summarise:
Quadrant II is trying to maintain order (same as quadrant I). The entropy makes it harder, though, and subsequently more energy (= costs) is needed. Where the systems in quadrant I are characterised by their obvious state, the systems in quadrant II are considered to be complicated2.

 

1 It would be interesting to research what the independent variables for # of contexts could be. Size of the organisation, # of internal & external stakeholders, # differentiation of products and services, dynamics of the (world)market, complexity of business processes, managerial effectiveness, organisation hierarchy...?

2 A system can be very complicated but not complex at all. A system is complex when it has emergent behaviour (quadrant IV)


The chaos of Data Quadrant III

A while ago I wrote a blog about the Data Quadrant Model I developed. I use this model in my consultancy and speaking engagements. Increasingly I receive great feedback from organisations that are applying it, which is great.

A week ago I wrote about the un-order of quadrant IV. A few days ago I posted the blog about the order of quadrant I. Now it's time to describe the most misunderstood quadrant: quadrant III.

So far, quadrant III has received little mention, even though it is incredibly important. It is the quadrant of data sources which are not under governance, like an ad hoc download which you obtain from an open data provider, a list in Excel that you want to use, or a set of verification data which you have received on a CD. But you might also use it to dump huge amounts of data that you want to explore or experiment on. 

Quadrant III exists (implicitly) in every organisation. Just think of the huge amounts of Excel sheets or MS Access databases currently stored on your fileservers.

Entropy in quadrant III is at its height. There is data chaos. As Stephen Brobst (chief technology officer of Teradata) so eloquently put it when he described a data lake - slightly paraphrased: 'the only people that can use this data are the people that have put it in'.

And yes - the popular 'data lake' is an artefact that is situated in quadrant III.

Governance, control, checks and balances in quadrant III are virtually non-existent. People using this data (the peeps in quadrant IV) need to discover the structure/schema, make inferences and be extremely careful in drawing conclusions. The field of statistics is important in this regard.
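
In practice that discovery often starts as mundanely as this (a sketch using pandas; the file name is invented):

```python
import pandas as pd

# A hypothetical quadrant III artefact: an ungoverned CSV of unknown quality.
df = pd.read_csv("some_adhoc_download.csv")

print(df.dtypes)         # which schema did pandas infer?
print(df.isna().mean())  # fraction of missing values per column
print(df.nunique())      # candidate keys versus categorical columns
print(df.describe())     # ranges and outliers, before drawing any conclusion
```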

Let me be absolutely clear: I am not against a data lake. I am just against the notion that a data lake is viable as a quadrant I artefact (and that is how the industry is trying to sell it). As a quadrant III artefact the notion of a data lake could be valuable. It is a cost-effective artefact that has the potential to drive innovation.

Although I described quadrant III as data chaos - unmanaged, ungoverned, etc. - it is possible to manage and govern the infrastructure. Current technology is promising in offering a managed and governed infrastructure on top of which one can innovate with data in an unmanaged and ungoverned manner.

The cool thing about this technology is that quadrant III is being made explicit: from the shadows into the light. In quadrant III an organisation might want to offer its users an infrastructure-as-a-service.

Oh, one more thing: if you consciously architect/configure a quadrant III, you are engaged in innovation. If you do not, you are just messing about. ;-)

The big danger of quadrant III (and quadrant IV) is that promising discoveries are never productised (I unfortunately see that a lot!). They remain high potentials that never see the light of day (promotion to quadrants I and II). The high level of entropy reinforces this risk even more, since discoveries are often tied to the brilliance of individuals. Governing mechanisms are needed to counter this risk; otherwise you are just playing with data..... a waste, I'd say.

I have now discussed three quadrants (I, III and IV) in terms of various levels of order. I have one quadrant to go. Before I move on, let me again stress the following:

  1. None of the four quadrants is more desirable than any other; there are no implied value axes. All quadrants are needed in every organisation that wants to be more data-centric, data-driven, adaptable to changes, innovative, etc.
  2. The Data Quadrant Model is NOT a blueprint implementation model. It can, however, be used as a guide in how to organise, how to manage, how to govern, what technology choices to make, what people to hire, etc.

 

 


 

 

The order of quadrant I

A while ago I wrote a blog about the Data Quadrant Model I developed. I use this model in my consultancy and speaking engagements. Increasingly I receive great feedback from organisations that are applying it, which is great.

A week ago I wrote about the un-order of quadrant IV; this post will be about its (diagonal) mirror image, quadrant I. The quadrant of Facts.

It is in all its aspects a mirror image of quadrant IV. It is the quadrant with the lowest entropy. There is order, governance, rules, standards and automation. Quadrant I is constantly searching for efficiency or optimisation gains. The 'systems' (I use the term loosely here) in this quadrant are characterised by (mostly linear) cause and effect relations that are repeatable, perceivable and predictable. Checks and balances are strict in terms of the adherence to standard operating procedures.

Quadrant I is the domain of methodology (which seeks to identify cause-effect relationships through the study of properties that appear to be associated with qualities). Methodology can be translated to architecture, as in (information) system development. To put it in other words: if you have no architecture for quadrant I.....

A tremendous effort is necessary to keep the 'systems' in quadrant I low in terms of entropy; they need to be controlled and governed fiercely. Organisation-wise, quadrant I benefits hugely from centralisation (as opposed to quadrant IV, which benefits a lot from decentralisation).

What it all boils down to is that in quadrant I there is no fooling around with data. People working in quadrant I have a data mindset combined with an engineering heart. Data Architecture, Data & Information modeling, Databases (using the terms loosely), Data processing (sets & transactions), etc. They are infused with the will to improve the consistency, availability, reliability and traceability of data. If there is a science of data, these people are living it and doing it. If anyone wants to know the pedigree of a certain data element....
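
As a minimal sketch of what 'knowing the pedigree' can boil down to (Python; the fields are illustrative, not a standard):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Pedigree:
    """Minimal provenance for a single data element (illustrative fields)."""
    element: str         # e.g. "customer.legal_name"
    source_system: str   # where the value originated
    loaded_at: datetime  # when it entered quadrant I
    transformation: str  # rule or mapping applied, if any
    steward: str         # who is accountable for its definition

record = Pedigree("customer.legal_name", "CRM", datetime(2015, 5, 4, 9, 30),
                  "none (verbatim load)", "Legal")
```

The quadrant I peeps can answer that question for every element; that is what low entropy buys you.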

And no - they are not 'IT' - they are the glue that binds business with IT1, and they are scarce. You have either got to buy them or train them - there is no middle ground, especially since universities are dropping the ball here big time. They seem to be preoccupied with offering 'Data Science' masters now, riding the waves of opportunity and short-termism.

I want 'Science of Data' masters.....

Ah well, at least there is job security for quadrant I peeps. ;-)


 1 In terms of business and IT alignment I am loosely referring to the generic framework for Information Management - the middle column of the Information management model. Warning: organisations are using this framework as a blueprint for organisational structure. I cannot begin to explain how utterly wrong that is. It is a framework that assists in thinking and designing, nothing more.


The un-order of Data Quadrant IV

A while ago I wrote a blog about the Data Quadrant Model I developed. I use this model in my consultancy and speaking engagements. Increasingly I receive great feedback from organisations that are applying it, which is great. 

In this post I want to elaborate a bit more on quadrant IV (see figure). This quadrant is often present implicitly in most organisations. Just think of all the Excel users in your organisation! Nowadays, quadrant IV is supercharged by all the stuff that is offered under the flag of 'Data Science'. And yes, I put quotes around it.... the word 'science' bugs me a lot...

But that is another blogpost.

Quadrant IV is often misunderstood, especially by my fellow architects and management. They have an innate need to establish order in this quadrant. And there is no order.... there is also no dis-order.

Huh?

Dave Snowden is actually describing it correctly; there is un-order: "Un-order is not the lack of order, but a different kind of order, one not often considered but just as legitimate in its own way". He uses a great story to exemplify un-order:

"....a group of West Point graduates were asked to manage the playtime of a kindergarten as a final year assignment. The cruel thing is that they were given time to prepare. They planned; they rationally identified objectives; they determined backup and response plans. They then tried to “order” children’s play based on rational design principles, and, in consequence, achieved chaos. They then observed what teachers do. Experienced teachers allow a degree of freedom at the start of the session, then intervene to stabilize desirable patterns and destabilize undesirable ones; and, when they are very clever, they seed the space so that the patterns they want are more likely to emerge. "

Order might eventually emerge.....

Now, let's take this notion of un-order a bit further into quadrant IV. If something is ordered, management (and architects alike) like to focus heavily on efficiency (e.g. control, standards, predictability/repetitiveness, best practices, etc.). In an un-ordered system (like quadrant IV) this behaviour is deadly. Management and architects need to allow a lot of sub-optimal behaviour in order to reap the benefits of quadrant IV.

To put it more popularly (and scientifically wrong, but what the hell): in quadrant IV we need to 'architect' a bit of chaos. We need to un-manage..... un-control.... experiment & learn... trust.

So, we gotta free the experts/knowledge workers (yep, not calling them "data-scientists") in quadrant IV.

Will order eventually emerge? Yes and no. Order and un-order will both exist in the same timeframe in quadrant IV.

 

 


Information, Ethics and Privacy

I remember - back in the day (1996 or so), while working for a large retail company - the introduction of a loyalty card and the fierce public discussion regarding privacy. On average, people seemed reluctant to give up their personal information; some were even outraged.

I remember - back in those days - that one of the shops of this retail company underwent a large restructuring. The shop was even closed for two weeks. The supermarket manager had the brilliant idea of rewarding his loyal customers with a pie in the opening week. So he wrote to his loyalty card members that they would receive a pie if they visited the new store in the first week after opening and showed their loyalty card.

So far, so good.

But people are strange. The customers without a loyalty card took notice of other customers leaving with a free pie. Guess what happened? In this first week after opening, the number of loyalty card holders doubled (!). People filled out their application form, activated their card and rushed to the 'free-pie counter'.

In the decades that followed, 'we the people' seem to have been rather generous in giving our personal information to the public - or, to be more precise, giving it to some company whose business model consists of using this information for profit. I am especially wary of those companies I actually pay for a service (like a bank) that use my data to increase revenues/profit for profit's sake, not increasing my customer delight IN ANY WAY. I can already hear these companies saying: 'well, you did agree with our 40-page Terms of Use'.

Yeah right.

I consider myself kind of a liberal, so I believe the individual is responsible for his own actions, but I increasingly find myself taking a non-liberal standpoint on the issue of privacy. One of the prime objectives of government is to protect its citizens, and when it comes to privacy I think we need laws and institutions that guard it.

The European Union is responding with a new regulation that - when approved - does not need national legislation to take effect. It will be considered law in 28 European countries instantly. This regulation, according to expert Daragh O Brien, is the 'single most heavily lobbied against piece of legislation EVER in the history of the European Union'. It will affect a lot of shady business models and it will have a MAJOR impact on companies collecting personalized data.

Companies will - by law - need data governance and data protection protocols, officers, frameworks, etc. This is a good thing in my opinion, but not good enough.

Companies will need to respond in order to comply with the laws and avoid litigation. But I find that a rather poor incentive. Laws and regulations always trail the facts; by definition they cannot keep up with the acceleration of technology and the ingenuity of people gaming the system for their own personal benefit. If organizations only comply with laws and regulations, I kinda fear for the future of my kids..

Shouldn’t the incentive be: ‘I want my customers to regard me as trustworthy, reliable & honest’?
Shouldn’t the incentive be: ‘I want my customers to be delighted with my company, services and products’?

In the prelude to a world where we tend to 'datafy' everything (whether it be a car, an airplane, your heart rate, steps, calls, payments, blood pressure, etc.), this becomes even more pressing. It might be an idea for organizations to recognize that the ownership of data always remains with the individual. Regard it as property that is temporarily borrowed: a lawnmower you borrow from your neighbour and treat with the utmost respect, not because the law says so, but because 'that is the way we conduct ourselves'.
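To illustrate the 'borrowed lawnmower' idea, here is a minimal Python sketch (all names hypothetical): personal data as a loan that is tied to one agreed purpose and can be reclaimed by its owner at any time.

```python
# Minimal sketch (hypothetical names): personal data treated as borrowed
# property - tied to one agreed purpose, revocable by the owner at any time.
from dataclasses import dataclass

@dataclass
class DataLoan:
    owner: str            # the individual the data is about
    purpose: str          # the single purpose the owner agreed to
    revoked: bool = False

    def revoke(self):
        """The owner takes the 'lawnmower' back."""
        self.revoked = True

    def may_use(self, purpose):
        # Use is allowed only while the loan stands, and only for the
        # purpose the owner agreed to - nothing else.
        return not self.revoked and purpose == self.purpose

loan = DataLoan(owner="alice", purpose="delivery of my order")
print(loan.may_use("delivery of my order"))  # True
print(loan.may_use("targeted advertising"))  # False
loan.revoke()
print(loan.may_use("delivery of my order"))  # False
```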

Ethics and morality come into play. We as a society need to establish ethical boundaries, and crossing these boundaries should be punished by the people. It will be the 'morals of men' that decide whether or not we act - by not buying or using your products and services, or by leaving a company we do not want to work for.

Although I remain somewhat skeptical/cynical (see pie-example), I do think 'we' (working in the data industry) have the obligation to inform people, the organizations and ourselves as much as we can and to discuss this topic transparently.

In the afternoon of February 12 I will host a round-table organized by BI-Podium, ‘Information, Ethics and Privacy’ with amazing experts from various backgrounds: law, research, education, consultancy, security, government. 

 I invite everyone to come and join me in this discussion! Limited seating so be quick!


IT does not matter

This may look like a career-killing title for a blog post, but let me explain.

Oh - just to credit my sources: I stole the title from an article Nicholas Carr wrote in the Harvard Business Review in 2003.

In all the hype and buzz surrounding #BigData #InternetOfThings #MachineLearning #Digitization and so on, the tech is portrayed as a huge differentiator for competitive advantage. Well... it's not. The tech is highly irrelevant in terms of competitive advantage (which is not to say it isn't a prerequisite!). Some companies might have an edge for some time, based purely on some innovative use of tech, but it will not last. It will be copied extremely fast, over and over again, by others. Tech will be commoditized.....

So if tech is not a differentiator, what is? Because there is obviously something buzzing....

You know what is a differentiator? Data.
You know what is a differentiator? How you use data to delight your customers.
You know what is a differentiator? How you use data to improve healthcare, balance inequality, fight crime or supervise vital infrastructure.
You know what is a differentiator? How you respect the privacy of data, not only because you want to avoid litigation, but because you want to be trusted and valued.

Data must be nurtured, managed and governed aggressively and - most of all - must be enabled to be exploited to the max in order to get a good return or help the common cause. It is by far the ultimate proprietary asset.

So if you want to change strategy, embark on a road of datafication, or become more data-centric or data-driven, you might not want to spend your investment euros or dollars on tech right away - resist that urge. Invest them in people first: data strategists, data architects, data modellers, data analysts (ok ok... data 'scientists'), data quality experts, data privacy experts and so on. The tech will come...

Some statements about the C-level:
1. Tech is not a differentiator
2. Tech should therefore not be represented at the C-level
3. Data is a differentiator
4. Data should therefore be represented at the C-level
5. If your CIO is about the information (not the tech!)? Yee - good for you!
6. If your CIO is about the tech? Well, in my opinion you don't have a CIO - you have a Technology Manager

A Chief Data Officer should not be necessary if you have a true CIO. If you don't, get yourself a Chief Data Officer. Someone seasoned in the field of (respectfully) exploiting data and information for the ambitions and goals of your organization.

 

 


IAIDQ: Data Quality, objectivism versus subjectivism

I attended and spoke this week at the International Data Quality Summit organised by the IAIDQ. I had a great time listening to my peers and talking shop over beers and good food. Learned a lot and I hope some people learned something from me.

Now, besides my regular talk I also sat on the IAIDQ Data Quality Panel. And we embarked on something that still resonates a bit with me. The moderator's question began innocently: 'Define Data Quality'. Two schools of quality emerged that seemed to be combatting each other a bit.

On the one hand there was - let me call it - the ISO school of quality.

ISO 9000 deals with quality in general: Quality is the degree to which a set of inherent characteristics fulfils requirements. Where requirement is defined as ‘a need or expectation that is stated, generally implied or obligatory’. 

ISO 8000 is somewhat derived from ISO 9000 but more rephrased towards Data Quality: Quality data is data that meets stated requirements. 

Now, I had just been to Allan Duncan's session about gathering information requirements and he said some wise words: be polemic if you want discussion, tension, feedback, learning, etc. Well, being polemic is something that comes quite naturally to me.. So there I go…

The moderator asked basically if I agreed with this ISO definition of quality. Me - being polemic - said “No, I do not agree”. I cited the definition of quality as I learned it from Jerry Weinberg: Quality is value to some person. And I added two rules to this definition that were inspired by a blog of Markus Gartner I read some time ago: the Relative Rule and the Time Rule. 

The Relative Rule is obvious; the word 'quality' is like 'complexity' or 'beauty'…. it has value only in the eye of the beholder. A human being - a person - a warm body….

The Time Rule is somewhat more sophisticated: the value that is put on beauty, complexity or quality depends on time as well, not only on a human being. Something that is complex now can be perceived as less complex a year later. The same goes for beauty: something that is beautiful now can be perceived as ugly as hell a year later ;-). And yes, the same goes for quality: something that is perceived as high quality by someone now can be perceived as low quality a year later. So quality, in my opinion, is never in a fixed state; it is always moving, fluidly, through time.

So, a bit rephrased: quality is value to some person at some time.
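A minimal Python sketch of this view (all names hypothetical): quality is not a property of the product alone, it is a function of product, person and moment - and without a person and a moment, there is simply nothing to report.

```python
# Minimal sketch (hypothetical names): quality as a function of product,
# person and time - the Weinberg view, not the ISO view.
from datetime import date
from typing import Optional

def quality(product: str, person: str, moment: date,
            valuations: dict) -> Optional[int]:
    """Return the value (say, on a 1-10 scale) that `person` puts on
    `product` at `moment`, or None: without a person and a moment,
    there is no quality to speak of."""
    return valuations.get((product, person), {}).get(moment)

valuations = {
    ("monthly sales report", "controller"): {
        date(2013, 1, 1): 9,   # loved it then...
        date(2014, 1, 1): 4,   # ...same report, same person, a year later
    },
}
print(quality("monthly sales report", "controller", date(2013, 1, 1), valuations))  # 9
print(quality("monthly sales report", "controller", date(2014, 1, 1), valuations))  # 4
print(quality("monthly sales report", "marketeer", date(2013, 1, 1), valuations))   # None - we simply do not know
```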

There you have it: the ISO school of quality and - let me call it - the Weinberg school of quality. I think the ISO guy was getting a bit irritated with me - not sure though. But hey - blame Allan - I needed to be polemic to get the discussion going to a place where we all actually learned something.

In the ISO-school opinion you can always state the quality of data by knowing the requirements. In my world (the Weinberg school) you can never state the quality of data unless you know the person and, subsequently, what he or she values at that time. Now, to be very annoying: what if we ship a product (a data product or any other product) to thousands of people... or even more? Oh crap - to actually know the quality, I now need to know all these people and what they value in order to have an idea about the degree of quality? Yes, that is exactly my point.

The ISO school seems to think we can state the requirements finitely and thus can put a value to them. In my world I do not believe that. Or, in other words, I can meet stated requirements (voiced by person X at time Y) but still deliver crappy quality. Why? Because we humans are extremely bad at expressing our requirements, validating them and assessing whether or not our requirements are complete. The idea that someone can actually finitely state their requirements is something that I reject. I reject it as a world-view, even…..

Final remarks: assessing quality is a moving target. So let's throw in the towel? Nah. Remember, I was being polemic (!)

As Danette McGilvray very eloquently put it in the discussion at the end: on the various dimensions of data quality we have a duty to be as precise as we can in stating the requirements and subsequently designing systems that can uphold them. Yes, yes and yes… I so agree.

In data quality we gotta meet known requirements as best we can. But to say that quality is high if we have met these stated requirements? My answer: I do not know. Only person X knows…. or Y... or Z... at a certain moment in time.

This blog inspired Jim Harris to write a blog post of his own: "Requirements, data quality and coffee".

Data is a company-wide concern, guts needed

The point I am trying to make in this blogpost is that the classic distinction between the operational data environment and the informational data environment (where the data warehouse resides) is fading, thank god! Data is a company-wide concern... but guts are needed to actually achieve it.

For me to be able to make this point, I need to convey 5 important statements first, clarifying where I am coming from a bit more:

  1. I am going to elaborate a bit more on the Data Quadrant Model I wrote about some time ago. Remember quadrant I - the facts? A quadrant where data is factually registered conforming to several non-functionals (temporal, historical, standardized, etc..). 
  2. It is important to emphasize that the data models in quadrant I are driven by centrally managed logical data models (preferably derived from a conceptual or information model) and governed by means of Data Delivery Agreements. Physical models are preferably derived from the logical model as much as we possibly can. This is not trivial, but decades of science and more and more practitioners are actually doing it, (im)proving it.
  3. A data warehouse is an architectural construct and as such should be separated from its implementation or its technology. This is a distinction I rarely see, but it is vital for the point I am trying to make.
  4. Data warehouses are traditionally fed with data coming from various sources, either internal systems (batch, services, whatever) or external systems. 
  5. In the Netherlands (my work area) we evolved data warehouses the last decade by making a very sharp distinction between facts (quadrant I) and context (quadrant II). This distinction is being made on all levels; architecture, data modeling, technology, management, organization, people, processes, etc..

If we take a closer look at the fact part (first quadrant) of the data warehouse, something interesting is going on. Although originally set up to accommodate feeds from various sources, it slowly evolved into an integrated data environment.

How so? Suppose I wanna build me a new order-intake system. We make use cases, do the sprints and just 'build me an app'. The data model is often something that 'organically' grows with the app..... brrrr... Data is a company-wide concern, not an app concern.

Classically, the data warehouse guys only came into play the moment the app was finished and people needed to do some sophisticated analytics or just some management reporting. A feed was constructed and data was deployed to the data warehouse. Slowly and gradually this has changed in my current work environment, using the Data Quadrant Model and some very smart co-workers. The app guys are still doing the use cases, but all the modeling is now done by the data modelers of quadrant I.

These data modelers first construct a logical model that the app guys can use as a stub on which to build all their 'app stuff'. How can they? Well, we do not want the app guys to be concerned with things like auditability, traceability, temporality, extensibility, standardization of data definitions and types, etc. Remember, in quadrant I we have a 'fundamentalist' opinion on how the data is to be managed.

Now, the data modelers construct the physical model using the full force of temporal fact-based modeling, and they hide the complexity of the physical model by virtualizing/decoupling the data back to the original logical data model. The app is built and users can now insert, update and delete data.
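As a minimal sketch of that decoupling (hypothetical names, and hugely simplified compared to real temporal fact-based modeling): the physical store only ever appends versions of a fact, while a virtualized 'logical' view gives the app the simple current state it expects.

```python
# Minimal sketch (hypothetical names, hugely simplified): an append-only
# temporal store, virtualized back to the simple logical model the app knows.
from datetime import datetime, timezone

class TemporalOrderStore:
    def __init__(self):
        # Every version of a fact is kept: (order_id, status, valid_from).
        self._rows = []

    def upsert(self, order_id, status):
        # The app thinks it 'updates'; physically we only ever append.
        self._rows.append((order_id, status, datetime.now(timezone.utc)))

    def logical_view(self):
        """Virtualize back to the logical model: one current status per order."""
        current = {}
        for order_id, status, _ in sorted(self._rows, key=lambda r: r[2]):
            current[order_id] = status
        return current

store = TemporalOrderStore()
store.upsert("order-1", "received")
store.upsert("order-1", "shipped")
print(store.logical_view())  # {'order-1': 'shipped'} - full history stays in _rows
```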

The data is decoupled from the apps, other apps might be interested.....

The data is centrally managed and governed on at least the logical level....

The non-functionals regarding data are being guarded fiercely....

Interestingly, we still make a data delivery agreement. In this case between the functional owner of the system and the governor of quadrant I. We do a handshake on the logical data model, the validations we perform before the data is registered, and several other things we expect from each other regarding the data and its quality.
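A minimal Python sketch of such a handshake (all names hypothetical): the agreement pins down the logical model and the validations, and data is only registered when both are honoured.

```python
# Minimal sketch (hypothetical names): a Data Delivery Agreement as an explicit
# handshake on the logical model and the validations run before registration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class DataDeliveryAgreement:
    functional_owner: str
    quadrant_governor: str
    logical_model: dict                       # agreed attributes and their types
    validations: list                         # agreed checks, each record -> bool

    def accept(self, record: dict) -> bool:
        """Register a record only if it honours the agreed model and checks."""
        if set(record) != set(self.logical_model):
            return False
        if not all(isinstance(record[k], t) for k, t in self.logical_model.items()):
            return False
        return all(check(record) for check in self.validations)

dda = DataDeliveryAgreement(
    functional_owner="order intake",
    quadrant_governor="quadrant I",
    logical_model={"order_id": str, "amount": int},
    validations=[lambda r: r["amount"] > 0],
)
print(dda.accept({"order_id": "o-1", "amount": 10}))  # True
print(dda.accept({"order_id": "o-2", "amount": -5}))  # False - validation fails
```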

The result: data that originally came from an operational environment and was fed to a data warehouse is now directly registered in the fact part (quadrant I) of the data warehouse. This part of the data warehouse has now faded into an operational environment with the non-functionals we learned to implement in the first quadrant. The first quadrant governors are now in charge of a company-wide integrated data environment.....

What about the data warehouse? Quadrants II and IV (stuff like reporting, dashboarding, analytics, visualisation) are still fed from the first quadrant and are 100% demand oriented. Since we model and store all data in an integrated manner, adhering to important non-functionals, we are able to serve these quadrants (II and IV) cheaper, faster and with a (to some degree) guaranteed level of quality.

Two more things:

  1. What I do not say is that data is stored in one location. Storage can and probably will be hugely federated, within the company or outside it (e.g. cloud). What I do say is that the logical model AND the metadata are crucial to govern in an integral and central manner.
  2. With the above in mind I strongly urge senior management to take the next step as well. Organize the data competency: the people, governance (e.g. protection), architecture, design and implementation of the data need firm management support. Centralize it.... please.... A centralized, shared data center, preferably managed directly by a true CIO (information officer, not technology officer) or, even better, a Chief Data Officer... with means in terms of expertise and budgets.

If there is an organization out there with the pain, the guts and the patience to really go for it, to assign and mandate a CDO: call me, I am game.

Thou cannot buy your way out of the data misery you are in

In an interview, Gerald Weinberg was asked the question 'In your experience, what is the most notable "constant" in the world of software?'. His answer:

“The infatuation with the latest fad which is supposed to “increase productivity” by some arbitrary amount, with no clue as to how that “productivity” is measured. For the most part, these fads are usually a new set of names for old practices. Yet at the same time a segment of the developer population goes ga-ga over the fad, a much larger segment doesn’t even attempt to learn what is good about the fad. The majority of developers are working the same primitive way their predecessors did 50 years ago, just using more machine power to do it.”

I think these are wise words that we should all keep in mind whenever we 'buy into', or become infatuated with, the newest 'fad'.

In the data-world I am in, this seems to be particularly true, although that might be because I am not very deep into other ‘worlds’. Examples are Big Data, data virtualization, data visualization, analytics, cloud, in-memory, mobile, data integration hubs, etc..

The promises are immense. Either an increase in productivity, a decrease in costs or a fantastic super-dooper increase of customer satisfaction. Another characteristic is the rebelling against the ‘old’ or the ‘traditional’ way of doing stuff. ‘That era is over, they say, we now have <fill-in tech trend>’ or 'xxx is dead, now we have <fill-in tech trend>'. 

Basically, 'we' get the message that we must be totally out of our minds not to use this <fill-in tech trend>.

From an architectural point of view it is shocking how analysts/experts or whoever cannot seem to understand the difference between logical and technical views. The two are constantly mixed, with utter confusion as the result.

Now, do not get me wrong, I am a fan of progress in technology and I deem the above-mentioned 'trends' extremely valuable……when applied appropriately….

In the end this technology needs to fit into a context. A context of people, their skills and experiences, a context of architecture, a context of legacy systems and a context of strategy and objectives.

Proper software development in data management is not about technology. It is about proper business analysis, information modeling, business rules, architecture and above all…it is about people, people and people. These are the constants over time - whatever tech you use. If you want to actually yield benefits, do the work. Invest in people first.....tech comes later..really...

Even better formulated: tech evolution will accelerate more and more. It is not about this tech (IT does not matter), it is about how well you prepare your organization to adopt technology for the benefit of your customer, patient or student. It is the 'degree of adoption' we need to worry about, instead of blindly following the next fad, promise or tech brochure.

Thou cannot buy your way out of the data misery you are in - it takes blood, sweat and tears. Or in other words; stamina, discipline, Trust (!!!!!) and courage (especially by management!). 

4 Quadrant Model for Data Deployment

I have written numerous times about efficient and effective ways of deploying data in organizations. How to architect, design, execute, govern and manage data in an organization is hard and requires discipline, stamina and courage from all those involved. The challenge increases exponentially when scale and/or complexity increases.

Typically a lot of people are involved, and some common understanding is vital yet unfortunately often lacking. As a consultant it is a huge challenge to get management and execution on the same page, respecting the differences in responsibilities and in the level of expertise regarding the field we are operating in.

An abstraction functions as a means of communication across the organization. An abstraction to which the majority can relate and that can be used to manage, architect, design, execute and govern the field that is at stake.

For data deployment I came up with the so-called ‘4 Quadrant model’ (4QM). 

It starts with the basic assumption that data deployment begins with raw materials and ends up in some sort of product, and that in the process of getting raw materials to end products, logistics and manufacturing are required.

It starts with the basic assumption that reliability and flexibility are both wanted, but are mutually exclusive.

It starts with the basic notion of the 'push-pull point', which stems from the logistics and manufacturing literature we were taught in high school, about push systems and pull systems.

It starts with the basic notion that data needs to be separated into facts and context.

There is no single version of truth

There is only a single version of facts

There are many truths out there

Push systems are to be standardized as much as possible. Only if we standardize can we automate. Keeping product-specific demand features out is vital within push systems. However, the tables turn completely when we enter the pull-systems domain.

The push-pull point is - in architectural terms - a decoupling point, separating fundamentally different concerns, but it is not the only one. At this high level of abstraction another decoupling is crucial: development style. Missing this one often results in pain, misery, demotivation, failed projects, dissatisfied users, budget overruns, IT focus, etc…

Basically I tend to distinguish between a systematic and opportunistic development style.

This separation between a systematic and an opportunistic development style respects the professionalism of both the data scientists and the IT engineers. It respects the organizational ambitions as well as the more local ones.

Two decoupling points: push-pull and development style. Depicting them results in four quadrants.
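A minimal Python sketch of how the two decoupling points span the quadrants, consistent with how they are used in this post:

```python
# Minimal sketch: the two decoupling points of the 4QM span four quadrants.
from enum import Enum

class Side(Enum):
    PUSH = "push"   # source/fact driven, before the push-pull point
    PULL = "pull"   # demand/product driven, after the push-pull point

class Style(Enum):
    SYSTEMATIC = "systematic"
    OPPORTUNISTIC = "opportunistic"

def quadrant(side: Side, style: Style) -> str:
    return {
        (Side.PUSH, Style.SYSTEMATIC): "I",      # the facts, fiercely governed
        (Side.PULL, Style.SYSTEMATIC): "II",     # certified information products
        (Side.PUSH, Style.OPPORTUNISTIC): "III", # ad-hoc / innovative sources
        (Side.PULL, Style.OPPORTUNISTIC): "IV",  # explore, experiment, learn
    }[(side, style)]

print(quadrant(Side.PUSH, Style.SYSTEMATIC))     # I
print(quadrant(Side.PULL, Style.OPPORTUNISTIC))  # IV
```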

Quadrant IV is especially worth some attention. Typical use cases for this quadrant:

1)   Reducing uncertainty/getting requirements: ‘I do not know what I need, what I want’, ‘I do not know the quality of the data yet’. Experiment a lot, change fast, instant feedback.

2)   Reducing lead times: going from quadrant I to II might take too long. There needs to be an environment where products can be made a.s.a.p.

3)   Stimulating innovation: discover, experiment, throw away, test hypotheses, etc. An environment that is as friendly as possible for the creative mind.

In quadrant IV we are architecting some degree of chaos…which is fine.

Finally, it needs to be understood that there is no single way the data flows through the quadrants. Four process variants are depicted in the small picture on the right, but more can be thought of. For example, the quadrants can be used to architect a phased/managed approach towards the migration of legacy data deployment systems (which I know you have!!).

The point is that there are several ways of manufacturing information products and you might wanna consider them all. Much too often I see only one way (or the highway) - a USSR-Politburo-fundamentalism kind of deployment option.

I can write a book about the four quadrants and the operationalization of the decouple points. But this is a blog post and I really need to stop now.

My point is that the four quadrants are different in almost every way in terms of architecture, design, execution, governance and (project) management:

  • qI and qII need a more centralized governance style; qIII and qIV require a more decentralized governance style;
  • qII might be more suitable for agile deployments, while qI might be more suitable for good old waterfall;
  • In terms of data modeling, qI needs a more temporal style, while qII might be more prone to dimensional modeling;
  • Both qI and qII need to be aligned with Enterprise Data Models; qIII and qIV do not;
  • In terms of tooling, qI needs more discovery/analytic functionality, while qII is more prone to reporting, dashboards, etc.;
  • Different kinds of databases might be required across the quadrants and even within a quadrant;
  • Ownership of definitions in qI is more or less driven by the source, while ownership of definitions in qII is more or less driven by the product owner;
  • qII publishes information products that are certified, trusted and centrally maintained; qIV publishes information products for which the creator bears all responsibility regarding trustworthiness, change management, etc.;
  • In qI the data model and the data logistics need to be standardized and automated as much as possible; there is more freedom (based on the requirements) in qII. In qIV it is completely free how products are modeled;
  • qI might be the domain of the IT department, or even prone to be outsourced eventually. The other quadrants will probably be situated close to the requirements in the organization;
  • qIII is a suitable quadrant in which to situate sources that are hugely ad hoc or use new innovative technologies;
  • Every quadrant requires a specific education and competency profile;
  • etc.

 

About Leadership…

If you are a good leader,

Who talks little,

They will say,

When your work is done,

And your aim fulfilled,

"We did it ourselves."

 

Lao-Tse (thx Jerry Weinberg)

You are not a customer……..

W. Edwards Deming once said: “The customer is the most important part of the production line.”

In terms of engineering information assets (getting relevant data to the end user and doing valuable stuff with it) this still holds great value, but... there is a misconception I would like to point out: the misconception that users in an organisation are 'customers'. I hear that quite a lot and I find it disturbing and dangerous.

Please, please, please... do not... do not ever treat the recipients or users of information assets within your organisation as customers. These so-called customers are paid by the same people as you, whether you are an engineer, designer, architect, manager or the freakin coffee machine. You and this so-called customer are part of an organisation....

According to Wikipedia, an organisation is a social entity that has a collective goal and is linked to an external environment.

... joined by collective goals - that is why you are organised in a single entity called an 'organisation'. Customers are those people or entities in the external environment - outside the organisation. Got it?

These users, sometimes referred to as 'customers', are, as Deming intended, a vital part of the production process of information assets. But with users there is more.... They need to be involved, and they are as accountable as anyone else in the organisation for making relevant information assets with scarce resources.

These users cannot say 'I am a customer, thou have to listen to me' or 'the data sucks, you gotta fix it' or 'the data does not reconcile, I will never use this again' or 'I know we have tool X, but I just acquired tool Y'. It is a joint effort with joint accountabilities between engineer and user, where both are united in the common goals of the organization.

So, if you are a BICC manager, an ETL developer, a data modeler, a BI consultant or whatever, and someone comes up to you saying 'I am a customer and I want X' - you know what to say.

Stop treating internal users of information assets as customers. 

 


Full circle; ‘give me the data, the software and the machine’

Over the past 15 years many organisations have invested heavily in the exploitation of data. Partly through the acquisition of technology, both hardware and software, and partly through projects delivering concrete information products, with the accompanying training of everyone from developer to user.

These are significant investments that naturally call for a healthy ROI. And the question I am asking here is: can we increase the ROI on the investments already made?

A good consultant's answer should be 'that depends on your context'. And that consultant would be completely right, but in this piece I want to sketch a scenario that may or may not fit your organisation. I see this scenario come by more and more often - it is up to the reader to decide whether they recognize it in their own context.

Organisation X has invested a lot in the production of data, that is, in making data suitable for many kinds of use: marketing automation, risk management, financial management, internal and external accountability and steering, etc. This use often takes the shape of reports, dashboards or guided analyses.

To actually exploit the above, organisation X also had to improve the quality of its data, adapt processes, redefine roles and responsibilities and train stakeholders.

The style of 'development' is still often characterized by a high degree of systematic development. That is: it must be known beforehand what has to be built, what it looks like, which data are needed, how those data must be transformed, who needs access, and so on. In this style of development there is a sharp distinction between those who 'make' and those who 'use'. Moreover, you are confronted with the typical traits of application development, such as the dreaded delivery street (development, test, acceptance, production) and version control.

Organisation X has this style of systematic development well under control, but found some years ago that users were not really satisfied and that the facilities were only meagerly used. One of the main causes mentioned was that users were hardly able to get their wishes realized quickly. The user actually wanted to be, in part, the developer. Organisation X decided (in hindsight) to introduce a different development style, that of 'delegated development'. A new type of user emerged: the user who, from within his own domain, started clicking together reports and serving his constituency.

Although this sounds simple, this new style of development caused major headaches, particularly for the IT department: what about version control? Who tests the reports? Who maintains them? Who makes sure the environment is not pulled down? But organisation X came through that well too.

At the moment organisation X still feels that the return on its investments is too low. This is reinforced by the ever-increasing dynamics of its environment, which demand that employees of organisation X respond faster, look further ahead and deploy scarce resources ever more effectively.

A group of users is emerging with demands fully in line with these dynamics:

'I do not yet know what I need to solve my problem. I have to explore, experiment and learn. So give me the data, give me the software, give me the machines.'

Yet another new development style - or is it? It is, after all, the development style in which the user equals the developer. And where have we seen that before? Right - in the glory days of spreadsheets and local databases. Back then, too, the user had his own data, his own software and his own machine. Has the circle come round? This 'opportunistic development style' demands a shift in thinking from many parties within organisation X.

So it was decided to create, alongside the environment that produces information products systematically (and in delegated fashion), a separate environment:

  • Where data can be made available quickly, with few or no limits in terms of volume;
  • Where data can come from anywhere, whether a data warehouse, data from a chain partner or a Twitter feed;
  • Where advanced visualisations can be run;
  • Where advanced analytical capabilities are available;
  • Where the results of these analyses can easily be shared (including on mobile devices);
  • Where the environment can be scaled up in a cost-efficient and transparent manner.

Organisation X is well aware that this opportunistic development style brings governance challenges, demands a lot from users' skills and certainly does not replace the systematic and delegated development styles, let alone the data warehouse - they are all needed.

It was a deliberate choice to find this out along the way, on the journey the organisation has embarked upon. Sometimes technology can serve as a crowbar to bring about change, or as a means to learn through experiments.

 

The democracy of data

There is something going on for some time now, decades even. It all started with the arrival of the Internet where people voluntarily contributed data to, well, everyone who was interested. Data about themselves, their relationships, their adventures, their careers. This data was shared with consent of the owner of the data - although not everyone knew what the data was used for. So one might say that there was consent, but not informed consent.

Let's take it a step further and imagine....

What if our data - generated by others - was given back to us and we could consent, in an informed manner, to share this data for the greater good? For example: my tax information is my data, it is about me, and I want to decide whether or not others can use it. Or suppose I could get a hold of my location/GPS data, showing all my movements. Or my point-of-sale data from the grocery store, showing my eating patterns. Suppose I could even get a hold of the data of the last MRI I took, my genome data, the data of my last blood test or even the data of a clinical test I was in?

Imagine....

What if I could decide to contribute this data (consensually) for the public good, where my privacy was still being honoured? What if dozens of people would decide that? What if millions of people would decide that? Clinical research would never be the same again. We would be able to scan for patterns in seas of data consisting of environmental data and healthcare data. No more clinical trials with just 2000 people and ever-smarter statistics. In this setting the healthcare specialists, the quants, the sociologists and the behavioural scientists would have an unprecedented test bed of data. Is there a correlation (or even causality) between aspects of travelling, career, eating patterns, social status and cancer? Suppose even several generations would contribute their data; what would that mean for clinical research? Mindblowing....

In the above I discussed data that was about myself and so I should be the one who should decide whether or not to share. But what about data that is ours? The government heavily sponsors research in many countries. Research on biology, behavioural science, economic science, climate science etc.. Shouldn’t the data generated by this research be public domain? I think it should...

What about data created by government - which is us. Data about im- and export movements, data regarding employment, schooling, law enforcement, crime, etc.. 

Imagine....

What would this democratization of data mean with regard to innovation? I think it would truly ignite a burst of possibilities and a huge potential for our general wellbeing. And no - I am not referring to the challenge of marketing handbags to middle-aged ladies (quote somewhat paraphrased from Neil Raden).

No, set the data free to go for the real challenges we face; decreasing poverty, climate control, improving healthcare, scarcity of resources, economic stability and decreasing crime.

This blogpost is hugely inspired by John Wilbanks - google the guy (!) - by all the Open Data initiatives of the world where governmental agencies free up their data, and by the technological possibilities of data storage, data deployment, data enrichment, data visualization and advanced analytics. Finally, this blog is inspired by a deeply felt wish and conviction that our field of knowledge (data management and data utilisation) can contribute to a better place for us to live in.

The journey that never ends; the origins of data quality

I have always been fascinated by the true origins of modern-day phrases or trends in my domain – Information Management, data management in particular. It is like a challenge I give to myself, a puzzle waiting to be solved. Why you say? Well, Aristotle said it already:

‘If you would understand anything, observe its beginning and its development’.

I tend to first collect the modern-day writings about a subject, mostly by practitioners. Then I go to the online science libraries and browse through ACM journals, MIS Quarterly, the European Journal of Information Systems, the IBM Systems Journal, Decision Support Systems, the Journal of Management Information Systems and lately the Journal of Data and Information Quality Research. And I am forgetting a whole lot. But since the field of information management is a relatively young science, I tend to eventually end up in the more or less classic science domains: psychology, mathematics, engineering, etc.

Being on such a quest is like opening up an unprecedented series of presents given to me by brilliant men and women. There is so much out there that can easily be applied to other domains, for example, the information management domain.

With ‘Data Quality’ the same applied. I started with the books of Thomas Redman (aka the data doc), of course Larry English, Danette McGilvray, David Loshin and Jack Olson; and Arkady Maydanchik cannot be missed either. And one cannot overlook the books written by Yang Lee, Richard Wang and Leo Pipino. The majority of these books, however (with the exception of Lee, Wang and Pipino), lack scientific rigor - the kind of Design Research approach introduced by Alan Hevner in 2004 (published in MIS Quarterly). And although this type of research is relatively young, there are many science-based papers out there that more or less adhere to several of the Design Research prerequisites that aim for scientific rigor and relevance in practice.

Since 2004 many papers on data quality have been published that are really precious to me, but it just was not good enough for me. I had not reached the true origins yet, or so I felt. So I broadened the scope to ‘Quality’ in general. Quality in a manufacturing/engineering/services context pointed me in the direction of Shewhart, Deming, Juran, Crosby, Feigenbaum, Ishikawa, and also Peter Drucker. Boy – did I enjoy the writing of these guys (sorry, they were all men).

However, I slowly digressed into various domains that opened up Pandora’s box: the domain of coping with change, management theory, decision theory, group processes, system theory, system dynamics and much more. And although I studied economics at university, this was all new to me.

I am still not sure whether I was not paying attention back in college or my university just sucked.

In between I entered the field of Quality Software Management - not that odd, I would say; on an abstract level one might argue that it is the sum of the above combined with software engineering, my own professional domain and the projects I undertook. Back then I felt (and I still do) that Gerald (Jerry) Weinberg had captured the soul of all these quality people, combined with system theory, system dynamics, software engineering, a profound human perspective and a keen view on leadership and management (and on why many current management models are simply dysfunctional).

If anyone really wants to go on a quest regarding ‘agile software development’: do not bother - start by reading the books of Jerry Weinberg. You will not find the word ‘agile’, but you will recognize it.

These books (and he wrote a whole lot) put me on a roller-coaster (which I am still on) that included exploratory testing, self-organizing teams, leadership, Kanban/Scrum/XP, CMMI, Six Sigma, etc…

I have so many books now, so many papers, so many subjects, so many loose ends…..it is ridiculous.

And it all started with ‘data quality’....

Am I done yet?

Hell no

Will I ever be done?

Hell no

Is it fun?

Hell yes

I need a second life, and a third...

The biology of data, Conformers & Regulators

Imagine 'Data' being an organism, a living being existing in an environment we tend to call 'organization'. In the field of biology there are two types of organisms in such an environment; conformers and regulators.

These two types have different mechanisms for regulating their life system parameters (e.g. temperature, pH, blood glucose) - in other words, for maintaining homeostasis.

Those organisms that conform allow the environment to determine several important life system parameters. A reptile, for example, uses a sun-heated rock in the morning to raise its body temperature. So - conformers need to change their behavior to maintain homeostasis.

Those that regulate maintain life system parameters at a constant level over possibly wide ambient environmental variations. They have the capability to adjust their metabolism to maintain homeostasis. In other words: we - human beings - are regulators.

Now, imagine 'Data' being a regulator. This would mean that 'Data' is capable of adapting, self-sufficiently, to different environmental variations. Now, in several discussions regarding data - especially Big Data - I encounter the assumption that data by itself can have value, meaning and relevance. And if there is lots of it, its value is even bigger!

That is just so wrong….

'Data' is definitely a conformer - its environment determines its life system parameters. Its value, its meaning, its relevance are - by definition - determined by the environment.

Still with me? Let's continue.

What is left of 'Data' - being a conformer - when it cannot correctly detect its environment? Or even worse, what happens to 'Data' when the environment is void? The answer is simple: it cannot sustain itself; even worse, it cannot exist. 'Data' is in profound existential trouble.

Data needs context
Data needs metadata
Data needs definitions 
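A minimal Python sketch of exactly that point (all names hypothetical): a bare value is a conformer with nothing to conform to; only context, metadata and definitions give it meaning.

```python
# Minimal sketch (hypothetical names): a bare value means nothing until its
# environment - context, metadata, definitions - gives it meaning.
from dataclasses import dataclass

@dataclass
class Datum:
    value: float

@dataclass
class ContextualizedDatum:
    value: float
    definition: str   # what the number denotes
    unit: str         # metadata: unit of measure
    source: str       # context: where it was observed

bare = Datum(37.2)    # 37.2 what? Degrees? Euros? Latitude? No environment, no meaning.
meaningful = ContextualizedDatum(
    value=37.2,
    definition="body temperature of patient at intake",
    unit="degrees Celsius",
    source="triage sensor, ward 3",
)
print(f"{meaningful.value} {meaningful.unit} - {meaningful.definition}")
```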

TDWI EU – Agile Data Warehousing; What is the buzz about

June 18th, just after the demise of the Dutch soccer team, I gave a seminar at TDWI-EU in Munich. I think Jos van Dongen, Frank Buytendijk and I were the only Dutch guys (speakers as well as attendees) at a successful conference with over 700 attendees.

I had a blast and was impressed by the seminars given by Jos van Dongen, Barry Devlin and Mike Ferguson. I learned a hell of a lot and had great laughs with these guys as well. 

To top things off, I had some vegan food in Munich with a bunch of Pentaho folks who almost convinced me to join an even darker side (instead of SAS.... lol). If you guys are reading this: thx for a great evening and for showing us some architecture in Munich.

Oh and please - there is no correlation between Pentaho and vegan food. How a meat-loving guy like me ends up in a vegan restaurant deserves a separate blogpost. Suffice it to say that Barry Devlin was to blame ;-). The food was great though!

The seminar on Agile Data Warehousing had something like 50 attendees, I think. Which is pretty ok, considering my competition as well as the fact that Data Vault is not very well known (yet) in Germany. I got some great questions, especially from the guys/gals who 'know' and have 'been in the trenches'. There was a lot of recognition, and good questions and discussions followed after the session and in the days after.

It amazed me that large companies in Germany had already experimented with Data Vault and data warehouse automation, or were familiar with it and wanted to start a pilot. Besides the Scandinavian countries, Holland and Belgium, I think Germany will soon join and contribute to the community, which would be great.


A new beginning

After you've done a thing the same way for two years, look it over carefully. After five years, look at it with suspicion. And after ten years, throw it away and start all over. - Alfred Edward Perlman, New York Times, 3 July 1958

I started my career with 5 years of working in the service of companies that need data management [1] to support them in their efforts to be either more competitive or more reliable. Ahold, specifically Albert Heijn and the Pallas project [2], formed the basis of my career in data management and was also the first jelled team I was part of [3].

The next 5 years I worked for various consultancy firms: Vertis (now Ordina), I3 and Cibit. Every one of those taught me valuable lessons in what worked and what did not. Within Cibit I began giving in-depth masterclasses in data warehousing and tutoring prep courses for the CBIP certification.

The last 5 years I was an independent 'jiggler' [4] with a mission: enhancing the rigor and relevance of data management. Data management was and still is a young field of expertise with a thin body of knowledge, in practice as well as in science. One of the reasons is the extremely limited cross-pollination between data management and other fields of expertise: Quality, Manufacturing, Operational Research, Logistics, Software Engineering, Psychology, Economics, etc. Hundreds of books and huge piles of academic papers have littered (or lit up?!) my path over the last 5 years and resulted in various papers, blog posts, tweets, conferences and workshops. In my 'jiggling' efforts I was increasingly able to use this wonderful knowledge for the benefit of my clients. I like to believe that I have had some impact on how we can deliver cost-effective, quality data products.

In this day and age, the field of data management and its uses for decision support/enhancement is growing at an unprecedented pace and is getting more and more complex and intertwined. Combined with ever-accelerating technological advancement, a data explosion and increasing demand, we find ourselves in a fascinating era. For an indie like me, keeping up is vital and increasingly challenging.

Instigated by the quality literature, the agile approach in software engineering and the Problem Solving Leadership course [5] I took in 2011, I became convinced that 'the team' - when empowered - can achieve astounding results in the non-linear world we are living in. And as they say - there is no 'I' in 'TEAM'...

In short: I decided to search for an environment where I can make a difference, where I can learn and contribute, and be part of a team of kindred spirits who share the passion for the trade we are in.

SAS and I go back over a decade, having worked together for various clients. Over the years I became impressed with the company (a great place to work, the history), the SAS employees, the approach to the market (very much value based), their products (among others: interoperability, breadth of products, expertise in analytics), their DataFlux subsidiary and a consulting company I have admired for quite some time: Baseline Consulting, featuring giants like Jill Dyche and Evan Levy.

In short: I am leaving the lonesome cowboy act and joining the SAS stable as of September 1st, 2012. As a Principal Business Solution Consultant I will keep focusing on (agile) data management, be part of the SAS data management team, tell the story that needs to be told, continue blogging and tweeting, and assist customers in formulating their data management strategies and executing them.

Can't wait!

ps. A press release has been issued by SAS regarding my transfer (Dutch only)


[1] The best description of data management, in my opinion, is the one used by DAMA: http://www.dama.org/i4a/pages/index.cfm?pageid=3548
[2] http://www.grey-matter.nl/pubs/AH_Pallas_NL.pdf
[3] http://www.amazon.com/Peopleware-Productive-Projects-Second-Edition/dp/0932633439
[4] J. Weinberg: 'A jiggler gets an organization unstuck by providing a small change in how the client sees the world'
[5] Jerry Weinberg / Esther Derby / Johanna Rothman

