Case Studies | Greg Young
So there was a question on StackOverflow today that I got linked into. I decided to answer the question here instead.
There’s lots of information out there about CQRS and Event Sourcing. But who’s actually using it in practice / in production? I tried to find references on the Internet, but couldn’t find any.
(This is not really a programming question perhaps, but this seems to be the most appropriate place to ask. I got asked this yesterday when doing a presentation to colleagues on these topics.)
There is a lot wrong here. Event Sourcing in particular, and in conjunction with “CQRS” (note it’s basically a prerequisite if you want to query current state at all), is a very, very old concept. There are many thousands of systems in existence that do things this way. In fact there are so many, and it’s such a core concept, that writing up a case study on us “doing it this way” is frankly a waste of time.
That “transaction file” thing in your database? Event Sourcing. There are countless systems using these ideas. Smalltalk images are built up this way. Research traces the ideas back to the dark ages. You can probably get further back than that, but research time is expensive when you have lots of other things to do.
Deltas + Snapshots or Separating Reads from Writes are in no way new concepts.
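To make “Deltas + Snapshots” concrete, here is a minimal sketch. The account example and every name in it (Deposited, Withdrawn, replay) are made up for illustration and do not come from any particular system. Current state is never stored directly; it is derived by replaying the deltas, and a snapshot is just a remembered (version, state) pair that lets the replay skip the part of the log it already covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


def apply(balance, event):
    """Fold a single delta into the current state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event: {event!r}")


def replay(events, snapshot=(0, 0)):
    """Rebuild state from the log.

    `snapshot` is (version, balance): the state after the first
    `version` events. Only the events after it are replayed.
    """
    version, balance = snapshot
    for event in events[version:]:
        balance = apply(balance, event)
    return len(events), balance


# Hypothetical append-only log of deltas.
log = [Deposited(100), Withdrawn(30), Deposited(5)]

snapshot = replay(log[:2])            # state after the first two events
from_scratch = replay(log)            # replays all three events
from_snapshot = replay(log, snapshot) # replays only the last event
```

Both replays arrive at the same state. The snapshot is purely an optimization and can be thrown away and rebuilt from the log at any time.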
But before we get into the dysfunctions let’s ask a really basic question. Does a Case Study from “Some Marketing Guy” or Pretentious Ivory Tower Architect #14 at We’re So F#%*ing Large You Can’t Imagine How Big Our Towers Are Corporation have anything at all to do with the success or failure of a project on your team?
Have you ever actually read these so-called Case Studies? Most are terrible. Here, let’s try one: http://www.brainbench.com/pdf/CS_IBM.pdf (this is from Brain Bench about their software). Go on, read it. Is this a case study or a piece of marketing? Here, let’s try these instead: http://www-01.ibm.com/software/success/cssdb.nsf/topstoriesFM?OpenForm&Site=cognos&cty=en_us. How about from Oracle? http://www.oracle.com/technetwork/database/features/ha-casestudies-098033.html These are not “case studies”; they are marketing.
So people who are managing risk want marketing materials that try to look like research to help them make their decisions? Interesting risk mitigation strategy. Perhaps they are mitigating their-ass-gets-shown-the-door risk by trying to show they did something/anything to mitigate risk.
Beyond that, even if they were awesomely written, well-thought-out discussions (which they aren’t), what applicability do they have to your team and environment? If we are talking about a TOOL then yes, case studies may have merit (a switch, a logging tool, etc. in a similar organization, as an example), but to write case studies about a CONCEPT? Next thing you know someone will patent it and sue Oracle for violating their patent, and the courts will award billions to the originator of the idea though he’s been dead for 1000 years.
Think for a moment what my “Case Study” might look like for the simplest CQRS system: “We put our commands on this service and our queries on that service …” Do you think the success or failure of the project was due to this decision? What I really want is a case study on the value of these case studies (with empirical data on how many perfectly good donuts have gone to waste).
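For a sense of how small that decision is, here is a rough sketch of “commands on this service, queries on that service”. The inventory domain and the class names are hypothetical, and the shared dict is a stand-in for whatever storage a real system would use. The only point is that methods that change state live on one object and methods that return state live on another.

```python
class InventoryCommands:
    """Write side: methods change state and return nothing."""

    def __init__(self, stock):
        self._stock = stock

    def receive(self, sku, quantity):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._stock[sku] = self._stock.get(sku, 0) + quantity

    def ship(self, sku, quantity):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self._stock.get(sku, 0) < quantity:
            raise ValueError("insufficient stock")
        self._stock[sku] -= quantity


class InventoryQueries:
    """Read side: methods return state and never change it."""

    def __init__(self, stock):
        self._stock = stock

    def on_hand(self, sku):
        return self._stock.get(sku, 0)


# Illustrative wiring: both sides share one in-memory store here.
stock = {}
commands = InventoryCommands(stock)
queries = InventoryQueries(stock)

commands.receive("widget", 10)
commands.ship("widget", 3)
widgets_on_hand = queries.on_hand("widget")
```

That is the whole pattern at its simplest, which is why it makes for such a thin case study.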
The Real Game
Why are we getting these questions? What is the serious risk that people are trying to mitigate? It must be a pretty big risk if they want to do research and prove out this decision before looking at implementing it. If we are willing to spend 2 weeks on this we must be mitigating a pretty large risk later.
The serious risk is they want to implement CQRS + Event Sourcing everywhere. They want cookie cutter “architecture” (I use the term as loosely as possible) that they will follow everywhere. Yes if you attempted to do this with CQRS + ES it would be a massive risk. That’s precisely why we don’t do this.
CQRS is not a top level architecture.
CQRS is applied within a BC/component/whatever people want to call things tomorrow. It is not applied globally. When we talk about applying things like CQRS and ES we must leave the tower. If I can rewrite the whole thing from scratch in 9 days why are we spending two weeks “proving out” our ideas in meetings on whiteboards (the meetings are much more tolerable if you have coffee and donuts … but the best ones have a good fruit juice selection as well for future reference)? There might be some core places where this kind of risk mitigation is justified but they are few and far between.
The systems we discuss look at risk in a very different way. They are designed to be responsive to change and to minimize the costs of failure. Instead of spending the next 6 months designing the stuff, welcome to our world of we-actually-do-stuff. Let’s actually build out that thing we were talking about. A week or two later we have it done. It’s not some abstract picture on the wall. We actually did it. You would be amazed how much code you can write as a pair in a week.
I have had so many occasions where as a consultant I had people shut up and code. I have seen abstract ideas that were being discussed for two months implemented in an afternoon.
As Zed would say, it’s CEBTENZZVAT, ZBGURESHPXRE, DO YOU SPEAK IT? We get so far up our own asses drawing pretty pictures and abstractly discussing the most minute details of systems (of course before we generalize them to solve all the problems nobody has) that we forget that our job is to actually do stuff.
Now this is not to say that we are without a net. We always manage risk, just in different ways. We reduce the costs of failures instead of doing loads of upfront analysis (sound familiar?). Our risk management is in the prevention of high impact due to change (strategic design). I have to admit one thing that has always made me laugh (and this is common) is when people spend 6 months choosing which database/platform they will use, going through all sorts of “vetting” processes, and then have absolutely no concept of strategic design in their software.
“Well, we spent all this time picking this stuff because it’s the decision we use everywhere.”
Why is it that I can get management to accept that analyzing upfront what the software should do doesn’t work, but I can’t get them to accept the same thing about our harebrained upfront architectural risk decisions?
Think back to when you first started focusing on minimizing the cost of failures instead of preventing risk (it’s a common theme). Are you dealing with the type that actually wanted “case studies” and weeks of analysis to help prevent the risk of adopting the concept of minimizing cost of failure instead of preventing risk? OK, then RUN! Hell, even in waterfall there are backwards-pointing arrows.
Disclaimer: this post is deliberately far to one side to pull people back to reality. I do not actually believe that we should never do upfront risk mitigation; I believe that we do way too much of it.