Why Repository SaveUpdate is a smell | Ian Cooper
One of the idioms I see a fair amount of is the SaveUpdate method on a Repository. The intent is usually to persist an item to storage, using a SaveUpdate method to lazily avoid the question of whether it is a new object, or just one you are updating.
I have a couple of issues with this approach. The first is that a Repository is intended to have collection semantics and encapsulate the method used to obtain the data. Adding SaveUpdate makes it explicit that your repository is backed with some kind of persistent storage – why else would you need to save it. But a Repository does not encapsulate access to a Db, a repository is simply a collection of entities (in DDD aggregate roots) which is agnostic about where the entities are stored. A Repository might be entirely in memory, it might abstract a REST API on a separate service, then options are legion.
In most cases I would prefer that we have a similar interface to a collection: an Add method because I want to add new items into the collection, but not an update, because I should only need to change an object referenced from the collection for that.
Working with NHibernate (NH) this also works with NH’s paradigm rather than against it. This is my second issue: you should rarely need to ever call SaveUpdate in NH, and only rarely call Save.
NH distinguishes between two types of objects: transient and persistent. A persistent object is one that is in the Db, a transient one is in memory. When you Save against an NH session a transient object becomes a persistent one. When an object is loaded from the Db it is a persistent object. The key to this is the Identity Map pattern – NH has a list of all the objects loaded from the Db, so that it knows to serve them back up to you, if you ask for the object representing a row again during the session. The idea is that you want to use the version with any changes you have already made, not a fresh copy from the Db.
When NH flushes a session, or commits a transaction, it does a dirty check for all the persistent objects in your Identity Map, and saves any changes. That’s it, no need to call SaveUpdate, just flush the session or commit a transaction and your persistent objects will be saved. So Update is meaningless to call explicitly, NH will do it for you. (Understanding this is important, because auto-flush will commit dirty writes when your session terminates, even if you threw an exception; this may not be what you desired, in which case you should not autoflush but rely on explicit transactions).
This fits nicely with our Repository as collection metaphor: we don’t need to do anything with objects that already part of the collection post-updating them.
In addition NH supports persistence by cascade – you know those options you set on many-to-one, many–to-many relationships etc. Provided your cascade options require NH to follow a relationship, a persistent object will also save children in its object graph. Now this does not make any difference for already persistent objects, but transient ones that can be reached by following a cascade will be saved too.
This means that you do not need to call Save for an object that is added to the graph of an already persistent the dirty check will pick it up – so adding a new object in this case does not require a call to Save.
This makes sense for our Repository pattern, we don’t tend to call Add for the children in the object graph of entities we add to a collection, and there. It is also worth noting that this lends itself to the DDD idea of an Aggregate Root being what a Repository loads, we need to load root objects, not all their children. (Sadly NH does not really support the notion of a coarse-grained lock, which aggregates try to implement, but that’s a seperate issue).
This means that we only ever need to Save i.e. Add to our Repository, on new transient objects that have no association with existing objects (in DDD terms likely to be Aggregate Roots). There are usually a lot less of these than other kinds of entities. So once you challenge the idea of doing SaveUpdate, you naturally begin to move toward thinking about a Repository loading an object graph, and not an entity. By doing this you are both encapsulating your access to whatever backs the repository, and in the case of an Object-Relational Mapper (ORM) working with the ORM. A good ORM is intended to support the OO paradigm, if you don’t exploit its power to work seamlessly, but rely on old school explicitly CRUD operations on a Data Mapper then you miss some of the strengths of a well written ORM.
If you want to have explicit knowledge of the Db, and find it more comfortable to know that you are saving to the Db, use the Data Mapper pattern, not a Repository. Otherwise, consider that explicitly calling SaveUpdate (or synonyms of that) may well be a smell that you are pusing too much information about how persistence is done into your model.