Scaling Writes in Neo4j | Greg Young



Today I was reading a very interesting post by Max de Marzi (@maxdemarzi): Scaling Writes in Neo4j.

Basically he reaches the point of saying “write into rabbitmq, read from Neo4j”. This is a really good way of doing things, but it can be even better if you mix event sourcing into the problem.

In the blog post Max puts neo4j commands into the queue and then reads them out as fast as possible, updating neo4j. There is also some dubious in-memory batching that could result in data loss, but for many circumstances that is acceptable.
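To make the data-loss window concrete, here is a minimal sketch of that batching pattern (all names are mine, not Max’s): commands come off the queue, sit in an in-memory buffer, and only hit the store when the batch fills. Anything still in the buffer when the process dies is gone.

```typescript
// Sketch of in-memory batching of queue messages. The writeBatch
// callback stands in for a real Neo4j batch transaction.
type Command = { cypher: string };

class BatchingConsumer {
  private buffer: Command[] = [];

  constructor(
    private flushSize: number,
    private writeBatch: (batch: Command[]) => void
  ) {}

  // Called per message off the queue. The message is considered handled
  // immediately, but is only persisted once the buffer fills — that gap
  // is where a crash loses data.
  handle(cmd: Command): void {
    this.buffer.push(cmd);
    if (this.buffer.length >= this.flushSize) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    this.writeBatch(this.buffer);
    this.buffer = [];
  }
}
```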

There are two distinct issues that can come up in this model (and some opportunities for other cool stuff!). The first is that since you are putting neo commands in, you can only ever have it work with neo (wouldn’t it be even cooler to also be able to use the same exact mechanism to maintain a star schema?). The second is that it requires manual configuration: you need to explicitly create the rabbit queues. When there is only one thing listening it’s no big deal, but if you get to twenty it becomes a problem. In the post there is only one, so this is a minor quip.

This particular use of Event Sourcing is exactly what I hit on in my polyglot data talk. With event sourcing you would not put the neo commands into the queue. Instead, put events into the queue that represent the business concept that has happened, say CustomerCreated or ContractTerminated. Then you would write a small handler of these events that creates the neo4j commands, something like this (in a TypeScript-type language):

CustomerCreated   : (s, e) => { neo_add_node(…, { name: e.CustomerName, address: e.CustomerAddress }); },
CustomerCancelled : (s, e) => { neo_remove_node(…); }

Anyway, you get the idea (it’s a projection off an event stream).
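Fleshing that out a little: a projection is just a map from event type to handler, folded over the stream. A minimal self-contained sketch (the in-memory map stands in for real neo4j calls, and all names are illustrative, not from the post):

```typescript
// Minimal projection sketch. The Graph here is an in-memory stand-in
// for neo4j; real handlers would issue Cypher commands instead.
type Event = { type: string; [key: string]: any };
type Graph = Map<string, { name: string; address: string }>;

// One handler per business event type — this is the projection.
const handlers: Record<string, (g: Graph, e: Event) => void> = {
  CustomerCreated: (g, e) =>
    g.set(e.customerId, { name: e.customerName, address: e.customerAddress }),
  CustomerCancelled: (g, e) => {
    g.delete(e.customerId);
  },
};

// Folding the event stream through the handlers builds the read model.
function project(events: Event[]): Graph {
  const graph: Graph = new Map();
  for (const e of events) handlers[e.type]?.(graph, e);
  return graph;
}
```

Note that nothing in `project` is neo-specific: swap the handler bodies and the same fold maintains a star schema, a mongodb collection, whatever you like.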

Using something like the EventStore would make this even better.

One of the concepts of Event Sourcing is that you keep all of your business events. Any time you come up with a new projection (hey, let’s start tracking relationships of x’s to y when blueberries happen) you can just replay all your old events until you are caught up, then continue with live events. Not only that, but you can have many projections. I could for instance have one for neo4j, one for a sql cube, and another for mongodb. If the business people decide you need an elastic search model, just fire up a new projection off those events and you can have one up and running.
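The replay-then-go-live handover can be sketched like this (names are illustrative; a real event store would track the position durably and handle the subscription handover for you):

```typescript
// Sketch of "replay history, then continue with live events". Each event
// carries a sequence number so the projection can ignore anything it has
// already applied, which makes the catch-up/live handover safe.
type Event = { seq: number; type: string };

class Projection {
  private position = 0;
  public counts = new Map<string, number>();

  apply(e: Event): void {
    if (e.seq <= this.position) return; // already seen — replay is idempotent
    this.counts.set(e.type, (this.counts.get(e.type) ?? 0) + 1);
    this.position = e.seq;
  }

  // Catch up from the full history; the caller then subscribes the same
  // instance to the live feed and keeps calling apply().
  catchUp(history: Event[]): void {
    for (const e of history) this.apply(e);
  }
}
```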

Event Sourcing is very powerful in this model, and the Event Store makes it even easier (it even provides HA assurances for your writes with consistency). Overall it’s a pretty simple setup to handle what would normally be a ton of complexity.