Manipulate your expression trees with elegance - CodeProject


Introduction

Expression trees are a very useful concept. In particular, building them from lambda expressions keeps your code type-safe, clear and clean, and preserves the references inside the code. The problem starts when you want to manipulate your expression trees - for example, combine them with each other. Then the work gets dirty. I propose a solution that keeps your hands clean. It might not be very easy to understand, but once you get it, I bet you will find it helpful.

Background

When Microsoft introduced expression trees and I met them for the first time, I was a little confused by the LINQ syntax, which allows us to write SQL-like code to query databases. I was surprised - as a young boy I was told that the programmer and the database engineer are different people and that the data layer should be separated from the code. So why mix things together like this? But then I realized the advantages:

  1. Your code is short and clean. If your logic is simple, you can run a query in one short line of C# code. No more changes in other layers of the solution.
  2. Your code is type-safe. You cannot accidentally send anything other than what is expected to the SQL query. And if you change the type of a database column, the compiler will find the errors for you.
  3. Your code preserves the references. If a column is used in a query, you will find it easily, directly in Visual Studio .NET. Together with type-safety, this lets you change the database without being afraid that your application will become too buggy.
  4. You can pass complex conditions to db queries. I guess every experienced programmer remembers this: you have a db query in a stored procedure and you would like to filter the results. Filtering by one or two parameters is fine, but what if you need to filter by an unknown number of parameters? You don't want to end up sending SQL-string conditions to the procedure... which sometimes happened anyway. In short, passing complex queries to the database layer is not easy.

So far so good. But the last point is kind of tricky. Yes, you can create a complex query in C# and yes, you can use it directly to query the database. But the expression is still hard-coded. What if you need to create the expression dynamically? Then you will probably do something like this (still recommended by Microsoft):

IQueryable<String> queryableData = companies.AsQueryable<string>();
ParameterExpression pe = Expression.Parameter(typeof(string), "company");
Expression left = Expression.Call(pe, typeof(string).GetMethod("ToLower", System.Type.EmptyTypes));
Expression right = Expression.Constant("coho winery");
Expression e1 = Expression.Equal(left, right);
left = Expression.Property(pe, typeof(string).GetProperty("Length"));
right = Expression.Constant(16, typeof(int));
Expression e2 = Expression.GreaterThan(left, right);
Expression predicateBody = Expression.OrElse(e1, e2);
MethodCallExpression whereCallExpression = Expression.Call(
	typeof(Queryable),
	"Where",
	new Type[] { queryableData.ElementType },
	queryableData.Expression,
	Expression.Lambda<Func<string, bool>>(predicateBody, new ParameterExpression[] { pe }));
MethodCallExpression orderByCallExpression = Expression.Call(
	typeof(Queryable),
	"OrderBy",
	new Type[] { queryableData.ElementType, queryableData.ElementType },
	whereCallExpression,
	Expression.Lambda<Func<string, string>>(pe, new ParameterExpression[] { pe }));
IQueryable<string> results = queryableData.Provider.CreateQuery<string>(orderByCallExpression);

As you can see, we have suddenly lost all the advantages. The code is no longer short and clean. It is not type-safe either: if you change the data type of the property Length, the compiler won't notice. And if you rename this property, there is no compile-time error either.
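
For comparison, here is a minimal sketch of the same query written with lambda expressions, which is possible only when the predicate is known at compile time. The class and method names (StaticQueryExample, Build) are hypothetical and not part of the original example; the sketch only illustrates what is lost once the expression has to be built dynamically.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

static class StaticQueryExample
{
    // Hypothetical helper: builds the same Where/OrderBy query as the dynamic
    // example, but from lambda expressions that the compiler can check.
    public static IQueryable<string> Build(string[] companies)
    {
        // Renaming or retyping Length would now be a compile-time error.
        Expression<Func<string, bool>> predicate =
            company => company.ToLower() == "coho winery" || company.Length > 16;
        Expression<Func<string, string>> keySelector = company => company;

        return companies.AsQueryable().Where(predicate).OrderBy(keySelector);
    }
}
```

Both lambdas are still expression trees, so a LINQ provider can translate them, yet every member reference in them is visible to the compiler and to refactoring tools.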

Note: This example shows expression building. For manipulation, the principles are the same: you work with type-unsafe descendants of the Expression type. Microsoft made it easier by introducing the (public) ExpressionVisitor class in .NET Framework 4, but that still doesn't solve the problem.

Typical solution

So what can we do about this? I guess lots of programmers have had to solve similar problems and decided to do the same as me: encapsulate the dirty code in a helper class. The dirty stuff lives there, it is allowed to live there, but nowhere else. Your dirty class has a nice API, it may even be type-safe, it is easy to call, and everything is OK. Until... your API is not big enough and does not cover all cases of expression-tree manipulation. What if you need more type parameters? What if you need to replace one of them with something else? How can the dirty class have a universal interface and still be type-safe and preserve the code references?

Proposed solution

Let's consider this scenario:

  • We have some database tables represented by ORM classes: User and Company. Every table has columns Id, Deleted, and some field with a name (UserName, CompanyName).
  • We need to write every row that is not deleted to the console, showing its Id and Name fields. We want to avoid loading other columns from the database.
  • The logic would be very similar for every table, so we want to avoid copy-pasting and write universal code instead.

So what we want is a method that implements the described logic. Let's call it WriteAll. Because the logic will be universal for all ORM classes and every table has different column names, the method will have a generic parameter T. The columns Id and Deleted are common to every table, so we can use an interface to access those fields.

interface IMyEntity
{
    int Id { get; }

    bool Deleted { get; }
}

The tricky part of our method is to fetch only the columns we need. To do this we need to call the IQueryable.Select() method, so we have to prepare the expression parameter for it. The result type of the expression will be an IdWithName object, which is just a container for the Id and Name columns, used by the output logic. The problem is that the expression has to vary for each table, because every table has a differently named "name" field (UserName vs CompanyName), so we cannot just write a single lambda expression. Instead, we need to create the expression dynamically from two separate expressions:

  1. Name getter - a (variable) function that gets the name from the database item, for example user => user.UserName.
  2. Template - creates the IdWithName object, containing a placeholder where the name getter will be substituted.
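
The article does not list the IdWithName container or the ORM classes, so here is a minimal sketch of what they might look like. The property layout of User and Company is an assumption based on the column names mentioned above; real ORM classes would normally be generated or mapped by Entity Framework.

```csharp
// Repeated from above so that the sketch is self-contained.
interface IMyEntity
{
    int Id { get; }

    bool Deleted { get; }
}

// Container for the two columns the output logic needs.
class IdWithName
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical ORM classes: same Id/Deleted columns, differently named "name" column.
class User : IMyEntity
{
    public int Id { get; set; }
    public bool Deleted { get; set; }
    public string UserName { get; set; }
}

class Company : IMyEntity
{
    public int Id { get; set; }
    public bool Deleted { get; set; }
    public string CompanyName { get; set; }
}
```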

So our method including the Template could look like this:

static void WriteAll<T>(IQueryable<T> table, Func<T, string> nameGetter)
	where T : IMyEntity
{
	var items = table
		.Where(item => !item.Deleted)
		.Select(item => new IdWithName 
		{ 
			Id = item.Id, 
			Name = nameGetter(item) 
		});

	foreach (var item in items)
	{
		Console.WriteLine("{0}: {1}", item.Id, item.Name);
	}
}

And the method call includes the specific Name getter:

WriteAll(users, user => user.UserName);

You can compile the code and run it... and, as you have probably guessed, you will get an error. We cannot run queries like this against the database, because Entity Framework does not recognize the nameGetter delegate call; it cannot be translated into SQL. We need to pass it not as a delegate (Func<>) but as an expression (Expression<Func<>>). But an expression cannot be called directly: to call an expression tree, you first have to compile it with the Compile() method. This does not solve our problem - not yet. The solution will come later.

static void WriteAll<T>(IQueryable<T> table, Expression<Func<T, string>> nameGetter)
	where T : IMyEntity
{
	var items = table
		.Where(item => !item.Deleted)
		.Select(item => new IdWithName
		{
			Id = item.Id,
			Name = nameGetter.Compile()(item)
		});

	foreach (var item in items)
	{
		Console.WriteLine("{0}: {1}", item.Id, item.Name);
	}
}

Well, there are actually two problems in this code:

  1. You cannot use the Deleted and Id properties for querying Entity Framework, because they are interface members. They must be replaced with the members of the specific ORM class.
  2. You cannot use the nameGetter reference, the Compile() method call, or the delegate call. None of these operations can be translated into SQL.

To resolve these problems, we can use my two methods:

  • GetRidOfInterfaceCast - replaces the interface properties with the specific property on ORM classes
  • GetRidOfCompile - replaces the Compile() method and the delegate call with the substitution (see below)

GetRidOfInterfaceCast

The expression tree that uses the interface members has only one problematic part: the interface cast. The ORM object is first converted to the interface, and then a property with a specific name is read from it. GetRidOfInterfaceCast searches for such nodes in the tree and simply removes the interface cast, so that a node such as ((IMyEntity)item).Deleted becomes item.Deleted.
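
The author's implementation ships in the attached ZIP and is not reproduced here. As an illustration of the idea only, the following is a minimal sketch built on ExpressionVisitor. The names InterfaceCastRemover and RemoveInterfaceCasts are hypothetical, and the sketch assumes that the concrete class exposes a public property with the same name and type as the interface property (explicit interface implementations are left untouched).

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

interface IMyEntity
{
    int Id { get; }
    bool Deleted { get; }
}

// Hypothetical ORM class used only to try the sketch out.
class User : IMyEntity
{
    public int Id { get; set; }
    public bool Deleted { get; set; }
    public string UserName { get; set; }
}

// Rewrites Convert(x, ISomeInterface).Property into x.Property.
sealed class InterfaceCastRemover : ExpressionVisitor
{
    protected override Expression VisitMember(MemberExpression node)
    {
        var cast = node.Expression as UnaryExpression;
        var interfaceProperty = node.Member as PropertyInfo;

        if (cast != null && interfaceProperty != null
            && cast.NodeType == ExpressionType.Convert
            && cast.Type.IsInterface)
        {
            Expression operand = Visit(cast.Operand);

            // Assumption: a public property with the same name and type exists
            // on the concrete type; otherwise the node is left as it was.
            PropertyInfo concrete = operand.Type.GetProperty(
                interfaceProperty.Name, interfaceProperty.PropertyType);
            if (concrete != null)
            {
                return Expression.Property(operand, concrete);
            }
        }

        return base.VisitMember(node);
    }
}

static class InterfaceCastRemoverExtensions
{
    // Keeps the strongly typed Expression<TDelegate> shape for the caller.
    public static Expression<TDelegate> RemoveInterfaceCasts<TDelegate>(
        this Expression<TDelegate> expression)
    {
        Expression body = new InterfaceCastRemover().Visit(expression.Body);
        return Expression.Lambda<TDelegate>(body, expression.Parameters);
    }
}
```

Inside a generic method with the constraint where T : IMyEntity, the compiler emits the Convert node implicitly for item.Deleted. At run time T is already the concrete class, so the property lookup on the operand type finds the member of the ORM class.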

 

GetRidOfCompile

 

Expression trees that contain Compile() calls are a little more complicated. We need to take the source of the Compile() call, which is the nameGetter variable in our scenario, and evaluate it. The result is an expression tree, which replaces the delegate call. If that tree has parameters (one in our scenario), they are replaced with the sub-trees originally passed as arguments to the delegate call (labelled P1, P2, etc.).

 

So the actual substitution takes two steps: step 1 maps the parameters (P1, P2, etc.) to the argument sub-trees, and step 2 replaces the delegate call with the given expression tree.

The resulting expression is ready to be processed by Entity Framework.
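
Again, the real implementation is in the attached ZIP. As an illustration of the two steps only, here is a minimal sketch built on ExpressionVisitor. The names ParameterReplacer, CompileInliner and InlineCompileCalls are hypothetical. The sketch assumes that the source of Compile() can be evaluated on its own (for example, a captured variable such as nameGetter) and does not depend on parameters of the outer lambda.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Replaces parameter nodes with the expressions they are mapped to.
sealed class ParameterReplacer : ExpressionVisitor
{
    private readonly Dictionary<ParameterExpression, Expression> map;

    public ParameterReplacer(Dictionary<ParameterExpression, Expression> map)
    {
        this.map = map;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        Expression replacement;
        return map.TryGetValue(node, out replacement) ? replacement : node;
    }
}

// Rewrites someLambda.Compile()(arg1, arg2, ...) into the body of someLambda
// with its parameters replaced by arg1, arg2, ...
sealed class CompileInliner : ExpressionVisitor
{
    protected override Expression VisitInvocation(InvocationExpression node)
    {
        var call = node.Expression as MethodCallExpression;
        if (call != null && call.Object != null
            && call.Method.Name == "Compile"
            && call.Arguments.Count == 0
            && typeof(LambdaExpression).IsAssignableFrom(call.Object.Type))
        {
            // Evaluate the source of Compile() to obtain the expression tree.
            // Assumption: it does not reference parameters of the outer lambda.
            var lambda = (LambdaExpression)Expression
                .Lambda(call.Object).Compile().DynamicInvoke();

            // Step 1: map each parameter to the sub-tree passed as its argument.
            var map = new Dictionary<ParameterExpression, Expression>();
            for (int i = 0; i < lambda.Parameters.Count; i++)
            {
                map[lambda.Parameters[i]] = node.Arguments[i];
            }

            // Step 2: replace the delegate call with the substituted body,
            // then visit the result in case it contains further Compile() calls.
            Expression inlined = new ParameterReplacer(map).Visit(lambda.Body);
            return Visit(inlined);
        }

        return base.VisitInvocation(node);
    }
}

static class CompileInlinerExtensions
{
    // Keeps the strongly typed Expression<TDelegate> shape for the caller.
    public static Expression<TDelegate> InlineCompileCalls<TDelegate>(
        this Expression<TDelegate> expression)
    {
        Expression body = new CompileInliner().Visit(expression.Body);
        return Expression.Lambda<TDelegate>(body, expression.Parameters);
    }
}
```

The only cast in the sketch sits inside the helper, where the evaluated value is known to be a lambda; calling code keeps working with Expression<TDelegate> throughout.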

Using the code

Now we need to use our helper methods to clean up the expressions. So far they have been written directly as lambda expressions in the arguments of Where() and Select(), where we cannot manipulate them. We need to move the lambda declarations into variables and then use my helper methods to clean up the variable values:

static void WriteAll<T>(IQueryable<T> table, Expression<Func<T, string>> nameGetter)
	where T : IMyEntity
{
	Expression<Func<T, bool>> @where = item => !item.Deleted;
	Expression<Func<T, IdWithName>> @select = 
		item => new IdWithName 
		{ 
			Id = item.Id, 
			Name = nameGetter.Compile()(item) 
		};

	var items = table
		.Where(where.GetRidOfInterfaceCast())
		.Select(select.GetRidOfInterfaceCast().GetRidOfCompile());

	foreach (var item in items)
	{
		Console.WriteLine("{0}: {1}", item.Id, item.Name);
	}
}

The attached ZIP contains some examples that show what actually happens to the expression trees during the clean-up process.

Points of Interest

I developed these two methods two or three years ago, and since then I have never needed anything else for expression manipulation. I consider this solution very robust, because:

  • You don't have to use any low-level untyped Expression manipulation, only high-level strongly-typed Expression<Func<>>.
  • Your code does not require any type cast (which is potentially unsafe).
  • All code references are preserved and type-safe.
  • The solution is universal for all expression types, regardless of how many parameters they have and of which types.

History