Update: Get source code for this project on CodePlex! Note the DLR is still being developed and hence a “moving target”. Some (or many) of the items discussed here may change
Part 1 of this post gave a quick overview of writing a custom “parser generator” for Excel formulas. Here we’ll take a look at a way to bind that to the DLR. You can try out a sample Silverlight 2.0 application here or by clicking the image below:
The goal here is to create a calculation engine that understands Excel-like formulas, allowing for cell dependencies, simple ranges and custom functions. In the screenshot above, a few identifiers (D5 – D7) have been assigned static values, then included in an expression later on.
Global math functions
Excel provides many global math functions. I implemented a small subset of these as static methods like so:
/// <summary> /// Excel global function implementations /// </summary> /// <remarks>Include an optional CodeContext as the first parameter if required.</remarks> public class ExcelHelpers { public static double Exp( double d ) { return Math.Exp( d ); } public static decimal Abs( decimal d ) { return Math.Abs( d ); } public static double Pow( decimal x, decimal y ) { return Math.Pow( (double)x, (double)y ); } public static DateTime Today() { return DateTime.Today; } public static decimal Min( params decimal[] p ) { return p.Min(); } public static decimal Max( params decimal[] p ) { return p.Max(); } public static decimal Round( decimal x, int decimals ) { return Math.Round( x, decimals ); } public static double RoundDown( decimal number, int digits ) { var x = Math.Pow( 10, digits ); // 1, 10, 100, 1000, 10000 var y = (double)number * x; return Math.Round( Math.Floor( y ) / x, digits < 0 ? 0 : digits ); } public static double RoundUp( decimal number, int digits ) { var x = Math.Pow( 10, digits ); // 1, 10, 100, 1000, 10000 var y = (double)number * x; bool isExact = Math.Floor( y ) == y; return Math.Round( Math.Floor( y + ( isExact ? 0 : 1 ) ) / x, digits < 0 ? 0 : digits ); } public static bool And( params bool[] conditions ) { return conditions.All( i => i ); } public static bool Or( params bool[] conditions ) { return conditions.Any( i => i ); } public static string Left( string text, int count ) { return !string.IsNullOrEmpty( text ) ? text.Substring( 0, count ) : ""; } public static string Right( string text, int count ) { return !string.IsNullOrEmpty( text ) ? text.Substring( text.Length - count ) : ""; } public static string Mid( string text, int start, int characters ) { if ( string.IsNullOrEmpty( text ) ) return ""; var length = text.Length; var startPos = start < 0 ? 0 : start - 1; //Javascript is 0 based index, excel is 1 based. var returnLength = length - characters; return ( characters <= 0 || startPos > length || returnLength < 0 ) // Invalid bound, return blank string ? "" : text.Substring( startPos, characters ); } public static object Sum( params decimal[] values ) { return values.Sum(); } public static object Search( string find, string withIn ) { return Search( find, withIn, 1 ); } public static object Search( string find, string withIn, int start ) { if ( string.IsNullOrEmpty( find ) ) return null; string search = withIn.Replace( "~*", "*" ).Replace( "~?", "?" ); var pos = search.IndexOf( find, start - 1, StringComparison.InvariantCultureIgnoreCase ); return pos > -1 ? (int?)( pos + 1 ) : null; } public static string Dump( CodeContext context ) { return Dump( context, false ); } public static string Dump( CodeContext context, bool verbose ) { var scope = context.Scope as WorksheetScope; var result = new System.Text.StringBuilder(); foreach ( var cell in scope.Worksheet.Cells ) { if ( result.Length > 0 ) result.AppendLine(); var expression = verbose ? cell.Value.Formula.Expression.DebugView : cell.Value.Formula.ExpressionText; result.AppendFormat( "{0} = {1}{2}", cell.Key, cell.Value.Value, expression != null ? " [ = " + expression + " ]" : null ); } return result.ToString(); }
These are all pretty trivial, but notice the Sum method where we’re able to take an array of decimals and use LINQ to sum the values together (go LINQ!
).
You can add any method overrides you like – later we’ll hand them to the DLR and let it figure out the best one to call. If you try the calculator demo above, you should be able to try these out (Dump() and Dump(true) can be quite revealing
).
Building the expression tree
In Part 1 I described using TinyPG to create a parser. Once you’ve defined your grammar (and fixed any errors) it generates some C# partial classes. The only one you really need to think about is ParseTree.cs. Here’s my customized part of the class (remember it’s a partial class, so no need to touch the TinyPG generated file):
public partial class ParseTree { private Dictionary<Operators, ExcelOperationBinder> opBinders = new Dictionary<Operators, ExcelOperationBinder>(); private ExcelContext context; private Dictionary<SymbolId, ParameterExpression> variables = new Dictionary<SymbolId, ParameterExpression>(); public Expression GetExpression( ExcelContext context ) { this.context = context; return GetNodeExpression( this ); } private Expression GetNodeExpression( ParseNode node ) { switch ( node.Token.Type ) { case TokenType.Start: return GetStartExpression( node ); case TokenType.Assignment: return GetAssignmentExpression( node ); case TokenType.CompareExpr: return GetComparisonExpression( node ); case TokenType.AddExpr: return GetAddExpression( node ); case TokenType.MultExpr: return GetMultiplyExpression( node ); case TokenType.Atom: return GetAtomExpression( node ); case TokenType.Range: return GetRangeExpression( node ); case TokenType.TRUE: return Expression.Constant( true ); case TokenType.FALSE: return Expression.Constant( false ); case TokenType.NULL: return Expression.Constant( null ); case TokenType.STRING: return GetString( node ); case TokenType.INTEGER: { int iVal; var value = node.Token.Text; return int.TryParse( value, out iVal ) ? Expression.Constant( iVal ) : Expression.Constant( long.Parse( value ) ); } case TokenType.NUMBER: return Expression.Constant( decimal.Parse( node.Token.Text ) ); case TokenType.Identifier: return GetVariableExpression( node ); case TokenType.Method: return GetMethodExpression( node ); } return Expression.Constant( node.Token.Text ); } private IEnumerable<Expression> GetNodeExpressions( ParseNode node ) { var expression = GetNodeExpression( node ); if ( expression.Type != typeof( Range ) ) return new[] { expression }; var range = (Range)( (ConstantExpression)expression ).Value; return from s in range.GetSymbols() select GetVariableExpression( s ); } private Expression GetStartExpression( ParseNode node ) { return GetNodeExpression( node.Nodes[ 0 ] ); } private Expression GetAssignmentExpression( ParseNode node ) { return Expression.Assign( GetVariableExpression( node.Nodes[ 0 ].Token.Text ), Utils.Convert( GetNodeExpression( node.Nodes[ 2 ] ), typeof( object ) ) ); } private Expression GetComparisonExpression( ParseNode node ) { ExcelOperationBinder oper = null; Expression expr = null; foreach ( var n in node.Nodes ) { if ( n.Token.Type == TokenType.COMPARE ) { switch ( n.Token.Text ) { case "=": case "==": oper = OpBinder( Operators.Equals ); break; case ">": oper = OpBinder( Operators.GreaterThan ); break; case ">=": oper = OpBinder( Operators.GreaterThanOrEqual ); break; case "<": oper = OpBinder( Operators.LessThan ); break; case "<=": oper = OpBinder( Operators.LessThanOrEqual ); break; case "!=": case "<>": oper = OpBinder( Operators.NotEquals ); break; } } else { var next = GetNodeExpression( n ); expr = ( expr == null ) ? next : Expression.Dynamic( oper, typeof( object ), expr, next ); } } return expr; } private Expression GetAddExpression( ParseNode node ) { ExcelOperationBinder oper = null; Expression expr = null; foreach ( var n in node.Nodes ) { if ( n.Token.Type == TokenType.PLUSMINUS ) { switch ( n.Token.Text ) { case "+": oper = OpBinder( Operators.Add ); break; case "-": oper = OpBinder( Operators.Subtract ); break; case "&": oper = OpBinder( Operators.Concatenate ); break; } } else { var next = GetNodeExpression( n ); expr = ( expr == null || oper == null ) ? next : Expression.Dynamic( oper, typeof( object ), expr, next ); } } return expr; } private Expression GetMultiplyExpression( ParseNode node ) { ExcelOperationBinder oper = null; Expression expr = null; foreach ( var n in node.Nodes ) { if ( n.Token.Type == TokenType.MULTDIV ) { switch ( n.Token.Text ) { case "*": oper = OpBinder( Operators.Multiply ); break; case "/": oper = OpBinder( Operators.Divide ); break; } } else { var next = GetNodeExpression( n ); expr = ( expr == null || oper == null ) ? next : Expression.Dynamic( oper, typeof( object ), expr, next ); } } return expr; } private Expression GetAtomExpression( ParseNode node ) { foreach ( var n in node.Nodes ) { if ( n.Token.Type != TokenType.BROPEN && n.Token.Type != TokenType.BRCLOSE ) return GetNodeExpression( n ); } return null; } private Expression GetVariableExpression( ParseNode node ) { return GetVariableExpression( node.Nodes.First().Token.Text ); } private Expression GetVariableExpression( string id ) { return GetVariableExpression( SymbolTable.StringToCaseInsensitiveId( id ) ); } private Expression GetVariableExpression( SymbolId symbol ) { return variables.ContainsKey( symbol ) ? variables[ symbol ] : variables[ symbol ] = Expression.Variable( typeof( object ), symbol.ToString() ); } private Expression GetRangeExpression( ParseNode node ) { var cellIds = ( from n in node.Nodes where n.Token.Type == TokenType.CELLID select new CellId( n.Token.Text ) ).ToArray(); return ( cellIds.Length > 1 ) ? Expression.Constant( new Range( cellIds ) ) : GetVariableExpression( cellIds.First().Symbol ); } private Expression GetString( ParseNode node ) { var text = node.Token.Text; return Expression.Constant( text.Substring( 1, text.Length - 2 ) ); } private Expression GetMethodExpression( ParseNode node ) { var parameters = ( from n in node.Nodes where n.Token.Type == TokenType.Params from p in n.Nodes where p.Token.Type != TokenType.COMMA from e in GetNodeExpressions( p ) select e ).ToArray(); var name = node.Nodes.First().Token.Text; if ( name.ToLower() == "iif" ) return GetConditionExpression( parameters ); return Expression.Dynamic( context.CreateCallBinder( name, true, new CallInfo( parameters.Length ) ), typeof( object ), new[] { Utils.CodeContext() }.Union( parameters ).ToArray() ); } private Expression GetConditionExpression( Expression[] args ) { return Expression.Condition( Utils.Convert( args[ 0 ], typeof( bool ) ), Utils.Convert( args[ 1 ], typeof( object ) ), Utils.Convert( args[ 2 ], typeof( object ) ) ); } private ExcelOperationBinder OpBinder( Operators oper ) { return opBinders.ContainsKey( oper ) ? opBinders[ oper ] : opBinders[ oper ] = new ExcelOperationBinder( context.Binder as ExcelBinder, Utils.CodeContext(), oper ); } }
All it does is recursively walk the parse tree and emit a DLR expression tree (Abstract Syntax Tree). It has some basic handling for binary operators (including Excel / VB’s ‘&’ concatenation operator). For some operations (such as binary operators and method calls) it creates DLR DynamicExpression nodes, which hold “just enough” information to define the operation to be performed.
Binders
Before the DLR can make any sense of your DynamicExpressions you need to give it some guidance about how to perform your operations. To do this you create custom “language binders”, whose job is to provide the DLR with rules defining how to perform operations on various data types. For this project I created the following classes:
- ExcelContext – Derives from LanguageContext and provides some compilation helper methods and overrides to create related binders.
- ExcelBinder – Defines standard operations for various data types and which conversions are valid.
- ExcelCallAction – Maps “method call” actions to static methods on a helper class (see ExcelHelpers).
- ExcelConversionBinder – Allows conversions to be overridden; currently just a placeholder.
- ExcelOperationBinder – Defines how binary operations (add, multiply, comparison operators etc) should behave, performing some basic type coercion.
Operations
You can control the implicit type coercions the DLR is allowed to perform by defining “extension types”. For each data type, you can create a static class with helper methods to perform the low-level operations. These methods are marked with the [SpecialName] and [ImplicitConversionMethod] attributes. You can see some examples here (for this project I wanted to treat all non-integer numbers as decimals).
Other interesting bits
I defined a few classes to represent formulas, engines, cells and worksheets. The idea is you create an engine and populate it with formulas, then use that engine to create one or more worksheets. Everything is created / compiled on demand and invalidated appropriately when dependencies change:
- Cell – Represents an instance of a Formula in a Worksheet.
- Worksheet – Contains all Cells and an Engine reference.
- Formula – Contains an ID, expression text, default value and cached expression tree.
- Engine – Contains all Formulas.
- Invoker – Caches dynamic delegates (see related post here).
Try it!
You can find the source for this project on CodePlex and you can try the Silverlight calculator test here
Don’t forget to look at the retirement planner example I posted too:
It’s built on an extended version of this project, able to load an Excel-like worksheet (including lookup tables and some scary long expressions). As you drag the “retirement age” slider it’s recalculating several hundred table values (pretty quickly thanks to the DLR’s call site caching) and kicking off a tween-like animation (sometimes it’ll slow down as the axis scales etc; that’s just an animation issue rather than a DLR / engine problem
).




24 comments
Comments feed for this article
March 23, 2009 at 1:34 am
Excel Formulas and the DLR – Part 1 « Chris Cavanagh’s Blog
[...] February 19, 2009 · 3 Comments Update: Part 2 now available here! [...]
May 2, 2009 at 11:37 pm
Jorge
Cool Tool. How managed accidental reference circular and recalculation order?
From SimpleCalcTest , I do:
B1=2
B2=B1+2
(dump() -> B1 say 2 , B2 say 4)
B1 = 4
(dump() -> B1 say 4 , B2 say 4 ????)
thanks.
Jorge
May 3, 2009 at 12:04 am
Chris Cavanagh
Jorge – Interesting! It seems I may have broken something
I’ll let you know when it’s fixed (or I find an explanation).
May 3, 2009 at 12:14 am
Chris Cavanagh
Jorge – Actually it’s deliberate, but clearly not ideal
. If you try the same thing in code you’ll find B2′s value updates as expected. The “inline assignment” currently implemented just takes the result and (in this case) puts it in B2 (effectively saying B2=2+2). I’ll post back here when I’ve improved it.
May 3, 2009 at 12:44 am
Chris Cavanagh
Jorge – Fixed bug and updated source
Thanks for the feedback!
May 4, 2009 at 6:46 am
Jorge
Hi Chris,
Great! Thanks!
In your roadmap considered
* Matrix/Array formulas (‘spread’ results with helper function like CELLRESULT(cellreference,coordX,coordY )
Two TIPS for your:
(1) NATURAL ORDER RECALCULATION
Please considered, the use natural order recalculation, which is a method of ordering the computations such that each cell’s value is computed only after the values for all cells on which it depends have been computed. Natural order recalculation guarantees that cells are always computed correctly, regardless of the order in which the cell formulas were entered.
(2) PROJECT LIKE BUT INTERPRETED
This project, handle and resolve problem related of excel-like calculation engines, talk in special of natural order recalculation and other issues…
http://www.codeproject.com/KB/vb/FormulaEngine.aspx
Thanks,
Jorge
May 15, 2009 at 10:37 pm
Chris Cavanagh
Jorge – Thanks for the tips! It’s already implementing “natural order” recalculation… or something very like it I hope
Basically when the expression is compiled for each cell, a list of dependencies is created (references to other cells). It keeps track of runtime dependencies too, but the sample project doesn’t demonstrate that. Each cell builds its expression delegate and refreshes its calculated value on demand, caching the result. If a dependency changes, the cell is marked dirty and will be recalculated next time its value is requested. As all values are calculated (or retrieved from cache) on demand, natural order is ensured.
Please let me know if I misunderstood your comments. Thanks for the feedback!
July 3, 2009 at 10:05 am
sami
How we can use logical expression, example:
if(, value1, value2)??
Thanks
July 3, 2009 at 11:31 am
Chris Cavanagh
sami – The ‘ExcelHelpers’ class contains implementations of some Excel global functions, but IF is a special case. The engine supports Excel’s IF function with a ConditionalExpression (so only the matching result is evaluated).
July 6, 2009 at 7:03 am
Herre Kuijpers
Hi Chris,
It’s nice to see the TinyPG is finally put to some good use
Interestingly enough I was working on creating an excel formula parser myself initially when I decided I needed a compiler compiler. Hence the TinyPG project was borne
My idea was to have an excel parser that could not only parse just excel formulas, but parse actual excel sheets! This could then be used as a very easy way to configure a rule engine.
It is basically the same idea as what you did for the retirement planner.
Is it ok with you if I link from the TinyPG project to this project as a reference for use? I think the project shows nicely how TinyPG can be put to use… and that it actually works
July 6, 2009 at 8:00 am
Chris Cavanagh
Herre – Thanks for the feedback; I’d welcome a link from TinyPG
It saved me a lot of time and works great!
July 6, 2009 at 10:07 am
sami
Thank you Chris for the answer.
But is it possible to modify ‘ExcelHelpers’ class to have an IF function.
For example, if I go to this link (http://www.chriscavanagh.com/chris/SimpleCalcTest/Cjc.ExpressionEngine.SimpleCalcTestTestPage.aspx), I could type:
a=3 ENTER
IF(a=3,a+2,a+1) ENTER
and result “5″ will be displayed.
My objective is to provide user the possibility in the application to enter directly formulas with conditional expressions.
Thank you.
July 6, 2009 at 10:38 am
Chris Cavanagh
sami – You’ve found a bug
The code is currently looking for “IIF” instead of “IF”. However, even if you (temporarily) use IIF, it’s throwing an argument exception.
I’ll get the project updated to the latest DLR version and post here when the bug is fixed
July 7, 2009 at 7:45 am
sami
Great !!
Thank you
November 13, 2009 at 4:15 pm
Pierre
I when to the download page and got the “dlrexpressionengine-16143.zip” version.
1) I do not know where to get the Microsoft.Scripting.dll and the Microsoft.scripting.Core.dll it ask for.
2) The silverlight directory is empty so I cannot run the example.
3) The silverlight project is calling the System.data since it tries to use the LINK but that dll is not available in Silverlight.
4) None of the code compile wint VS8 SP1 on Vista64.
What am I missing?
November 13, 2009 at 4:16 pm
Pierre
Sorry, not LINK but LINQ
November 13, 2009 at 8:46 pm
Chris Cavanagh
Pierre – The Microsoft.Scripting assemblies are/were part of the Dynamic Language Runtime (http://dlr.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=34834). Unfortunately my project was built against an older version of the DLR and I neglected to keep it up to date. It’s likely I’ll update it based on the DLR in .NET 4.0 if there’s enough interest. Let me know if that might work for you…
November 13, 2009 at 8:50 pm
Chris Cavanagh
Pierre – Also, the Silverlight folder you mentioned is just there for the build output. The source is shared between the Silverlight and .NET projects – it could be I accidentally left a reference/dependency in there; I’ll take a closer look. LINQ itself is fully supported by Silverlight though, so that shouldn’t be an issue.
February 15, 2010 at 3:37 am
Markus
Hi, are you going to update the project to .NET 4.0?
February 15, 2010 at 8:18 am
Chris Cavanagh
Markus – It’s likely I’ll update it when .NET 4 releases if there’s some interest
May 21, 2010 at 12:57 am
Hannes
Hello Chris,
Is this source compatible with dot.net 4.0 ? I have some problem getting started…..
June 8, 2010 at 7:57 pm
Jonathan
hi,Chris,
i have download this project source code,and i have add the DLR Runtime dll,but this project can’t be successful build.what should i do ?
please help me !thanks!
June 8, 2010 at 8:41 pm
Chris Cavanagh
Jonathan – Sorry for the confusion! The DLR is now part of .NET itself (since .NET 4.0) and this project needs some updating. However if you’re using an older version of .NET, send me a sample of the errors you’re getting and I’ll take a closer look
June 8, 2010 at 9:06 pm
Jonathan
thanks for your reply!
the version I use is .Net 4.0(Silverlight 4),how to update this project?