Update: Get source code for this project on CodePlex! Note the DLR is still being developed and hence a “moving target”. Some (or many) of the items discussed here may change
Part 1 of this post gave a quick overview of writing a custom “parser generator” for Excel formulas. Here we’ll take a look at a way to bind that to the DLR. You can try out a sample Silverlight 2.0 application here or by clicking the image below:
The goal here is to create a calculation engine that understands Excel-like formulas, allowing for cell dependencies, simple ranges and custom functions. In the screenshot above, a few identifiers (D5 – D7) have been assigned static values, then included in an expression later on.
Global math functions
Excel provides many global math functions. I implemented a small subset of these as static methods like so:
/// <summary> /// Excel global function implementations /// </summary> /// <remarks>Include an optional CodeContext as the first parameter if required.</remarks> public class ExcelHelpers { public static double Exp( double d ) { return Math.Exp( d ); } public static decimal Abs( decimal d ) { return Math.Abs( d ); } public static double Pow( decimal x, decimal y ) { return Math.Pow( (double)x, (double)y ); } public static DateTime Today() { return DateTime.Today; } public static decimal Min( params decimal[] p ) { return p.Min(); } public static decimal Max( params decimal[] p ) { return p.Max(); } public static decimal Round( decimal x, int decimals ) { return Math.Round( x, decimals ); } public static double RoundDown( decimal number, int digits ) { var x = Math.Pow( 10, digits ); // 1, 10, 100, 1000, 10000 var y = (double)number * x; return Math.Round( Math.Floor( y ) / x, digits < 0 ? 0 : digits ); } public static double RoundUp( decimal number, int digits ) { var x = Math.Pow( 10, digits ); // 1, 10, 100, 1000, 10000 var y = (double)number * x; bool isExact = Math.Floor( y ) == y; return Math.Round( Math.Floor( y + ( isExact ? 0 : 1 ) ) / x, digits < 0 ? 0 : digits ); } public static bool And( params bool[] conditions ) { return conditions.All( i => i ); } public static bool Or( params bool[] conditions ) { return conditions.Any( i => i ); } public static string Left( string text, int count ) { return !string.IsNullOrEmpty( text ) ? text.Substring( 0, count ) : ""; } public static string Right( string text, int count ) { return !string.IsNullOrEmpty( text ) ? text.Substring( text.Length - count ) : ""; } public static string Mid( string text, int start, int characters ) { if ( string.IsNullOrEmpty( text ) ) return ""; var length = text.Length; var startPos = start < 0 ? 0 : start - 1; //Javascript is 0 based index, excel is 1 based. var returnLength = length - characters; return ( characters <= 0 || startPos > length || returnLength < 0 ) // Invalid bound, return blank string ? "" : text.Substring( startPos, characters ); } public static object Sum( params decimal[] values ) { return values.Sum(); } public static object Search( string find, string withIn ) { return Search( find, withIn, 1 ); } public static object Search( string find, string withIn, int start ) { if ( string.IsNullOrEmpty( find ) ) return null; string search = withIn.Replace( "~*", "*" ).Replace( "~?", "?" ); var pos = search.IndexOf( find, start - 1, StringComparison.InvariantCultureIgnoreCase ); return pos > -1 ? (int?)( pos + 1 ) : null; } public static string Dump( CodeContext context ) { return Dump( context, false ); } public static string Dump( CodeContext context, bool verbose ) { var scope = context.Scope as WorksheetScope; var result = new System.Text.StringBuilder(); foreach ( var cell in scope.Worksheet.Cells ) { if ( result.Length > 0 ) result.AppendLine(); var expression = verbose ? cell.Value.Formula.Expression.DebugView : cell.Value.Formula.ExpressionText; result.AppendFormat( "{0} = {1}{2}", cell.Key, cell.Value.Value, expression != null ? " [ = " + expression + " ]" : null ); } return result.ToString(); }
These are all pretty trivial, but notice the Sum method where we’re able to take an array of decimals and use LINQ to sum the values together (go LINQ!
).
You can add any method overrides you like – later we’ll hand them to the DLR and let it figure out the best one to call. If you try the calculator demo above, you should be able to try these out (Dump() and Dump(true) can be quite revealing
).
Building the expression tree
In Part 1 I described using TinyPG to create a parser. Once you’ve defined your grammar (and fixed any errors) it generates some C# partial classes. The only one you really need to think about is ParseTree.cs. Here’s my customized part of the class (remember it’s a partial class, so no need to touch the TinyPG generated file):
public partial class ParseTree { private Dictionary<Operators, ExcelOperationBinder> opBinders = new Dictionary<Operators, ExcelOperationBinder>(); private ExcelContext context; private Dictionary<SymbolId, ParameterExpression> variables = new Dictionary<SymbolId, ParameterExpression>(); public Expression GetExpression( ExcelContext context ) { this.context = context; return GetNodeExpression( this ); } private Expression GetNodeExpression( ParseNode node ) { switch ( node.Token.Type ) { case TokenType.Start: return GetStartExpression( node ); case TokenType.Assignment: return GetAssignmentExpression( node ); case TokenType.CompareExpr: return GetComparisonExpression( node ); case TokenType.AddExpr: return GetAddExpression( node ); case TokenType.MultExpr: return GetMultiplyExpression( node ); case TokenType.Atom: return GetAtomExpression( node ); case TokenType.Range: return GetRangeExpression( node ); case TokenType.TRUE: return Expression.Constant( true ); case TokenType.FALSE: return Expression.Constant( false ); case TokenType.NULL: return Expression.Constant( null ); case TokenType.STRING: return GetString( node ); case TokenType.INTEGER: { int iVal; var value = node.Token.Text; return int.TryParse( value, out iVal ) ? Expression.Constant( iVal ) : Expression.Constant( long.Parse( value ) ); } case TokenType.NUMBER: return Expression.Constant( decimal.Parse( node.Token.Text ) ); case TokenType.Identifier: return GetVariableExpression( node ); case TokenType.Method: return GetMethodExpression( node ); } return Expression.Constant( node.Token.Text ); } private IEnumerable<Expression> GetNodeExpressions( ParseNode node ) { var expression = GetNodeExpression( node ); if ( expression.Type != typeof( Range ) ) return new[] { expression }; var range = (Range)( (ConstantExpression)expression ).Value; return from s in range.GetSymbols() select GetVariableExpression( s ); } private Expression GetStartExpression( ParseNode node ) { return GetNodeExpression( node.Nodes[ 0 ] ); } private Expression GetAssignmentExpression( ParseNode node ) { return Expression.Assign( GetVariableExpression( node.Nodes[ 0 ].Token.Text ), Utils.Convert( GetNodeExpression( node.Nodes[ 2 ] ), typeof( object ) ) ); } private Expression GetComparisonExpression( ParseNode node ) { ExcelOperationBinder oper = null; Expression expr = null; foreach ( var n in node.Nodes ) { if ( n.Token.Type == TokenType.COMPARE ) { switch ( n.Token.Text ) { case "=": case "==": oper = OpBinder( Operators.Equals ); break; case ">": oper = OpBinder( Operators.GreaterThan ); break; case ">=": oper = OpBinder( Operators.GreaterThanOrEqual ); break; case "<": oper = OpBinder( Operators.LessThan ); break; case "<=": oper = OpBinder( Operators.LessThanOrEqual ); break; case "!=": case "<>": oper = OpBinder( Operators.NotEquals ); break; } } else { var next = GetNodeExpression( n ); expr = ( expr == null ) ? next : Expression.Dynamic( oper, typeof( object ), expr, next ); } } return expr; } private Expression GetAddExpression( ParseNode node ) { ExcelOperationBinder oper = null; Expression expr = null; foreach ( var n in node.Nodes ) { if ( n.Token.Type == TokenType.PLUSMINUS ) { switch ( n.Token.Text ) { case "+": oper = OpBinder( Operators.Add ); break; case "-": oper = OpBinder( Operators.Subtract ); break; case "&": oper = OpBinder( Operators.Concatenate ); break; } } else { var next = GetNodeExpression( n ); expr = ( expr == null || oper == null ) ? next : Expression.Dynamic( oper, typeof( object ), expr, next ); } } return expr; } private Expression GetMultiplyExpression( ParseNode node ) { ExcelOperationBinder oper = null; Expression expr = null; foreach ( var n in node.Nodes ) { if ( n.Token.Type == TokenType.MULTDIV ) { switch ( n.Token.Text ) { case "*": oper = OpBinder( Operators.Multiply ); break; case "/": oper = OpBinder( Operators.Divide ); break; } } else { var next = GetNodeExpression( n ); expr = ( expr == null || oper == null ) ? next : Expression.Dynamic( oper, typeof( object ), expr, next ); } } return expr; } private Expression GetAtomExpression( ParseNode node ) { foreach ( var n in node.Nodes ) { if ( n.Token.Type != TokenType.BROPEN && n.Token.Type != TokenType.BRCLOSE ) return GetNodeExpression( n ); } return null; } private Expression GetVariableExpression( ParseNode node ) { return GetVariableExpression( node.Nodes.First().Token.Text ); } private Expression GetVariableExpression( string id ) { return GetVariableExpression( SymbolTable.StringToCaseInsensitiveId( id ) ); } private Expression GetVariableExpression( SymbolId symbol ) { return variables.ContainsKey( symbol ) ? variables[ symbol ] : variables[ symbol ] = Expression.Variable( typeof( object ), symbol.ToString() ); } private Expression GetRangeExpression( ParseNode node ) { var cellIds = ( from n in node.Nodes where n.Token.Type == TokenType.CELLID select new CellId( n.Token.Text ) ).ToArray(); return ( cellIds.Length > 1 ) ? Expression.Constant( new Range( cellIds ) ) : GetVariableExpression( cellIds.First().Symbol ); } private Expression GetString( ParseNode node ) { var text = node.Token.Text; return Expression.Constant( text.Substring( 1, text.Length - 2 ) ); } private Expression GetMethodExpression( ParseNode node ) { var parameters = ( from n in node.Nodes where n.Token.Type == TokenType.Params from p in n.Nodes where p.Token.Type != TokenType.COMMA from e in GetNodeExpressions( p ) select e ).ToArray(); var name = node.Nodes.First().Token.Text; if ( name.ToLower() == "iif" ) return GetConditionExpression( parameters ); return Expression.Dynamic( context.CreateCallBinder( name, true, new CallInfo( parameters.Length ) ), typeof( object ), new[] { Utils.CodeContext() }.Union( parameters ).ToArray() ); } private Expression GetConditionExpression( Expression[] args ) { return Expression.Condition( Utils.Convert( args[ 0 ], typeof( bool ) ), Utils.Convert( args[ 1 ], typeof( object ) ), Utils.Convert( args[ 2 ], typeof( object ) ) ); } private ExcelOperationBinder OpBinder( Operators oper ) { return opBinders.ContainsKey( oper ) ? opBinders[ oper ] : opBinders[ oper ] = new ExcelOperationBinder( context.Binder as ExcelBinder, Utils.CodeContext(), oper ); } }
All it does is recursively walk the parse tree and emit a DLR expression tree (Abstract Syntax Tree). It has some basic handling for binary operators (including Excel / VB’s ‘&’ concatenation operator). For some operations (such as binary operators and method calls) it creates DLR DynamicExpression nodes, which hold “just enough” information to define the operation to be performed.
Binders
Before the DLR can make any sense of your DynamicExpressions you need to give it some guidance about how to perform your operations. To do this you create custom “language binders”, whose job is to provide the DLR with rules defining how to perform operations on various data types. For this project I created the following classes:
- ExcelContext – Derives from LanguageContext and provides some compilation helper methods and overrides to create related binders.
- ExcelBinder – Defines standard operations for various data types and which conversions are valid.
- ExcelCallAction – Maps “method call” actions to static methods on a helper class (see ExcelHelpers).
- ExcelConversionBinder – Allows conversions to be overridden; currently just a placeholder.
- ExcelOperationBinder – Defines how binary operations (add, multiply, comparison operators etc) should behave, performing some basic type coercion.
Operations
You can control the implicit type coercions the DLR is allowed to perform by defining “extension types”. For each data type, you can create a static class with helper methods to perform the low-level operations. These methods are marked with the [SpecialName] and [ImplicitConversionMethod] attributes. You can see some examples here (for this project I wanted to treat all non-integer numbers as decimals).
Other interesting bits
I defined a few classes to represent formulas, engines, cells and worksheets. The idea is you create an engine and populate it with formulas, then use that engine to create one or more worksheets. Everything is created / compiled on demand and invalidated appropriately when dependencies change:
- Cell – Represents an instance of a Formula in a Worksheet.
- Worksheet – Contains all Cells and an Engine reference.
- Formula – Contains an ID, expression text, default value and cached expression tree.
- Engine – Contains all Formulas.
- Invoker – Caches dynamic delegates (see related post here).
Try it!
You can find the source for this project on CodePlex and you can try the Silverlight calculator test here
Don’t forget to look at the retirement planner example I posted too:
It’s built on an extended version of this project, able to load an Excel-like worksheet (including lookup tables and some scary long expressions). As you drag the “retirement age” slider it’s recalculating several hundred table values (pretty quickly thanks to the DLR’s call site caching) and kicking off a tween-like animation (sometimes it’ll slow down as the axis scales etc; that’s just an animation issue rather than a DLR / engine problem
).


14 responses so far ↓
Excel Formulas and the DLR – Part 1 « Chris Cavanagh’s Blog // March 23, 2009 at 1:34 am |
[...] February 19, 2009 · 3 Comments Update: Part 2 now available here! [...]
Jorge // May 2, 2009 at 11:37 pm |
Cool Tool. How managed accidental reference circular and recalculation order?
From SimpleCalcTest , I do:
B1=2
B2=B1+2
(dump() -> B1 say 2 , B2 say 4)
B1 = 4
(dump() -> B1 say 4 , B2 say 4 ????)
thanks.
Jorge
Chris Cavanagh // May 3, 2009 at 12:04 am |
Jorge – Interesting! It seems I may have broken something
I’ll let you know when it’s fixed (or I find an explanation).
Chris Cavanagh // May 3, 2009 at 12:14 am |
Jorge – Actually it’s deliberate, but clearly not ideal
. If you try the same thing in code you’ll find B2’s value updates as expected. The “inline assignment” currently implemented just takes the result and (in this case) puts it in B2 (effectively saying B2=2+2). I’ll post back here when I’ve improved it.
Chris Cavanagh // May 3, 2009 at 12:44 am |
Jorge – Fixed bug and updated source
Thanks for the feedback!
Jorge // May 4, 2009 at 6:46 am |
Hi Chris,
Great! Thanks!
In your roadmap considered
* Matrix/Array formulas (’spread’ results with helper function like CELLRESULT(cellreference,coordX,coordY )
Two TIPS for your:
(1) NATURAL ORDER RECALCULATION
Please considered, the use natural order recalculation, which is a method of ordering the computations such that each cell’s value is computed only after the values for all cells on which it depends have been computed. Natural order recalculation guarantees that cells are always computed correctly, regardless of the order in which the cell formulas were entered.
(2) PROJECT LIKE BUT INTERPRETED
This project, handle and resolve problem related of excel-like calculation engines, talk in special of natural order recalculation and other issues…
http://www.codeproject.com/KB/vb/FormulaEngine.aspx
Thanks,
Jorge
Chris Cavanagh // May 15, 2009 at 10:37 pm |
Jorge – Thanks for the tips! It’s already implementing “natural order” recalculation… or something very like it I hope
Basically when the expression is compiled for each cell, a list of dependencies is created (references to other cells). It keeps track of runtime dependencies too, but the sample project doesn’t demonstrate that. Each cell builds its expression delegate and refreshes its calculated value on demand, caching the result. If a dependency changes, the cell is marked dirty and will be recalculated next time its value is requested. As all values are calculated (or retrieved from cache) on demand, natural order is ensured.
Please let me know if I misunderstood your comments. Thanks for the feedback!
sami // July 3, 2009 at 10:05 am |
How we can use logical expression, example:
if(, value1, value2)??
Thanks
Chris Cavanagh // July 3, 2009 at 11:31 am |
sami – The ‘ExcelHelpers’ class contains implementations of some Excel global functions, but IF is a special case. The engine supports Excel’s IF function with a ConditionalExpression (so only the matching result is evaluated).
Herre Kuijpers // July 6, 2009 at 7:03 am |
Hi Chris,
It’s nice to see the TinyPG is finally put to some good use
Interestingly enough I was working on creating an excel formula parser myself initially when I decided I needed a compiler compiler. Hence the TinyPG project was borne
My idea was to have an excel parser that could not only parse just excel formulas, but parse actual excel sheets! This could then be used as a very easy way to configure a rule engine.
It is basically the same idea as what you did for the retirement planner.
Is it ok with you if I link from the TinyPG project to this project as a reference for use? I think the project shows nicely how TinyPG can be put to use… and that it actually works
Chris Cavanagh // July 6, 2009 at 8:00 am |
Herre – Thanks for the feedback; I’d welcome a link from TinyPG
It saved me a lot of time and works great!
sami // July 6, 2009 at 10:07 am |
Thank you Chris for the answer.
But is it possible to modify ‘ExcelHelpers’ class to have an IF function.
For example, if I go to this link (http://www.chriscavanagh.com/chris/SimpleCalcTest/Cjc.ExpressionEngine.SimpleCalcTestTestPage.aspx), I could type:
a=3 ENTER
IF(a=3,a+2,a+1) ENTER
and result “5″ will be displayed.
My objective is to provide user the possibility in the application to enter directly formulas with conditional expressions.
Thank you.
Chris Cavanagh // July 6, 2009 at 10:38 am |
sami – You’ve found a bug
The code is currently looking for “IIF” instead of “IF”. However, even if you (temporarily) use IIF, it’s throwing an argument exception.
I’ll get the project updated to the latest DLR version and post here when the bug is fixed
sami // July 7, 2009 at 7:45 am |
Great !!
Thank you