Excel Formulas and the DLR – Part 1

Update: Part 2 now available here!

For a recent project I needed an expression parser / compiler / evaluator that could handle Excel-like formulas, but be able to run cross-platform and achieve [near] native code performance. The Dynamic Language Runtime – which IronPython and IronRuby are built on – looked a pretty neat fit.

First, a little background. I already had something based on .NET 3.5’s statically-typed expression trees and a modified version of the Dynamic Expression API. Most of my changes to “Dynamic.cs” were related to handling differences between value types, and some Excel quirks. To keep things simple, it took a pretty heavy-handed approach to type conversion (largely run-time checks) and imposed some type-related restrictions on the expressions.

The DLR introduces “Expression Trees v2.0”, which are a superset of those from .NET 3.5 and Silverlight 2. Some interesting things it adds are:

DynamicExpression – Describes an operation to resolve at run time, but uses the DLR’s fast dynamic call site caching.
Control flow – In addition to simple expressions, many more language concepts can be represented.

IronPython and IronRuby are already a great fit for many use cases. For an excellent example, check out Pavan Podila’s ScriptConverter – it allows you to use Python expressions and statements in your XAML markup. The potential uses are huge.

However, for this project I wanted to support as much Excel syntax / oddities as possible and keep a minimal download footprint.

Getting Started – Parsing an expression

A little research revealed I’d probably need a “compiler compiler” or “parser generator”. I found an awesome utility called TinyPG (by Herre Kuijpers) on CodeProject. In Herre’s words:

This particular generator is an LL(1) recursive descent parser generator. This means that instead of generating a state machine out of a grammar like most compiler compilers, instead it will generate source code directly; basically generating a method for each non-terminal in the grammar.

So basically you can throw your grammar at it and get source code out. Nice :) Here’s some grammar I came up with:

//Tiny Parser Generator v1.2
//Copyright © Herre Kuijpers 2008-2010
<% @TinyPG Namespace="Cjc.ExpressionEngine.Excel.Compiler" %> 

PLUSMINUS    -> @"(\+|-|&)";
MULTDIV    -> @"\*|/";
BROPEN        -> @"\(";
BRCLOSE    -> @"\)";
SBROPEN    -> @"\{";
SBRCLOSE    -> @"\}";
COMMA        -> @",";
PERIOD        -> @"\.";
COLON        -> @"\:";
SHEETBEGIN    -> @"'(?=(''|[^'])*'!)";
SHEETEND    -> @"'!";
QUOTEBEGIN    -> @"@?\""(?=(\""\""|[^\""])*\"")";
QUOTEEND    -> @"\""";
QUOTED        -> @"(\""\""|[^\""])*";
SNGQUOTEBEGIN    -> @"'(?=[^']*'[^!]?)";
SNGQUOTEEND    -> @"'(?=[^!]?)";
SNGQUOTED    -> @"(''|[^'])*";
TRUE        -> @"[Tt][Rr][Uu][Ee]";
FALSE        -> @"[Ff][Aa][Ll][Ss][Ee]";
NULL        -> @"[Nn][Uu][Ll][Ll](\(\))?";
CELLID        -> @"\$?[a-zA-Z]+\$?[0-9]+";
CONSTRUCTOR    -> @"new\s[a-zA-Z_][a-zA-Z0-9_\.]*";
METHODNAME    -> @"[a-zA-Z_][a-zA-Z0-9_\.]*[a-zA-Z0-9_](?=\()";
IDENTIFIER    -> @"[a-zA-Z_][a-zA-Z0-9_\.]*[a-zA-Z0-9_]";
EVAL        -> @"\{Eval\s[a-zA-Z0-9_\-\.\s]*?\}";
DATA        -> @"\{Data\s[a-zA-Z0-9_\-\.\s]*?\}";
ASSIGNID    -> @"[a-zA-Z_][a-zA-Z0-9_\.]*(?=\s?=\s?)";
INTEGER    -> @"(\+|-)?[0-9]+";
NUMBER        -> @"(\+|-)?[0-9]*\.[0-9]+";
COMPARE    -> @"==|<=|>=|>|<|=|!=|<>";
EQUALS        -> @"=";
SIGN        -> @"\+|-";
INVALID    -> @"#VALUE!";
EOF        -> @"^$";

[Skip] WHITESPACE -> @"\s+";

Start        -> (Assignment|CompareExpr)? EOF;
Assignment    -> ASSIGNID EQUALS CompareExpr;
CompareExpr    -> AddExpr (COMPARE AddExpr)?;
AddExpr    -> MultExpr (PLUSMINUS MultExpr)*;
MultExpr    -> Sign (MULTDIV Sign)*;
Params        -> CompareExpr? (COMMA CompareExpr)*;
Constructor    -> CONSTRUCTOR (BROPEN Params? BRCLOSE | SBROPEN Params? SBRCLOSE);
Method        -> METHODNAME BROPEN Params? BRCLOSE;
String        -> QUOTEBEGIN QUOTED QUOTEEND | SNGQUOTEBEGIN SNGQUOTED SNGQUOTEEND;
Sheet        -> SHEETBEGIN SNGQUOTED SHEETEND;
Range        -> Sheet? ( CELLID (COLON CELLID)? );
Identifier    -> IDENTIFIER;
Eval        -> EVAL;
Data        -> DATA;
Sign        -> SIGN? (Method | Constructor | Range | Identifier | Eval | Data | Atom);
Atom        -> TRUE | FALSE | NULL | INTEGER | NUMBER | String | INVALID | BROPEN CompareExpr BRCLOSE;

Building this in TinyPG gives you three source files: Scanner.cs, Parser.cs and ParseTree.cs. Most of these have partial classes so it’s easy to extend them (TinyPG generates code from a template; you could easily change that if you want more control).

Building an Expression / Abstract Syntax Tree

To build a tree from an expression, simply create a Scanner, wrap it in a Parser and feed it your expression. We’ll extend the ParseTree [partial] class with code similar to this:

private Expression GetNodeExpression( ParseNode node )
{
    switch ( node.Token.Type )
    {
        case TokenType.Start:
            return GetStartExpression( node );

        case TokenType.Assignment:
            return GetAssignmentExpression( node );

        case TokenType.CompareExpr:
            return GetComparisonExpression( node );

        case TokenType.AddExpr:
            return GetAddExpression( node );

        case TokenType.MultExpr:
            return GetMultiplyExpression( node );

        case TokenType.Atom:
            return GetAtomExpression( node );

        case TokenType.Range:
            return GetRangeExpression( node );

        case TokenType.TRUE:
            return Expression.Constant( true );

        case TokenType.FALSE:
            return Expression.Constant( false );

        case TokenType.NULL:
            return Expression.Constant( null );

        case TokenType.STRING:
            return GetString( node );

        case TokenType.INTEGER:
            {
                int iVal;
                var value = node.Token.Text;

                return int.TryParse( value, out iVal )
                    ? Expression.Constant( iVal )
                    : Expression.Constant( long.Parse( value ) );
            }

        case TokenType.NUMBER:
            return Expression.Constant( decimal.Parse( node.Token.Text ) );

        case TokenType.Identifier:
            return GetVariableExpression( node );

        case TokenType.Method:
            return GetMethodExpression( node );
    }

    return Expression.Constant( node.Token.Text );
}

The helper methods (GetMultiplyExpression etc) are pretty simple; you can see the full source on CodePlex.

Not Using the DLR

So far we haven’t done anything specific to the DLR. You could go ahead and build a regular .NET 3.5 [statically-typed] expression tree from this and it’d work great. In fact, if you only needed to handle numeric values the implementation could be pretty trivial – just Expression.Convert all values to System.Decimal, wrap the tree in Expression.Lambda, call Compile and invoke the delegate. All too easy – but where’s the fun in that? 🙂

A Small Teaser

To get an idea how something like this might be used, you can check out a small teaser right here (or click the image below).

Continue to Part 2…

9 Comments

Dew Drop - February 20, 2009 | Alvin Ashcraft's Morning Dew February 20, 2009 at 8:56 am

[…] Custom Expressions and the DLR – Part 1 (Chris Cavanagh) […]

Parameterized DLR Invoker « Chris Cavanagh’s Blog March 6, 2009 at 4:39 pm

[…] examples (and hopefully answer the giant “why?” this Invoker invokes) with the next part of my Excel Formulas and the DLR post, but let me know if you want more info before […]

Excel Formulas and the DLR – Part 2 « Chris Cavanagh’s Blog March 23, 2009 at 1:17 am

[…] Part 1 of this post gave a quick overview of writing a custom “parser generator” for Excel formulas. Here we’ll take a look at a way to bind that to the DLR. You can try out a sample Silverlight 2.0 application here or by clicking the image below: […]

Walt Crosby September 28, 2010 at 4:20 pm

Chris — very interesting. I might have a potential customer for this capability. Did you ever finish the development, so that it would actually update and recalc?

LOLCode Parser « Chris Cavanagh's Blog March 19, 2012 at 10:08 pm

[…] LOLCode is being parsed into a LINQ expression tree (similar to this old post, but no DLR and much […]

gas June 5, 2013 at 7:45 am

I would love to have the last version of your excel grammar. It seems like it’s not the own generating the parser found in the source published. Do you still have it?

1. Chris Cavanagh June 5, 2013 at 8:06 am
  
  gas – I’ve updated the Excel grammar in the post… It’s the latest version I have (and is being actively used), but you may still need to make some tweaks. Hope this helps! 🙂
  
  1. gas June 5, 2013 at 8:22 am
    
    Uh great! Thanks. I was since then trying to reconstruct it from the Parser without success until yet. You saved my day (and more)
Armin June 21, 2014 at 5:32 pm

Hi Chris, is there also a .NET 4.0 Version of the source?

Chris Cavanagh's Blog

Happy coding