Tag Archives: Parent Child Hierarchy

PowerPivot Powered Budgets & Targets

Last week I used PowerPivot for the first time in a classic budgeting scenario. The existing, Excel based, system was straining at the seams due to recent changes in the organisation (merger, followed by lots of changes, resulting in the replacement of long-established reporting and budgeting hierarchies).

The budget process followed the, more or less, classic lines of using adjusted previous years figures to create high-level targets for coming years, agreeing those targets with various interests and then “driving” these high-level targets down various hierarchies to inform front-line staff what would be expected of them in the years to come.

PowerPivot did a marvellous job of providing the necessary figures required to set the initial budgets and to help inform those who must agree them. This involved lots of moving-annual-totals, percentage-increases and so on. A dream compared to the purely excel methods employed in the past.

The “driving down” logic was however kept outside PowerPivot, as the hierarchies involved were unbalanced in the main and the rules were complex but already existed and worked (and probably more importantly were understood and agreed by the various “interested parties”) in Excel. All that was required was the creation of the numerous “flattened cross-join tables” to support the existing logic. The various SQLite “hierarchy helpers” I detailed in my previous Handling Flat, Parent-Child and Nested Set Hierarchies post did most of the heavy lifting. As the process involved the “cross-join” of  hierarchies at various levels I used a great deal of “nested set” SQL to achieve the required result; simple enough, but did become tedious and made the resulting logic somewhat un-approachable for those with limited SQL. What I needed was another “helper function”. The CROSS_JOIN_HIER function was born!

FUNCTION: CROSS_JOIN_HIER

First argument is the hierarchy table name (see previous post for description of this table). The 2nd is the output table name. This table must already exist and have at least the same number of columns as the number of subsequent arguments.

The remaining arguments specify the source of the tables (single column lists) to cross-join in order to populate the output table. The arguments can be of three types.

  • The name of an existing table. This table must consist of single column named “Name”. This allows for complex (or perhaps, simple lists in a different sort order) to be generated outside the function call.
  • A request for a list of nodes from some level within a hierarchy. Such requests can consist of one of the following:
    • Integer between 0 and 99. If the column name associated with this argument (e.g. arg 1 implies column 1 of output table, arg 2 column 2 etc…) is the same name as a valid hierarchy, fetch all nodes at that level (e.g. if 0, then fetch top level nodes, if 1, 2nd level etc..). Allows for up to 99 levels (i.e max value  98). Level 99 is special, it’ll return all “leaf nodes” i.e. those nodes with no children.
    • aNode,(+ or -)integerValue e.g. “Beer,+2″, in this case fetch all nodes 2 levels below Beer in the hierarchy corresponding with column name. The value before the comma must be a valid node name (assumes names are unique within hierarchies). A value after the comma of “-1″ will fetch parent node, a value of “-2″ will fetch grandparent. In both “+” and “-” if the level to navigate down/up is greater than the levels available, the last available level is returned.
    • In both cases above, the name of the column can be over-ridden by prepending the hierarchy name such that: “Product,Beer,+1″ will drive down 1 level from “Beer” in the Product hierarchy. And, “Product,99″ will return a leaf nodes of the Product hierarchy.
  • A comma separated list to create a “manual” level. For example, “Budget,Actual”, “2010,2011,2012″ or “Beer”

The resulting single column tables are then cross-joined with the resulting multi-column table (a column for each “source” argument) which is used to populate the output table.

This is a “stored procedure” like function, so should be called using “Select function(arg1,arg2 …);” syntax.

Example: click for larger version.

Download microETL from http://www.gobansaor.com/microetl and unpack to a folder, locate alberto_hier workbook (2007/2010 format) and go to the makeBudgetTable sheet.

The microETL project  is password protected; if you need access to the code just email me I’ll send you on the password.

About these ads

Handling Flat, Parent-Child and Nested Set Hierarchies

Alberto Ferrari describes a method of handling un-balanced hierarchies in PowerPivot. As usual Alberto comes up with an elegant solution to something that looks near impossible. For more of the same, check out  his book, co-authored with Marco Russo, Microsoft PowerPivot for Excel 2010: Give Your Data Meaning. It’s the best book out there for those from a data modeling background trying to make sense of this new form of ROLAP or indeed for those from a spreadsheet background who wish to gain a better insight into PowerPivot-oriented data modelling.

PowerPivot has problems with un-balanced hierarchies (aka variable-depth parent-child hierarchies) as it’s essentially a relational engine. Relational databases have traditionally sucked at handling such hierarchies which are usually implemented using an Adjacency List Model.

When I moved from network CODASYL databases to relational databases all those years ago it was my biggest complaint about the ‘new order’ of the RDBMs. I guess it’s my network database heritage that makes working with MOLAP engines such as Palo seem so natural. Such databases are kind to hierarchies and to the users of hierarchies.

Most of the major commercial databases have now extended their SQL offering to handle hierarchy navigation, you can see the SQL Server’s way in Alberto’s post. But SQLite offers no help, so in order to make handling for what are in fact very common requirements for reporting and analysis datasets, I’ve extended the SQLite SQL available within microETL with 4  useful helper functions.

All 4 work with tables of the following structure:

Create table whatEver (HierarchyName text, Parent text,Child text,Name text,Level int,setID int,setLeft in,setRight int);

Where:

  • HierarchyName allows for multiple hierarchies to be stored in the one table.
  • Parent is the parent node for “this Child”. Can be text or a numeric surrogate key. Top level nodes will be set to null (or = “”).
  • Child is the node identifier.
  • Name is the textual identifier. If not populated, uses Child value.
  • Level is the depth from top of the current path. Levels = 0 are top level nodes. Not strictly required, but so useful it’s worth populating.
  • setID like Level is not required to be populated but can be useful in Nested Set queries.
  • setLeft is the left-hand side of the Nested Set range associated with this node.
  • setRight is the right-hand side of the Nested Set range.

What’s all this Nested Set stuff? As I pointed out above, SQL engines tend not to agree on a method of hierarchy navigation, if indeed they offer any method. Nested Sets is the classic data modelling method to get around this problem (if you look at hierarchy tables in cross-RDBMs packaged applications you’ll often find a variation on this method).

See this post Managing Hierarchical Data in MySQL for a good description of the technique.

There’s no need to populate these ‘set columns’ unless required they’re there just in case.

Function: make_pc_hier

This function is for handling the opposite problem to that which Alberto faced. It takes an existing flattened hierarchy and turns it into a “proper” parent-child one (hence the pc element of the name). Why would you do that? Well, MOLAP tools such as Palo not only handle unbalanced hierarchies, they usually store hierarchies using an adjacency list model. Also, if you wish to use Nested Set SQL, you’ll first need the model in parent-child format.

The function’s 1st argument is the name of the hierarchy table; if it exists it must be in the format specified above, otherwise a table of that name with the required format will be created by the function.

The 2nd argument is the hierarchy’s name.

The remaining arguments are the source columns for the hierarchy in the sequence of level 0 to level n.

The function is in fact a aggregation function (like SUM()) so can be used to group the required columns from a table where they are repeated (such as a fact table on which the hierarchy has been denormalised).

Example (click on it to see larger version):

Function: make_pc_hier_from_tree

This is similar to make_pc_hier() except the source table must be in a ‘tree format’. It is also an aggregate function but the use of group by would, in this case, not make sense. The use case for this is the easy creation of hierarchy dimensions from a tree structure for use in the likes of Palo.

Example:

Function: flatten_pc_hier

This takes four mandatory and an optional fifth argument. First two are the usual hierarchy table name and hierarchy name. The 3rd is the output table to receive the flattened table. The first column of this output table will be populated with the base element Child IDs (i.e. the deepest level in the nodes’ paths). The remaining columns must match the number of the next argument. This 4th argument specifies the maximum depth of the flattened hierarchy, so if the hierarchy is to be, for example, Sector,Brand,Product, then this would be 3.

The optional 5th argument if set to Y will not fill unused columns in an unbalanced hierarchy, otherwise the final nodes are filled out to the right with the last non-blank Name. This is a “stored procedure” like function, so should be called using “Select function(arg1,arg2 …);” syntax.

Example:

Function: nested_set_hier

This takes the two usual arguments, hierarchy table and hierarchy name. Its function is to populate the “nested set” fields of the supplied table to make a valid Nested Set. Again, it’s a “stored procedure” function so should be called using the “Select function(arg1,arg2 …);” syntax.

Example

Download Example:

Download  microETL from http://www.gobansaor.com/microetl and locate the hier97-2003.xls file.

The microETL project  is password protected; if you need access to the code just email me I’ll send you on the password.

Update:

There’s now an Alberto_hier workbook (in 2007/2010 format) with a worked example of building Alberto’s modified dimension. This workbook also includes a PowerPivot cube demonstrating the required measure (plus an equivalent Nested Set SQL Query).

For more on handling nested sets in PowerPivot see http://blog.gobansaor.com/2011/03/10/powerpivot-nested-set-measures/