Author Archives: gobansaor

Automating PowerPivot Refresh operation from VBA

Although the latest CTP3 release of PowerPivot has addressed many of the short-comings of the original release (allowing drill-thru for example), there’s still one major missing: a published API accessible from VBA (or even .NET).

Like Marco Russo, I would settle for the ability to automate the PowerPivot refresh function. What’s doubly annoying about this is that there’s obviously an undocumented method of doing this as Predixion are doing it and so are PowerPlanner.

Encouraged by the fact that it’s obviously possible, armed with hints from the comments on Macro’s post, and from this Chris Webb post and a late night twitter conversation with Mark Stacey, I decided to try.

I used VBA, but this could also be done within a .NET add-in.

The first thing to do is get an ADODB connection to the local embedded AS engine. I tried using the connection string details used by the Excel PowerPivot model connection (see under Data->Connections) but couldn’t get past an authentication error. There may be a way around this, but I decided to short circuit the problem by using the Excel Object Model to directly fetch the already established connection’s ADODB handle via ..

ActiveWorkbook.Connections(“PowerPivot Data”).OLEDBConnection.ADOConnection

One problem with this method is that when a workbook is 1st opened the default PowerPivot Data connection will not yet be established, needs something like a PivotTable refresh or a Cube formula call to fire it up. But again this can be automated.

Next step is to issue an XMLA command like the one in this post (no need for the CubeID property, but if want, you can specify it as “Model” or  ”Sandbox” depending on the version of PowerPivot; “Model” seems to be the new name of the cube in the latest version).

But how to get DatabaseID? I used a “select distinct object_parent_path from $system.discover_object_activity” DMVcall to get a list of database objects and parsed out the DatabaseID from this object…

GHOME1300\LocalCube.Databases.CBBB19B2CD9B4017A8A0

… where GHOME1300 is my machine name and the DatabaseID is CBBB19B2CD9B4017A8A0

The DatabaseID can also be seen in the un-zipped Excel file but it appears to change when the workbook is loaded so will need to be refreshed each time.

The important bit of the XMLA command is this ..

<Object><DatabaseID>CBBB19B2CD9B4017A8A0</DatabaseID></Object><Type>ProcessFull</Type>

Having issued the command, the PowerPivot model will refresh all its external connections and rebuild the Model (aka Sandbox) cube. Linked Excel tables however, appear not to be affected by this, which is a pity.

The workbooks pivot tables still require to be refreshed separately, but this too can be automated via an ActiveWorkbook.Connections(“PowerPivot Data”).Refresh or a PivotTable refresh. One thing to be aware of, is that the PowerPivot Refresh happens asynchronously (i.e. command will potentially return before the refresh has finished) therefore a delay may be required before issuing a connection or pivot table refresh. This asynchronous behaviour may be the reason why an API has not yet been provided (similar problems arise when automating CUBE formulas from VBA). There are various last_updated datetime attributes available on DMV tables, perhaps these could be used to signal when a refresh has completed.

As I said, this issues a complete refresh, I’ve not been able to restrict to a single table refresh (using the DataSourceID, the XMLA was accepted but it generated a DLL error). Those with more knowledge of XLMA processing commands might have more luck.

UPDATE: Both the asynchronous nature of the Process call (which turns out not to be asynchronous, the call actually blocks, must have been imagining things during my late night experimenting ;) ) and the lack of a single table refresh are no longer problems, see comments on next post. Will produce an updated example in due course. Here’s an updated version of the code.

So not quite a full blown PowerPivot API, but I’ve always said I’d be happy with a simple Refresh ability from VBA, looks like I might have it.

This is not yet fully proven, so proceed with caution. Hopefully over the next week or so I’ll attempt to put together a version of this for inclusion in HAMMER and/or microETL.  UPDATE: Here’s a HAMMER version.

UPDATE:

To see an example in code see http://blog.gobansaor.com/2011/09/01/automating-powerpivot-refresh-operation-from-vba-the-code/

Hammer Inside

As I explained in a previous post my main reason for developing HAMMER was to provide me with the same productivity boost in .NET as that provided by MicroETL when working using VBA.

Although the resulting tool is proving to be very useful in its standalone formats (as a UDF formula, VBA tool via Application.Run and as a command-line EXE) its use as  a powerful utility library within VB.NET or C# add-ins is still what justifies the cost of its development.

Over the last few weeks I’ve used HAMMER in the development of several ETL-focused add-ins with great success. That success is not just to be measured in speed and accuracy of development but in the extra features I can now deliver for little or no cost (caching of API or database calls for example). I term them my “HAMMER Inside” add-ins.

The final destination is to allow me to offer fixed-price (or at least transparent price) development of  ETL-focused add-ins. Not just the classic database or flat file backends, but WEB APIs and calculation engines (such as pricing /discounting tables) etc.).

Such add-ins would be “opinionated software”, in that they’d have a limited look’n'feel, with a concentration on the hewing and hauling of data, not on fine-detail Excel automation or advanced/pretty UI features.

In the meantime, HAMMER has been updated with a series of new COMMANDs:

  • JSON – Load a JSON document into a table.
  • TOJSON – Convert a table to a JSON document. If a single row table, will generate a JSON “dictionary” object with the column headers as the Keys, and the column data as the Values. If a multi-row table will generate a JSON List of Lists.
  • TOJSONLOL  - force a JSON List of Lists (when a single row table is really a table not a dictionary).
  • TOJSONLOO – force tables to be represented as a List of (Dictionary) Objects rather than a List of Lists.
  • TEXTTOFILE – output an argument as an UTF8 encoded text files e.g. HAMMER(myTable,”TOJSON”,”C:\files\myTable.json”,”TEXTTOFILE”)
  • TABLETOSCALAR – converts a table to a scalar value by picking the first column of the first row.
  • SQLRAW – Like SQL, but doesn’t perform any token substitutions. The range of tokens that can be substituted by SQL has also been increased. The existing “:n” tokens and “from tablen”, where n is the argument number, continue to be supported; but new formats of :ARGn ,:<ARGn ,:TABLEn, and :<TABLEn have been added.  As before, n will be the argument number for :ARG and :TABLE but in the case of :<ARG and :<TABLE, n will represent the previous nth argument or table (think “<’ pointing backwards).
  • URLGET and URLPOST – fetch or post data to a URL.
  • There’s also a series of “_” functions, mainly intended for use within VBA or .NET but could be useful as formula calls too, I guess. These are: _URLENCODE (encode a URL), _HAMCSHA1 (calculate a HMACSHA1 hash),_MD5 (MD5 hash),_MD5FILE (same but for a file),_GETCONFIG (reads addin’s .config file’s HAMMER JSON section) and _GETCONFIGNOERROR (same but doesn’t complain if no .CONFIG file or no JSON section found).

The Civilian Datasmith – a live specimen.

A live specimen - not a datasmith but a related data consuming species: The PacMan

If you’ve been following my ramblings posts for any length of time you’ll have come across the term “civilian datasmith” quite a lot. For example, in my last two posts on DAX I’ve been looking at whether they’re likely to use PowerPivot’s DAX.

Professional datasmiths, like myself, will have little choice but to learn and adopt DAX if we intend to continue to service either the high-end Excel management information market or the general MS BI tools market, but will our civilian peers follow suit? I think many will, particularly those who are already comfortable with high-end “excel formula programming” or those who already have extensive SQL and/or data modelling experience. But who are these non-IT datasmiths?

Well, they’re very unlikely to have data anywhere in their job titles and often share their job titles with dataphobic colleagues. But you’ll know them, they’re usually the “go-to-guys/gals” for all sort of data analysis/integration/discovery tasks and they tend to favour making decisions based on hard facts rather than gut instinct. They’ve learned to make use of whatever data and whatever tools are at hand, while others complain about tools/systems not doing what they want or IT not reading their minds, datasmiths just get on with making things happen.

An example of a real live civilian datasmith in action is Richard Baxter of SEOGadget. I know Richard from Excel development work he’s commissioned from me, including this Google Adwords Excel Add-in that appears to have been all the rage at this year’s MozCon conference.

I was at first somewhat surprised that an SEO professional was  commissioning Excel add-ins, as the typical SEO tools tend to be web-based (usually by default I think,rather than by rational choice), but I soon realised I’d found yet another datasmith enabled profession (to add to the more typical accountants, engineers, sales & marketing managers etc.).

Richard’s a classic datasmith.  To see him in action  check out the slides of his well received MozCon presentation. Note how he sources data from multiple sources but uses Excel to merge and shape the data (one other tool I’d suggest he should check out would be my HAMMER, ideal when multiple tables need to be queried,merged, cross-joined etc.).

So if you’re an SEO pro (or indeed any data dependent professional) struggling with too much data, or worrying that decisions are been made based on too little data, check out the following:

  1. Microsoft Excel for SEO http://www.distilled.net/excel-for-seo/ not just for SEO types, an excellent run down on the Excel skills any would-be datasmith must have.
  2. See Richard’s Adwords Add-in for Excel, a good example of the sort of thing Excel can do through the power of custom-coded add-ins. (UPDATE: Here’s a video of the tool in action)
  3. Then check out HAMMER, my swiss-army-knife-like  datasmithing tool for those whose working lives revolve around  data tables (primarily an Excel add-in for use as an array formula, but can also be called from VBA and via the command-line (aka DOS box)) .



DAX the new noSQL?

In my previous post Dax the new SQL I argued that perhaps DAX would replace SQL as the database query language, in particular for those I term “civilian datasmiths”. But perhaps I should have hailed DAX as a new contender in the world of noSQL.

When I started in this business in the late 1970s, the database world I encountered was a noSQL one. Database structures were in the main laid out by DBAs using either hierarchical (e.g. MUMPS) or network topologies (the various CODASYL offerings). Programmers would then navigate these predetermined access paths using a cacophony of APIs and query languages. It worked quite well, and if fact many developers found the move to a SQL world difficult, as they regarded the pure relational model to be sub-optimal when developing applications (and many still do!).

But there were two main problems with such databases:

  • Although the pre-defined access paths worked well for development, they were often next to useless for downstream reporting, which usually meant that data had to be dumped to files for “reshaping” and de-normalisation, with resulting flattened data often being consumed by early end-user reporting tools such as Datatrieve. (Sound familiar?)
  • Likewise, although the original data design tended to be ideal for the original application design, downstream changes in requirements were not easy to incorporate. DBAs were busy, expensive and usually very grumpy, which meant that many enhancements, as with their reporting cousins, had to make do with using external files.

The SQL era was a huge improvement. Nevertheless, many developers found the impedance mismatch between this relational SQL world and their own  ”networked”  object-oriented world to be a continuing problem, leading to the constant search for the perfect ORM and eventually the re-emergence of noSQL database engines.

PowerPivot’s VertiPaq engine is one of these new noSQL databases. It resembles those pre-SQL databases in that its access and relationships paths are largely determined not by its DAX query language but by whoever builds the database in the 1st place. The big difference is of course, PowerPivot’s primary (and only) use-case is reporting while most  databases have in the past targeted system datastorage as their primary purpose in life, with reporting as a by-the-way. The other difference is that the creators of PowerPivot “databases” are less likely to be DBAs and more likely to be end-users or at least those in IT, like data/business analysts, who are close to the business i.e. more common on the ground, less expensive and hopefully less grumpy. Indeed many civilian DAX practitioners will have the luxury of being able to populate their reporting models with suitable datasets that have already been constructed by existing data warehousing systems.

It’s this separation of filtering- aggregation-transformation from topology that I think will help endear DAX to a less technical audience that has been the case with SQL. MDX also had this concept, but the underlying “more natural” tabular nature (business users live by lists!) that DAX builds on, will I think,be easier for many to engage with.

I used “I think” at lot in the previous paragraph as it’s very difficult for somebody like myself (a professional programmer and veteran of many a database mind-share war) to know how non-technical folks will actually view something like DAX. Although my career has been one largely based on business-user interaction, I’m still a techie at heart.

So what do you civilian datasmiths think? DAX, yep I’ll learn it, or nah, that’s what those geeks down in the basement are for!

VBA Multithreading, .NET integration via HAMMER

In a previous post I urged all potential datasmiths to learn a scripting language (I suggested Python). But what of VBA, the granddaddy of the scripting world? Well yes, if you have a need to automate Excel then you must learn VBA. VBA is to Excel as JavaScript is to the modern browser, its tight integration with Excel’s Object Model and its superb debugging facilities makes it the optimal choice for automating Excel.

VBA is now a one trick pony (Office automation), but Python opens all sorts of API scripting and product automation doors. My use of IronPython as HAMMER’s scripting language is one such door, a door to the otherwise mainly-closed-to-VBA world of multi-threading and easy .NET library integration.

The HAMMER function can be called from VBA using the Application.Run command like so…

retArray = Application.Run(“DATASMITH.HAMMER”,inArray,”Select dept,count(*) from table1″,”SQL”)

… the 1st parameter is the function name, parameter two is the 1st argument to the function, parameter three is the 2nd and so on.

By utilising HAMMER’s IronPython functionality (requires NET 4.0 runtime), VBA routines can access the full power of the .NET platform with data passed back and forth using tables. Admittedly for many complex .NET utilities utilising VB.NET or C# may be a better approach (due to better IDE and debugging features of such languages) but for standard library calls, IronPython is an ideal option. It also has the benefit that the “code” can be stored within the workbook.

HAMMER also offers the power of multi-threading to VBA via its internal threading functionality (requires Excel >= 2007 and Net4.0 runtime). The multi-threading example in the hammerThreads.xlsx workbook could easily be wrapped in VBA code, perhaps to allow it to be controlled by a user-form.

I’ve added two new commands specially designed for use within VBA scripted HAMMER scenarios:

  • APPDB – Opens an application-wide shared in-memory database. This will allow tables (and Python objects) created in one function call to be accessible in an other function call (assuming both issue the APPDB command as their 1st command). This replicates the functionality of microETL which by default exposes a application-wide SQLite in-memory instance and a common Python workspace.
  • CLOSEAPPDB – This will close and clear the shared c#-SQLite and IronPython instances. Equivalent of microETL’s xLiteReset() function.

Be careful not to…

  • use the APPDB instance from in-cell UDF calls to HAMMER that are likely to be scheduled as multi-threaded (the helper functions HammerToSheet & HammerToFit are safe, as they are always single-threaded),
  • or use within “internal threaded” HAMMER commands

…as although c#-SQLite is thread safe, the implementation logic is not.

Here’s a list of the HAMMER commands implemented so far …

You can download the latest version of HAMMER here …