Excel documents are, since Office 2007, persisted in an Open XML format (packaged in a Zip file). Unlike the previous proprietary binary xls format, this format is both documented and supported by an official SDK. So no hacking required to examine and/or modify a document’s internals. This gives those with a modicum of programming ability the means to automate Excel outside of using Excel’s COM Object Model (aka VBA).
What are the advantages over VBA? The main ones are:
- Speed, much much faster for mass generation/modification of large numbers of documents.
- No need for Excel to be installed. So, can be utilised in a typical server environment using .NET based code – such as HAMMER.EXE (or indeed a non-Windows box, using an appropriate SDK - or simply a ZIP packager and an XML reader/writer).
- Access to features not (yet, or indeed ever likely to be,) ported to the Object Model.
It’s the last two reasons that can prove useful for PowerPivot automation, prompting me to add a number of commands to HAMMER based on the Open XML library. (These are only available in the NET4.0 versions (aka PYTHON versions) of the EXE and add-in, not really a problem as PowerPivot V2 now requires NET4.0).
The new commands are:
Takes one argument, the name of a workbook, will return the workbook’s model in raw xml format. Two sister commands are PPMODELGETPRETTY, returns a beautified version of the XML and PPMODELGETCDATA, returns the model enclosed and encoded in a CDATA format (the format it’s held within its holding CustomXML part).
Takes two arguments, the replacement model (in raw XML format) followed by the workbook whose model is to be replaced. Likely used in conjunction with the command above, whereby a model is extracted from this or another workbook, modified in some way and used as a replacement model.
Like above, but skips the GET stage, takes three arguments, the text to search for within a model, the text to replace the searched text with and the workbook name. Useful for simple swap in/out tasks such as replacing source file-based datasets’ folder addresses. (You can even replace absolute addresses with relative addresses, a big missing in the UI; refreshes will work against such relative addresses
but corruption of the workbook can happen if such a model is subsequently manually changed and saved, so not really that useful turns out relative works okay, the error I saw - OLE DB or ODBC error: Unspecified error. A connection could not be made to the data source… – looks like an ACE Provider error which occasionally appears depending on how Excel is opened and the relative location of source XLS/CSV files to the workbook! looks like it only happens under XP/SP3).
Takes two arguments, the source workbook followed by the recipient workbook. Copies a model from one workbook to another.
All three ( REPLACE.MODIFY & COPY) commands will destroy the modified workbook’s binary PowerPivot database file (by replacing it with an empty file). This must happen as we’re not really modifying the PowerPivot model but a copy of that model that’s held in the workbook for use in a database-is-corrupted recovery scenario (and in loading Excel generated models into SQL Server 2012). So, we must corrupt the database, which will then trigger the repair dialog, after which, a full refresh is required.
A couple of things to be aware of:
- If you’ve more than one PowerPivot connection in a workbook (as happens when a drill-down is initiated) you’ll get multiple requests of the repair dialog; just work through them, the final dialog sequence will do the work. If that doesn’t work, delete the drill-thru related connection (under Data->Connections).
- PowerPivot’s recovery logic has a few bugs (not the Tabular model itself, which is rock-solid as it has been hardened by its use in SQL Server 2012 loading). Certain display settings sometimes do not recover properly making subsequent manual modification difficult if not impossible. Luckily the vast majority of workbooks I tried this against seem to work without issues.
- If you do destroy a workbook, don’t panic, you’ve made a backup before hand, didn’t you? Well, to be sure to be sure, any command that modifies a document will make a backup for you (but only the 1st time you attempt to modify a particular workbook).
Two more commands that directly target the internal PowerPivot tabular database:
Takes a single argument, the workbook name. Will replace the xVelocity database file (remember a XLSX is simply a zip file of a bunch of folders, xml and binary files) with an empty file. This is called automatically by the modify-a-model type commands above but can also be useful stand-alone. For example, testing if a workbook will successfully recover using its existing recovery meta-data or removing the (often sizeable) database from workbooks that are intended to be “static”, either because the end-users are still on Excel 2007 or the PowerPivot add-in is not likely to be installed.
Like PPMODELCOPY but copies a PowerPivot database from one workbook to another. Very useful for situations where multiple workbooks share the same model and the refresh of that model is very time-consuming.