Category Archives: Python

HAMMER Alongside, as a COM Server

Although it has always been possible to call HAMMER from within VBA via the Application.Run method, this is a somewhat clunky way of doing so and it can also be very inefficient, particularly for tight loops. But now, with this release (V 1.2.0 (Beta) ) core functionality is exposed as a COM Server, which means easier and more efficient interfacing between VBA and the .NET multi-threaded enabled world of the datasmith’s HAMMER.

I call this “HAMMER Alongside”, to differentiate it from HAMMER Inside where I use HAMMER internals to craft stand alone XLLs. With this COM Server method, the standard HAMMER add-in (installed or just-in-time registered) is needed alongside either a VBA add-in or a VBA enabled workbook. The extra “moving parts” are easily justified by the extra flexibility that this method allows (particularly to those with a reasonable grasp of VBA, but perhaps lacking any familiarity with the .NET world).

By learning a small amount of IronPython (here’s a good starting place) it’s relatively easy to hook up VBA to any .NET library through HAMMER, without having to invest time and money in learning heavy duty development environments such as Visual Studio. Using “Internal Threads, it’s also possible to take advantage of .NETs multi-threading capability from VBA. (Note: both these options require .NET4).

So with VBA becoming a first-class language with regards to HAMMER, why offer Python as an alternative scripting choice? Well, one reason is to offer a means to access .NET’s power as per the last paragraph. But, the primary reason, is related to the ability of HAMMER transformations to be “detached” from Excel and run via HAMMER’s command line tools. The .NET 4 version of the command line tool supports Python, so it is possible to initially build out a micro-ETL transformation within Excel (using Python to perform the business logic alongside SQL) and then transfer that logic to the command-line tools with a minimum of modification.

This could be useful , for example, when a PowerPivot model moves to the server, its related HAMMER-powered micro-ETL processing could do likewise (most enterprise ETL tools support call-outs to command-line processes).

The COM server can be accessed from VBA only via late-binding like so:

Set comServ = CreateObject(“hammerCOMServerV1″)

The server’s methods are:

  • hammer(…) – works the same way as the UDF version.
  • hammerVersion() – returns HAMMER version, again the same as UDF version.
  • hammerVersionOK(version) – e.g.  isOK= comServ.hammerVersionOK(120) will return TRUE if the current version is >= V1.2.0
  • arrayResize(anArray) enables the creation of “toFit” UDFs.
  • arrayToSheet(anArray) likewise for “toSheet” UDFs.
  • arrayToRange(anArray,pasteToWhereString) enables “toRange” UDFs.
  • hammer_ppRefresh_inline(optional table,optional timeout) – refreshes a PowerPivot model, again the same as its UDF equivalent.

Download the latest version of HAMMER from here …

Exposing an Excel PowerPivot model as a Web Service

In my last post I demonstrated a simple way to interrogate an Excel based PowerPivot model using either MDX,DMV or the new DAX Query Table commands. This is a great way to dig deep into a PowerPivot model, for example, this new Denali DMV discover_calc_dependency can be use to  describe the dependencies between a model’s DAX measures.

Being able to directly access the PowerPivot model opens up all sorts of interesting options, one such is a web service.

In the past I’ve written about exposing Excel as a simple DIY web service, see here for a JavaScript example and here for a Python version. Both were capable of serving PowerPivot data but they both had to trigger events in Excel to do so.

With this newly discovered ability to query the model directly, I figured a third attempt at a PowerPivot Web Service was overdue.  This time I’m using the new kid on the block, HAMMER, and in particular using its in-built IronPython interpreter.

I could have built a simple server using VB.NET or C# and added it as a HAMMER command, but the nature of a web service makes it hard to frame as a parametrised command. It was for this type of requirement that I added the ability to script using Python, i.e. situations where a generic command would always come up  short, for example:

  • Should the service allow only local connections?
  • Should it use authentication, and if so, what type?
  • Should it allow free-format commands or offer only packaged commands?
  • Should it use HTTPListener (which is in-built and very powerful, but requires Admin privileges) or use a more basic socket based utility that can run without Admin privileges.
  • Should it block or handle asynchronous requests. (For services that access the Excel Object Model, it has to be blocking, but perhaps if the service simply accessed the AS engine, it night support multi-threaded queries? Must try it sometime.)

For this example I decided to use HTTPListener (in blocking mode and without authentication) and to expose services that accept free-format MDX,DMV or (if Denali) DAX Table Queries.

See the PPWebService workbook example in the latest (V0.5.0) version of HAMMER. To activate, make sure latest version of HAMMER is installed (or use activate button on Hammer sheet) and click the Expose PowerPivot Model as Web Service button (again on Hammer sheet). This will start a server on port 8070.

The service exposes 6 endpoints:
  • /exit – this will shutdown the service.
  • /html – this will return data as a HTML table (ideal for importing into Excel using a Web Query).
  • /xml – this returns a ‘canonical’ XML table (ideal for importing to Excel as an XML Map).
  • /json – returns a JSON table (as a list of lists).
  • /odata – returns a ATOM-base ODATA feed suitable for direct import into PowerPivot using its Data Feed import facility.
  • /csv – returns in CSV format.

Each service (except /exit) expects to be followed by a MDX, DMV or DAX command. e.g. /html/evaluate(InvoiceHeaders). In the case of /csv an optional file name may be specified e.g. /csv/invhead.csv/evaluate(InvoiceHeaders).

It’s also possible to issue a POST request with the command in the message body.

As with all URL requests, the commands may need to be URL encoded (use the HAMMER command “_URLENCODE” if need be).

For the latest versions and articles on HAMMER follow the HAMMER tag on my blog …



VBA Multithreading, .NET integration via HAMMER

In a previous post I urged all potential datasmiths to learn a scripting language (I suggested Python). But what of VBA, the granddaddy of the scripting world? Well yes, if you have a need to automate Excel then you must learn VBA. VBA is to Excel as JavaScript is to the modern browser, its tight integration with Excel’s Object Model and its superb debugging facilities makes it the optimal choice for automating Excel.

VBA is now a one trick pony (Office automation), but Python opens all sorts of API scripting and product automation doors. My use of IronPython as HAMMER’s scripting language is one such door, a door to the otherwise mainly-closed-to-VBA world of multi-threading and easy .NET library integration.

The HAMMER function can be called from VBA using the Application.Run command like so…

retArray = Application.Run(“DATASMITH.HAMMER”,inArray,”Select dept,count(*) from table1″,”SQL”)

… the 1st parameter is the function name, parameter two is the 1st argument to the function, parameter three is the 2nd and so on.

By utilising HAMMER’s IronPython functionality (requires NET 4.0 runtime), VBA routines can access the full power of the .NET platform with data passed back and forth using tables. Admittedly for many complex .NET utilities utilising VB.NET or C# may be a better approach (due to better IDE and debugging features of such languages) but for standard library calls, IronPython is an ideal option. It also has the benefit that the “code” can be stored within the workbook.

HAMMER also offers the power of multi-threading to VBA via its internal threading functionality (requires Excel >= 2007 and Net4.0 runtime). The multi-threading example in the hammerThreads.xlsx workbook could easily be wrapped in VBA code, perhaps to allow it to be controlled by a user-form.

I’ve added two new commands specially designed for use within VBA scripted HAMMER scenarios:

  • APPDB – Opens an application-wide shared in-memory database. This will allow tables (and Python objects) created in one function call to be accessible in an other function call (assuming both issue the APPDB command as their 1st command). This replicates the functionality of microETL which by default exposes a application-wide SQLite in-memory instance and a common Python workspace.
  • CLOSEAPPDB – This will close and clear the shared c#-SQLite and IronPython instances. Equivalent of microETL’s xLiteReset() function.

Be careful not to…

  • use the APPDB instance from in-cell UDF calls to HAMMER that are likely to be scheduled as multi-threaded (the helper functions HammerToSheet & HammerToFit are safe, as they are always single-threaded),
  • or use within “internal threaded” HAMMER commands

…as although c#-SQLite is thread safe, the implementation logic is not.

Here’s a list of the HAMMER commands implemented so far …

You can download the latest version of HAMMER here …

Those with a datasmith’s hammer, see every problem as a table.

I picked the name HAMMER for my new micro ETL tool to capture the spirit of how it’s intended to be used. It’s an everyday tool, sufficient for many of the tasks a datasmith faces and when not, will help (or at least not get in the way of) with the marshalling of data that needs to be processed by more powerful tools.

The saying “If you only have a hammer, you tend to see every problem as a nail“ could be rephrased as ”If you only have a datasmith’s HAMMER, you tend to see every problem as a table“!. But that’s OK, as nearly every commercial IT problem can be expressed as a set of tables, and in any case, HAMMER is not intended to be a datasmith’s only tool. Its close integration with Excel, recognises the prime importance of spreadsheets to most data wranglers and its use of CSV & the SQLite database format as its persistence & transport mechanism, (à la “SQLite as the MP3 of data“) recognises that datasets will often need to be shared with other tools.

Finding the perfect tool for the job it an IT obsession that most of our customer’s care little for; it’s only when our choice of tool affects their bottom-line (i.e excessive cost or wasted time) that end users take much notice of the backroom technology. The most important skill for a datasmith, is the ability to understand the structures and forms that data can take and to be able to derive and build business value from that data. The technology is an aid, nothing more, technology (and indeed  applications) will come and go, data lives on.

HAMMER encapsulates the  minimum set of technical skills an aspiring datasmith should learn:

  • Spreadsheets, in particular Excel. No need to know every nook and cranny, nor expect it to handle every task. But do learn about pivot tables and array formulas; if you have Excel >= 2007, learn about Excel  tables (aka Lists). If you have Excel 2010, make sure to download PowerPivot. Become comfortable with “formula programming”, don’t expect  a ribbon command to solve every problem.
  • SQL – learn the basics of selecting, joining and group-by; again no need to become a master; SQLite is not only an excellent way to learn SQL, it’s also a powerful tool to have once you’ve picked up the basics.
  • Learn a scripting language –  Python is one of the best and is relatively easy to learn. Again, mastery is nice, but not essential, learn the basics of IF-THEN-ELSE logic, loops and iterations, array and list handling and string manipulation.  Your efforts at coding do not have to be pretty or elegant, just functional. Python skills will also transfer across platforms, CPython (the original and the best), IronPython (.NET and HAMMER) and Jython (JVM, here’s an cool example of Python as a scripting language to automate GUIs).

All this talk of picking the right tools brings to mind the old golf pro-am story where the amateur was constantly pointing out to the pro (Jack Nicklaus I think) what club to play. At a particularly tricky shot to the green, the pro had enough when his choice of club was again “criticised”. So, he proceeded to take out every club including woods & putters, placed a ball for each one and hit every ball on to the green.

We’re not all as talented as Jack Nicklaus, so having at least an good enough tool for the job at hand  is important. But it does show, that focusing on the end-game is what matters, not becoming fixated with a  particular way of doing things.

Enough of the moralising, and talking of being fixated on a particular tool  ;) here’s this week’s list of new features to be found in HAMMER:

New commands:

TXT – will load a text file such that each line is a new row in a single columned (column named “text”) table.

LOADDB – like OPENDB, opens the named database but only if the database file exists (OPENDB will create a new empty database if file not found). Intended primarily as end-point to request-process-response loop, see DELEGATE/SUBMIT  below.

SAVEDB – saves the current “main” database using the previous argument as it’s name. (shorthand for … “main”,”filename.db”,”SAVENAMEDDB”).

SUBMIT – same as SAVEDB, but the value of the prior argument has a “.request” appended to make a file name in the format “databasename.request” for saving to. Also, if the previous argument = “” or “{GUID}” will generated a globally unique name using  a GUID. The main use-case for this command is to send the saved database for processing by an external program, maybe a Talend or Keetle ETL job , or a HAMMER.EXE job.

DELEGATE – same a SUBMIT, but doesn’t do any processing (i.e. all commands other than DELEGATE, which is expected to be the last command, are ignored), instead it’ll package the request in the saved database with the expectation that an external HAMMER.EXE, or another Excel-based HAMMER call, with do the processing.

Changes and helper functions to support DELEGATE/SUBMIT processing:

The DELEGATE and SUBMIT commands are how HAMMER implements its version of steam-powered servers.

The intention is that databases named with extensions of “.request” are picked up and processed by “servers” waiting for such files. (The transport of such files between server and the served is a separate issue, might be simply a matter of placing files on a shared drives, or DropBox!) Theses servers may then populate that database with new data ( or not, e.g. might generate a CSV). When finished, a text file of the same name but with the “.request” replaced by a “.response” is written to indicate that processing is complete.

Both the Excel udf HAMMER and HAMMER.EXE (non-excel command line version of UDF function), have been changed such that when a single argument is passed, that argument is expected to be a DELEGATE generated “.request” database. The database’s “pickled” arguments and data will then unpacked and the “delegated request” processed.

HAMMER.exe, if started with no arguments, implements a simple DELEGATE “server”, i.e. will wait for “.requests” files in its default folder and process each file as above.

Three RTD helper functions have been added to HAMMER udf:

waitOnHammerResponse(database (without the .response),[folder name]) – will return “Waiting” until it finds a database.response file, will then return the original .request filename (i.e. not the response file as it’s simply a marker, the database.request will  contain original data and any new data generated).

waitOnHammerRequest(database (without the .request),[folder name]) – will return “Waiting” until it finds a database.request file, will then then return the request filename.

waitOnFile(filenaname,[folder name]) – like the functions above but without the response/request expectations.

Here’s a list of the HAMMER commands implemented so far …

Get the latest version of HAMMER here …

Using PowerPivot to Hammer home some facts

From my previous post’s example of a Hammer use-case, it’s obvious I primarily see Hammer (and indeed microETL) as a tool for shaping dimensional type data; i.e. relatively low volume, often very ‘dirty’, but very high (business) value.

Fact (aka transactional data) can of course be handled, particularly when already reduced or when by-nature low volume, such facts will in many cases fit easily in-memory.

But when facts start to run into the millions of records, traditional in-memory manipulation becomes a problem. Obviously such large volumes datasets should in the first instance be handled IT-side utilising tools that are designed to handle such volumes, i.e. enterprise-class databases. But sometimes data, even large transactional databases, are “thrown-over-the-wall” with limited support offered (or accepted). But again, there’s plenty of cut-down versions of enterprise RDBMS available (SQL Express, Oracle Express etc.) plus the FOSS offerings such as MySQL or PostgresSQL, that can be configured user-side to help tame these beasts.

If you’re using PowerPivot, you have another option, PowerPivot itself. With its ability to quickly load and compress large volumes of data and its ability to perform many data cleansing tasks by means of row-context DAX formulas, often that’s all that will be required.

The typical problems that a transactional dataset can throw up, such as data split over two tables (header and detail) or needing to replace a surrogate date key with an actual date (to enable certain DAX date functionality to work) can easily be fixed within PowerPivot.

One thing to note about fact data presented as a Header-Detail set, is that traditional star-schema design requires that such data be flattened to the lower “grain” of the detail line, but PowerPivot doesn’t actually require you to do that. Some dimensions can link to the header (example Customer on Invoice Header) and others to the line (example Product on Invoice Line). The detail line table is still the “hub” of the “star” but one of its dimensions (the header table) is its route to a multiple other dimensions. Not classic star schema design, but it’ll work , and good for quick and dirty analysis (might be situations though where things might not pan-out as expected see this, perhaps best stick with pure star-schemas for complex work).

There’ll come a time however, when you’ll be faced with the problem of manipulating large datasets outside of traditional  RDBMS servers and outside of PowerPivot. Combining sets of data from multiple sources, as in my previous post, would be a prime example. Such projects often operate on a “need-to-know-basis” often with those supplying the data ‘outside the loop’. Today’s additions to HAMMER should help.

Three new commands, ATTACHDB, ATTACHDBINMEMORY and SAVENAMEDB, will allow external disk-based databases to be attached to the default HAMMER database.

ATTACHDB requires a filename  followed by an alias name for the attached db. Having an attached external database would allow, for example, a large fact table in CSV format to be loaded (and indexed) without touching memory. This could also be done using the previously introduced OPENDB command, but the benefit of ATTACHDB is that other non-memory-threatening processing can continue to take place in-memory.

The ATTACHDBINMEMORY also attaches an external database, but this time loads it into memory, so any changes made will not be automatically persisted back to disk. To do that, use the SAVENAMEDDB command.

This requires an attached database alias, followed by the file name to save the database to. SAVEASNAMEDB has other uses such as making backups or making copies of data for use in an external debugging platform (it can be much easier to debug Python using a proper IDE).

Along side the facility to load data via CSV/TSV, I’ve also added an ADO command. This requires a valid ADO connection string, followed by either a table/view name or a SQL Select statement. It uses ADODB 2.7, to enable handling of modern Access file formats, I’ll eventually make an ADO.NET version to remove this dependency.

Finally, I’ve managed to get a DBAPI compliant SQLite provider working in the PYTHON step. The provider is called XLite (it’s a modified version of Jeff Hardy’s SQLite3 library) and exposes most of the same functionality as CPython’s SQlite3 provider.

The library can open external SQLite databases, so offering another means of accessing non in-memory data and accesses the HAMMER default databases via the pre-connected pyDB variable. Having the ability to lazy-load rows via a cursor loop is also very useful in reducing memory foot-print when dealing with large tables. (see the IrelandFOIExample_hammer_test2.xlsx for an example of using XLite).

Update:

I’ve removed the dependency on a specific ADODB library (figured out how do equivalent of CreateObject in .NET for COM libraries). Also included is a first pass at a command line version (hammer.exe). Example:

hammer.exe mydata.csv CSV myotherdata.csv CSV JOIN > new.csv

hammer.exe inv.csv “CSV” “sum(qty)” REDUCE sum.csv TOCSV

Here’s a list of the HAMMER commands implemented so far …

Download  the latest version of HAMMER from here …