Category Archives: VBA

VBA Multithreading, .NET integration via HAMMER

In a previous post I urged all potential datasmiths to learn a scripting language (I suggested Python). But what of VBA, the granddaddy of the scripting world? Well yes, if you need to automate Excel then you must learn VBA. VBA is to Excel as JavaScript is to the modern browser; its tight integration with Excel’s Object Model and its superb debugging facilities make it the optimal choice for automating Excel.

VBA is now a one trick pony (Office automation), but Python opens all sorts of API scripting and product automation doors. My use of IronPython as HAMMER’s scripting language is one such door, a door to the otherwise mainly-closed-to-VBA world of multi-threading and easy .NET library integration.

The HAMMER function can be called from VBA using the Application.Run command like so…

retArray = Application.Run("DATASMITH.HAMMER", inArray, "Select dept,count(*) from table1", "SQL")

… the 1st parameter is the function name, parameter two is the 1st argument to the function, parameter three is the 2nd and so on.

By utilising HAMMER’s IronPython functionality (requires the .NET 4.0 runtime), VBA routines can access the full power of the .NET platform, with data passed back and forth using tables. Admittedly, for many complex .NET utilities VB.NET or C# may be a better approach (thanks to the better IDE and debugging features of those languages), but for standard library calls IronPython is an ideal option. It also has the benefit that the “code” can be stored within the workbook.

HAMMER also offers the power of multi-threading to VBA via its internal threading functionality (requires Excel >= 2007 and the .NET 4.0 runtime). The multi-threading example in the hammerThreads.xlsx workbook could easily be wrapped in VBA code, perhaps to allow it to be controlled by a user-form.

I’ve added two new commands specially designed for use within VBA scripted HAMMER scenarios:

  • APPDB – Opens an application-wide shared in-memory database. This allows tables (and Python objects) created in one function call to be accessible in another function call (assuming both issue the APPDB command as their 1st command). This replicates the functionality of microETL, which by default exposes an application-wide SQLite in-memory instance and a common Python workspace.
  • CLOSEAPPDB – Closes and clears the shared C#-SQLite and IronPython instances. Equivalent to microETL’s xLiteReset() function.

Be careful not to…

  • use the APPDB instance from in-cell UDF calls to HAMMER that are likely to be scheduled as multi-threaded (the helper functions HammerToSheet & HammerToFit are safe, as they are always single-threaded),
  • or use within “internal threaded” HAMMER commands

…as although C#-SQLite is thread safe, HAMMER’s implementation logic around the shared instance is not.
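To make the APPDB idea a little more concrete, here is a rough sketch in plain Python (standard-library sqlite3, not HAMMER’s C#-SQLite or IronPython). It assumes a SQLite build with shared-cache support; the URI name appdb_sketch and the functions open_appdb, close_appdb, first_call and second_call are made up for illustration and are not part of HAMMER.

```python
import sqlite3

# Hypothetical name for the shared in-memory database (not a HAMMER identifier).
APPDB_URI = "file:appdb_sketch?mode=memory&cache=shared"
_keeper = None  # one long-lived connection keeps the shared database alive


def open_appdb():
    """Rough analogue of issuing APPDB as the 1st command of a call."""
    global _keeper
    if _keeper is None:
        _keeper = sqlite3.connect(APPDB_URI, uri=True)
    return sqlite3.connect(APPDB_URI, uri=True)


def close_appdb():
    """Rough analogue of CLOSEAPPDB: dropping the last connection clears the data."""
    global _keeper
    if _keeper is not None:
        _keeper.close()
        _keeper = None


def first_call():
    # One "function call" creates a table in the shared database.
    conn = open_appdb()
    with conn:  # commit so that other connections can read the rows
        conn.execute("CREATE TABLE depts (dept TEXT, head INTEGER)")
        conn.executemany("INSERT INTO depts VALUES (?, ?)",
                         [("SALES", 12), ("OPS", 7)])
    conn.close()


def second_call():
    # A later, separate "function call" sees the same table.
    conn = open_appdb()
    rows = conn.execute("SELECT dept, head FROM depts ORDER BY dept").fetchall()
    conn.close()
    return rows
```

Calling first_call() and then second_call() returns the rows written by the first call; after close_appdb() the table no longer exists. As with the warning above, nothing in this sketch is safe to share across threads.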

Here’s a list of the HAMMER commands implemented so far …

You can download the latest version of HAMMER here …

SQL noSQL no Python no VBA.

I’ve uploaded another version of HAMMER; this adds some new features and also takes some away. The removed features are Python and multi-threading support from the 2003 version of the add-in. Calling it the 2003 version isn’t entirely accurate (it’s actually called datasmith-noPython.xll) as this version will also work for Excel 2007/2010 32bit and for older versions (maybe even ’97!). It should really be called the .NET 2.0 version, as the features removed from it (IronPython 2.7 and multi-threading) depend on the .NET 4.0 runtime. I’ll eventually build a .NET 4.0 version for Excel 97-2003 with Python included, but this will still be missing the multi-threading features.

So the version that setup.xls will install if it detects a sub-2007 version of Excel will offer SQL and noSQL (JOIN, UNION etc.) but no Python or multi-threading.

So what about new features? Excel being the original noSQL database, I continue to add more noSQL commands for those who wish to avoid SQL or find its syntax somewhat long-winded. The JOIN & LOJOIN (outer join) commands are good examples: simply load two tables in which the columns you wish to join on share the same names. Simple. Another example is the REDUCE (aka GROUPBY aka DISTINCT) command I’ve added in this version. It essentially performs a SELECT … FROM … GROUP BY; again, load or generate a table, then follow it with a list of the columns you wish to ‘reduce’ the table by, plus any aggregates you wish to perform. Examples:

  • =HAMMER(myHugeList,"dept,sum(overtime)","REDUCE")
  • =HAMMER(AccessLogs!A1:C9999,"areaAccessed,byWhom","REDUCE")
  • =HAMMER(invHead,invLine,"JOIN","count(invID),sum(netAmt)","REDUCE")
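For anyone wondering what a “reduce” boils down to, here is a minimal Python sketch of the SELECT … GROUP BY idea using the standard sqlite3 module. It is not HAMMER’s code: the function name reduce_table, the working table name t and the naive comma-splitting of the column list are all assumptions made for illustration, and the ORDER BY is added only to make the output predictable.

```python
import sqlite3


def reduce_table(table, spec):
    """Group a header+rows table by the plain columns in `spec`.

    `spec` is a comma-separated list such as "dept,sum(overtime)".
    Anything containing "(" is treated as an aggregate; everything else
    becomes a GROUP BY key. Commas inside function calls are not handled.
    """
    header, *rows = table
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{name}"' for name in header)
    conn.execute(f"CREATE TABLE t ({cols})")
    marks = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO t VALUES ({marks})", rows)

    items = [item.strip() for item in spec.split(",")]
    keys = [item for item in items if "(" not in item]
    sql = f"SELECT {', '.join(items)} FROM t"
    if keys:
        key_list = ", ".join(keys)
        sql += f" GROUP BY {key_list} ORDER BY {key_list}"

    cur = conn.execute(sql)
    out_header = [d[0] for d in cur.description]
    result = [out_header] + [list(r) for r in cur.fetchall()]
    conn.close()
    return result
```

With only plain columns in the list the result is a DISTINCT; with only aggregates it collapses to a single summary row.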

If noSQL is not your cup of tea and you wish to utilise the full power of a SQL database, a new command “OPENDB” will allow you to open an existing (or create a new) SQLite database file. This allows SQLite data sources to be read from and written to via standard in-cell formulas, no VBA required! The command expects the previous argument to be the database name. If no such argument exists it will create a temporary on-disk database. This command usually only makes sense as the 1st command, as it’ll close and wipe any previously opened databases. If no “OPENDB” command is issued (i.e. the default), an in-memory database (aka :memory:) is used. Examples:

  • =HAMMER("C:\data\myDB.db","OPENDB",A10:C9910) will copy the data in range A10:C9910 and save it in a table called table3 in the myDB.db SQLite database.
  • =HAMMER("C:\data\myDB.db","OPENDB","SELECT * from table3") will fetch the same data back into Excel.
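The two OPENDB formulas above amount to “write a range to a table in a database file, then read it back in a later call”. A small Python sketch of that round trip with the standard sqlite3 module follows; save_range, fetch and demo are hypothetical helper names, and the temporary folder merely stands in for a path such as C:\data. (In SQLite itself, connecting to ":memory:" gives an in-memory database and an empty file name gives a private temporary on-disk one.)

```python
import os
import sqlite3
import tempfile


def save_range(db_path, table_name, table):
    """Store a header+rows table in the SQLite file at db_path."""
    header, *rows = table
    conn = sqlite3.connect(db_path)
    with conn:  # commit on success
        cols = ", ".join(f'"{name}"' for name in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({cols})')
        marks = ", ".join("?" for _ in header)
        conn.executemany(f'INSERT INTO "{table_name}" VALUES ({marks})', rows)
    conn.close()


def fetch(db_path, sql):
    """Run a SELECT against the file and return header+rows."""
    conn = sqlite3.connect(db_path)
    cur = conn.execute(sql)
    out = [[d[0] for d in cur.description]] + [list(r) for r in cur.fetchall()]
    conn.close()
    return out


def demo():
    # Two separate "calls": the data survives because it lives on disk.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "myDB.db")
        save_range(path, "table3", [["dept", "amt"], ["SALES", 10], ["OPS", 4]])
        return fetch(path, "SELECT * FROM table3 ORDER BY rowid")
```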

Wow steady on, what if there’s a need to store or fetch data from disk without using SQLite? No problem, use the “TOCSV” command, which outputs the last table loaded or generated, in CSV format, to the file name specified. (There’s also a “SQLTOCSV” command, which expects a SQL statement specifying the data to extract, followed by the file name to extract to.)

Two other commands, “CSV” and “TSV”, will load comma- and tab-separated data into HAMMER. Although the CSV functionality is useful within Excel, the main driver for these commands is to enable HAMMER to work outside Excel as a command-line data processor; you heard it here first folks!
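As a rough illustration of what the CSV-related commands do, here is a short Python sketch using only the standard csv and sqlite3 modules. The helper names sql_to_csv and load_delimited are invented for this example and do not correspond to anything inside HAMMER; the output goes to any file-like object rather than a named file.

```python
import csv
import io
import sqlite3


def sql_to_csv(conn, sql, out):
    """Write the result of a SELECT (header first) to a file-like object."""
    cur = conn.execute(sql)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([d[0] for d in cur.description])
    writer.writerows(cur)


def load_delimited(text, delimiter=","):
    """Parse comma- or tab-separated text into a header+rows table.

    Every value comes back as a string; type conversion is left to the caller.
    """
    return [row for row in csv.reader(io.StringIO(text), delimiter=delimiter)]
```

Note that a round trip through CSV loses type information: a number written out as 10 is read back as the text "10".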

I’ve also added the 1st set of my helper functions. These two functions are only available in the 2007/2010 versions, as they use multi-threading. The two functions are:

  • hammerToFit – wraps HAMMER, but will auto-resize the array area (or create a brand new array-selection if none) to fit the returned table. Note: to achieve this, the HAMMER function will be called twice if the existing array area needs adjusting.
  • hammerToSheet – again wraps HAMMER, but will paste the resulting table to a new sheet.

Although both helper functions utilise threads to achieve these little tricks (hence they’re not available sub-2007) when HAMMER functionality is called via these wrappers the function operates as a single threaded function – there’s a good reason for this which I’ll explain some other time. Internal HAMMER threading does however still work.

Here’s a list of the HAMMER commands implemented so far …

Download the latest version of HAMMER here …

The Datasmith’s Hammer

Although my microETL add-in is very powerful, it can be a bit intimidating for those without a programming background. It was, after all, designed primarily for my needs, and being a professional programmer I tend to see the world from that perspective. Hence microETL’s ability to forge vast and complex datasets in parallel to those of Excel, and to share those datasets not just with Excel and VBA but also with the powerful tool that is CPython. But microETL’s genesis was not as an all-powerful ETL tool but as a means to quickly and accurately handle tabular data in Excel. The original xLite (which begat microETL) started out with just two functions: Join two tables or Left Outer Join two tables. That was it, but it was still useful.

I’ve been intending for some time to build an offshoot of microETL that would be less powerful but perhaps more approachable and have fewer moving parts. This week I finished it; it still needs some more testing, but the basic product is in place. It’s a single-file add-in (a .xll) called DATASMITH. At its heart is a single function called HAMMER. There will be other helper functions that in the main will wrap the HAMMER function, but in essence it is the datasmith’s HAMMER.

If microETL is a datasmith’s forge or indeed a micro-foundry, HAMMER is a datasmith’s everyday portable tool (with perhaps Excel as the anvil, and your CPUs as the fire?). Talking of CPUs, multi-core CPUs are now the norm and, since version 2007, Excel can utilise them. MicroETL, being VBA-based, cannot take advantage of this; to do so requires an .xll add-in, which is another reason to build the HAMMER.

So what will this new functionality look like?

Examples:

=HAMMER(Invoices[#All],InvoiceLine[#All],"JOIN")

…will take the two ranges (you need the [#All] to pick-up the header and data sections of a 2007/2010 Excel Table) and join them using the columns with the same name as the join fields. The JOIN command expects the last two preceding arguments to be tables (aka arrays with a header line).
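Joining on whichever columns share a name is what SQL calls a natural join, so the behaviour described here can be approximated in a few lines of Python with the standard sqlite3 module. This is only a sketch of the idea, not HAMMER’s implementation; join_tables, _load and the working table names lhs and rhs are hypothetical.

```python
import sqlite3


def _load(conn, name, table):
    """Create a table from a header row plus data rows."""
    header, *rows = table
    cols = ", ".join(f'"{c}"' for c in header)
    conn.execute(f'CREATE TABLE "{name}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', rows)


def join_tables(left, right, outer=False):
    """Join two header+rows tables on their identically named columns.

    outer=False behaves like an inner join; outer=True keeps unmatched
    rows from the left table (a left outer join).
    """
    conn = sqlite3.connect(":memory:")
    _load(conn, "lhs", left)
    _load(conn, "rhs", right)
    kind = "NATURAL LEFT OUTER JOIN" if outer else "NATURAL JOIN"
    cur = conn.execute(f"SELECT * FROM lhs {kind} rhs")
    header = [d[0] for d in cur.description]
    rows = [list(r) for r in cur.fetchall()]
    conn.close()
    return [header] + rows
```

The shared column appears only once in the output, and with outer=True an invoice without lines comes back with empty (None) line columns.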

=HAMMER("SALES",DeptSales!A1:F2101,"SELECT * from table2 where dept=':1'","SQL")

… the 1st argument is loaded as Arg(1) (:1 in SQL), the 2nd argument is loaded (being an array) into a table named table2; if they were the other way round, it would be table1 and Arg(2). The 3rd argument is loaded as Arg(3) and the 4th is a command: SQL. SQL looks back at the preceding argument (Arg(3) in this case) and executes its contents as SQLite SQL. If the preceding argument was a table, it would expect to find a list of SQL statements for execution in the first column. The output of the last issued SELECT statement is then returned to Excel as an array.

=HAMMER(DeptTargets!A1:D20,DeptSales!A1:F2101,SalesTargetScript,"PYTHON")

…this is similar to the previous SQL example but this time the command is PYTHON which will execute the Python Script passed in via the command’s preceding argument. The script will most likely return a table to Excel, but it could also, if not the last argument, create a table associated with its position, in this case table4, which could then be accessed by subsequent PYTHON or SQL scripts.

Up to 25 arguments can be passed, the last table produced is returned to Excel (either via a load, which wouldn’t be terribly useful, or more likely as a result of a command such as JOIN, SQL or PYTHON). HAMMER functions can of course be nested and can also issue “internal” HAMMER requests. The “flow” of commands is from left to right, with the preceding args usually setting the stage for subsequent commands. Alongside the all-powerful SQL and PYTHON commands, I’ll most likely add a set of “noSQL” offerings such as JOIN, LOJOIN (left-outer), DISTINCT, REDUCE (a SELECT .. GROUP BY… with PYTHON as the MAP?), UNION, INTERSECT. These will likely also be available through helper functions such as =JOIN(ThisTable,ThatTable).
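The left-to-right flow described above (arrays become tableN, other values become Arg(N), and a command looks back at the argument before it) can be mimicked with a toy dispatcher. The Python sketch below supports only the SQL command with a single statement, and it substitutes :N placeholders by plain text replacement, which is an assumption about how the ':1' in the earlier example is filled in. mini_hammer and load_table are invented names, and this is not how HAMMER itself is written.

```python
import re
import sqlite3

COMMANDS = {"SQL"}  # the real add-in has many more commands


def load_table(conn, name, table):
    """Create a table from a header row plus data rows."""
    header, *rows = table
    cols = ", ".join(f'"{c}"' for c in header)
    conn.execute(f'CREATE TABLE "{name}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', rows)


def mini_hammer(*args):
    """Process arguments left to right and return the last table produced."""
    conn = sqlite3.connect(":memory:")  # built up and torn down on every call
    scalars = {}   # position -> non-table argument, i.e. Arg(N)
    result = None
    try:
        for pos, arg in enumerate(args, start=1):
            if isinstance(arg, list):
                # An array in position N is loaded as tableN.
                load_table(conn, f"table{pos}", arg)
                result = arg
            elif isinstance(arg, str) and arg.upper() in COMMANDS:
                # The SQL command executes the text of the preceding argument.
                sql_pos = pos - 1

                def bind(match):
                    n = int(match.group(1))
                    if n in scalars and n != sql_pos:
                        return str(scalars[n])
                    return match.group(0)

                sql = re.sub(r":(\d+)", bind, scalars[sql_pos])
                cur = conn.execute(sql)
                result = [[d[0] for d in cur.description]]
                result += [list(r) for r in cur.fetchall()]
            else:
                scalars[pos] = arg
        return result
    finally:
        conn.close()
```

For example, mini_hammer("SALES", sales, "SELECT dept, sum(amt) AS total FROM table2 WHERE dept=':1' GROUP BY dept", "SQL") loads sales as table2 and runs the query with :1 replaced by SALES. Text substitution like this is fine for a sketch but is open to SQL injection, so real code would want bound parameters.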

Unlike microETL, there’s no persistence across function calls, i.e. HAMMER will play by Excel’s no-side-effects rule for functions. Each call to a function will build up and tear down its in-memory SQLite environment (including calls to HAMMER from within PYTHON). (UPDATE: Specify the 1st command as “APPDB” to simulate microETL’s persistence across function calls.) Likewise, each call to PYTHON will be a separate engine instance, and each call to a HAMMER function will create its own PYTHON engine. Tables and Python artefacts created by prior “steps” in a single HAMMER call are available to subsequent steps.

HAMMER embeds Python under the guise of IronPython, so a lot of the power and speed of CPython will not be available, but on the other hand, the full power of the .NET CLR will be, not a bad swap.

And a huge advantage: HAMMER will be an asynchronous function (i.e. it will run in its own thread). This will allow multiple long-running transforms to be handled within the same Excel instance, something that is a major shortcoming of microETL. This requires Excel 2007 or 2010, but HAMMER will still work synchronously in Excel 97-2003. Having said that, IronPython requires .NET 4.0, so this add-in is more suited to modern versions of Excel and to OS >= XP SP3.

HAMMER is an array function, yeah I know, normal folk tend to steer clear of Excel arrays. Which is to be expected, they’re not the most intuitive end-user-facing construct that the industry has ever come up with. But, they are the “Excel way” for passing tables in and out of formulas and, once mastered, open up a new world of power to Excel users. I will be adding helper functions to make using arrays a bit easier (an autoSize wrapper function, that’ll resize the selection area if it’s too small or too big, at the cost of a 2nd pass at the enclosed functions, might also port microETL’s SQL “paste” functionality).

I had intended to have an example to download with this post, but have discovered a last-minute bug that needs fixing. So it’ll be most likely next week before I’ve a version to show.

UPDATE:

I’ve managed to sort out the bug, so here’s an example…

http://bit.ly/datasmith

Here’s a list of the HAMMER commands implemented so far …

Use setup.xls to install (or simply open in Excel); see IrelandFOIexample_hammer (2007/2010) for examples (there’s also a copy of the same functionality via microETL in IrelandFOIexample_microETL.xls; note the speed difference). There are Excel 2007/2010 32bit and Excel 2010 64bit versions included, and there’s also an untested 2003 version.

Data Wrangler

A few weeks ago I came across (thanks to @lismiss) Data Wrangler, a very promising data cleansing tool from the Stanford Visualization Group. Not only is Data Wrangler a web-service (which the group intend to open source) but it also allows transformations to be “recorded” in either Python or JavaScript (see here). It was this Python scripting feature that really caught my attention; it would be very useful to be able to hack away at a dataset using the service, then transfer the script to microETL’s PyScript to adjust and integrate with Excel and SQL.

The demo video and test datasets give a good overview of the tool but the proof of the pudding is in trying out some real world dirty data; I chose a fine example of the art of Freedom Of Information datasets, issued by a Republic of Ireland government department. As an example of how not to do something (unless your intention is to make the recipient regret asking for the FOI in the first place) this is excellent. (I suppose we should be grateful it’s in Excel not Word or PDF or even PowerPoint). You can download it here http://bit.ly/Ireland_FOI_example (the data as released is in the FOI sheet).

As I said, Data Wrangler is promising, but needs some more work (to be fair, the group warns it’s a work in progress). The tool choked on the FOI dataset, too many columns I think, so not ready for the real world yet but I’ll be keeping an eye on its progress. Don’t let my experience put you off, it looks more than capable of handling smaller but still quite messy datasets.

If you’ve downloaded my example workbook, you’ll see how I managed to cleanse the data using microETL’s Python & SQL scripting functionality (the PyScript is in the Python sheet, with the SQLScript in the Control sheet). I could have cleansed the data using pure Excel and some VBA, and perhaps I would have if this was a format requiring parsing on a regular basis; I could then save the transformation as a single-file macro-enabled workbook, ideal for sharing, no need for add-ins etc. But it was a once-off, and even if it wasn’t, it’s quite likely the format supplied in answer to a subsequent FOI request would be different. This is the sort of work that microETL’s Python & SQL scripting is designed for: quick and dirty data wrangling, but with the ability to persist and modify the resulting transformations if so required.

If you wish to try out this example, there’s a new version of microETL (Alpha1.08) available for download. You’ll notice a new folder structure (the usual sub-folders are now under a single sub-folder called microETL) to make installation of the add-in somewhat neater; there’s also a setup.xls that’ll do all the hard work of installing (and un-installing) the microETL add-in. Note: you still need to manually install Python 2.7 to enable the PyScript’ing functionality.

If you need help with your Excel, ETL or  data cleansing tasks, I can help.

Attach a SQLite database into Excel’s memory via microETL

In my previous post I described the various methods of accessing SQLite databases from within Excel using microETL. Via comments on the post, Michael Römer suggested a change to how microETL loads into memory an external SQLite database (not only suggested, but also provided the C code changes to enable the change; thanks Michael).

The existing xLiteLoadUnLoad(filename[,unload]) function loads a SQLite file into “main” i.e. the primary database (which is usually a :memory: db) overwriting any existing data. Michael’s suggestion was to allow loading into another in-memory database with a different alias; thus keeping the main database intact but allowing the benefits of in-memory access to the externally attached database. This feature has now been added.

I’ve kept the existing xLiteLoadUnLoad as is, but added a new optional argument to the xLiteAttachDB function:

  • xliteAttachDB(databaseName,alias) becomes xliteAttachDB(databaseName,alias,[loadInMemory=False]). The optional loadInMemory argument defaults to FALSE, in which case the function acts like the old version (i.e. issues a standard SQLite ATTACH statement). But if set to TRUE, the function will first attach a “:memory:” database named as the alias, then load the external database file into that in-memory database. Once this happens the on-disk database is no longer referenced, so any changes will not be reflected back to disk. To enable changes to be persisted to disk, I’ve added another new function…
  • xLiteDBSaveAs(alias,outDatabaseFile) will save a copy of the database named alias (which could be “main” if you wished to back up the default in-memory database) to the file outDatabaseFile. I’ve also added a …
  • xLiteDetachDB(alias) to issue a SQLite DETACH statement. You might ask why not simply use the SQL() function to issue DETACH (or indeed ATTACH) statements? Statements such as ATTACH/DETACH cannot be issued by the SQL() function, as its pre-processor (for table() functionality) wraps SQL statements in a SAVEPOINT (a nested SQL transaction). You can however use the fastSQL() or xliteRawSQL() functions to issue such commands.
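The three functions above can be imitated in plain Python with the standard sqlite3 module, which may help to show the moving parts. This is a sketch only and not microETL’s C implementation: attach_db, db_save_as, detach_db and demo are made-up names, the in-memory load copies table data only (no indexes, views or triggers), and Connection.backup needs Python 3.7 or later.

```python
import os
import sqlite3
import tempfile


def attach_db(conn, filename, alias, load_in_memory=False):
    """Attach a database file, optionally as an in-memory copy of its tables."""
    target = ":memory:" if load_in_memory else filename
    conn.execute(f'ATTACH DATABASE ? AS "{alias}"', (target,))
    if not load_in_memory:
        return
    # Copy each table across, then let go of the file on disk.
    conn.execute('ATTACH DATABASE ? AS "_src"', (filename,))
    names = [r[0] for r in conn.execute(
        "SELECT name FROM _src.sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for name in names:
        conn.execute(f'CREATE TABLE "{alias}"."{name}" AS '
                     f'SELECT * FROM _src."{name}"')
    conn.commit()
    conn.execute('DETACH DATABASE "_src"')


def db_save_as(conn, alias, out_file):
    """Save a copy of the database known as `alias` (or "main") to a file."""
    target = sqlite3.connect(out_file)
    conn.backup(target, name=alias)
    target.close()


def detach_db(conn, alias):
    conn.execute(f'DETACH DATABASE "{alias}"')


def demo():
    with tempfile.TemporaryDirectory() as folder:
        src = os.path.join(folder, "src.db")
        dst = os.path.join(folder, "copy.db")
        disk = sqlite3.connect(src)
        with disk:
            disk.execute("CREATE TABLE t (n)")
            disk.execute("INSERT INTO t VALUES (1)")
        disk.close()

        conn = sqlite3.connect(":memory:")
        attach_db(conn, src, "ext", load_in_memory=True)
        with conn:
            conn.execute("INSERT INTO ext.t VALUES (2)")  # in-memory copy only

        disk = sqlite3.connect(src)
        on_disk = disk.execute("SELECT count(*) FROM t").fetchone()[0]
        disk.close()

        db_save_as(conn, "ext", dst)
        saved = sqlite3.connect(dst)
        in_copy = saved.execute("SELECT count(*) FROM t").fetchone()[0]
        saved.close()

        detach_db(conn, "ext")
        conn.close()
        return on_disk, in_copy
```

demo() shows the point made above: the row added after the in-memory load never reaches the original file, but it is present in the copy written by db_save_as. ATTACH and DETACH are issued outside any open transaction here, which matches the restriction described for the SQL() function.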

There’s another (this time breaking) change, to the SQLScript TIMER command (see here …). The existing function used an ActiveX control (the Internet Explorer control, as a provider of JavaScript timer functionality); ActiveX controls do not work under 64bit Excel, so I’ve reverted to using Application.OnTime as my timer mechanism. The breaking change is in the 3rd argument, which previously expected the number of thousandths of a second to wait; it now represents whole seconds.

To download the latest version see the http://www.gobansaor.com/microetl page.

Update:

For another method of loading SQLite databases within Excel/VBA, see my new .NET-centric micro ETL tool: http://blog.gobansaor.com/category/hammer/