Category Archives: VBA

Attach a SQLite database into Excel’s memory via microETL

In my previous post I described the various methods of accessing SQLite databases from within Excel using microETL. Via comments on the post, Michael Römer suggested a change to how microETL loads into memory an external SQLite database (not only suggested, but also provided the C code changes to enable the change; thanks Michael).

The existing xLiteLoadUnLoad(filename[,unload]) function loads a SQLite file into “main” i.e. the primary database (which is usually a :memory: db) overwriting any existing data. Michael’s suggestion was to allow loading into another in-memory database with a different alias; thus keeping the main database intact but allowing the benefits of in-memory access to the externally attached database. This feature has now been added.

I’ve kept the existing xLiteLoadUnLoad as is, but added a new optional argument to the xLiteAttachDB function so.

  •  xliteAttachDB(databaseName,alias)  becomes xliteAttachDB(databaseName,alias,[loadInMemory=False]). The optional loadInMemory argument defaults to FALSE, so acts like the old version (i.e. issues a standard SQLite Attach statement). But if set to TRUE; the function will first Attach a “:memory: database” named as the alias, then will load the external database file into that in-memory database. Once this happens the on-disk database is not referenced, so any changes will not be reflected back to disk. To enable changes to be persisted to disk, I’ve added another new function…
  • xLiteDBSaveAs(alias,outDatabaseFile) will save a copy of the database named alias (with could be “main” if you wished to backup the default in-memory database) to the file outDatabaseFile. I’ve also added a …
  • xLiteDetachDB(alias) to issue a SQLite DETACH statement. You might ask why not simply use the SQL() function to issue DETACH (or indeed ATTACH) statements? Statements such as ATTACH/DETACH cannot be issued by the SQL() functions as its pre-processor (for table() functionality) wraps SQL statements in a SAVEPOINT (nested SQL transaction). You can however use the fastSQL() or xliteRawSQL() functions to issue such commands.

There’s another (this time breaking) change to the SQLScript TIMER  (see here …) command. The existing function used an ActiveX control (the Internet Explorer control as a provider for JavaScript timer functionality); ActiveX controls do not work under 64bit Excel, so I’ve reverted back to using Application.OnTime as my timer mechanism. The breaking change is the 3rd argument, which previously expected a value indicating the number of thousands-of-a-second to wait, now it represents whole seconds.

To download the latest version see the http://www.gobansaor.com/microetl page.

Update:

For another method of loading SQLite databases within Excel/VBA see my new .NET-centric micro ETL tool  http://blog.gobansaor.com/category/hammer/

Excel Document-Oriented Database with Python Map & SQL Reduce

In a previous post Excel as a Document-Oriented noSQL database I discussed why a document-oriented approach to system design can be very useful due to the ease of setup and subsequent pain-free evolution of data models that comes with a schema-less datastore. Two new features I’ve added to microETL make using Excel as the basis for a document-orientated system much easier.

The two features are:

  • JSON support, load and manipulate JSON documents; very useful for communicating with webs services that increasingly are abandoning XML for JSON as their preferred inter-change format; but also very useful for serialising (in a simple textual format) all sorts of documents especially Excel data-loads.
  • SQL Virtual Tables, for example virtual table vtCSV; works directly with a CSV file, including writing back any changes to the file. Another set are vtVBA, vtScript and vtPY; allowing for bespoke VTs to be created via VBA, SQL Script or Python. It is these three VTs that are proving useful for crafting document-oriented systems.

Having the ability to use JSON as document serialisation format is the first step (prior to this I would have had to use XML or CSV) allowing me to store a document’s data without recourse to a schema. But how to view and analyse the documents? In document-oriented databases such as CouchDB, documents are likewise stored as JSON objects, while to fetch and analyse, these tools tend to use a variation on MapReduce; where a Map function gathers the subset of documents and whatever data items required, then either passes that back as a “view” or passes on the set to a Reduce function to do things such as SUM or AVERAGE etc.

MicroETL uses a variation on this MapReduce approach. Using the virtual table facility, Map functions are written in VBA, SQLScript or Python. Such Map functions have a standard signature, where by the 1st argument is the name of a SQLite table that the function must create and fill with the resulting “view”. That table will then be available to SQL to act as the Reduce method (sum(),avg() etc.) or simply as a conduit for passing back the table to Excel.

An Example

Say you use Excel to record your employees’ time-sheets. Each week you send each employee a partially filled out (StaffNo and WeekNo) time-sheet workbook. (You used to this manually but now a microETL-powered workbook does this automatically).

The employees fill out their sheets, and mail them back to your payroll clerk. In the past the clerk manually opened each mail, opened the attachment, transcribed the hours and overtime worked into another workbook, manipulated it and exported as a CVS to make it suitable for batch-load into the payroll system. Painful. But again, you’ve now automated this using microETL.

You’re thinking of replacing your existing, cheap and cheerful, payroll system with a new super-super one that handles direct remote time-sheet entry and sophisticated build-up-to-gross calculations. But here’s the rub, you’re current build-up-to-gross process is simple (standard hours one rate, overtime 30% extra), but change is afoot, you’re hoping to buy-out some of the current overtime spend and convert  it to standard hours, plus a whole raft of other changes. But before that, you need hard figures on what’s driving overtime and even then, you’ll have to introduce the necessary changes piecemeal, department by department. When you explained this to the Payroll System’s sales rep, he was very happy, happy to quote for the extra expense of such a plan. You’ve decided to stick with spreadsheets at least until you’ve decided on, and implemented, the new overtime regime.

Here’s the current timesheet:

The “Reason for Overtime” is free-format text, so the first thing to do is to codify the reasons and change the standard time-sheet template to use a drop-down menu. To help identify the reasons and to analyse the existing 3 years worth of time-sheets you hold, you use microETL to serialise the data into a document-oriented SQLite-based table. Using something like this:

The SQLScript is called for each workbook and produces a JSON object like so …

…which is stored in a SQLite table like this …

CREATE TABLE timedocs(id text,key text, date datetime,doc text);

…where doc holds the JSON object, id is a MD5 hash of the object, and key is the business key. To help quickly search the table a FTS-3/4 SQLite full-text-search index is created on the doc column (remember JSON is just text!). This will allow the free-formatted reason attributes to be quickly and easily analysed.

A number of vtPY “views” are then applied. One such, produces a table in the format required by the current payroll system – StaffNo,WeekNo,Standard Hours, Overtime Hours. You use this every week to load the weekly payroll run, but will use it for this exercise to validate you’ve fetched the full 3 years history by comparing against payroll’s history.

Lo and behold, they don’t agree, due to a lot of manually mis-typed hours resulting in over-payments (under-payments were always caught by the affected employee :) ), but at least there’s none since you’ve automated the process.

Here’s the Python Map function:

An here’s SQL doing a Reduce to get totals by WeekNo.

SELECT WeekNo,sum(Standard),sum(Overtime) from table(vtPY,Python!PayrollList) group by WeekNo

The real beauty of this approach to Really Simple Systems, is that the core of the process remains under the control, and within the capability, of business users. Even if the writing of Map functions (in Python or VBA) is outside their comfort-zone (as would be the case for most) and even if “SQL Reduce” skills are scarce on the ground, the design and deployment of the source documents should be within the competence of Excel-savvy end-users. So at least data can be gathered immediately and handled manually while waiting for a power-user or IT resource to add the automation layer.

Such systems have resilience built-in. If, for example, you were using a cloud-hosted  web site to allow employees to download and upload time-sheets and that service fails, you could revert back to using email (or use Dropbox!).

Same for any automation elements that break over time, reverting back to “manual” is doable, whether or not to invest in re-automating being a simple cost-benefit analysis comparing on-going manual effort to automation costs.

Likewise, you could use an externally hosted website as your “gateway” but construct the upload and download links to use behind-the-firewall URLs, so taking advantage of the cost-benefits and simplicity of the cloud while keeping hard-data securely within your firewall.

These sort of benefits don’t just apply to data-collection systems but to reporting and data-provisioned systems such as micro-ETL and micro-BI initiatives.  And of course, the sweet-spot, is operational-BI, mixing BI insights with operational actions; the simplicity of Excel forms and reports, provisioned by the Excel’s new native BI engine, PowerPivot!

Need help utilising Excel to its full potential?  Contact Tom

.

microETL – the ultimate datasmithing tool?

Over the years I’ve accumulated and written thousands of lines of VBA code, I’ve also integrated Excel with countless other technologies. Most of this work was associated with the movement and managing of datasets; data for reporting on, for analysing and for collection. MicroETL is, in effect, a collection of the best (or at least the most useful) of these activities, packaged primarily so that I don’t have to continuously re-invent the wheel, so that when I’m working on a client’s problem, I can concentrate on the problem and not the technology. So although I share the code with all and sundry with the hope that others may find it useful (little choice as it’s VBA ;) ), its primary purpose is to the serve my needs, and indirectly, those of my clients.

VBA is of course at the heart of microETL. I’m often asked why not port the code to one of Microsoft’s more “modern” languages? There are three  main reasons for not doing so now (and one major one for not doing so in the past).

First the past. When MS introduced .NET the needs of those who program mainly around the Office platform were ignored, .NET add-ins for Excel were a pain to write, even more painful to deploy and even then, were slow compared to VBA add-ins. Things have improved, it’s now much easier to target Office from within Visual Studio (although still just as expensive). But even now, the best method of using .NET with Excel is a  superb 3rd-party open source tool, ExcelDNA. This is the tool I’ve used (and will continue to use) whenever I needed to merge Excel with the .NET platform.

I’m currently developing a .NET offshoot of microETL called HAMMER; using ExcelDNA, C#-SQLite and IronPython; not so much a replacement for microETL but a new type of tool to take advantage of the new capabilities offered by Excel 2007/2010 (multi-threading and 64bit memory addressing mainly) and .NET4 (IronPython and ease of multi-threading programming). It’s also, thanks to ExceDNAPack, a single file deployment. To keep an eye on its development follow the HAMMER tag on my blog.

So if ExcelDNA provides a pain-free (and cost-free) method of using C# or VB.NET with Excel, why stick with VBA? As I said, three reasons:

  • VBA is to Excel what JavaScript is to the web browser. It’s only remaining purpose as a language (with the death of VB6) is as an Office automation language and ,in effect, when you’re programming in Excel you’re largely working with the Excel Object model (in much the same way and the DOM is the focus of attention on JavaScript progamming). VBA is perfectly suited to this task and although it has its problems as a language (again like JavaScript) once you know them they’re not a problem. VBA is here for the long haul, Office 2010′s addition of the 64bit support to the language (Vb7 !) confirmed that. As the code is already in VBA, works very well as VBA and has a long future ahead of it, why change?
  • VBA is easy for a large subset of end-users to learn due to Excel’s macro-recording facility. The resulting code may not be pretty but it works both as a automation tool and as a learning tool. Millions of citizen programmers owe their skills to the VBA macro recorder. One of microETL’s goal is to make adding functionality as easy as possible and one such method of doing that is via simple VBA functions that can be called from SQLScript (microETL’s tabular scripting extension). C# or VB.NET are professional languages designed for professionals to use, VBA comes from a long line of “pro-am” scripting languages which open the world of programming to a much larger population.
  • Multiple deployment options. MicroETL can be installed like a normal add-in and consists of three files, the VBA add-in, the SQLite3 dll and the xLiteSQLite dll to wrap SQLite to make it VBA accessible. Python functionality and other options will add further files. But by embedding a small call-on-open macro in a workbook the add-in can be called without installation. This is how I use it for once-off tasks, as I can package the code and the associated data and workbooks in a single zip file for archiving purposes or for deployment (often running off a USB stick).  Because I can deploy my code as code-at-a-moment-in-time I can modify the codebase to suit the current requirements without affecting other microETL projects that are using, or might in the future use, a different version fo the code. In fact, microETL has a noSQL compile option that removes SQLIte and other DLL dependencies (with a reduction in power obviously) enabling the code to be embedded in a workbook allowing for a single-file deployment (which often, is the only option). This could not be done with a .NET add-in.

Okay, so VBA is useful, but why embed SQLite and Python? As I’ve said above, microETL exists to make my life easier. I’m a database programmer, I’ve been using SQL (or its predecessors ) for  three decades. I like set-based logic (such as SQL  or PowerPivot’s DAX) because:

  • I’m good at it (practice does make perfect) enabling me to solve many data related problems efficiently and more accurately (fast is no good if the result is wrong) that would be the case with functional (Excel fromulas) or procedural (code) approaches.
  • I believe anybody who spends a significant proportion of their working lives managing datasets should learn basic SQL. If they don’t it’s like driving a car but never getting beyond 2nd gear.

SQLite allows me to bring the beautiful world of set logic to user-centric Excel world of functional programming (yes, all you Excel jocks, you’re functional programmers). It also brings some nice side-effects:

  • It is, like Excel itself, an outstanding single-file document-oriented datastore.
  • By default, microETL uses SQLite in-memory database functionality (you can also attach disk-based ones) so it offers a means of exploiting the vast amount of cheap RAM that modern PCs now offer. Excel has always stored its working datasets in-memory and it too now offers much greater capacity (1,000,000 rows per sheet in Excel 2007/2010). SQLite has no practical limit other than the available memory and the addressable-limit of the OS (2GB for 32bit Windows). With 64bit Windows (microETL is Excel 64bit enabled) that addressable limit is now effectively gone. This makes microETL a superb platform for all sorts of ETL tasks.
  • As SQLite offers Excel a secondary in-memory datastore it can be used to share models across programming platform boundaries. This allows me to embed Python in Excel and use the in-memory SQLite database as a shared data conduit. I’ve done the same with JavaScript (but only on 32bit) and with .NET. The result is the non-native languages can be used as if they were not operating with Excel, no need to mangle in Excel interface code, simple!

Which brings me to Python. Why embed a Python interpreter into microETL? Can I not just use VBA as I quite obviously know it inside out? Well of course for all things Excel-focused that is what I do, and for most other transformation or calculations that require some programming I would also opt for VBA. No reason not to, it works and works quickly. However, Python (and in particular C-Python) offers access to world of superb code. If something can be automated, or interfaced to, or turned into a computing algorithm, somebody (usually very bright) will have done it in Python. Python also works well with C, and again if a C library is useful somebody will have wrapped it in Python. Python means that if I’m asked “Can you integrate Excel with ……?”, I can say, “Yeah, no problem”.

Python as a language has the other advantage of being a superb tool to work through a problem with; it doesn’t get in the way; very little of my mind has to be diverted to the language itself, the problem at hand gets the attention (C would be an example of a language on the other extreme, you better have the solution to your domain problem figured out before you start to program in C as the programming task itself will command most of your attention).

Python is an ideal language for business logic. It is easy to learn, eminently readable and generally immune to programming gotchas. Because of this, it’s very popular as a citizen programmers’ language, especially amongst engineers, scientists and quants, so it blends in naturally with microETL the ultimate datasmithing tool!

So if microETL is built on a foundation of “civilian friendly” technologies such as Excel formulas, VBA, SQL and Python is it a tool that anybody could use? In theory yes, in practice it depends.

Only a small proportion of the population has the ability to master procedural programming and an even smaller precentage has the interest in so doing. (A much large proportion of the population shows the ability and the interest to tackle Excel’s native programming method: functional no-side-effects programming using Excel formulas, which accounts for the appeal of spreadsheets to the general populace!).

So although I’ve abstracted away a large level of the complexity of automating Excel by means of the SQLScript and SQL functions, it’s still programming.  But even if the full power of the tool is beyond most, the tabular sequential nature of  SQLScript and the relative approachability of SQL should mean that many can at least follow the logic and data flows and might even be able to modify existing scripts. Those users with a good knowledge of VBA and or SQL should find the tool enables them to be more productive and cost effective. While those with Excel formula skills can usually work in tandem with a framework of microETL functionality with relative ease.

For the latest versions and articles on using microETL follow the microETL tag on this blog …

Really Simple Systems

Organisations large and small depend on packaged software solutions for their core processes, and this is how it should be, the idea of writing your own ERP system or word processor makes no economic sense even if your business model is an extreme outlier. Differentiation does exist, the ERP system that a mining company would need would be significantly different from a chain of retail stores but most of the major (and minor) niches have packaged offerings to suit.

(Unless you’re a certain SME manufacturing company in the South-West of England that couldn’t find a suitable and affordable ERP package for its type of business; well then, developing an open-source one to help you and your fellow SMEs is, it seems, a viable option, resulting in uzERP!)

Many organisations will commission changes to these packages to better fit their way of doing business . In fact, such modifications are a major source of revenue for package providers (and as a result a major cost to those customers who will not “bend” to the packages’ default process flows).

The advice for those who either don’t have the money to commission changes (or who wish to avoid such costs) is to fit in with the package’s way of doing things (which as most will be “industry best practices” can often be a good thing) or use “off-line” processes to handle the exceptions. In some such cases a bespoke professionally delivered software solution might be the result; but for many the more usual “solution” to to those out-of-scope processes is “Simple!, you could easily do that in Excel“. Yet another recruit for the Excel -driven world of Shadow IT.

Whether such “systems” are the casualties of scope warfare or the preferred means of getting things done by Excel-savvy end-users they are part of every organisation’s IT landscape. They’re not going away, yet they receive scant attention from professional IT who often regard them as “somebody else’s problem”.

The end result is sub-optimal; although the business logic encoded in these spreadsheets is often of the highest quality and extremely fit-for-purpose (as it’s the work of those with the greatest insight of the problems at hand) the processes of moving data in and out (and when inside, efficently processing such data) is usually highly time-consuming, manual and error prone. This is where I come in.

I can wrap such “Simple Systems” is a professional “envelope” that will automate the flow of data in, out and within the process while keeping the likely-to-change business logic in Excel. Even when the Excel formula-based business logic needs to be converted to “code” (perhaps for performance or automation reasons) most likely that code will be in VBA (a language familiar to many citizen programmers) which can also be made configurable via a simple Excel “table” rather than some obscure XML dialect.

To get started, send me your existing workbooks (with or without data) with a short note describing the process they support and what you would like to automate or change. I’ll likely respond with some further questions re the process and perhaps made some further suggestions for automation. When we’ve agreed a “deliverable” I’ll get back to you with a fixed price quote.

The idea is to turn “Simple Systems” into “Really Simple System” not works of software art. The main interface will continue to be Excel enhanced by Excel-based tools (mainly my microETL add-in and perhaps add-ins such as PowerPivot for added reporting goodness). Separating data-flow from business logic is usually the major change. It’s also important to be clear what parts of the process need to remain “user-configurable” and which parts are to be “professionally” automated.

The user-configurable parts will need to continue to be Excel-formula based or as code that’s at a level the user is happy to engage with, or as code controlled via configuration tables. The professional parts will be developed by a professional (me!) with the expectation that a professional (i.e somebody proficient in Excel, VBA and SQL) would be required to modify the (always supplied) source code.

Highly configurable solutions are sometimes more expensive than “black boxed” ones, with the most user-configurable option – leave as is – the cheapest (until you take into account the payroll costs of those staff repeating the same easily-automated task over and over again!).

Contact Tom …

S3 as an Excel hub

In my Steam Powered PowerPivot post I explained the concept of powered-spoke based Excel systems. One of the hub options was Amazon’s Simple Storage Service, S3. I’ve been using S3 both as a simple backup service and as a “systems hub” since the service began in 2006. Back then, it was not necessarily the easiest to use or the most suitable, but it was cheap and reliable. The code I used to access S3 from Excel can be found here http://blog.gobansaor.com/2008/02/22/xlaws-excel-vba-code-for-accessing-amazons-s3-and-simpledb/

Over the years, lots of new features have been added to S3 that have made the service more useful and easier. The final big missing, for me at least, was the lack of multiple  authorisation credentials  for a single account i.e. the ability to set-up multiple users with their own access policies. Last year, with the launch of IAM (Identity & Access Management) Service, that last major problem was removed.

Although currently IAM is accessible only by the API that’s not a problem as my favourite tool for managing S3 (and AWS in general) is Python’s boto package, and it has that, and everything else you might wish to do S3 -wise, covered.

Although Boto does most of my heavy lifting (particularly on the server-side), I still have need for a VBA based S3 utility. Boto is fine for situations where I’m using microETL with Python enabled, but in many situations this is not ideal or even possible, particularly on the client-side of systems. Here, even a single file add-in may not be possible which often means a macro-enabled workbook is the only option.

MicroETL has a noSQL compile option which enables a lot of the code to operate without the SQLite libraries, thus allowing it to be used as a single-file VBA add-in or embedded in a workbook.

Up until now my S3 code depended on the xliteSha1Hmac.dll to provide SHA1-HMAC hashing (necessary to authorise access to S3) meaning that only pre-signed S3 URLS could be used in single-file deliverables. Even if this was not the case, prior to IAM, pre-signed URLs (usually with time limits) were often the only option as sharing an S3 account’s single authorisation credentials with all and sundry was not ideal from a security point of view. But post IAM, being able to set-up sub-accounts with specific access policies means the full power of S3 can now be used at client-level.

I’m in the process of adding a pure VBA HMAC-SHA1 facility (32 bit only at the moment, 64 bit continues to require the xliteSha1Hmac.dll). I’m also adding a JSON encode/decode module to allow tables to be more easily shared with non-Excel programs, currently I default to using tables encoded using ADO’s export XML schema to share via S3, but the JSON option will make it easier to share with a pure Python server (such as a PiClound based service, see my Expand Excel’s Horizons – look to the cloud post).

Cloud-shared tables can not only pass datasets back and forth but can pass SQLScripts (which are, after all, simply tables) for processing elsewhere.

For example, client workbooks not having Python  (or SQLite ) functionality installed could write a script, post it to S3; that script table could then be picked up by a “server” microETL workbook (or by a pure Python server), executed, and the results posted back to S3 to be picked up by the client (possibly waiting a response using a TIMER).

Server processes could also do the opposite; passing down SQLScripts for client-side execution.

When people think of utilising the cloud, this sort of scenario is not what comes to mind, thinking that Excel is a desktop product so it cannot take advantage of the enormous cost-advantages of cloud-based solutions. Not so, it’s now very easy to integrate the likes of S3 into existing processes, adding that cloud magic but keeping those existing works-so-don’t-fix-it processes firmly on the ground.

If you need help figuring out how to take advantage of S3 (or other AWS services) and how to integrate them with your existing technology base do contact me. I’ve been working with this technology for over 5 years (and have 30 odd years of experience with good-old reliable “ground-based systems”,

What’s with the bucket picture? S3 calls its primary data storage unit a bucket.

UPDATE:

IAM is now available from the AWS Management Console http://aws.typepad.com/aws/2011/05/identity-and-access-management-console-support.html