Category Archives: Ireland

Data Wrangler

A few weeks ago I came across (thanks to @lismissData Wrangler; a very promising data cleansing tool from the Stanford Visualization Group. Not only is Data Wrangler a web-service (which the group intend to open source) but it also allows transformations to be “recorded” in either Python or JavaScript (see here). It was this Python scripting feature that really caught my attention; would be very useful to be able to hack away at a dataset using the service, then transfer the script to microETL’s PyScript to adjust and integrate with Excel and SQL.

The demo video and test datasets give a good overview of the tool but the proof of the pudding is in trying out some real world dirty data; I chose a fine example of the art of Freedom Of Information datasets, issued by a Republic of Ireland government department. As an example of how not to do something (unless your intention is to make the recipient regret asking for the FOI in the first place) this is excellent. (I suppose we should be grateful it’s in Excel not Word or PDF or even PowerPoint). You can download it here http://bit.ly/Ireland_FOI_example (the data as released is in the FOI sheet).

As I said, Data Wrangler is promising, but needs some more work (to be fair, the group warns it’s a work in progress). The tool choked on the FOI dataset, too many columns I think, so not ready for the real world yet but I’ll be keeping an eye on its progress. Don’t let my experience put you off, it looks more than capable of handling smaller but still quite messy datasets.

If you’ve downloaded my example workbook, you’ll see how I managed to cleanse the data using microETL’s Python & SQL scripting functionality (the PyScript is in the Python sheet, with the SQLScript in the Control sheet). I could have cleansed the data using pure Excel and some VBA  and perhaps I would have if this was a format requiring parsing on a regular basis; I could then save the transformation as a single file macro-enabled workbook, ideal for sharing, no need for add-ins etc. But it was a once-off, and even if it wasn’t, it’s quite likely the format supplied in answer to a subsequent FOI request would be different. This is the sort of work that microETL’s Python & SQL scripting is designed for; quick and dirty data wrangling, but with the ability ro persist, and modify the resulting transformations if so required.

If you wish to try out this example, there’s a new version of microETL (Alpha1.08) available for download. You’ll notice a new folder structure (the usual sub-folders are now under a single sub-folder call microETL) to make installation of the add-in somewhat neater; and there’s also a setup.xls that’ll do all the hard work of installing (and un-installing) the microETL add-in. Note: you still need to manually install Python 2.7 to enable the PyScript’ing functionality.

If you need help with your Excel, ETL or  data cleansing tasks, I can help.

PowerPivot Powered Budgets & Targets

Last week I used PowerPivot for the first time in a classic budgeting scenario. The existing, Excel based, system was straining at the seams due to recent changes in the organisation (merger, followed by lots of changes, resulting in the replacement of long-established reporting and budgeting hierarchies).

The budget process followed the, more or less, classic lines of using adjusted previous years figures to create high-level targets for coming years, agreeing those targets with various interests and then “driving” these high-level targets down various hierarchies to inform front-line staff what would be expected of them in the years to come.

PowerPivot did a marvellous job of providing the necessary figures required to set the initial budgets and to help inform those who must agree them. This involved lots of moving-annual-totals, percentage-increases and so on. A dream compared to the purely excel methods employed in the past.

The “driving down” logic was however kept outside PowerPivot, as the hierarchies involved were unbalanced in the main and the rules were complex but already existed and worked (and probably more importantly were understood and agreed by the various “interested parties”) in Excel. All that was required was the creation of the numerous “flattened cross-join tables” to support the existing logic. The various SQLite “hierarchy helpers” I detailed in my previous Handling Flat, Parent-Child and Nested Set Hierarchies post did most of the heavy lifting. As the process involved the “cross-join” of  hierarchies at various levels I used a great deal of “nested set” SQL to achieve the required result; simple enough, but did become tedious and made the resulting logic somewhat un-approachable for those with limited SQL. What I needed was another “helper function”. The CROSS_JOIN_HIER function was born!

FUNCTION: CROSS_JOIN_HIER

First argument is the hierarchy table name (see previous post for description of this table). The 2nd is the output table name. This table must already exist and have at least the same number of columns as the number of subsequent arguments.

The remaining arguments specify the source of the tables (single column lists) to cross-join in order to populate the output table. The arguments can be of three types.

  • The name of an existing table. This table must consist of single column named “Name”. This allows for complex (or perhaps, simple lists in a different sort order) to be generated outside the function call.
  • A request for a list of nodes from some level within a hierarchy. Such requests can consist of one of the following:
    • Integer between 0 and 99. If the column name associated with this argument (e.g. arg 1 implies column 1 of output table, arg 2 column 2 etc…) is the same name as a valid hierarchy, fetch all nodes at that level (e.g. if 0, then fetch top level nodes, if 1, 2nd level etc..). Allows for up to 99 levels (i.e max value  98). Level 99 is special, it’ll return all “leaf nodes” i.e. those nodes with no children.
    • aNode,(+ or -)integerValue e.g. “Beer,+2″, in this case fetch all nodes 2 levels below Beer in the hierarchy corresponding with column name. The value before the comma must be a valid node name (assumes names are unique within hierarchies). A value after the comma of “-1″ will fetch parent node, a value of “-2″ will fetch grandparent. In both “+” and “-” if the level to navigate down/up is greater than the levels available, the last available level is returned.
    • In both cases above, the name of the column can be over-ridden by prepending the hierarchy name such that: “Product,Beer,+1″ will drive down 1 level from “Beer” in the Product hierarchy. And, “Product,99″ will return a leaf nodes of the Product hierarchy.
  • A comma separated list to create a “manual” level. For example, “Budget,Actual”, “2010,2011,2012″ or “Beer”

The resulting single column tables are then cross-joined with the resulting multi-column table (a column for each “source” argument) which is used to populate the output table.

This is a “stored procedure” like function, so should be called using “Select function(arg1,arg2 …);” syntax.

Example: click for larger version.

Download microETL from http://www.gobansaor.com/microetl and unpack to a folder, locate alberto_hier workbook (2007/2010 format) and go to the makeBudgetTable sheet.

The microETL project  is password protected; if you need access to the code just email me I’ll send you on the password.

PowerPivot & The Parable of the Snow Sock

For those of us in North West Europe this winter has been one of the coldest on record; not only cold but early, in this part of the world winter usually only kicks in after Christmas. The beauty of a perfect White Christmas Day did make up for the previous month of extreme cold and inconvenience but only just; and would not have at all if I had not invested in a simple piece of relatively new technology; a pair of snow socks.

For it was this pair of snow socks that enabled our household to continue to get to work and school, to Christmas shop, visit a sick relative and drive on Christmas Day to the winter wonderland that was a frozen Blessington Lake.

As the cold spell continued the main roads (and eventually the motorways) became hazardous to travel on; minor roads and housing estates became close to impassable for many normal cars. As the ability of the county council to keep roads functioning decreased (due to an early winter catching it on the hop with insufficient salt stocks and no Plan B) so my family and our neighbours realised than self-help was our best bet. People started investing in snow shovels, sand and salt, and hiring JCBs; gritting hills, clearing school and community hall carparks, helping dig out neighbours’ and strangers’ cars. And I invested in a pair of Weissenfels WeissSock Snow Socks (from http://www.micksgarage.ie/tyre-snow-socks.aspx).

Snow chains or even winter tyres are not seen as necessary by most drivers in Ireland or the UK; our winters (including this and last year’s nearly as cold one) are not that severe. Most are unaware of the alternative, the snow sock, a relatively new concept; in effect, a fabric snow chain. A fraction of the cost of winter tyres; easier to use (and nearly as effective) as snow chains.

Having a pair meant we had our own self-service road treating device, insuring we could safely get to the local town or nearest motorway and  when the M7 motorway was backed-up being able to use the old un-treated N7 road to bypass it.

PowerPivot is to Excel what a Snow Sock is to a car in a snowy winter. The ideas and a lot of the technology used in both products have been around for some time, but the accessibility, low-cost and “good enough” packaging are game changers.

Like our neighbourhood waiting on an over-stretched, under-funded county council to solve our “last mile” problems, countless small companies, departments and individuals in large organisations are likewise awaiting their IT Godots often in vain and if delivered, often too little and/or too late.

Two technologies that I’ve been introduced to in the last 12 months:

both liberating, both having the ability to make aspects of my life much better and much easier. I expect to see a lot more people adopt both in the year to come.

So if you’re stuck for some New Years’s resolutions this year:

  • Buy some snow socks now; i.e. do NOT wait until the next “snow event”!
  • Learn PowerPivot.

Happy New Year ;)

 

UPDATE: Looks like Marco & Alberto are doing one of their PowerPivot courses in Dublin this March, so no excuses for not learning PowerPivot http://sqlblog.com/blogs/marco_russo/archive/2011/02/15/new-powerpivot-workshop-dates-copenhagen-dublin-and-zurich.aspx

Cloudy skies, cloudy apps…

Just back from a break in Clifden, Connemara, summer is nearly over, the kids return to school today, back to work.

Aasleagh Falls, Co. Mayo

Aasleagh Falls, Co. Mayo

Counties Galway and Mayo were like the rest of the country last week, a tad wet, but unlike the developed east of the island, flooding was not a problem; a problematic drainage area is called a lake in the west.

This August has been the wettest and dullest I’ve ever experienced but at least I saw some sunshine earlier in the month thanks to Kristian Raue CEO of Jedox who kindly invited me to visit the company’s offices in Freiburg, Germany.  Freiburg is very green in both senses of the word, surrounded as it is by the Black Forest and its well deserved “eco-city” status.  Its also know as the warmest city in Germany, a reputation it thankfully lived up for this visitor from a rain-soaked Atlantic isle.

August morning, Frieburg Im Breisgau

August morning, Freiburg im Breisgau

If Freiburg left a positive impression on my mind, so too did Jedox.  The overall impression is of a company which intends to use a combination of quality, vision and the judicious use of open-source to build the Jedox brand into one associated with best-of-breed products and consultancy.  This vision can be seen in the evolution of Palo, from its “good enough” beginnings to its current near-best-of-breed 2.5 version, and from talking to some of those working on the product, best-of-breed status is not that far off.

Likewise, ETL-Server which is currently a Palo only “loader”, is to be further  developed into a true ETL tool, while continuing to offer MOLAP-centric specialisms.

I also got a glimpse of the next version of Worksheet Server. “Wow!”, is all I can say.

Existing web based spreadsheet products are fine for simple data analysis or basic data capture purposes but cannot compete with their client-based elder cousins when serious datasmithing is required.  Well, from the demo I saw of Worksheet Server in action, that’s about to change.  The look and, more importantly, the feel is similar to that of traditional spreadsheets, its interface with Palo is identical to that of the existing Excel add-in, and here’s the big one, its open source!  Game-changing or what?

But …

That might enable me to move a lot of my spreadsheet applications to the cloud, but what about those applications that are more suited to an MS Access type solution?

Then try out WaveMaker. It’s open source and built on industry standards, Hibernate,Spring and the Javascript Dojo framework but has the ease of GUI database development more usually associated with MS tools. The resulting applications are packaged as a WAR file which can be hosted by any standards based Java server (e.g. Tomcat or Jetty).  The latest version makes developing Ajax-fronted database applications even easier with the addition of layout templates.  Its existing ability to automatically bind interfaces to SOAP web services has been extended to REST web services by means of a new WSDL auto-discover tool.  And Chris Keene CEO of WaveMaker also informs me that …

We are also releasing a cloud-based IDE in October with Amazon – stay tuned…

We launched in February and will be announcing our first 7 figure deal this month. We run on Mac, Linux and Windows and are currently the #1 developer download on Apple.com (http://www.apple.com/downloads/macosx/development_tools/)

Our goal is to make it easy to build rich internet applications without complex coding – kind of a MS Access for the Web.

Jedox and Wavemaker the new breed of open-source businesses

Nollaig Shona Daoibh

 

It’s December 21st, the shortest day of the year; either the middle of winter or the start, depending on your view (here in Ireland,winter really only kicks in from mid-December onwards). The lucky annual lottery winners had a fantastic clear frosty morning to witness the solstice dawn in Newgrange, but the view I experienced from the hill fort in Glending, Co. Wicklow this morning was just as rewarding.

Merry Christmas to you all.

Update: My sister (the teacher) informs me that the modern spelling for Nollaig Shona Dhaoibh (i.e. Merry Christmas to you (ye)) is Nollaig Shona Daoibh, well you learn something new every day!