Category Archives: news

Clouds no longer pass by Windows.

Amazon today announced that later this year, Windows Server would be available on EC2. No details on cost and licensing etc. but this is major.  Up until now, that portion of the business world who are pure MS shops (a very large percentage especially amongst SMEs) were excluded from taking advantage of Amazon’s amazing (and getting more amazing every day) EC2 platform.

From my point of view, as with Oracle’s announcement last week, this releases yet more of my “legacy” skillset for deployment in the clouds. Although I’ve been involved with  *nix servers for 20 years or so, as corporate servers became more locked-down (and removed to the control of 3rd party data centres) I lost day-to-day experience of using them; in latter years my main ‘hands-on’ platform was Windows, either my own PC or local departmental NT servers. Windows on EC2 will allow me to use a whole new set of Windows only software (e.g. RSSBus or XLsgen) and of course SQLServer.

The lack of SQLServer on EC2 has been a major problem for me as a datasmith; there’s an awful lot of data out there sitting in SQLServer databases, but currently if I need to “cloud burst” such datasets I would have to first extract the data to, say, csv files and then load the data on to a Linux compatible database. But with a SQLServer instance running in the cloud, I could simply use SQLServer’s native backup/replication tools.  No more need to download data to my “ground-based” PCs resulting in quicker turnaround and fewer data security risks.
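The extract-to-csv-then-reload round trip described above can be sketched roughly as follows. This is only an illustration: the `orders` table, its columns and the sample rows are invented, and SQLite (via Python's built-in sqlite3 module) stands in for whatever Linux compatible database would really be the target.

```python
import csv
import io
import sqlite3

# Hypothetical csv extract, standing in for a file dumped from SQLServer.
CSV_EXTRACT = """order_id,customer,amount
1,Acme,120.50
2,Globex,75.00
3,Acme,310.25
"""

def load_csv(conn, csv_text, table):
    """Load csv text into a new table; returns the number of rows loaded."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join('"%s"' % c for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "%s" (%s)' % (table, cols))
    rows = [row for row in reader if row]
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, CSV_EXTRACT, "orders")
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(loaded, total)
```

Every extra hop like this is a chance for types, encodings or rows to go astray, which is why a native backup/restore into a cloud-hosted SQLServer is so much more attractive.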

On the licensing front, I’m presuming that the OS licence will be on a pay-as-you-go basis, but what about SQLServer and other server products?  Will MS do an Oracle on it, i.e. require a traditional upfront use-it-or-lose-it payment, or will they go the radical (but I think inevitable) path of a licence-by-the-hour?

First RedHat, then Sun, then Oracle and now Microsoft; the mighty beasts of our industry have acknowledged there’s a new mighty beast on the prowl, dressed as a humble bookseller no less!

Cloudy skies, cloudy apps…

Just back from a break in Clifden, Connemara, summer is nearly over, the kids return to school today, back to work.

Aasleagh Falls, Co. Mayo

Counties Galway and Mayo were like the rest of the country last week, a tad wet, but unlike the developed east of the island, flooding was not a problem; a problematic drainage area is called a lake in the west.

This August has been the wettest and dullest I’ve ever experienced but at least I saw some sunshine earlier in the month thanks to Kristian Raue, CEO of Jedox, who kindly invited me to visit the company’s offices in Freiburg, Germany.  Freiburg is very green in both senses of the word, surrounded as it is by the Black Forest and with its well deserved “eco-city” status.  It’s also known as the warmest city in Germany, a reputation it thankfully lived up to for this visitor from a rain-soaked Atlantic isle.

August morning, Freiburg im Breisgau

If Freiburg left a positive impression on my mind, so too did Jedox.  The overall impression is of a company which intends to use a combination of quality, vision and the judicious use of open-source to build the Jedox brand into one associated with best-of-breed products and consultancy.  This vision can be seen in the evolution of Palo, from its “good enough” beginnings to its current near-best-of-breed 2.5 version, and from talking to some of those working on the product, best-of-breed status is not that far off.

Likewise, ETL-Server which is currently a Palo only “loader”, is to be further  developed into a true ETL tool, while continuing to offer MOLAP-centric specialisms.

I also got a glimpse of the next version of Worksheet Server. “Wow!”, is all I can say.

Existing web based spreadsheet products are fine for simple data analysis or basic data capture purposes but cannot compete with their client-based elder cousins when serious datasmithing is required.  Well, from the demo I saw of Worksheet Server in action, that’s about to change.  The look and, more importantly, the feel is similar to that of traditional spreadsheets, its interface with Palo is identical to that of the existing Excel add-in, and here’s the big one, it’s open source!  Game-changing or what?

But …

That might enable me to move a lot of my spreadsheet applications to the cloud, but what about those applications that are more suited to an MS Access type solution?

Then try out WaveMaker. It’s open source and built on industry standards, Hibernate, Spring and the JavaScript Dojo framework, but has the ease of GUI database development more usually associated with MS tools. The resulting applications are packaged as a WAR file which can be hosted by any standards based Java server (e.g. Tomcat or Jetty).  The latest version makes developing Ajax-fronted database applications even easier with the addition of layout templates.  Its existing ability to automatically bind interfaces to SOAP web services has been extended to REST web services by means of a new WSDL auto-discover tool.  And Chris Keene, CEO of WaveMaker, also informs me that …

We are also releasing a cloud-based IDE in October with Amazon – stay tuned…

We launched in February and will be announcing our first 7 figure deal this month. We run on Mac, Linux and Windows and are currently the #1 developer download on Apple.com (http://www.apple.com/downloads/macosx/development_tools/)

Our goal is to make it easy to build rich internet applications without complex coding – kind of a MS Access for the Web.

Jedox and WaveMaker: the new breed of open-source businesses.

Amazon’s SAN in the cloud is a mirage…

This morning I got very excited.  While quickly scanning the headlines of the 1000+ unread feeds that had accumulated in my Google Reader this week, one heading in particular caught my attention, “Amazon Elastic Block Store goes live!“.

The post from the RightScale folks gives a detailed overview of the new Amazon ‘SAN storage in the cloud’ service, aka Elastic Block Store, aka EBS.  Alas, this particular cloud offering was a mirage: the post was subsequently removed (but can still be viewed on Robert Scoble’s Shared Items). It seems the post was a work-in-progress and not intended for publishing, yet!

Why was I so excited?  Amazon EC2 had two major shortcomings when it launched 2 or so years ago; the first, ephemeral IP addresses, was solved by the new Elastic IP feature; the second, ephemeral storage volumes (when you shut down an instance the disks are wiped!) is due to be solved by EBS.  With both of these problems solved, EC2, already near perfect, would be perfect.

The article does a good job of explaining the new service…

EBS starts out really simple: you create a volume from 1GB to 1TB in size and then you mount it on a device on an instance, format it, and off you go. Later you can detach it, let it sit for a while, and then reattach it to a different instance. You can also snapshot the volume at anytime to S3, and if you want to restore your snapshot you can create a fresh volume from the snapshot.

The thing that caught my eye in the above paragraph was the snapshot facility.  Snapshots are to be stored on S3 via an EC2-specific incremental-snapshot API.  This means the volumes will come with a built-in back-up facility. This is important as EBS drives reside in one availability zone (that of the instance that they are mounted against) and do not have the data replication security offered by S3.  It also means that disk systems can be restored quickly and simply from snapshots without the overhead  (and bugs!) of writing an S3 specific incremental backup and restore utility.

Back to waiting…

UPDATE: 20th August

Wait over…

New universal SQLite JDBC library.

Both Talend (Java) and Kettle distribute the Zentus.com pure-Java SQLite JDBC driver and for most purposes this run-anywhere version is fine. But, if you really need to take advantage of SQLite’s speed then connecting using the native JNI version is a must.  Doing this was easy enough, just change over to using a generic JDBC connection specifying the required native jar and placing the associated dll/so on your system path.

But now there’s an easier way: the latest version (V052, in fact from V050 on) is a universal jar; it contains native JNI libraries for Windows, Linux and MacOS alongside the pure-Java version.  It will automatically pick the correct lib for the platform and fall back to the pure-Java version if required.  You can tell if it’s picked up the native lib by calling conn.getDriverVersion(); it’ll return “native” if it has.

To upgrade to this jar in Kettle see this, this time replacing the nested jar with sqlitejdbc-v052.jar.

For Talend:

  • Either rename the new V052 jar to sqlitejdbc_v037_nested.jar and replace the existing V037 jar in the ../lib/java folder with this renamed file.
  • Or, you could edit the Java specific XML files in the various tSQLite component folders, replacing the references to the old nested V037 jar.
  • Or, and this is what I would do, don’t use the tSQLite components, replace them with tJDBC generic components, then you can pick whatever version of the driver you require, you could even change to a different database provider!

The Talend tradition of a separate set of components for each type of database seems to be a hangover from its Perl-generating roots. It’s true that database specific components are required for certain tasks such as bulk-loading, ELTs and so on, but JDBC was designed to be generic and as long as the SQL syntax is compatible, it makes switching database providers in and out very easy.  So unless there’s a good reason, stick to using tJDBC.

Amazon S3; there’s a holdup on the buckets, Dear Liza…

Amazon’s S3 service has been down since 9.00am PDT but I only noticed an hour ago (2.30pm PDT) when an EC2 instance launch failed.

Am I worried? No, but as I become more and more dependent on such services, perhaps I will be; then again, at least I’ll not be alone.  WordPress.com and countless others will be using the same excuse to their customers and unlike Reginald Perrin who had a different excuse every day for his train’s late arrival…

Ep.1   “Eleven minutes late, staff difficulties, Hampton Wick.”
Ep.1   “Eleven minutes late, signal failure at Vauxhall.”
Ep.1   “Eleven minutes late, staff shortages, Nine Elms.”
Ep.1   “Eleven minutes late, derailment of container truck, Raynes Park.”
Ep.1   “Eleven minutes late, seasonal manpower shortages, Clapham Junction.”
Ep.2   “Eleven minutes late, defective junction box, New Malden.”
Ep.4   “Eleven minutes late, overheated axle at Berrylands.”
Ep.4   “Eleven minutes late, defective axle at Wandsworth.”
Ep.5   “Eleven minutes late, somebody had stolen the lines at Surbiton.”

a whole industry will shout in unison “6 hours late (and counting), overheated axle on US Buckets…”

Python the new VBA ?

These last two weeks, Python has been on my mind. First off, last week I decided to make time to fully investigate Picalo, an open-source Python-based data analysis tool, and then, this week, Google announced their long awaited cloud-computing offering, Google App Engine, with the language at its core.

Python was the first of the “LAMP generation” scripting languages that I decided to learn in any detail ( I had used Perl before that but only on a per-task basis (similar to how I’d used AWK)). I then invested time in learning PHP, then Ruby and finally JavaScript. And here I am, back where I started, with Python.

But it’s not the same Python I learned three years ago, not that it has changed that much, but my appreciation of the language has, largely due to my deep dives into other languages. For example, JavaScript’s treatment of functions as first-class objects, highlighted the same functionality in Python, something I’d missed (or rather, not fully understood) the first time I encountered the language. Likewise, Ruby’s RoR introduced me to a “best of breed” approach to web application design, something that can be used as a comparison aid when approaching new web frameworks such as Django.
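For anyone who, like me, skated past it first time round, here is a minimal sketch of what “functions as first-class objects” means in Python. The function names and sample values are made up purely for illustration.

```python
def apply_twice(func, value):
    """Call func on value, then call it again on the result."""
    return func(func(value))

def make_multiplier(factor):
    """Return a new function that multiplies its argument by factor."""
    def multiply(x):
        return x * factor
    return multiply

# A function built at run time and stored in a variable.
double = make_multiplier(2)

# Functions stored in a dict, just like any other value.
cleaners = {"strip": str.strip, "upper": str.upper}

# A function passed as an argument to another function.
result = apply_twice(double, 5)
cleaned = cleaners["upper"](cleaners["strip"]("  datasmith "))
print(result, cleaned)
```

The same three tricks (store it, pass it, return it) are exactly what JavaScript makes you do all day long, which is why the idea finally clicked.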

But of course the scripting language that continues to power most of my datasmithing activities is Excel VBA. That’s why I was so excited to see a tool such as Proto utilise VBA as its scripting language. But, Microsoft has abandoned VBA, there will be no more Protos.

Also, Excel VBA is now a Windows only language. Windows, however, is no longer the ‘only’ business client OS (see how many Apple laptops you can spot the next time you’re in a business-class airport lounge, a few years ago it would have been zero, not any more), and is currently nowhere to be seen as a cloud computing platform (but that’ll change).

I’m at heart a table-oriented programmer, and I, like Picalo’s author Conan Albrecht, believe “data analysis is best done through scripting”; but not just data analysis, the T in ETL (Extract, Transform and Load) and the I in DI (Data Integration) and SI (Systems Interfacing) also benefit from a scripting approach.

So, what to adopt as a successor/companion-in-her-old-age to VBA, will it be Ruby, JavaScript, Python, Perl, even PHP?

It looks like it’ll be Python because it’s …

The runner up is of course Ruby, but its poor integration with Windows is a major problem and the datasmithing “prior art” of Picalo and Resolver makes Python hard to beat.

UPDATE Jan 2010:

To experience the best of both worlds, VBA & Python, my xLite (Excel combined with SQLite) datasmithing platform now allows Python to be used in conjunction with VBA.  Check it out here http://www.gobansaor.com/xlite

UPDATE:

Also, as Dan pointed out in the comments below, I’d not included Jython in my list of reasons for embracing Python. I must add it to my list of things to try out particularly as both my “classic” ETL tools, Talend and Kettle are JVM based.

Another thing to add to the (ever growing) list is Mike Pittaro’s SnapLogic Python-based ETL tool. They have …

…just released a 2.0 Beta version with some major architectural enhancements. The SnapLogic model is very different from traditional ETL systems. It takes an approach that’s more like the web, based on loose coupling and HTTP interactions. We model data sources, sinks, and transformations as URI addressable endpoints, and have a model where they can be chained together in pipelines to build transformation logic. We use a plugin architecture to make it easy to add custom components.

SimpleDB + S3 = distributed document-centric database

I’m a database man. I’ve worked on or about most variations on the theme, from roll-your-own flat files, to hierarchical, to CODASYL network databases, to the current crop of relational and MOLAP platforms. Of late, I’ve been investigating what I think will be the future of database technology, the distributed document-centric database. Today, the future arrived in the form of Amazon’s new SimpleDB service.

Up until now Amazon’s S3 service offered one half of the future platform, the “distributed document-centric” bit, but it lacked the indexed structure part to make it a true database; in combination with SimpleDB it’s now complete.

SimpleDB stores data in a Domain/Attribute schema-less and type-less structure having more in common with a spreadsheet than a traditional relational table. If you’ve worked with the likes of SQLite (manifest typing) or Excel (no predefined schema and manifest typing) then you’ll appreciate this is no hardship, quite the opposite in fact (I find the strong typing nature of most databases a real pain having worked recently on a SQLite combined with Excel project).
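To show what I mean by manifest typing, here is a small sketch using SQLite through Python's built-in sqlite3 module. It illustrates only the SQLite side of the comparison, not the SimpleDB API; the `item` table and its values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No column types declared, so each value keeps the type it arrived with.
conn.execute("CREATE TABLE item (name, value)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [("count", 42), ("price", 9.99), ("label", "forty-two"), ("missing", None)],
)

# typeof() reports the storage class of each individual value.
stored_types = dict(conn.execute("SELECT name, typeof(value) FROM item"))
print(stored_types)
```

One column happily holds an integer, a real, a piece of text and a null, much as a spreadsheet column would.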

The distributed nature of SimpleDB may however pose some difficulty to those of us (i.e. almost everybody) raised in the world of ACID compliant databases. Because of the Brewer’s Conjecture effect, SimpleDB sacrifices consistency for availability and partition tolerance i.e. when you write something to the database, an immediate query may not return the updated value, subsequent queries will eventually return the new data, exactly when depends on the load and the availability of resources. Those of you already using S3 will already be living with this “feature”, and in practice you rarely notice it (most updates seem to appear immediately) but it will still pose design challenges to handle the edge cases.
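One way of coping with the edge cases is to poll until the expected value appears. The sketch below is a toy: `EventuallyConsistentStore` is a made-up in-memory stand-in that hides a write for a fixed number of reads, and nothing here calls SimpleDB or S3.

```python
import time

class EventuallyConsistentStore:
    """Toy store: a write stays hidden for `lag` reads, then becomes visible."""

    def __init__(self, lag=2):
        self.lag = lag
        self.visible = {}
        self.pending = {}   # key -> [value, stale_reads_remaining]

    def put(self, key, value):
        self.pending[key] = [value, self.lag]

    def get(self, key):
        if key in self.pending:
            entry = self.pending[key]
            if entry[1] == 0:
                # The write has finally "propagated".
                self.visible[key] = entry[0]
                del self.pending[key]
            else:
                entry[1] -= 1
        return self.visible.get(key)

def read_until(store, key, expected, attempts=5, delay=0.0):
    """Poll until expected shows up; return the number of reads used, or None."""
    for attempt in range(1, attempts + 1):
        if store.get(key) == expected:
            return attempt
        time.sleep(delay)
    return None

store = EventuallyConsistentStore(lag=2)
store.put("invoice-1", "paid")
first_read = store.get("invoice-1")                 # stale: the write is not visible yet
reads_needed = read_until(store, "invoice-1", "paid")
print(first_read, reads_needed)
```

In a real service the delay would be a matter of time and load rather than a read count, so a sensible back-off and an upper limit on attempts would be needed.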

The service is still in limited Beta, but the documentation is available and if you’ve already used any other AWS product you’ll immediately feel at home. The pricing is again based on usage, the cost of storage is much higher than S3, being $1.50 per GB-month, but a GB of structured data is an awful lot of data (and the larger document style storage would be provided by S3).

If you’ve not yet tried out either S3 or EC2, now might be a good time to start, cloud computing has come down to earth, all thanks to an online book store, Amazon!

Zimki – the spirit lives on …

Although Zimki is to shut down on Christmas Eve, the ideas behind the service live on. Two new offerings, Heroku and AppJet, offer variations on the idea of hosted application development/deployment.

AppJet, funded by Paul Graham’s Y-Combinator, is very similar to Zimki, being a server-side JavaScript platform. No details yet as to what sort of paid options will be offered (all accounts are free at the moment). Unlike Zimki, there are no plans to create an open-source version. I like the easy “build a Facebook app” feature; and I guess these are the sort of light-weight applications that they hope to attract.

Although Heroku uses Ruby-on-Rails technology, rather than JavaScript, it is closer to the original Zimki idea; but rather than take the hard (and ultimately unsuccessful in Zimki’s case) road of building an open-source platform from scratch, Heroku takes an already popular open-source project and offers it wrapped in a full on-line development and deployment environment. Again, being in beta, there’s no indication as to what pricing model it will operate under, but I would think that it will attract more “serious” projects than AppJet since anything developed under Heroku is pure Rails which means it can be migrated to any other Rails hosting environment; so no lock-in. The online editor is excellent and whatever about its merits as a hosting service it’s by far the easiest way to learn and explore Ruby and Rails, even easier than this…

If Facebook apps are your goal but you wish to use Ruby rather than AppJet’s JavaScript then not to panic, as being Ruby some bright young spark (no, not me I’m afraid) will already have done a lot of the hard graft for you…

You say 100,000 I say 65,535! Let’s call the whole thing off!

According to this Google groups thread, Excel 2007 has a serious bug. Certain calculations (e.g. =850*77.1) that should yield 65535 are being rendered by Excel 2007 as 100,000. Brilliant, bloody brilliant!

I’ve been a fan of 2007 especially the new table handling features and the ability to handle more than 65536 rows, these are particularly useful for someone like myself who uses Excel as an ETL and data cleansing tool. Unlike many others, the new ribbon UI doesn’t bother me, it’s a slight annoyance, but within a day or so I’d mastered it. In fact, my wife who’s trained first time Excel users in both 2003 and 2007 reported back that novices found the new UI much easier to master.

But returning the wrong answer! “Well that beats Banagher”. I’ll not be recommending any client of mine to upgrade to Office 2007, until this is fixed.

If you don’t have a copy of Excel 2007, you can try it out using the Office 2007 online “test drive”, bugs and all.

Below is a screen shot of the bug in action using the online test drive version. Column A is set to =850*77.1, Column B uses =5.1*12850 and Column C is set to =100000. The first two should yield 65535 but all three display as 100,000! If you SUM() column A you get 196605 (see A7, which is the correct answer for 65535+65535+65535). But if you AVERAGE() the affected cells you get 100,000, see B7. Also, see the various results when an affected cell is used in other calculations (E5 to H5).
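Why these particular sums? The sketch below doesn't reproduce Excel's behaviour; it only shows, using ordinary double-precision arithmetic in Python, that both products are stored as a value a hair below 65535 rather than 65535 itself. That this near-miss is what trips Excel 2007 is my assumption.

```python
from decimal import Decimal

a = 850 * 77.1
b = 5.1 * 12850

# Decimal(float) shows the exact binary value that is actually stored.
print(a == 65535, Decimal(a))
print(b == 65535, Decimal(b))

# Rounded to a sensible number of decimal places the value is 65535 again.
print(round(a, 9) == 65535)
```

Neither 77.1 nor 5.1 can be held exactly in binary, so the tiny error survives the multiplication; any spreadsheet (or language) using doubles will store the same not-quite-65535 value.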

Nirvanix targets Amazon S3 shortcomings

Let there be no doubt about it, Amazon’s S3 online storage system is wonderful; it’s secure (both from a technology point of view and from Amazon’s status as one of the web’s most trusted sites i.e. one you wouldn’t worry about giving your credit card to), it’s cheap, it’s pay-as-you-go and it has first mover advantage, but (there’s always a but) it has until now lacked competition. And because it lacked competition the various shortcomings (such as no support for HTTP POST file upload, no SLAs etc.) that S3 users complain about are handled by Amazon in what can best be described as ..

..we hear what you’re saying, we have it on a list; no, we’ll not tell if/when we’ll remedy this problem (or explain why it’s not possible to do so); and anyway if you don’t like it, who else provides anything comparable?

Okay, I’m being unfair here, I’m sure Amazon has very good reasons for how they do things and scalability and “keeping it simple” seem to be their development mantra; and this is a good thing for an online 24/7 storage infrastructure. But, as in all things in life, competition would help not just disillusioned users by offering another comparable service but would help Amazon prioritise items on its S3 roadmap.

Most would have assumed that when that competitor arrived it would either be Google or Microsoft; instead the first up to bat is Nirvanix, a San Diego startup which appears to be associated with another online storage player, MediaMax. Pricing is similar to S3, but with the option of purchasing extra SLA backed support packages, something that has been top of the list for many actual and potential S3 users. Other “missings” that Nirvanix addresses are:

  • File upload via HTTP POST, S3 restricts upload to HTTP PUTs which requires the use of a proxy server or the installation of client software.
  • File rename and move, S3 requires that a file is first deleted and then reloaded.
  • In-built support for media processing such as image resize/rotate for thumbnails.
  • Multi-tenant accounts, each S3 account supports only a single ‘user view’.
  • Files are indexed via tags and name, not just by name as is the case with S3.
  • Granular control of usage limits and reporting, S3 only offers ‘after-the-fact’ reporting.
  • Maximum file size of 256GB compared to Amazon’s 5GB.

The Nirvanix authentication method uses a much simpler and more traditional username/password over SSL approach than S3’s key-pair based URL signing method. This can be seen as either a weakness or a strength, but combined with Nirvanix’s support for POST file uploads, multi-tenant accounts and granular usage controls it makes building browser based clients much simpler.
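For those who haven't met request signing, the general idea goes something like the sketch below. It is a simplified illustration of HMAC-based signing using Python's standard library, not S3's actual string-to-sign format; the secret, the path and the expiry value are all made up.

```python
import base64
import hashlib
import hmac

def sign_request(secret_key, method, path, expires):
    """Return a base64 HMAC-SHA1 signature over a simple string-to-sign."""
    string_to_sign = "\n".join([method, path, str(expires)])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

def verify_request(secret_key, method, path, expires, signature):
    """Server side: recompute the signature and compare in constant time."""
    expected = sign_request(secret_key, method, path, expires)
    return hmac.compare_digest(expected, signature)

SECRET = "not-a-real-secret"   # hypothetical key, for illustration only
sig = sign_request(SECRET, "GET", "/mybucket/report.csv", 1200000000)
ok = verify_request(SECRET, "GET", "/mybucket/report.csv", 1200000000, sig)
tampered = verify_request(SECRET, "GET", "/mybucket/other.csv", 1200000000, sig)
print(ok, tampered)
```

The secret never travels over the wire, only the signature does, but whoever holds the secret can sign anything, which is why that one key matters so much.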

S3’s industrial grade authentication is all fine and dandy but if the key becomes compromised, all’s lost, you could expose not just your data but your wallet if somebody used the compromised key to maliciously upload Terabytes of data. This single point of failure is perhaps my main complaint of S3’s current set-up.

So, am I getting ready to jump ship? No, at least not yet, as:

  • Amazon is still Amazon, they may be lacking SLAs but they have my trust.
  • S3’s role as a back-end to Amazon EC2.
  • Friendly and effective forums offering excellent support provided by both the developer community and Amazon’s own staff.
  • CNAME support. (e.g. http://www2.gobansaor.com/)
  • Did I mention EC2?

Should Amazon be worried? No, this is not a zero-sum game, in fact competition will help grow awareness and expand the market for all “cloud” based services.

Proto – desktop BI tool.

I see that Proto have repositioned their excellent VBA scripted mash-up product as a “desktop business intelligence system”. This is to be welcomed as the first time I used it I described it as a “mash-up tool for adults” and although it has the ability to play hard ball with the other Web2.0 mash-up kids, its DNA is firmly within the business world, more BI2.0 than Web2.0. It is in effect, a “reporting tool on steroids”.

The “free for non-commercial purposes” product is no more, but you can still download a 30-day free trial and the really good news is the new price of $300 for the Proto Individual product.

Also, another business-oriented mash-up tool, RSSBus, is partially out of beta, RSSBus Server RC1 is now available. Still no information on final pricing for the server product. A new Beta 5 release of the free desktop edition is also available for download. As I’ve said before, RSSBus Server front-ended by Proto would be a very powerful combination.

10,000 hits …

Sometime during August I recorded my 10,000th hit on this blog. OK, that doesn’t put me in the A-list (more like the does-anybody-know-what-comes-after-Z-list) but it’s a start. I started the blog in February 2006 as a destination for my del.icio.us feed auto-posts, but my first real post wasn’t until December 2006. Mind you, my 11 year old son, who only started blogging this April, has already passed his 30,000th hit!

So, a big thank-you to you, the readers; especially those who’ve commented and even more especially, those of you who have done me the honour of subscribing to the Gobán Saor feed via RSS. (Not that I know how many of you have subscribed, as WordPress.com no longer provides feed reader stats. :-( )

Thanks again,

Tom Gleeson – An Gobán Saor

August 25th, 2007.

UPDATE:  Jan 28th 2008

Today I recorded my 20,000th hit, and am getting a lot of queries through my Contact Me page.  Hits took a dip after I changed over to using my own sub-domain (http://blog.gobansaor.com) but they’ve now well recovered. Mind you, my son is still racing ahead of me, he passed the 100,000th mark over Christmas!!!

Google Apps not just for SMEs?

The relentless positioning of Google Apps as an alternative to MS Office continues.  Google has just announced the acquisition of Postini, an on-demand hosted provider of secure communications (EMail and IM) for large corporate clients.  The use of hosted email and document storage solutions is a no-brainer for small businesses but compliance and data security worries hold back large companies from taking advantage of the cost benefits of Google Apps (although some large institutions like Trinity College Dublin have made the leap).  I guess when Google incorporates Postini technology we’ll see a third Apps edition added to the existing standard and premium options, this time targeting the needs of large enterprises.

CRM – How not to do it …

Having been through a lost luggage experience myself in the past I can understand Damien Mulley’s frustration and anger.

Sky Handling Partners’ “handling” of the issue would make a good example of how not to deal with an irate customer when that customer is also a prominent blogger. And now this ….

Zimki – goes off the boil.

Looks like any further enhancement of Zimki is to be put on hold until Fotango‘s parent company, Canon Europe, completes a review on the future direction (and viability?) of the hosted application market. This means the platform will not be open sourced in the near future. This and the lack of any sign of a Zimki user community probably means I’ll abandon the platform; with some regret I might add as the infrastructure is rock solid and professionally managed, and the JavaScript based development environment is both productive and fun to work on. But the feeling that one is backing the wrong horse is hard to shake, particularly with the lack of any sign of developer mind share (i.e. no official forum, unofficial forum never took off and very few, if any, comments on the Zimki blog).

A time of war and a time of peace …

Yesterday, for Northern Ireland, it was “a time of peace”; after centuries of conflict, Planter and Gael agree to share power. Did we ever think we’d see the day? It was a long time coming; I was on my honeymoon in Northern Ireland (moored at Lock No.1 on the Shannon-Erne waterway) in 1994 when the first IRA ceasefire was announced, it took another 13 years to get to this situation.

I’m sure there’ll be problems a plenty in the coming years but it’s also clear there’s no turning back the clock, peace is here to stay.

History in the making ….

HacketyHack from WhyTheLuckyStiff

No it’s not a tip for the 5.50 at Punchestown it’s the latest project from _why (a legend in the world of Ruby, if a language as young as Ruby can have legends). HacketyHack is a framework to teach kids how to program, built using Ruby and the Gecko browser engine, it’s free and it’s an ideal next step when your kids get a bit bored with MIT’s Scratch. It’s not as much “fun” as Scratch but it will guide kids into the “real world” of programming and as the code is pure Ruby, the sky’s the limit. In fact if any “real programmers” out there have been meaning to learn some Ruby to see what all the fuss is about, HacketyHack is an excellent way to start.

Talend ETL – A New Contender

Talend have released a new version of their Open Studio ETL tool. Not as full featured as Pentaho Kettle; only supports a limited number of databases and file formats – no SQLite support shock-horror! The press release promises More than 100 Native Connectors and promises connectors to ERP and CRM tools but I couldn’t find them (maybe they meant the ODBC support – well I guess I managed to connect to SQLite using this ODBC driver so maybe there’s an ODBC driver for SAP!). Compared to Kettle the design GUI runs much slower (built on an Eclipse platform, say no more).

But two things impressed…

Talend Open Studio is not an ETL engine as such, it’s a code generator. When I last looked at it, it generated Perl code but this release now also generates Java and not only that, it packages the resulting code ready to be deployed on any Windows or *nix platform, no Talend installation required. This could be an alternative to my Ruby/SQLite micro ETL idea, especially as SQLite support appears to be in the works.

The XML and CSV import components are excellent, especially the XML functionality. The Kettle equivalents work but they never felt like they provided a productivity gain and for me that’s what matters. Once I have data within either a database or an Excel environment I don’t need any other tools, it’s how well and how cost effectively an ETL tool handles the parsing, scheduling, provisioning, distribution and logging of external (usually XML, CSV and Excel) data that matters to me. Talend is now definitely a contender.

New Open Source OLAP

Cubulus is a Mondrian-like OLAP engine supporting a subset of MDX and offering an alternative way of organising fact tables using “hierarchical range clustering of keys” rather than the traditional star-schema approach. Written in Python, very much a pre-alpha release. Interesting but a bit too experimental for me this early on a Sunday morning; especially after my sister’s 60th birthday party last night in Roscrea. Happy birthday Ban.

Thanks to Chris Webb for the Cubulus link.

New software – Pentaho Kettle 2.5 RC1 and IMP:Palo

I’ve spent a few hours trying out the latest Kettle 2.5.0 RC1 release candidate, new UI and lots of new features. Looks like the PALO code developed by 3a-strategy will not make it into this release, but I see Cubeware have released IMP:PALO cube loading software, offering both a free and a premium professional version. Looks promising; I’ve just installed the software (lots of form filling, recovering Emails from GMail’s spam bucket and an activation process to go through first – I’m exhausted!) so I haven’t managed to try it out yet. I hadn’t heard of this German company until today, anybody out there using IMP:Palo or any other Cubeware products?

UPDATE:
I managed to get IMP:Palo up and running; went through the tutorial in the Help file and I’m impressed. The tool is very Kettle like, better from a UI perspective in fact, but of course it’s not open source and I’m not sure if it’s free ware or “trial ware” as it’s not clear what happens when the temporary activation ends next December! But it makes building MOLAP cubes very easy indeed. I’ve included a link to a finished “import definition” file as the demo MS Access data supplied is in a bit of a mess or rather the Product table is (missing GroupIDs – you’ll need a LEFT OUTER JOIN in the Products Mapping – and missing Products). First rule of a demo, make it easy, make it fool proof, otherwise you risk losing your audience, but I’m glad I persevered, this is a good product!
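To see why the outer join matters when GroupIDs are missing, here is a small sketch using SQLite from Python. The tables, column names and rows are invented for illustration and are not the actual demo data or IMP:Palo's mapping syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_group (group_id INTEGER PRIMARY KEY, group_name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT, group_id INTEGER);
    INSERT INTO product_group VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO product VALUES (10, 'Keyboard', 1), (11, 'Spreadsheet', 2), (12, 'Mystery item', NULL);
""")

# An inner join silently drops the product that has no group.
inner = conn.execute("""
    SELECT p.product_name, g.group_name
    FROM product p JOIN product_group g ON g.group_id = p.group_id
    ORDER BY p.product_id
""").fetchall()

# A left outer join keeps it, with a placeholder in place of the missing group.
outer = conn.execute("""
    SELECT p.product_name, COALESCE(g.group_name, 'Unassigned')
    FROM product p LEFT OUTER JOIN product_group g ON g.group_id = p.group_id
    ORDER BY p.product_id
""").fetchall()

print(len(inner), len(outer))
```

With the inner join the ungrouped product never reaches the cube at all; with the outer join it lands under a catch-all group where at least it can be seen and fixed.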

Download finished IMP:Palo demo.

One other thing, to get rid of the …

PALO Error: not authorized for operation: login error (Error -44) (palo_auth_a failed.) 

…error if you’re connecting to a localhost based Palo server,  remove (or comment out by means of a #) the line

user-login

in the c:\Program Files\Jedox\Palo\data\palo.ini file. Stop and restart the server.