
… not yet, but Bill Hodak from Oracle has just opened a thread over on the Amazon AWS developer forums, looking for feedback on the use of Oracle in AWS projects. First there was Red Hat, then this week’s announcement from Sun and now Oracle; has Amazon managed to turn itself into the cloud provisioner not just for the hungry masses of start-ups and independent developers but for the technology elites?

As for using Oracle on EC2, yes please. Most of my datasmithing career has been spent behind the wheel of an Oracle database. The front-ends might have been Excel or some BI package, and the end results might have been SAP master data take-ons or an Essbase cube, but the blood and guts were always Oracle. And this was before Oracle Apex: think what wonders could have been achieved if I'd had access to such a product in the past.

When EC2 first appeared I enthusiastically installed Oracle 10g Express, using a Hamachi VPN to tunnel the Apex front-end back to my PC (don't ever expose an Oracle 10g server to the public internet; its architects assumed it would be used solely within the corporate firewall). I even used the power of Oracle's redo logs to partially protect against the ephemeral nature of EC2's disk storage.

It looked to me back then that EC2 could be an ideal hosting environment for Oracle Application Express (aka Apex, aka HTML DB), but for a few wee problems:

  • It's not absolutely clear whether the Oracle 10g Express database licence covers its use in a virtual environment (sometimes the restriction of one database per server is stated as one per machine), and a few attempts to get a definitive yea or nay on the product's support forums elicited no response. I'm guessing it's fair usage, but confirmation would be nice.
  • Oracle doesn't appear to know what to do with Apex; you get the impression they're afraid it'll cannibalise their lucrative J2EE business.
  • 10g Express is severely hobbled as a database. It's not just the 4GB limit per server (or is that per machine?); it lacks any sort of update service, serious security flaws remain unpatched and usernames/passwords are sent in plain text, making it suitable (and then only barely) for use within a firewall or VPN.
  • Once you outgrow Express, you’re into big money and even worse you might have to talk to a sales rep!

So what would I like to see Oracle offering on EC2? A paid AMI, preloaded with a variation of Express, minus the 4GB limit, with a “hardened” public internet facade, along with regular patches automatically applied. Optional add-ons…

  • Various levels of support, fixed monthly charge perhaps.
  • Ability to upgrade to the full Enterprise Editions, but again paid for via a combination of AMI hourly charges and optional month-to-month support charges.
  • Ability to purchase once-off consultancy, both from Oracle and third-party suppliers.

I’m not holding my breath though…

Oh, if you're confused over the various “Express” terms used above, don't blame me, blame Oracle. I think the poor branding (constant name changes, copy-cat names) is an indication of Oracle's lack of commitment to both products.

… well at least for me. As I discussed previously, I've been seriously investigating using Python as my primary datasmithing scripting language, in effect a new VBA. I also currently use VBA's compiled cousin, VB6, for certain tasks such as building Excel RTD servers. The problem with VB6 is that it depends on Visual Studio 6, which is no longer supported by MS and is becoming next-to-impossible to purchase. I have a copy which I picked up at a charity auction (for €50, which also included Windows 2000, Visio and Office 2000 Professional!), but I'm aware that any code I develop in VB6 is, in effect, tending towards “closed source” as far as many others are concerned, as VS6 continues its journey into history. (I also use Visual C/C++, but such code is future-proof as the latest versions of VS continue to support C/C++ unaltered.)

So what are my alternatives:

  • Do nothing, continue to use VS6 (I’ve already made an “off-site” copy in case the house burns down!).
  • Use .NET. ExcelDNA is an easy way to integrate Excel with .NET, but it doesn't handle the creation of COM servers, and in general, accessing COM from .NET is a total PITA.
  • Use Python.

I'm going down the Python route, especially now that I've figured out how to create in-process (DLL) COM servers (via pyInstaller) and how to manage sub-processes (via the standard subprocess module). Not just because it's future-proof, but because:

  • It’s easy to build against and to manipulate COM interfaces, nearly as easy as VB6, much, much easier than the .NET alternatives.
  • I can then use the same platform to handle general datasmithing (on Windows, MacOS and Linux), Excel integration “stuff” and also web front-ends (via Google App Engine).
  • I really like Python and I find I’m much more productive with it than with any other language I have ever used (with the possible exception of MUMPS, but otherwise “Where were you Python, when I were a lad, down coding pit, slaving over hot Cobol …”).
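The sub-process side needs nothing more than Python's standard subprocess module. What follows is a minimal sketch, not the code I actually use: run_helper is a hypothetical wrapper name, and the child program is an inline one-liner standing in for a real helper script.

```python
import subprocess
import sys


def run_helper(args, timeout=60):
    """Run a child process, capture its output as text and fail loudly on error."""
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of echoing them
        text=True,            # decode the captured bytes to str
        timeout=timeout,      # don't let a hung child block the caller forever
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Hypothetical helper: a tiny inline Python program standing in for a real script.
output = run_helper([sys.executable, "-c", "print(sum(range(10)))"])
print(output)
```

Using sys.executable means the child runs under the same Python interpreter as the parent, which saves worrying about which python is on the PATH.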

Enough work, the sun is shining; the chiffchaff is, well, chiff-chaffing; spring has sprung, I’m off out to the garden…

Update:

As Prashant pointed out in the comments below, http://vb2py.sourceforge.net/ looks like a very useful tool …

Jedox have just released V1.0 of their Palo-centric ETL Server. I had been looking forward to this, not so much for its ETL ability (which is somewhat limited when compared to the likes of Pentaho PDI or Talend) but for the drill-through capability it would add to Palo. Alas, there's a catch: you must purchase Palo Supervision Server (€8,000) to enable the Excel add-in to avail of this feature!

The thing that attracted me to Palo in the first place was its simplicity of approach and the primacy of Excel as the end-user view of the product, a modern-day Essbase. The fact that the Excel add-in is closed source always worried me, as I felt that it would inhibit the thing that really sets open source apart (no, not the cost): the formation of an active and innovative developer community. The sort of developers who have a need for, and an interest in, MOLAP tools tend to be more familiar with VBA, .NET, SQL, SAP Config etc. than with C/C++ or even Java development. The one area where such a community could add value, the Excel front-end, is closed to them. And I know there's some non-Jedox involvement in the form of JPalo and the OO-Calc add-in, but Excel is the key to Palo's widespread adoption.

Also, the choice of .NET as the main client-side development platform was, in my opinion, a mistake. A VBA-accessible object model would have been much more useful, and it would also have removed the need for the current painful installation process.

This partial “open-source” model and the increasing complexity of the platform make Palo, at least for me, a less attractive Micro-BI option. And remember, Excel already has a very powerful in-memory OLAP tool, the humble PivotTable, which is “good enough” for most analytical needs. So why use Palo rather than a PivotTable?

Palo’s advantages:

  • Can handle very large data sets (limited by the free memory available to the server).
  • Allows write-back and splash-down, both very useful for planning/budgeting applications.
  • Allows for ragged hierarchies.
  • Server-side MOLAP rules e.g. [Budget],[2008] = [Actual],[2007]*0.035
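To show what a server-side rule does, here's a plain Python sketch of the idea: a derived cell is computed from other cells when it's read, rather than being stored. This is not Palo's rule engine or its syntax; the cube layout, the get function and the sample Actual figure are all hypothetical, and the factor is simply copied from the rule above.

```python
# Hypothetical toy cube: stored cells keyed by (version, year).
stored = {("Actual", "2007"): 120_000.0}

# Rules map a target cell to a function that computes it from other cells.
# The factor is copied from the example rule in the list above.
rules = {
    ("Budget", "2008"): lambda get: get(("Actual", "2007")) * 0.035,
}


def get(cell):
    """Return a cell's value, deriving it from a rule if one is defined."""
    if cell in rules:
        return rules[cell](get)
    return stored[cell]


budget_2008 = get(("Budget", "2008"))
print(budget_2008)
```

Because the rule is evaluated on read, changing the stored Actual figure changes the derived Budget cell without any reload.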

Excel’s PivotTable advantages:

  • Pure Excel, with the object model available for VBA scripting.
  • Drill-through as standard.
  • Excel 2007 can now handle a million rows (1,048,576 to be exact); for earlier versions, use an Access/SQLite local database or an enterprise database to first “group” and summarise the data to be pivoted.
  • Can be used against SQL Server Analysis Services (SSAS) cubes (and those of other providers such as Pentaho's Mondrian).
  • Much easier to set-up and use.
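The “group and summarise first” approach for pre-2007 Excel might look like this, sketched with Python's built-in sqlite3 module against an in-memory database. The sales table and its columns are hypothetical. Only the aggregated rows would then be handed to Excel for pivoting.

```python
import sqlite3

# In-memory stand-in for a local SQLite database holding the detail rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "A", 10.0), ("North", "A", 5.0), ("North", "B", 7.5),
     ("South", "A", 2.0), ("South", "B", 4.0), ("South", "B", 1.0)],
)

# Group and summarise first, so only one row per (region, product) reaches Excel.
summary = con.execute(
    """SELECT region, product, SUM(amount) AS total, COUNT(*) AS n
       FROM sales
       GROUP BY region, product
       ORDER BY region, product"""
).fetchall()

for row in summary:
    print(row)
```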

Update: May 1st 13:30

Beta Version of Palo 2.5 (which you'll need to use the Palo ETL Server's drill-through functionality) has just been released.


Although my data-smithing toolbox is full to the brim with powerful tools such as Talend, Kettle PDI, Picalo and Excel, all backed by the cloud infrastructure of Amazon's S3, SimpleDB and EC2, there's one simple yet powerful tool that I always seem to gravitate back to, and that tool is SQLite.

Now obviously being a hewer of data, I need a SQL compliant database for data manipulation and SQLite performs that task with speed and ease. But it’s not just in the hewing, it’s in the hauling of data where SQLite also shines.

I use SQLite as the container for passing tabular datasets between (and within) my various tools. That data doesn't even need to be clean (thanks to SQLite's liberal manifest typing rules), just so long as it can be expressed as a table.
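A quick way to see manifest typing at work is Python's sqlite3 module; the table and values below are made up for illustration. The declared column type is only an “affinity”, so a text value such as "n/a" is stored as-is rather than rejected, while a numeric-looking string is converted to a number.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The declared type is only an affinity hint; SQLite still stores what it is given.
con.execute("CREATE TABLE raw (id INTEGER, amount REAL)")
con.executemany(
    "INSERT INTO raw VALUES (?, ?)",
    [(1, 12.5), (2, "n/a"), (3, None), (4, "7")],
)

# typeof() reports the storage class actually used for each value.
rows = con.execute(
    "SELECT id, amount, typeof(amount) FROM raw ORDER BY id"
).fetchall()

for row in rows:
    print(row)
```

Dirty values survive the trip intact, and the cleaning can be done later in SQL by whichever tool picks up the file.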

For example, a Talend job could store an extracted dataset in a SQLite file and pass that file on to a Python script for some special processing (for example, extracting further data from a source not directly supported by Talend, such as SAP or SimpleDB). The resulting SQLite database could then be passed on to Excel or a similar tool to allow a business user to view and perhaps modify the data, and finally Talend could pick up the file again to load it into a corporate data warehouse.
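The Python step in the middle of that chain could be as simple as the sketch below. It assumes the upstream job left behind a SQLite file containing a customers table; that table, its columns and the derived customers_enriched table are all hypothetical names. The script adds a cleaned-up column and writes a new table back into the same file for the next tool to pick up.

```python
import os
import sqlite3
import tempfile


def enrich(db_path):
    """Hypothetical middle step: read the extract, derive a column, write a new table."""
    con = sqlite3.connect(db_path)
    with con:  # commits on success, rolls back on error
        con.execute("DROP TABLE IF EXISTS customers_enriched")
        con.execute(
            """CREATE TABLE customers_enriched AS
               SELECT id, name, UPPER(TRIM(country)) AS country_code
               FROM customers"""
        )
    con.close()


# Simulate the file an upstream job (e.g. Talend) would have written.
path = os.path.join(tempfile.mkdtemp(), "handover.sqlite")
con = sqlite3.connect(path)
with con:
    con.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
    con.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                    [(1, "Acme", " ie "), (2, "Globex", "de")])
con.close()

enrich(path)
```

Writing a new table rather than updating the original in place keeps the raw extract available if a later step needs to be re-run.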

Now, you could use flat files to transport the data, or store the intermediate results in a corporate database, but SQLite is as easy as flat files (if not easier) and offers the SQL processing capabilities of big-iron databases without the hassle of getting write access to an existing server or setting one up from scratch.

And I know there are other similar file-based database offerings, such as MS Access and the Java-only HSQLDB, but neither matches SQLite's ubiquity, sheer simplicity and powerful data processing ability.

Python, the new VBA?

These last two weeks, Python has been on my mind. First off, last week I decided to make time to fully investigate Picalo, an open-source Python-based data analysis tool, and then, this week, Google announced their long-awaited cloud-computing offering, Google App Engine, with the language at its core.

Python was the first of the “LAMP generation” scripting languages that I decided to learn in any detail (I had used Perl before that, but only on a per-task basis, similar to how I'd used AWK). I then invested time in learning PHP, then Ruby and finally JavaScript. And here I am, back where I started, with Python.

But it's not the same Python I learned three years ago. Not that the language has changed that much, but my appreciation of it has, largely due to my deep dives into other languages. For example, JavaScript's treatment of functions as first-class objects highlighted the same functionality in Python, something I'd missed (or rather, not fully understood) the first time I encountered the language. Likewise, Ruby on Rails (RoR) introduced me to a “best of breed” approach to web application design, something that can be used as a comparison aid when approaching new web frameworks such as Django.
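As a small illustration of functions as first-class objects in Python (the names are illustrative only), functions can be stored in variables, passed to other functions and returned from them:

```python
def to_upper(value):
    return value.upper()


def strip_spaces(value):
    return value.strip()


def compose(*funcs):
    """Build and return a new function that applies funcs left to right."""
    def composed(value):
        for f in funcs:
            value = f(value)
        return value
    return composed


# Functions as values: passed as arguments to compose() and map(), returned as results.
clean = compose(strip_spaces, to_upper)
cleaned = list(map(clean, ["  dublin", "cork  "]))
print(cleaned)
```

This style suits datasmithing well: a list of small cleaning functions can be assembled per dataset rather than hard-coded into one routine.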

But of course the scripting language that continues to power most of my datasmithing activities is Excel VBA. That's why I was so excited to see a tool such as Proto use VBA as its scripting language. But Microsoft has abandoned VBA; there will be no more Protos.

Also, Excel VBA is now a Windows-only language. Windows, however, is no longer the ‘only’ business client OS (count how many Apple laptops you can spot the next time you're in a business-class airport lounge; a few years ago it would have been zero, not any more), and it is currently nowhere to be seen as a cloud computing platform (but that'll change).

I’m at heart a table-oriented programmer, and I, like Picalo’s author Conan Albrecht, believe “data analysis is best done through scripting”; but not just data analysis, the T in ETL (Extract, Transform and Load) and the I in DI (Data Integration) and SI (Systems Interfacing) also benefit from a scripting approach.

So, what to adopt as a successor/companion-in-her-old-age to VBA? Will it be Ruby, JavaScript, Python, Perl, or even PHP?

It looks like it’ll be Python because it’s …

The runner-up is of course Ruby, but its poor integration with Windows is a major problem, and the datasmithing “prior art” of Picalo and Resolver makes Python hard to beat.

UPDATE:

On Friday I, alongside 10,000 others, received an invite to try out App Engine, yippee!

Also, as Dan pointed out in the comments below, I'd not included Jython in my list of reasons for embracing Python. I must add it to my list of things to try out, particularly as both my “classic” ETL tools, Talend and Kettle, are JVM-based.

Another thing to add to the (ever-growing) list is Mike Pittaro's SnapLogic Python-based ETL tool. They have …

…just released a 2.0 Beta version with some major architectural enhancements. The SnapLogic model is very different from traditional ETL systems. It takes an approach that's more like the web, based on loose coupling and HTTP interactions. We model data sources, sinks, and transformations as URI-addressable endpoints, and have a model where they can be chained together in pipelines to build transformation logic. We use a plugin architecture to make it easy to add custom components.
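The chaining of sources, transformations and sinks described in that quote can be sketched with plain Python generators. This is not SnapLogic's API, and it leaves out the HTTP/URI side entirely; the function names and sample data are hypothetical. It only shows how loosely coupled components can be composed into a pipeline.

```python
import csv
import io


def source(text):
    """Source: yield rows from CSV text (stand-in for a URI-addressable data source)."""
    yield from csv.DictReader(io.StringIO(text))


def filter_rows(rows, predicate):
    """Transformation: pass through only the rows matching predicate."""
    return (row for row in rows if predicate(row))


def add_field(rows, name, func):
    """Transformation: add a derived field to each row."""
    for row in rows:
        yield {**row, name: func(row)}


def sink(rows):
    """Sink: materialise the pipeline's output (a real sink might write a file or table)."""
    return list(rows)


data = "id,qty,price\n1,2,3.5\n2,0,9.0\n3,4,1.25\n"
result = sink(
    add_field(
        filter_rows(source(data), lambda r: int(r["qty"]) > 0),
        "total",
        lambda r: int(r["qty"]) * float(r["price"]),
    )
)
print(result)
```

Each stage only knows about the rows it receives, so stages can be swapped or re-ordered without touching the others, and rows stream through one at a time rather than being loaded in full.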

Not sure, but this morning I received my monthly AWS bill, and it was double its usual amount! When I investigated, the extra cost was due to 133GB of downloads from my www2.gobansaor.com bucket. This is the S3 bucket in which I store the xlAWS zip file, xlAWS being a “library-of-sorts” of VBA/VB6 helper code for accessing Amazon S3 and SimpleDB.

It's linked to from this page on my blog (which has had 200 or so hits this month) and from this AWS Community Code page. The excessive hits on the bucket started on the 28th of Feb, the day the xlAWS code was published on Amazon, and continued through most of March. Taking the size of the zip file into account, 133GB represents approximately 100,000 downloads. I don't have server logging enabled on the bucket, so I can't be sure how much is due to the other public files in the bucket (all belonging to the VBA/Proto SQLite xLite project), but as that project has been available for months and is accessible only through my website (whose stats show a consistent 5-10 downloads per week), I'm guessing the downloads are for xlAWS.
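The back-of-an-envelope estimate works out as follows. It assumes decimal units and that all of the traffic was for the xlAWS zip, which implies a file of roughly 1.3MB:

```latex
\[
\frac{133\ \text{GB}}{100{,}000\ \text{downloads}}
= \frac{133{,}000\ \text{MB}}{100{,}000}
\approx 1.33\ \text{MB per download}
\]
```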

Who would have thought that there would be such interest in VBA/VB6 code for accessing AWS services! I wonder was it the Excel VBA side of the house or the dispossessed (and p*ssed off) VB6 developer hordes who downloaded it the most? Leave a comment if you downloaded and used the library; I'd love to know.

… and that’s good. That’s how I like my databases, boring, reliable, consistent, easy to use.

SimpleDB, on the other hand, is not boring; it's an exciting new shiny thing that opens up a myriad of new possibilities. But first, I and the rest of the developer community need to tool up, cast aside some of our cherished database design patterns (oh, like 3rd normal form, strong typing, joins; nothing major) and embrace a slightly different way of thinking. However, as much as I like a challenge, I also like to get things done.
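Here is one example of that different way of thinking. SimpleDB stores every attribute value as a string and compares values lexicographically, so numbers have to be encoded before range queries or sorts will behave numerically. A common approach is to add an offset (to get rid of negatives) and zero-pad to a fixed width. What follows is a minimal Python sketch with an assumed value range; encode_int and decode_int are hypothetical helper names, not part of any AWS library.

```python
OFFSET = 10**9  # assumed range: values in [-10**9, 10**9)
WIDTH = 10      # digits needed for the largest encoded value, 2 * 10**9 - 1


def encode_int(n):
    """Encode an int as a fixed-width string whose string order matches numeric order."""
    if not -OFFSET <= n < OFFSET:
        raise ValueError("value outside the assumed range")
    return str(n + OFFSET).zfill(WIDTH)


def decode_int(s):
    """Reverse encode_int."""
    return int(s) - OFFSET


values = [42, -7, 1000, 0, -1000]
naive = sorted(str(v) for v in values)            # string order, not numeric order
encoded = sorted(encode_int(v) for v in values)   # string order == numeric order
print(naive)
print([decode_int(s) for s in encoded])
```

The price is that the range has to be fixed up front; widening it later means re-encoding every stored value.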

That's where EnterpriseDB's new Postgres Plus Cloud Edition comes in. This is an Amazon EC2/S3-hosted edition of their Oracle-compatible, PostgreSQL-based product that offers the scalability of SimpleDB with the familiarity of a traditional relational database. The “magic” is supplied by Elastra, who are also offering the same functionality against MySQL and standard PostgreSQL databases.

A Talend ETL job which I had been developing for a client had been tested against a “normal” EnterpriseDB instance. This ETL job was part of a BI prototype trialling a Postgres Plus Cloud Edition (the new name for EnterpriseDB's cloud offering) as the back-end database. So, I exported the job as a Java executable, fired up an EC2 instance, copied up the generated JAR files, changed the database's hostname to that of the Postgres Plus “cloud” database, ran the ETL job and it worked. As I said, boring, nothing to report, it just worked.

Now you may be wondering what’s so special about these Elastra powered databases, surely EC2 is no different from any other Linux virtual machine, why not simply install a standard database? The problem with EC2, and it is a problem to those of us (i.e. practically every IT pro on the planet) who have come to expect highly reliable RAID backed disk storage, is the non-permanence of its disk systems.

When an EC2 instance is powered down or fails, the disk system is wiped!

That, combined with fixed (if generous) disk sizes (160GB, 850GB or 1690GB), means that often a clustered database environment is a necessity, adding considerably to the complexity. It’s this sort of complexity that SimpleDB and Elastra address.

The obvious use-case for both Elastra and SimpleDB is as data stores for OLTP applications, but Elastra's ability to handle S3-backed massive databases means the possibility of using EC2 as a data warehousing platform is also considerably strengthened. Although not obvious at first glance, SimpleDB could also act as an OLAP data store: SimpleDB's massively indexed tuples could serve as “sparse dimensions” pointing to S3 objects (SQLite databases?) that hold the fact data combined with dense/“partitioning” dimensions (e.g. Time). Possible? Yes. Fun to do? Yes. A solution that I can apply tomorrow? No. That's why I'm glad EnterpriseDB and Elastra are delivering such a boring product!

UPDATE EC2:

The other big EC2 gap, non-permanent IP addresses, has at last been addressed. EC2 now offers “Elastic IP Addresses”, addresses associated with an account, not an instance. If the instance fails or is shut down, the IP address can either be immediately re-assigned to a new instance (no more waiting for Dynamic DNS propagation) or “reserved” for future use at a cost of USD 0.01 per hour. Also, the new “multiple locations” facility puts the API changes in place to allow for location selection, hopefully a sign that we here in Europe will have “local” EC2 instances to match our European S3 buckets!

UPDATE EnterpriseDB:

It looks like IBM have invested in EnterpriseDB, possibly as a counter-weight against Sun’s acquisition of MySQL (EnterpriseDB’s targeting of Oracle’s customer base would also be an added benefit!).

The “perfect storm” of ubiquitous broadband, powerful and cheap laptops, virtual machines, cloud-based services/infrastructure and open-source software is changing the nature of IT in a way that's reminiscent of the revolution started by the IBM PC. Although a lot of emphasis has been put on the influence of consumer-focused services on the enterprise (the Web 2.0 effect), there's also traffic in the other direction.

Tools that were once the preserve of large multi-national or governmental organisations are now becoming available to a much larger audience at a fraction of the cost (either free via open source or pay-as-you-go via on-line services such as Amazon EC2 or salesforce.com).

As a result of this leakage from the enterprise, I'm increasingly using a skill that I thought I'd left behind in the hallowed halls of “big business”: my knowledge of enterprise Java. Two of the tools at the centre of my datasmithing arsenal, Talend ETL and WaveMaker, are Java to the core, and a third, Palo's new ETL Server, is built on a Java stack.

Talend

My first impressions of the “Java Project” version of Talend were, as they say in Texas, “all hat and no cattle”. I've stuck with it though, and have had the opportunity over the last few weeks to re-visit the product. Initially, my attitude to Talend was coloured by my experience with Kettle (aka Pentaho PDI), which under the direction of Matt Casters and the patronage of Pentaho has gone from strength to strength, but once I attuned myself to the idea that Talend is, in essence, a code generator, generating code in a language I know well, I became more comfortable with it.

What I like about Talend is the ability to convert an ETL process into either a POJO or a WAR file representation of the solution, both stand-alone and fixed-in-time. Talend as a company or as a product could disappear in the morning, as could I, but the solution, cast in Java, will continue on regardless: a solution expressed in a standards-based language, widely used and understood by a large number of IT professionals.

This is really important when you consider the ETL/BI products that have been swallowed by the rent-book collectors of the IT business, e.g. Essbase/Hyperion by Oracle, Business Objects by SAP (I might be a bit unfair to SAP, who continue to show real commitment to software R&D). The new owners of these once ground-breaking products will continue to milk the licence holders for “rent” long into the future, a situation that would be acceptable if they continued to offer value beyond a “protection” service of the kind more associated with your local hoodlum; but alas, most were bought to strengthen the purchaser's control of their market, so I wouldn't hold my breath.

I've always been a lazy programmer, so over the years I've developed numerous productivity aids to help automate development (i.e. reduce the boredom factor for me and the cost for my customers/employers) and to reduce errors (a cost to both sides). Code generation has been at the heart of many of these efforts, so I find myself at home with both Talend's and WaveMaker's approach.

WaveMaker

I'd not heard of Chris Keane's WaveMaker until a few weeks ago. WaveMaker (previously known as ActiveGrid) belongs, alongside Talend, Pentaho and Jedox, to a new breed of open-source businesses, and the product brings new life to Java web development. I was lucky to escape the worst of the J2EE nonsense and could never understand why an easy-to-use GUI builder like this never existed in the Java world; no wonder .NET continues to outgun J2EE in the marketplace.

Those of you with a background in VB6/VBA or Visual Studio will feel right at home here, but instead of desktop GUIs you'll be building AJAX web applications. The resulting application is packaged as a WAR file which can be hosted by any standards-based Java server (e.g. Tomcat or Jetty). It's open source and built on industry standards: Hibernate, Spring and the JavaScript Dojo framework.

Not only can WaveMaker act as a front-end to traditional databases, it’s designed to be equally at home with data served up by web services or POJOs. And, as Talend WAR projects and the Palo ETL Server (Jetty based) both expose Axis based web services, these three products are a match made in Java heaven.

Oracle Application Express

So, if you only came across WaveMaker recently, what did you intend to use as a Web/GUI front-end before this?

Well, for many tasks, Excel backed by web-service-aware VBA will continue to be an option, and my xlAWS library and xLite code base will continue to be useful. Obviously, Excel is a natural front-end to Palo itself, and VB6 is always there for quick and dirty Windows GUI apps. I may also use Proto if circumstances warrant it.

For web front-ends, in the past I toyed with using Oracle Apex, then Rails, then JotSpot, then Zimki.

  • Rails taught me a lot about “good web app design” and introduced me to Ruby (and SQLite) but it didn’t offer me the speed of development and ease of deployment I’ve become accustomed to.
  • JotSpot got swallowed up by Google and spat out again as Google Sites minus the innovative “wiki database” capability.
  • Zimki, alas, is no more.
  • That left me with Oracle’s impressive Application Express (aka APEX aka HTML DB).

Application Express is excellent if your background is in Oracle databases and/or Oracle Forms, and if you work in an Oracle shop and have not checked it out, then do so; you'll be impressed. It's standard with all versions since 10g and can be installed on V9 databases. It's also the front-end for Oracle XE, the free edition of the database server.

So why jump ship to WaveMaker?

APEX is Oracle-specific, closed source, costs a fortune once you outgrow Oracle XE, is a bit “odd” to configure, and I'm not sure Oracle know what to do with the product (afraid it might cannibalise its lucrative J2EE business!).

WaveMaker embraces the world, is open source and is really easy to use, while still allowing access to the underlying code (both Java and Javascript) and CSS styling. And, it can be easily deployed.

No contest, I’m afraid.

A Tale of Two Services.

Friday, last week, 15th Feb, two of the services I most depend on, failed. Now as it turned out, neither really concerned me at the time, as that same day my brother was taken seriously ill (he’s now doing fine and on the way to recovery). It’s only now I’ve had the time to think about the implications of these failures.

The first was my fixed wireless broadband provider, OmniTel (aka Callidus, aka Torque, aka IFA Telecom Wireless Broadband). Its signal was down yet again (the 4th time since Xmas, two of those for close on 7 days each!). Large areas of rural Ireland depend on providers such as Omnitel to supply them with what is now a basic service, and I think many would agree that the end-user experience is, to put it as charitably as possible, sub-optimal. I'll leave it to others to explain why we're in such a mess, but a mess it is.

For several reasons I'm not overly concerned about this, as the area I live in now has at least one alternative wireless provider (Irish Broadband, and I see two, no, three, no, four of my neighbours have changed over to them since last week!) and I'm also within 3km of an Eircom exchange, which means I have my trusty ISDN backup and will eventually (we're on the “list”) have access to ADSL. Now, ISDN is not a suitable alternative if your house is full of iTunes/YouTube-obsessed young adults or if you need to constantly download large amounts of data (e.g. 10MB plus), but for “normal business stuff” it's fine; I could live with it.

But isn't a datasmith's stock-in-trade large datasets? Well, yes and no. Many micro tasks such as data analysis tend to be carried out using Excel, which by its nature means you're dealing with relatively small datasets or sub-sets of large databases; neither requires significant bandwidth to download/upload. For larger datasets and more powerful ETL/analysis tasks I don't depend on my local machines; I use Amazon EC2/S3. In fact, most of my business and personal computing infrastructure is now “cloud” based, with my laptop reduced to the task of local cache/processor/communications device, similar to the role of my mobile phone, just with a bigger keyboard and screen!

Which brings me neatly to the other failure of Friday the 15th: Amazon's cloud services, EC2, S3, SQS and SimpleDB. As it turned out, it wasn't the services themselves that failed; rather, the AWS authentication infrastructure was subjected to what could be described as a “friendly/unintentional” DoS attack. Existing publicly accessible S3/SimpleDB resources were still accessible and EC2 instances continued to operate, but anything requiring authentication failed. It reminds me a bit of the early days of RAID storage systems: the “miracle” of striping and mirroring worked, but failures still happened due to faulty power supplies or controller sub-systems.

The major complaint first-timers have when coming to terms with EC2 is the lack of post-shutdown/failure persistence on the virtual machine's disks: data must be backed up to S3, otherwise it's gone in the event of an instance failure. I'm guessing that the “oddness” of this architecture is to do with its suitability for the purposes that Amazon originally designed it for, and having proved it in their day-to-day business over the last decade or so, they're sticking with it. Which is good; those of us who are now becoming dependent on this architecture want a robust and proven service. I suspect the authentication service is a new layer on the existing internal Amazon stack and is only now being stress-tested.

So it failed, and was fixed relatively quickly, but more importantly, Amazon acknowledged the problems (not just the reason for the failure itself, but the less than perfect way their users were kept informed during the outage) and I'm reasonably confident they've learned from their mistakes. (To return to my rant about my broadband provider: I think the most annoying thing when the service goes down is that the whole of Omnitel, help-line, accounts, even sales, refuse to answer the phone (no forum, no status page), leaving their customers to wonder whether they've gone out of business or are all hiding under their desks with their fingers in their ears shouting “Go away, go away”.)

As a side note, two other services I use had hiccups this week: WordPress.com was down for several hours on Wednesday (as a result of a DoS attack, I believe) and on Friday my Hamachi VPN service was down for an hour or so due to server resource problems.

So am I less confident in the viability of the “cloud” after this week of outages? No, I’m a believer in “risk management” rather than “risk avoidance”, as long as I’ve a “good enough” alternative (ISDN for broadband, standard Linux hosts for EC2) or a high degree of confidence in the supplier (Amazon S3 for backup and secure storage) I’m sticking with it. Not only that, I’m betting my career on it.

Update: Monday 25th

A bit windy today. You guessed it, broadband down again! So make that 5 times since Xmas. I see in their terms and conditions that Callidus (OmniTel's legal entity) promises 99% uptime within any month; that's a little over 7 hours of acceptable outages per month, if only! On the plus side, I was talking to one of my neighbours who'd recently changed over to Irish Broadband, and her experience with her new supplier was very positive. “Professionals, know what they're doing, excellent customer service” is how she described them.
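For the record, the “little over 7 hours” follows from the 1% of downtime that a 99% monthly uptime promise allows:

```latex
\[
(1 - 0.99) \times 30 \times 24\ \text{h} = 7.2\ \text{h}
\qquad\text{and}\qquad
(1 - 0.99) \times 31 \times 24\ \text{h} = 7.44\ \text{h}
\]
```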

Update: Wednesday, March 12th

Windy again last night; yep, gone again! Well, I'm assuming it's the wind; no reply at any of Omnitel's numbers. Maybe they've gone out of business!

Update: Weekend 28th-30th March

Omnitel down again Friday night (28th). My son says it came back at some stage during the weekend, but when I went to use it tonight (Sunday 30th) it was still not working. Left a text message on 087 2826671, their out-of-hours number (twice), but to no avail.

And this crowd were ..

… recently shortlisted in the Government’s National Broadband Scheme to provide broadband to the remaining areas currently unserved by broadband in the Republic of Ireland.

… and if they win, those areas will continue to be “unserved”!

Update: Monday 31st March

Service back up and running at 12 noon! More amazing, when I rang the help line this morning, there was a message acknowledging the problem (could it be true, Omnitel have started to invest in customer relations!). Mind you, should I let them in on that other secret of modern customer service, the “status blog”?

Simply set up a blog, e.g. http://omnitel.wordpress.com, and post network problem and resolution details alongside “good news” stories (e.g. network upgrades), and maybe even allow customer comments!

I know, too much to hope for.

Update: Sat 12th April

Down again since 6PMish, actually this is the 3rd weekend in a row, but the last two were “just” Sunday night/Monday morning outages (or extreme slowness as per last Sunday PM /Monday AM) so I didn’t report them.

Update: Monday 28th April 19:00

Keeping with the now well-established tradition of a weekend failure, the Omnitel network has been down since Saturday 15:00ish; it seems to be a major outage, still no sign of a return to “normal service”. Time to phone Irish Broadband, I think: Lo-Call 1890 56 44 56.

I've been using Amazon's S3 service from within Excel for some time now, and as there are no libraries or examples for calling AWS services from VBA (or VB6), I had to roll my own. As with most things Excel, getting the job done always triumphs over elegance and industrial-strength implementation; in other words, it was all a bit of a “dog's dinner”. To remedy this, and to share my experience of using S3 from within a VBA/VB6 environment, I decided to re-factor the code and assemble it into a more re-usable form; the end result is xlAWS.

It was going to be called xlS3, but while I was doing the exercise SimpleDB appeared on the scene, so I decided to try accessing it from Excel, particularly as both products have a lot in common: both are “simple”, “schema-less” data stores. Like the S3Helper code, the simpleDBHelper module is less a comprehensive library and more a collection of useful functions which (hopefully) make working with AWS a bit easier.

To use this code library, you’ll need a good grasp of the S3 and SimpleDB APIs and be reasonably proficient with VBA. This is not an end-user tool; it’s for VBA (or VB6) developers. There’s a README and some basic examples within the Excel VBA project to help you get started. The code is released “in the spirit” of the LGPL: you can use it how you wish, but if you add something new to the “library” (or find/fix a bug) do let the rest of us know.

As I’ve not been able to find a pure VBA implementation of the HMAC-SHA1 hash algorithm (and I couldn’t see an implementation within the standard “Microsoft Enhanced Cryptographic Provider”), I’ve wrapped the open source XySSL SHA1 HMAC C code in a VBA-friendly DLL. Because AWS authentication requires HMAC-SHA1 signatures, this DLL (and its source, under LGPL) is included in the zip file.
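For readers outside VBA, the signing step that the DLL makes possible can be sketched with Python’s standard library. This is a minimal sketch assuming the original (2008-era) S3 REST “signature version 2” scheme, which Amazon has since superseded with Signature Version 4; the function names are hypothetical and are not part of xlAWS.

```python
import base64
import hashlib
import hmac


def hmac_sha1_base64(secret_key: str, string_to_sign: str) -> str:
    """Base64(HMAC-SHA1(secret_key, string_to_sign)), the signature format
    used by the legacy S3 REST signature version 2 scheme."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def s3_string_to_sign(verb: str, content_md5: str, content_type: str,
                      date: str, amz_headers: str, resource: str) -> str:
    # Verb, Content-MD5, Content-Type and Date are newline-separated; empty
    # fields still keep their newline. amz_headers is the canonicalised
    # x-amz-* header block (each line ending in "\n"), or "" if there are none.
    return "\n".join([verb, content_md5, content_type, date]) + "\n" + amz_headers + resource


def s3_authorization_header(access_key: str, secret_key: str, string_to_sign: str) -> str:
    # Value for the HTTP Authorization header: "AWS <AccessKeyId>:<Signature>"
    return "AWS " + access_key + ":" + hmac_sha1_base64(secret_key, string_to_sign)
```

A plain GET of an object needs no MD5 or content type, so those lines stay empty in the string to sign; the HMAC helper itself can be checked against the published RFC 2202 test vectors.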

You’ll also obviously require an AWS account. Credentials are stored within the workbook’s custom properties and can be encrypted via a “key file” if required. If you intend to use this code within VB6 (or Proto) you’ll need to provide your own implementation of the AWSKeyData class in order to use a non-Excel persistence store.
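Because the signing code only needs a key ID and a secret, the persistence store can sit behind a small interface, which is the role AWSKeyData plays. A minimal Python sketch of that design, assuming nothing about the real AWSKeyData class: `KeyStore`, `EnvKeyStore` and `auth_prefix` are hypothetical names, and the environment variable names are simply the conventional AWS ones, chosen for illustration.

```python
import os
from abc import ABC, abstractmethod
from typing import Mapping, Optional


class KeyStore(ABC):
    """Hypothetical interface in the role of AWSKeyData: any object that
    can supply an AWS access key ID and secret key."""

    @abstractmethod
    def access_key_id(self) -> str:
        ...

    @abstractmethod
    def secret_key(self) -> str:
        ...


class EnvKeyStore(KeyStore):
    """A non-Excel persistence store: reads credentials from environment
    variables (or from any mapping passed in, which eases testing)."""

    def __init__(self, environ: Optional[Mapping[str, str]] = None):
        self._env = os.environ if environ is None else environ

    def access_key_id(self) -> str:
        return self._env["AWS_ACCESS_KEY_ID"]

    def secret_key(self) -> str:
        return self._env["AWS_SECRET_ACCESS_KEY"]


def auth_prefix(store: KeyStore) -> str:
    # Signing code depends only on the interface, never on where keys live.
    return "AWS " + store.access_key_id() + ":"
```

Swapping in a file-based or encrypted store then means writing one more `KeyStore` subclass, leaving the S3/SimpleDB calling code untouched.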

You can download the project ZIP file from here.

Have fun.

UPDATE

Another alternative for calculating HMAC-SHA1 signatures in VBA/VB6 is a COM DLL supplied by Google Checkout; see http://code.google.com/apis/checkout/samples/Google_Checkout_Sample_Code_ASP_InstallTxt.html

Over the weekend I dusted down my JotSpot Wiki, cleaned out some old Wiki pages and generally made it useful as a client collaboration tool. I created some new pages and a few “project diary” type blog entries to do with a proposal for work. I also set up a potential client as a contributor and sat back to reap the collaborative benefits of one of the finer Wiki tools out there.

Unfortunately, by Monday afternoon all was not well. The jot.com domain no longer pointed at JotSpot; instead it was “parked” at Network Solutions, a domain name registrar. This generally happens to domains when they’re not renewed or when the owner’s credit card company refuses to honour the renewal payment. If JotSpot were a two-guys-in-a-garret operation you could see how this might happen, but JotSpot is now owned by Google.

Google’s neglect of the product and its secrecy over future plans have been a major concern to the original service’s loyal (but, I would imagine, declining) user base, but yesterday that neglect hit a new low.

The problem was fixed relatively quickly, but due to DNS migration issues, 24 hours later, many users of the service are still locked out. That’s a problem, but hey, s**t happens. What’s really astounding is Google’s complete silence on the subject over on the JotSpot support forum.

It makes you wonder how much of your commercial, or indeed personal, data assets you should entrust to such an organisation. Big Brother may be watching you, but he’s not about to demean himself by actually communicating with you.

I’ve had this sort of problem with another Google Apps service in the past, and I’ve seen problems with Gmail similar to those experienced by Jeff Nolan. I’m about to launch my www.gobansaor.com business site and my intention was to host it under Google Apps (which, rumour has it, will soon incorporate some variation on JotSpot). My dilemma now is whether to forge ahead with my original plan to use Google Apps or to use a local Irish hosting service. Or maybe I should fork out the $50 fee for Google Apps Premier Edition with its “24/7 assistance, including phone support for critical issues”.

Decisions, decisions.

UPDATE:

Two days after the event, Google acknowledges the problem.

UPDATE: 28th Feb 2008

JotSpot is reborn as Google Sites.

Initial quick look: I like it. It keeps a lot of the simplicity of the pure Wiki side of JotSpot (the “structured Wiki” as an alternative to a database/“application builder” is no more). The integration with the rest of Google Docs is to be welcomed, if a bit limited at the moment (documents must first be published from within Google Docs and their URLs then “cut and pasted” into the Sites application).

The new Google Spreadsheet’s forms functionality should make up for the loss of the JotSpot database functionality, at least for me.  Having the ability to point a CNAME at the resulting wikis is also very useful for client project collaboration.

Dublin buses, as is the norm with most road-based public transport systems in our increasingly car-choked cities, tend to operate on the basis of “no sign of a bus for ages, then two or three arrive at the same time”. Palo MOLAP ETL options appear to be following the same pattern; we’ve been waiting for ETL support for ages and now we see three of them heading down the road towards us. There’s Palo’s own offering, then came Stratebi’s Kettle Plugin and now Talend Version 2.3.0RC2 is offering a Palo output component.

Mind you, the Talend offering is very basic and I’ve not managed to get the Stratebi plugin to work, leaving Palo’s ETL Server as the front runner at the moment (drill-through capability is a winner in my book).

I’ve also been busy re-factoring my VBA SQLite and Amazon S3 code with the intention of publishing them as an Excel-based micro-ETL platform. While cleaning up the Amazon AWS modules I’ve been playing with SimpleDB, and I’m impressed: Excel combined with SimpleDB rocks!

I’ve also wrapped the open source XySSL SHA1 HMAC C code in a VBA-friendly DLL, as searching for a VBA HMAC-SHA1 implementation (essential for Amazon AWS access) has proved fruitless.

I hope to release the lot by the end of next month.

UPDATE:

Thanks to Javier and Jorge from Stratebi, I’ve managed to get the new Kettle Palo plugin to work. It seems that the TEST facility in the Kettle database connection dialogue throws an exception for Palo connections, but the connections work fine in the actual Palo input/output steps. I did a quick test and it looks very easy to use and fits in well with the Kettle “way of doing things”.

PALO ETL-Server and SAP

Jedox have just published a roadmap for their open-source ETL-Server, with a release date of March 2008, the same date as the next release of the Palo OLAP Server. In a future release they intend to offer SAP RFC/BAPI and SAP-BW XMLA support; as an old SAP hand, I find this very interesting.

There’s also a features page with a good overview of the ETL-Server’s technical architecture.

What do you do if you’re a major player in the IT world and suddenly the internet’s equivalent of your local bookshop releases a mould-breaking cloud-based database service, SimpleDB, on top of its highly acclaimed document data store service, S3?

Well, if you’re IBM, you hire Damien Katz, the person behind CouchDB. I think 2008 could be the year that cloud-based database services really take off.

First day back after Christmas, snow falling outside.

More additions to the PALO ETL-Server SourceForge project: a new version of the core and a new web server, built using Jetty and Apache Axis. Axis is a SOAP handler, so I looked around for the WSDL file to see what services are to be exposed and found a reference to a drillThrough service, which I guess is the mechanism by which we’ll soon be able to drill back from a PALO cube to its source data tables. At the moment I’m just wandering through the source code; I’ll need to fire up an EC2 instance to give it a test run, as the new server code doesn’t fully support Windows.

Happy New Year!

Nollaig Shona Daoibh

It’s December 21st, the shortest day of the year; either the middle of winter or the start, depending on your view (here in Ireland, winter really only kicks in from mid-December onwards). The lucky annual lottery winners had a fantastic clear frosty morning to witness the solstice dawn in Newgrange, but the view I experienced from the hill fort in Glending, Co. Wicklow this morning was just as rewarding.

Merry Christmas to you all.

Update: My sister (the teacher) informs me that the modern spelling for Nollaig Shona Dhaoibh (i.e. Merry Christmas to you (ye)) is Nollaig Shona Daoibh, well you learn something new every day!

The WAN is the new LAN

While discussing SimpleDB, Nick Carr points to the polar opposite views that the two computing behemoths, Google and Microsoft, hold as to the future direction of cloud computing. Google’s Schmidt sees an eventual 90/10 split, with the cloud being home to most data and processes, while, as expected, Microsoft’s Raikes points to the current reality and insists that the trend will continue to favour a PC-centric view.

I’m not sure who’s right, but my instinct (or is that my prejudice?) leans towards the Google view. But one thing I am sure of is that, as the cloud (aka the Internet) and “personal computing devices” (aka desktops, laptops, PDAs, mobile phones) fight it out for dominance, the future of the business LAN as the prime computing backbone is looking increasingly untenable. For SMEs and consumers at least, the WAN (in the form of the Internet) is the new LAN.

Not that LANs will disappear totally: the need to provide local wireless access, the address limitations of IPv4 and the need to share printers etc. will see to that (at least in the short term; mobile 3G networks, IPv6 and services such as PrinterAnywhere may eventually address these issues). Also, the LAN’s ability to act as a local cache for backups and data access will ensure its continued existence, at least until Korean levels of broadband speed/availability become the norm in the rest of the developed world.

But what about shared private data, email/calendar, backups, security and, last but not least, business applications: the big five “business” reasons behind most organisations’ (and some families’) LAN setups?

Shared Private Data

Fast ubiquitous broadband and online data stores such as S3, SimpleDB, Microsoft Live Workspace and eventually GDrive will mean that for many small and medium companies the cost of maintaining in-house data servers will no longer make economic sense. Even large organisations, who have in many cases already outsourced their data centres to the likes of IBM and are already operating VPNs over private and public WANs, may also move parts of their data infrastructure to the internet cloud. Added-value online storage services such as those provided by Google Docs and Spreadsheets will also drive individuals and organisations in this direction.

Email / Shared Calendars

One word: Google Apps. Okay, that’s two words, and a bit simplistic, but GMail and Google Calendar, and particularly the premium Google Apps versions, represent the future shape of business communication systems. Add in Wiki-like collaborative tools such as Google Docs and Spreadsheets (and the long-awaited Googlified JotSpot) and suddenly the idea of any SME running its own Exchange servers becomes harder to justify.

Data Backups

Even in current setups, an effective backup policy requires that data be moved off-site, so online backup services are a natural progression. In essence the LAN is working as a local cache to quickly assemble the backup and prepare it for transportation to another location (the boss’s home study most likely!). Online backup will probably be the first cloud service that businesses adopt. But as transactional data increasingly gets recorded off-site, most of an organisation’s data will already be “backed up”, so future backup services will be of the intra-cloud, belt’n’braces type, e.g. a service that makes encrypted copies of your data stored on one service and either stores them in another online location or maybe burns the data to DVD and deposits it in a physically secure store.

Security

LANs are seen as the modern data equivalent of a medieval town, with the firewall playing the role of the town fortifications. But just as increased mobility, collaboration and newer technology put an end to the justification and utility of walled towns, a similar fate awaits the firewalled LAN.

The explosion in the number of workers (especially knowledge workers, free agents and senior executives) operating outside the local network means that companies must already address data security in the context of public networks. VPNs can of course bring the LAN environment to the mobile worker (even a home/tiny business can use something like Hamachi VPN). But VPNs will not so much extend the LAN as replace it, increasingly being used as “private pipes” between trusted peers and cloud servers.

For example, I use Hamachi to communicate with my EC2 instances and to transfer data between my laptop and my main desktop PC; something I can do securely and effortlessly from my laptop using any private or public network. As such, the firewall that really keeps my data secure is the one on my laptop not the one built into my LAN router.

You might look at the recent spate of data losses as evidence that companies should batten down the hatches and throw away the key, but I’d argue that it’s a failure to face up to and manage the risks (and opportunities) of mobile data that has caused most, if not all, of these breaches. The first step is to treat the “Wifi-enabled, easily-stolen laptop connected to a dodgy airport public network” as the “standard” against which your firm’s (and family’s) data security will be judged and eventually tested.

Applications

For many small businesses, the business applications they use tend to be either single-user packaged apps or, even more likely, Excel. A shareable cloud-based data store is all they require to abandon their LAN. But for businesses that rely on sophisticated multi-user systems, replacing in-house servers will be more difficult. There are three options as I see it:

  • Keep servers in-house but purchase or lease them as pre-configured “black boxes”. When a new version or bug fix is required, the vendor remotely updates the software; no on-site technical expertise required. Likewise, the vendor remotely monitors the hardware and slots in a new pre-configured box as required. You may argue that the LAN remains and yes it does, but this sort of setup would only be required where high-speed and reliable broadband is not yet available or where any interruption in server connection is not an option.
  • Use remote pay-as-you-go, invoke-as-you-need virtual servers such as Amazon’s EC2 or Scotland’s Flexiscale. Again, pre-configured virtual machines, purchased or leased from software vendors, would remove the need for in-house server or application expertise.
  • And finally, the ideal for most companies, SaaS, Software as a Service, pioneered by Salesforce.com and now starting to gain traction across not just CRM, but accounting, and even full scale ERP. Even the mighty Sage is starting to feel the winds of change! Very small businesses are also well catered for, e.g. FreeAgentCentral for UK based freelancers.

Times they are a-changin’. Migration of some or all data to the internet cloud is inevitable: large organisations will most likely build their own cloud, while smaller businesses will need to adapt to the cloud-as-a-service model. Organisations need to start thinking about it now, as all future IT investments need to factor this phenomenon in, even if the reaction is to reject it!

I’m a database man. I’ve worked with most variations on the theme, from roll-your-own flat files, to hierarchical, to CODASYL network databases, to the current crop of relational and MOLAP platforms. Of late, I’ve been investigating what I think will be the future of database technology, the distributed document-centric database. Today, the future arrived in the form of Amazon’s new SimpleDB service.

Up until now Amazon’s S3 service offered one half of the future platform, the “distributed document-centric” bit, but it lacked the indexed structure needed to make it a true database; in combination with SimpleDB, the platform is now complete.
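The division of labour described above (bulky documents in S3, small searchable attributes in SimpleDB, each index entry pointing back at its document) can be sketched with two in-memory stand-ins. This is a minimal sketch: the classes, the `s3key` pointer attribute and the single-attribute equality query are hypothetical simplifications, not either service’s real API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentStore:
    """Stand-in for S3: opaque blobs addressed by key."""
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

    def get(self, key: str) -> bytes:
        return self.blobs[key]


@dataclass
class IndexStore:
    """Stand-in for a SimpleDB domain: item name -> attribute dict."""
    items: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def put_attributes(self, item: str, attrs: Dict[str, str]) -> None:
        self.items.setdefault(item, {}).update(attrs)

    def query(self, attr: str, value: str) -> List[str]:
        # Item names whose attribute equals value, in sorted order.
        return sorted(n for n, a in self.items.items() if a.get(attr) == value)


def save_document(docs: DocumentStore, index: IndexStore, key: str,
                  data: bytes, attrs: Dict[str, str]) -> None:
    # The blob goes to the document store; the searchable metadata, plus
    # a pointer back to the blob, goes to the index.
    docs.put(key, data)
    index.put_attributes(key, {**attrs, "s3key": key})


def find_documents(docs: DocumentStore, index: IndexStore,
                   attr: str, value: str) -> List[bytes]:
    # Query the index first, then fetch each matching document.
    return [docs.get(index.items[n]["s3key"]) for n in index.query(attr, value)]
```

The index answers “which documents?” cheaply, and the document store is only touched for the ones you actually need.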

SimpleDB stores data in a Domain/Attribute schema-less and type-less structure, having more in common with a spreadsheet than with a traditional relational table. If you’ve worked with the likes of SQLite (manifest typing) or Excel (no predefined schema and manifest typing), you’ll appreciate that this is no hardship, quite the opposite in fact (having recently worked on a combined SQLite and Excel project, I find the strong typing of most databases a real pain).
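One practical consequence of “type-less” is that every value is a string, so numbers need care: compared as text, "950" sorts after "10200". The workaround generally recommended for SimpleDB is zero-padding, plus an offset for negative numbers. A minimal Python sketch using a plain dict as a stand-in for a domain; the helper names and the 10-digit width are assumptions for illustration.

```python
from typing import List


def encode_int(n: int, width: int = 10, offset: int = 0) -> str:
    """Zero-pad (n + offset) so that string order matches numeric order.
    The offset lets negative numbers be stored as non-negative strings."""
    shifted = n + offset
    if shifted < 0 or len(str(shifted)) > width:
        raise ValueError("value out of range for this width/offset")
    return str(shifted).zfill(width)


def decode_int(s: str, offset: int = 0) -> int:
    return int(s) - offset


# Schema-less: items in one domain need not share attributes, an attribute
# may hold several values, and every value is a string.
domain = {
    "item1": {"name": ["Widget"], "price": [encode_int(950)]},
    "item2": {"name": ["Gadget"], "price": [encode_int(10200)],
              "colour": ["red", "blue"]},
}


def items_priced_above(threshold: int) -> List[str]:
    # A plain string comparison, as a type-less store would perform it.
    limit = encode_int(threshold)
    return sorted(k for k, attrs in domain.items()
                  if any(p > limit for p in attrs.get("price", [])))
```

The width and offset must be fixed per attribute up front, which is the one piece of “schema” a type-less store quietly pushes back onto the application.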

The distributed nature of SimpleDB may, however, pose some difficulty to those of us (i.e. almost everybody) raised in the world of ACID-compliant databases. Because of Brewer’s Conjecture (the CAP theorem), SimpleDB sacrifices consistency for availability and partition tolerance: when you write something to the database, an immediate query may not return the updated value. Subsequent queries will eventually return the new data; exactly when depends on the load and the availability of resources. Those of you using S3 will already be living with this “feature”, and in practice you rarely notice it (most updates seem to appear immediately), but it will still pose design challenges in handling the edge cases.
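One common way to handle those edge cases at the application level is to re-read after a write, pausing a little longer between attempts, until the new value appears or you give up. A minimal sketch, assuming a generic `read` callable rather than any real SimpleDB client; `read_until` and the toy `LaggingStore` are hypothetical, and `sleep` is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


def read_until(read: Callable[[], Optional[str]], expected: str,
               attempts: int = 5, base_delay: float = 0.1,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll an eventually consistent store until `expected` is visible.
    Returns True if it was seen within `attempts` reads, else False."""
    delay = base_delay
    for attempt in range(attempts):
        if read() == expected:
            return True
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2  # exponential backoff between reads
    return False


class LaggingStore:
    """Toy store: after a write, the first `lag` reads still return the
    previous value, mimicking replicas that have not caught up yet."""

    def __init__(self, lag: int):
        self.lag = lag
        self.value: Optional[str] = None
        self.old: Optional[str] = None
        self.reads_since_write = 0

    def write(self, value: str) -> None:
        self.old, self.value = self.value, value
        self.reads_since_write = 0

    def read(self) -> Optional[str]:
        self.reads_since_write += 1
        return self.old if self.reads_since_write <= self.lag else self.value
```

A bounded number of attempts matters: under heavy load the new value may take longer than you are willing to wait, and the caller then has to decide what “not yet visible” means for its own logic.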

The service is still in limited beta, but the documentation is available, and if you’ve already used any other AWS product you’ll immediately feel at home. Pricing is again based on usage. Storage costs much more than S3, at $1.50 per GB-month, but a GB of structured data is an awful lot of data (and the larger document-style storage would be provided by S3).
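To put $1.50 per GB-month in perspective, a quick back-of-the-envelope calculation helps. This sketch covers storage only and assumes a simple items × average-size estimate; it ignores any per-item overhead AWS may add, as well as machine-usage and data-transfer charges, and the function names are hypothetical.

```python
SIMPLEDB_STORAGE_PER_GB_MONTH = 1.50  # USD, the beta storage price quoted above


def estimate_gb(item_count: int, avg_item_bytes: int) -> float:
    """Rough data volume in GB (1 GB = 1024**3 bytes)."""
    return item_count * avg_item_bytes / 1024 ** 3


def storage_cost(gb: float, months: float,
                 rate: float = SIMPLEDB_STORAGE_PER_GB_MONTH) -> float:
    """Storage-only cost in USD: volume x duration x monthly rate."""
    return gb * months * rate
```

For example, 250 MB (0.25 GB) held for a year comes to 0.25 × 12 × $1.50 = $4.50, and it takes roughly a million 1 KB items to fill a single GB.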

If you’ve not yet tried out either S3 or EC2, now might be a good time to start: cloud computing has come down to earth, all thanks to an online bookstore, Amazon!

Although Zimki is to shut down on Christmas Eve, the ideas behind the service live on. Two new offerings, Heroku and AppJet, offer variations on the idea of hosted application development/deployment.

AppJet, funded by Paul Graham’s Y Combinator, is very similar to Zimki, being a server-side JavaScript platform. There are no details yet as to what sort of paid options will be offered (all accounts are free at the moment). Unlike Zimki, there are no plans to create an open-source version. I like the easy “build a Facebook app” feature; I guess this is the sort of lightweight application they hope to attract.

Although Heroku uses Ruby-on-Rails technology rather than JavaScript, it is closer to the original Zimki idea; but rather than take the hard (and, in Zimki’s case, ultimately unsuccessful) road of building an open-source platform from scratch, Heroku takes an already popular open-source project and offers it wrapped in a full online development and deployment environment. Again, being in beta, there’s no indication as to what pricing model it will operate under, but I would think that it will attract more “serious” projects than AppJet, since anything developed under Heroku is pure Rails, which means it can be migrated to any other Rails hosting environment; so, no lock-in. The online editor is excellent and, whatever about its merits as a hosting service, it’s by far the easiest way to learn and explore Ruby and Rails, even easier than this…

If Facebook apps are your goal but you wish to use Ruby rather than AppJet’s JavaScript, then don’t panic: this being Ruby, some bright young spark (no, not me, I’m afraid) will already have done a lot of the hard graft for you…

I was wrong. I figured Jedox would build their new ETL server on one of the existing open source ETL project code-bases, either Talend or Pentaho’s Kettle. Instead, the new alpha ETL server code which has just been uploaded to SourceForge is based on neither and appears to have been developed by another German company, Proclos.

Rather than a full-featured, all-things-to-all-men ETL tool, it’s a specialist MOLAP cube import tool, like an XML-driven version of IMPPalo. Being Java-based, it should be easy enough to combine with Kettle to offer the best of both worlds: let Kettle do the heavy lifting and the management of conformed dimensions and fact tables, then use Palo ETL-Server to build the hierarchies and load the cubes from these tables.
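The “build the hierarchies” step essentially turns flat dimension rows, as Kettle would leave them in a table, into the parent/child consolidations that a Palo dimension holds. A minimal sketch of that transformation, independent of the Palo ETL-Server’s own XML configuration; `build_hierarchy` and the column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Sequence


def build_hierarchy(rows: Iterable[Mapping[str, str]],
                    levels: Sequence[str]) -> Dict[str, List[str]]:
    """Turn flat dimension rows (one per leaf) into parent -> children edges.
    `levels` lists the column names from the top of the hierarchy down."""
    children = defaultdict(set)
    for row in rows:
        path = [row[level] for level in levels]
        # Each adjacent pair on the path is a parent/child consolidation.
        for parent, child in zip(path, path[1:]):
            children[parent].add(child)
    return {parent: sorted(kids) for parent, kids in children.items()}
```

Note that this keys elements by name alone, so a member name repeated under two parents (say, “Other” in two regions) would be merged; real dimension tables usually need unique element names or a prefixing convention to avoid that.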

There’s no documentation as yet, but there are two demo projects: importRelDB.xml, which loads data into a cube from an HSQLDB in-memory database and a CSV file, and importOLAP.xml, which copies data from one Palo cube to another.

To run the importRelDB.xml project …

java -jar importer.jar -p importRelDB

… each project is broken up into Jobs (such as Initdata, MasterData, CubeData, again like IMPPalo) and these can be run separately by using the -j option.
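If you want to script those runs, for instance to re-run only the cube load after tweaking the XML, a small wrapper can build the command line per job. A minimal sketch: passing `-j` together with `-p` is an assumption based on the description above, not documented behaviour, and `importer_command`/`run_jobs` are hypothetical helpers.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def importer_command(project: str, job: Optional[str] = None,
                     jar: str = "importer.jar") -> List[str]:
    """Build the importer command line. -p selects the project, as in the
    demo; adding -j alongside -p to pick a single job is an assumption."""
    cmd = ["java", "-jar", jar, "-p", project]
    if job is not None:
        cmd += ["-j", job]
    return cmd


def run_jobs(project: str, jobs: Sequence[str],
             runner: Callable[..., object] = subprocess.run) -> None:
    # Run the jobs in order; with check=True, subprocess.run raises
    # CalledProcessError on a non-zero exit, stopping the sequence.
    for job in jobs:
        runner(importer_command(project, job), check=True)
```

For example, `run_jobs("importRelDB", ["Initdata", "MasterData", "CubeData"])` would run the three jobs named above one after another, stopping at the first failure.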

The tool is controlled via XML configuration files and lacks a GUI (which is fine by me; I’m more of a command-line guy). I’ve checked out the SVN code and am slowly working my way through it. There’s no sign as yet as to how drill-back from PALO cubes will be enabled; as this project is called Importer ETLCore, perhaps that’s yet to come.

So far, I like what I see.
