Category Archives: Web2.0

Spending time on Excel-SQLite, C, VBA Callbacks & Twitter

Haven’t posted here in a while as my spare time has been soaked up programming; well, actually, refactoring would be more exact.  My xLite “SQLite empowered Excel” codebase has grown over the years and required a serious makeover to get rid of stuff I no longer use and to generally make it more robust.  I also decided to add some extra functionality to my VBA friendly C wrapper for SQLite (based on Pivotal Solutions’ pssqlite.dll), which meant I had to re-acquaint myself with my long lost C skills; doing so reminded me how much I like C. Close to the metal programming, if not exactly super-productive, is nevertheless super-powerful.

The new improved xLiteSQLite.dll now has a built-in CSV loader (both file based and string based – handy for loading Palo HTTP API responses into a table). It also returns a one columned variant array of CSV values for quick rendering via “text-to-columns” code (by far the quickest way of handling large dataset pasting into Excel).
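To give a flavour of what a string-based CSV loader does, here is a minimal sketch in Python using the standard library's sqlite3 and csv modules. It is not the xLiteSQLite.dll code (that is C called from VBA); the function name load_csv_text, the table name and the sample data are all made up for illustration.

```python
import csv
import io
import sqlite3


def load_csv_text(conn, table, csv_text, delimiter=","):
    """Create `table` from CSV text (first row is the header); return the row count."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    # Quote column names so awkward headers survive; no column types are
    # declared, which SQLite permits.
    columns = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip blank lines
    with conn:  # commit on success, roll back on error
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
sample = "product,units\nwidgets,10\ngadgets,4\n"  # hypothetical data
loaded = load_csv_text(conn, "sales", sample)
total_units = conn.execute("SELECT SUM(units) FROM sales").fetchone()[0]
print(loaded, total_units)
```

The same function would handle a file by passing in the file's text; the delimiter argument is there because not every response format uses commas.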

I’ve also added the ability to create SQLite UDFs (user defined functions) in VBA (thanks to http://stackoverflow.com/users/4007/rpetrich).  This is a very powerful feature as it allows SQLite selects to act as a “loop controller”, calling back to Excel/VBA functions to process each row; really useful for ETL tasks. And not just scalar UDFs but aggregating (aka group-by) functions too, allowing the use of Excel’s powerful array functions in SQLite statements.
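The same idea, a SELECT driving callbacks into host-language code, can be sketched with Python's built-in sqlite3 module, which exposes SQLite's UDF hooks as create_function and create_aggregate. This is an analogy rather than the xLite VBA interface; the function with_vat, the class Spread and the orders table are invented for the example.

```python
import sqlite3


def with_vat(net, rate):
    """Scalar UDF: SQLite calls this once for every row the SELECT touches."""
    return round(net * (1 + rate), 2)


class Spread:
    """Aggregate UDF: max minus min of the values in each group."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row in the group.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Called once at the end of each group.
        return max(self.values) - min(self.values) if self.values else None


conn = sqlite3.connect(":memory:")
conn.create_function("with_vat", 2, with_vat)
conn.create_aggregate("spread", 1, Spread)
conn.execute("CREATE TABLE orders (region TEXT, net REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100.0), ("east", 250.0), ("west", 80.0)],
)

gross = conn.execute("SELECT with_vat(net, 0.2) FROM orders ORDER BY net").fetchall()
spreads = dict(conn.execute("SELECT region, spread(net) FROM orders GROUP BY region"))
print(gross, spreads)
```

The SELECT is the loop; the host-language function is the loop body, which is what makes this pattern handy for row-by-row ETL work.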

All in all, the changes to the xLite VBA code and the C wrapper make Excel backed by SQLite a seriously good micro-ETL tool. Combined with Palo, the result is a truly wonderful micro-BI platform; a cost-effective toolset for these recessionary times.

Of course I’d be lying if I said code was the only reason I’ve been neglecting my blogging duties. I’m afraid I’ve a confession to make: Twitter has hooked yet another sucker, me!

I’ve found I’ve settled in to the whole micro-blogging thing with ease, and have managed to make contact with people I would not have encountered otherwise, as well as reconnecting with others that I’d lost contact with.  So if you too are all-a-twitter then do please follow gobansaor-on-twitter.

Windows on EC2 = SMEs on EC2

The announcement that Win2003 is now an option on EC2 is very significant; that, and EC2’s exit from beta status with an SLA in tow, means that AWS is now very much more appealing to the great unwashed, the SMEs, i.e. the businesses who form the backbone of most of our economies.

Large companies and start-ups are comfortable in the world of Linux servers but most small companies are Windows to the core.  This may not be “right”, this may not be how it “should be”, but it is so.   Even within large companies, departmental computing is largely a Windows only enclave, with MS Office (and Excel in particular) as the backbone and MS SQL Server as the database of choice (or is that, no choice).

The other interesting thing is that my fear that EC2 SQL Server Standard instances would be licensed as per Oracle has not come to pass (Oracle, while making a “big thing” of their recent EC2 cloud conversion, still insist on traditional licensing for EC2 database instances). SQL Server Standard is available on a pay-as-you-go model, brilliant!

Even if running Win2003 as a server doesn’t catch your fancy, it may still be of use. Say you would much rather get rid of your existing Windows laptop and replace it with a cool new Apple Mac, but you still need the ability to run Windows-only software: why not use EC2 as your on-demand pay-as-you-go Windows desktop replacement?  Simply configure a Windows AMI with your required software (you may have to use something like this, if software is only available on CD); you could then use Jungle Disk to easily share data (via S3) between your shiny new Mac and the AMI.  Power up and down as required; easier than using VMware or Parallels and, at 12.5c per hour, probably cheaper too.

Clouds no longer pass by Windows.

Amazon today announced that later this year, Windows Server would be available on EC2. No details on cost and licensing etc. but this is major.  Up until now, that portion of the business world who are pure MS shops (a very large percentage especially amongst SMEs) were excluded from taking advantage of Amazon’s amazing (and getting more amazing every day) EC2 platform.

From my point of view, as with Oracle’s announcement last week, this releases yet more of my “legacy” skillset for deployment in the clouds. Although I’ve been involved with  *nix servers for 20 years or so, as corporate servers became more locked-down (and removed to the control of 3rd party data centres) I lost day-to-day experience of using them; in latter years my main ‘hands-on’ platform was Windows, either my own PC or local departmental NT servers. Windows on EC2 will allow me to use a whole new set of Windows only software (e.g. RSSBus or XLsgen) and of course SQLServer.

The lack of SQLServer on EC2 has been a major problem for me as a datasmith; there’s an awful lot of data out there sitting in SQLServer databases, but currently if I need to “cloud burst” such datasets I would have to first extract the data to, say, csv files and then load the data on to a Linux compatible database. But with a SQLServer instance running in the cloud, I could simply use SQLServer’s native backup/replication tools.  No more need to download data to my “ground-based” PCs resulting in quicker turnaround and fewer data security risks.

On the licensing front, I’m presuming that the OS licence will be on a pay-as-you-go basis, but what about SQLServer and other server products?  Will MS do an Oracle on it, i.e. require a traditional upfront use-it-or-lose-it payment, or will they go the radical (but I think inevitable) path of a licence-by-the-hour?

First RedHat, then Sun, then Oracle and now Microsoft; the mighty beasts of our industry have acknowledged there’s a new mighty beast on the prowl, dressed as a humble bookseller no less!

Twitter – the penny drops!

I’m a fan of most things Web2.0, not just for personal use but as business tools.  Over the last four years or so I’ve enthusiastically embraced Wikis, IM (Google Talk), RSS Readers et al. I could see the benefit and attraction of social network sites such as Facebook even if I’ve not partaken as such. Heck, I’ve even joined the ranks of “those who blog”.

But one aspect of this Web 2.0 stuff that had until now not really grabbed me as particularly useful is micro-blogging i.e. Twitter, Jaiku etc.

This morning two things I read brought home to me the benefits of this technology, particularly in a business environment; the penny had dropped!

The first was this post, “Ambient Awareness – The Cloud Killer-App”, where this caught my attention …

To me, this is the essences of situational awareness. An ability to sense and understand your environment and the actions of others in that environment. Clive goes on to explain that sociologists have found that “weak ties”, such as those created by twittering, greatly expands an individual’s ability to solve problems.

Then I read that the winner of the top prize at TechCrunch50 is Yammer, yet another Twitter lookalike, but this time with a difference; it’s designed to allow communication only between those within the same organisation.

Now that could be very useful, especially for organisations with a dispersed workforce or comprised mainly of teleworkers.  Such a tool could act not just as a means of keeping people in touch and aware of the general happenings within a company but could also be used as a “lite command and control” tool where messages are used as a replacement for time-sheets and progress/activity reports.

As email was (and still is) the “internet as a wide-area-network” killer-app, micro-blogging may very well be the killer-app of the “always-connected internet”.

And in the spirit of sharing that is Web2.0, here’s some other things I discovered this week…

  • OutWit, a very useful Firefox extension if you need to automate the “harvesting” of data (tables, lists, photos, mp3s) from the web.
  • xlUnit – a unit testing framework for Excel VBA, now that’s something I could do with, OK it’s not quite there yet, but you can follow this Grumpy Old Programmer as he rolls it out.
  • Reverse Snowflake Joins Online, if you have a nasty bit of SQL that you need to visualise in a graphical format, then this online version of Alexandru Toth’s open source Python tool may be just what you need.
  • Quantivo, customer behaviour analytics in the cloud. If you’ve lots of sales data, but no in-house datawarehouse.
  • And if you’ve no sales data because you’ve no sales, then check-out Sales 101.

Cloudy skies, cloudy apps…

Just back from a break in Clifden, Connemara, summer is nearly over, the kids return to school today, back to work.

Aasleagh Falls, Co. Mayo

Counties Galway and Mayo were like the rest of the country last week, a tad wet, but unlike the developed east of the island, flooding was not a problem; a problematic drainage area is called a lake in the west.

This August has been the wettest and dullest I’ve ever experienced, but at least I saw some sunshine earlier in the month thanks to Kristian Raue, CEO of Jedox, who kindly invited me to visit the company’s offices in Freiburg, Germany.  Freiburg is very green in both senses of the word, surrounded as it is by the Black Forest and with its well deserved “eco-city” status.  It’s also known as the warmest city in Germany, a reputation it thankfully lived up to for this visitor from a rain-soaked Atlantic isle.

August morning, Freiburg im Breisgau

If Freiburg left a positive impression on my mind, so too did Jedox.  The overall impression is of a company which intends to use a combination of quality, vision and the judicious use of open-source to build the Jedox brand into one associated with best-of-breed products and consultancy.  This vision can be seen in the evolution of Palo, from its “good enough” beginnings to its current near-best-of-breed 2.5 version, and from talking to some of those working on the product, best-of-breed status is not that far off.

Likewise, ETL-Server, which is currently a Palo only “loader”, is to be further developed into a true ETL tool, while continuing to offer MOLAP-centric specialisms.

I also got a glimpse of the next version of Worksheet Server. “Wow!”, is all I can say.

Existing web based spreadsheet products are fine for simple data analysis or basic data capture purposes but cannot compete with their client-based elder cousins when serious datasmithing is required.  Well, from the demo I saw of Worksheet Server in action, that’s about to change.  The look and, more importantly, the feel is similar to that of traditional spreadsheets, its interface with Palo is identical to that of the existing Excel add-in and, here’s the big one, it’s open source!  Game-changing or what?

But …

That might enable me to move a lot of my spreadsheet applications to the cloud, but what about those applications that are more suited to an MS Access type solution?

Then try out WaveMaker. It’s open source and built on industry standards (Hibernate, Spring and the JavaScript Dojo framework) but has the ease of GUI database development more usually associated with MS tools. The resulting applications are packaged as a WAR file which can be hosted by any standards based Java server (e.g. Tomcat or Jetty).  The latest version makes developing Ajax-fronted database applications even easier with the addition of layout templates.  Its existing ability to automatically bind interfaces to SOAP web services has been extended to REST web services by means of a new WSDL auto-discover tool.  And Chris Keene, CEO of WaveMaker, also informs me that …

We are also releasing a cloud-based IDE in October with Amazon – stay tuned…

We launched in February and will be announcing our first 7 figure deal this month. We run on Mac, Linux and Windows and are currently the #1 developer download on Apple.com (http://www.apple.com/downloads/macosx/development_tools/)

Our goal is to make it easy to build rich internet applications without complex coding – kind of a MS Access for the Web.

Jedox and WaveMaker: the new breed of open-source businesses.

Amazon’s SAN in the cloud is a mirage…

This morning I got very excited.  While quickly scanning the headlines of the 1000+ unread feeds that had accumulated in my Google Reader this week, one heading in particular caught my attention, “Amazon Elastic Block Store goes live!“.

The post from the RightScale folks gives a detailed overview of the new Amazon ‘SAN storage in the cloud’ service, aka Elastic Block Store, aka EBS.  Alas, this particular cloud offering was a mirage: the post was subsequently removed (but can still be viewed on Robert Scoble’s Shared Items). It seems the post was a work-in-progress and not intended for publishing, yet!

Why was I so excited?  Amazon EC2 had two major shortcomings when it launched 2 or so years ago; the first, ephemeral IP addresses, was solved by the new Elastic IP feature; the second, ephemeral storage volumes (when you shutdown an instance the disks are wiped!) is due to be solved by EBS.  With both of these problems solved, EC2, already near perfect, would be perfect.

The article does a good job of explaining the new service…

EBS starts out really simple: you create a volume from 1GB to 1TB in size and then you mount it on a device on an instance, format it, and off you go. Later you can detach it, let it sit for a while, and then reattach it to a different instance. You can also snapshot the volume at anytime to S3, and if you want to restore your snapshot you can create a fresh volume from the snapshot.

The thing that caught my eye in the above paragraph was the snapshot facility.  Snapshots are to be stored on S3 via an EC2-specific incremental-snapshot API.  This means the volumes will come with a built-in back-up facility. This is important as EBS drives reside in one availability zone (that of the instance that they are mounted against) and do not have the data replication security offered by S3.  It also means that disk systems can be restored quickly and simply from snapshots without the overhead  (and bugs!) of writing an S3 specific incremental backup and restore utility.

Back to waiting…

UPDATE: 20th August

Wait over…

Python the new VBA ?

These last two weeks, Python has been on my mind. First off, last week I decided to make time to fully investigate Picalo, an open-source Python-based data analysis tool, and then, this week, Google announced their long awaited cloud-computing offering, Google Apps Engine, with the language at its core.

Python was the first of the “LAMP generation” scripting languages that I decided to learn in any detail (I had used Perl before that, but only on a per-task basis, similar to how I’d used AWK). I then invested time in learning PHP, then Ruby and finally JavaScript. And here I am, back where I started, with Python.

But it’s not the same Python I learned three years ago, not that it has changed that much, but my appreciation of the language has, largely due to my deep dives into other languages. For example, JavaScript’s treatment of functions as first-class objects, highlighted the same functionality in Python, something I’d missed (or rather, not fully understood) the first time I encountered the language. Likewise, Ruby’s RoR introduced me to a “best of breed” approach to web application design, something that can be used as a comparison aid when approaching new web frameworks such as Django.
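For anyone who missed it first time round, as I did, here is a small sketch of what “functions as first-class objects” means in Python: functions can be passed as arguments, returned from other functions and stored in variables like any other value. The function names and sample values below are invented for the illustration.

```python
def strip_currency(text):
    """Turn a string such as '€1,250.50' into a float."""
    return float(text.replace("€", "").replace(",", ""))


def make_scaler(factor):
    """Return a brand new function that multiplies its argument by `factor`."""
    def scale(value):
        return value * factor
    return scale


def transform(rows, *steps):
    """Apply each step function, in order, to every row."""
    for step in steps:
        rows = [step(row) for row in rows]
    return rows


to_cents = make_scaler(100)  # a function stored in a variable
cleaned = transform(["€1,250.50", "€980.00"], strip_currency, to_cents)
print(cleaned)
```

Here transform neither knows nor cares what the steps do; it just receives functions as data, much as a JavaScript callback would be passed around.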

But of course the scripting language that continues to power most of my datasmithing activities is Excel VBA. That’s why I was so excited to see a tool such as Proto utilise VBA as its scripting language. But Microsoft has abandoned VBA; there will be no more Protos.

Also, Excel VBA is now a Windows only language. Windows, however, is no longer the ‘only’ business client OS (see how many Apple laptops you can spot the next time you’re in a business-class airport lounge, a few years ago it would have been zero, not any more), and is currently nowhere to be seen as a cloud computing platform (but that’ll change).

I’m at heart a table-oriented programmer, and I, like Picalo’s author Conan Albrecht, believe “data analysis is best done through scripting”; but not just data analysis, the T in ETL (Extract, Transform and Load) and the I in DI (Data Integration) and SI (Systems Interfacing) also benefit from a scripting approach.

So, what to adopt as a successor/companion-in-her-old-age to VBA, will it be Ruby, JavaScript, Python, Perl, even PHP?

It looks like it’ll be Python because it’s …

The runner up is of course Ruby, but its poor integration with Windows is a major problem and the datasmithing “prior art” of Picalo and Resolver makes Python hard to beat.

UPDATE Jan 2010:

To experience the best of both worlds, VBA & Python, my xLite (Excel combined with SQLite) datasmithing platform now allows Python to be used in conjunction with VBA.  Check it out here http://www.gobansaor.com/xlite

UPDATE:

Also, as Dan pointed out in the comments below, I’d not included Jython in my list of reasons for embracing Python. I must add it to my list of things to try out particularly as both my “classic” ETL tools, Talend and Kettle are JVM based.

Another thing to add to the (ever growing) list is Mike Pittaro’s SnapLogic Python-based ETL tool. They have …

…just released a 2.0 Beta version with some major architectural enhancements. The SnapLogic model is very different from traditional ETL systems. It takes an approach that’s more like the web, based on loose coupling and HTTP interactions. We model data source, sinks, and transformations as URI addressable endpoints, and have a model where than can be chained together in pipelines to build transformation logic. We use a plugin architecture to make it easy to add custom components.

A Tale of Two Services.

Friday, last week, 15th Feb, two of the services I most depend on, failed. Now as it turned out, neither really concerned me at the time, as that same day my brother was taken seriously ill (he’s now doing fine and on the way to recovery). It’s only now I’ve had the time to think about the implications of these failures.

The first was my fixed wireless broadband provider, OmniTel (aka Callidus, aka Torque, aka IFA Telecom Wireless Broadband). Its signal was down yet again (4th time since Xmas, two of those for close on 7 days each!). Large areas of rural Ireland depend on providers such as OmniTel to supply them with what is now a basic service and I think many would agree that the end-user experience is, to put it as charitably as possible, sub-optimal. I’ll leave it to others to explain why we’re in such a mess, but a mess it is.

For several reasons I’m not overly concerned about this, as the area I live in now has at least one alternative wireless provider (Irish Broadband – and I see two three four of my neighbours have changed over to them since last week!) and I’m also within 3 km of an Eircom exchange, which means I have my trusty ISDN backup and will eventually (we’re on the “list”) have access to ADSL. Now, ISDN is not a suitable alternative if your house is full of iTunes/YouTube obsessed young adults or if you need to constantly download large amounts of data (e.g. 10MB plus) but for “normal business stuff” it’s fine, I could live with it.

But isn’t a datasmith’s stock-in-trade large datasets? Well, yes and no. Many micro tasks such as data analysis tend to be carried out using Excel, which by its nature means you’re dealing with relatively small datasets or sub-sets of large databases; neither requires significant bandwidth to load/upload. For larger datasets and more powerful ETL/analysis tasks I don’t depend on my local machines, I use Amazon EC2/S3. In fact, most of my business and personal computing infrastructure is now “cloud” based, with my laptop reduced to the task of local cache/processor/communications device, similar to the role of my mobile phone, just a bigger keyboard and screen!

Which brings me neatly to the other failure of Friday the 15th, Amazon’s cloud services: EC2, S3, SQS and SimpleDB. As it turned out, it wasn’t the services themselves that failed; rather, the AWS authentication infrastructure was subjected to what could be described as a “friendly/unintentional” DoS attack. Existing publicly accessible S3/SimpleDB resources were still accessible and EC2 instances continued to operate, but anything requiring authentication failed. It reminds me a bit of the early days of RAID storage systems: the “miracle” of striping and mirroring worked, but failures still happened due to faulty power supplies or controller sub-systems.

The major complaint first-timers have when coming to terms with EC2 is the lack of post-shutdown/failure persistence on the virtual machine’s disks; data must be backed up to S3, otherwise it’s gone in the event of an instance failure. I’m guessing that the “oddness” of this architecture is to do with its suitability for the purposes that Amazon originally designed it for, and having proved it in their day-to-day business over the last decade or so, they’re sticking with it. Which is good; those of us who are now becoming dependent on this architecture want a robust and proven service. I suspect the authentication service is a new layer on the existing internal Amazon stack and is only now being stress-tested.

So it failed, and was fixed relatively quickly, but what’s more important, Amazon acknowledged the problems (not just the reason for the failure itself, but the less than perfect way their users were kept informed during the outage) and I’m reasonably confident they’ve learned from their mistakes. (To return to my rant on my broadband provider; I think the most annoying thing when the service goes down is that the whole of Omnitel, help-line, accounts, even sales, refuse to answer the phone (no forum, no status page), leaving their customers to wonder have they gone out of business or are they all hiding under their desks with their fingers in their ears shouting “Go away, go away”).

As a side note, two other services I use had hiccups this week: WordPress.Com was down for several hours on Wednesday (as a result of a DoS attack, I believe) and on Friday my Hamachi VPN service was down for an hour or so due to server resource problems.

So am I less confident in the viability of the “cloud” after this week of outages? No, I’m a believer in “risk management” rather than “risk avoidance”, as long as I’ve a “good enough” alternative (ISDN for broadband, standard Linux hosts for EC2) or a high degree of confidence in the supplier (Amazon S3 for backup and secure storage) I’m sticking with it. Not only that, I’m betting my career on it.

Update: Monday 25th

A bit windy today. You guessed it, broadband down again! So make that 5 times since Xmas. I see in their terms and conditions Callidus (OmniTel’s legal entity) promise 99% uptime within any month; that’s a little over 7 hours of acceptable outages per month, if only! On the plus side, I was talking to one of my neighbours who’d recently changed over to Irish Broadband; her experience with her new supplier was very positive. “Professionals, know what they’re doing, excellent customer service”, is how she described them.
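For anyone who wants to check the “little over 7 hours” sum, it’s just 1% of the hours in a month. A quick sketch (the function name is my own; the 99% figure is the one from the terms and conditions quoted above):

```python
def allowed_downtime_hours(uptime_percent, days_in_month=30):
    """Hours of outage per month that still satisfy the uptime promise."""
    hours_in_month = days_in_month * 24
    return hours_in_month * (100 - uptime_percent) / 100


for days in (30, 31):
    print(days, "days:", round(allowed_downtime_hours(99, days), 2), "hours")
```

That works out at 7.2 hours for a 30 day month and 7.44 hours for a 31 day one.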

Update: Wednesday, March 12th

Windy again last night; yep! gone again. Well, I’m assuming it’s the wind, no reply at any of Omnitel’s numbers. Maybe they’re gone out of business!

Update: Weekend 28th-30th March

Omnitel down again Friday night (28th), my son says it was back at some stage during the weekend, but when I went to use it tonight (Sunday 30th) still not working. Left a text message on 087 2826671 their out-of-hours number (twice), but to no avail.

And this crowd were ..

… recently shortlisted in the Government’s National Broadband Scheme to provide broadband to the remaining areas currently unserved by broadband in the Republic of Ireland.

… and if they win those areas will continue to be “unserved” !

Update: Monday 31st March

Service back up and running at 12 noon! More amazing, when I rang the help line this morning, there was a message acknowledging the problem (could it be true, Omnitel have started to invest in customer relations!). Mind you, should I let them in on that other secret of modern customer service, the “status blog”?

Simply set up a blog, e.g. http://omnitel.wordpress.com, and post network problem and resolution details, alongside “good news” stories (e.g. network upgrades) and maybe even allow customer comments!

I know, too much to hope for.

Update: Sat 12th April

Down again since 6PMish, actually this is the 3rd weekend in a row, but the last two were “just” Sunday night/Monday morning outages (or extreme slowness as per last Sunday PM /Monday AM) so I didn’t report them.

Update: Monday 28th April 19:00

Keeping with the now well established tradition of a weekend failure, Omnitel network down since Saturday 15:00ish, seems to be a major outage, still no sign of a return to “normal service”.  Time to phone Irish Broadband I think, Lo-Call 1890 56 44 56.

UPDATE: September 2008

No major outages in the last 5 months, and when they do happen, they’re fixed quickly and Omnitel are also now much better at keeping customers informed.  So praise where praise is due, well done; a huge improvement.

Google forgets to renew JotSpot domain!

Over the weekend I dusted down my JotSpot Wiki, cleaned out some old Wiki pages and generally made it useful as a client collaboration tool. I created some new pages and a few “project diary” type blog entries to do with a proposal for work. I also set up a potential client as a contributor and sat back to reap the collaborative benefits of one of the finer Wiki tools out there.

Unfortunately, by Monday afternoon all was not well. The jot.com domain no longer pointed at JotSpot; instead it was “parked” at Network Solutions, a domain name registrar. Now this generally happens to domains when they’re not renewed or your credit card company refuses to honour your request for payment. If JotSpot were a two-guys-in-a-garret operation you could see how this could happen, but JotSpot is now owned by Google.

Google’s neglect of the product and its secrecy over future plans has been a major concern to the original service’s loyal (but, I would imagine, declining) user base, but yesterday that neglect hit a new low.

The problem was fixed relatively quickly, but due to DNS migration issues, 24 hours later, many users of the service are still locked out. That’s a problem, but hey, s**t happens. What’s really astounding is Google’s complete silence on the subject over on the JotSpot support forum.

Makes you wonder how much of your commercial or indeed personal data assets you should entrust with such an organisation. Big brother may be watching you, but he’s not about to demean himself by actually communicating with you.

I’ve had this sort of problem with other Google Apps services in the past and I’ve seen problems with Gmail similar to those experienced by Jeff Nolan. I’m about to launch my www.gobansaor.com business site and my intention was to host it under Google Apps (which, rumour has it, will soon incorporate some variation on JotSpot). My dilemma is now whether to forge ahead with my original plan to use Google Apps or use a local Irish hosting service. Or, maybe I should fork out the $50 fee for the Google Apps Premier Edition with its “24/7 assistance, including phone support for critical issues”.

Decisions, decisions.

UPDATE:

Two days after the event, Google acknowledges the problem.

UPDATE: 28th Feb 2008

JotSpot is reborn as Google Sites.

Initial quick look; I like it, keeps a lot of the simplicity of the pure Wiki side of JotSpot (the “structured Wiki” as an alternative to a database/“application builder” is no more).  But the integration with the rest of Google Docs is to be welcomed, if a bit limited at the moment (documents must be published first from within Google Docs and their URLs then “cut and pasted” into the Sites application).

The new Google Spreadsheet’s forms functionality should make up for the loss of the JotSpot database functionality, at least for me.  Having the ability to point a CNAME at the resulting wikis is also very useful for client project collaboration.

SimpleDB + S3 = distributed document-centric database

I’m a database man. I’ve worked on or about most variations on the theme, from roll-your-own flat files, to hierarchical, to CODASYL network databases, to the current crop of relational and MOLAP platforms. Of late, I’ve been investigating what I think will be the future of database technology, the distributed document-centric database. Today, the future arrived in the form of Amazon’s new SimpleDB service.

Up until now Amazon’s S3 service offered one half of the future platform, the “distributed document-centric” bit, but it lacked the indexed structure part to make it a true database; in combination with SimpleDB it’s now complete.

SimpleDB stores data in a Domain/Attribute schema-less and type-less structure having more in common with a spreadsheet than a traditional relational table. If you’ve worked with the likes of SQLite (manifest typing) or Excel (no predefined schema and manifest typing) then you’ll appreciate this is no hardship, quite the opposite in fact (I find the strong typing nature of most databases a real pain having worked recently on a SQLite combined with Excel project).
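If manifest typing is new to you, the SQLite side of the comparison is easy to see for yourself. The sketch below uses Python's built-in sqlite3 module; it says nothing about SimpleDB's own API, and the cells table is made up for the example. The point is that the type travels with each value rather than with the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The "value" column is declared without any type at all.
conn.execute("CREATE TABLE cells (ref TEXT, value)")
conn.executemany(
    "INSERT INTO cells VALUES (?, ?)",
    [("A1", 42), ("A2", 3.14), ("A3", "hello"), ("A4", None)],
)

# typeof() reports the storage class of each individual value.
stored = conn.execute(
    "SELECT ref, typeof(value) FROM cells ORDER BY ref"
).fetchall()
print(stored)
```

One column, four different storage classes, no complaints; much like a column of cells in a spreadsheet.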

The distributed nature of SimpleDB may however pose some difficulty to those of us (i.e. almost everybody) raised in the world of ACID compliant databases. Because of the Brewer’s Conjecture effect, SimpleDB sacrifices consistency for availability and partition tolerance i.e. when you write something to the database, an immediate query may not return the updated value, subsequent queries will eventually return the new data, exactly when depends on the load and the availability of resources. Those of you already using S3 will already be living with this “feature”, and in practice you rarely notice it (most updates seem to appear immediately) but it will still pose design challenges to handle the edge cases.
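To make the “write now, read it back later” behaviour concrete, here is a toy model in Python. It is purely illustrative and makes no claim about how SimpleDB or S3 are built internally; the class ToyEventualStore, its methods and the sample key are all invented. A write is acknowledged by one node, while reads are served by replicas that only catch up when replication runs.

```python
class ToyEventualStore:
    """Toy model only: one writable node plus read replicas that lag behind."""

    def __init__(self, read_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(read_replicas)]

    def put(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value

    def get(self, key, replica=0):
        # Reads are served by a replica, which may not have caught up yet.
        return self.replicas[replica].get(key)

    def propagate(self):
        # Stands in for the background replication a real service performs.
        for replica in self.replicas:
            replica.update(self.primary)


store = ToyEventualStore()
store.put("invoice-42", "paid")
before = store.get("invoice-42")  # stale read: the replica has not seen the write
store.propagate()
after = store.get("invoice-42")   # now every replica agrees
print(before, after)
```

The design challenge is the gap between those two reads: code that writes and immediately re-reads needs to either tolerate the stale answer or retry until the new value shows up.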

The service is still in limited Beta, but the documentation is available and if you’ve already used any other AWS product you’ll immediately feel at home. The pricing is again based on usage; the cost of storage is much higher than S3, being $1.50 per GB-month, but a GB of structured data is an awful lot of data (and the larger document style storage would be provided by S3).

If you’ve not yet tried out either S3 or EC2, now might be a good time to start, cloud computing has come down to earth, all thanks to an online book store, Amazon!

Zimki – the spirit lives on …

Although Zimki is to shut down on Christmas Eve, the ideas behind the service live on. Two new offerings, Heroku and AppJet, offer variations on the idea of hosted application development/deployment.

AppJet, funded by Paul Graham’s Y-Combinator, is very similar to Zimki, being a server-side JavaScript platform. No details yet as to what sort of paid options will be offered (all accounts are free at the moment). Unlike Zimki, there are no plans to create an open-source version. I like the easy “build a Facebook app” feature, and I guess this is the sort of light-weight application that they hope to attract.

Although Heroku uses Ruby-on-Rails technology, rather than JavaScript, it is closer to the original Zimki idea; but rather than take the hard (and ultimately unsuccessful in Zimki’s case) road of building an open-source platform from scratch, Heroku takes an already popular open-source project and offers it wrapped in a full on-line development and deployment environment. Again, being in beta, there’s no indication as to what pricing model it will operate under, but I would think that it will attract more “serious” projects than AppJet since anything developed under Heroku is pure Rails which means it can be migrated to any other Rails hosting environment; so no lock-in. The online editor is excellent and whatever about its merits as a hosting service it’s by far the easiest way to learn and explore Ruby and Rails, even easier than this…

If Facebook apps are your goal but you wish to use Ruby rather than AppJet’s JavaScript then not to panic, as being Ruby some bright young spark (no, not me I’m afraid) will already have done a lot of the hard graft for you…

Firefox tune up time again …..

This morning Firefox just got slower and slower; clicking on a link or a text box took ages to respond; using online WYSIWYG editors became next to impossible; I was also getting an error when attempting to connect to Google Sync.

I checked the usual suspects; internet connection OK; did a quick HijackThis scan and analysis to check if anything nasty was on the PC, nope, again OK; fired up IE7, it worked fine; launched Firefox in safe mode (disables add-ons and other extensions) but the problem persisted. All signs that the culprit was my Firefox profile.

This has happened before so I knew what to do.

Firefox Profile Dialog

From the command line I launched Firefox with the “-p” option which brings up the profile dialog, created a new profile and relaunched; everything back to normal, except of course all my bookmarks and my browser extensions were gone.

Reinstalling my extensions is easy enough and offers an opportunity to do some much needed spring cleaning. The first extension I always re-install is Google Sync, for when it’s back in business I can then restore my old bookmarks and passwords (not my highly sensitive passwords I hasten to add, I use KeePass to manage those – never store financial passwords and the like in your browser’s profile).

The extensions I regard as must have are:

  • Google Browser Sync – keeps a backup of my bookmarks, remembers what tabs I had open last time and restores them if required, means I can easily flick between my laptop and my desktop. (And of course, it’s very useful when it comes to rebuilding a new PC or profile!). Google Sync to be discontinued.
  • Del.icio.us add-on – tag and search using my del.icio.us account. Why use both del.icio.us and Google Sync’d bookmarks? Well, I use bookmarks for my day-to-day commonly used links, while I use http://del.icio.us as my long-term KM memory bank.
  • S3Fox – for managing my backups and other file storage needs on my Amazon S3 account.
  • FlashGot – download manager used in conjunction with Free Download Manager. UPDATE: I’m now using DownThemAll (a Firefox plugin) rather than FDM, mainly because of FDM’s inability to handle certain ASP and PHP redirects, the prime example being downloads from SourceForge.
  • Google Toolbar – for searching blogs, quick link to Gmail, spell checker, page rank checker.
  • British English Dictionary – to use Firefox’s built-in spell checker (using this now, rather than Google Toolbar’s spell checker).
  • PDF Download – gives me control over how I access PDF links.
  • NoScript – allows me to control what JS/Java/Flash scripts run, and also provides excellent XSS protection. Can be annoying sometimes, but I stick with it. To make it less annoying (but not as secure) go to Options and allow top-level sites by default (including 2nd level domains).
  • EC2 UI – for controlling my Amazon EC2 images.
  • I also install, but do not auto-enable, several other add-ons such as Firebug (understand/debug the structure of a web page), iMacros (web browsing macro recorder/screen scraper) and SQLiteManager (manages my SQLite databases).

Ruby plus Amazon S3 – Document Centric Database

I’ve said it before and I’m going to repeat myself; learning Ruby has proven to be a great investment, not so much for the language itself but for the insights it gives into other technologies. As soon as a new ‘cool’ technology or idea hits the street some smart Rubyist is bound to attack it, dice it up and serve it back up as easy to digest Ruby code.

Today, it’s the turn of Document Centric Databases done in the style of CouchDB, but replacing JavaScript/Erlang with Ruby and the bespoke data store with Amazon’s S3 service.

Anthony Eden’s RDDB project is still very much alpha, but looking through the code it looks like it has lots of good ideas, including using EC2 instances as “map reduce workers” listening on Amazon SQS Queues; so the whole Amazon AWS stack might yet get starring roles. The actual data store can be varied, with both partitioned file system and RAM based options currently available alongside S3.
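For anyone new to the CouchDB-style idea of map/reduce views over schema-free documents, here is a minimal in-memory sketch in Python. It is not RDDB’s API (RDDB is Ruby, and I haven’t reproduced its interface here); the function names and sample documents are hypothetical and only illustrate the technique.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run map_fn over every document, group emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Hypothetical schema-free documents; note they need not share the same fields.
documents = [
    {"type": "post", "tags": ["ruby", "s3"]},
    {"type": "post", "tags": ["s3", "ec2"]},
    {"type": "comment", "text": "nice"},
]


def tags_map(doc):
    """Emit a (tag, 1) pair for every tag on a document; documents without tags emit nothing."""
    for tag in doc.get("tags", []):
        yield tag, 1


def count_reduce(key, values):
    """Collapse all the values emitted for one key into a single count."""
    return sum(values)


tag_counts = map_reduce(documents, tags_map, count_reduce)
print(tag_counts)
```

In a distributed setup the map step is what gets farmed out to workers, with the document store (S3, file system or RAM) sitting behind the `documents` collection.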

In other Amazon AWS related news, today saw the announcement of an option to use European data centres to store S3 data (at a slightly higher charge than North American locations, and with the transfer of data between EU based S3 buckets and US based EC2 instances no longer being free). I’m guessing that the option to fire up European based EC2 servers can’t be far behind. Also, one piece of news I’d missed was that EC2 is now in unlimited beta, i.e. it’s now open to all developers. So developers everywhere can, for less than the cost of a mobile text message, fire up their own dedicated and powerful Linux server. The day of a production ready, SLA backed, EC2 service is around the corner.

CrashPlan – the best backup service yet?

You know when you come across something so simple, so obvious and so brilliant you wonder, why didn’t I think of that? Well for personal/small business data backup I’ve just had one of those moments.

CrashPlan is a consumer/SMB orientated backup service following in the footsteps of Mozy (a service I’ve used in the past and still recommend to others), but CrashPlan has one extra little facility that makes it the best yet: the ability to back up not just to a secure off-site data store but also, as an alternative or as an additional backup, to another local PC or remotely to a friend’s PC. And the best part is that backup to other PCs is free (after the once-off $20 software licence), so you could have a local copy for fast backup (and more importantly fast restore) plus a free remote copy on a friend’s or indeed a work machine.

The data is compressed and encrypted, thus protecting your data from prying eyes and your friend from any danger of virus/malware cross-infection. Any combination of Windows, Mac or Linux boxes is supported and they say the software can negotiate most firewall situations. Future support for Amazon S3 as a remote backup location is on the product’s to-do list. Brilliant!

Take Mind Mapping offline with Google Gears

I’ve been a long time fan of mind maps (the pencil and paper type) and have also occasionally used the excellent and free computer based FreeMind to good effect. Over the last year or so a number of online mind mapping tools have appeared and I see that one of the better ones, www.mindmeister.com, can now be used both online and offline thanks to the magic of Google Gears; I think this is the first non-Google implementation of Gears I’ve seen in the wild.

I’m using the free (up to six mind maps) version of MindMeister but, as with other such services that require a monthly subscription for access to the unlimited premium edition, I’m unlikely to bite. I’m afraid I’m spoiled by the free offerings of the likes of Google Apps and the pay-as-you-go offerings of Amazon EC2/S3, so the idea of paying a fixed monthly charge for a ‘point-solution’ doesn’t appeal.

Perhaps their long term strategy is to be purchased by the likes of Google and indeed the product would fit in beautifully with existing Google Apps offerings right down to the wiki-like sharing facilities. Nevertheless, well worth checking out the free version and if sharing and collaborating of multiple mind maps is your thing (schools come to mind) then the €3.21 monthly charge is very reasonable. Or perhaps you could use their sponsoring facility to pay for a premium licence for your local school.

Amazon EC2: S, L and XL – now we’re sucking diesel..

As of today, Amazon EC2 now supports two new Instance Types..

… a “Large” and an “Extra Large” instance type to complement the original instance type and provide more flexibility for EC2 users. The new instance types provide more memory, CPU, and instance storage, and are based on 64bit technology. EC2 users can now utilize these different instance sizes to support an even broader set of applications and use cases.

The Large Instance is equivalent to roughly four Small Instances (our original instance), and the Extra Large Instance is roughly equivalent to eight Small instances.

This increases the attractiveness of EC2 as a platform for micro ETL/BI activities; the extra memory accessible under the new 64bit instances makes the commissioning of pure in-memory, on-demand, open source PALO OLAP instances a real alternative. And it’s not just micro BI activities that could utilise this sort of service; many of the large BI implementations I’ve worked on in the past could easily be handled by this type of kit.

Also this week, /n Software announced the private beta of a Java version of their RSSBUS Server Engine; this could be a very useful on-demand micro ETL tool especially now that it will be capable of running under Linux (the current version requires a Windows IIS Server).

Google Spreadsheets – ETL tool

Although I’m a total Excel fanboy, I must admit I rarely use it any longer for personal stuff such as home budgets, tax calculations, what-ifs, to-do lists etc.; I now tend to use Google Spreadsheets. Likewise, personal notes, drafts and useful bits of code are stored using Google Docs rather than MS Word. Three main reasons for this shift to the cloud:

  • Google Docs & Spreadsheets are ‘good enough’ for most of the trivial lists and calculations I require in my personal life and indeed for most business purposes as well, at least those that don’t require a pivot table.
  • These spreadsheets and documents are important but not necessarily in the ‘state secret/I-could-tell-but-then-I’d-have-to-kill-you’ scale of things, by building them in Google Apps they are securely backed-up and easily accessible.
  • A lot of the spreadsheets are collaborative in nature, and in the collaboration field, Google Spreadsheets just gets better and better.

Today, Google announced further additions to their spreadsheet product. The AutoFill feature adds functionality I’ve come to expect from Excel, but with a twist, integration with Google Sets. But the additions that really caught my eye were the new data import functions. Now again, Excel has had web queries since Excel97, and it always amazed me why online pretenders to the throne tended to ignore the most common source of tabular data on the web, the HTML table; something to do with the great XML/Tables divide I guess!

Google now not only fixes this omission, providing access to HTML tables and comma/tab separated files, but also provides access to RSS/ATOM and generic XML sources. All that’s missing now are functions that can read other common online data file formats such as Excel, MSAccess, XBase and of course SQLite.

This addition of HTML import support and the AutoFill feature will further reduce the number of times I’ll need to fire up Excel for personal tasks, but the RSS/ATOM/XML import feature also has potential as a tool in my micro-ETL toolbox. Using Excel as my only micro-ETL tool is possible when the data is either already in Excel/CSV or accessible via a COM API or via ODBC drivers, otherwise I can call-in either Ruby, Talend, Kettle or even RSSBus. But now I’ve another option, if the data is public and published as RSS/ATOM or some other variation on XML, I can use Google Spreadsheets to fetch the data and import the resulting tabular dataset into Excel via a Web Query or via the GData API.
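To show what “RSS in, tabular dataset out” amounts to, here is a small local sketch in Python using only the standard library. It is not how Google Spreadsheets does it, just the same flattening idea; the sample feed, the chosen fields and the function names are all made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs without network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link><pubDate>Mon, 01 Oct 2007 09:00:00 GMT</pubDate></item>
    <item><title>Second post</title><link>http://example.com/2</link><pubDate>Tue, 02 Oct 2007 09:00:00 GMT</pubDate></item>
  </channel>
</rss>"""

# The item child elements that become columns (an assumption; pick whatever you need).
FIELDS = ("title", "link", "pubDate")


def rss_to_rows(rss_text):
    """Flatten each <item> in the feed into one row, one column per field."""
    root = ET.fromstring(rss_text)
    return [
        [item.findtext(field, default="") for field in FIELDS]
        for item in root.iter("item")
    ]


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line, ready for Excel to open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buffer.getvalue()


rows = rss_to_rows(SAMPLE_RSS)
print(rows_to_csv(rows))
```

The resulting CSV can be opened directly in Excel, which is essentially the same hand-off as pulling the Google Spreadsheets result in via a Web Query.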

New Google Reader Search facility

One other thing. While researching this post, looking up links etc. I used another new feature Google added today, Google Reader’s new Search facility. As most of my references are discovered via the blogs I subscribe to, the ability to restrict searches to that subset of the web is fantastic; I even used it to search through my own blog posts! If del.icio.us offered the same option it would make re-finding stuff even easier. I did try to use Google Co-Op to build a search engine restricted to my del.icio.us links but it didn’t seem to like the volume of links (4000 odd) I sent it.

Moved to blog.gobansaor.com

Over the weekend I transferred this blog over to my own sub-domain, http://blog.gobansaor.com. The blog continues to be hosted by WordPress.com and the old http://gobansaor.wordpress.com addresses will continue to work. Most RSS readers will also gracefully (I hope) handle the transfer of the RSS feed, but if not, you may wish to resubscribe via http://blog.gobansaor.com/feed. If you encounter any problems please let me know via the comments below.

As a result of the remapping, I’ve temporarily lost all my Google mojo, but the 302 redirects should restore that over time. My yearly blog hosting costs have now risen from nought to the princely sum of $10 (plus the cost of the domain). So why do it? Three main reasons:

  • I’ve decided to carry on my datasmithing business under the www.gobansaor.com banner (moving it from www.gleesonIT.com). And as this blog is my main presence on the net I wanted to harness it as a marketing tool. Also, gleesonIT is mostly associated with my wife’s home IT coaching/support service and as she has now returned to her full-time career as a civil servant we need to rethink its focus. (I may use it as a vehicle for offering IT consulting/services to others like myself, i.e. digital free-agents; but, to use marketing speak, I might end up “diluting the brand”. Maybe I should offer all services under a single banner, i.e. the Gobán Saor “brand”; but then that might “overload the brand”. Decisions, decisions :-( ).
  • Increasingly, at least for those of us in the technology game, your online presence(s), be that your blog or your facebook profile or your forum-based contributions, act(s) as a surrogate CV. As this blog is starting to gather some traction I figured now, rather than later, is the time to bring it more under my own control and “brand”.
  • I wanted to give something back to Matt et al. for providing me and others (including my son) with the fantastic product and service that is WordPress.com. I figured $10 a year is the least I could do.

What use is a blog to a small business?

In future when Frank Fullard is asked that question (and the associated “What would I blog about?”) he’s going to point to the Ice Cream Ireland site. I’ve been in their shop in Dingle (if you like ice cream you’ll love it) but I was unaware they also had an outlet in Killarney, now I know, as a result of reading their blog.

Web Offline – all data lost!

…just a warning, get a life and get a data-backup strategy ;-)