Category Archives: Web2.0

Amazon’s SAN in the cloud is a mirage…

This morning I got very excited. While quickly scanning the headlines of the 1000+ unread feeds that had accumulated in my Google Reader this week, one heading in particular caught my attention: “Amazon Elastic Block Store goes live!”

The post from the RightScale folks gave a detailed overview of the new Amazon ‘SAN storage in the cloud’ service, aka Elastic Block Store, aka EBS. Alas, this particular cloud offering was a mirage: the post was subsequently removed (but can still be viewed on Robert Scoble’s Shared Items). It seems the post was a work-in-progress and not intended for publishing, yet!

Why was I so excited? Amazon EC2 had two major shortcomings when it launched two or so years ago. The first, ephemeral IP addresses, was solved by the new Elastic IP feature; the second, ephemeral storage volumes (when you shut down an instance the disks are wiped!), is due to be solved by EBS. With both of these problems solved, EC2, already near perfect, would be perfect.

The article does a good job of explaining the new service…

EBS starts out really simple: you create a volume from 1GB to 1TB in size and then you mount it on a device on an instance, format it, and off you go. Later you can detach it, let it sit for a while, and then reattach it to a different instance. You can also snapshot the volume at anytime to S3, and if you want to restore your snapshot you can create a fresh volume from the snapshot.

The thing that caught my eye in the above paragraph was the snapshot facility. Snapshots are to be stored on S3 via an EC2-specific incremental-snapshot API. This means the volumes will come with a built-in backup facility. This is important, as EBS drives reside in one availability zone (that of the instance they are mounted against) and do not have the data replication security offered by S3. It also means that disk systems can be restored quickly and simply from snapshots without the overhead (and bugs!) of writing an S3-specific incremental backup and restore utility.
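To make the “incremental” part concrete, here is a minimal sketch of the general idea, not of how EBS actually does it (the details of Amazon’s API are not public in the post). It assumes a content-addressed block store, where a plain dict stands in for S3; the names `take_snapshot`, `restore` and `BLOCK_SIZE` are all made up for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example is readable


def split_blocks(data: bytes) -> list:
    """Cut a volume image into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def take_snapshot(volume: bytes, store: dict) -> list:
    """Snapshot a volume, uploading only blocks the store has not seen.

    `store` is a stand-in for S3 (assumption). Returns the snapshot
    manifest: an ordered list of block digests.
    """
    manifest = []
    for block in split_blocks(volume):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:  # incremental: unchanged blocks are skipped
            store[digest] = block
        manifest.append(digest)
    return manifest


def restore(manifest: list, store: dict) -> bytes:
    """Rebuild a fresh volume image from a snapshot manifest."""
    return b"".join(store[digest] for digest in manifest)
```

Taking a second snapshot after changing one block adds just that one block to the store, yet either snapshot can still be restored in full from its manifest.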

Back to waiting…

UPDATE: 20th August

Wait over…

Python the new VBA?

These last two weeks, Python has been on my mind. First off, last week I decided to make time to fully investigate Picalo, an open-source Python-based data analysis tool, and then, this week, Google announced their long-awaited cloud-computing offering, Google App Engine, with the language at its core.

Python was the first of the “LAMP generation” scripting languages that I decided to learn in any detail (I had used Perl before that, but only on a per-task basis, similar to how I’d used AWK). I then invested time in learning PHP, then Ruby and finally JavaScript. And here I am, back where I started, with Python.

But it’s not the same Python I learned three years ago. Not that it has changed that much, but my appreciation of the language has, largely due to my deep dives into other languages. For example, JavaScript’s treatment of functions as first-class objects highlighted the same functionality in Python, something I’d missed (or rather, not fully understood) the first time I encountered the language. Likewise, Ruby’s RoR introduced me to a “best of breed” approach to web application design, something that can be used as a comparison aid when approaching new web frameworks such as Django.
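For anyone who missed it the way I did, a small illustration of functions as first-class objects in Python: they can be passed as arguments, returned from other functions, and stored in data structures. The function names here are invented for the example.

```python
def apply_twice(func, value):
    """Take a function as an argument and call it twice."""
    return func(func(value))


def make_multiplier(factor):
    """Return a new function that remembers `factor` (a closure)."""
    def multiply(x):
        return x * factor
    return multiply


double = make_multiplier(2)          # a function stored in a variable
result = apply_twice(double, 5)      # 5 -> 10 -> 20

# Functions can live in a dict like any other value.
cleaners = {"strip": str.strip, "upper": str.upper}
cleaned = cleaners["upper"](cleaners["strip"]("  vba  "))
```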

But of course the scripting language that continues to power most of my datasmithing activities is Excel VBA. That’s why I was so excited to see a tool such as Proto utilise VBA as its scripting language. But Microsoft has abandoned VBA; there will be no more Protos.

Also, Excel VBA is now a Windows-only language. Windows, however, is no longer the ‘only’ business client OS (see how many Apple laptops you can spot the next time you’re in a business-class airport lounge; a few years ago it would have been zero, not any more), and is currently nowhere to be seen as a cloud computing platform (but that’ll change).

I’m at heart a table-oriented programmer, and I, like Picalo’s author Conan Albrecht, believe “data analysis is best done through scripting”; but not just data analysis, the T in ETL (Extract, Transform and Load) and the I in DI (Data Integration) and SI (Systems Interfacing) also benefit from a scripting approach.
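As a rough idea of what I mean by scripting the T in ETL, here is a minimal sketch in plain Python. The data, column names and the three function names are made up for the example; a real job would read from a file or database rather than an inline string.

```python
import csv
import io

# Hypothetical raw extract with the usual mess: stray spaces, mixed case.
RAW = """region,amount
north, 120.50
south,80
north,19.5
"""


def extract(text):
    """Extract: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy the region names and total the amounts per region."""
    totals = {}
    for row in rows:
        region = row["region"].strip().title()
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return totals


def load(totals):
    """Load: here just formatted lines; normally a database insert."""
    return [f"{region},{total:.2f}" for region, total in sorted(totals.items())]


output = load(transform(extract(RAW)))
```

The point is less the code itself than how little ceremony there is: each step is a small function on tables of rows, easy to test and rearrange.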

So, what to adopt as a successor/companion-in-her-old-age to VBA? Will it be Ruby, JavaScript, Python, Perl, even PHP?

It looks like it’ll be Python because it’s …

The runner-up is of course Ruby, but its poor integration with Windows is a major problem, and the datasmithing “prior art” of Picalo and Resolver makes Python hard to beat.

UPDATE Jan 2010:

To experience the best of both worlds, VBA & Python, my xLite (Excel combined with SQLite) datasmithing platform now allows Python to be used in conjunction with VBA.  Check it out here

UPDATE: July 2011:

For another method of integrating Python (this time .NET’s IronPython) with Excel/VBA see http://blog.gobansaor.com/2011/07/18/vba-multithreading-net-integration-via-hammer/

UPDATE:

Also, as Dan pointed out in the comments below, I’d not included Jython in my list of reasons for embracing Python. I must add it to my list of things to try out, particularly as both my “classic” ETL tools, Talend and Kettle, are JVM-based.

Another thing to add to the (ever growing) list is Mike Pittaro’s SnapLogic Python-based ETL tool. They have …

…just released a 2.0 Beta version with some major architectural enhancements. The SnapLogic model is very different from traditional ETL systems. It takes an approach that’s more like the web, based on loose coupling and HTTP interactions. We model data source, sinks, and transformations as URI addressable endpoints, and have a model where they can be chained together in pipelines to build transformation logic. We use a plugin architecture to make it easy to add custom components.
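To picture the “URI addressable endpoints chained into pipelines” idea, here is a toy, in-process sketch. It is emphatically not SnapLogic’s API (I haven’t tried it yet): the registry, the `endpoint` decorator, the URIs and `run_pipeline` are all hypothetical, and the HTTP layer is left out entirely.

```python
REGISTRY = {}  # maps a URI-style name to the component that serves it


def endpoint(uri):
    """Register a component under a URI-style name (hypothetical scheme)."""
    def register(func):
        REGISTRY[uri] = func
        return func
    return register


@endpoint("/source/orders")
def orders(_upstream):
    # A source ignores its upstream and produces records.
    yield {"id": 1, "amount": "10.5"}
    yield {"id": 2, "amount": "4"}


@endpoint("/transform/to_float")
def to_float(upstream):
    # A transformation consumes records and yields modified copies.
    for record in upstream:
        yield {**record, "amount": float(record["amount"])}


@endpoint("/transform/big_only")
def big_only(upstream):
    for record in upstream:
        if record["amount"] >= 10:
            yield record


def run_pipeline(uris):
    """Chain the named components, feeding each one's output to the next."""
    stream = None
    for uri in uris:
        stream = REGISTRY[uri](stream)
    return list(stream)


records = run_pipeline(["/source/orders", "/transform/to_float", "/transform/big_only"])
```

Because components are looked up by name, a pipeline is just a list of strings, which is what makes the loose coupling (and a plugin model) attractive.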

A Tale of Two Services.

On Friday of last week, 15th Feb, two of the services I most depend on failed. As it turned out, neither really concerned me at the time, as that same day my brother was taken seriously ill (he’s now doing fine and on the way to recovery). It’s only now that I’ve had the time to think about the implications of these failures.

The first was my fixed wireless broadband provider, OmniTel (aka Callidus, aka Torque, aka IFA Telecom Wireless Broadband). Its signal was down yet again (4th time since Xmas, two of those for close on 7 days each!). Large areas of rural Ireland depend on providers such as Omnitel to supply them with what is now a basic service, and I think many would agree that the end-user experience is, to put it as charitably as possible, sub-optimal. I’ll leave it to others to explain why we’re in such a mess, but a mess it is.

For several reasons I’m not overly concerned about this. The area I live in now has at least one alternative wireless provider (Irish Broadband, and I see four of my neighbours have changed over to them since last week!), and I’m also within 3 km of an Eircom exchange, which means I have my trusty ISDN backup and will eventually (we’re on the “list”) have access to ADSL. Now, ISDN is not a suitable alternative if your house is full of iTunes/YouTube-obsessed young adults or if you need to constantly download large amounts of data (e.g. 10MB plus), but for “normal business stuff” it’s fine; I could live with it.

But isn’t a datasmith’s stock-in-trade large datasets? Well, yes and no. Many micro tasks such as data analysis tend to be carried out using Excel, which by its nature means you’re dealing with relatively small datasets or sub-sets of large databases; neither requires significant bandwidth to load/upload. For larger datasets and more powerful ETL/analysis tasks I don’t depend on my local machines; I use Amazon EC2/S3. In fact, most of my business and personal computing infrastructure is now “cloud” based, with my laptop reduced to the task of local cache/processor/communications device, similar to the role of my mobile phone, just with a bigger keyboard and screen!

Which brings me neatly to the other failure of Friday the 15th: Amazon’s cloud services, EC2, S3, SQS and SimpleDB. As it turned out, it wasn’t the services themselves that failed; rather, the AWS authentication infrastructure was subjected to what could be described as a “friendly/unintentional” DoS attack. Existing publicly accessible S3/SimpleDB resources were still accessible and EC2 instances continued to operate, but anything requiring authentication failed. It reminds me a bit of the early days of RAID storage systems: the “miracle” of striping and mirroring worked, but failures still happened due to faulty power supplies or controller sub-systems.

The major complaint first-timers have when coming to terms with EC2 is the lack of post-shutdown/failure persistence on the virtual machine’s disks: data must be backed up to S3, otherwise it’s gone in the event of an instance failure. I’m guessing that the “oddness” of this architecture is to do with its suitability for the purposes that Amazon originally designed it for, and having proved it in their day-to-day business over the last decade or so, they’re sticking with it. Which is good; those of us who are now becoming dependent on this architecture want a robust and proven service. I suspect the authentication service is a new layer on the existing internal Amazon stack and is only now being stress-tested.

So it failed, and was fixed relatively quickly, but what’s more important, Amazon acknowledged the problems (not just the reason for the failure itself, but the less than perfect way their users were kept informed during the outage) and I’m reasonably confident they’ve learned from their mistakes. (To return to my rant on my broadband provider: I think the most annoying thing when the service goes down is that the whole of Omnitel, help-line, accounts, even sales, refuse to answer the phone (no forum, no status page), leaving their customers to wonder whether they’ve gone out of business or are all hiding under their desks with their fingers in their ears shouting “Go away, go away”.)

As a side note, two other services I use had hiccups this week: WordPress.com was down for several hours on Wednesday (as a result of a DoS attack, I believe) and on Friday my Hamachi VPN service was down for an hour or so due to server resource problems.

So am I less confident in the viability of the “cloud” after this week of outages? No. I’m a believer in “risk management” rather than “risk avoidance”: as long as I’ve a “good enough” alternative (ISDN for broadband, standard Linux hosts for EC2) or a high degree of confidence in the supplier (Amazon S3 for backup and secure storage), I’m sticking with it. Not only that, I’m betting my career on it.

Update: Monday 25th

A bit windy today. You guessed it, broadband down again! So make that 5 times since Xmas. I see in their terms and conditions Callidus (OmniTel’s legal entity) promise 99% uptime within any month; that’s a little over 7 hours of acceptable outages per month. If only! On the plus side, I was talking to one of my neighbours who’d recently changed over to Irish Broadband, and her experience with her new supplier was very positive. “Professionals, know what they’re doing, excellent customer service”, is how she described them.
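For anyone who wants to check the sums on that 99% promise, the arithmetic is simple enough; the function name is mine, and the only inputs are the uptime percentage and the length of the month.

```python
def allowed_downtime_hours(uptime_pct, days_in_month):
    """Hours of outage per month that still satisfy an uptime guarantee."""
    return days_in_month * 24 * (100 - uptime_pct) / 100


thirty_day_month = allowed_downtime_hours(99, 30)      # about 7.2 hours
thirty_one_day_month = allowed_downtime_hours(99, 31)  # about 7.4 hours
```

Set against that allowance, two outages of close on 7 days each come to more than 300 hours.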

Update: Wednesday, March 12th

Windy again last night; yep, gone again! Well, I’m assuming it’s the wind; no reply at any of Omnitel’s numbers. Maybe they’ve gone out of business!

Update: Weekend 28th-30th March

Omnitel down again Friday night (28th). My son says it was back at some stage during the weekend, but when I went to use it tonight (Sunday 30th) it was still not working. Left a text message on 087 2826671, their out-of-hours number (twice), but to no avail.

And this crowd were ..

… recently shortlisted in the Government’s National Broadband Scheme to provide broadband to the remaining areas currently unserved by broadband in the Republic of Ireland.

… and if they win those areas will continue to be “unserved” !

Update: Monday 31st March

Service back up and running at 12 noon! More amazing, when I rang the help line this morning, there was a message acknowledging the problem (could it be true, Omnitel have started to invest in customer relations!). Mind you, should I let them in on that other secret of modern customer service, the “status blog”?

Simply set up a blog, e.g. http://omnitel.wordpress.com, and post network problem and resolution details, alongside “good news” stories (e.g. network upgrades), and maybe even allow customer comments!

I know, too much to hope for.

Update: Sat 12th April

Down again since 6PMish, actually this is the 3rd weekend in a row, but the last two were “just” Sunday night/Monday morning outages (or extreme slowness as per last Sunday PM /Monday AM) so I didn’t report them.

Update: Monday 28th April 19:00

Keeping with the now well established tradition of a weekend failure, Omnitel network down since Saturday 15:00ish, seems to be a major outage, still no sign of a return to “normal service”.  Time to phone Irish Broadband I think, Lo-Call 1890 56 44 56.

UPDATE: September 2008

No major outages in the last 5 months, and when they do happen, they’re fixed quickly and Omnitel are also now much better at keeping customers informed.  So praise where praise is due, well done; a huge improvement.

Google forgets to renew JotSpot domain!

Over the weekend I dusted down my JotSpot Wiki, cleaned out some old Wiki pages and generally made it useful as a client collaboration tool. I created some new pages and a few “project diary” type blog entries to do with a proposal for work. I also set up a potential client as a contributor and sat back to reap the collaborative benefits of one of the finer Wiki tools out there.

Unfortunately, by Monday afternoon all was not well. The jot.com domain no longer pointed at JotSpot; instead it was “parked” at Network Solutions, a domain name registrar. Now, this generally happens to domains when they’re not renewed or your credit card company refuses to honour your request for payment. If JotSpot were a two-guys-in-a-garret operation you could see how this could happen, but JotSpot is now owned by Google.

Google’s neglect of the product and its secrecy over future plans have been a major concern to the original service’s loyal (but, I would imagine, declining) user base, but yesterday that neglect hit a new low.

The problem was fixed relatively quickly, but due to DNS migration issues, 24 hours later, many users of the service are still locked out. That’s a problem, but hey, s**t happens. What’s really astounding is Google’s complete silence on the subject over on the JotSpot support forum.

Makes you wonder how much of your commercial or indeed personal data assets you should entrust to such an organisation. Big brother may be watching you, but he’s not about to demean himself by actually communicating with you.

I’ve had this sort of problem with another Google Apps service in the past and I’ve seen problems with Gmail similar to those experienced by Jeff Nolan. I’m about to launch my www.gobansaor.com business site and my intention was to host it under Google Apps (which, rumour has it, will soon incorporate some variation on JotSpot). My dilemma is now whether to forge ahead with my original plan to use Google Apps or use a local Irish hosting service. Or maybe I should fork out the $50 fee for the Google Apps Premier Edition with its “24/7 assistance, including phone support for critical issues”.

Decisions, decisions.

UPDATE:

Two days after the event, Google acknowledges the problem.

UPDATE: 28th Feb 2008

JotSpot is reborn as Google Sites.

Initial quick look: I like it. It keeps a lot of the simplicity of the pure Wiki side of JotSpot (the “structured Wiki” as an alternative to a database/“application builder” is no more). But the integration with the rest of Google Docs is to be welcomed, if a bit limited at the moment (documents must be published first from within Google Docs and their URLs then “cut and pasted” into the Sites application).

The new Google Spreadsheet’s forms functionality should make up for the loss of the JotSpot database functionality, at least for me.  Having the ability to point a CNAME at the resulting wikis is also very useful for client project collaboration.

SimpleDB + S3 = distributed document-centric database

I’m a database man. I’ve worked on or about most variations on the theme, from roll-your-own flat files, to hierarchical, to CODASYL network databases, to the current crop of relational and MOLAP platforms. Of late, I’ve been investigating what I think will be the future of database technology, the distributed document-centric database. Today, the future arrived in the form of Amazon’s new SimpleDB service.

Up until now, Amazon’s S3 service offered one half of the future platform, the “distributed document-centric” bit, but it lacked the indexed structure needed to make it a true database. In combination with SimpleDB, it’s now complete.

SimpleDB stores data in a schema-less and type-less Domain/Attribute structure, having more in common with a spreadsheet than a traditional relational table. If you’ve worked with the likes of SQLite (manifest typing) or Excel (no predefined schema and manifest typing) then you’ll appreciate this is no hardship, quite the opposite in fact (having worked recently on a project combining SQLite with Excel, I find the strongly typed nature of most databases a real pain).
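If manifest typing is new to you, this small example shows the SQLite side of it using Python’s built-in `sqlite3` module: the type belongs to each value, not to the column. It says nothing about how SimpleDB stores things internally; the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A column with no declared type: SQLite stores each value as given.
conn.execute("CREATE TABLE cells (value)")
conn.executemany(
    "INSERT INTO cells VALUES (?)",
    [(42,), ("forty-two",), (4.2,), (None,)],
)

# typeof() reports the storage class of each individual value.
stored_types = [
    row[0]
    for row in conn.execute("SELECT typeof(value) FROM cells ORDER BY rowid")
]
conn.close()
```

Here `stored_types` comes back as integer, text, real and null for the four rows, all living happily in the one column, much like cells in a spreadsheet column.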

The distributed nature of SimpleDB may however pose some difficulty to those of us (i.e. almost everybody) raised in the world of ACID-compliant databases. Because of the Brewer’s Conjecture effect, SimpleDB sacrifices consistency for availability and partition tolerance, i.e. when you write something to the database, an immediate query may not return the updated value. Subsequent queries will eventually return the new data; exactly when depends on the load and the availability of resources. Those of you already using S3 will already be living with this “feature”, and in practice you rarely notice it (most updates seem to appear immediately), but it will still pose design challenges to handle the edge cases.
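One way of handling those edge cases is to re-read until the write shows up. The sketch below uses a toy in-memory store that deliberately serves a couple of stale reads after each write; it is a simulation for illustration only, not the SimpleDB API, and the class, `stale_reads` and `read_until` are all invented names.

```python
class EventuallyConsistentStore:
    """Toy store: after a write, the next `stale_reads` reads see old data."""

    def __init__(self, stale_reads=2):
        self.stale_reads = stale_reads
        self.visible = {}   # what readers can currently see
        self.pending = []   # [stale reads left, key, value] per write

    def put(self, key, value):
        self.pending.append([self.stale_reads, key, value])

    def get(self, key):
        still_pending = []
        for entry in self.pending:
            if entry[0] <= 0:
                # "Replication" has caught up: the write becomes visible.
                self.visible[entry[1]] = entry[2]
            else:
                entry[0] -= 1
                still_pending.append(entry)
        self.pending = still_pending
        return self.visible.get(key)


def read_until(store, key, expected, attempts=5):
    """Re-read until the expected value appears; return the attempt number.

    Returns None if the value never shows up within `attempts` reads.
    A real client would sleep (with back-off) between attempts.
    """
    for attempt in range(1, attempts + 1):
        if store.get(key) == expected:
            return attempt
    return None
```

The design question in a real application is what to do when `read_until` gives up: treat it as a failure, or carry on with the stale value and reconcile later.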

The service is still in limited Beta, but the documentation is available and if you’ve already used any other AWS product you’ll immediately feel at home. The pricing is again based on usage; the cost of storage is much higher than S3, being $1.50 per GB-month, but a GB of structured data is an awful lot of data (and the larger document-style storage would be provided by S3).

If you’ve not yet tried out either S3 or EC2, now might be a good time to start. Cloud computing has come down to earth, all thanks to an online book store, Amazon!