Category Archives: S3

Windows on EC2 = SMEs on EC2

The announcement that Win2003 is now an option on EC2 is very significant. That, and EC2’s exit from beta status with an SLA in tow, means that AWS is now very much more appealing to the great unwashed, the SMEs, i.e. the businesses who form the backbone of most of our economies.

Large companies and start-ups are comfortable in the world of Linux servers but most small companies are Windows to the core.  This may not be “right”, this may not be how it “should be”, but it is so.   Even within large companies, departmental computing is largely a Windows only enclave, with MS Office (and Excel in particular) as the backbone and MS SQL Server as the database of choice (or is that, no choice).

The other interesting thing is that my fear that EC2 SQL Server Standard instances would be licensed as per Oracle has not come to pass (Oracle, while making a “big thing” of their recent EC2 cloud conversion, still insist on traditional licensing for EC2 database instances). SQL Server Standard is available on a pay-as-you-go model, brilliant!

Even if running Win2003 as a server doesn’t catch your fancy, perhaps you would much rather get rid of your existing Windows laptop and replace it with a cool new Apple Mac, but you still need the ability to run Windows-only software. Why not use EC2 as your on-demand, pay-as-you-go Windows desktop replacement? Simply configure a Windows AMI with your required software (you may have to use something like this, if software is only available on CD); you could then use Jungle Disk to easily share data (via S3) between your shiny new Mac and the AMI. Power up and down as required; easier than using VMware or Parallels and, at 12.5c per hour, probably cheaper too.

Amazon’s SAN in the cloud is a mirage…

This morning I got very excited. While quickly scanning the headlines of the 1000+ unread feeds that had accumulated in my Google Reader this week, one heading in particular caught my attention, “Amazon Elastic Block Store goes live!”.

The post from the RightScale folks gives a detailed overview of the new Amazon ‘SAN storage in the cloud’ service, aka Elastic Block Store, aka EBS. Alas, this particular cloud offering was a mirage: the post was subsequently removed (but can still be viewed on Robert Scoble’s Shared Items). It seems the post was a work-in-progress and not intended for publishing, yet!

Why was I so excited? Amazon EC2 had two major shortcomings when it launched 2 or so years ago; the first, ephemeral IP addresses, was solved by the new Elastic IP feature; the second, ephemeral storage volumes (when you shut down an instance the disks are wiped!), is due to be solved by EBS. With both of these problems solved, EC2, already near perfect, would be perfect.

The article does a good job of explaining the new service…

EBS starts out really simple: you create a volume from 1GB to 1TB in size and then you mount it on a device on an instance, format it, and off you go. Later you can detach it, let it sit for a while, and then reattach it to a different instance. You can also snapshot the volume at anytime to S3, and if you want to restore your snapshot you can create a fresh volume from the snapshot.

The thing that caught my eye in the above paragraph was the snapshot facility. Snapshots are to be stored on S3 via an EC2-specific incremental-snapshot API. This means the volumes will come with a built-in back-up facility. This is important as EBS drives reside in one availability zone (that of the instance that they are mounted against) and do not have the data replication security offered by S3. It also means that disk systems can be restored quickly and simply from snapshots without the overhead (and bugs!) of writing an S3-specific incremental backup and restore utility.
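To make the incremental idea concrete, here is a toy sketch in Python. It is not the EBS API (which I have not seen); the Volume class and the restore function are invented names, and the model simply records which blocks changed since the last snapshot and replays the chain to rebuild a volume.

```python
# Toy model of incremental snapshots; NOT the EBS API.
# Volume, snapshot and restore are hypothetical names used for illustration.

class Volume:
    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        # Blocks changed since the last snapshot; everything is "new" at first.
        self.dirty = set(range(num_blocks))

    def write(self, index, data):
        self.blocks[index] = data
        self.dirty.add(index)

    def snapshot(self):
        """Return only the blocks changed since the previous snapshot."""
        delta = {i: self.blocks[i] for i in sorted(self.dirty)}
        self.dirty.clear()
        return delta


def restore(num_blocks, snapshots):
    """Build a fresh volume by replaying a chain of snapshots, oldest first."""
    volume = Volume(num_blocks)
    for delta in snapshots:
        for index, data in delta.items():
            volume.blocks[index] = data
    volume.dirty.clear()
    return volume


vol = Volume(4)
vol.write(0, b"boot")
first = vol.snapshot()    # full copy: every block counts as changed
vol.write(2, b"data")
second = vol.snapshot()   # incremental: only block 2 changed

copy = restore(4, [first, second])
print(copy.blocks == vol.blocks)
```

The second snapshot holds a single block, which is the whole attraction: back-up cost follows the amount of change, not the size of the disk.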

Back to waiting…

UPDATE: 20th August

Wait over…

Amazon S3; there’s a holdup on the buckets, Dear Liza…

Amazon’s S3 service has been down since 9.00am PDT but I only noticed an hour ago (2.30pm PDT) when an EC2 instance launch failed.

Am I worried? No, but as I become more and more dependent on such services, perhaps I will be; then again, at least I’ll not be alone. WordPress.com and countless others will be using the same excuse to their customers and, unlike Reginald Perrin, who had a different excuse every day for his train’s late arrival…

Ep.1   “Eleven minutes late, staff difficulties, Hampton Wick.”
Ep.1   “Eleven minutes late, signal failure at Vauxhall.”
Ep.1   “Eleven minutes late, staff shortages, Nine Elms.”
Ep.1   “Eleven minutes late, derailment of container truck, Raynes Park.”
Ep.1   “Eleven minutes late, seasonal manpower shortages, Clapham Junction.”
Ep.2   “Eleven minutes late, defective junction box, New Malden.”
Ep.4   “Eleven minutes late, overheated axle at Berrylands.”
Ep.4   “Eleven minutes late, defective axle at Wandsworth.”
Ep.5   “Eleven minutes late, somebody had stolen the lines at Surbiton.”

a whole industry will shout in unison “6 hours late (and counting), overheated axle on US Buckets…”

xlAWS – 100,000 downloads?

Not sure, but this morning I received my monthly AWS bill, and it was double its usual amount! When I investigated, the extra cost was due to 133GB of downloads from my www2.gobansaor.com bucket. This is the S3 bucket in which I store the xlAWS zip file, xlAWS being a “library-of-sorts” of VBA/VB6 helper code for accessing Amazon S3 and SimpleDB.

It’s linked to from this page on my blog (which has had 200 or so hits this month) and from this AWS Community Code page. The excessive hits on the bucket started on the 28th of Feb, the day the xlAWS code was published on Amazon, and continued through most of March. Taking the size of the zip file, 133GB represents approximately 100,000 downloads. I don’t have server logging enabled on the bucket, so I can’t be sure how much is due to the other public files in the bucket (all belonging to the VBA/Proto SQLite xLite project), but as that project has been available for months and is accessible only through my website (whose stats show a consistent 5-10 downloads per week) I’m guessing the downloads are for xlAWS.
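For anyone who wants to check the sums, the estimate can be run backwards to see what file size it implies. This is only a back-of-the-envelope sketch: decimal units (1 GB = 1000 MB) are an assumption, and the result is an implied figure, not a measured one.

```python
# Back-of-the-envelope check of the download estimate.
# Assumption: decimal units (1 GB = 1000 MB); the bill's exact units are not stated.
total_transfer_gb = 133
estimated_downloads = 100_000

implied_zip_mb = total_transfer_gb * 1000 / estimated_downloads
print(f"Implied size per download: {implied_zip_mb:.2f} MB")
```

In other words, 100,000 downloads only adds up if each one is a little over a megabyte.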

Who would have thought that there would be such interest in VBA/VB6 code for accessing AWS services! I wonder was it the Excel VBA side of the house or the dispossessed (and p*ssed off) VB6 developer hordes who downloaded it the most? Leave a comment if you downloaded and used the library, I’d love to know.

Postgres Plus Cloud Edition is boring …

… and that’s good. That’s how I like my databases, boring, reliable, consistent, easy to use.

SimpleDB on the other hand is not boring; it’s an exciting new shiny thing that opens up a myriad of new possibilities. But first, I and the rest of the developer community need to tool up and cast aside some of our cherished database design patterns (oh like, 3rd normal form, strong typing, joins, nothing major) and embrace a slightly different way of thinking. However, as much as I like a challenge, I also like to get things done.

That’s where EnterpriseDB’s new Postgres Plus Cloud Edition comes in. This is an Amazon EC2/S3 hosted edition of their Oracle-compatible PostgreSQL-based product that offers the scalability of SimpleDB but the familiarity of a traditional relational database. The “magic” is supplied by Elastra, who are also offering the same functionality against MySQL and standard PostgreSQL databases.

A Talend ETL job which I had been developing for a client had been tested against a “normal” EnterpriseDB instance. This ETL job was part of a BI prototype trialling Postgres Plus Cloud Edition (the new name for EnterpriseDB’s cloud offering) as the back-end database. So, I exported the job as a Java executable, fired up an EC2 instance, copied up the generated JAR files, changed the database’s hostname to that of the Postgres Plus “cloud” database, ran the ETL job and it worked. As I said, boring, nothing to report, it just worked.

Now you may be wondering what’s so special about these Elastra powered databases, surely EC2 is no different from any other Linux virtual machine, why not simply install a standard database? The problem with EC2, and it is a problem to those of us (i.e. practically every IT pro on the planet) who have come to expect highly reliable RAID backed disk storage, is the non-permanence of its disk systems.

When an EC2 instance is powered down or fails, the disk system is wiped!

That, combined with fixed (if generous) disk sizes (160GB, 850GB or 1690GB), means that often a clustered database environment is a necessity, adding considerably to the complexity. It’s this sort of complexity that SimpleDB and Elastra address.

The obvious use-case for both Elastra and SimpleDB is as data stores for OLTP applications, but Elastra’s ability to handle S3-backed massive databases means the possibility of using EC2 as a data warehousing platform is also considerably strengthened. Although not obvious at first glance, SimpleDB could also act as an OLAP data store; SimpleDB massively indexed tuples as “sparse dimensions” pointing to S3 objects (SQLite databases?) that hold the fact data combined with dense/“partitioning” dimensions (e.g. Time). Possible? Yes. Fun to do? Yes. A solution that I can apply tomorrow? No, that’s why I’m glad EnterpriseDB and Elastra are delivering such a boring product!

UPDATE EC2:

The other big EC2 missing – non-permanent IP addresses – has at last been addressed. EC2 now offers “Elastic IP Addresses”, addresses associated with an account not an instance. If the instance fails or is shut down, the IP address can either be immediately re-assigned to a new instance (no more waiting for Dynamic DNS propagation) or “reserved” for future use at a cost of USD 0.01 per hour. Also, the new “multiple locations” facility puts the API changes in place to allow for location selection, hopefully a sign that we here in Europe will have “local” EC2 instances to match our European S3 buckets!

UPDATE EnterpriseDB:

It looks like IBM have invested in EnterpriseDB, possibly as a counter-weight against Sun’s acquisition of MySQL (EnterpriseDB’s targeting of Oracle’s customer base would also be an added benefit!).

xlAWS – Excel VBA Code for accessing Amazon’s S3 and SimpleDB

I’ve been using Amazon’s S3 service from within Excel for some time now and, as there are no libraries or examples for calling AWS services from VBA (or VB6), I had to roll my own. As with most things Excel, getting the job done always triumphs over elegance and industrial strength implementations; in other words, it was all a bit of a “dog’s dinner”. To remedy this and to share my experience of using S3 from within a VBA/VB6 environment, I decided to re-factor the code and to assemble it into a more re-usable form; the end result is xlAWS.

It was going to be called xlS3, but while doing the exercise SimpleDB appeared on the scene, so I decided to try accessing it from Excel, particularly as both products have a lot in common; both “simple”, both “schema-less” data stores. Like the S3Helper code, the simpleDBHelper module is less of a comprehensive library, more a collection of useful functions which (hopefully) make working with AWS a bit easier.

To use this code library, you’ll need to have a good grasp of the S3 and SimpleDB APIs and be reasonably proficient with VBA. This is not an end-user tool, it’s for VBA (or VB6) developers. There’s a README and some basic examples within the Excel VBA project to help you get started. Code is released “in the spirit” of LGPL, you can use it how you wish, but if you add something new to the “library” (or find/fix a bug) do let the rest of us know.

As I’ve not been able to find a pure VBA implementation of the HMAC-SHA1 hash algorithm (and I couldn’t see an implementation within the standard “Microsoft Enhanced Cryptographic Provider” ) I’ve wrapped the open source XySSL SHA1 HMAC C code in a VBA friendly DLL. This DLL (and the source, under LGPL) is included in the zip file as AWS authentication requires SHA1 HMAC signatures.
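For readers outside the VBA world, this is roughly what the signature step amounts to, sketched in Python using only the standard library. The credentials, bucket and file name are placeholders, and the string-to-sign layout follows the S3 REST authentication scheme as I understand it, so check it against the AWS documentation before relying on it. This is not the xlAWS code itself.

```python
import base64
import hashlib
import hmac


def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """Raw 20-byte HMAC-SHA1 digest."""
    return hmac.new(key, message, hashlib.sha1).digest()


def s3_authorization_header(access_key: str, secret_key: str, string_to_sign: str) -> str:
    """Build 'AWS <access key>:<base64 signature>' for an S3 REST request."""
    digest = hmac_sha1(secret_key.encode("utf-8"), string_to_sign.encode("utf-8"))
    signature = base64.b64encode(digest).decode("ascii")
    return f"AWS {access_key}:{signature}"


# Placeholder request and credentials, for illustration only.
string_to_sign = "\n".join([
    "GET",                              # HTTP verb
    "",                                 # Content-MD5 (empty for this request)
    "",                                 # Content-Type (empty for this request)
    "Tue, 27 Mar 2007 19:36:42 +0000",  # Date header value
    "/example-bucket/xlAWS.zip",        # resource path (hypothetical bucket)
])
header = s3_authorization_header("EXAMPLE_ACCESS_KEY", "example-secret", string_to_sign)
print(header)
```

The hash itself is the only non-trivial part, which is why a small DLL wrapping a C implementation is enough to unblock VBA.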

You’ll also obviously require an AWS account. Credentials are stored within the workbook’s custom properties and can be encrypted via a “key file” if required. If you intend to use this code within VB6 (or Proto) you’ll need to provide your own implementation of the AWSKeyData class in order to use a non-Excel persistence store.

You can download the project ZIP file from here.

Have fun.

UPDATE

Another alternative for calculating HMAC-SHA1 signatures in VBA/VB6 is a Google Checkout supplied COM DLL see http://bit.ly/9CIKtM

There’s the bones of a pure VBA HMAC-SHA1 implementation here http://www.eggheadcafe.com/software/aspnet/32187540/hmac-sha1-challenge.aspx

Dublin Bus and PALO ETL – the connection!

Dublin buses, as is the norm with most road-based public transport systems in our increasingly car-choked cities, tend to operate on the basis of “no sign of a bus for ages, then two or three arrive at the same time”. Palo MOLAP ETL options appear to be following the same pattern; we’ve been waiting for ETL support for ages and now we see three of them heading down the road towards us. There’s Palo’s own offering, then came Stratebi‘s Kettle Plugin and now Talend Version 2.3.0RC2 is offering a Palo output component.

Mind you, the Talend offering is very basic and I’ve not managed to get the Stratebi plugin to work, leaving Palo’s ETL Server as the front runner at the moment (drill-through capability is a winner in my book).

I’ve also been busy re-factoring my VBA SQLite and Amazon S3 code with the intention of publishing them as an Excel based micro-ETL platform. While cleaning up the Amazon AWS modules I’ve been playing with SimpleDB, I’m impressed, Excel combined with SimpleDB rocks!

I’ve also wrapped the open source XySSL SHA1 HMAC C code in a VBA friendly DLL, as searching for a VBA hmac sha1 hash implementation (essential for Amazon AWS access) has proved fruitless.

Hope to release the lot the end of next month.

UPDATE:

Thanks to Javier and Jorge from Stratebi I’ve managed to get the new Kettle Palo plugin to work. It seems that the TEST facility in the Kettle database connection dialogue throws an exception for Palo connections but the connections work fine in the actual Palo input/output steps. Did a quick test and it looks very easy to use and fits in well with the Kettle “way of doing things”.

CouchDB = IBM’s SimpleDB and S3 ?

What if you’re a major player in the IT world and suddenly the internet’s equivalent of your local bookshop releases a mould-breaking cloud-based database service, SimpleDB? And this on top of Amazon’s highly acclaimed document data store service, S3!

Well, if you’re IBM you hire Damien Katz, the person behind CouchDB. I think 2008 could be the year that cloud-based database services really take off.

SimpleDB + S3 = distributed document-centric database

I’m a database man. I’ve worked on or about most variations on the theme, from roll-your-own flat files, to hierarchical, to CODASYL network databases, to the current crop of relational and MOLAP platforms. Of late, I’ve been investigating what I think will be the future of database technology, the distributed document-centric database. Today, the future arrived in the form of Amazon’s new SimpleDB service.

Up until now Amazon’s S3 service offered one half of the future platform, the “distributed document-centric” bit, but it lacked the indexed structure part to make it a true database; in combination with SimpleDB it’s now complete.

SimpleDB stores data in a Domain/Attribute schema-less and type-less structure having more in common with a spreadsheet than a traditional relational table. If you’ve worked with the likes of SQLite (manifest typing) or Excel (no predefined schema and manifest typing) then you’ll appreciate this is no hardship, quite the opposite in fact (I find the strong typing nature of most databases a real pain having worked recently on a SQLite combined with Excel project).
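A toy in-memory model gives a feel for what schema-less and type-less means in practice. This is a sketch only, written in Python; put_attributes and select are names I have made up for the illustration and it makes no calls to the real service. Every value is an untyped string, an attribute can hold more than one value, and new attributes can be added to an item at any time.

```python
from collections import defaultdict

# Toy in-memory stand-in for a SimpleDB domain; not the real service API.
# domain[item_name][attribute_name] is a set of untyped string values.
domain = defaultdict(lambda: defaultdict(set))


def put_attributes(item_name, **attributes):
    """Add attribute values to an item; no schema is declared anywhere."""
    for name, value in attributes.items():
        domain[item_name][name].add(str(value))


def select(attribute, value):
    """Names of items where the attribute holds the given value."""
    return sorted(
        item for item, attrs in domain.items()
        if str(value) in attrs.get(attribute, ())
    )


put_attributes("invoice-001", customer="Acme", status="open")
put_attributes("invoice-002", customer="Acme", status="paid", paid_on="2008-01-15")
put_attributes("invoice-001", tag="urgent")   # new attribute, no schema change
put_attributes("invoice-001", tag="export")   # second value for the same attribute

print(select("customer", "Acme"))
print(select("tag", "export"))
```

Anyone used to adding a column to a spreadsheet on a whim will recognise the pattern.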

The distributed nature of SimpleDB may however pose some difficulty to those of us (i.e. almost everybody) raised in the world of ACID compliant databases. Because of the Brewer’s Conjecture effect, SimpleDB sacrifices consistency for availability and partition tolerance, i.e. when you write something to the database, an immediate query may not return the updated value. Subsequent queries will eventually return the new data; exactly when depends on the load and the availability of resources. Those of you already using S3 will already be living with this “feature”, and in practice you rarely notice it (most updates seem to appear immediately), but it will still pose design challenges to handle the edge cases.
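The edge case is easy to picture with a toy simulation. The sketch below is Python with invented names and an invented replica count; it says nothing about how SimpleDB is actually built. A write is acknowledged by one replica, a read that lands on another replica sees the old state, and everything agrees once the update has propagated.

```python
# Toy simulation of eventual consistency. The replica count and the
# propagation step are invented for illustration only.

class EventuallyConsistentStore:
    def __init__(self, replicas=3):
        self.replicas = [{} for _ in range(replicas)]
        self.pending = []  # (replica index, key, value) updates not yet applied

    def write(self, key, value):
        # The write is acknowledged once the first replica has it ...
        self.replicas[0][key] = value
        # ... and the other replicas catch up later.
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica):
        return self.replicas[replica].get(key)

    def propagate(self):
        """Apply all outstanding updates (the 'eventually' part)."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("price", "9.99")
stale = store.read("price", replica=2)   # this replica has not caught up yet
store.propagate()
fresh = store.read("price", replica=2)   # now it has

print(stale, fresh)
```

Designing for that stale read, rather than assuming it never happens, is the adjustment ACID-raised developers have to make.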

The service is still in limited Beta, but the documentation is available and if you’ve already used any other AWS product you’ll immediately feel at home. The pricing is again based on usage; the cost of storage is much higher than S3, being $1.50 per GB-month, but a GB of structured data is an awful lot of data (and the larger document style storage would be provided by S3).

If you’ve not yet tried out either S3 or EC2, now might be a good time to start, cloud computing has come down to earth, all thanks to an online book store, Amazon!

Firefox tune up time again …..

This morning Firefox just got slower and slower; clicking on a link or a text box took ages to respond; using online WYSIWYG editors became next to impossible; I was also getting an error when attempting to connect to Google Sync.

I checked the usual suspects; internet connection OK; did a quick HijackThis scan and analysis to check if anything nasty was on the PC, nope, again OK; fired up IE7, it worked fine; launched Firefox in safe mode (disables add-ons and other extensions) but the problem persisted. All signs that the culprit was my Firefox profile.

This has happened before so I knew what to do.

From the command line I launched Firefox with the “-p” option which brings up the profile dialog, created a new profile and relaunched; everything back to normal, except of course all my bookmarks and my browser extensions were gone.

Reinstalling my extensions is easy enough and offers an opportunity to do some much needed spring cleaning. The first extension I always re-install is Google Sync, for when it’s back in business I can then restore my old bookmarks and passwords (not my highly sensitive passwords I hasten to add, I use KeePass to manage those – never store financial passwords and the like in your browser’s profile).

The extensions I regard as must have are:

  • Google Browser Sync – keeps a backup of my bookmarks, remembers what tabs I had open last time and restores them if required, means I can easily flick between my laptop and my desktop. (And of course, it’s very useful when it comes to rebuilding a new PC or profile!). Google Sync to be discontinued.
  • Del.icio.us add-on – tag and search using my del.icio.us account. Why use both del.icio.us and Google Sync’d bookmarks? Well, I use bookmarks for my day-to-day commonly used links, while I use http://del.icio.us as my long-term KM memory bank.
  • S3Fox – for managing my backups and other file storage needs on my Amazon S3 account.
  • FlashGot – download manager used in conjunction with Free Download Manager. UPDATE: I’m now using DownThemAll (a Firefox plugin) rather than FDM, mainly because of FDM’s inability to handle certain ASP and PHP redirects, the prime example being downloads from SourceForge.
  • Google Toolbar – for searching blogs, quick link to Gmail, spell checker, page rank checker.
  • British English Dictionary – to use Firefox’s built-in spell checker (using this now, rather than Google Toolbar’s spell checker).
  • PDF Download – gives me control over how I access PDF links.
  • NoScript – allows me to control what JS/Java/Flash scripts run, and also provides excellent XSS protection. Can be annoying sometimes, but I stick with it. To make it less annoying (but not as secure) go to Options and allow top-level sites by default (including 2nd level domains).
  • EC2 UI – for controlling my Amazon EC2 images.
  • I also install but do not auto-enable several other add-ons such as Firebug (understand/debug the structure of a web page), iMacros (web browsing macro recorder/screen scraper) and SQLiteManager (manages my SQLite databases).

Ruby plus Amazon S3 – Document Centric Database

I’ve said it before and I’m going to repeat myself; learning Ruby has proven to be a great investment, not so much for the language itself but for the insights it gives into other technologies. As soon as a new ‘cool’ technology or idea hits the street some smart Rubyist is bound to attack it, dice it up and serve it back up as easy to digest Ruby code.

Today, it’s the turn of Document Centric Databases done in the style of CouchDB, but replacing JavaScript/Erlang with Ruby and the bespoke data store with Amazon’s S3 service.

Anthony Eden‘s RDDB project is still very much alpha, but looking through the code it looks like it has lots of good ideas, including using EC2 instances as “map reduce workers” listening on Amazon SQS Queues; so the whole Amazon AWS stack might yet get starring roles. The actual data store can be varied, with both partitioned file system and RAM based options currently available alongside S3.

The other Amazon AWS related news was the announcement today of an option to use European data centres to store S3 data (with a slightly higher charge than using North American locations and with the transfer of data between EU based S3 buckets and US based EC2 instances being no longer free). I’m guessing that the option to fire up European based EC2 servers can’t be far behind. Also, one piece of news I’d missed was that EC2 is now in unlimited beta, i.e. it’s now open to all developers. So developers everywhere can, for less than the cost of a mobile text message, fire up their own dedicated and powerful Linux server. The day of a production ready, SLA backed, EC2 service is around the corner.

CrashPlan – the best backup service yet?

You know when you come across something so simple, so obvious and so brilliant you wonder, why didn’t I think of that? Well for personal/small business data backup I’ve just had one of those moments.

CrashPlan is a consumer/SMB orientated backup service following in the footsteps of Mozy (a service I’ve used in the past and still recommend to others) but CrashPlan has one extra little facility that makes it the best yet: the ability to back up not just to a secure off-site data store but also, as an alternative or as an additional backup, to another local PC or remotely to a friend’s PC. And the best part is that backup to other PCs is free (after the once-off $20 software licence), so you could have a local copy for fast backup (and more importantly fast restore) plus a free remote copy on a friend’s or indeed a work machine.

The data is compressed and encrypted, thus protecting your data from prying eyes and your friend from any danger of virus/malware cross-infection. Any combination of Windows, Mac or Linux boxes is supported and they say the software can negotiate most firewall situations. Future support for Amazon S3 as a remote backup location is on the product’s to-do list. Brilliant!

Nirvanix targets Amazon S3 shortcomings

Let there be no doubt about it, Amazon’s S3 online storage system is wonderful; it’s secure (both from a technology point of view and from Amazon’s status as one of the web’s most trusted sites, i.e. one you wouldn’t worry about giving your credit card to), it’s cheap, it’s pay-as-you-go and it has first mover advantage, but (there’s always a but) it has until now lacked competition. And because it lacked competition the various shortcomings (such as no support for HTTP POST file upload, no SLAs etc.) that S3 users complain about are handled by Amazon in what can best be described as ..

..we hear what you’re saying, we have it on a list; no, we’ll not tell if/when we’ll remedy this problem (or explain why it’s not possible to do so); and anyway if you don’t like it, who else provides anything comparable?

Okay, I’m being unfair here, I’m sure Amazon has very good reasons for how they do things and scalability and “keeping it simple” seem to be their development mantra; and this is a good thing for an online 24/7 storage infrastructure. But, as in all things in life, competition would help not just disillusioned users by offering another comparable service but would help Amazon prioritise items on its S3 roadmap.

Most would have assumed that when that competitor arrived it would either be Google or Microsoft, instead the first up to bat is Nirvanix, a San Diego startup which appears to be associated with another online storage player, MediaMax. Pricing is similar to S3, but with the option of purchasing extra SLA backed support packages, something that has been top of the list for many actual and potential S3 users. Other “missings” that Nirvanix addresses are;

  • File upload via HTTP POST, S3 restricts upload to HTTP PUTs which requires the use of a proxy server or the installation of client software.
  • File rename and move, S3 requires that a file is first deleted and then reloaded.
  • In-built support for media processing such as image resize/rotate for thumbnails.
  • Multi-tenant accounts, each S3 account supports only a single ‘user view’.
  • Files are indexed via tags and name, not just by name as is the case with S3.
  • Granular control of usage limits and reporting, S3 only offers ‘after-the-fact’ reporting.
  • Maximum file size of 256GB compared to Amazon’s 5GB.

The Nirvanix authentication method uses a much simpler and more traditional username/password over SSL approach than S3’s key-pair based URL signing method. This can be seen as either a weakness or a strength, but combined with Nirvanix’s support for POST file uploads, multi-tenant accounts and granular usage controls it makes building browser based clients much simpler.

S3’s industrial grade authentication is all fine and dandy but if the key becomes compromised, all’s lost; you could expose not just your data but your wallet if somebody used the compromised key to maliciously upload Terabytes of data. This single point of failure is perhaps my main complaint of S3’s current set-up.

So, am I getting ready to jump ship? No, at least not yet, as:

  • Amazon is still Amazon, they may be lacking SLAs but they have my trust.
  • S3’s role as a back-end to Amazon EC2.
  • Friendly and effective forums offering excellent support provided by both the developer community and Amazon’s own staff.
  • CNAME support. (e.g. http://www2.gobansaor.com/)
  • Did I mention EC2?

Should Amazon be worried? No, this is not a zero-sum game, in fact competition will help grow awareness and expand the market for all “cloud” based services.

CouchDB – document centric ODS

While the potential of column-oriented DBMSs within BI projects is obvious given the popularity of MOLAP (a form of column-oriented data store), the potential for the other new kid on the block, the document-oriented database, is less so. One such DBMS, CouchDb, is the latest wunderkind to bubble to the surface, helped by the database’s RESTful interface, its abandonment of XML in favour of JSON, the use of Javascript (replacing a bespoke language) as its “view” language and its use of Erlang and MapReduce algorithms. (A CouchDb view is, as far as I can tell, like a combination of a Function Based Index and a Materialized View).
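The “function based index plus materialized view” comparison can be illustrated with a toy map/reduce in Python. CouchDb views are written in Javascript, so this only mirrors the idea; the documents, the field names and the build_view and reduce_sum helpers are all invented for the example.

```python
# Toy illustration of a map-style view over JSON-like documents.
# All documents and helper names are hypothetical.
documents = [
    {"_id": "a1", "type": "invoice", "customer": "Acme", "total": 120},
    {"_id": "b2", "type": "note", "text": "call back Tuesday"},
    {"_id": "c3", "type": "invoice", "customer": "Bolt", "total": 75},
    {"_id": "d4", "type": "invoice", "customer": "Acme", "total": 30},
]


def map_invoice_totals(doc):
    """Emit (key, value) pairs; documents that do not match emit nothing."""
    if doc.get("type") == "invoice":
        yield doc["customer"], doc["total"]


def build_view(docs, map_fn):
    """Run the map function over every document and keep the rows sorted by key."""
    rows = [(key, value, doc["_id"]) for doc in docs for key, value in map_fn(doc)]
    return sorted(rows)


def reduce_sum(rows):
    """Collapse the view rows to one total per key."""
    totals = {}
    for key, value, _doc_id in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


view = build_view(documents, map_invoice_totals)
totals = reduce_sum(view)
print(view)
print(totals)
```

The function decides what gets indexed (the function based index part) and the sorted rows are stored ready for querying (the materialized view part).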

Where I see CouchDb’s place in a BI project is at the messy end (or should I say start) of the ETL pipe, the operational data store (ODS). Not an ODS in the high-church Inmon sense, i.e. not a normalised logical-data-model-made-real but more an easily explorable source-data archive/audit facility. If all your data comes from one or two operational systems (e.g. ERP and CRM) the need for an ODS may not arise; simply use the operational systems themselves (or direct copies in a separate database), using conformed dimensions to provide the necessary glue. If, however, a large amount of your data comes not from traditional OLTP systems but from ‘document sources’ then something like CouchDb might come in useful.

Typical document sources might be: Excel Spreadsheets, XML/JSON/CSV responses from SaaS APIs, scraped web pages, PDF/MsWord forms, MSAccess or SQLite databases; even audio/video content (e.g. market research interviews with customers which are then “codified” and stored as customer dimension attributes).

You could of course use a traditional RDBMS to hold this information, especially if the database supported full-text search or has native support for semi-structured data; however, due to the huge amount of storage space that non-structured data can soak up, CouchDb’s open source Google inspired MapReduce architecture, with its ability to cheaply scale out, might be more suitable. Given its alpha level status, CouchDb is currently only suitable for testing or evaluation, but if you have a pressing need for such a scalable document store you could use Amazon’s S3. Although S3 is essentially just a key/value pair store, that value can be any blob of data you wish; it is in effect a massively scalable and keenly-priced document-oriented data store.

Being key/value pairs, the only indexing option is the key, and although meta-data tags can be associated with each pair this data is not indexed for fast retrieval. The use of a local database to provide meta-data based filters/indices is the obvious solution; another less obvious approach would be to use an online tagging service such as del.icio.us. The use of del.icio.us would of course raise privacy/security issues but these could be mitigated by using the privacy option in del.icio.us and by using behind-the-firewall URLs which could then be redirected to the correctly signed S3 URL via a LAN proxy.
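The local database option might look something like the following Python sketch, using an in-memory SQLite database as the index. The table layout, tags and S3 keys are all hypothetical and no S3 calls are made; the point is only that the tag lookup happens locally and returns the keys of the objects worth fetching.

```python
import sqlite3

# Local metadata index for objects held in S3.
# The keys, tags and table layout are hypothetical; no S3 calls are made here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE object_tags (s3_key TEXT NOT NULL, tag TEXT NOT NULL)")
db.execute("CREATE INDEX idx_object_tags_tag ON object_tags (tag)")

db.executemany(
    "INSERT INTO object_tags (s3_key, tag) VALUES (?, ?)",
    [
        ("ods/2007/sales-q3.xls", "sales"),
        ("ods/2007/sales-q3.xls", "excel"),
        ("ods/2007/crm-export.json", "crm"),
        ("ods/2007/survey-17.mp3", "interview"),
        ("ods/2007/forecast.xls", "excel"),
    ],
)


def keys_for_tag(tag):
    """Look up S3 keys locally; only the matching objects need to be fetched."""
    rows = db.execute(
        "SELECT s3_key FROM object_tags WHERE tag = ? ORDER BY s3_key", (tag,)
    )
    return [row[0] for row in rows]


print(keys_for_tag("excel"))
```

The index is small enough to rebuild from the objects’ own meta-data if it is ever lost, so S3 remains the system of record.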