<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Gobán Saor</title>
	<atom:link href="http://blog.gobansaor.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.gobansaor.com</link>
	<description>A country datasmith.</description>
	<lastBuildDate>Sat, 04 Sep 2010 09:29:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.gobansaor.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/67e164f5d51c2b3115a7819b84505c13?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>Gobán Saor</title>
		<link>http://blog.gobansaor.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.gobansaor.com/osd.xml" title="Gobán Saor" />
	<atom:link rel='hub' href='http://blog.gobansaor.com/?pushpress=hub'/>
		<item>
		<title>LightSwitch &amp; Hobo &#8211; the return of the 4GL?</title>
		<link>http://blog.gobansaor.com/2010/08/26/lightswitch-hobo-the-return-of-the-4gl/</link>
		<comments>http://blog.gobansaor.com/2010/08/26/lightswitch-hobo-the-return-of-the-4gl/#comments</comments>
		<pubDate>Thu, 26 Aug 2010 11:18:10 +0000</pubDate>
		<dc:creator>gobansaor</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Hobo]]></category>
		<category><![CDATA[LightSwitch]]></category>
		<category><![CDATA[Silverlight]]></category>
		<category><![CDATA[4GL]]></category>
		<category><![CDATA[analyst-programmer]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=1123</guid>
		<description><![CDATA[Those of us of a certain age have fond memories of the golden era of 4GLs. These simple, but at the time revolutionary, tools enabled business-aware programmers (usually termed analyst-programmers) to quickly build &#38; deploy line-of-business apps. They (both tools &#8230; <a href="http://blog.gobansaor.com/2010/08/26/lightswitch-hobo-the-return-of-the-4gl/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Those of us of a certain age have fond memories of the golden era of <a href="http://en.wikipedia.org/wiki/Fourth-generation_programming_language">4GLs</a>. These simple, but at the time revolutionary, tools enabled business-aware programmers (usually termed analyst-programmers) to quickly build &amp; deploy <a href="http://en.wikipedia.org/wiki/Line_of_business">line-of-business</a> apps. They (both tools and devs) were primarily data-driven: data begat screens, screens begat more data, and so on. The resulting apps were server-delivered, using either <a href="http://en.wikipedia.org/wiki/Computer_terminal">green-screen terminals</a> or client-side delivery apps (a bit like current-day client-side <a class="zem_slink" title="Rich Internet application" rel="wikipedia" href="http://en.wikipedia.org/wiki/Rich_Internet_application">RIAs</a>). Their UIs could best be described as &#8220;plain but functional&#8221;.</p>
<p>These halcyon days gave way to a combination of 2-tier Windows apps and 3-tier enterprise platforms. The increasing sophistication &amp; complexity of the new platforms forced programmers to become highly specialised, often losing their once-close links with the business, and even losing sight of the <a href="http://blog.gobansaor.com/2008/12/18/sql-does-exactly-what-it-says-on-the-tin/">value of business data as a resource in itself</a> (it still amazes me when I come across business application programmers with poor or non-existent SQL/RDBMS skills). The analyst-programmer (AP) was no more.</p>
<p>Many of those APs, like myself, managed to stay close to the business by either becoming business/data analysts or datawarehousing/BI specialists. ERP platforms such as SAP, with their complex configuration requirements, also created a welcoming home for business-focused IT refugees.</p>
<p>The need for quick&#8217;n&#8217;dirty line-of-business apps has not disappeared, but this service is now often provided by tools such as MS Access and, above all, by Excel. This is both good and bad: the good is the spread of development skills outside of IT; the bad is the effective disarming of a large proportion of professional IT folks. To paraphrase <a class="zem_slink" title="Marvin the Paranoid Android" rel="wikipedia" href="http://en.wikipedia.org/wiki/Marvin_the_Paranoid_Android">Marvin the Paranoid Android</a>: &#8220;Brain the size of a planet, and all they give me is Excel!&#8221;</p>
<p>Most of us managed to make the best of the situation: learning to respect, if not love, Excel, and becoming SQL wizards, MDX magicians and dimensional modellers par excellence. But the call of the 4GLs remained, and we veterans continue to keep a watchful eye out for something that will match and hopefully surpass them.</p>
<p><a href="http://www.oracle.com/technetwork/developer-tools/apex/overview/index.html">Oracle&#8217;s Application Express</a> (aka HTML DB) offered those with Oracle skills hope (it is very similar in concept to SQL*Forms V3), and the open source tool <a href="http://www.wavemaker.com/">WaveMaker</a> is also excellent, styling itself, with good reason, the <a href="http://en.wikipedia.org/wiki/PowerBuilder">PowerBuilder</a> for Web Enterprise.</p>
<p>In the last month I&#8217;ve come across two modern-day descendants of these data-driven 4GL tools: Hobo and LightSwitch.</p>
<p><a href="http://gobansaor.files.wordpress.com/2010/08/hobo.png"><img class="alignleft size-full wp-image-1129" title="hobo" src="http://gobansaor.files.wordpress.com/2010/08/hobo.png?w=124&#038;h=48" alt="" width="124" height="48" /></a><a href="http://hobocentral.net/">Hobo</a> is an open source extension to the <a class="zem_slink" title="Ruby on Rails" rel="wikipedia" href="http://en.wikipedia.org/wiki/Ruby_on_Rails">Ruby on Rails</a> platform. It uses the same <a class="zem_slink" title="Model–view–controller" rel="wikipedia" href="http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">MVC architecture</a> but takes the concept further. It not only lets you build a relationship model for your data but also makes it easy to specify a lifecycle model. And, here&#8217;s the biggie, it automatically builds (and rebuilds on change) a very respectable UI to present to the outside world. It also offers a starter authentication framework and lots of other useful helpers. All without writing a line of Ruby code, or having any idea of how RoR works! The end result is a tool that can not only quickly and iteratively build <a href="http://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD-type applications</a> but can also handle simple workflow apps out of the box.</p>
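<p>A lifecycle model is, in essence, a state machine: a set of states plus the transitions allowed between them. That is what lets a tool like Hobo drive simple workflow apps. The Python sketch below illustrates only that general idea. It is not Hobo&#8217;s own Ruby DSL, and the class, states and actions (a hypothetical expense-claim workflow) are invented for illustration.</p>

```python
# A minimal, hypothetical lifecycle (state machine) sketch.
# This is NOT Hobo's API; it only illustrates states plus permitted
# transitions, the idea behind a simple workflow app.

class Lifecycle:
    def __init__(self, initial, transitions):
        # transitions maps (current_state, action) -> next_state
        self.state = initial
        self.transitions = transitions

    def available_actions(self):
        # Actions permitted from the current state, sorted by name
        return sorted(a for (s, a) in self.transitions if s == self.state)

    def fire(self, action):
        # Move to the next state, refusing transitions the model doesn't allow
        key = (self.state, action)
        if key not in self.transitions:
            raise ValueError(f"'{action}' not allowed in state '{self.state}'")
        self.state = self.transitions[key]
        return self.state


# Hypothetical expense-claim workflow
claim = Lifecycle("draft", {
    ("draft", "submit"): "submitted",
    ("submitted", "approve"): "approved",
    ("submitted", "reject"): "rejected",
    ("rejected", "resubmit"): "submitted",
})

print(claim.available_actions())  # ['submit']
claim.fire("submit")
print(claim.available_actions())  # ['approve', 'reject']
```

<p>The list of actions available in the current state is, roughly, the information a generated UI would need to decide which actions to offer the user.</p>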
<p><a href="http://gobansaor.files.wordpress.com/2010/08/vs_lightswitch_beta_logo.png"><img class="alignleft size-thumbnail wp-image-1130" title="vs_lightswitch_beta_logo" src="http://gobansaor.files.wordpress.com/2010/08/vs_lightswitch_beta_logo.png?w=150&#038;h=20" alt="" width="150" height="20" /></a><a href="http://www.microsoft.com/visualstudio/en-us/lightswitch">LightSwitch</a> is similar yet very different. It is similar in that it follows the traditional 4GL data-to-screen approach, presenting the user with a graphical tool to build data tables, to link to existing data sources and to create relationships between entities. Screens can then be generated that, like Hobo&#8217;s, are professional-looking and easy to use. Again like Hobo, new &#8216;skins&#8217; can be applied to change the look and feel.</p>
<p>If a more sophisticated solution is required, code can be added (VB.NET or C#) at predefined events. Indeed, the resulting project, being fully <a href="http://en.wikipedia.org/wiki/Microsoft_Visual_Studio">VS 2010</a> compliant, can be opened in VS Professional and built out from there. (A similar ability to get under the bonnet exists for Hobo, as it is essentially Rails.)</p>
<p>Where LightSwitch differs is in its deployment options. The end result is a <a href="http://www.silverlight.net/">Silverlight</a> app which can run client-side (with full access to the client&#8217;s environment, e.g. to interact with Office), as a sandboxed browser app served from IIS, or via the Azure cloud. The same project can easily migrate back &amp; forth between all three options. (Hobo, being a Rails app, could also be sort-of-localised using <a href="http://www.erikveen.dds.nl/distributingrubyapplications/rails.html">RubyScript2Exe</a>. It could also very easily be deployed to the cloud using the EC2-based, dead-simple-to-use <a href="http://heroku.com/">http://heroku.com/</a>.)</p>
<p>The LightSwitch data modeller also allows relationships between local databases, network databases, <a href="http://www.microsoft.com/windowsazure/sqlazure/">SQL Azure cloud databases</a> and web service datasets to be built and maintained within the application. Mashing up local &amp; central/remote data is a constant requirement for LOB developers, and LightSwitch appears to have made it very easy to implement.</p>
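<p>LightSwitch does this through its own graphical data modeller. As a rough, swapped-in analogue of relating data held in separate stores, here is a Python sketch that uses SQLite&#8217;s <code>ATTACH DATABASE</code> to join a &#8220;local&#8221; table to one held in a second, &#8220;central&#8221; database. This is not how LightSwitch works. Both stores are plain SQLite files standing in for local and remote sources, and the file, table and column names are hypothetical.</p>

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-ins: a "central" database and a "local" one
tmp = tempfile.mkdtemp()
central_path = os.path.join(tmp, "central.db")
local_path = os.path.join(tmp, "local.db")

central = sqlite3.connect(central_path)
central.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
central.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "Acme"), (2, "Globex")])
central.commit()
central.close()

local = sqlite3.connect(local_path)
local.execute("CREATE TABLE visits (customer_id INTEGER, visit_date TEXT)")
local.executemany("INSERT INTO visits VALUES (?, ?)",
                  [(1, "2010-08-01"), (1, "2010-08-15"), (2, "2010-08-20")])
local.commit()  # ATTACH is not allowed inside an open transaction

# Attach the central database under an alias, then join across both
local.execute("ATTACH DATABASE ? AS central", (central_path,))
rows = local.execute("""
    SELECT c.name, COUNT(*) AS visits
    FROM visits v
    JOIN central.customers c ON c.id = v.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 2), ('Globex', 1)]
```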
<p>This data mash-up ability and the option to interact with the client will be a major attraction, at least for corporate devs working largely with MS tools. I say &#8220;will&#8221; because, alas, the current beta is <a href="http://review.techworld.com/applications/3236780/visual-studio-lightswitch-beta-1-review/?view=review&amp;pn=2">&#8220;molasses-in-January&#8221; slow</a>. I initially thought it was just my five-year-old laptop hitting the wall, but <a href="http://ayende.com/Blog/archive/2010/08/25/lightswitch-initial-thoughts.aspx">others with more modern &amp; powerful hardware also found it so.</a></p>
<p>So do tools like Hobo and LightSwitch herald the return of the IT analyst-programmer? Probably not; times are different. Outsourcing, SaaS and packaged software have reduced, and will continue to reduce, the number of business-facing IT staff. But their places are being taken by IT-aware business folks, citizen programmers, <a href="http://blog.gobansaor.com/2010/05/12/time-assets/">creators of time-assets</a>, and it is they who will likely be the beneficiaries of such tools.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/08/26/lightswitch-hobo-the-return-of-the-4gl/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>

		<media:content url="http://gobansaor.files.wordpress.com/2010/08/hobo.png" medium="image">
			<media:title type="html">hobo</media:title>
		</media:content>

		<media:content url="http://gobansaor.files.wordpress.com/2010/08/vs_lightswitch_beta_logo.png?w=150" medium="image">
			<media:title type="html">vs_lightswitch_beta_logo</media:title>
		</media:content>
	</item>
		<item>
		<title>Micro ETL in the PowerPivot age</title>
		<link>http://blog.gobansaor.com/2010/08/20/micro-etl-in-the-powerpivot-age/</link>
		<comments>http://blog.gobansaor.com/2010/08/20/micro-etl-in-the-powerpivot-age/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 18:23:24 +0000</pubDate>
		<dc:creator>gobansaor</dc:creator>
				<category><![CDATA[BI]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[PowerPivot]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[VBA]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[xLite]]></category>
		<category><![CDATA[Star Schema]]></category>
		<category><![CDATA[decision support]]></category>
		<category><![CDATA[micro ETL]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=1089</guid>
		<description><![CDATA[Although PowerPivot has many of the characteristics of an ETL tool, i.e. the ability to connect to disparate datasources, to filter that data and to transform it, it will still hit a brick wall when confronted by the typical data spewed &#8230; <a href="http://blog.gobansaor.com/2010/08/20/micro-etl-in-the-powerpivot-age/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Although PowerPivot has many of the characteristics of an ETL tool, i.e. the ability to connect to disparate datasources, to filter that data and to transform it, it will still hit a brick wall when confronted by the typical data spewed out by operational systems. I&#8217;m sure this is by design, as a sophisticated ETL tool is both complex to design and, probably even more relevant, difficult to use.</p>
<p>Mind you, a few years back we IT pros would have said the same about front-end BI cube configuration, and behold, today we have tools such as PowerPivot that prove this doesn&#8217;t always hold true. Perhaps subsequent versions of PowerPivot will do for ETL what this one has done for BI cubes. In the meantime, much of the necessary ETL will have to take place prior to loading into PowerPivot. But where?</p>
<p>First off, what&#8217;s ETL?</p>
<p>The term ETL applies to one of the trinity of activities that have, over the last two decades or so, been at the heart of reporting/decision support systems. The other two terms, DW (data warehousing) &amp; BI (business intelligence), are sometimes used to refer to the whole process but can also refer to two distinct sub-processes. Confused? Well, so you should be; these terms have been abused and redefined by scores of vendors over the years. For our purposes here, we&#8217;ll stick to their roles as acronyms for two of the processes involved in the preparation &amp; presentation of reporting data.</p>
<p>BI is the term now most commonly associated by non-IT folks with decision support systems, as its role is the most visible, i.e. front-end presentation and manipulation of data: the dashboards, pivots, charts, summary lists etc.</p>
<p>DW, data warehousing, is the term that most IT people who&#8217;ve been in the business for a while would use to describe the techniques, best practices etc. associated with this area. The heart of traditional DW was the data warehouse itself, a mighty repository of historical data optimized for reporting purposes. When DW as a concept started, it was very rare indeed for operational (OLTP) systems to hold transactional data for more than a few weeks, usually just long enough to get through month-end. The days of such specially built datastores may be numbered. The data capacity of operational systems keeps growing, and new ETL techniques (<a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>, for example) are becoming ever better at transforming vast amounts of data.</p>
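<p>For readers new to MapReduce, the idea is to split the work into two steps. A <em>map</em> step emits key/value pairs from each record independently, so it can run in parallel across many machines. A <em>reduce</em> step then combines all the values that share a key. The single-process Python sketch below shows only the shape of the technique, summing hypothetical sales by region. A real framework (Hadoop, for instance) distributes these steps, and the grouping between them, across a cluster.</p>

```python
from collections import defaultdict
from functools import reduce

# Hypothetical raw transaction lines: "region,product,amount"
records = [
    "North,Widget,100",
    "South,Widget,50",
    "North,Gadget,25",
    "South,Gadget,75",
    "North,Widget,10",
]

def map_step(line):
    # Emit (key, value) pairs; each record is processed independently
    region, _product, amount = line.split(",")
    yield region, int(amount)

def shuffle(pairs):
    # Group all values by key (a framework does this between map and reduce)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    # Combine every value for one key into a single result
    return key, reduce(lambda a, b: a + b, values, 0)

pairs = (pair for line in records for pair in map_step(line))
totals = dict(reduce_step(k, v) for k, v in shuffle(pairs).items())
print(totals)  # {'North': 135, 'South': 125}
```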
<p>ETL stands for Extract, Transform and Load. The steps are sometimes reordered as ELT: extract, load &amp; transform (PowerPivot would fall into this category). This is the process which traditionally swallowed most of the development budget in DW/BI projects (and kept me gainfully employed for years). It was the area where the dark arts of datasmiths collided with the often frightening reality of raw, untamed data, with the added venom of corporate-politics-driven &#8220;data ownership&#8221; battles. It was a messy business, and it continues to be one, even in these days of open data and open <a class="zem_slink" title="Application programming interface" rel="wikipedia" href="http://en.wikipedia.org/wiki/Application_programming_interface">APIs</a>.</p>
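<p>The difference between ETL and ELT is simply where the transformation happens. The Python sketch below uses an in-memory SQLite database to play the target system. The ETL path cleans rows in script code before loading them. The ELT path loads the raw text as-is and transforms it afterwards with SQL inside the target. The CSV extract, its messy values and all table names are hypothetical.</p>

```python
import csv
import io
import sqlite3

# Hypothetical raw extract from an operational system (note the messy values)
raw_csv = """order_id,customer,amount
1, acme ,100.50
2,GLOBEX,20
3,Acme,5.25
"""

def extract():
    return list(csv.DictReader(io.StringIO(raw_csv)))

db = sqlite3.connect(":memory:")

# --- ETL: transform in code (trim, fix case, convert types), then load ---
db.execute("CREATE TABLE orders_etl (order_id INTEGER, customer TEXT, amount REAL)")
clean = [(int(r["order_id"]), r["customer"].strip().title(), float(r["amount"]))
         for r in extract()]
db.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", clean)

# --- ELT: load the raw text as-is, then transform with SQL in the target ---
db.execute("CREATE TABLE orders_raw (order_id TEXT, customer TEXT, amount TEXT)")
db.executemany("INSERT INTO orders_raw VALUES (:order_id, :customer, :amount)",
               extract())
db.execute("""
    CREATE TABLE orders_elt AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           UPPER(SUBSTR(TRIM(customer), 1, 1)) || LOWER(SUBSTR(TRIM(customer), 2)) AS customer,
           CAST(amount AS REAL) AS amount
    FROM orders_raw
""")

q = "SELECT customer, SUM(amount) FROM {} GROUP BY customer ORDER BY customer"
print(db.execute(q.format("orders_etl")).fetchall())  # [('Acme', 105.75), ('Globex', 20.0)]
print(db.execute(q.format("orders_elt")).fetchall())  # same result
```

<p>Both routes end in the same clean table; the choice is mostly about which engine is better placed to do the heavy lifting.</p>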
<p>ETL tools vary from SQL written in a text editor to hugely expensive point&#8217;n&#8217;click ETL packages. Packaged ETL vendors promised (and continue to promise) that their tools would vanquish the dark arts of datasmiths. Their products would supposedly be so easy to use that the CEO would chip in with a few scripts to get the project finished. The reality was that IT types found they had to learn yet another sub-optimal &#8220;language&#8221; and more often than not had to drop down to &#8220;proper&#8221; languages to actually drive the thing to completion. ETL was (and still largely is) the preserve of IT.</p>
<div id="attachment_1100" class="wp-caption alignleft" style="width: 310px"><a href="http://gobansaor.files.wordpress.com/2010/08/herding-cats.jpg"><img class="size-medium wp-image-1100 " title="Herding Cats" src="http://gobansaor.files.wordpress.com/2010/08/herding-cats.jpg?w=300&#038;h=173" alt="" width="300" height="173" /></a><p class="wp-caption-text">ETL is easier than herding cats but just about ...</p></div>
<p>The tools have improved a lot since those early days, and open source has, for some, at least removed six-figure licensing costs from the equation. But ETL, <a href="http://technologizer.com/2010/08/13/google-app-inventor/">like programming in general</a>, is hard, so get over it. Tools, a basic knowledge of SQL and data modelling skills can help make ETL approachable to non-IT types, but it still has the potential to make your head hurt.</p>
<p>So what&#8217;s a PowerPivot&#8217;r to do?</p>
<p>If your organisation already has a data warehouse in place you&#8217;re in luck, as it&#8217;s quite likely a lot of the data you require will exist in the optimal PowerPivot import format, <a href="http://blog.gobansaor.com/2010/07/09/star-schemas-to-boldly-go-where-no-excel-spreadsheet-has-gone-before/">i.e. a star schema</a>. You might be out of luck, though: a significant percentage of DWs will not have used dimensional modelling, and you could find yourself looking at a complex <a href="http://en.wikipedia.org/wiki/Online_transaction_processing">OLTP-like</a> data model. In that case, and in the case of pulling the data directly from an operational system, you&#8217;re in the micro ETL business. Even if your IT infrastructure provides you with cleansed and understandable data, you&#8217;ll be faced with integrating external or <a href="http://vaughanmerlyn.com/2008/07/22/shadow-it-the-good-the-bad-and-the-ugly/">shadow-IT</a> data (probably one of the main reasons why PowerPivot appeals). Again, you&#8217;ll either need IT support or else you must learn how to do it yourself.</p>
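<p>To make &#8220;star schema&#8221; concrete: a central fact table holds the numeric measures plus foreign keys, and each surrounding dimension table holds descriptive attributes. The Python/SQLite sketch below splits a hypothetical flat extract into one fact table and two dimensions. It then runs the kind of slice-and-aggregate query a pivot performs. All table names, column names and data are assumptions for illustration.</p>

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical flat extract, as it might arrive from an operational system
db.execute("""CREATE TABLE flat (sale_date TEXT, product TEXT, category TEXT,
                                 store TEXT, region TEXT, qty INTEGER, amount REAL)""")
db.executemany("INSERT INTO flat VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("2010-07-01", "Widget", "Hardware", "Store 1", "East", 2, 20.0),
    ("2010-07-01", "Gadget", "Hardware", "Store 2", "South", 1, 15.0),
    ("2010-07-02", "Manual", "Books", "Store 1", "East", 3, 30.0),
])

db.executescript("""
-- Dimension tables: one row per distinct member, with a surrogate key
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
INSERT INTO dim_product (product, category) SELECT DISTINCT product, category FROM flat;

CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store TEXT, region TEXT);
INSERT INTO dim_store (store, region) SELECT DISTINCT store, region FROM flat;

-- Fact table: measures plus foreign keys to each dimension
CREATE TABLE fact_sales AS
SELECT f.sale_date, p.product_key, s.store_key, f.qty, f.amount
FROM flat f
JOIN dim_product p ON p.product = f.product
JOIN dim_store s ON s.store = f.store;
""")

# A pivot-style query: slice the facts by a dimension attribute
by_category = db.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_key)
    GROUP BY p.category ORDER BY p.category
""").fetchall()
print(by_category)  # [('Books', 30.0), ('Hardware', 35.0)]
```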
<p>Long before the likes of PowerPivot appeared, I regularly found myself in need of a micro ETL toolbox, i.e. a set of tools that would enable me to quickly and cost-effectively prepare data for loading into some system or other. Nine times out of ten that system was an Excel <a class="zem_slink" title="Pivot table" rel="wikipedia" href="http://en.wikipedia.org/wiki/Pivot_table">PivotTable</a> (the rest of the time it was usually a master data take-on task or some variation of systems commissioning). The consumers of my datasmithing services would most likely assume that I used Excel alone to perform these works of wonder. In fact, I usually had an Oracle database (along with its data loaders and superb PL/SQL language) as my secret ingredient. This combination of Excel and Oracle served me (and my clients) well, but it wasn&#8217;t the Oracle bit that gave me the edge; I could, and did, substitute SQL Server and MS Access for the SQL layer. The real trick was the combination of Excel&#8217;s flexibility/presentational strengths with SQL&#8217;s list-handling power.</p>
<p>The problem with this approach was that the interface between the SQL engines &amp; the spreadsheets often involved quite a number of manual steps, and the presence of database software (even MS Access) could not always be depended on. It took my discovery of <a class="zem_slink" title="SQLite" rel="homepage" href="http://sqlite.org/">SQLite</a> to enable me to finally combine the two worlds; <a href="http://www.gobansaor.com/xlite">xLite was born</a>!</p>
<p>This combination of Excel and an in-process SQL engine (provided by SQLite), with the added optional ability to call VBA, JavaScript or Python scripts, has provided me with a hugely flexible and powerful micro ETL tool. Now, with the arrival of PowerPivot, I have both the micro ETL and micro BI tools to build cost-effective Excel-based decision support systems.</p>
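<p>The core of this approach is an SQL engine running inside the host process that can call back into script code. It can be sketched with Python&#8217;s built-in <code>sqlite3</code> module, which embeds the same SQLite engine. This illustrates the idea only, not xLite&#8217;s actual interface. A Python function registered via <code>create_function</code> stands in for the VBA/JavaScript/Python hooks, and the product-code cleaning rule and data are hypothetical.</p>

```python
import re
import sqlite3

def clean_code(value):
    # Hypothetical business rule: strip non-alphanumerics and upper-case,
    # so " ab-12 " and "AB 12" both become "AB12"
    if value is None:
        return None
    return re.sub(r"[^A-Za-z0-9]", "", value).upper()

db = sqlite3.connect(":memory:")  # in-process engine: no server to install
db.create_function("clean_code", 1, clean_code)  # expose the script function to SQL

db.execute("CREATE TABLE raw_sales (code TEXT, amount REAL)")
db.executemany("INSERT INTO raw_sales VALUES (?, ?)",
               [(" ab-12 ", 10.0), ("AB 12", 5.0), ("cd-7", 2.5)])

# SQL does the set-based work; the script function handles the fiddly rule
rows = db.execute("""
    SELECT clean_code(code) AS code, SUM(amount)
    FROM raw_sales
    GROUP BY clean_code(code)
    ORDER BY 1
""").fetchall()
print(rows)  # [('AB12', 15.0), ('CD7', 2.5)]
```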
<p>Being Excel-based means that the end result is delivered in a format that many business people are comfortable with; PowerPivot is designed very much with &#8220;civilian&#8221; datasmiths in mind. Likewise, xLite&#8217;s ability to use VBA, simple SQL and Excel formulas to perform data transformations makes a large part (if not all) of the ETL process &#8220;civilian friendly&#8221;.</p>
<p>I&#8217;m not saying that everything I can do with xLite will be as easy for a non-IT datasmith; many datasources are too difficult and/or time-consuming for end-users to navigate. But much of the business logic can be expressed in Excel terms, with the highly technical or time-consuming tasks handled by SQL or VBA/Python/JavaScript. xLite is not only for one-off transformations; it can also be used to automate ETL, report generation and refresh tasks (including refreshing PowerPivot itself).</p>
<p>So, if you&#8217;re thinking about utilising PowerPivot, but need help in preparing your data and automating the tasks involved, <a href="http://www.gobansaor.com/">perhaps we should talk.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/08/20/micro-etl-in-the-powerpivot-age/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>

		<media:content url="http://gobansaor.files.wordpress.com/2010/08/herding-cats.jpg?w=300" medium="image">
			<media:title type="html">Herding Cats</media:title>
		</media:content>
	</item>
		<item>
		<title>Power (Pivot) to the People</title>
		<link>http://blog.gobansaor.com/2010/08/11/power-pivot-to-the-people/</link>
		<comments>http://blog.gobansaor.com/2010/08/11/power-pivot-to-the-people/#comments</comments>
		<pubDate>Wed, 11 Aug 2010 15:21:56 +0000</pubDate>
		<dc:creator>gobansaor</dc:creator>
				<category><![CDATA[ETL]]></category>
		<category><![CDATA[PowerPivot]]></category>
		<category><![CDATA[excel]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=1075</guid>
		<description><![CDATA[From a lot of the coverage of PowerPivot you could easily be under the impression that the product is aimed only at those with a reasonable understanding of traditional BI/Data Warehousing/RDBMS techniques. A lot of the people writing about it &#8230; <a href="http://blog.gobansaor.com/2010/08/11/power-pivot-to-the-people/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="PowerPivot to the People" src="http://upload.wikimedia.org/wikipedia/en/3/36/Power_to_the_People.jpg" alt="" width="320" height="320" />From a lot of the coverage of PowerPivot you could easily be under the impression that the product is aimed only at those with a reasonable understanding of traditional BI/Data Warehousing/RDBMS techniques. A lot of the people writing about it are from &#8220;Big BI&#8221;/&#8220;Big IT&#8221; backgrounds and tend to view the world from that perspective. I too could be accused of this bias, given my previous post detailing <a href="http://blog.gobansaor.com/2010/07/09/star-schemas-to-boldly-go-where-no-excel-spreadsheet-has-gone-before/">the benefits of first creating a star schema</a> before loading data into PowerPivot. For those of you from a business or pure spreadsheet background, don&#8217;t be put off by this coverage; PowerPivot is, like its PivotTable ancestor, a tool accessible to all.</p>
<p>Although a star schema design is necessary to get the most from PowerPivot, the tool can also add value to large datasets comprising a single flat table, or a table with one or two &#8220;vlookup&#8221; tables. In fact, the PowerPivot equivalent of <a href="http://office.microsoft.com/en-ca/excel-help/vlookup-HP005209335.aspx">&#8216;vlookup&#8217;</a> is much easier and far more powerful than the Excel original. Such lookups replace not only vlookup but also <a href="http://en.wikipedia.org/wiki/Join_(SQL)">SQL joins</a>. Unlike SQL joins in many databases, PowerPivot joins are case-insensitive (so &#8220;PowerPivot&#8221; = &#8220;Powerpivot&#8221;, for example) and ignore trailing spaces, making them much more forgiving of &#8220;bad data&#8221;. The ever-tricky case of SQL outer joins (allowing for blank lookup codes, or codes not present in the lookup table) is also taken care of via the <a href="http://technet.microsoft.com/en-us/library/ff452109.aspx">&#8220;Unknown Member&#8221;</a>.</p>
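<p>For those coming from SQL, it helps to see what this forgiving behaviour would take in a plain SQL join. The sketch below uses Python&#8217;s <code>sqlite3</code> module. <code>COLLATE NOCASE</code> and <code>RTRIM</code> approximate the case-insensitive, trailing-space-tolerant matching, and a <code>LEFT JOIN</code> with <code>COALESCE</code> approximates the &#8220;Unknown Member&#8221;. This is an analogue, not PowerPivot&#8217;s engine; the tables, data and the &#8216;Unknown&#8217; label are hypothetical.</p>

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sales (product_code TEXT, amount REAL);
INSERT INTO sales VALUES ('PowerPivot', 10), ('Powerpivot ', 5), ('', 3), ('XYZ', 2);

CREATE TABLE products (product_code TEXT, product_name TEXT);
INSERT INTO products VALUES ('POWERPIVOT', 'PowerPivot for Excel');
""")

rows = db.execute("""
    SELECT COALESCE(p.product_name, 'Unknown') AS product,  -- unmatched rows
           SUM(s.amount)
    FROM sales s
    LEFT JOIN products p                                      -- keep every sale
      ON RTRIM(s.product_code) = RTRIM(p.product_code) COLLATE NOCASE
    GROUP BY 1
    ORDER BY 1
""").fetchall()
print(rows)  # [('PowerPivot for Excel', 15.0), ('Unknown', 5.0)]
```

<p>The blank code and the unmatched &#8216;XYZ&#8217; both land in the &#8216;Unknown&#8217; bucket rather than silently disappearing, as they would with an inner join.</p>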
<p>The new <a href="http://social.technet.microsoft.com/wiki/contents/articles/powerpivot-data-analysis-expressions-dax-language.aspx">DAX formula language</a> might be off-putting to many (in particular DAX used in the creation of measures, its most powerful feature). But again, a huge amount can be achieved using the automatically generated measures (near replicas of those available in a normal pivot).</p>
<p>So if you have a data source that is too large to load into a spreadsheet, or perhaps up until now your data has had to be summarised or SQL joined by IT prior to loading, then PowerPivot is well worth investigating for its heavy-lifting capabilities alone.</p>
<p>Even without very large datasets to play with, PowerPivot can be useful as a micro-ETL (Extract, Transform &amp; Load) tool. It&#8217;s very easy to load data from multiple databases and other sources, and by learning a few simple DAX formulas it&#8217;s possible to transform the data into whatever format is required. The resulting table can then either be copied and pasted into an Excel sheet for pivoting/filtering, or pivoted using PowerPivot. The main advantage of pivoting such data using a PivotTable rather than PowerPivot is the ability to show detail (<a href="http://blog.gobansaor.com/2010/06/29/powerpivot-show-detail-not-allowed/">a big omission in PowerPivot</a>).</p>
<p>To further explore the power of this revolutionary tool, I&#8217;d recommend <a href="http://www.mrexcel.com/">Bill Jelen&#8217;s (MrExcel)</a> <a href="http://www.amazon.com/PowerPivot-Data-Analyst-Microsoft-MrExcel/dp/0789743159/"><strong>PowerPivot For The Data Analyst</strong></a> book. It is aimed squarely at Excel power-users (I don&#8217;t think it mentions a star schema once, and it only deals with SharePoint deployment in the last chapter). That makes it a much better choice than the IT-focused (and MS-marketing-driven) <strong><a href="http://www.amazon.com/Professional-Microsoft-PowerPivot-SharePoint-Programmer/dp/0470587377">Professional Microsoft PowerPivot for Excel and SharePoint</a></strong>.</p>
<p>Another plus of Bill&#8217;s book, besides learning PowerPivot, is that you might learn something new about normal PivotTables; for example, look out for the &#8220;un-crosstab a crosstab&#8221; trick using a consolidated range pivot. The Professional book also offers a behind-the-scenes look at the development of PowerPivot, including the gem that it was initially envisaged as an MS Access tool!</p>
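<p>The &#8220;un-crosstab a crosstab&#8221; (unpivot) transformation turns a cross-tab with one column per period back into a flat list. That list has one row per row-label/column-label/value, the shape that pivots and PowerPivot prefer. The book&#8217;s consolidated-range-pivot trick does this inside Excel. As an alternative illustration of the same transformation, here is a small Python sketch; the cross-tab and its labels are hypothetical.</p>

```python
# Hypothetical cross-tab: the first row holds the column labels (months);
# each later row holds a row label (region) followed by its values.
crosstab = [
    ["Region", "Jan", "Feb", "Mar"],
    ["North", 10, 12, None],
    ["South", 7, None, 9],
]

def uncrosstab(table, skip_blanks=True):
    header, *body = table
    _, *col_labels = header  # ignore the corner cell ("Region")
    flat = []
    for row_label, *values in body:
        for col_label, value in zip(col_labels, values):
            # Empty cells carry no information in a flat list, so drop them
            if skip_blanks and value is None:
                continue
            flat.append((row_label, col_label, value))
    return flat

for record in uncrosstab(crosstab):
    print(record)
# ('North', 'Jan', 10)
# ('North', 'Feb', 12)
# ('South', 'Jan', 7)
# ('South', 'Mar', 9)
```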
<p><em>PowerPivot to the people, viva </em><em>la Revolución!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/08/11/power-pivot-to-the-people/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>

		<media:content url="http://upload.wikimedia.org/wikipedia/en/3/36/Power_to_the_People.jpg" medium="image">
			<media:title type="html">PowerPivot to the People</media:title>
		</media:content>
	</item>
		<item>
		<title>Star Schemas &#8211; to boldly go where no Excel spreadsheet has gone before</title>
		<link>http://blog.gobansaor.com/2010/07/09/star-schemas-to-boldly-go-where-no-excel-spreadsheet-has-gone-before/</link>
		<comments>http://blog.gobansaor.com/2010/07/09/star-schemas-to-boldly-go-where-no-excel-spreadsheet-has-gone-before/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 17:50:10 +0000</pubDate>
		<dc:creator>gobansaor</dc:creator>
				<category><![CDATA[ETL]]></category>
		<category><![CDATA[Palo]]></category>
		<category><![CDATA[PowerPivot]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[olap]]></category>
		<category><![CDATA[Star Schema]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=1038</guid>
		<description><![CDATA[One of the many things that delights me about PowerPivot is the central role played by the Star Schema. Those of you reading with a data-warehousing background would shrug your shoulders and say: &#8220;So what, what else would you expect &#8230; <a href="http://blog.gobansaor.com/2010/07/09/star-schemas-to-boldly-go-where-no-excel-spreadsheet-has-gone-before/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.gobansaor.com&amp;blog=110633&amp;post=1038&amp;subd=gobansaor&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><a href="http://gobansaor.files.wordpress.com/2010/07/star-ship.png"><img class="alignleft size-medium wp-image-1044" title="star-ship" src="http://gobansaor.files.wordpress.com/2010/07/star-ship.png?w=300&#038;h=296" alt="" width="300" height="296" /></a>One of the many things that delights me about <a href="http://powerpivot.com/">PowerPivot</a> is the central role played by the Star Schema. Those of you reading with a data-warehousing background would shrug your shoulders and say: &#8220;So what, what else would you expect to find at the core of a BI tool?&#8221;.</p>
<p>Those from an Excel PivotTable background would ask: &#8220;What&#8217;s a Star Schema, why do we need one, what&#8217;s wrong with the good old-fashioned single flattened table?&#8221;.</p>
<p>Those from a classic <a href="http://en.wikipedia.org/wiki/MOLAP">MOLAP</a> background (Essbase, TM1, Palo) might also ask: &#8220;Why do we need this extra layer? Load the cube directly from the operational data model and get on with it!&#8221;.</p>
<p>A quick Q&amp;A is perhaps the best way for me to explain why star schema design is a powerful skill in a datasmith&#8217;s toolset.</p>
<p>First off, what&#8217;s a Star Schema?</p>
<p>A Star Schema (also known as the dimensional model) is a denormalised (flattened) data model used to simplify an operational (OLTP) data model so that it better accommodates reporting and what-if analysis.</p>
<p>At its simplest, it consists of a central fact table with links back to a &#8220;surrounding&#8221; set of dimensional tables, hence the star name. A variation is the snowflake schema, where the dimensional tables are not fully denormalised (e.g. Product Category-&gt;Product-&gt;Fact instead of Product-&gt;Fact).</p>
<p>The role of the fact table (besides being the table that hosts most of the <a href="http://en.wikipedia.org/wiki/Measure_(data_warehouse)">measure fields</a>) is to create linkages between dimensions (such as Customer, Product, Date), usually based on an actual transactional event (e.g. Invoice) or a proposed event (such as a Budget or Forecasted Sale). In effect, it replaces the often complex, workflow-driving connections of a typical operational system with a single many-to-many relationship (modern ERP/CRM systems&#8217; data models consist of scores of configurable many-to-many relationships).</p>
<p>Many wrongly believe the star-schema was adopted for performance reasons and that, now that in-memory OLAP is becoming the norm, dimensional modelling techniques are no longer necessary. In fact, in the early days of data-warehousing, RDBMSs had great difficulty efficiently handling star-queries (and some, such as MySQL and <a href="http://blog.gobansaor.com/2007/08/30/sqlite-star-query/">SQLite, still do</a>).</p>
<p>The original primary purpose of the star schema was to simplify the SQL required to access reporting data, making the model more approachable to non-technical users. Of course, even simple SQL was beyond the knowledge or interest of most end-users, but a sizeable proportion were happy to write it (often helped by SQL &#8220;generators&#8221; such as MicroStrategy or Business Objects). And even where no SQL-wielding civilians were to be found, the simplicity of the dimensional model proved a valuable aid when establishing and developing the warehouse&#8217;s data requirements. PowerPivot requires no SQL knowledge to manipulate the dimensional model, which brings the original concept full-circle, but this time opens its possibilities to a much wider audience.</p>
<p>But surely, concentrating on the actual reports would be a more valuable requirements gathering exercise?</p>
<p>A so-called &#8220;bottom-up approach&#8221; is often the best way to approach a reporting request, particularly if the reports are simple one-off &#8220;traditional&#8221; reports. But for self-service BI, this needs to be combined with a top-down dimensional design. The idea is not to build out each and every report, or indeed cube, but to build a structure that&#8217;ll support likely queries. The process of building a star schema provides both a logical model and a physical implementation of that model against which potential queries can be tested. I&#8217;ve worked on several POCs destined for implementation in Essbase where the star-schema was built and potential cubes were mocked up using Excel PivotTables, but the projects subsequently never went any further (except for the star-schema ETL process): the end-users derived sufficient value from the denormalised star-schema pivoted and reported in Excel.</p>
<p>In traditional ROLAP data-warehouses where the cubes were built directly against star-schemas, the pure logical approach to the data model often had to take a back-seat to the necessity of fine-tuning it to make response times (be that <a href="http://en.wikipedia.org/wiki/Extract,_transform,_load">ETL</a> or user-pivoting) acceptable. This is why I much preferred situations where the star acted as a logical model from which MOLAP cubes were built.</p>
<p>With PowerPivot, <a href="http://en.wikipedia.org/wiki/ROLAP">ROLAP</a> has a new champion. The column-oriented, high-compression, in-memory architecture means that the compromises of the past are no longer necessary. The fact table reverts to its primary role as a many-to-many connector. In a <a href="http://en.wikipedia.org/wiki/OLAP_cube">pure hypercube</a>, measures are just another dimension (the approach that <a href="http://www.palo.net/en/">Palo</a> takes); this is now also true of PowerPivot models: measures can be sourced from dimension tables and dimensions from fact tables, as logically they should be, but without the performance hit of old.</p>
<p>But what&#8217;s the advantage of a star schema over a flattened table when using PowerPivot?</p>
<p>It is true that the same flattened table model as used to backend a PivotTable can be used within PowerPivot. But doing so would limit the potential of the <a href="http://technet.microsoft.com/en-us/library/ee835613.aspx">DAX language</a> to construct measures such as average sales spread over potential customers (rather than the actual customers, which are typically all that a flattened table represents). Also, by creating &#8220;conformed dimensions&#8221; (single cross-business views of Customer, Product etc.) and using such tables as dimensional sources for multiple fact tables, &#8220;virtual cubes&#8221; that combine values from multiple fact tables can be built.</p>
<p>If you&#8217;re new to dimensional modelling, I&#8217;d recommend <a href="http://www.kimballgroup.com/">the books &amp; articles of Ralph Kimball</a> as a good starting point. Be aware that some of the advice regarding efficiency trade-offs, surrogate keys etc. does not apply in a PowerPivot scenario (<a href="http://sqlblog.com/blogs/marco_russo/archive/2010/01/26/memory-considerations-about-powerpivot-for-excel.aspx">even though other performance issues remain</a>), but the logical design tips still hold.</p>
<p>Star Schemas: to explore strange new conformed dimensions, to seek out new measures, to boldly go where no Excel spreadsheet has gone before.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/07/09/star-schemas-to-boldly-go-where-no-excel-spreadsheet-has-gone-before/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>

		<media:content url="http://gobansaor.files.wordpress.com/2010/07/star-ship.png?w=300" medium="image">
			<media:title type="html">star-ship</media:title>
		</media:content>
	</item>
		<item>
		<title>PowerPivot &#8211; Show Detail not allowed!</title>
		<link>http://blog.gobansaor.com/2010/06/29/powerpivot-show-detail-not-allowed/</link>
		<comments>http://blog.gobansaor.com/2010/06/29/powerpivot-show-detail-not-allowed/#comments</comments>
		<pubDate>Tue, 29 Jun 2010 14:43:36 +0000</pubDate>
		<dc:creator>gobansaor</dc:creator>
				<category><![CDATA[BI]]></category>
		<category><![CDATA[PowerPivot]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[drill-though]]></category>
		<category><![CDATA[drill-thru]]></category>
		<category><![CDATA[Gemini]]></category>
		<category><![CDATA[show detail]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=1011</guid>
		<description><![CDATA[Last week, I at long last set aside some time to put PowerPivot through its paces, triggered by my purchasing of Excel 2010 (in itself a momentous occasion as without the attraction of PowerPivot I would have followed my, and &#8230; <a href="http://blog.gobansaor.com/2010/06/29/powerpivot-show-detail-not-allowed/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.gobansaor.com&amp;blog=110633&amp;post=1011&amp;subd=gobansaor&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Last week, I at long last set aside some time to put PowerPivot through its paces, triggered by my purchase of Excel 2010 (in itself a momentous occasion, as without the attraction of PowerPivot I would have followed my, and most other Office users&#8217;, usual pattern of waiting 3-5 years or so before investigating the &#8216;latest&#8217; edition).</p>
<p>The verdict?</p>
<p>In general, still impressed: the concept <a href="http://blog.gobansaor.com/2009/04/01/project-gemini-xxl-excel-on-steroids/">demo&#8217;d to me over a year ago</a> has evolved into an impressive first version. Some things, such as hierarchies, did not make it (hierarchies are constructed from cross-joins of field/attribute sets, as per a normal pivot table), so there are no hierarchy rules as would be the case with standard OLAP cubes; but perhaps for many end-users the &#8216;traditional&#8217; construct-on-the-fly hierarchies will be more approachable.</p>
<p>The DAX functionality is better than I expected; easier for non-techies than MDX but still powerful.</p>
<p>Importing data is hassle-free and intuitive; the <a href="http://powerpivotgeek.com/2009/11/11/a-peek-inside-the-client-architecture/">VertiPaq engine</a> does a wonderful job of compressing the imported data, and the resulting in-memory column-store is certainly very fast. I like the &#8216;linked-table&#8217; option, which allows normal Excel tables (the one useful <a href="http://www.databison.com/index.php/table-formulas-in-excel/">new feature</a> that Excel 2007 introduced) to be added to the PowerPivot star-schema. Being able to import any datasource publishable in AtomPub format (<a href="http://code.google.com/apis/gdata/">such as Google Docs spreadsheets</a>!) is also nice.</p>
<p>So it&#8217;s all good then? Yes, except for one really annoying omission: no drill-through (aka Show Detail, aka drill-thru) allowed.</p>
<p>What? Surely some mistake!</p>
<p>Afraid not. I initially thought the measures I attempted to <em>Show Detail</em> on were too complex (as the error message &#8220;Show Details cannot be completed on a calculated cell&#8221; suggested). Then I assumed it had been cut to meet a delivery deadline and would appear in a subsequent version. But no, <a href="http://twitter.com/donalddotfarmer">Donald Farmer</a> confirmed that it was intentionally removed, as the feedback from IT organisations was not to allow end-users the ability to drill through to potentially millions of rows when running under a SharePoint server. As for those of us running PowerPivot on the client, we already have all the data, so no need for Show Detail!</p>
<p>Okay, I can understand IT&#8217;s reluctance to allow a multi-million row drill-through but surely that should be the decision of individual IT groups to allow or not, and if allowed, to provide the ability to limit the amount of data returned.</p>
<p>A million-row result is of limited use; most <em>Show Details</em> return a few thousand rows at most, and typically fewer than 1,000, so sensible limits can easily be enforced.</p>
<p>The client side (the side that matters most to me) is a very different story. Here the excuse that the data is already there is exactly that: an excuse. By the same logic, drill-through on a normal pivot table should be unnecessary. Yet if you watch end-users construct pivots, they use it constantly; not just to discover the detail behind a figure but, as often as not, as a way of validating that the model they&#8217;ve constructed is correct.</p>
<p>This spot-checking of figures is the main &#8216;test methodology&#8217; used in the wild. Spreadsheet &#8216;developers&#8217; do not construct sophisticated test harnesses and procedures. You might argue they should, but they don&#8217;t and likely never will. And as for the &#8216;multi-million row result&#8217; problem, end-users are not idiots, they&#8217;re just end-users; they&#8217;ll do it once and learn to be more careful next time (or they&#8217;ll use the limit-rows option).</p>
<p>This lack of drill-through will definitely mean I continue to use normal pivot tables in situations that would otherwise be better solved using PowerPivot. As many such models will be based on relatively small datasets (sub-100,000 &#8216;facts&#8217;), it might be suggested there&#8217;s no need for PowerPivot. But this is to miss the &#8216;intellectual&#8217; power (as opposed to the massive data-crunching power of VertiPaq) at the heart of PowerPivot: the star-schema.</p>
<p>Most of the commentary on PowerPivot has focused on its ability to handle really large datasets, but this emphasis on &#8216;big-data&#8217; (something the rest of the BI industry shares) often ignores the power of human-scale small-data (i.e. the world of the spreadsheet jockeys). The power of <a href="http://en.wikipedia.org/wiki/Star_schema">a star-schema</a> to model BI problems (be they small or large) is something I&#8217;ll return to in a later post. (UPDATE: <a href="http://blog.gobansaor.com/2010/07/09/star-schemas-to-boldly-go-where-no-excel-spreadsheet-has-gone-before/">http://blog.gobansaor.com/2010/07/09/star-schemas-to-boldly-go-where-no-excel-spreadsheet-has-gone-before/</a>)</p>
<p>Star-schema models (particularly when the speed-of-access worries are removed by an in-memory column-store) are superior in many respects to the fully denormalised flattened tables that we currently build to support pivot tables, and are also more flexible than the multi-dimensional cell approach of pure MOLAP cubes. Combining such data models with the user-friendliness of spreadsheets, alongside the added magic of a modelling language such as DAX (and some MDX where necessary), on a datasmith&#8217;s laptop is the true beauty of PowerPivot.</p>
<p>So, lack of drill-through aside, &#8220;Well done Microsoft!&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/06/29/powerpivot-show-detail-not-allowed/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>SQLite XML Streaming Virtual Table via Expat</title>
		<link>http://blog.gobansaor.com/2010/06/16/sqlite-xml-streaming-virtual-table-via-expat/</link>
		<comments>http://blog.gobansaor.com/2010/06/16/sqlite-xml-streaming-virtual-table-via-expat/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 17:45:26 +0000</pubDate>
		<dc:creator>gobansaor</dc:creator>
				<category><![CDATA[ETL]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[Expat]]></category>
		<category><![CDATA[SQLite Virtual Tables]]></category>
		<category><![CDATA[SQLite Extension]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=989</guid>
		<description><![CDATA[A few weeks ago I wrote a post on handling large XML datasets using MS&#8217;s SAX2 parser from within Excel. Although fast, the SAX2 parser is not as fast as the original of the species, the Expat Streaming XML parser; being &#8230; <a href="http://blog.gobansaor.com/2010/06/16/sqlite-xml-streaming-virtual-table-via-expat/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.gobansaor.com&amp;blog=110633&amp;post=989&amp;subd=gobansaor&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago I wrote <a href="http://blog.gobansaor.com/2010/05/06/sax-and-bugs-and-xbrool/">a post on handling large XML datasets using MS&#8217;s SAX2 parser</a> from within Excel. Although fast, the SAX2 parser is not as fast as the original of the species, the <a href="http://expat.sourceforge.net/">Expat Streaming XML parser</a>; as Expat is written in C with a minimalist design, its speed is to be expected, and that speed can be a real time saver.</p>
<p>Being at <a href="http://blog.gobansaor.com/2008/12/18/sql-does-exactly-what-it-says-on-the-tin/">heart a SQL Jockey</a>, I also like to transfer XML data structures to relational ones at the earliest opportunity, especially when I&#8217;m in discovery mode. Small, pretty-indented XML documents can be analysed by eye (wasn&#8217;t this the great selling-point, XML&#8217;s user-friendliness <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />), but when the dataset is large, ugly and deeply nested I prefer to load the data into a classic parent-child table such as this&#8230;<br />
<code><br />
<strong> static const char* DDL = "create table FOO("<br />
"fileName HIDDEN text,"<br />
"xmlString HIDDEN text,"<br />
"filter HIDDEN text,"<br />
"deParent text,"<br />
"deChild text,"<br />
"deLevel int,"<br />
"isAttribute text,"<br />
"ChildType text,"<br />
"LineNo HIDDEN int,"<br />
"Offset HIDDEN int,"<br />
"ItemNo HIDDEN int,"<br />
"parentRowid int"<br />
")";</strong></code></p>
<p>As I found myself doing this repeatedly, I decided to code a <a href="http://www.sqlite.org/cvstrac/wiki?p=VirtualTables">SQLite Virtual Table</a> which, by wrapping the Expat parser, would stream out the parsed data to populate such tables. The streaming data could either be consumed as it arrived or saved to a real (i.e. non-virtual) table for further analysis. In effect, this creates a &#8216;relational DOM&#8217;; but unlike a DOM, which is memory-bound, it can exist as an in-memory table or as a disk-based one, making the handling and analysis of large datasets easier to achieve. The above snippet of code is the definition of my VT_xmlPC virtual table. HIDDEN columns are a means of adding utility columns that don&#8217;t necessarily belong to the logical dataset; in this case fileName and xmlString are used to identify the XML to load.</p>
<p>If you&#8217;re interested in writing your own SQLite virtual tables, this Dr. Dobb&#8217;s article <a href="http://www.drdobbs.com/database/202802959">Query Anything with SQLite</a> offers a good introduction to the subject. Wrapping Expat adds some extra complexity, as you have to deal with two sets of callbacks (SQLite&#8217;s row-request events and Expat&#8217;s events) and also with nested Expat parsers; so if you&#8217;re new to virtual tables, start with the Dr. Dobb&#8217;s example before diving into the xmlPC.c code.</p>
<p>The resulting virtual table is capable of handling most XML &#8216;data documents&#8217; as seen in the wild; as coded, it regards attributes as the equivalent of nested elements (with isAttribute = &#8216;Y&#8217;), and it doesn&#8217;t handle CDATA (but could easily be modified to do so).</p>
<p>To use it:</p>
<p>(a) load the xmlPC.dll <a href="http://www.sqlite.org/cvstrac/wiki?p=LoadableExtensions">extension</a></p>
<p>(b) create virtual table myTableName using VT_xmlPC</p>
<p>(c) select * from myTableName where fileName match &#8220;C:\some xml file&#8221; or select * from myTableName where xmlString match &#8220;&#8230;&#8221;</p>
<p>The &#8220;match&#8221; predicates filter prior to row creation; &#8220;=&#8221;s etc. filter after the row is constructed. The &#8220;fileName&#8221;, &#8220;xmlString&#8221; and &#8220;filter&#8221; columns are &#8220;match&#8221;-only columns, as they initiate the creation of the table (using &#8220;match&#8221; rather than &#8220;=&#8221; for such utility columns is my own convention, not a requirement of SQLite virtual tables). The &#8220;filter&#8221; column restricts loading to elements of that name and the child nodes of those elements.</p>
<p>Find here:</p>
<ul>
<li><a href="http://www2.gobansaor.com/share/xmlPC.c">the xmlPC.c code</a>,</li>
<li><a href="http://www2.gobansaor.com/share/xmlPC.dll">the VT_xmlPC virtual table implemented as a DLL SQLite extension</a>.</li>
</ul>
<p>Have fun.</p>
<p><em>Update:</em></p>
<p><em>The source code link on the Dr. Dobb&#8217;s article appears to be broken; the code can be found </em><a href="ftp://66.77.27.238/sourcecode/ddj/2007/"><em>here</em></a><em> (Note: this is an FTP link) in the 0711.zip file.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/06/16/sqlite-xml-streaming-virtual-table-via-expat/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>JavaScript as an Excel scripting language via JSDB</title>
		<link>http://blog.gobansaor.com/2010/06/04/javascript-as-an-excel-scripting-language-via-jsdb/</link>
		<comments>http://blog.gobansaor.com/2010/06/04/javascript-as-an-excel-scripting-language-via-jsdb/#comments</comments>
		<pubDate>Fri, 04 Jun 2010 17:44:59 +0000</pubDate>
		<dc:creator>gobansaor</dc:creator>
				<category><![CDATA[BI]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[VBA]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[xLite]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=953</guid>
		<description><![CDATA[A few years back I posted about JavaScript as an Excel scripting language via ExcelDNA. That involved using JavaScript (in the guise of JScript.NET) as an ExcelDNA scripting language. It was purely an academic exercise to prove it could be done, I continued &#8230; <a href="http://blog.gobansaor.com/2010/06/04/javascript-as-an-excel-scripting-language-via-jsdb/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.gobansaor.com&amp;blog=110633&amp;post=953&amp;subd=gobansaor&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>A few years back I posted about <a href="http://blog.gobansaor.com/2007/10/04/javascript-as-an-excel-scripting-language-via-exceldna/">JavaScript as an Excel scripting language via ExcelDNA</a>. That involved using JavaScript (in the guise of JScript.NET) as an <a href="http://exceldna.typepad.com/">ExcelDNA</a> scripting language. It was purely an academic exercise to prove it could be done; I continued to use C# (or, increasingly, VB.NET) to build .NET user-defined functions. This time, however, I&#8217;ve managed to embed JavaScript (in the guise of <a href="http://www.mozilla.org/js/spidermonkey/">Mozilla Foundation&#8217;s SpiderMonkey</a>) directly via a native C interface, not to prove I could do it (even though there&#8217;s a definite satisfaction in simply doing it) but to use it.</p>
<p>Why add another scripting language to xLite, hasn&#8217;t it already got Python?</p>
<p>True, Python is and remains a very powerful add-on to <a href="http://www.gobansaor.com/xlite">xLite</a>. It&#8217;s a mature and long-established language, popular amongst IT professionals and &#8220;citizen programmers&#8221; alike. But it&#8217;s a bit of a monster and can be awkward to package, particularly on Windows. By using <a href="http://www.py2exe.org/index.cgi/Tutorial">Py2Exe</a>, and after a lot of digging on <a href="http://groups.google.com/group/wxPython-users/msg/dfad0122afda5d21?pli=1">the issue of manifest files</a>, I have managed to package and isolate xLite&#8217;s Pythonic bits so that they can be used on a PC without first installing the required Python version (<del datetime="2010-06-11T10:46:38+00:00">I&#8217;ve only tested against V2.6, Python&#8217;s lack of a side-by-side Windows installation capability is a major pain-in-the-butt</del> bad news: tested against Python 2.5 &amp; it doesn&#8217;t work; good news: side-by-side is possible; simply change the system path to reflect whichever version you wish to run at the command line; xLite will (must) continue to use V2.6). <del datetime="2010-06-11T10:47:57+00:00">This &#8220;version-hell&#8221; militates against using Python as a core element of xLite; fine for those of us who are comfortable with and require the full power of Python, but not as the tool&#8217;s primary scripting environment.</del></p>
<p>No, what I need is:</p>
<ul>
<li>a light (preferably a single EXE or DLL), approachable and popular language,</li>
<li>with native SQLite support,</li>
<li>runnable as a standalone executable (on both Windows &amp; Linux),</li>
<li>embeddable (is that a word?) in Excel via a VBA-friendly DLL.</li>
</ul>
<p>Add to that essential list some nice-to-haves such as:</p>
<ul>
<li>native COM-interface support (for the likes of ADO etc.),</li>
<li>native networking support, for HTTP, raw TCP sockets etc.,</li>
<li>native (and easy to use) XML and JSON parsers and emitters,</li>
<li>ability to spawn detached/attached command line processes and the ability to stream data to and from such processes; allowing me to easily orchestrate &amp; provide a &#8220;grid&#8221; of processes (scripted, command line executables, Excel instances) both local and remote (with remote being either traditional servers, http servers or <a href="http://hadoop.apache.org/common/docs/current/streaming.html">Hadoop Streaming</a> grids).</li>
</ul>
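<p>The last bullet, spawning a command-line process and streaming data to and from it, is the same contract <a href="http://hadoop.apache.org/common/docs/current/streaming.html">Hadoop Streaming</a> relies on: a mapper reads lines on stdin and writes tab-separated key/value lines on stdout. Here is a minimal sketch in Python (xLite&#8217;s existing scripting add-on) rather than JavaScript; the word-count mapper and the <code>run_mapper</code> helper are hypothetical, and the child is simply another Python interpreter standing in for any command-line executable.</p>

```python
import subprocess
import sys

# Hypothetical child script: a Hadoop Streaming-style mapper that reads
# lines from stdin and writes "word<TAB>1" for every word it sees.
MAPPER = r'''
import sys
for line in sys.stdin:
    for word in line.split():
        sys.stdout.write(word.lower() + "\t1\n")
'''


def run_mapper(lines):
    """Spawn the mapper as a separate process, pipe lines in, collect pairs out."""
    proc = subprocess.Popen(
        [sys.executable, "-c", MAPPER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # communicate() writes all input, closes stdin and reads stdout to EOF.
    out, _ = proc.communicate("\n".join(lines) + "\n")
    pairs = []
    for row in out.splitlines():
        key, value = row.split("\t")
        pairs.append((key, int(value)))
    return pairs


if __name__ == "__main__":
    print(run_mapper(["Excel SQLite", "sqlite"]))
```

<p>Note that <code>communicate()</code> buffers all input and output in memory; an orchestrator handling genuinely large streams would write and read incrementally (for example, feeding stdin from a separate thread) to avoid pipe deadlocks.</p>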
<p>While looking at the various implementations of JavaScript as a server/shell language, I came across <a href="http://www.jsdb.org/">JSDB &#8211; JavaScript for Databases</a>. It&#8217;s a C++ wrapper around Mozilla&#8217;s SpiderMonkey, with lots of useful data-related utility classes added. To make JSDB a perfect fit, it simply required:</p>
<ul>
<li>a few minor changes to the <a href="http://bit.ly/bgwKI4">SQLite class</a> (to allow the loading of Virtual Table extensions, and to add the ability to pass in the addresses of already-open SQLite memory structures);</li>
<li>a linker change to use the DLL version of SQLite;</li>
<li>plus a <a href="http://bit.ly/aNYRCk">VBA-friendly DLL wrapper</a> (&amp; <a href="http://bit.ly/dynHWD">VBA declares</a> to call the <a href="http://bit.ly/arH2fB">DLL</a>) to replace the JSDB shell when embedding in Excel.</li>
</ul>
<p>From my <a href="http://blog.gobansaor.com/2009/03/14/sqlite-as-the-mp3-of-data/">SQLite as the MP3 of Data</a> post: &#8220;<em>Just as &#8216;fractional horsepower&#8217; electrical motors revolutionised manufacturing and eventually all our lives (car starter-motors, fridge motors, washing machines etc.), &#8216;fractional horsepower&#8217; databases can do the same for data. Distributing data to where it is needed.</em>&#8221; I can now add a distributed &#8220;fractional horsepower&#8221; processing engine for that distributed data. This transforms <a href="http://www.gobansaor.com/xlite">xLite</a> from a micro-ETL platform into one capable of handling (or at least orchestrating) practically any ETL (Extract, Transform &amp; Load), DI (Data Integration) or &#8220;Time Asset&#8221; (<a href="http://blog.gobansaor.com/2010/05/12/time-assets/">see this post</a>) process.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/06/04/javascript-as-an-excel-scripting-language-via-jsdb/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>Time Assets</title>
		<link>http://blog.gobansaor.com/2010/05/12/time-assets/</link>
		<comments>http://blog.gobansaor.com/2010/05/12/time-assets/#comments</comments>
		<pubDate>Wed, 12 May 2010 09:48:56 +0000</pubDate>
		<dc:creator>gobansaor</dc:creator>
				<category><![CDATA[GoogleApps]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[VBA]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[ITC skills]]></category>
		<category><![CDATA[Spreadsheets]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=928</guid>
		<description><![CDATA[This Stephen Hawking article on &#8220;How to build a time machine&#8221; (all that&#8217;s needed is a wormhole, the Large Hadron Collider or a rocket that goes really, really fast) is well worth a read. The concept of time travel was, for most &#8230; <a href="http://blog.gobansaor.com/2010/05/12/time-assets/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://gobansaor.files.wordpress.com/2010/05/time-machine.jpg"><br />
<img class="size-medium wp-image-931 alignleft" title="time-machine (by Daniel Cardle)" src="http://gobansaor.files.wordpress.com/2010/05/time-machine.jpg?w=300&#038;h=224" alt="" width="300" height="224" /></a></p>
<p style="text-align:left;">This Stephen Hawking article on <a href="http://www.dailymail.co.uk/home/moslive/article-1269288/STEPHEN-HAWKING-How-build-time-machine.html">&#8220;How to build a time machine&#8221;</a> (all that&#8217;s needed is a wormhole, the Large Hadron Collider or a rocket that goes really, really fast) is well worth a read.</p>
<p style="text-align:left;">The concept of time travel was, for most of my life, simply science fiction, but it&#8217;s now looking more &amp; more like science fact. Most science-fiction plots involve travelling to the past; this, however, is not part of the emerging theory of time travel: moving forward in time seems the only option.</p>
<p style="text-align:left;">We may not be able to go back in time, but we humans have become adept at &#8220;capturing time&#8221; and packaging it for reuse later on. Our early ancestors spent valuable time crafting tools and honing skills that they figured would repay the time invested many times over; they were, in fact, investing in time assets.</p>
<p style="text-align:left;">Software is perhaps humankind&#8217;s greatest generator of time assets: similar in concept to the tools and machines we&#8217;ve always built, but almost entirely frictionless, and with the potential for immense returns once the initial upfront cost has been met. Yet many are leaving our formal education systems with no idea of the power of software to harness time, to save it, shape it and reuse it again &amp; again. They have not been taught to program.</p>
<p style="text-align:left;">I&#8217;m not suggesting here that every student be forced to study computer science; just that they be introduced to the practical, everyday uses of programming (with some formal theory as a foundation): Applied Computing, if you like. In fact, if hardcore geeks consider such a course to be rubbish and refuse to take it, then you know you&#8217;re hitting the right note.</p>
<p style="text-align:left;">At a minimum, everybody should be taught the basics and the possibilities of spreadsheets. Although using Excel for this purpose would be more &#8220;saleable&#8221; once students hit the streets and join the workforce, I think Google Docs Spreadsheets would be a better teaching tool, because:</p>
<ul style="text-align:left;">
<li>Firstly, it would be cheaper: no licences, minimal hardware requirements (anything with a browser), and <a href="http://www.google.com/google-d-s/tour1.html">the collaboration features of Google Docs</a> in general are ideally suited to education.</li>
<li>Secondly, such training should not be primarily vocational, it should be about learning the possibilities of end-user programming.</li>
<li>Thirdly, Excel&#8217;s macro language is VBA, a noble language with a long, distinguished history, but a language that its owners have abandoned. Google Docs&#8217; <a href="http://www.google.com/google-d-s/scripts/scripts.html">scripting language is JavaScript</a>; like VB, it is a language that has often been much maligned, but unlike VB, it&#8217;s a language with a future: it&#8217;s the magic behind the browser. So students would not only learn the fundamentals of spreadsheets but would, through the course&#8217;s scripting modules, learn a language that lies at the heart of their everyday computing experience.</li>
<li>Finally, Google Docs can be manipulated via a web-based API and embedded in web pages, so students would also learn the fundamentals of <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">REST</a> and basic HTML markup, the underlying architecture of the WWW.</li>
</ul>
<p style="text-align:left;">Studying such a course would not only teach a useful life skill (the manipulation of numbers and lists, and the automation of such tasks to create time assets) but would also provide an understanding of the building blocks of modern IT.</p>
<p style="text-align:left;">We need more, and better prepared (dare I say, trained) citizen programmers. There&#8217;ll never be enough professional programmers to go around, and even if there were, the cost would continue to be prohibitive in many situations (both the financial cost and the time cost of keeping professional programmers aligned with, or even aware of, the business needs of multitudes of organisations).</p>
<p style="text-align:left;">Just as the right to bear arms was regarded as a necessity in the frontier society of 18th-century America, the right (and the basic skills) to program is a necessity on our modern IT frontier. Not everybody will use (or indeed even be capable of using, or be allowed to use) that right, but for millions of others, the power to build time assets for themselves or their businesses will be one of their most prized skills.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/05/12/time-assets/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>

		<media:content url="http://gobansaor.files.wordpress.com/2010/05/time-machine.jpg?w=300" medium="image">
			<media:title type="html">time-machine (by Daniel Cardle)</media:title>
		</media:content>
	</item>
		<item>
		<title>SAX and Bugs and XBRuLe</title>
		<link>http://blog.gobansaor.com/2010/05/06/sax-and-bugs-and-xbrool/</link>
		<comments>http://blog.gobansaor.com/2010/05/06/sax-and-bugs-and-xbrool/#comments</comments>
		<pubDate>Thu, 06 May 2010 12:26:10 +0000</pubDate>
		<dc:creator>gobansaor</dc:creator>
				<category><![CDATA[BI]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[VBA]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[xLite]]></category>
		<category><![CDATA[XBRL]]></category>
		<category><![CDATA[SAX]]></category>
		<category><![CDATA[SAX2 MSXML]]></category>
		<category><![CDATA[business reporting]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=908</guid>
		<description><![CDATA[Okay, the XBRuLe is a bit laboured, should be SAX &#38; bugs &#38; XBRL, but any excuse to play some Ian Dury Bugs (the programming type, not the creepy-crawlies), Simple API for XML and Extensible Business Reporting Language;  these represented &#8230; <a href="http://blog.gobansaor.com/2010/05/06/sax-and-bugs-and-xbrool/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Okay, the XBRuLe is a bit laboured (it should be <a href="http://en.wikipedia.org/wiki/Simple_API_for_XML">SAX</a> &amp; bugs &amp; <a href="http://www.xbrl.org/Home/">XBRL</a>), but any excuse to play some Ian Dury <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p><span style="text-align:center; display: block;"><a href="http://blog.gobansaor.com/2010/05/06/sax-and-bugs-and-xbrool/"><img src="http://img.youtube.com/vi/gBLeVcP_JQg/2.jpg" alt="" /></a></span></p>
<p>Bugs (the programming type, not the creepy-crawlies), Simple API for XML and Extensible Business Reporting Language: these have been the trinity of my concerns for the last three weeks or so.</p>
<h3>First, the bugs:</h3>
<p>Several weeks back, I decided that the C portion of <a href="http://www.gobansaor.com/xLite">xLite</a> needed an overhaul. The codebase contained a lot of stuff that I no longer used, and also code that I&#8217;d written when I first restarted using C (after a lapse of 20 years or so; the phrase &#8220;I&#8217;d forgotten more than I&#8217;d ever known&#8221; sums up the experience best). Some of this code was <a href="http://en.wikipedia.org/wiki/Memory_leak">leaking memory</a> like a sieve.</p>
<p>Also, the original Pivotal Solutions code was not UTF-8 enabled; instead, it used the host PC&#8217;s default character-set codepage, and this needed to change (if you don&#8217;t know what I&#8217;m talking about, see <a href="http://www.joelonsoftware.com/articles/Unicode.html">Joel Spolsky&#8217;s lecture</a> to the developers of the world &#8211; well, actually, primarily to those of us in the Anglo-Saxon &#8220;ASCII-will-do-fine&#8221; world).</p>
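<p>For a concrete picture of the codepage problem, here is a small Python illustration (not xLite&#8217;s C code): the same UTF-8 bytes read back under Windows-1252, a typical Western-European default codepage, turn each accented letter into two junk characters.</p>

```python
# Text containing non-ASCII characters, stored as UTF-8 bytes.
original = "Gobán Saor café"
utf8_bytes = original.encode("utf-8")   # á and é each become two bytes

# Misreading those bytes with a Windows default codepage gives mojibake.
wrong = utf8_bytes.decode("cp1252")     # "GobÃ¡n Saor cafÃ©"

# Decoding with the encoding the bytes were actually written in round-trips.
right = utf8_bytes.decode("utf-8")      # "Gobán Saor café"
```
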
<p>The bugs I introduced as a result of this upgrade were not of the logical kind but of a much nastier type, peculiar to the low-level world of C, &#8220;bad free()&#8221; bugs!</p>
<p>Excel was no longer leaking memory (well, no more than it normally does), but it was crashing randomly (usually in a DLL called VB7, the 1st upgrade to classic VB in over a decade!), a sure sign I was freeing memory that was not mine to free. Two days later I&#8217;d tracked the bugs down, but only by a painful line-by-line code walk-through. If you&#8217;ve no idea what I&#8217;m talking about here, count your blessings and move on.</p>
<p>The other major change I added to xLite is the ability to code <a href="http://www.drdobbs.com/database/202802959">SQLite Virtual Tables</a> in VBA. The Python side of xLite has always had that facility, but I look on Python as a nice-to-have add-on, not as a core component. The growing need for &#8220;core&#8221; virtual tables meant coding them either in C or in VBA; see the previous paragraphs for why VBA won the day.</p>
<p>The immediate driver for adding both UTF-8 support and quick-to-build virtual tables was the need to better handle XML data within xLite.</p>
<h3>&#8230; then the SAX:</h3>
<p>For small XML/HTML datasets I, like the rest of the world, use DOM manipulation; but for larger sets I&#8217;ve tended to take the <em>brute force and ignorance</em> approach of hand-coded file I/O combined with <a href="http://blog.gobansaor.com/2008/07/01/regular-expressions-as-an-end-user-programming-tool/">regular expressions</a> to parse out the required data efficiently.</p>
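<p>A rough sketch of that brute-force style in Python, assuming one record per line and a hypothetical <code>invoice</code> element with <code>no</code> and <code>amount</code> attributes. The input is read a line at a time, so memory use stays flat however large the file gets, but the approach silently breaks if the layout ever changes.</p>

```python
import io
import re

# Assumed layout: one record per line, e.g. <invoice no="INV-001" amount="120.50"/>
# The pattern pulls out the two attribute values without parsing the XML at all.
RECORD = re.compile(r'<invoice\s+no="([^"]+)"\s+amount="([^"]+)"')


def scrape(stream):
    """Yield (invoice_no, amount) pairs, reading the input one line at a time."""
    for line in stream:
        match = RECORD.search(line)
        if match:  # lines that don't fit the pattern are simply skipped
            yield match.group(1), float(match.group(2))


sample = io.StringIO(
    '<invoices>\n'
    '<invoice no="INV-001" amount="120.50"/>\n'
    '<invoice no="INV-002" amount="99.00"/>\n'
    '</invoices>\n'
)
rows = list(scrape(sample))   # [('INV-001', 120.5), ('INV-002', 99.0)]
```

<p>Against a real file you would pass <code>open(path, encoding="utf-8")</code> instead of the <code>StringIO</code> sample.</p>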
<p>Last week, an email from a datasmith named Cathy prompted me to look into using SAX for loading XML. Cathy, like most datasmiths, is not a professional programmer; she has a &#8220;real job&#8221;. Part of that job is analysing large datasets, and she&#8217;s learned enough programming (mainly Access &amp; Excel) to do it more efficiently. The data she needed to parse this time was encoded in XML, but because it was very large and built on a schema that constantly changed, the default <a href="http://en.wikipedia.org/wiki/Document_Object_Model">DOM</a> approach overwhelmed both Access and Excel.</p>
<p>Cathy had originally contacted me looking for information on using <a href="http://blog.gobansaor.com/2008/08/02/talend-sqlite-groovy-the-new-oracle/">Talend</a> to read the data and it looked like she was about to start a new side-career as a Java programmer. I figured there must be a way for her to leverage her existing skill-set (VBA) &amp; this led me to <a href="http://msdn.microsoft.com/en-us/library/ms762776(VS.85).aspx">MSXML&#8217;s implementation of SAX2</a>. She was delighted; although many of the concepts would have been new to her, at least they were bounded within a world she was already comfortable with (the basis of how we all manage to incrementally expand our knowledge).</p>
<p>The only problem was that the example code no longer existed (it would have been in VB6, but who uses that these days, other than a few million VBA para-programmers? <em>Let them eat the .NET cake</em>). So I coded up the first example in Excel/VBA, and <a href="http://bit.ly/bdLBJS">here it is if you need a quick-start to the joys of SAX2</a>.</p>
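<p>The VBA example above uses MSXML&#8217;s SAX2. As a rough analogue only, here is the same event-driven idea using Python&#8217;s standard <code>xml.sax</code> module (not MSXML), pushing each record straight into SQLite so the whole document never has to sit in memory. The <code>record</code>, <code>id</code> and <code>value</code> element names and the <code>records</code> table are hypothetical.</p>

```python
import sqlite3
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Collects child-element text inside each record element; inserts one row per record."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        self.current = None   # fields of the record being read, or None
        self.field = None     # name of the child element being read, or None
        self.text = []

    def startElement(self, name, attrs):
        if name == "record":
            self.current = {}
        elif self.current is not None:
            self.field, self.text = name, []

    def characters(self, content):
        # SAX may deliver one element's text in several chunks.
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "record":
            self.conn.execute(
                "INSERT INTO records (id, value) VALUES (?, ?)",
                (self.current.get("id"), self.current.get("value")),
            )
            self.current = None
        elif name == self.field:
            self.current[name] = "".join(self.text).strip()
            self.field = None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT, value TEXT)")
xml.sax.parseString(
    b"<data><record><id>1</id><value>alpha</value></record>"
    b"<record><id>2</id><value>beta</value></record></data>",
    RecordHandler(conn),
)
loaded = conn.execute("SELECT id, value FROM records ORDER BY id").fetchall()
print(loaded)
```

<p>For a file on disk, <code>xml.sax.parse(path, handler)</code> streams it in the same way; for large loads you would also want to commit in batches rather than leave everything in one open transaction.</p>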
<h3>Which leads me on to XBRuLe:</h3>
<h6><em>&#8220;One XML to rule</em><em> them all, One XML to find them, One XML to bring them all and in the darkness bind them&#8230;&#8221;</em></h6>
<p>A former colleague of mine, when explaining his computer science studies to the <a href="http://en.wikipedia.org/wiki/County_Meath">Meath</a> farmers he regularly met while hitch-hiking home from college (mid-1970s), usually got the response: <em>&#8220;Ah computers, dere de comin&#8217; t&#8217;ing&#8221;</em>. XBRL has been the <em>coming thing</em> for quite a while now.</p>
<p>As I&#8217;m in the business of Business Reporting, XBRL has always been on my radar, and of late the radar has been showing incoming fire. First the <a href="http://hitachidatainteractive.com/2010/01/11/xbrl-filings-for-the-sec-not-for-the-faint-of-heart-part-iv/">SEC</a>, and now the <a href="http://www.hmrc.gov.uk/ebu/ct_techpack/index.htm">UK&#8217;s HMRC</a>, are mandating it as a filing method. Whether this is a good thing or not is open to question. As <a rel="nofollow" href="http://bit.ly/951oEY" target="_blank">this article</a> puts it, &#8220;XBRL is a case study in complexity&#8221; and &#8220;the producer of the sample must have suffered a polymorphic recursive brain meltdown&#8221;.</p>
<p>But needs must; I&#8217;m in the business of shaping difficult data, so I&#8217;ve started to re-acquaint myself with the subject (the last time I looked at XBRL in any depth was 2004). Part of that process will be to beef up xLite&#8217;s XML capability, which, as I&#8217;m on the table side of the <a href="http://blog.gobansaor.com/2007/03/03/tables-vs-xml-the-data-lingua-franca-debate/">&#8220;Tables Vs. XML; the data lingua franca debate&#8221;</a>, will involve getting the data into a relational form at the earliest possible moment. For example, for discovery I would use a classic parent-child recursive structure, but as SQLite has nothing like <a href="http://www.adp-gmbh.ch/ora/sql/connect_by.html">Oracle&#8217;s START WITH &#8230; CONNECT BY</a>, I&#8217;ll be adding a virtual table to make navigating such hierarchies easier.</p>
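<p>Later SQLite releases (3.8.3, from early 2014) added recursive common table expressions, which cover much of what Oracle&#8217;s START WITH &#8230; CONNECT BY does without needing a custom virtual table. A minimal Python sketch against a hypothetical <code>nodes</code> parent-child table, loosely shaped like a small XBRL-style presentation hierarchy:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
INSERT INTO nodes VALUES
  (1, NULL, 'BalanceSheet'),
  (2, 1,    'Assets'),
  (3, 2,    'CurrentAssets'),
  (4, 2,    'FixedAssets'),
  (5, 1,    'Liabilities');
""")

# Roughly: START WITH parent_id IS NULL CONNECT BY PRIOR id = parent_id
tree = conn.execute("""
WITH RECURSIVE walk(id, name, depth, path) AS (
    SELECT id, name, 0, name FROM nodes WHERE parent_id IS NULL
    UNION ALL
    SELECT n.id, n.name, w.depth + 1, w.path || '/' || n.name
    FROM nodes AS n JOIN walk AS w ON n.parent_id = w.id
)
SELECT depth, name FROM walk ORDER BY path
""").fetchall()

for depth, name in tree:
    print("  " * depth + name)   # indent each node by its depth
```

<p>Ordering by the concatenated path is a shortcut that yields a depth-first listing for this data; where names can contain the separator, or sibling order matters, you would carry an explicit sort key instead.</p>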
<p>If anybody is peddling you the notion that this brave new world of XBRL-powered reporting will make your business reporting life easier, they&#8217;re either lying or don&#8217;t fully understand what they&#8217;re selling. As 19th-century industrialists were wont to say, &#8220;Where there&#8217;s muck, there&#8217;s brass&#8221;; and with XBRL, you&#8217;ll be up to your <a href="http://www.medterms.com/script/main/art.asp?articlekey=25484">oxters</a> in muck, but with the brass all flowing to others, perhaps even to me!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/05/06/sax-and-bugs-and-xbrool/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>

		<media:content url="http://img.youtube.com/vi/gBLeVcP_JQg/2.jpg" medium="image" />
	</item>
		<item>
		<title>Excel as a document-oriented NoSQL database</title>
		<link>http://blog.gobansaor.com/2010/03/02/excel-as-a-document-oriented-nosql-database/</link>
		<comments>http://blog.gobansaor.com/2010/03/02/excel-as-a-document-oriented-nosql-database/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 17:48:44 +0000</pubDate>
		<dc:creator>gobansaor</dc:creator>
				<category><![CDATA[ETL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[VBA]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[xLite]]></category>
		<category><![CDATA[CouchDb]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[document oriented]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=877</guid>
		<description><![CDATA[I&#8217;ve been a long time fan of CouchDB, one of the many NoSQL databases to appear in the last few years. CouchDB is a document-oriented database, which with solid B-tree indexing and easy replication, topped off by a MapReduce style view &#8230; <a href="http://blog.gobansaor.com/2010/03/02/excel-as-a-document-oriented-nosql-database/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been a <a href="http://blog.gobansaor.com/2007/09/14/couchdb-doucument-centric-ods/">long time fan of CouchDB</a>, one of the many <a href="http://en.wikipedia.org/wiki/NoSQL">NoSQL databases</a> to appear in the last few years. <a href="http://couchdb.apache.org/">CouchDB</a> is a document-oriented database, which with solid <a href="http://en.wikipedia.org/wiki/B-tree">B-tree indexing</a> and easy replication, topped off by a <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a> style view mechanism, puts it up there as a best-of-breed noSQL datastore.</p>
<p>Now, it may seem strange that somebody whose <em><strong><a href="http://blog.gobansaor.com/2008/12/18/sql-does-exactly-what-it-says-on-the-tin/">SQL &#8211; does exactly what it says on the tin</a></strong></em> post clearly marks him out as an RDBMS fanboy can also sing the praises of a NoSQL database. Are they not mutually exclusive? To many, particularly in the NoSQL world, this appears to be the case, with some clearly determined to reinvent the wheel, ignoring the lessons learned by relational database practitioners.</p>
<p>The main advantage to me of document-oriented databases such as CouchDB is the ease of setup, and the subsequent pain-free evolution of data models, that comes with a <a href="http://blog.mongodb.org/post/119945109/why-schemaless">schema-less</a> database. The main disadvantage is the relative rigidity of the downstream analysis built into most such databases. MapReduce, as used by CouchDB, is fine for predefined views developed by programmers, but as we know, reporting never stops; datastores front-ended by a SQL interpreter open up the data within to a much wider audience (be that through hand-crafted SQL queries or, more likely, via SQL generated by reporting tools).</p>
<p>Of course, document-oriented, NoSQL, schema-less datastores have been all the rage with end-users for close on 30 years. They&#8217;re called spreadsheets. Over the years Excel has added features (such as list handling &amp; filtering) that have made the spreadsheet the database of choice for millions. Anybody who deals in corporate data is aware (sometimes painfully aware) of just how much data is stored in these Data Populi repositories.</p>
<p>As an IT professional, I&#8217;m aware that Excel workbooks used as books of record have been, and continue to be, the cause of many data quality problems. Yet I&#8217;ve also seen, and am myself responsible for, many successful Excel &#8216;database implementations&#8217;. Take, for example, my filing system.</p>
<p>I don&#8217;t have a filing cabinet, instead I use small stackable cardboard boxes to store documents. As I receive or generate documents I simply place them in the current open box. Every so often, usually prompted by a VAT or other tax return deadline looming, I record what&#8217;s in the box, and if the box is looking full or maybe it&#8217;s end-of-year, I&#8217;ll &#8216;close&#8217; the box and open a new one.</p>
<p>Each box is represented by a separate workbook, each document by a separate worksheet. Some documents, such as electronic Sales Invoices, may not require a physical copy, simply a link to a PDF, but I still tend to store a printed copy. Others, such as Purchase Invoices, have their details manually copied from the original paper-based document, and I usually also add a hyperlink to an image of the source document. (I no longer use my scanner; instead I use my phone camera to record paper documents.)</p>
<p>Bank reconciliation involves recording the bank item ref against the appropriate document and linking back to the Bank Statement worksheet (which, as I still receive paper statements, consists simply of a link to a photo of the statement and basic info such as the date of the statement and whether or not I&#8217;ve reconciled it).</p>
<p><a href="http://www.revenue.ie/en/tax/vat/index.html">VAT Return</a> documents are generated using links back to source documents and a link to an image of the completed paper return (not yet signed up for <a href="http://www.revenue.ie/en/online/ros/index.html">ROS</a>). Similar documents are generated for year-end tax returns &amp; accounts.</p>
<p>So my &#8216;filing system&#8217; is also my &#8216;accounts system&#8217;. This is common practice amongst small (and not so small) businesses. The advantage of this approach rather than using a &#8220;proper accounts system&#8221;  is the simplicity and the in-depth knowledge it forces me to have of &#8216;my data&#8217;.</p>
<p>But can this type of thing scale, and what of the businesses that are using similar systems to manage thousands or indeed 10s of thousands of documents or transactions? The simple answer is no,  at least not without a semi-automated process and a cost-effective means of analysing the data; many such systems are on the road to disaster. That disaster may take the form of data quality issues or the significant (and often hidden) cost of operating such systems (often the operators are highly paid accounting staff or managers whose cost is buried in general overhead costs, unlike internal or external IT resources whose time tends to be project allocated).</p>
<p>But again, I and others, have managed to setup systems such as these that were  cost-effective (not just in initial construction but in ongoing running costs) and managed to maintain data quality. This usually involved building a simple work-flow process, automating to some degree but keeping the human touch as much as possible. My <a href="http://www.gobansaor.com/xlite">xLite datasmithing platform</a> had its beginnings in such RSS (Really Simple Systems) scenarios. Many such &#8220;systems&#8221; were IT driven <a href="http://en.wikipedia.org/wiki/Extract,_transform,_load">ETL</a> processes or data cleansing initiatives, others, business initiatives such as sales planning/budgeting or customer surveys.</p>
<p>I haven&#8217;t used <a href="http://www.gobansaor.com/xlite">xLite</a> to automate my filing system (my transaction volumes are too low, and my motto when it comes to systems is &#8220;good enough&#8221; will do), relying instead on standard spreadsheet formulas and a few bits of VBA. But if I suddenly found myself at the business end of a fire-hose of documents, I could easily do so.</p>
<p>Much like CouchDB, I could create &#8216;map&#8217; views of my documents, but instead of writing MapReduce JavaScript code, I&#8217;d load the documents into SQLite tables (using a <a href="http://en.wikipedia.org/wiki/Duck_typing">duck typing approach</a>: if a document had the required data, e.g. Invoice No. etc. for Sales Invoices, load it; otherwise ignore it). The &#8216;reduce&#8217; part would then be standard SQL statements using SUM() and GROUP BY.</p>
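<p>To make the map/reduce idea above concrete, here is a minimal sketch using Python&#8217;s standard <code>sqlite3</code> module rather than xLite itself. The document list, field names (<code>invoice_no</code>, <code>customer</code>, <code>net</code>) and the <code>sales_invoice</code> table are hypothetical; the point is only the duck-typed load followed by a plain SQL aggregation.</p>

```python
import sqlite3

# Hypothetical documents, e.g. metadata extracted from scanned files.
# Only the sales invoices carry an invoice number, customer and net amount.
documents = [
    {"doc_type": "sales_invoice", "invoice_no": "S001", "customer": "Acme", "net": 100.0},
    {"doc_type": "sales_invoice", "invoice_no": "S002", "customer": "Acme", "net": 250.0},
    {"doc_type": "sales_invoice", "invoice_no": "S003", "customer": "Bloggs", "net": 80.0},
    {"doc_type": "receipt", "supplier": "Phone Co", "total": 45.0},
]

# Fields a document must have to be treated as a sales invoice (hypothetical).
REQUIRED = ("invoice_no", "customer", "net")


def map_documents(conn, docs):
    """'Map' step: load any document that has the required fields, ignore the rest."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_invoice "
        "(invoice_no TEXT PRIMARY KEY, customer TEXT, net REAL)"
    )
    loaded = 0
    for doc in docs:
        # Duck typing: if it has what an invoice has, treat it as an invoice.
        if all(key in doc for key in REQUIRED):
            conn.execute(
                "INSERT INTO sales_invoice VALUES (?, ?, ?)",
                tuple(doc[key] for key in REQUIRED),
            )
            loaded += 1
    return loaded


def reduce_sales(conn):
    """'Reduce' step: ordinary SQL aggregation over the loaded rows."""
    return conn.execute(
        "SELECT customer, SUM(net) FROM sales_invoice "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
print(map_documents(conn, documents))
print(reduce_sales(conn))
```

<p>The receipt is silently skipped because it lacks the invoice fields, which is the &#8220;load if it fits, otherwise ignore&#8221; behaviour described above.</p>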
<p>I could also mine the documents for text and then use SQLite&#8217;s <a href="http://sqlite.org/fts3.html">FTS full-text searching</a> to create a free-format search index, or use <a href="http://blog.gobansaor.com/2009/09/29/tag-cubes-sqlite-star-query-part-iii/">xLite&#8217;s TAG Cube functionality</a> for a more formal, hierarchy-supporting tagging index.</p>
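<p>A free-format search index of this kind might look something like the following sketch, again using Python&#8217;s standard <code>sqlite3</code> module. It assumes an SQLite build with FTS4 enabled (FTS4 is documented on the same page as FTS3, and most Python distributions include it); the <code>doc_text</code> table, file paths and mined text are hypothetical.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# FTS4 virtual table; requires an SQLite build with full-text search enabled.
conn.execute("CREATE VIRTUAL TABLE doc_text USING fts4(path, body)")

# Hypothetical text mined from the filing system.
conn.executemany(
    "INSERT INTO doc_text (path, body) VALUES (?, ?)",
    [
        ("2010/sales/S001.pdf", "Invoice S001 to Acme for consultancy services"),
        ("2010/purchases/P017.pdf", "Receipt for printer paper and toner"),
        ("2010/vat/VAT3-Q1.pdf", "VAT return Jan-Feb, sales and purchases summary"),
    ],
)


def search(query):
    """Return the paths of documents whose body text matches an FTS query."""
    # 'body MATCH ?' restricts the search to the body column only.
    rows = conn.execute(
        "SELECT path FROM doc_text WHERE body MATCH ? ORDER BY path", (query,)
    )
    return [path for (path,) in rows]


print(search("invoice"))               # single term (case-insensitive)
print(search('"printer paper"'))       # phrase query
print(search("toner OR consultancy"))  # either term
```

<p>Matching only the <code>body</code> column keeps words in the file paths (such as &#8220;sales&#8221;) out of the results; matching against the table name instead would search every column.</p>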
<p>If I needed to share the system with others in my organisation, I could use a lightweight distributed version control system such as the <a href="http://www.fossil-scm.org/index.html/doc/tip/www/index.wiki">SQLite-based Fossil</a>. This would provide many of the replication benefits that CouchDB offers.</p>
<p>In fact, if I wanted to back the system with a server-based database, I could call in the services of CouchDB itself. That would be easily done, as xLite has inbuilt Python support and the library it uses to interact with SQLite on the Python side is <a href="http://code.google.com/p/apsw/">APSW</a>. And guess what? APSW now includes a <a href="http://apsw.googlecode.com/svn/publish/vtable.html#virtualtables">virtual table implementation</a> that lets you <a href="http://apsw.googlecode.com/svn/publish/couchdb.html">access CouchDB databases from SQLite</a>. Excel as a front-end to CouchDB!</p>
<p>If the Ronseal catch-phrase &#8216;<a href="http://ronanfitzgerald.net/everythingelse/?p=8">it does exactly what it says on the tin</a>&#8217; epitomises SQL, then perhaps &#8216;<a href="http://www.comparethemeerkat.com/my-tv-ads">Simples</a>&#8217;, as <a href="http://www.guardian.co.uk/media/2010/jan/16/aleksander-orlov-price-comparison-ads">Aleksandr the Meerkat</a> might say, epitomises the potential of document-based databases.</p>
<p style="text-align:right;"><em>Why not join me on Twitter at </em><a href="http://www.twitter.com/gobansaor"><em>gobansaor</em></a><em>?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/03/02/excel-as-a-document-oriented-nosql-database/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
	</channel>
</rss>