<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Gobán Saor &#187; ETL</title>
	<atom:link href="http://blog.gobansaor.com/category/etl/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.gobansaor.com</link>
	<description>A country datasmith.</description>
	<lastBuildDate>Tue, 02 Mar 2010 19:32:41 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='blog.gobansaor.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/67e164f5d51c2b3115a7819b84505c13?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>Gobán Saor &#187; ETL</title>
		<link>http://blog.gobansaor.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.gobansaor.com/osd.xml" title="Gobán Saor" />
	<atom:link rel='hub' href='http://blog.gobansaor.com/?pushpress=hub'/>
		<item>
		<title>Excel as a document-oriented NoSQL database</title>
		<link>http://blog.gobansaor.com/2010/03/02/excel-as-a-document-oriented-nosql-database/</link>
		<comments>http://blog.gobansaor.com/2010/03/02/excel-as-a-document-oriented-nosql-database/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 17:48:44 +0000</pubDate>
		<dc:creator>Tom Gleeson</dc:creator>
				<category><![CDATA[ETL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[VBA]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[xLite]]></category>
		<category><![CDATA[CouchDb]]></category>
		<category><![CDATA[document oriented]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=877</guid>
		<description><![CDATA[I&#8217;ve been a long-time fan of CouchDB, one of the many NoSQL databases to appear in the last few years. CouchDB is a document-oriented database whose solid B-tree indexing and easy replication, topped off by a MapReduce-style view mechanism, put it up there as a best-of-breed NoSQL datastore.
Now it may seem strange that [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been a <a href="http://blog.gobansaor.com/2007/09/14/couchdb-doucument-centric-ods/">long-time fan of CouchDB</a>, one of the many <a href="http://en.wikipedia.org/wiki/NoSQL">NoSQL databases</a> to appear in the last few years. <a href="http://couchdb.apache.org/">CouchDB</a> is a document-oriented database whose solid <a href="http://en.wikipedia.org/wiki/B-tree">B-tree indexing</a> and easy replication, topped off by a <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>-style view mechanism, put it up there as a best-of-breed NoSQL datastore.</p>
<p>Now it may seem strange that somebody whose <em><strong><a href="http://blog.gobansaor.com/2008/12/18/sql-does-exactly-what-it-says-on-the-tin/">SQL &#8211; does exactly what it says on the tin</a></strong></em> post clearly marks him out as an RDBMS fanboy can also sing the praises of a NoSQL database. Are they not mutually exclusive? To many, particularly in the NoSQL world, this appears to be the case, with some clearly determined to reinvent the wheel, ignoring the lessons learned by relational database practitioners.</p>
<p>The main advantage to me of document-oriented databases, such as CouchDB, is the ease of setup and the subsequent pain-free evolution of data models that comes with a <a href="http://blog.mongodb.org/post/119945109/why-schemaless">schema-less</a> database. The main disadvantage is the relative rigidity of the downstream analysis built into most such databases. MapReduce, as used by CouchDB, is fine for predefined views developed by programmers, but as we know, reporting never stops; datastores front-ended by a SQL interpreter open up the data within to a much wider audience (be that through hand-crafted SQL queries or, more likely, via reporting-tool-generated SQL).</p>
<p>Of course, document-oriented, NoSQL, schema-less datastores have been all the rage with end-users for close on 30 years. They&#8217;re called spreadsheets. Over the years Excel has added features (such as list handling &amp; filtering) that have made the spreadsheet the database of choice for millions. Anybody who deals in corporate data is aware (sometimes painfully aware) of just how much data is stored in these Data Populi repositories.</p>
<p>As an IT professional, I am aware that Excel workbooks used as books of record have been, and continue to be, the cause of many data quality problems. Yet I&#8217;ve also seen, and am myself responsible for, many successful Excel &#8216;database implementations&#8217;. Take, for example, my filing system.</p>
<p>I don&#8217;t have a filing cabinet; instead, I use small stackable cardboard boxes to store documents. As I receive or generate documents, I simply place them in the currently open box. Every so often, usually prompted by a looming VAT or other tax return deadline, I record what&#8217;s in the box, and if the box is looking full, or maybe it&#8217;s end-of-year, I&#8217;ll &#8216;close&#8217; the box and open a new one.</p>
<p>Each box is represented by a separate workbook, each document by a separate worksheet. Some documents, such as electronic Sales Invoices, may not require a physical copy, simply a link to a PDF, but I still tend to store a printed copy. Others, such as Purchase Invoices, have their details manually copied from the original paper-based document; I usually also add a hyperlink to an image of the source document. (I no longer use my scanner; instead, I use my phone camera to record paper documents.)</p>
<p>Bank reconciliation involves recording the bank item reference against the appropriate document and linking back to the Bank Statement worksheet (which, as I still receive paper statements, consists simply of a link to a photo of the statement and basic info such as the statement date and whether or not I&#8217;ve reconciled it).</p>
<p><a href="http://www.revenue.ie/en/tax/vat/index.html">VAT Return</a> documents are generated using links back to source documents and a link to an image of the completed paper return (not yet signed up for <a href="http://www.revenue.ie/en/online/ros/index.html">ROS</a>). Similar documents are generated for year-end tax returns &amp; accounts.</p>
<p>So my &#8216;filing system&#8217; is also my &#8216;accounts system&#8217;. This is common practice amongst small (and not so small) businesses. The advantage of this approach over a &#8220;proper accounts system&#8221; is its simplicity and the in-depth knowledge it forces me to have of &#8216;my data&#8217;.</p>
<p>But can this type of thing scale, and what of the businesses that are using similar systems to manage thousands or indeed tens of thousands of documents or transactions? The simple answer is no, at least not without a semi-automated process and a cost-effective means of analysing the data; many such systems are on the road to disaster. That disaster may take the form of data quality issues or the significant (and often hidden) cost of operating such systems. Often the operators are highly paid accounting staff or managers whose cost is buried in general overheads, unlike internal or external IT resources, whose time tends to be allocated to projects.</p>
<p>But again, I and others have managed to set up systems such as these that were cost-effective (not just in initial construction but in ongoing running costs) and that maintained data quality. This usually involved building a simple workflow process, automating to some degree but keeping the human touch as much as possible. My <a href="http://www.gobansaor.com/xlite">xLite datasmithing platform</a> had its beginnings in such RSS (Really Simple Systems) scenarios. Many such &#8220;systems&#8221; were IT-driven <a href="http://en.wikipedia.org/wiki/Extract,_transform,_load">ETL</a> processes or data cleansing initiatives; others were business initiatives such as sales planning/budgeting or customer surveys.</p>
<p>I haven&#8217;t used <a href="http://www.gobansaor.com/xlite">xLite</a> to automate my filing system (my transactional volumes are too low, and my motto when it comes to systems is that &#8220;good enough&#8221; will do), relying instead on standard spreadsheet formulas and a few bits of VBA. But if I suddenly found myself at the business end of a fire-hose of documents, I could easily do so.</p>
<p>Much like CouchDB, I could create &#8216;map&#8217; views of my documents, but instead of MapReduce JavaScript code, I&#8217;d load the documents into SQLite tables using a <a href="http://en.wikipedia.org/wiki/Duck_typing">duck typing approach</a>: if a document had the required data (e.g. Invoice No, etc. for Sales Invoices), load it; otherwise, ignore it. The &#8216;reduce&#8217; part would then be standard SUM() and GROUP BY SQL statements.</p>
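<p>A minimal sketch of that duck-typed &#8216;map&#8217; and SQL &#8216;reduce&#8217;, using Python&#8217;s standard sqlite3 module rather than xLite itself. Documents are modelled as plain dicts standing in for worksheets; the field names (invoice_no, customer, net, vat), the sales_invoice table and the sample values are all hypothetical.</p>
<pre>
```python
import sqlite3

# Hypothetical documents: each dict stands in for one worksheet "document".
DOCS = [
    {"invoice_no": "S001", "customer": "Acme", "net": 100.0, "vat": 21.0},
    {"invoice_no": "S002", "customer": "Acme", "net": 50.0, "vat": 10.5},
    {"invoice_no": "S003", "customer": "Brady", "net": 200.0, "vat": 42.0},
    {"supplier": "Eircom", "net": 30.0},      # a purchase invoice: no invoice_no, ignored
    {"customer": "Brady", "net": 5.0},        # incomplete: ignored
]

# Duck typing: any document carrying these fields is treated as a sales invoice.
REQUIRED = ("invoice_no", "customer", "net", "vat")


def map_sales_invoices(conn, docs):
    """'Map' step: load documents that have the required fields, skip the rest."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_invoice "
        "(invoice_no TEXT PRIMARY KEY, customer TEXT, net REAL, vat REAL)"
    )
    rows = [tuple(d[f] for f in REQUIRED) for d in docs if all(f in d for f in REQUIRED)]
    conn.executemany("INSERT INTO sales_invoice VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def reduce_by_customer(conn):
    """'Reduce' step: plain SUM() ... GROUP BY."""
    return conn.execute(
        "SELECT customer, SUM(net), SUM(vat) FROM sales_invoice "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
print(map_sales_invoices(conn, DOCS))  # 3
print(reduce_by_customer(conn))        # [('Acme', 150.0, 31.5), ('Brady', 200.0, 42.0)]
```
</pre>
<p>The &#8216;view&#8217; is just a table, so anything else that speaks SQL can query it once it is loaded.</p>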
<p>I could also mine the documents for text and then use SQLite&#8217;s <a href="http://www.sqlite.org/cvstrac/wiki?p=FullTextIndex">FTS full-text searching</a> to create a free-format search index, or use <a href="http://blog.gobansaor.com/2009/09/29/tag-cubes-sqlite-star-query-part-iii/">xLite&#8217;s TAG Cube functionality</a> for a more formal, hierarchy-supporting tagging index.</p>
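<p>A rough sketch of the free-format search index, again via Python&#8217;s sqlite3 module. It assumes the SQLite build has the FTS4 module compiled in (most builds do, but that is not guaranteed); the doc_text table and the sample text are hypothetical.</p>
<pre>
```python
import sqlite3


def build_index(conn, docs):
    """Create an FTS4 index over extracted document text, keyed by document id."""
    conn.execute("CREATE VIRTUAL TABLE doc_text USING fts4(body)")
    conn.executemany("INSERT INTO doc_text (docid, body) VALUES (?, ?)", docs.items())


def search(conn, query):
    """Return ids of documents matching an FTS query; space-separated terms are ANDed."""
    rows = conn.execute(
        "SELECT docid FROM doc_text WHERE doc_text MATCH ? ORDER BY docid", (query,)
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
build_index(conn, {
    1: "Purchase invoice from Eircom for broadband, March",
    2: "Sales invoice S001 to Acme, consulting days",
    3: "Bank statement, February, reconciled",
})
print(search(conn, "invoice"))       # [1, 2]
print(search(conn, "invoice acme"))  # [2]
```
</pre>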
<p>If I needed to share the system with others in my organisation, I could use a light, simple distributed version control system such as the <a href="http://www.fossil-scm.org/index.html/doc/tip/www/index.wiki">SQLite-based Fossil</a>. This would provide many of the replication benefits that CouchDB offers.</p>
<p>In fact, if I wanted to backend the system with a server based database I could call in the services of CouchDB itself. Easily done as xLite has inbuilt Python support and the library that xLite uses to interact with SQLite on the Python side is <a href="http://code.google.com/p/apsw/">APSW</a>. And guess what, APSW now includes a <a href="http://apsw.googlecode.com/svn/publish/vtable.html#virtualtables">virtual table implementation</a> that lets you <a href="http://apsw.googlecode.com/svn/publish/couchdb.html">access CouchDB databases from SQLite</a>. Excel as a front-end to CouchDB!</p>
<p>If the &#8216;<a href="http://ronanfitzgerald.net/everythingelse/?p=8">it does exactly what it says on the tin</a>&#8217; Ronseal catch-phrase epitomises SQL, then perhaps &#8216;<a href="http://www.comparethemeerkat.com/my-tv-ads">Simples</a>&#8217;, as <a href="http://www.guardian.co.uk/media/2010/jan/16/aleksander-orlov-price-comparison-ads">Aleksandr the Meerkat</a> might say, epitomises the potential of document-based databases.</p>
<p style="text-align:right;"><em>Why not join me on Twitter at </em><a href="http://www.twitter.com/gobansaor"><em>gobansaor</em></a><em>?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/03/02/excel-as-a-document-oriented-nosql-database/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>Excel 2010 Application.Caller Bug</title>
		<link>http://blog.gobansaor.com/2010/02/11/excel-2010-application-caller-bug/</link>
		<comments>http://blog.gobansaor.com/2010/02/11/excel-2010-application-caller-bug/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 20:45:04 +0000</pubDate>
		<dc:creator>Tom Gleeson</dc:creator>
				<category><![CDATA[ETL]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[VBA]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[xLite]]></category>
		<category><![CDATA[Application.Caller]]></category>
		<category><![CDATA[bug]]></category>
		<category><![CDATA[Excel 2010]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=851</guid>
		<description><![CDATA[I&#8217;ve just released another xLite &#8220;introduction&#8221;, this time the xLiteWorkbookFunction function. I&#8217;ve had most of the now released functionality working (and in use) for quite a while but had delayed publishing until I&#8217;d installed Excel 2010 as I&#8217;d wished to test against a modern Excel version.
I&#8217;d not bothered with Excel 2007, as I couldn&#8217;t see the [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve just released another <a href="http://www.gobansaor.com/xlite">xLite &#8220;introduction&#8221;</a>, this time the <a href="http://www.gobansaor.com/introducing-xlite/xliteworkbookfunction">xLiteWorkbookFunction</a> function. I&#8217;ve had most of the now released functionality working (and in use) for quite a while but had delayed publishing until I&#8217;d installed Excel 2010 as I&#8217;d wished to test against a modern Excel version.</p>
<p>I&#8217;d not bothered with Excel 2007, as I couldn&#8217;t see the advantage over Excel 2003, but Excel&#8217;s new <a href="http://www.powerpivot.com/">PowerPivot</a> is one hell of a reason to upgrade to 2010. I&#8217;d performed a quick test against 2007 by installing a trial version on an EC2 Windows image, and it had appeared to work fine; but it was a different story under 2010, where strange things started to happen.</p>
<p>The core functionality, as tested by VBA code, worked OK, but when I tested using xLite.SQL as a UDF (a user-defined &#8220;formula&#8221;) things fell apart. For an explanation of what the xLite.SQL function is, and why I wasn&#8217;t that surprised when it started to act up, <a href="http://www.gobansaor.com/introducing-xlite/introducingthesqlfunction">see here</a>. As xLite.SQL plays to the rules rather than the spirit of a UDF, I assumed it was payback time for my blasé disregard of functional programming constraints, and I set about tracking down the cause.</p>
<p>It turns out the cause is a change in behaviour (a bug) whereby, in certain circumstances, the cell range returned by Application.Caller is not, as one would expect, the cell hosting the called UDF but the cell more usually associated with Application.ActiveCell (i.e. most likely the cell where the cursor currently resides).</p>
<p>Why is this a problem, and what is Application.Caller usually used for? The most common use I&#8217;ve made of Application.Caller is to determine whether a VBA function has been called from a cell as a UDF or from a menu, button or some VBA code. This is important because when called in UDF mode, a function must be side-effect free, i.e. its only effect on the workbook is its return value; attempting anything else will silently fail (or, in extreme cases, abort Excel). This functionality is not affected by &#8220;the bug&#8221;, as the usual method of achieving this is &#8230;</p>
<p><code>If IsObject(Application.Caller) Then</code></p>
<p>&#8230; this will work even if Application.Caller returns .ActiveCell, as both are objects.</p>
<p>If, however, various properties of that range need to be interrogated (such as the actual address or the formula text that xLite.SQL requires), then Application.Caller returning the .ActiveCell range rather than the calling cell&#8217;s range causes problems. I&#8217;ve managed to get around these problems by adding an extra parameter (homeCell) to xLite.SQL, which the function auto-populates on first entry (when Application.Caller and .ActiveCell are guaranteed to be the same object). For example, a SQL call entered in cell A2 of Sheet1 as &#8230;</p>
<p><code>=SQL("Select name from sqlite_master")</code></p>
<p>is automatically rewritten as &#8230;</p>
<p><code>=SQL("Select name from sqlite_master",,,,,,"[test.xls]Sheet1!$A$2")</code></p>
<p>Not ideal, but it gets around the problem in the short term. Longer term, I may do a version for pre-2010 Excel that reverts to the original dependence on Application.Caller. As xLite studiously avoids Excel&#8217;s UI features such as menus and ribbons, I&#8217;d hoped to avoid separate versions for pre- and post-ribbon editions, but needs must.</p>
<p>The xLiteWorkbook example (test_call_workbook_function.xls calling test_workbook_function.xls) consistently generates the bug (there&#8217;s a SQL call against the log database at the end to test for this). Executing the same logic manually on the called workbook (test_workbook_function.xls) generally doesn&#8217;t, but it has done so occasionally!</p>
<p>I&#8217;ve messed around changing the .xls files to .xlsm in case of compatibility problems but it doesn&#8217;t appear to affect the outcome.</p>
<p>If anybody else has come across this problem or has an alternative to using Application.Caller to return a UDF&#8217;s calling cell&#8217;s range, do let me know.</p>
<p style="text-align:right;"><em>Why not join me on Twitter at </em><a href="http://www.twitter.com/gobansaor"><em>gobansaor</em></a><em>?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/02/11/excel-2010-application-caller-bug/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>xLite Beta Updated &#8211; adds Python as an Excel Scripting Language</title>
		<link>http://blog.gobansaor.com/2010/02/07/xlite-beta-updated-adds-python-as-an-excel-scripting-language/</link>
		<comments>http://blog.gobansaor.com/2010/02/07/xlite-beta-updated-adds-python-as-an-excel-scripting-language/#comments</comments>
		<pubDate>Sun, 07 Feb 2010 17:02:37 +0000</pubDate>
		<dc:creator>Tom Gleeson</dc:creator>
				<category><![CDATA[BI]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[xLite]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=846</guid>
		<description><![CDATA[I&#8217;ve updated the xLite Beta with bug fixes and added a new page introducing xLite&#8217;s Excel/VBA and Python extensions to SQLite.
See http://www.gobansaor.com/xlite
The u() function allows any VBA UDF (user defined functions) to be called from SQLite.
The x() function allows an inbuilt function or indeed most any formula (but not a UDF, use u() instead) to be [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve updated the xLite Beta with bug fixes and added a new page introducing xLite&#8217;s Excel/VBA and Python extensions to SQLite.</p>
<p>See <a href="http://www.gobansaor.com/xlite">http://www.gobansaor.com/xlite</a></p>
<p>The u() function allows any VBA UDF (<a href="http://www.ozgrid.com/VBA/Functions.htm">user-defined function</a>) to be called from SQLite.</p>
<p>The x() function allows an inbuilt function, or indeed almost any formula (but not a UDF; use u() for those), to be called from SQLite.</p>
<p>The f() function allows standard worksheet cascading formulas to be referenced by SQL: in effect, <em>worksheet user-defined functions</em>. Really useful in building &amp; testing workbook code/models.</p>
<p>Finally, xLitePyScript is a UDF that allows Python to be used as an Excel scripting language. It can be inserted into a SQL statement (wrapped by the u() function), called like a regular function from VBA, or used as a cell formula.</p>
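<p>How xLite wires u() and x() into SQLite from VBA isn&#8217;t described here. As a loose analogue only: SQLite&#8217;s general mechanism for this kind of extension is a user-defined function registered on the connection, which Python&#8217;s sqlite3 module exposes as Connection.create_function. The add_vat() function, the invoice table and the 21% rate below are hypothetical.</p>
<pre>
```python
import sqlite3


def add_vat(net, rate):
    """Hypothetical host-language function that SQL will call."""
    if net is None:  # SQL NULL arrives as None; propagate it
        return None
    return round(net * (1 + rate), 2)


conn = sqlite3.connect(":memory:")
# Expose add_vat to SQL under the name "add_vat", taking exactly 2 arguments.
conn.create_function("add_vat", 2, add_vat)

conn.execute("CREATE TABLE invoice (invoice_no TEXT, net REAL)")
conn.executemany("INSERT INTO invoice VALUES (?, ?)", [("S001", 100.0), ("S002", 50.0)])

for row in conn.execute("SELECT invoice_no, add_vat(net, 0.21) FROM invoice ORDER BY invoice_no"):
    print(row)
```
</pre>
<p>Once registered, the function can be used anywhere an expression is allowed (SELECT lists, WHERE clauses, views), which is what makes host-language extensions like these useful from SQL.</p>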
<p><a href="http://www.gobansaor.com/xlite">Have fun &#8230;</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2010/02/07/xlite-beta-updated-adds-python-as-an-excel-scripting-language/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>TAG Cubes &#8211; SQLite Star Query Part III</title>
		<link>http://blog.gobansaor.com/2009/09/29/tag-cubes-sqlite-star-query-part-iii/</link>
		<comments>http://blog.gobansaor.com/2009/09/29/tag-cubes-sqlite-star-query-part-iii/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 13:36:16 +0000</pubDate>
		<dc:creator>Tom Gleeson</dc:creator>
				<category><![CDATA[ETL]]></category>
		<category><![CDATA[Palo]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[VBA]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[olap]]></category>
		<category><![CDATA[xLite]]></category>
		<category><![CDATA[hypercube]]></category>
		<category><![CDATA[Mondrian]]></category>
		<category><![CDATA[TAG Cube]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=738</guid>
		<description><![CDATA[It&#8217;s no secret that I&#8217;m a huge fan of SQLite and Excel, particularly when used in combination. I also greatly admire the open source BI engines, Palo and Mondrian. Mondrian appeals because of its &#8220;ROLAP with a cache&#8221; architecture and its implementation of MS&#8217;s excellent MDX language. When I say MDX is excellent I&#8217;m talking with my [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s no secret that <a href="http://blog.gobansaor.com/2009/03/14/sqlite-as-the-mp3-of-data/">I&#8217;m a huge fan of SQLite</a> and Excel, particularly when used in combination. I also greatly admire the open source BI engines <a href="http://www.jedox.com">Palo</a> and <a href="http://sourceforge.net/projects/mondrian/">Mondrian</a>. Mondrian appeals because of its &#8220;ROLAP with a cache&#8221; architecture and its implementation of MS&#8217;s excellent <a href="http://en.wikipedia.org/wiki/Multidimensional_Expressions">MDX</a> language. When I say MDX is excellent, I&#8217;m talking with my professional programmer&#8217;s hat on; as an end-user tool it&#8217;s a non-runner. This is where Palo comes in. Building on the <a href="http://en.wikipedia.org/wiki/Hypercube">hypercube</a> concepts pioneered by the likes of <a href="http://en.wikipedia.org/wiki/Applix">TM1</a> and <a href="http://en.wikipedia.org/wiki/Essbase">Essbase</a>, it presents a design view that&#8217;s approachable by a vastly greater percentage of &#8220;civilians&#8221; than is the case with <a href="http://en.wikipedia.org/wiki/ROLAP">ROLAP-based</a> solutions.</p>
<p>The trick behind TM1, Essbase, Palo etc. is the extension of the spreadsheet metaphor from two to multiple dimensions, while still binding the interface closely to the familiar spreadsheet (which for most of the business world is still Excel).</p>
<p>So where does <a href="http://www.sqlite.org/">SQLite</a> come in all this?</p>
<p><a href="http://blog.gobansaor.com/2007/08/30/sqlite-star-query/">At first glance</a>, SQLite lacks the s<a href="http://www.sqlite.org/optoverview.html">ophisticated join functionality</a> to support star-queries, but of course, if the dataset is small then a full-table scan of a fact table, or better still, loading the fact table into memory negates any such short-comings.</p>
<p>In fact, all traditional ROLAP engines have problems with dimensional models, particularly when you reach the point of using summary tables or query rewrites; that&#8217;s why the emerging SQL-speaking <a href="http://www.information-management.com/blogs/columnar_databases_column_major_data_warehouse_startup_analytics-10015473-1.html">columnar databases</a> are such a godsend for ROLAP data warehouses.</p>
<p>It was SQLite combined with Excel as a data-prep platform that was originally my main interest, so for pivoting, Excel&#8217;s own pivot table would have to do. Nevertheless, I felt the tool was incomplete without the ability to directly pivot the underlying SQLite database.</p>
<p>Why not use Palo or Mondrian as a pivot tool? Well, yes, where a fixed, permanent &#8220;solution&#8221; is required, the extra moving parts of either approach would be justified and indeed necessary, but that is to miss the essence of what I call datasmithing.</p>
<p>Datasmithing is not data warehousing nor is it the provision of solutions (which, for example, Palo superbly enables in multi-user budgeting situations). Datasmithing, as a skill, is of course part of the process of both, but it&#8217;s on the edges, at perhaps the planning or consumption stages.</p>
<p>Datasmiths deal in the unknown, in change, in disaster recovery, in systems&#8217; commissioning, in the never-ending, barely repeatable processes thrown up by daily business life. For that, the toolset required must be as simple as possible (but no simpler), self-contained, document-oriented, secureable (is that a word?) and easily archived and retrieved. Excel and file-based DBMSs such as MS Access or SQLite fit the bill nicely; server-based technologies such as DBA-controlled database servers or IT-installed &#8220;solutions&#8221;, less so.</p>
<p>Jedox has made Palo relatively easy to install (and likewise, Canada&#8217;s SQLPower has made Mondrian setup a painless exercise via their excellent <a href="http://www.sqlpower.ca/page/wabit">Wabit reporting tool</a>), but the zero-install, email-friendly document approach that spreadsheets are famous (and infamous) for is preferable in many situations. This is something that Microsoft have recognised in their <a href="http://blog.gobansaor.com/2009/04/01/project-gemini-xxl-excel-on-steroids/">Gemini add-in for Excel 2010</a>, but Excel 2010 is not here yet, and it&#8217;s likely to be five years or more before it&#8217;s as common as Excel 2003 is today.</p>
<p>The inclusion of <a href="http://www.sqlite.org/cvstrac/wiki?p=FullTextIndex">FTS full-text searching</a> with SQLite triggered an aha moment with regard to pivot-enabling SQLite.</p>
<p>The usual method that hypercube-like, Excel-friendly OLAP tools use to return data is via a UDF like so&#8230;</p>
<p style="text-align:center;"><em>=DATA(&#8220;CubeName&#8221;,&#8221;value1&#8243;,value2&#8243;,&#8230;)</em></p>
<p>&#8230;where valueN represents dimensional elements, so&#8230;</p>
<p style="text-align:center;"><em>=DATA(&#8220;SalesCube&#8221;,&#8221;Beer&#8221;,&#8221;Profit&#8221;,&#8221;Jan 09&#8243;,&#8221;Actual&#8221;)</em></p>
<p>&#8230;is the Actual Profit for Beer sales in Jan 09. The dimensional elements act as &#8220;tags&#8221; to locate a particular value. There is, of course, much more to tools like Palo (hierarchies, intra-cube rules, etc.), but in essence most OLAP tools are like <a href="http://delicious.com/">www.delicious.com</a> for number crunchers. This method of retrieving data fits well with how people use Excel, not just for pivots but also for embedding OLAP-aggregated cells in <em>lists</em>. Take, for example, a CRM scenario: a Sales Rep makes a list of her &#8216;best&#8217; (subjective) customers but needs hard (objective) stats alongside the list, to convince the boss or to track actuals against expectations.</p>
<p>Dimensional elements as tags; FTS3 virtual tables as fact table indexes; the concept of a TAG Cube was born.</p>
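<p>A minimal sketch of the TAG Cube idea, written against Python&#8217;s standard <code>sqlite3</code> module rather than xLite (whose internals aren&#8217;t shown here): a single-column fact table, an FTS3 virtual table whose <code>docid</code> is the fact row&#8217;s rowid and whose text holds that row&#8217;s tags, and a hypothetical <code>data()</code> function standing in for the <em>=DATA()</em> UDF and returning the default SUM() measure. The table and function names are my own illustrations, the cube-name argument is dropped because there is only one cube, and the sketch assumes an SQLite build with FTS3 enabled (a compile-time option that most builds include).</p>

```python
import sqlite3

# Hypothetical single-cube layout: a one-column fact table plus an FTS3 "tag index"
# whose docid equals the fact row's rowid. Needs an SQLite build with FTS3 enabled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (value REAL)")
conn.execute("CREATE VIRTUAL TABLE tag_index USING fts3(tags)")

rows = [
    (120.0, ["Beer", "Profit", "Jan 09", "Actual"]),
    (80.0, ["Beer", "Profit", "Jan 09", "Actual"]),
    (95.0, ["Beer", "Profit", "Jan 09", "Budget"]),
    (40.0, ["Wine", "Profit", "Jan 09", "Actual"]),
]
for value, tags in rows:
    cur = conn.execute("INSERT INTO fact (value) VALUES (?)", (value,))
    # Index the row's tags under the same id as the fact row.
    conn.execute("INSERT INTO tag_index (docid, tags) VALUES (?, ?)",
                 (cur.lastrowid, " ".join(tags)))


def data(*tags):
    """Hypothetical DATA(): default measure, the SUM of every fact carrying all tags."""
    # Quote each tag as an FTS phrase; space-separated phrases are ANDed together.
    match = " ".join(f'"{tag}"' for tag in tags)
    # TOTAL() is SQLite's SUM() variant that returns 0.0 rather than NULL on no match.
    sql = ("SELECT TOTAL(value) FROM fact WHERE rowid IN "
           "(SELECT docid FROM tag_index WHERE tags MATCH ?)")
    return conn.execute(sql, (match,)).fetchone()[0]


print(data("Beer", "Profit", "Jan 09", "Actual"))  # 200.0
print(data("Profit", "Jan 09", "Actual"))          # 240.0
```

Because each tag is matched as a phrase, a multi-word tag such as &#8220;Jan 09&#8221; only needs its words to be adjacent in the indexed text; a fuller design would probably map every tag to a single opaque token instead.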
<p>In the SalesCube example, &#8220;Profit&#8221; would most likely be described as a <em>measure</em> (Palo, a near-pure hypercube, does not distinguish between measures and other dimensional coordinates). Dimensions, measures and attributes are in reality interchangeable (a Customer ID can act as a dimension or an attribute, but apply a Count Distinct to it and it becomes a measure), but most OLAP solutions treat &#8220;Measure Dimensions&#8221; as different, and so do TAG Cubes.</p>
<p>By using the default fact table structure (a single-column table) and querying with the default measure (which translates to the SUM() of that single value), a &#8216;pure&#8217; approach can be used. But ROLAP is tightly bound to the concept of a fact table, and since SQLite is relational, TAG Cubes can also use a wide fact table, which I think gains them considerable flexibility.</p>
<p>Count Distinct measures, as described above, and simple calculated measures are examples of this flexibility. Another is a measure based on SQLite&#8217;s group_concat aggregate function to provide a drill-down facility, e.g.</p>
<p style="text-align:center;"><em>=DATA("SalesCube","ROWIDList","Beer","Jan 09","Actual")</em></p>
<p>&#8230;where &#8220;ROWIDList&#8221; would be set up as <em>group_concat(rowid,',')</em> and returns a comma-separated list of the underlying fact table ROWIDs.</p>
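<p>The same kind of sketch extended to a wide fact table, assuming (hypothetically) that each measure is simply a named SQL aggregate expression: a plain SUM, a Count Distinct over an attribute, a calculated ratio and the group_concat drill-down. Table, column and measure names are illustrative, FTS3 is again assumed to be available, and only expressions found in the measure dictionary are pasted into the SQL text, so a cell formula can&#8217;t inject arbitrary SQL.</p>

```python
import sqlite3

# Hypothetical wide fact table: several value columns per fact row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (customer_id TEXT, qty INTEGER, profit REAL)")
conn.execute("CREATE VIRTUAL TABLE tag_index USING fts3(tags)")  # needs FTS3

facts = [
    ("C1", 10, 120.0, "Beer Jan 09 Actual"),
    ("C2", 5, 80.0, "Beer Jan 09 Actual"),
    ("C1", 7, 60.0, "Beer Jan 09 Actual"),
    ("C3", 4, 40.0, "Wine Jan 09 Actual"),
]
for customer_id, qty, profit, tags in facts:
    cur = conn.execute("INSERT INTO fact VALUES (?, ?, ?)", (customer_id, qty, profit))
    conn.execute("INSERT INTO tag_index (docid, tags) VALUES (?, ?)", (cur.lastrowid, tags))

# Each measure is a named SQL aggregate expression (names are illustrative).
MEASURES = {
    "Profit": "TOTAL(profit)",
    "Customers": "COUNT(DISTINCT customer_id)",     # an attribute turned into a measure
    "ProfitPerUnit": "TOTAL(profit) / TOTAL(qty)",  # a calculated measure
    "ROWIDList": "group_concat(rowid, ',')",        # drill-down to the fact rows
}


def data(measure, *tags):
    """Hypothetical DATA(): evaluate one named measure over the facts carrying all tags."""
    expr = MEASURES[measure]  # KeyError for unknown measures; no free-form SQL gets in
    match = " ".join(f'"{tag}"' for tag in tags)
    sql = (f"SELECT {expr} FROM fact WHERE rowid IN "
           "(SELECT docid FROM tag_index WHERE tags MATCH ?)")
    return conn.execute(sql, (match,)).fetchone()[0]


print(data("Profit", "Beer", "Jan 09", "Actual"))     # 260.0
print(data("Customers", "Beer", "Jan 09", "Actual"))  # 2
print(data("ROWIDList", "Beer", "Jan 09", "Actual"))  # the three Beer rowids
```

SQLite documents the order of group_concat&#8217;s output as arbitrary, so anything consuming the ROWIDList should treat it as a set of ids rather than a sorted list.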
<p>A major reason for rolling my own pivot engine was to add a concept of <a href="http://en.wikipedia.org/wiki/Namespace_(computer_science)">&#8220;namespaces&#8221;</a> and to separate the implementation of these namespaces from the actual pivot. When a tag (or a predefined hierarchy of tags) is assigned to a cube, it&#8217;s also assigned to a namespace. In many cases namespace and cube would be synonymous, but in some cases a more sophisticated approach is required:</p>
<ul>
<li>Multiple cubes sharing the same set of conformed dimensions would be best served by such cubes sharing a common namespace, and so they can.</li>
<li>Different consumers of the pivot may require the use of a different language, be that a spoken language or a different &#8216;business language&#8217; e.g. Manufacturing Product Codes V Consumer Product Names. Again, easily done.</li>
<li>Sometimes identifying data can&#8217;t be shared with the datasmith or the numerical analyst working on a problem; in such cases being able to replace the actual namespace with an obfuscated one can be very useful. Or, for added security, the namespace might only be issued to approved PCs while the tag index and fact table are stored on a shared drive. Managing such scenarios securely and easily needs some more work, but the structure is there.</li>
</ul>
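<p>One way the namespace layer could be sketched, again with hypothetical table and column names and assuming FTS3: the tag index stores only opaque tag ids, and a separate <code>namespace</code> table maps the labels used by each audience (English, French, product codes or an obfuscated set) onto those ids, so the same facts can be queried in any &#8220;language&#8221;.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical namespace table: one opaque tag id, a different label per namespace.
conn.execute("CREATE TABLE namespace (ns TEXT, label TEXT, tag_id TEXT, "
             "PRIMARY KEY (ns, label))")
conn.executemany("INSERT INTO namespace VALUES (?, ?, ?)", [
    ("en", "Beer", "t1"), ("fr", "Bière", "t1"), ("codes", "PRD-0042", "t1"),
    ("masked", "X7", "t1"),
    ("en", "Wine", "t2"), ("fr", "Vin", "t2"), ("codes", "PRD-0043", "t2"),
    ("masked", "Q3", "t2"),
])

# The fact table and FTS3 tag index only ever see the opaque ids.
conn.execute("CREATE TABLE fact (value REAL)")
conn.execute("CREATE VIRTUAL TABLE tag_index USING fts3(tags)")
for value, tag_ids in [(120.0, "t1"), (80.0, "t1"), (40.0, "t2")]:
    cur = conn.execute("INSERT INTO fact (value) VALUES (?)", (value,))
    conn.execute("INSERT INTO tag_index (docid, tags) VALUES (?, ?)", (cur.lastrowid, tag_ids))


def data(ns, *labels):
    """Resolve each label through one namespace, then SUM the facts carrying all the ids."""
    ids = []
    for label in labels:
        row = conn.execute("SELECT tag_id FROM namespace WHERE ns = ? AND label = ?",
                           (ns, label)).fetchone()
        if row is None:
            raise KeyError(f"{label!r} is not a tag in namespace {ns!r}")
        ids.append(row[0])
    sql = ("SELECT TOTAL(value) FROM fact WHERE rowid IN "
           "(SELECT docid FROM tag_index WHERE tags MATCH ?)")
    return conn.execute(sql, (" ".join(ids),)).fetchone()[0]


print(data("en", "Beer"), data("fr", "Bière"), data("codes", "PRD-0042"))  # 200.0 200.0 200.0
print(data("masked", "Q3"))  # 40.0
```

Because the labels live only in the namespace table, that table could be kept in its own database file and issued separately; the tag index and fact table on their own reveal nothing but ids and numbers.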
<p>As hinted at above, the three elements of a TAG Cube (the namespace, the tag index and the fact table) can be assigned to different databases (i.e. files). Thanks to the wonders of SQLite&#8217;s <a href="http://www.sqlite.org/lang_attach.html">ATTACH statement</a> and the <a href="http://www.sqlite.org/backup.html">backup API&#8217;s</a> ability to quickly load/unload databases in/out of memory, it&#8217;s possible, for example, to load the namespace and tag index (i.e. the &#8216;dimensions&#8217;) into a memory database while a very large (i.e. too big to fit in memory) fact table remains on disk. Fast, cheap <a href="http://en.wikipedia.org/wiki/Solid-state_drive">SSDs</a> will add further configuration options.</p>
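<p>A sketch of that split using Python&#8217;s <code>sqlite3</code> module, whose <code>Connection.backup()</code> wraps SQLite&#8217;s online backup API: the dimension database is copied into memory, and the fact database is ATTACHed so it stays on disk. The file names are hypothetical, and a plain two-column table stands in for the FTS3 tag index to keep the example short.</p>

```python
import os
import sqlite3
import tempfile

# Hypothetical file layout: dimensions in one database file, facts in another.
folder = tempfile.mkdtemp()
dims_path = os.path.join(folder, "dims.db")
facts_path = os.path.join(folder, "facts.db")

dims = sqlite3.connect(dims_path)
# A plain table stands in for the FTS3 tag index: (fact rowid, tag).
dims.execute("CREATE TABLE tag_index (fact_rowid INTEGER, tag TEXT)")
dims.executemany("INSERT INTO tag_index VALUES (?, ?)",
                 [(1, "Beer"), (2, "Beer"), (3, "Wine")])
dims.commit()

facts = sqlite3.connect(facts_path)
facts.execute("CREATE TABLE fact (value REAL)")
facts.executemany("INSERT INTO fact (value) VALUES (?)", [(120.0,), (80.0,), (40.0,)])
facts.commit()
facts.close()

# Copy the small dimension database into memory with the online backup API...
mem = sqlite3.connect(":memory:")
dims.backup(mem)
dims.close()
# ...and ATTACH the large fact database, which is read from disk as needed.
mem.execute("ATTACH DATABASE ? AS facts", (facts_path,))

total = mem.execute(
    "SELECT TOTAL(f.value) FROM facts.fact AS f "
    "WHERE f.rowid IN (SELECT fact_rowid FROM main.tag_index WHERE tag = ?)",
    ("Beer",)).fetchone()[0]
print(total)  # 200.0
```

ATTACH has to run outside an open transaction, which is why both files are committed before the memory database is assembled.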
<p>Although most of the TAG Cube functionality is available only within Excel, I&#8217;ve built a C-based SQLite virtual table (cFact) to allow the tag index to be used outside xLite. This means that SQLite drivers for ODBC (for use as a Pivot Table source, for example) or JDBC (for use in SQLPower Wabit, perhaps) can efficiently access data models built using xLite.</p>
<p><em>I had to revert to using C rather than my preferred Python (did I mention that xLite now embeds Python in Excel? No? Well, it does: <a href="http://blog.gobansaor.com/2008/04/11/python-the-new-vba/">Python, the new VBA?</a>), having failed to get around multi-threading issues with callbacks to Python in both the </em><a href="http://www.ch-werner.de/sqliteodbc/"><em>ODBC</em></a><em> and </em><a href="http://www.zentus.com/sqlitejdbc/"><em>JDBC</em></a><em> drivers. I&#8217;d made a career promise to myself many years ago not to have anything to do with printers or threads, and I think I&#8217;ll stick with it <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p>TAG Cubes (still a work in progress, actually) are the latest addition to xLite, joining:</p>
<ul>
<li>VBA coded SQLite SQL functions.</li>
<li>Worksheet Functions; call out to a &#8216;function&#8217; built using Excel formulae, passing a parameter list and returning a value.</li>
<li>Workbook Functions; like Worksheet Functions, but loading a new Workbook, passing in parameters, passing back a value (or tables) and closing the Workbook when finished.</li>
<li>xLiteScript; xLite exposes its functionality via VBA-coded UDFs, which can be called like any other formula, but data-prep activities often require sequential, procedural logic, so xLiteScript provides a <a href="http://blog.gobansaor.com/2007/03/03/tables-vs-xml-the-data-lingua-franca-debate/">table-oriented</a> scripting mechanism offering basic flow-control logic.</li>
<li>pyScript; I embedded Python into xLite to take advantage of Python&#8217;s speed in developing Virtual Tables, SQL Functions and other extensions to SQLite, and to tap into the wonderful world of Python code. I&#8217;ve also added the ability to use Python from scripts defined within Excel (to indent, tab to the next cell!).</li>
<li>Fast load/unload to/from CSV.</li>
<li>Load from any ADO source.</li>
<li>Remove xLite formulae and rename and save Workbook, very handy when used via Workbook Functions to mass produce Excel &#8220;reports&#8221;.</li>
<li>Other WIP items are: load from SAP; load/unload to/from Amazon S3; use Palo cubes as TAG Cube &#8220;facts&#8221;; slot Palo in/out for TAG Cubes; auto-generate Mondrian XML based on TAG Cubes; write-back and splash; Python &amp; VBA TAG Cube &#8220;rules&#8221;.</li>
</ul>
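<p>xLite&#8217;s SQL functions are coded in VBA or embedded Python, and I can&#8217;t reproduce those here; as a rough stand-in, Python&#8217;s standard <code>sqlite3</code> module shows the same mechanism of registering host-language code as SQLite SQL functions, via <code>create_function()</code> for a scalar and <code>create_aggregate()</code> for an aggregate. The <code>proper</code> and <code>median</code> functions and the <code>sales</code> table are illustrative only.</p>

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")


def proper(text):
    """Scalar function, roughly like Excel's PROPER(): capitalise each word."""
    return None if text is None else text.title()


class Median:
    """Aggregate function: collect values in step(), reduce them in finalize()."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return statistics.median(self.values) if self.values else None


# Register both under SQL names; the middle argument is the number of SQL arguments.
conn.create_function("proper", 1, proper)
conn.create_aggregate("median", 1, Median)

conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("beer", 10.0), ("beer", 30.0), ("beer", 20.0), ("red wine", 5.0)])
for row in conn.execute("SELECT proper(product), median(amount) FROM sales "
                        "GROUP BY product ORDER BY product"):
    print(row)
# ('Beer', 20.0)
# ('Red Wine', 5.0)
```

Once registered, the functions can be used anywhere in SQL on that connection, including views and triggers, which is what makes host-language functions useful for data prep.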
<p><a href="http://www.gobansaor.com/xlite">I&#8217;ve started the process of releasing the beta code here &#8230;</a></p>
<p style="text-align:right;"><em>Why not join me on Twitter at </em><a style="text-decoration:none;color:#265e15;border-bottom-color:#996633;border-bottom-width:1px;border-bottom-style:dashed;margin:0;padding:0;" href="http://www.twitter.com/gobansaor"><em>gobansaor</em></a><em>?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2009/09/29/tag-cubes-sqlite-star-query-part-iii/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>LiteBI, Heavy ETL</title>
		<link>http://blog.gobansaor.com/2009/04/24/litebi-heavy-etl/</link>
		<comments>http://blog.gobansaor.com/2009/04/24/litebi-heavy-etl/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 12:27:57 +0000</pubDate>
		<dc:creator>Tom Gleeson</dc:creator>
				<category><![CDATA[BI]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Talend]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[kettle]]></category>
		<category><![CDATA[olap]]></category>
		<category><![CDATA[LiteBI]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=686</guid>
		<description><![CDATA[Although my major BI interest is in micro-BI (or is that  workgroup-BI?)  i.e. data, perhaps cleansed and packaged elsewhere, available locally on a datasmith&#8217;s PC,with most likely an in-memory OLAP as the analysis tool; the possibilities of the &#8220;cloud&#8221; as a BI platform have not escaped me.
From a micro-BI perspective, the ability to act as a [...]]]></description>
			<content:encoded><![CDATA[<p>Although my major BI interest is in micro-BI (or is that workgroup-BI?), i.e. data, perhaps cleansed and packaged elsewhere, available locally on a datasmith&#8217;s PC, most likely with an in-memory OLAP tool for analysis, the possibilities of the &#8220;cloud&#8221; as a BI platform have not escaped me.</p>
<p>From a micro-BI perspective, the cloud&#8217;s potential as a backup/mirroring tool or as an ETL/marshalling tool (anybody for Hadoop and SQLite?) appeals. I&#8217;ve yet to make up my mind on BI delivered as a cloud <a href="http://en.wikipedia.org/wiki/Platform_as_a_service">PaaS</a>, but obviously <a href="http://jeromepineau.blogspot.com/2009/04/on-demand-bi-beyond-smb.html">many others believe it has a future</a>.</p>
<p>My main worry with PaaS is not lock-in (which exists equally for in-house proprietary solutions) but the danger of a <a href="http://www.accmanpro.com/tag/coghead/">Coghead-like lock-out</a>. My other doubts are more technical: I believe that in-memory offers significant advantages over traditional ROLAP (simplicity being the main one), and multi-tenant in-memory architectures are not yet a runner. But last week I had a demo of a new Spanish BI PaaS service, <a href="http://www.litebi.com/"><strong>LiteBI</strong></a>, which might just change my mind.</p>
<p>Javier Giménez Aznar and his team previously delivered Pentaho-based data warehouses to large Spanish corporations and government agencies, so they have a deep understanding of <a href="http://mondrian.pentaho.org/">Mondrian ROLAP</a>. They are using that knowledge to build the LiteBI service, but this time with SMBs rather than corporates as the target customers. Pricing starts at €145 per month and is based on the number of concurrent users, the number of analytical spaces and data volumes, so it&#8217;s aimed less at very small firms and more at the Medium in SMB.</p>
<p>Impressions? The cube designer, dashboard builders and general UI are all very good; I think they would appeal to end-user datasmiths and, as such, will be a major up-front aid to selling the product. But it was LiteBI&#8217;s approach to the thorny issue of ETL and data loading that impressed me, and that also helped ease some of my Coghead-induced fears.</p>
<p>BI technology stacks consist of three elements:</p>
<ul>
<li>The &#8220;fancy&#8221; front-end; graphs, animated dashboards and so on.</li>
<li>The pivot engine; ROLAP or MOLAP or both.</li>
<li>The ETL process.</li>
<li>(Many would say there&#8217;s an important fourth, the data warehouse, but not every BI effort requires one; that&#8217;s another issue.)</li>
</ul>
<p>LiteBI is continuing to build yet more functionality into their UI and this &#8220;fancy&#8221; front-end is essential as it&#8217;s their &#8220;shop window&#8221;.</p>
<p>Mondrian provides their pivot engine, and again they continue to work on optimisations such as column-based datastores to increase speed and automate responsiveness tuning (end-users are very unforgiving of slow pivots).</p>
<p>But it&#8217;s in the third area, the ETL process, that you realise the LiteBI team has real-world BI experience. Data is loaded into LiteBI via an API, but the ETL process itself happens on the customer side.</p>
<p>&#8220;Well, so what?&#8221; you may ask. Extraction obviously has to happen on the customer side (though not when data is sourced from the likes of SalesForce.com). True, but it&#8217;s the transformations and data cleansing that add real value to the ETL process and ultimately determine the quality and usefulness (as opposed to the speed or the &#8220;prettiness&#8221; of delivery) of the solution.</p>
<p>Part of the process of adopting LiteBI is an ETL consultancy stage, where a LiteBI partner company provides on-site services to build this ETL layer, handling not just transformations but also the initial load and the automation of subsequent delta uploads.</p>
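<p>LiteBI&#8217;s load API isn&#8217;t described here, so the following is only a generic sketch of customer-side delta uploading under stated assumptions: a hypothetical source table with a <code>modified</code> date column, an in-memory SQLite database standing in for the load target, and a high-water mark recorded after each successful run. The first run performs the initial load; later runs send only rows changed since the mark.</p>

```python
import sqlite3

# Hypothetical customer-side source table with a last-modified date (ISO dates sort as text).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, modified TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 100.0, "2009-04-01"), (2, 250.0, "2009-04-10")])
src.commit()

# Stand-in for the BI service's load target, plus a table holding the high-water mark.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, modified TEXT)")
dst.execute("CREATE TABLE load_state (tbl TEXT PRIMARY KEY, high_water TEXT)")


def delta_load():
    """Send rows changed since the last run; the first run is the initial load."""
    row = dst.execute("SELECT high_water FROM load_state WHERE tbl = 'orders'").fetchone()
    since = row[0] if row else ""  # no mark yet: every date sorts after ""
    changed = src.execute(
        "SELECT id, amount, modified FROM orders WHERE modified > ? ORDER BY modified",
        (since,)).fetchall()
    if changed:
        # INSERT OR REPLACE turns re-sent (updated) rows into upserts on the primary key.
        dst.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", changed)
        dst.execute("INSERT OR REPLACE INTO load_state VALUES ('orders', ?)",
                    (changed[-1][2],))
        dst.commit()  # the rows and the new mark are committed together
    return len(changed)


print(delta_load())  # 2: initial load
src.execute("UPDATE orders SET amount = 300.0, modified = '2009-04-20' WHERE id = 2")
src.commit()
print(delta_load())  # 1: only the updated row
print(delta_load())  # 0: nothing new
```

A strictly-greater-than comparison can miss rows that share the high-water date but arrive after a run, and deletions aren&#8217;t captured at all; real delta jobs usually deal with both, for example via change tables or soft-delete flags.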
<p>So the cost mounts up, but in reality you can&#8217;t do BI without this investment; there&#8217;s no ETL magic bullet. Even so, Javier says the typical go-live time for a LiteBI project would be in the order of 3-4 weeks rather than the 3-4 months of similar on-site Pentaho projects.</p>
<p>The end-user &#8216;owning&#8217; the ETL process makes the prospect of a service lock-out slightly less worrying as, at least, one would still have a good starting point for moving to another provider or back in-house. What I would really like to see would be the option to self-host LiteBI, which I guess would involve open sourcing large parts of the service (the automated optimisation strategies could, for example, be excluded from this open source version).</p>
<p>The load API comes packaged as a plugin to <a href="http://kettle.pentaho.org/">Kettle</a> (aka PDI), and the intention is to offer a similar add-on for <a href="http://www.talend.org">Talend</a> in the near future. LiteBI also offers a white-label option whereby third-party OLTP solution providers can use the service as their product&#8217;s BI suite.</p>
<p>Like the <a href="http://www.everything2.org/title/Skibbereen%2520Eagle">Skibbereen Eagle keeping its eye on the Czar of Russia</a>, I too will be keeping a watchful eye on <a href="http://www.litebi.com/">LiteBI</a> and the march of on-demand BI in general.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2009/04/24/litebi-heavy-etl/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>Project Gemini &#8211; XXL, Excel on Steroids</title>
		<link>http://blog.gobansaor.com/2009/04/01/project-gemini-xxl-excel-on-steroids/</link>
		<comments>http://blog.gobansaor.com/2009/04/01/project-gemini-xxl-excel-on-steroids/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 11:47:52 +0000</pubDate>
		<dc:creator>Tom Gleeson</dc:creator>
				<category><![CDATA[BI]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[Excel 2010]]></category>
		<category><![CDATA[PowerPivot]]></category>
		<category><![CDATA[Project Gemini]]></category>
		<category><![CDATA[Workgroup BI]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=662</guid>
		<description><![CDATA[In my last post about why I use SQLite in combination with Excel for datasmithing tasks, I listed the more traditional backends (Excel itself, MS Access, RDBMs &#38; MOLAP cubes) that one would expect to &#8220;compete&#8221; with such an idea.   But I suspect that if that same post appeared  two years or so into [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://blog.gobansaor.com/2009/03/14/sqlite-as-the-mp3-of-data/">last post</a> about why I use SQLite in combination with Excel for datasmithing tasks, I listed the more traditional backends (Excel itself, MS Access, RDBMSs &amp; MOLAP cubes) that one would expect to &#8220;compete&#8221; with such an idea. But I suspect that if that same post were written two years or so from now, there would be a fifth contender: Project Gemini cubes.</p>
<p>Project Gemini (now called PowerPivot) is due to be delivered as a free add-in to the next version of Excel (2010), like the Analysis ToolPak or the Data Mining add-ins for Excel 2003. (See the OLAP Report&#8217;s <a href="http://www.olapreport.com/Comment_Gemini.htm">Project Gemini, Microsoft&#8217;s Brilliant Trojan Horse</a> for a good overview of the tool.)</p>
<p><a href="http://twitter.com/donalddotfarmer">Donald Farmer</a>, who works on the project, saw the <a href="http://blog.gobansaor.com/2009/03/14/sqlite-as-the-mp3-of-data/">SQLite as the MP3 of data</a> post, recognised that the <a href="http://en.wikipedia.org/wiki/Use_case">use cases</a> behind combining SQLite with Excel were similar to those of Project Gemini, and kindly offered me a demo of the product. Well, the phrase &#8220;Excel on steroids&#8221; has been much used in the past (in particular of add-ins such as Essbase, Palo or TM1), but this, ya gotta see; Donald likes to call it XXL.</p>
<p>Millions of rows of data held in memory on a 4GB PC and &#8220;modeled&#8221; using a &#8220;user-friendly&#8221; pivot-table-like interface. And when I say modeled, the user isn&#8217;t being confronted with concepts such as dimensions, levels, attributes, facts and so on, but a classic star schema model is nevertheless being built behind the scenes. It&#8217;s this model that allows Gemini to escape some of the inadequacies of pivot tables, e.g. by allowing rules and hierarchies to be defined. The resulting model can then be saved and shared as a file (keeping to the document-centric ethos of Excel), but it can also be posted to and managed by SharePoint.</p>
<p>SharePoint will be extended to allow the IT function to manage and audit shared models to whatever degree the organisation requires, but the single file format will also allow smaller groups to share without the need for IT involvement (essential if bottom-up adoption is to be encouraged).  SharePoint will also add the &#8220;Web2.0 collaboration layer&#8221;.</p>
<p>How will MS make money from this if it&#8217;s free? The first clue is the SharePoint backend: more functionality means more reasons to purchase and use MS&#8217;s server stack, and the same applies to Excel itself. I, like many others, am very happy using Excel 2003 and look on Excel 2007 the same way the market in general has looked on Vista; i.e. pretty, but lacking a strong enough reason to upgrade unless forced to do so. (Excel 2007 also has the ribbon issue, not one I find a major problem myself, <a href="http://smurfonspreadsheets.wordpress.com/2009/03/13/the-ribbon-bet/">but others do</a>.) But I would upgrade to a version of Excel that offered Project Gemini capabilities, and I&#8217;m sure others would follow (and, more importantly for MS&#8217;s revenues, so would thousands of corporate accounts).</p>
<p>Project Gemini offers proof that MS realises what those of us on the ground have known for years: BI projects are, in the main, Excel-centric; all the &#8216;hard sums&#8217; and awkward decisions end up back on the desktop. MS has decided to publicly recognise that fact and profit from it. The timing is both economically and technically opportune: PC speed and cheap memory mean that a huge chunk of even a large corporation&#8217;s datasets can be analysed on a PC (<a href="http://www.b-eye-network.com/view/9752">according to this</a>, the median size of original data in OLAP datasets is about 5GB), and there are obvious cost benefits for companies facing difficult times that require more to be done with fewer resources.</p>
<p>What will the effect be on tools such as Essbase, TM1, Palo etc.? Well, let me put it this way: if their owners are making strategic plans for 2010 onwards and they&#8217;re not taking account of the Gemini effect, perhaps they should. Most likely Gemini will help increase the overall market for OLAP tools, with the incumbents tending to specialise in their existing niches (e.g. <a href="http://www.jedox.com/en/Sample-Uses-of-Palo/Budget-and-Corporate-Planning.html">Palo in Budgeting</a>, with the added value of being free and open source, which carries a premium over just being &#8216;free&#8217;).</p>
<p>So will I put away my Excel-SQLite fixation then? No, for two reasons:</p>
<ul>
<li>Project Gemini is not here yet, and the proof of the pudding will be in the eating. Also, when it does appear it will only apply to Excel 2010 (or whatever), and as many companies are still on Office 2000 (and a few on 97!), it&#8217;ll be at least 5 years before a significant percentage of sites upgrade.</li>
<li>The SQLite addition to Excel offers not just BI capabilities but also makes a nimble ETL and data integration engine. I&#8217;m also experimenting with Amazon S3 integration to enable simple work-flows for small distributed teams (or even same-office groups where <a href="http://blog.gobansaor.com/2007/12/17/the-wan-is-the-new-lan/">the WAN is the new LAN</a>).</li>
</ul>
<p>Whether or not you agree with the validity of &#8220;<a href="http://esj.com/Articles/2009/03/25/Workgroup-BI-Poised-for-a-Comeback.aspx?Page=1">workgroup BI</a>&#8221;, be aware that MS does, and it thinks BI is about to enter a new phase; for proof see the <a href="http://blogs.msdn.com/bi/archive/2009/03/22/history-of-business-intelligence.aspx">The History of Business Intelligence</a> video by MS&#8217;s <a href="http://twitter.com/nicfish">Nic Smith</a>.</p>
<p>UPDATE: 19th Nov 2009</p>
<p>Last evening I downloaded, for the first time, both the Excel 2010 Beta and the <a href="http://twitter.com/gobansaor/status/5840829440">PowerPivot (new name for Gemini) add-in</a>. First impressions: yep, in the flesh it&#8217;s just as impressive as the above demo led me to believe it would be. As I said on Twitter last night, <a href="http://twitter.com/gobansaor/status/5840829440"><strong><em>Datasmiths of the world; download the Excel 2010 Beta and PowerPivot add-in; this ya gotta see!!!</em></strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2009/04/01/project-gemini-xxl-excel-on-steroids/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>SQLite as the MP3 of data</title>
		<link>http://blog.gobansaor.com/2009/03/14/sqlite-as-the-mp3-of-data/</link>
		<comments>http://blog.gobansaor.com/2009/03/14/sqlite-as-the-mp3-of-data/#comments</comments>
		<pubDate>Sat, 14 Mar 2009 19:13:25 +0000</pubDate>
		<dc:creator>Tom Gleeson</dc:creator>
				<category><![CDATA[BI]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Palo]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[VBA]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[olap]]></category>
		<category><![CDATA[MP3]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=622</guid>
		<description><![CDATA[&#8230; and Excel as its &#8220;mixing desk&#8221;.
When I tell people that I use SQLite in combination with Excel (via xLite) as my datasmithing platform, many ask why SQLite? (Many others ask why Excel?  but &#8220;sin scéal eile&#8221;, that&#8217;s another discussion &#8211; Excel as the iPod of Downloaded Data.) Those that question my use of SQLite [...]]]></description>
			<content:encoded><![CDATA[<p>&#8230; and Excel as its &#8220;mixing desk&#8221;.</p>
<p>When I tell people that I use SQLite in combination with Excel (via <a href="http://www.gobansaor.com/xlite"><strong>xLite</strong></a>) as my datasmithing platform, many ask: why SQLite? (Many others ask: why Excel? But &#8220;sin scéal eile&#8221;, that&#8217;s <a href="http://blog.gobansaor.com/2009/10/25/excel-as-the-ipod-of-downloaded-data/">another discussion &#8211; Excel as the iPod of Downloaded Data</a>.) Those who question my use of SQLite tend to cluster into four camps:</p>
<ul>
<li>Pure Excel jocks.</li>
<li>MS Access fans.</li>
<li>The client-server database brigade (SQL Server, Oracle; or, for FOSS fans, MySQL, PostgreSQL).</li>
<li>The MOLAP folks (Essbase, TM1, Palo).</li>
</ul>
<p>Now while I have used and will continue to use/encounter all four &#8216;approaches&#8217;, I&#8217;ve come to believe over the last couple of years that SQLite brings something special to the datasmithing game. When I look back over nearly 30 years in the data handling business I keep thinking &#8211; &#8220;If only I had SQLite then, how much easier/quicker/cheaper that task would have been!&#8221;.</p>
<p>Just as &#8220;fractional horsepower&#8221; electrical motors revolutionised manufacturing and eventually all our lives (car starter-motors, fridge motors, washing machines etc.), &#8220;fractional horsepower&#8221; databases can do the same for data. Distributing data to where it is needed.</p>
<p>The use of SQLite as an operational local cache is already far advanced. SQLite is embedded in lots of <a href="http://www.sqlite.org/famous.html">everyday software tools</a>, everything from McAfee anti-virus to the <a href="http://www.tweetdeck.com/beta/">TweetDeck</a> Twitter client (the best one, IMHO). But my interest is more in SQLite&#8217;s potential as a micro-BI (or maybe, more correctly, a distributed-BI) platform. A sort of MP3 format for distributed structured data, if you like.</p>
<p>But why SQLite (and in particular SQLite in combination with Excel) as my datasmithing tool rather than the four other approaches?  First, what&#8217;s a datasmith?</p>
<p>Managing and manipulating datasets has become an integral part of many people’s jobs, not just accountants (the original of the species) but marketing executives, sales staff, pricing analysts, process engineers; different job titles, different roles, but all using a skill that they’ve likely never been formally trained in, a skill without a name; a skill I call datasmithing. I like to think of myself as a master datasmith, or a datasmith&#8217;s datasmith.</p>
<p>If you consider yourself a datasmith, then most likely the tool you use to manage your datasets is Excel. And before you apologise, don’t. Excel is by far the best and most flexible end-user data manipulation tool out there. Everything from the current financial crisis downwards has at some stage been blamed on Excel, but you know and I know that many tasks would remain undone or under-done were it not for end-user generated spreadsheets.</p>
<p>Spreadsheets are not, however, optimal for every task; linked spreadsheets in particular are data disasters waiting to happen. While fantastic for data transformations and presentations, as books of record they’re rarely suitable. Other tools, such as SQL-based relational databases and in-memory OLAP, offer much better and potentially much more cost-effective data modelling functionality, but at the cost of extra complexity and ongoing technical support.</p>
<p>MS Access (which, like SQLite, is a document-centric, non-client-server database, but unlike it is also a forms/reporting development environment) would appear to be the natural local store database. My problem with MS Access has been its tendency to try to be all things to all men, ending up not fully satisfying anybody. Professional developers think it&#8217;s too limiting and non-techs find it too intimidating; even in reporting, where it once showed promise, it left a big enough opening for Crystal Reports to evolve. It is also limited to Windows, which might not seem to be a problem when combining it with Excel. But as it&#8217;s often necessary, due to the scale or complexity of the data, to use &#8216;proper&#8217; ETL tools such as Talend, having an OS-agnostic database format that can act as a distribution medium (think MP3s again) between &#8220;mixing desks&#8221; can be very useful.</p>
<p>The big difference from MS Access for me is SQLite&#8217;s open source code; code that&#8217;s a pleasure to browse, with an approachable API that even I, with my very rusty C skills, can manipulate. Having access to that code allows me to integrate it tightly with Excel, so much so that I can use Excel functions (built-in functions, VBA user-defined functions and third-party add-in functions) directly from SQLite&#8217;s SQL and, vice versa, access SQL functionality via Excel &#8220;formula&#8221; calls. It is also possible to load most datasets into memory using SQLite&#8217;s in-memory mode, enabling very fast processing and near-zero latency when passing data to and from Excel/VBA. In the near future, cheap, large <a href="http://en.wikipedia.org/wiki/Solid-state_drive">SSDs</a> will enable non-memory databases to offer similar speed while also handling extremely large datasets (<a href="http://i.gizmodo.com/5166798/24-solid-state-drives-open-all-of-microsoft-office-in-5-seconds">see this for a glimpse of that future</a>).</p>
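<p>The post doesn&#8217;t show the Excel/VBA plumbing, so as a rough illustration of the two ideas in that paragraph, here is a sketch in Python&#8217;s standard sqlite3 module: it opens an in-memory (&#8220;:memory:&#8221;) database and registers a host-language function so that plain SQL can call it. The PROPER function is an assumption, a loose imitation of Excel&#8217;s function of that name; Python&#8217;s str.title() does not follow Excel&#8217;s rules exactly.</p>

```python
import sqlite3


def proper(text):
    """Loose imitation of Excel's PROPER(): capitalise the first letter of each word."""
    if text is None:
        return None  # pass SQL NULL straight through
    return str(text).title()


# ":memory:" creates a private database held entirely in RAM
conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the name PROPER, taking one argument
conn.create_function("PROPER", 1, proper)

conn.execute("CREATE TABLE customers (name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("mary o'brien", "north"), ("JOHN SMITH", "south")],
)

# The host-language function is now callable from ordinary SQL
result = conn.execute(
    "SELECT PROPER(name), PROPER(region) FROM customers ORDER BY name"
).fetchall()
print(result)
```

<p>Whatever the host, the design point is the same: once a function is registered, SQL treats it like a built-in, so a single query can mix SQL with the host&#8217;s own function library.</p>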
<p>What about the big beasts of the data world, the client-server databases? Having spent most of my professional life working with such tools, I&#8217;m aware of the power of a well-designed relational database. If SQLite is the MP3, then these are the master tapes, the DDD recordings. Most of the data that eventually ends up in SQLite for analysis and/or transformation will have originated in data warehouses or been sourced directly from OLTP systems built using relational technology. But for close-up analysis and transformation, the pure simplicity and convenience of SQLite is hard to beat. That simplicity is primarily due to its Excel-like &#8216;document&#8217; nature: all code and data can be housed in a single folder (or a <a href="http://www.truecrypt.org/">TrueCrypt container</a> for added security), ensuring that the &#8216;problem domain&#8217; can be easily archived and/or shared with others without the need for professional IT resources.</p>
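<p>A sketch of the archive-and-share step, again using Python&#8217;s sqlite3 module as a stand-in for whatever host the datasmith works in: a working set built up in memory is copied (tables, views and data) into a single file with SQLite&#8217;s online backup API, exposed in Python as Connection.backup (Python 3.7 or later). The table, view and file names are hypothetical.</p>

```python
import os
import sqlite3
import tempfile


def archive_workspace(mem_conn, folder, name="workspace.db"):
    """Copy an in-memory database (schema, views and data) into one file in folder."""
    path = os.path.join(folder, name)
    target = sqlite3.connect(path)
    try:
        mem_conn.backup(target)  # SQLite online backup API
    finally:
        target.close()
    return path


# Working set built up in memory during an analysis session
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE facts (region TEXT, amount REAL)")
mem.executemany("INSERT INTO facts VALUES (?, ?)", [("north", 120.0), ("south", 80.5)])
mem.execute(
    "CREATE VIEW totals AS SELECT region, SUM(amount) AS total FROM facts GROUP BY region"
)
mem.commit()  # back up committed state only

with tempfile.TemporaryDirectory() as folder:
    path = archive_workspace(mem, folder)
    # Whoever receives the file can reopen it with any SQLite client
    check = sqlite3.connect(path)
    try:
        restored = check.execute(
            "SELECT region, total FROM totals ORDER BY region"
        ).fetchall()
    finally:
        check.close()

print(restored)
```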
<blockquote><p>And yes, I hear you: isn&#8217;t that the basis of Excel-hell? Yes it is, but over the years I&#8217;ve found that this is rarely a problem for datasmiths. They deal day in, day out with document work-flows; they understand the risks and the benefits (mainly the simplicity) of the approach. The nightmare truly begins when this approach is used as an alternative to an OLTP system, i.e. using Excel and other document-like datastores as books-of-record in large multi-user environments &#8211; &#8220;there be monsters for sure&#8221;.</p></blockquote>
<p>How about MOLAP? Wasn&#8217;t Essbase&#8217;s name derived from &#8220;extended spreadsheet database&#8221;, and doesn&#8217;t Palo offer a truly Excel-friendly multi-user database back-end? Again, having worked with Essbase for many years and now being a big fan of the open source <a href="http://www.palo.net">Palo</a> MOLAP tool, I fully appreciate the power that such tools bring to analysis and multi-user planning tasks. But in many situations an Excel Pivot Table is &#8220;good enough&#8221;, and even when it&#8217;s not, it is possible to build and access powerful, yet simple, cube-like data structures by utilising what I call a tOLAP cube (essentially, a fact table indexed via tags, enabled by Google&#8217;s great addition to SQLite, the <a href="http://dotnetperls.com/Content/SQLite-FTS3.aspx">FTS3</a> virtual table).</p>
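<p>The post doesn&#8217;t spell out the tOLAP schema, so the following is only a guess at the general shape, assuming a plain fact table plus an FTS3 virtual table whose docid mirrors each fact&#8217;s id and whose single column holds that fact&#8217;s dimension members as space-separated tags. Slicing the &#8220;cube&#8221; then becomes a full-text MATCH (bare terms are ANDed) feeding an ordinary SUM. Table names and tags are hypothetical, and the code needs an SQLite build with FTS3 enabled, which is common in Python distributions but not guaranteed.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Plain fact table: one row per measure value
conn.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, amount REAL)")
# FTS3 index of each fact's dimension members as space-separated tags;
# its docid is kept equal to facts.id so the two tables line up
conn.execute("CREATE VIRTUAL TABLE fact_tags USING fts3(tags)")

sample = [
    (1, 100.0, "north widget q1"),
    (2, 250.0, "north gadget q1"),
    (3, 75.0, "south widget q1"),
    (4, 125.0, "north widget q2"),
]
for fact_id, amount, tags in sample:
    conn.execute("INSERT INTO facts (id, amount) VALUES (?, ?)", (fact_id, amount))
    conn.execute("INSERT INTO fact_tags (docid, tags) VALUES (?, ?)", (fact_id, tags))


def slice_total(conn, tag_query):
    """Sum the facts tagged with every term in tag_query (FTS3 ANDs bare terms)."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM facts "
        "WHERE id IN (SELECT docid FROM fact_tags WHERE fact_tags MATCH ?)",
        (tag_query,),
    ).fetchone()
    return total


print(slice_total(conn, "north widget"))  # facts 1 and 4
print(slice_total(conn, "q1"))            # facts 1, 2 and 3
print(slice_total(conn, "south gadget"))  # no fact carries both tags
```

<p>Because FTS3&#8217;s default tokenizer splits on punctuation, tags work best as single alphanumeric words; a member such as north-east would be indexed as two separate terms.</p>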
<p>By integrating SQLite with Excel, datasmiths can have the best of both worlds: a familiar <a href="http://smurfonspreadsheets.wordpress.com/2007/02/20/accel-or-excess/">spreadsheet front-end combined with a fast and powerful SQL engine and datastore</a>; in fact, everything that MS Access should have been.</p>
<p style="text-align:right;"><em>Why not join me on Twitter at </em><a href="http://www.twitter.com/gobansaor"><em>gobansaor</em></a><em>?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2009/03/14/sqlite-as-the-mp3-of-data/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		<georss:point>53.204039 -6.574340</georss:point>
		<geo:lat>53.204039</geo:lat>
		<geo:long>-6.574340</geo:long>
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>Talend ETL Excel report generator</title>
		<link>http://blog.gobansaor.com/2009/02/13/talend-etl-excel-report-generator/</link>
		<comments>http://blog.gobansaor.com/2009/02/13/talend-etl-excel-report-generator/#comments</comments>
		<pubDate>Fri, 13 Feb 2009 14:43:34 +0000</pubDate>
		<dc:creator>Tom Gleeson</dc:creator>
				<category><![CDATA[ETL]]></category>
		<category><![CDATA[Talend]]></category>
		<category><![CDATA[excel]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=608</guid>
		<description><![CDATA[Hugo, who you may remember from his OLAP Cube as a Mind Map project, has struck again. This time it&#8217;s something really useful: a component for the Talend ETL platform that generates Excel reports using templates and a JSP-style tag language to control the output.
In the past I&#8217;ve used the excellent Xlsgen to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://twitter.com/hugoetl">Hugo</a>, who you may remember from his <a href="http://blog.gobansaor.com/2008/07/30/olap-cube-as-a-mind-map/">OLAP Cube as a Mind Map project</a>, has struck again. This time it&#8217;s something really useful: <a href="http://hugoworld.wordpress.com/2009/02/08/taking-the-pain-out-of-excel-reporting/">a component for the Talend ETL platform that generates Excel reports</a> using templates and a JSP-style tag language to control the output.</p>
<p>In the past I&#8217;ve used the excellent <a href="http://xlsgen.arstdesign.com/">Xlsgen</a> to automate the production of Excel reports, but Hugo&#8217;s component has the benefit of being free (Xlsgen now costs €390!) and open source, and it also taps into the vast world of existing <a href="http://www.talend.com">Talend ETL</a> components.</p>
<p>Well done, Hugo.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2009/02/13/talend-etl-excel-report-generator/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>SQL &#8211; does exactly what it says on the tin</title>
		<link>http://blog.gobansaor.com/2008/12/18/sql-does-exactly-what-it-says-on-the-tin/</link>
		<comments>http://blog.gobansaor.com/2008/12/18/sql-does-exactly-what-it-says-on-the-tin/#comments</comments>
		<pubDate>Thu, 18 Dec 2008 20:51:30 +0000</pubDate>
		<dc:creator>Tom Gleeson</dc:creator>
				<category><![CDATA[AmazonAWS]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[DSL]]></category>
		<category><![CDATA[SimpleDB]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://blog.gobansaor.com/?p=590</guid>
		<description><![CDATA[SQL, how unloved it must feel sometimes, constantly being maligned, accused of being on the wrong side of the object-relational impedance mismatch, lacking the glamour of OO programming languages that claim the moral high ground. Yet all the while it is hewing and hauling most of the world&#8217;s structured data on its old but well-fashioned [...]]]></description>
			<content:encoded><![CDATA[<p><strong>SQL</strong>, how <a href="http://codehappy.wordpress.com/2007/07/30/databases-need-a-new-language/?referer=sphere_related_content/">unloved it must feel sometimes</a>, constantly being maligned, accused of being on the wrong side of the <a href="http://en.wikipedia.org/wiki/Object-Relational_impedance_mismatch">object-relational impedance mismatch</a>, lacking the glamour of OO programming languages that claim the moral high ground. Yet all the while it is hewing and hauling most of the world&#8217;s structured data on its old but well-fashioned back.</p>
<p><strong>SQL</strong> is perhaps the <a href="http://en.wikipedia.org/wiki/Domain-specific_programming_language">world&#8217;s most popular DSL</a>, a <a href="http://en.wikipedia.org/wiki/Declarative_language">declarative language</a> for the manipulation of tabular data, easy to learn yet capable of powerful (and sometimes complex) expressions. And, like <a href="http://ronanfitzgerald.net/everythingelse/?p=8">the Ronseal ad</a>, a SQL statement, no matter how simple or complex, does exactly what it says: all the complexity of loops and iterations, and the attendant errors, is abstracted away. It just works!</p>
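<p>A small contrast shows what gets abstracted away. In this illustrative Python sketch (the sales figures are made up), an explicit loop with an accumulator and a single GROUP BY statement produce the same regional totals, but only the loop makes you manage the iteration, the running state and the ordering yourself.</p>

```python
import sqlite3
from collections import defaultdict

sales = [("north", 100.0), ("south", 75.0), ("north", 250.0), ("east", 40.0)]

# Imperative version: we spell out the loop, the accumulator and the sort
totals = defaultdict(float)
for region, amount in sales:
    totals[region] += amount
by_hand = sorted(totals.items())

# Declarative version: one statement says what we want, not how to compute it
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
by_sql = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(by_hand == by_sql)
```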
<p><strong>SQL</strong> is both a programmer and an end-user tool; after Excel formulas, it&#8217;s the language most likely to be understood and used by &#8220;civilians&#8221;. There are few enough such cross-over tools, so think twice before building a datastore that doesn&#8217;t offer a SQL API. And I guess that&#8217;s what Amazon did. Although SimpleDB is not a relational database, they&#8217;ve <a href="http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/">decided to add a SQL API</a>, following Google&#8217;s lead with its <a href="http://code.google.com/appengine/docs/datastore/gqlqueryclass.html">SQL front-end</a> to the non-relational, Bigtable-backed Google App Engine datastore.</p>
<p><strong>SQL</strong> is also the reason why I&#8217;ve integrated SQLite with Excel, leveraging SQL to manipulate tabular data with greater efficiency and fewer errors while still keeping the touchy-feely power of Excel. I expose SQLite to Excel via <a href="http://www.ozgrid.com/VBA/Functions.htm">UDFs</a> rather than menu options or wizards, so that the transformation logic is visible and approachable (at least to those comfortable with Excel formula &#8220;programming&#8221; and with basic SQL).</p>
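<p>The post doesn&#8217;t list the actual UDF signatures, so the following is only a sketch of the idea in Python, built around a hypothetical sql_range() function: it takes SQL text plus parameters and returns a grid (a header row, then data rows), the shape an array formula hands back to a block of cells. Keeping the SQL in the call, rather than behind a wizard, is what leaves the transformation logic visible. Mapping NULLs to empty strings is an assumption about how blank cells should look.</p>

```python
import sqlite3


def sql_range(conn, sql, *params):
    """Run one SQL statement and return a 2-D list: a header row, then data rows.

    Hypothetical formula-style wrapper; the result has the shape an array
    formula would return to a block of worksheet cells.
    """
    cur = conn.execute(sql, params)
    header = [col[0] for col in cur.description]
    # NULLs become empty strings, as blank cells would
    body = [["" if v is None else v for v in row] for row in cur.fetchall()]
    return [header] + body


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, discount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Acme", 120.0, None), ("Bolt", 80.0, 5.0), ("Acme", 30.0, 2.5)],
)

grid = sql_range(
    conn,
    "SELECT customer, SUM(amount) AS total, MAX(discount) AS max_disc "
    "FROM orders WHERE amount >= ? GROUP BY customer ORDER BY customer",
    50,
)
for row in grid:
    print(row)
```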
<p><strong>SQL</strong> is my weapon of choice because of my belief in the primacy of data. It is data that matters in the long run, not the algorithms or GUIs that temporarily use (and abuse) it. During my time at Guinness Ireland I had the task of transferring master and historical transactional data from &#8220;legacy systems&#8221; into SAP, Siebel and a new data warehouse; data that a decade and a half earlier I had transferred into those same legacy systems from even older systems. In fact, the data&#8217;s electronic lineage could be traced back to a 1960s-era ICL mainframe (I have the original spec!), and I&#8217;m sure it existed on <a href="http://encyclopedia2.thefreedictionary.com/accounting+machine">accountancy machine</a> punch-cards prior to that. Understand a business&#8217;s data and you&#8217;ll understand not just the business as it currently operates but also how it operated in the past and its future potential.</p>
<p><strong>SQL</strong> abú! (Up SQL!)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2008/12/18/sql-does-exactly-what-it-says-on-the-tin/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
		<item>
		<title>Pentaho Data Integration (Kettle) V Talend Benchmark</title>
		<link>http://blog.gobansaor.com/2008/12/04/pentaho-data-integration-kettle-v-talend-benchmark/</link>
		<comments>http://blog.gobansaor.com/2008/12/04/pentaho-data-integration-kettle-v-talend-benchmark/#comments</comments>
		<pubDate>Thu, 04 Dec 2008 17:56:22 +0000</pubDate>
		<dc:creator>Tom Gleeson</dc:creator>
				<category><![CDATA[ETL]]></category>
		<category><![CDATA[Talend]]></category>
		<category><![CDATA[kettle]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[Matt Casters]]></category>
		<category><![CDATA[PDI.TOS]]></category>
		<category><![CDATA[Pentaho]]></category>

		<guid isPermaLink="false">http://gobansaor.wordpress.com/?p=587</guid>
		<description><![CDATA[Pentaho&#8217;s Matt Casters has just published a benchmarking exercise comparing Kettle and Talend. In it he admits he&#8217;s not a Talend expert and advises people to perform their own benchmarks where possible, as requirements differ. Nevertheless, unlike most other benchmarks we&#8217;ve seen on the subject, he publishes not just the results but the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ibridge.be/?p=150">Pentaho&#8217;s Matt Casters has just published a benchmarking exercise comparing Kettle and Talend</a>. In it he admits he&#8217;s not a Talend expert and advises people to perform their own benchmarks where possible, as requirements differ. Nevertheless, unlike most <a href="http://blog.gobansaor.com/2008/10/30/open-source-metrics/">other benchmarks we&#8217;ve seen on the subject</a>, he publishes not just the results but the actual transformation &#8220;code&#8221; used in the tests.</p>
<p>For <a href="http://www.nicholasgoodman.com/bt/blog/2008/11/26/an-arms-race-my-customers-dont-care-about/">many people these benchmarks are of no real interest</a>: as long as the product does what is required within the time and resources available, they&#8217;re content. But it would be a mistake to think that benchmarks don&#8217;t matter. They do; people have made, and will make, that final decision based on them. Remember, ETL is not life and death; the decision on which tool (if any) to go with may not get the level of investigation that the developers behind such products expect of their potential clientele, and this is particularly true of open source. Busy people will use such reports to direct them down a path or to confirm their existing prejudices. So I&#8217;m really glad to see Matt responding, and in particular responding in the manner he has.</p>
<p>Database vendors have for years played the benchmarking game, setting and breaking records either via real technological advances or simply by gaming the process. We as purchasers and users knew in many cases to take the results with a large dose of salt, but purchasing decisions were nevertheless made on the backs of these surveys.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gobansaor.com/2008/12/04/pentaho-data-integration-kettle-v-talend-benchmark/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b714f82b5e24beb3b74779615b6ad969?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">gobansaor</media:title>
		</media:content>
	</item>
	</channel>
</rss>