Archive for the ‘ETL’ Category
Excel as a document-oriented NoSQL database
Posted in ETL, Python, SQLite, VBA, data, excel, xLite, tagged CouchDb, document oriented, NoSQL on March 2, 2010 | Leave a Comment »
I’ve been a long-time fan of CouchDB, one of the many NoSQL databases to appear in the last few years. CouchDB is a document-oriented database whose solid B-tree indexing and easy replication, topped off by a MapReduce-style view mechanism, put it up there as a best-of-breed NoSQL datastore.
Now it may seem strange that [...]
Excel 2010 Application.Caller Bug
Posted in ETL, SQLite, VBA, excel, xLite, tagged Application.Caller, bug, Excel 2010 on February 11, 2010 | Leave a Comment »
I’ve just released another xLite “introduction”, this time the xLiteWorkbookFunction function. I’ve had most of the now released functionality working (and in use) for quite a while but had delayed publishing until I’d installed Excel 2010 as I’d wished to test against a modern Excel version.
I’d not bothered with Excel 2007, as I couldn’t see the [...]
xLite Beta Updated – adds Python as an Excel Scripting Language
Posted in BI, ETL, Python, excel, xLite on February 7, 2010 | Leave a Comment »
I’ve updated the xLite Beta with bug fixes and added a new page introducing xLite’s Excel/VBA and Python extensions to SQLite.
See http://www.gobansaor.com/xlite
The u() function allows any VBA UDF (user defined functions) to be called from SQLite.
The x() function allows an inbuilt function or indeed most any formula (but not a UDF, use u() instead) to be [...]
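As a rough illustration of the general idea behind calling host-language functions from SQL, here is a minimal sketch using Python's standard sqlite3 module. It is not xLite's u() or x() implementation; the title_case function and the customers table are hypothetical stand-ins for a worksheet-side UDF and its data.

```python
import sqlite3


def title_case(text):
    """Hypothetical stand-in for a spreadsheet-side user-defined function."""
    return None if text is None else str(text).title()


conn = sqlite3.connect(":memory:")

# Register the Python function so SQL can call it as title_case(x), 1 argument.
conn.create_function("title_case", 1, title_case)

# Hypothetical sample data, for illustration only.
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("ada lovelace",), ("grace hopper",)],
)

# The registered function is now usable like any built-in SQL function.
rows = [r[0] for r in conn.execute(
    "SELECT title_case(name) FROM customers ORDER BY name"
)]
print(rows)
```

Once registered, the function can appear anywhere a built-in SQL function could, which is presumably the kind of convenience a UDF bridge is after.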
TAG Cubes – SQLite Star Query Part III
Posted in ETL, Palo, SQLite, VBA, excel, olap, xLite, tagged hypercube, Mondrian, TAG Cube on September 29, 2009 | 9 Comments »
It’s no secret that I’m a huge fan of SQLite and Excel, particularly when used in combination. I also greatly admire the open source BI engines, Palo and Mondrian. Mondrian appeals because of its “ROLAP with a cache” architecture and its implementation of MS’s excellent MDX language. When I say MDX is excellent I’m talking with my [...]
LiteBI, Heavy ETL
Posted in BI, ETL, Talend, cloud, data, kettle, olap, tagged LiteBI on April 24, 2009 | 2 Comments »
Although my major BI interest is in micro-BI (or is that workgroup-BI?) i.e. data, perhaps cleansed and packaged elsewhere, available locally on a datasmith’s PC, with most likely an in-memory OLAP as the analysis tool; the possibilities of the “cloud” as a BI platform have not escaped me.
From a micro-BI perspective, the ability to act as a [...]
Project Gemini – XXL, Excel on Steroids
Posted in BI, ETL, SQLite, excel, tagged Excel 2010, PowerPivot, Project Gemini, Workgroup BI on April 1, 2009 | 9 Comments »
In my last post about why I use SQLite in combination with Excel for datasmithing tasks, I listed the more traditional backends (Excel itself, MS Access, RDBMs & MOLAP cubes) that one would expect to “compete” with such an idea. But I suspect that if that same post appeared two years or so into [...]
SQLite as the MP3 of data
Posted in BI, ETL, Palo, SQLite, VBA, excel, olap, tagged MP3 on March 14, 2009 | 18 Comments »
… and Excel as its “mixing desk”.
When I tell people that I use SQLite in combination with Excel (via xLite) as my datasmithing platform, many ask why SQLite? (Many others ask why Excel? but “sin scéal eile”, that’s another discussion – Excel as the iPod of Downloaded Data.) Those that question my use of SQLite [...]
Talend ETL Excel report generator
Posted in ETL, Talend, excel on February 13, 2009 | Leave a Comment »
Hugo, who you may remember from his OLAP Cube as a Mind Map project, has struck again. This time something really useful, a component for the Talend ETL platform that generates Excel reports using templates and a JSP style TAG language to control the output.
I’ve in the past used the excellent Xlsgen to [...]
SQL – does exactly what it says on the tin
Posted in AmazonAWS, ETL, SQLite, data, excel, tagged DSL, SimpleDB, SQL on December 18, 2008 | 10 Comments »
SQL, how unloved it must feel sometimes, constantly being maligned, accused of being on the wrong side of the object-relational impedance mismatch, lacking the glamour of OO programming languages that claim the moral high ground. Yet at the same time hewing and hauling most of the world’s structured data on its old but well-fashioned [...]
Pentaho Data Integration (Kettle) V Talend Benchmark
Posted in ETL, Talend, kettle, tagged benchmark, Matt Casters, PDI.TOS, Pentaho on December 4, 2008 | 4 Comments »
Pentaho’s Matt Casters has just published a benchmarking exercise comparing Kettle and Talend. In it he admits he’s not a Talend expert and he advises that people should perform their own benchmarks where possible, as requirements differ. Nevertheless, unlike most other benchmarks we’ve seen on the subject, he publishes not just the results but the [...]
Spending time on Excel-SQLite, C, VBA Callbacks & Twitter
Posted in BI, ETL, Palo, SQLite, VBA, Web2.0, excel, xLite, tagged c#, Twitter on November 20, 2008 | 3 Comments »
Haven’t posted here in a while as my spare time has been soaked up programming, well actually refactoring would be more exact. My xLite “SQLite empowered Excel” codebase has grown over the years and required a serious makeover to get rid of stuff I no longer use and to generally make it more robust. I [...]
Open Source Metrics and Benchmarks
Posted in ETL, Talend, kettle, tagged ETL benchmarks, PDI 3.0, WaveMaker on October 30, 2008 | 13 Comments »
Marc Russel’s blog links to a Manapps ETL benchmark report comparing the performance of several leading ETL tools, both proprietary (DataStage and Informatica) and OS (Talend and PDI (aka Kettle)). As would be expected, each tool has its own strengths and weaknesses, but one thing stands out: the venerable Kettle ETL aka PDI 3.0 is now [...]
Why Larry hates the cloud, and my data trinity.
Posted in AmazonAWS, ETL, Palo, SQLite, cloud, excel, olap, tagged cloud bursting, Oracle on October 4, 2008 | Leave a Comment »
Last week Oracle certified Amazon EC2 as a supported platform; that same week Larry Ellison attacked the concept of cloud computing as pure hype. Obviously, Larry is not happy with this whole cloud thing, and I think it’s not just the threat it poses to the software industry’s traditional licensing model that worries him, rather, as Robert X. Cringely [...]
Clouds no longer pass by Windows.
Posted in AmazonAWS, EC2, ETL, RSSBus, Web2.0, data, news, tagged cloud, cloud burst, SQLServer on EC2, Windows on EC2 on October 1, 2008 | 4 Comments »
Amazon today announced that later this year, Windows Server would be available on EC2. No details on cost and licensing etc. but this is major. Up until now, that portion of the business world who are pure MS shops (a very large percentage especially amongst SMEs) were excluded from taking advantage of Amazon’s amazing (and [...]
Cloudy skies, cloudy apps…
Posted in BI, ETL, Ireland, Palo, Web2.0, cloud, data, excel, news, olap, tagged Freiburg, Jedox, WaveMaker, Worksheet Server on August 28, 2008 | 4 Comments »
Just back from a break in Clifden, Connemara, summer is nearly over, the kids return to school today, back to work.
Counties Galway and Mayo were like the rest of the country last week, a tad wet, but unlike the developed east of the island, flooding was not a problem; a problematic drainage area is called [...]
Talend + SQLite + Groovy the new Oracle …
Posted in BI, EC2, ETL, Groovy, Palo, SQLite, Talend, data, excel, olap, tagged Oracle, Oracle 10g Express on August 2, 2008 | 5 Comments »
… well, at least for me. Let me explain.
For most of my datasmithing career, I’ve had access to corporate Oracle databases and now with the availability of Oracle 10g Express I can even run my own Oracle instances at home or on EC2. The combination of a powerful SQL engine, expressive scripting language (PL/SQL), OS independence, [...]
New universal SQLite JDBC library.
Posted in ETL, Java, SQLite, Talend, kettle, news, tagged JDBC, universal, zentus.com on July 21, 2008 | Leave a Comment »
Both Talend (Java) and Kettle distribute the Zentus.com pure-Java SQLite JDBC driver and for most purposes this run-anywhere version is fine. But, if you really need to take advantage of SQLite’s speed then connecting using the native JNI version is a must. Doing this was easy enough, just change over to using a generic JDBC [...]
Groovy as Talend’s scripting language
Posted in ETL, Groovy, Java, Palo, SQLite, Talend, data, tagged Jetty, SQLite user defined functions on July 20, 2008 | 6 Comments »
Although I had decided to use Talend (Java version) as my primary ETL tool I still had one major problem with it, its lack of a scripting tool. Kettle (Pentaho PDI) has Javascript, Excel has VBA, Picalo has (well OK, is) Python and Talend in its Perl version has Perl. I could have gone (and [...]
Regular Expressions as an end-user programming tool?
Posted in ETL, Talend, excel, kettle, tagged regex, regular expressions on July 1, 2008 | 2 Comments »
“What? Have you completely lost the plot, Gleeson?”, I hear you scream. Jamie Zawinski’s famous quote is intoned once more ..
Some people, when confronted with a problem, think
“I know, I’ll use regular expressions.” Now they have two problems.
Of course the above quote could be (and probably has been) changed to…
Most business people, when confronted with [...]
What to do when Talend gets its knickers in a twist?
Posted in ETL, Talend, tagged .item, .JETEmitters, Java on June 30, 2008 | 2 Comments »
If you’ve done any significant amount of work with Talend you’ll undoubtedly have experienced situations where either the generated code/JETEmitters or the GUI representation of a job becomes unstable like so…
The usual advice is to back up your projects (workspace/projectName), delete the workspace/.Java (or .Perl) and workspace/.JETEmitters folders and restart Talend to force a [...]
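The back-up-and-delete part of that advice could be scripted along these lines. This is only a sketch under stated assumptions: the function name, its parameters and the backup location are hypothetical, it removes whichever of .Java, .Perl and .JETEmitters exist, and it should only be run while Talend is closed.

```python
import shutil
from pathlib import Path


def reset_talend_caches(workspace, project_name, backup_dir):
    """Back up one project, then remove the generated-code folders.

    Hypothetical helper: the names and layout follow the folder names
    mentioned in the post, nothing here is part of Talend itself.
    """
    workspace = Path(workspace)

    # 1. Back up workspace/projectName before touching anything.
    #    copytree raises an error if the backup target already exists.
    shutil.copytree(workspace / project_name, Path(backup_dir) / project_name)

    # 2. Delete the generated-code folders that are present.
    removed = []
    for name in (".Java", ".Perl", ".JETEmitters"):
        folder = workspace / name
        if folder.is_dir():
            shutil.rmtree(folder)
            removed.append(name)
    return removed
```

The function returns the names of the folders it actually removed, so the caller can confirm what happened before restarting Talend.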