<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Open Source Metrics and Benchmarks</title>
	<atom:link href="http://blog.gobansaor.com/2008/10/30/open-source-metrics/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.gobansaor.com/2008/10/30/open-source-metrics/</link>
	<description>A country datasmith.</description>
	<lastBuildDate>Tue, 02 Mar 2010 17:49:09 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Pentaho Data Integration (Kettle) V Talend Benchmark &#171; Gobán Saor</title>
		<link>http://blog.gobansaor.com/2008/10/30/open-source-metrics/#comment-4811</link>
		<dc:creator>Pentaho Data Integration (Kettle) V Talend Benchmark &#171; Gobán Saor</dc:creator>
		<pubDate>Thu, 04 Dec 2008 17:57:23 +0000</pubDate>
		<guid isPermaLink="false">http://gobansaor.wordpress.com/?p=557#comment-4811</guid>
		<description>[...] perform their own benchmarks where possible as requirements differ.  Nevertheless, unlike most other benchmarks we&#8217;ve seen on the subject he publishes not just the results but the actual transformation &#8220;code&#8221; used in the [...]</description>
		<content:encoded><![CDATA[<p>[...] perform their own benchmarks where possible as requirements differ.  Nevertheless, unlike most other benchmarks we&#8217;ve seen on the subject he publishes not just the results but the actual transformation &#8220;code&#8221; used in the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel McCaffrey</title>
		<link>http://blog.gobansaor.com/2008/10/30/open-source-metrics/#comment-4810</link>
		<dc:creator>Daniel McCaffrey</dc:creator>
		<pubDate>Thu, 04 Dec 2008 03:48:07 +0000</pubDate>
		<guid isPermaLink="false">http://gobansaor.wordpress.com/?p=557#comment-4810</guid>
		<description>Nick, I read my comment, and it made me laugh. :-o   Ok, I was expecting a little much, and it would be *really* good to have that kind of comparison around, but it has to be fair and thorough enough to be repeatable by others.     This one just seems to be missing too much in terms of how it was run.</description>
		<content:encoded><![CDATA[<p>Nick, I read my comment, and it made me laugh. <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_surprised.gif' alt=':-o' class='wp-smiley' />    Ok, I was expecting a little much, and it would be *really* good to have that kind of comparison around, but it has to be fair and thorough enough to be repeatable by others.     This one just seems to be missing too much in terms of how it was run.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nicholas Goodman</title>
		<link>http://blog.gobansaor.com/2008/10/30/open-source-metrics/#comment-4809</link>
		<dc:creator>Nicholas Goodman</dc:creator>
		<pubDate>Tue, 02 Dec 2008 16:46:58 +0000</pubDate>
		<guid isPermaLink="false">http://gobansaor.wordpress.com/?p=557#comment-4809</guid>
		<description>Dan - spoken like a guy who spent years working at a company that knows a thing or two about experiments?  :)</description>
		<content:encoded><![CDATA[<p>Dan &#8211; spoken like a guy who spent years working at a company that knows a thing or two about experiments?  <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel McCaffrey</title>
		<link>http://blog.gobansaor.com/2008/10/30/open-source-metrics/#comment-4807</link>
		<dc:creator>Daniel McCaffrey</dc:creator>
		<pubDate>Sun, 30 Nov 2008 20:58:31 +0000</pubDate>
		<guid isPermaLink="false">http://gobansaor.wordpress.com/?p=557#comment-4807</guid>
		<description>I appreciate the effort, but the referenced article isn&#039;t convincing for me.  If he truly wants to be scientific about it, he needs to provide the actual files used, what the program settings were, etc. so it could be replicated.   Otherwise it&#039;s a bit like cold fusion.   

The wording used in statements like the one below suggests a bias to me as well.  This and the lack of experimental detail leave me waiting for a study that validates the results here before I&#039;d accept any of it as fact.  There are too many missing variables.
&quot;Our main reason for this assessment of Pentaho is
mostly linked to the many parameters that need to be learnt. However, we think that if you *invest lots of time in it*, it could become an powerful tool.&quot;</description>
		<content:encoded><![CDATA[<p>I appreciate the effort, but the referenced article isn&#8217;t convincing for me.  If he truly wants to be scientific about it, he needs to provide the actual files used, what the program settings were, etc. so it could be replicated.   Otherwise it&#8217;s a bit like cold fusion.   </p>
<p>The wording used in statements like the one below suggests a bias to me as well.  This and the lack of experimental detail leave me waiting for a study that validates the results here before I&#8217;d accept any of it as fact.  There are too many missing variables.<br />
&#8220;Our main reason for this assessment of Pentaho is<br />
mostly linked to the many parameters that need to be learnt. However, we think that if you *invest lots of time in it*, it could become an powerful tool.&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nicholas Goodman on Business Intelligence &#187; Blog Archive &#187; An arms race my customers don&#8217;t care about</title>
		<link>http://blog.gobansaor.com/2008/10/30/open-source-metrics/#comment-4804</link>
		<dc:creator>Nicholas Goodman on Business Intelligence &#187; Blog Archive &#187; An arms race my customers don&#8217;t care about</dc:creator>
		<pubDate>Thu, 27 Nov 2008 00:14:44 +0000</pubDate>
		<guid isPermaLink="false">http://gobansaor.wordpress.com/?p=557#comment-4804</guid>
		<description>[...] Recently, I observed a thread at the blog of Goban Saor entitled &#8220;Open Source Metrics.&#8221; [...]</description>
		<content:encoded><![CDATA[<p>[...] Recently, I observed a thread at the blog of Goban Saor entitled &#8220;Open Source Metrics.&#8221; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fabrice</title>
		<link>http://blog.gobansaor.com/2008/10/30/open-source-metrics/#comment-4789</link>
		<dc:creator>Fabrice</dc:creator>
		<pubDate>Fri, 14 Nov 2008 14:49:03 +0000</pubDate>
		<guid isPermaLink="false">http://gobansaor.wordpress.com/?p=557#comment-4789</guid>
		<description>Talend experts? You&#039;re kidding!
I think the guys who wrote this bench don&#039;t know much more about Talend than you do Matt!

Some examples: they didn&#039;t use v3.0 (milestone versions of v3.0 were available since July
and we made some huge enhancements to a lot of components: tFileInput, tAggregate, t…). In numerous jobs, the basic approach is used; it would be much better (efficiency-wise) to leverage dedicated components (tJoin, tFilterColumns…), but I think these observations also hold for other tools.
No really, Talend experts are not working like that.

Fabrice &#124; talend</description>
		<content:encoded><![CDATA[<p>Talend experts? You&#8217;re kidding!<br />
I think the guys who wrote this bench don&#8217;t know much more about Talend than you do Matt!</p>
<p>Some examples: they didn&#8217;t use v3.0 (milestone versions of v3.0 were available since July<br />
and we made some huge enhancements to a lot of components: tFileInput, tAggregate, t…). In numerous jobs, the basic approach is used; it would be much better (efficiency-wise) to leverage dedicated components (tJoin, tFilterColumns…), but I think these observations also hold for other tools.<br />
No really, Talend experts are not working like that.</p>
<p>Fabrice | talend</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt Casters</title>
		<link>http://blog.gobansaor.com/2008/10/30/open-source-metrics/#comment-4785</link>
		<dc:creator>Matt Casters</dc:creator>
		<pubDate>Tue, 11 Nov 2008 13:13:34 +0000</pubDate>
		<guid isPermaLink="false">http://gobansaor.wordpress.com/?p=557#comment-4785</guid>
		<description>&quot;Talend experts write benchmark that says Talend is faster&quot;. 

I don&#039;t salute their initiative at all.  It&#039;s misinformation at best, slander at worst.</description>
		<content:encoded><![CDATA[<p>&#8220;Talend experts write benchmark that says Talend is faster&#8221;. </p>
<p>I don&#8217;t salute their initiative at all.  It&#8217;s misinformation at best, slander at worst.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: I.Kato</title>
		<link>http://blog.gobansaor.com/2008/10/30/open-source-metrics/#comment-4783</link>
		<dc:creator>I.Kato</dc:creator>
		<pubDate>Mon, 10 Nov 2008 17:15:06 +0000</pubDate>
		<guid isPermaLink="false">http://gobansaor.wordpress.com/?p=557#comment-4783</guid>
		<description>Thanks Tom for the link.

We all know that:
- a benchmark is a benchmark and remains subjective (you always know a tool more than another one…).
- benchmarks are never pleasing enough for everyone.

This being said, even if certain results need to be looked at closer (especially Informatica’s results in my opinion), this is a really good job from the Manapps team. It takes a lot of time and skill to produce this kind of paper and I salute their initiative.

Great job!</description>
		<content:encoded><![CDATA[<p>Thanks Tom for the link.</p>
<p>We all know that:<br />
- a benchmark is a benchmark and remains subjective (you always know a tool more than another one…).<br />
- benchmarks are never pleasing enough for everyone.</p>
<p>This being said, even if certain results need to be looked at closer (especially Informatica’s results in my opinion), this is a really good job from the Manapps team. It takes a lot of time and skill to produce this kind of paper and I salute their initiative.</p>
<p>Great job!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sean</title>
		<link>http://blog.gobansaor.com/2008/10/30/open-source-metrics/#comment-4778</link>
		<dc:creator>Sean</dc:creator>
		<pubDate>Fri, 31 Oct 2008 17:28:21 +0000</pubDate>
		<guid isPermaLink="false">http://gobansaor.wordpress.com/?p=557#comment-4778</guid>
		<description>This is a nice find. Thanks Tom. 

I have not used Kettle or Data Stage at all. But I am very familiar with Informatica. I could see that the developer seemed a bit unsure with INFA, as in a number of places he/she could have used a ROUTER rather than a FILTER and saved some time. Also it is a good practice to use a SORTER before an AGGREGATOR as the sorter transform is considerably faster than the aggregator in INFA. Another thing in Informatica&#039;s defense is that this is being done on an XP laptop/desktop. INFA probably is significantly faster when deployed on an AIX type machine with large memory. And I also agree that the best way to aggregate is in the DB if possible. Even with INFA, they have taken the output to an aggregator (again without a sorter) rather than doing it in the Source Qualifier via SQL query. 

As far as bias is concerned, they did show Datastage PX to be as fast or faster than Talend. So this might be inexperience with the other tools rather than deliberate bias. 

I feel that one is going to use the tool that one is most comfortable in (for me Informatica and Talend at this point) unless you just cannot achieve something at all with that tool.</description>
		<content:encoded><![CDATA[<p>This is a nice find. Thanks Tom. </p>
<p>I have not used Kettle or Data Stage at all. But I am very familiar with Informatica. I could see that the developer seemed a bit unsure with INFA, as in a number of places he/she could have used a ROUTER rather than a FILTER and saved some time. Also it is a good practice to use a SORTER before an AGGREGATOR as the sorter transform is considerably faster than the aggregator in INFA. Another thing in Informatica&#8217;s defense is that this is being done on an XP laptop/desktop. INFA probably is significantly faster when deployed on an AIX type machine with large memory. And I also agree that the best way to aggregate is in the DB if possible. Even with INFA, they have taken the output to an aggregator (again without a sorter) rather than doing it in the Source Qualifier via SQL query. </p>
<p>As far as bias is concerned, they did show Datastage PX to be as fast or faster than Talend. So this might be inexperience with the other tools rather than deliberate bias. </p>
<p>I feel that one is going to use the tool that one is most comfortable in (for me Informatica and Talend at this point) unless you just cannot achieve something at all with that tool.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom Gleeson</title>
		<link>http://blog.gobansaor.com/2008/10/30/open-source-metrics/#comment-4777</link>
		<dc:creator>Tom Gleeson</dc:creator>
		<pubDate>Thu, 30 Oct 2008 16:23:13 +0000</pubDate>
		<guid isPermaLink="false">http://gobansaor.wordpress.com/?p=557#comment-4777</guid>
		<description>Matt,

You&#039;re right as regards SQL; I can&#039;t understand this aversion to using it.  In fact, even when I&#039;m mainly dealing with text or Excel data I load the data into a SQLite database (in memory preferably) and let it do the heavy lifting; quicker and less error-prone than using tool-based transformations.

Anyway this speed thing is a red herring, people will use the tool (and programming language) that they &quot;want&quot; to use (even to the point of paying big bucks when there&#039;s an OSS solution that could equally do the job!).

If there are speed problems these days it&#039;s often easier to throw hardware (or another EC2 instance) at the problem.

Tom</description>
		<content:encoded><![CDATA[<p>Matt,</p>
<p>You&#8217;re right as regards SQL; I can&#8217;t understand this aversion to using it.  In fact, even when I&#8217;m mainly dealing with text or Excel data I load the data into a SQLite database (in memory preferably) and let it do the heavy lifting; quicker and less error-prone than using tool-based transformations.</p>
<p>Anyway this speed thing is a red herring, people will use the tool (and programming language) that they &#8220;want&#8221; to use (even to the point of paying big bucks when there&#8217;s an OSS solution that could equally do the job!).</p>
<p>If there are speed problems these days it&#8217;s often easier to throw hardware (or another EC2 instance) at the problem.</p>
<p>Tom</p>
]]></content:encoded>
	</item>
</channel>
</rss>
