<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for MetaOptimize</title>
	<atom:link href="http://blog.metaoptimize.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.metaoptimize.com</link>
	<description>building machine learning and natural language processing tools</description>
	<lastBuildDate>Sun, 28 Feb 2010 23:20:52 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on Constitution for Governance of Open-Source Projects (v20100227) by Evgeny</title>
		<link>http://blog.metaoptimize.com/2010/02/27/constitution-for-governance-of-open-source-projects-v20100227/comment-page-1/#comment-360</link>
		<dc:creator>Evgeny</dc:creator>
		<pubDate>Sun, 28 Feb 2010 23:20:52 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=89#comment-360</guid>
		<description>Makes sense. Also - I like that it is concise and sticks to just basic principles.

Should it address the potential side-effects of &quot;undo-ocracy&quot; and &quot;redo-ocracy&quot;?

-Evgeny.</description>
		<content:encoded><![CDATA[<p>Makes sense. Also &#8211; I like that it is concise and sticks to just basic principles.</p>
<p>Should it address the potential side-effects of &#8220;undo-ocracy&#8221; and &#8220;redo-ocracy&#8221;?</p>
<p>-Evgeny.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Constitution for Governance of Open-Source Projects (v20100227) by AppRacer &#8211; Boost development of your PHP Projects to 10+ times &#124; Startup Websites</title>
		<link>http://blog.metaoptimize.com/2010/02/27/constitution-for-governance-of-open-source-projects-v20100227/comment-page-1/#comment-353</link>
		<dc:creator>AppRacer &#8211; Boost development of your PHP Projects to 10+ times &#124; Startup Websites</dc:creator>
		<pubDate>Sun, 28 Feb 2010 02:46:32 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=89#comment-353</guid>
		<description>[...] MetaOptimize » Constitution for Governance of Open-Source Projects &#8230; [...]</description>
		<content:encoded><![CDATA[<p>[...] MetaOptimize » Constitution for Governance of Open-Source Projects &#8230; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Why can&#8217;t you pickle generators in Python? A pattern for saving training state by Why you cannot pickle generator</title>
		<link>http://blog.metaoptimize.com/2009/12/22/why-cant-you-pickle-generators-in-python-workaround-pattern-for-saving-training-state/comment-page-1/#comment-158</link>
		<dc:creator>Why you cannot pickle generator</dc:creator>
		<pubDate>Tue, 29 Dec 2009 23:07:53 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=72#comment-158</guid>
		<description>[...] Turian wrote a post about pickling generators on his blog. In his post, he says: However, generators become problematic when you want to persist [...]</description>
		<content:encoded><![CDATA[<p>[...] Turian wrote a post about pickling generators on his blog. In his post, he says: However, generators become problematic when you want to persist [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Why can&#8217;t you pickle generators in Python? A pattern for saving training state by Joseph Turian</title>
		<link>http://blog.metaoptimize.com/2009/12/22/why-cant-you-pickle-generators-in-python-workaround-pattern-for-saving-training-state/comment-page-1/#comment-145</link>
		<dc:creator>Joseph Turian</dc:creator>
		<pubDate>Wed, 23 Dec 2009 00:10:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=72#comment-145</guid>
		<description>Response to criticism on &lt;a href=&quot;http://news.ycombinator.com/item?id=1010533&quot; rel=&quot;nofollow&quot;&gt;hackernews&lt;/a&gt;:

&lt;i&gt;(1) You can only pickle generators that generate the same sequence every time they are restarted.&lt;/i&gt;

I don&#039;t know how you can persist state if you do not make this assumption.

&lt;i&gt;(2) All the work the generator did prior to pickling must be performed again on unpickling.&lt;/i&gt;

Something faster would be to use file.tell() to get the state and file.seek() to set the state. Since the &quot;unpickling&quot; is not a bottleneck, I didn&#039;t optimize this.</description>
		<content:encoded><![CDATA[<p>Response to criticism on <a href="http://news.ycombinator.com/item?id=1010533" rel="nofollow">hackernews</a>:</p>
<p><i>(1) You can only pickle generators that generate the same sequence every time they are restarted.</i></p>
<p>I don&#8217;t know how you can persist state if you do not make this assumption.</p>
<p><i>(2) All the work the generator did prior to pickling must be performed again on unpickling.</i></p>
<p>Something faster would be to use file.tell() to get the state and file.seek() to set the state. Since the &#8220;unpickling&#8221; is not a bottleneck, I didn&#8217;t optimize this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Why can&#8217;t you pickle generators in Python? A pattern for saving training state by Richard Tew</title>
		<link>http://blog.metaoptimize.com/2009/12/22/why-cant-you-pickle-generators-in-python-workaround-pattern-for-saving-training-state/comment-page-1/#comment-142</link>
		<dc:creator>Richard Tew</dc:creator>
		<pubDate>Tue, 22 Dec 2009 22:19:46 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=72#comment-142</guid>
		<description>You can do this with Stackless, of course.

http://www.disinterest.org/resource/stackless/2.6.4-docs-html/library/stackless/pickling.html</description>
		<content:encoded><![CDATA[<p>You can do this with Stackless, of course.</p>
<p><a href="http://www.disinterest.org/resource/stackless/2.6.4-docs-html/library/stackless/pickling.html" rel="nofollow">http://www.disinterest.org/resource/stackless/2.6.4-docs-html/library/stackless/pickling.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Fast deserialization in Python by Nir</title>
		<link>http://blog.metaoptimize.com/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-36</link>
		<dc:creator>Nir</dc:creator>
		<pubDate>Tue, 10 Nov 2009 08:41:47 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-36</guid>
		<description>It seems that Bob Ippolito fixed simplejson&#039;s slowness.
Retry with the latest version.</description>
		<content:encoded><![CDATA[<p>It seems that Bob Ippolito fixed simplejson&#8217;s slowness.<br />
Retry with the latest version.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Use flag --xml when you run mysqldump by Michael E Driscoll</title>
		<link>http://blog.metaoptimize.com/2009/10/14/use-flag-xml-when-you-run-mysqldump/comment-page-1/#comment-16</link>
		<dc:creator>Michael E Driscoll</dc:creator>
		<pubDate>Thu, 15 Oct 2009 01:47:36 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=60#comment-16</guid>
		<description>XML has its place (somewhere), but in this programmer&#039;s humble opinion, exporting tabular data is not it.


http://www.dataspora.com/blog/xml-and-big-data/</description>
		<content:encoded><![CDATA[<p>XML has its place (somewhere), but in this programmer&#8217;s humble opinion, exporting tabular data is not it.</p>
<p><a href="http://www.dataspora.com/blog/xml-and-big-data/" rel="nofollow">http://www.dataspora.com/blog/xml-and-big-data/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Use flag --xml when you run mysqldump by Joseph Turian</title>
		<link>http://blog.metaoptimize.com/2009/10/14/use-flag-xml-when-you-run-mysqldump/comment-page-1/#comment-15</link>
		<dc:creator>Joseph Turian</dc:creator>
		<pubDate>Wed, 14 Oct 2009 23:17:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=60#comment-15</guid>
		<description>In response to &lt;a href=&quot;http://groups.google.com/group/get-theinfo/browse_thread/thread/99c8a28bcaca685d&quot; rel=&quot;nofollow&quot;&gt;Joshua Reich&lt;/a&gt;:

Let me answer your last question first:

&gt; 4. Why aren&#039;t you using postgres ?

I was getting data from someone that uses MySQL.
Knowing what I know now, I believe I should have advised him to use the --xml flag.

&gt; 1. I am friends with awk &amp; pals, and stripping out INSERT .. VALUE from
&gt; mysql dumps that I get from people is no biggy

I am friends with perl, and you cannot simply split using /,/ to get your fields. The comma might be right in the middle of a string.

With XML, though, it is simple to grep for &amp;lt;row&amp;gt;, because you know 100% that &amp;lt; will only be in the markup.

&gt; 2. I&#039;m pretty sure MySQL supports dumping table data as CSV (SELECT ...
&gt; OUTHOUSE &#039;/tmp/file.csv&#039; ...)

Same problems as above.

&gt; 3. For big data, XML is just silly big.

Why? It gzips easily.
Not being able to load it all into memory is less of an issue if it is easy to split the data using regular expressions.</description>
		<content:encoded><![CDATA[<p>In response to <a href="http://groups.google.com/group/get-theinfo/browse_thread/thread/99c8a28bcaca685d" rel="nofollow">Joshua Reich</a>:</p>
<p>Let me answer your last question first:</p>
<p>> 4. Why aren&#8217;t you using postgres ?</p>
<p>I was getting data from someone that uses MySQL.<br />
Knowing what I know now, I believe I should have advised him to use the --xml flag.</p>
<p>> 1. I am friends with awk &#038; pals, and stripping out INSERT .. VALUE from<br />
> mysql dumps that I get from people is no biggy</p>
<p>I am friends with perl, and you cannot simply split using /,/ to get your fields. The comma might be right in the middle of a string.</p>
<p>With XML, though, it is simple to grep for &lt;row&gt;, because you know 100% that &lt; will only be in the markup.</p>
<p>> 2. I&#8217;m pretty sure MySQL supports dumping table data as CSV (SELECT &#8230;<br />
> OUTHOUSE &#8216;/tmp/file.csv&#8217; &#8230;)</p>
<p>Same problems as above.</p>
<p>> 3. For big data, XML is just silly big.</p>
<p>Why? It gzips easily.<br />
Not being able to load it all into memory is less of an issue if it is easy to split the data using regular expressions.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Fast deserialization in Python by John Millikin</title>
		<link>http://blog.metaoptimize.com/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-9</link>
		<dc:creator>John Millikin</dc:creator>
		<pubDate>Tue, 24 Mar 2009 13:59:48 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-9</guid>
		<description>(reposting a comment from Hacker News, at Joseph Turian&#039;s request)&lt;br&gt;&lt;br&gt;I&#039;m the author of jsonlib, and I registered specifically to post this message. Please, please, please do not use cjson!&lt;br&gt;&lt;br&gt;First, it is unmaintained. The latest version available was posted on August 24, 2007. When you encounter one of its myriad bugs, you&#039;ll either have to patch it yourself or pick another JSON library. Just skip the intermediate step and use another library to begin with.&lt;br&gt;&lt;br&gt;Second, it is buggy. In some cases, parsing text it just generated will return a different value from what you passed in! It&#039;s almost entirely ignorant of Unicode, and what little it tries to parse it gets wrong.&lt;br&gt;&lt;br&gt;Third, it&#039;s exceedingly non-compliant. The text it parses and generates bears only a passing resemblance to JSON. There are varying degrees of conformance to the spec between libraries, based on personal preference of the authors -- I prefer strict conformance, others less strict -- but cjson is so different as to be simply unusable.&lt;br&gt;&lt;br&gt;Yes, it&#039;s fast. I know. I wrote jsonlib partly because I was unsatisfied with simplejson&#039;s performance, and one goal (never truly achieved) was always to surpass cjson. However, speed isn&#039;t everything. As the saying goes, &quot;if I want my math performed fast and wrong I&#039;ll ask my cat&quot;.&lt;br&gt;&lt;br&gt;In my opinion, the only Python JSON libraries worth considering are:&lt;br&gt;&lt;br&gt;* simplejson -- it&#039;s in the standard library, and should therefore be considered first and most thoroughly.&lt;br&gt;&lt;br&gt;* jsonlib -- it&#039;s fast, well-tested, and standards-compliant.&lt;br&gt;&lt;br&gt;* demjson -- has several options for reliable parsing of invalid input.&lt;br&gt;&lt;br&gt;Last time I checked, jsonlib and simplejson&#039;s C extensions are neck-and-neck performance-wise. 
In some quick, unscientific tests, jsonlib reads faster and simplejson writes faster. However, simplejson&#039;s extensions are only used for certain subsets of input -- if you want to use an uncommon feature, performance will degrade. jsonlib has an implementation in pure C, which avoids this problem at the cost of complexity.&lt;br&gt;&lt;br&gt;Apologies for the brain-dump, but even if you skip right over it, please remember: don&#039;t use cjson.</description>
		<content:encoded><![CDATA[<p>(reposting a comment from Hacker News, at Joseph Turian&#39;s request)</p>
<p>I&#39;m the author of jsonlib, and I registered specifically to post this message. Please, please, please do not use cjson!</p>
<p>First, it is unmaintained. The latest version available was posted on August 24, 2007. When you encounter one of its myriad bugs, you&#39;ll either have to patch it yourself or pick another JSON library. Just skip the intermediate step and use another library to begin with.</p>
<p>Second, it is buggy. In some cases, parsing text it just generated will return a different value from what you passed in! It&#39;s almost entirely ignorant of Unicode, and what little it tries to parse it gets wrong.</p>
<p>Third, it&#39;s exceedingly non-compliant. The text it parses and generates bears only a passing resemblance to JSON. There are varying degrees of conformance to the spec between libraries, based on personal preference of the authors &#8212; I prefer strict conformance, others less strict &#8212; but cjson is so different as to be simply unusable.</p>
<p>Yes, it&#39;s fast. I know. I wrote jsonlib partly because I was unsatisfied with simplejson&#39;s performance, and one goal (never truly achieved) was always to surpass cjson. However, speed isn&#39;t everything. As the saying goes, &#8220;if I want my math performed fast and wrong I&#39;ll ask my cat&#8221;.</p>
<p>In my opinion, the only Python JSON libraries worth considering are:</p>
<p>* simplejson &#8212; it&#39;s in the standard library, and should therefore be considered first and most thoroughly.</p>
<p>* jsonlib &#8212; it&#39;s fast, well-tested, and standards-compliant.</p>
<p>* demjson &#8212; has several options for reliable parsing of invalid input.</p>
<p>Last time I checked, jsonlib and simplejson&#39;s C extensions are neck-and-neck performance-wise. In some quick, unscientific tests, jsonlib reads faster and simplejson writes faster. However, simplejson&#39;s extensions are only used for certain subsets of input &#8212; if you want to use an uncommon feature, performance will degrade. jsonlib has an implementation in pure C, which avoids this problem at the cost of complexity.</p>
<p>Apologies for the brain-dump, but even if you skip right over it, please remember: don&#39;t use cjson.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Fast deserialization in Python by Joseph Turian</title>
		<link>http://blog.metaoptimize.com/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-8</link>
		<dc:creator>Joseph Turian</dc:creator>
		<pubDate>Mon, 23 Mar 2009 16:50:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-8</guid>
		<description>I am excited for a faster protobuf. In particular, haberman&#039;s &lt;a href=&quot;http://github.com/haberman/pbstream/tree/master&quot; rel=&quot;nofollow&quot;&gt;C extensions&lt;/a&gt; look promising.&lt;br&gt;&lt;br&gt;Compactness is very important for transferring data over a network.&lt;br&gt;However, during the development cycle, human readability is important and often overlooked. If all you need to do to read your data is type &#039;zcat&#039;, you are much more likely to be looking at your data, and hence more likely to catch bugs.</description>
		<content:encoded><![CDATA[<p>I am excited for a faster protobuf. In particular, haberman&#39;s <a href="http://github.com/haberman/pbstream/tree/master" rel="nofollow">C extensions</a> look promising.</p>
<p>Compactness is very important for transferring data over a network.<br />However, during the development cycle, human readability is important and often overlooked. If all you need to do to read your data is type &#39;zcat&#39;, you are much more likely to be looking at your data, and hence more likely to catch bugs.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
