<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: How to convert html entities to &#8220;real&#8221; unicode in Python</title>
	<atom:link href="http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/feed/" rel="self" type="application/rss+xml" />
	<link>http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/</link>
	<description>A blog about SecondLife, webservices, and whatever else seems to fit.</description>
	<lastBuildDate>Fri, 16 Oct 2009 01:42:45 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: LlucPot</title>
		<link>http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-866</link>
		<dc:creator>LlucPot</dc:creator>
		<pubDate>Fri, 02 Oct 2009 11:03:29 +0000</pubDate>
		<guid isPermaLink="false">http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-866</guid>
		<description>import re
from BeautifulSoup import BeautifulStoneSoup

def HTMLtoUni(entity):
    ## Python 2 / BeautifulSoup 3; returns None unless entity is a single HTML entity
    if not isinstance(entity, basestring):
        ## entity is not a string
        return
    if not re.match(u&#039;&amp;[#a-z0-9]+;$&#039;, entity, re.IGNORECASE):
        ## entity is not an HTML entity
        return
    if re.match(u&#039;&amp;#x[a-f0-9]+;$&#039;, entity, re.IGNORECASE):
        ## convert hex HTML entity to decimal HTML entity
        entity = &#039;&amp;#%i;&#039; % int(entity[3:-1], 16)
    return unicode(BeautifulStoneSoup(entity, convertEntities=BeautifulStoneSoup.HTML_ENTITIES))</description>
		<content:encoded><![CDATA[<p>import re<br />
from BeautifulSoup import BeautifulStoneSoup</p>
<p>def HTMLtoUni(entity):<br />
    ## Python 2 / BeautifulSoup 3; returns None unless entity is a single HTML entity<br />
    if not isinstance(entity, basestring):<br />
        ## entity is not a string<br />
        return<br />
    if not re.match(u&#039;&amp;[#a-z0-9]+;$&#039;, entity, re.IGNORECASE):<br />
        ## entity is not an HTML entity<br />
        return<br />
    if re.match(u&#039;&amp;#x[a-f0-9]+;$&#039;, entity, re.IGNORECASE):<br />
        ## convert hex HTML entity to decimal HTML entity<br />
        entity = &#039;&amp;#%i;&#039; % int(entity[3:-1], 16)<br />
    return unicode(BeautifulStoneSoup(entity, convertEntities=BeautifulStoneSoup.HTML_ENTITIES))</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: LlucPot</title>
		<link>http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-865</link>
		<dc:creator>LlucPot</dc:creator>
		<pubDate>Fri, 02 Oct 2009 10:51:27 +0000</pubDate>
		<guid isPermaLink="false">http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-865</guid>
		<description>Solved: convert the hex entity to a decimal entity first.

## IF you have encodedStringHex (e.g. &#039;&amp;#xf2;&#039;)
## DO (in Python):
encodedStringDecimal = &#039;&amp;#%i;&#039; % int(encodedStringHex[3:-1], 16)
## THEN the solution posted above will work</description>
		<content:encoded><![CDATA[<p>Solved: convert the hex entity to a decimal entity first.</p>
<p>## IF you have encodedStringHex (e.g. &#039;&amp;#xf2;&#039;)<br />
## DO (in Python):<br />
encodedStringDecimal = &#039;&amp;#%i;&#039; % int(encodedStringHex[3:-1], 16)<br />
## THEN the solution posted above will work</p>
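<p>A minimal sketch of the same hex-to-decimal rewrite, applied to every hex entity in a string rather than to a single entity. It assumes Python 3 instead of the Python 2 used in this thread, and the names HEX_ENTITY and hex_entities_to_decimal are hypothetical, not part of any library.</p>

```python
import re

# Matches hex character references such as &#xf2; or &#XF2;
HEX_ENTITY = re.compile(r"&#x([0-9a-f]+);", re.IGNORECASE)


def hex_entities_to_decimal(text):
    """Rewrite every hex character reference in text as a decimal one."""
    # int(digits, 16) parses the hex digits; %d writes the value in base 10.
    return HEX_ENTITY.sub(lambda m: "&#%d;" % int(m.group(1), 16), text)


print(hex_entities_to_decimal("caf&#xe9; and &#xf2;"))
```

<p>Text without hex entities passes through unchanged, so the function can be run on a whole document before handing it to a parser that only understands decimal references.</p>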
]]></content:encoded>
	</item>
	<item>
		<title>By: LlucPot</title>
		<link>http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-864</link>
		<dc:creator>LlucPot</dc:creator>
		<pubDate>Fri, 02 Oct 2009 10:27:10 +0000</pubDate>
		<guid isPermaLink="false">http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-864</guid>
		<description>Though it&#039;s a helpful piece of advice, I&#039;m in a trickier situation.

I have to convert HTML entities in hex to Unicode, and BeautifulSoup complains with &#039;invalid literal for int() with base 10&#039;. The entities I face have the form {&amp;#x~hex number here~;} (e.g. &#039;&amp;#xf2;&#039; for &#039;&amp;#242;&#039; / &#039;&#242;&#039;). Does anyone know something about this?</description>
		<content:encoded><![CDATA[<p>Though it&#8217;s a helpful piece of advice, I&#8217;m in a trickier situation.</p>
<p>I have to convert HTML entities in hex to Unicode, and BeautifulSoup complains with &#8216;invalid literal for int() with base 10&#8217;. The entities I face have the form {&amp;#x~hex number here~;} (e.g. &#8216;&amp;#xf2;&#8217; for &#8216;&amp;#242;&#8217; / &#8216;&ograve;&#8217;). Does anyone know something about this?</p>
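<p>For readers on Python 3.4 or later (an assumption, since this thread predates it), the standard library function html.unescape decodes hex, decimal, and named references alike, so no separate conversion step should be needed. A minimal sketch:</p>

```python
from html import unescape

# Hex, decimal, and named forms of the same character (U+00F2).
samples = ["&#xf2;", "&#242;", "&ograve;"]

decoded = [unescape(s) for s in samples]

# All three forms should decode to the same single character.
print(decoded)
print(len(set(decoded)) == 1)
```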
]]></content:encoded>
	</item>
	<item>
		<title>By: javaJake</title>
		<link>http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-863</link>
		<dc:creator>javaJake</dc:creator>
		<pubDate>Mon, 14 Sep 2009 21:46:46 +0000</pubDate>
		<guid isPermaLink="false">http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-863</guid>
		<description>Thanks ciemaar for this blog post!</description>
		<content:encoded><![CDATA[<p>Thanks ciemaar for this blog post!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonas Byström</title>
		<link>http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-842</link>
		<dc:creator>Jonas Byström</dc:creator>
		<pubDate>Tue, 31 Mar 2009 09:46:03 +0000</pubDate>
		<guid isPermaLink="false">http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-842</guid>
		<description>Thanks a million! Yeah, amazing they slipped on this one.</description>
		<content:encoded><![CDATA[<p>Thanks a million! Yeah, amazing they slipped on this one.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian Young</title>
		<link>http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-838</link>
		<dc:creator>Ian Young</dc:creator>
		<pubDate>Sat, 22 Nov 2008 03:26:28 +0000</pubDate>
		<guid isPermaLink="false">http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-838</guid>
		<description>&quot;Beautiful Soup uses a class called UnicodeDammit to detect the encodings of documents you give it and convert them to Unicode, no matter what. If you need to do this for other documents (without using Beautiful Soup to parse them), you can use UnicodeDammit by itself.&quot;

http://www.crummy.com/software/BeautifulSoup/documentation.html#Beautiful%20Soup%20Gives%20You%20Unicode,%20Dammit

I haven&#039;t tried using the class by itself, but it sounds like that could be the solution to the not-needing-the-whole-library woes.</description>
		<content:encoded><![CDATA[<p>&#8220;Beautiful Soup uses a class called UnicodeDammit to detect the encodings of documents you give it and convert them to Unicode, no matter what. If you need to do this for other documents (without using Beautiful Soup to parse them), you can use UnicodeDammit by itself.&#8221;</p>
<p><a href="http://www.crummy.com/software/BeautifulSoup/documentation.html#Beautiful%20Soup%20Gives%20You%20Unicode,%20Dammit" rel="nofollow">http://www.crummy.com/software/BeautifulSoup/documentation.html#Beautiful%20Soup%20Gives%20You%20Unicode,%20Dammit</a></p>
<p>I haven&#8217;t tried using the class by itself, but it sounds like that could be the solution to the not-needing-the-whole-library woes.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: M</title>
		<link>http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-837</link>
		<dc:creator>M</dc:creator>
		<pubDate>Tue, 04 Nov 2008 17:35:28 +0000</pubDate>
		<guid isPermaLink="false">http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-837</guid>
		<description>I did what rabio suggested and it works perfectly! Thanks a lot!</description>
		<content:encoded><![CDATA[<p>I did what rabio suggested and it works perfectly! Thanks a lot!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tørbjorn</title>
		<link>http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-836</link>
		<dc:creator>Tørbjorn</dc:creator>
		<pubDate>Tue, 14 Oct 2008 08:15:29 +0000</pubDate>
		<guid isPermaLink="false">http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-836</guid>
		<description>I just want to join in on the praise for the article; it really helped me out too.
BeautifulSoup is a hell of a library, and it has saved me more times than I want to remember.

On the other hand, why are encoding and decoding HTML / XML entities so &quot;hard&quot; in Python?</description>
		<content:encoded><![CDATA[<p>I just want to join in on the praise for the article; it really helped me out too.<br />
BeautifulSoup is a hell of a library, and it has saved me more times than I want to remember.</p>
<p>On the other hand, why are encoding and decoding HTML / XML entities so &#8220;hard&#8221; in Python?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: akahn</title>
		<link>http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-835</link>
		<dc:creator>akahn</dc:creator>
		<pubDate>Mon, 06 Oct 2008 21:57:12 +0000</pubDate>
		<guid isPermaLink="false">http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-835</guid>
		<description>Looks like Beautiful Soup is a solid library, but it seems like overkill for my usage, unfortunately. In my sub-300 line script, I&#039;m trying to decode 160 character strings, 20 at a time, so using this whole library seems wrong... maybe I&#039;ll do something like rabio suggests?</description>
		<content:encoded><![CDATA[<p>Looks like Beautiful Soup is a solid library, but it seems like overkill for my usage, unfortunately. In my sub-300 line script, I&#8217;m trying to decode 160 character strings, 20 at a time, so using this whole library seems wrong&#8230; maybe I&#8217;ll do something like rabio suggests?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: rabio</title>
		<link>http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-830</link>
		<dc:creator>rabio</dc:creator>
		<pubDate>Wed, 30 Jul 2008 16:56:45 +0000</pubDate>
		<guid isPermaLink="false">http://channel3b.wordpress.com/2007/07/04/how-to-convert-html-entities-to-real-unicode-in-python/#comment-830</guid>
		<description>Another great solution, one that does not even require any external modules, can
be found at: http://effbot.org/zone/re-sub.htm#unescape-html</description>
		<content:encoded><![CDATA[<p>Another great solution, one that does not even require any external modules, can<br />
be found at: <a href="http://effbot.org/zone/re-sub.htm#unescape-html" rel="nofollow">http://effbot.org/zone/re-sub.htm#unescape-html</a></p>
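<p>A rough sketch of how a standard-library-only approach can look: a regular expression finds each reference and a callback replaces it. This is a Python 3 illustration written for this thread, not a copy of the linked code; the names unescape_entities and fixup are hypothetical.</p>

```python
import re
from html.entities import name2codepoint


def unescape_entities(text):
    """Replace HTML character references and named entities with characters."""

    def fixup(match):
        ref = match.group(0)
        try:
            if ref[:3].lower() == "&#x":
                return chr(int(ref[3:-1], 16))  # hex reference, e.g. &#xf2;
            if ref[:2] == "&#":
                return chr(int(ref[2:-1]))  # decimal reference, e.g. &#242;
            return chr(name2codepoint[ref[1:-1]])  # named entity, e.g. &ograve;
        except (ValueError, KeyError, OverflowError):
            return ref  # unknown or malformed reference: leave it as is

    return re.sub(r"&#?\w+;", fixup, text)


print(unescape_entities("&#xf2; &#242; &ograve; &bogus;"))
```

<p>Unknown names such as &amp;bogus; are left untouched rather than raising an error, which is usually the safer choice when cleaning up text from arbitrary pages.</p>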
]]></content:encoded>
	</item>
</channel>
</rss>
