<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: What can you do with texts that are in a digital format?</title>
	<atom:link href="http://digitalscholarship.wordpress.com/2008/05/14/what-can-you-do-with-texts-that-are-in-a-digital-format/feed/" rel="self" type="application/rss+xml" />
	<link>http://digitalscholarship.wordpress.com/2008/05/14/what-can-you-do-with-texts-that-are-in-a-digital-format/</link>
	<description>Exploring what digital scholarship is and how to do it in the context of the humanities</description>
	<lastBuildDate>Tue, 24 Nov 2009 21:32:03 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: So you&#8217;ve digitized your text. Now what? &#171; (Digital) Humanities</title>
		<link>http://digitalscholarship.wordpress.com/2008/05/14/what-can-you-do-with-texts-that-are-in-a-digital-format/#comment-245</link>
		<dc:creator>So you&#8217;ve digitized your text. Now what? &#171; (Digital) Humanities</dc:creator>
		<pubDate>Tue, 12 Aug 2008 14:52:32 +0000</pubDate>
		<guid isPermaLink="false">http://digitalscholarship.wordpress.com/?p=39#comment-245</guid>
		<description>[...] Filed under Digital Humanities, issues and tagged: digitization   In a May posting to her blog, Digital Scholarship in the Humanities, Lisa Spiro addresses one of the most obvious questions to strike anyone who&#8217;s interested in [...]</description>
		<content:encoded><![CDATA[<p>[...] Filed under Digital Humanities, issues and tagged: digitization   In a May posting to her blog, Digital Scholarship in the Humanities, Lisa Spiro addresses one of the most obvious questions to strike anyone who&#8217;s interested in [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lms4w</title>
		<link>http://digitalscholarship.wordpress.com/2008/05/14/what-can-you-do-with-texts-that-are-in-a-digital-format/#comment-176</link>
		<dc:creator>lms4w</dc:creator>
		<pubDate>Mon, 19 May 2008 21:08:31 +0000</pubDate>
		<guid isPermaLink="false">http://digitalscholarship.wordpress.com/?p=39#comment-176</guid>
		<description>@JGE: Yes, ideally both the image and the text would be made available.</description>
		<content:encoded><![CDATA[<p>@JGE: Yes, ideally both the image and the text would be made available.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jge</title>
		<link>http://digitalscholarship.wordpress.com/2008/05/14/what-can-you-do-with-texts-that-are-in-a-digital-format/#comment-173</link>
		<dc:creator>jge</dc:creator>
		<pubDate>Mon, 19 May 2008 14:45:35 +0000</pubDate>
		<guid isPermaLink="false">http://digitalscholarship.wordpress.com/?p=39#comment-173</guid>
		<description>Text images have three advantages: 1. the OCR may be wrong, but the image is always right. 2. you can cite it just like the printed version. 3. printing is easy and gives you a very usable text, because the scanned version has already been optimised for print.
But of course 2. and 3. are also possible in XML text (with a lot more work).</description>
		<content:encoded><![CDATA[<p>Text images have three advantages: 1. the OCR may be wrong, but the image is always right. 2. you can cite it just like the printed version. 3. printing is easy and gives you a very usable text, because the scanned version has already been optimised for print.<br />
But of course 2. and 3. are also possible in XML text (with a lot more work).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kelly Searsmith, Ph.D.</title>
		<link>http://digitalscholarship.wordpress.com/2008/05/14/what-can-you-do-with-texts-that-are-in-a-digital-format/#comment-162</link>
		<dc:creator>Kelly Searsmith, Ph.D.</dc:creator>
		<pubDate>Fri, 16 May 2008 15:33:17 +0000</pubDate>
		<guid isPermaLink="false">http://digitalscholarship.wordpress.com/?p=39#comment-162</guid>
		<description>Thanks for mentioning our work at SEASR (Software Environment for the Advancement of Scholarly Research).  We are readying our software for first release (set for later this summer) and are seeking humanities collaborators who would like to make use of SEASR.  We have a good deal of information up on our website: www.seasr.org, including a helpful technology description.</description>
		<content:encoded><![CDATA[<p>Thanks for mentioning our work at SEASR (Software Environment for the Advancement of Scholarly Research).  We are readying our software for first release (set for later this summer) and are seeking humanities collaborators who would like to make use of SEASR.  We have a good deal of information up on our website: <a href="http://www.seasr.org" rel="nofollow">http://www.seasr.org</a>, including a helpful technology description.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: What can you do with texts in a digital format? &#171; (Digital) Humanities</title>
		<link>http://digitalscholarship.wordpress.com/2008/05/14/what-can-you-do-with-texts-that-are-in-a-digital-format/#comment-161</link>
		<dc:creator>What can you do with texts in a digital format? &#171; (Digital) Humanities</dc:creator>
		<pubDate>Fri, 16 May 2008 14:20:39 +0000</pubDate>
		<guid isPermaLink="false">http://digitalscholarship.wordpress.com/?p=39#comment-161</guid>
		<description>[...] page images are enough for scholars?), the excellent Digital Scholarship in the Humanities blog has a terrific posting explaining the many ways digital formats can support humanities [...]</description>
		<content:encoded><![CDATA[<p>[...] page images are enough for scholars?), the excellent Digital Scholarship in the Humanities blog has a terrific posting explaining the many ways digital formats can support humanities [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lms4w</title>
		<link>http://digitalscholarship.wordpress.com/2008/05/14/what-can-you-do-with-texts-that-are-in-a-digital-format/#comment-159</link>
		<dc:creator>lms4w</dc:creator>
		<pubDate>Wed, 14 May 2008 18:17:53 +0000</pubDate>
		<guid isPermaLink="false">http://digitalscholarship.wordpress.com/?p=39#comment-159</guid>
		<description>@Kevin: Good point.  A lot of what I was describing applies to texts in a digital form (whether PDF, XML, or page images)--that&#039;s what the title of the post indicates, although my opening paragraph does focus more on page images vs OCRed/keyboarded texts.  I wanted to keep the post short and simple, but I should have been clearer.</description>
		<content:encoded><![CDATA[<p>@Kevin: Good point.  A lot of what I was describing applies to texts in a digital form (whether PDF, XML, or page images)&#8211;that&#8217;s what the title of the post indicates, although my opening paragraph does focus more on page images vs OCRed/keyboarded texts.  I wanted to keep the post short and simple, but I should have been clearer.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Hawkins</title>
		<link>http://digitalscholarship.wordpress.com/2008/05/14/what-can-you-do-with-texts-that-are-in-a-digital-format/#comment-158</link>
		<dc:creator>Kevin Hawkins</dc:creator>
		<pubDate>Wed, 14 May 2008 17:35:56 +0000</pubDate>
		<guid isPermaLink="false">http://digitalscholarship.wordpress.com/?p=39#comment-158</guid>
		<description>A debate on electronic text versus page images is definitely worth having, but I think the comparison is muddled in this posting.  While I work in electronic publishing, I&#039;ll advocate page images here since I fully agree that providing broad access is more important than producing hand-crafted masterpieces of electronic text.

Nearly all projects which scan pages of text -- from venerable projects like Making of America to more recent ones like Google Book Search -- use OCR software on these images.  OCR accuracy rates vary but are well above 99% for contemporary typefaces.  So when you include these projects in the page-image category, you find that you can do nearly everything you can do with electronic text:

1. Read it: If you have a PDF of page images (such as you can download through Google Books for public-domain works), you can use a process like this to view on an Amazon Kindle: http://www.mobileread.com/forums/showthread.php?t=18968 .  As the corpus of digitized works grows, expect methods like this to be made more user-friendly.

2. Copy and paste it: Many page-image projects allow you to view the OCR of content.  While there will be OCR errors and possibly a loss of formatting, it&#039;s not really such a bad start.

3. Search it: Searching on full text uses the OCR text, and searching on the metadata uses any metadata provided in the system.  As you point out, Google Books lets you find items quite easily.

4. Build a personal collection: You can download PDFs of page images just as easily as other electronic text formats, though these PDFs may not be searchable the way electronic text is.

5. Share it: PDFs of page images can be shared just as easily.

6. Analyze/visualize/mine it: You can do word-count analysis on the OCR text.  What you can&#039;t do is take advantage of any highly structured markup, but there are few sources of uniformly encoded data that allow you to do this kind of searching anyway, despite good efforts by the TEI.  See Glen Worthey&#039;s presentation at the DLF Spring Forum 2008: http://www.diglib.org/forums/spring2008/2008springprogram.htm .

7. Remix &amp; play with it: You can do this with the OCR text.</description>
		<content:encoded><![CDATA[<p>A debate on electronic text versus page images is definitely worth having, but I think the comparison is muddled in this posting.  While I work in electronic publishing, I&#8217;ll advocate page images here since I fully agree that providing broad access is more important than producing hand-crafted masterpieces of electronic text.</p>
<p>Nearly all projects which scan pages of text &#8212; from venerable projects like Making of America to more recent ones like Google Book Search &#8212; use OCR software on these images.  OCR accuracy rates vary but are well above 99% for contemporary typefaces.  So when you include these projects in the page-image category, you find that you can do nearly everything you can do with electronic text:</p>
<p>1. Read it: If you have a PDF of page images (such as you can download through Google Books for public-domain works), you can use a process like this to view on an Amazon Kindle: <a href="http://www.mobileread.com/forums/showthread.php?t=18968" rel="nofollow">http://www.mobileread.com/forums/showthread.php?t=18968</a> .  As the corpus of digitized works grows, expect methods like this to be made more user-friendly.</p>
<p>2. Copy and paste it: Many page-image projects allow you to view the OCR of content.  While there will be OCR errors and possibly a loss of formatting, it&#8217;s not really such a bad start.</p>
<p>3. Search it: Searching on full text uses the OCR text, and searching on the metadata uses any metadata provided in the system.  As you point out, Google Books lets you find items quite easily.</p>
<p>4. Build a personal collection: You can download PDFs of page images just as easily as other electronic text formats, though these PDFs may not be searchable the way electronic text is.</p>
<p>5. Share it: PDFs of page images can be shared just as easily.</p>
<p>6. Analyze/visualize/mine it: You can do word-count analysis on the OCR text.  What you can&#8217;t do is take advantage of any highly structured markup, but there are few sources of uniformly encoded data that allow you to do this kind of searching anyway, despite good efforts by the TEI.  See Glen Worthey&#8217;s presentation at the DLF Spring Forum 2008: <a href="http://www.diglib.org/forums/spring2008/2008springprogram.htm" rel="nofollow">http://www.diglib.org/forums/spring2008/2008springprogram.htm</a> .</p>
<p>7. Remix &amp; play with it: You can do this with the OCR text.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
