CAS-IT Data Services

Helping University of Oregon faculty, students, and staff with their data needs. Come to office hours to learn more!

Why Should I Care About Optical Character Recognition (OCR)?

OCR is an important tool in the digital humanities. It gives us the ability to use digital tools to analyze written text.

What does that mean?
Let’s say you are Indiana Jones (hey, why not?), and you are exploring an underground castle. As you walk through the deep tunnels, you find an amazing library! One of the texts in the library is a previously unknown Shakespearean play that Shakespeare himself wrote for the king of that underground castle. Because you’re Indiana Jones, you believe that there are very important clues in this text. So, you take the book back to your laboratory and decide to count how many times certain words are printed so you can figure out your next step. Counting the words by hand, however, is not an option because the world might end in 48 hours if you do not solve the puzzle! Luckily, a beautiful scientist works in your lab, and she has a scanner and some OCR software. First, she scans the play into a .pdf. Next, she uses OCR software to turn the scanned images into digitized, searchable text. Then, with her coding and programming knowledge, she cleans up the messy digitized copy into a workable document. Finally, she analyzes the text! And boom… you’ve solved your puzzle!
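To make those last two steps a little more concrete, here is a minimal sketch in Python of what "clean up, then count" could look like once the OCR software has already produced plain text. Everything in it is an assumption for illustration: the function name `count_words`, the sample sentence, and the cleanup rule (rejoining words that were hyphenated across a line break) are not from any particular OCR tool, and real OCR output usually needs more cleanup than this.

```python
import re
from collections import Counter


def count_words(ocr_text: str) -> Counter:
    """Count word occurrences in text produced by OCR (hypothetical helper)."""
    # Assumption: a hyphen at the end of a line means a word was split there,
    # so we glue the two halves back together. This will also merge words
    # that were genuinely hyphenated at a line break.
    joined = re.sub(r"-\s*\n\s*", "", ocr_text)
    # Lowercase everything and keep runs of letters and straight apostrophes.
    words = re.findall(r"[a-z']+", joined.lower())
    return Counter(words)


# Made-up sample standing in for one line of OCR output.
sample = "To be, or not to be, that is the ques-\ntion."
counts = count_words(sample)

# Look up the words you care about, or list the most frequent ones.
print(counts["to"], counts["question"])
print(counts.most_common(3))
```

With a full play saved as a text file, you would read the file into a string and pass it to the same function.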

Now, even if you are not Indiana Jones, OCR can be a great tool. For example, many records in economic history have not yet been digitized. If you wanted to do any sort of calculations with the numbers, you would need them in an Excel spreadsheet, and OCR can help you get them there. Similarly, some lesser-known texts and older newspaper articles are not on the web. You could use OCR to digitize these as well and then analyze different components.

I encourage anyone interested in learning more about OCR to come to the data services lab office hours for guidance on where to start!

Office hours are: Monday 12:30 – 3:30, Tuesday 11:30 – 3:00, Thursday 11:30 – 1:30
