Optical Character Recognition

Discussion in 'The Barracks' started by von Poop, Oct 14, 2009.

  1. roodymiller

    roodymiller Senior Member

    quicker just to re-type!!
     
  2. adamitshelanu

    adamitshelanu Steve in Raleigh, NC

    Epson Perfection V300 Photo bundles ABBYY FineReader 6.0 Sprint (not the full professional)...I bought 2 months ago for $80, and on sale now for $60 (CompUSA.com).

    I'm pleased with the scanning and with the OCR. I like being able to save the scanned page as it looked in a PDF. It is searchable. Also you can highlight in the PDF and then paste into a text file (or OCR to text direct which is not as "orderly").

    This software warns if you scanned below 300 dpi. Tries to auto rotate for upside-down scans.
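
For anyone feeding in camera shots or odd-sized scans, the 300 dpi warning can be checked by hand. This is only a rough sketch: the function names are made up for illustration, the threshold is simply the figure mentioned above, and you have to supply the real page width yourself.

```python
MIN_OCR_DPI = 300  # threshold taken from the warning described in the post


def effective_dpi(pixels_wide, page_width_inches):
    """Resolution implied by the image width and the real width of the page."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixels_wide / page_width_inches


def warn_if_low_resolution(pixels_wide, page_width_inches, minimum=MIN_OCR_DPI):
    """Return a warning string if the scan is below the minimum, else None."""
    dpi = effective_dpi(pixels_wide, page_width_inches)
    if dpi < minimum:
        return f"Warning: about {dpi:.0f} dpi, below the {minimum} dpi suggested for OCR"
    return None


if __name__ == "__main__":
    # A page 8 inches wide photographed at 1600 pixels across is only 200 dpi.
    print(warn_if_low_resolution(1600, 8.0))
```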

    I scanned a certain book that used to give my old HP scanner/OCR trouble. The scan was upside down and tilted 45 degrees.

    The ABBYY OCR could not read it or adjust the rotation.

    So I used IrfanView(.com free) to do a fine tuned rotation and then ABBYY OCR'd it perfectly, reading from the TIF file.
     
  3. slaphead

    slaphead very occasional visitor

    I have an ooooold v5 of Abbyy. One of the best yet also the worst features is that if there is something it doesn't "get" and it thinks it might be an image, it saves it as an inline image. At first glance the OCR'd page looks great, but if you then try to reuse what you think is pure text you end up with missing bits where the picture of the letter(s) once was/were. In "Andy's guide to the Universe" I'd put it down as "mostly useful".
     
  4. OpanaPointer

    OpanaPointer Pearl Harbor Myth Buster

    I use FineReader 9.0. It's much better than 8.0, the only previous version I've worked with. Between the two I've put maybe 100,000 pages of text online in the last 2.8 years. Never, ever, trust the program to read anything correctly, even on pristine copies. The program will guess if it can't decide on a character, and its guesses can be wildly wrong.
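
One cheap way to act on the "never trust it" advice is to have a script point out words worth a second look before proofreading. The sketch below is an assumption about what a useful check might look like, not anything FineReader does: it flags words that mix letters and digits (a common sign of a bad guess, such as "l944" for "1944") and lets ordinary ordinals like "5th" through. It will miss plenty, so it supplements reading the page rather than replacing it.

```python
import re

# Ordinals such as "5th" or "22nd" legitimately mix digits and letters.
ORDINAL = re.compile(r"\d+(?:st|nd|rd|th)", re.IGNORECASE)
PUNCTUATION = ".,;:!?()\"'"


def suspicious_tokens(text):
    """Return words that mix letters and digits and are not plain ordinals."""
    flagged = []
    for raw in text.split():
        word = raw.strip(PUNCTUATION)
        has_letter = any(c.isalpha() for c in word)
        has_digit = any(c.isdigit() for c in word)
        if has_letter and has_digit and not ORDINAL.fullmatch(word):
            flagged.append(word)
    return flagged


if __name__ == "__main__":
    sample = "The 5th Battalion landed in l944 near Ca3n."
    print(suspicious_tokens(sample))
```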

    FineReader 10 is out, btw. The current price is around $100.
     
  5. Recce_Mitch

    Recce_Mitch Very Senior Member

    I'm using Read Iris 11 at the moment. I am finding it very accurate and a lot faster than I could type. It also reads other languages.

    Cheers
    Paul
     
  6. OpanaPointer

    OpanaPointer Pearl Harbor Myth Buster

    I've been asked about what would be good to add to the internet.

    First, do what you're interested in, that way it never gets old. Second, check your material to see if it is already online. If you have something that doesn't show up you have a chance to add to the body of knowledge. You might be sitting on the only surviving copy of something, and can change that to the entire world having a chance to look at it.

    There are two ways to go about putting material online. PDF makes a faithful copy and is fast, but the files can't be properly searched in many cases, especially with "mixed language files" like the Handbook on German Military Forces. They're also much larger than the same information in HTML.

    HTML files take more work, but you have the chance to make sure that the text is correct and they're readable by search engines, so they can be found more easily. And you can clean up some ugly files, like the third generation mimeographs of the Final Report on Operation on Guadalcanal. And you can read the document as you go along, and maybe provide links to other material in the process.
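
As an illustration of the HTML route, here is a minimal sketch of turning proofread OCR text into a simple page. It assumes paragraphs in the text are separated by blank lines, and the function name and page layout are invented for the example; a real document would usually need headings, tables and links added by hand.

```python
import html


def text_to_html(title, text):
    """Wrap plain text in a bare-bones HTML page, one <p> per paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(
        # Join wrapped lines and escape &, < and > so they display literally.
        f"<p>{html.escape(' '.join(p.split()))}</p>"
        for p in paragraphs
    )
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        f"<body>\n<h1>{safe_title}</h1>\n{body}\n</body></html>\n"
    )


if __name__ == "__main__":
    print(text_to_html("Sample report", "First line\ncontinues here.\n\nSecond paragraph."))
```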

    You can feed camera images to an OCR program, so those books that are stuck away in the back of a library and never get checked out can be made available to researchers all over the world.
     
  7. sapper

    sapper WW2 Veteran WW2 Veteran

    My typing with large hands.. Single Finger style! can only be described as Hieroglyphic bedlam. When I Look up at the screen, all I See is a bloody awful mess. So for quite a long time, I have been trying out Speech To Text programmes...Dragon for example, NO matter how I try, I have never found one yet that works good enough to use at all.

    I tried the posh voice and damn near everything else... All to no avail
    Anyone got any ideas?
    sapper
     
  8. Swiper

    Swiper Resident Sospan

    I didn't know how to title this...

    I know some of you here use software that reads the words from photographs (type) and copies them for Word etc?

    What is it and how much is it? Finally joined the photographing War Diary crowd and will slowly transcribe them...
     
  9. von Poop

    von Poop Adaministrator Admin

    Aye Swiper,
    I merged your query onto this thread.
    I know many are rather keen on more scanned in War Diaries.
    Hopefully it'll help a bit.
    (took me a while to pin down the terminology too ;) )
     
  10. Swiper

    Swiper Resident Sospan

    Brill! Much reading to be had ;)
     
  11. Jan7

    Jan7 Senior Member

    Swiper, these matters are solved by an application named ABBYY FineReader. I use it on photos of old documents from the NA in Kew (ADM 199), and reach high percentages of correct transcription. See the first and successive pages of this thread for more details.



    Jan.
     
  12. Verrieres

    Verrieres no longer a member

  13. Hebridean Chindit

    Hebridean Chindit Lost in review... Patron

    I've had my Epson Perfection 1200 PHOTO for far too many years - still works well and the negative scanner (updated to a dedicated budget jobby) still functions efficiently and I still use it for larger or odd sized negs - I bought a budget OCR some time back too: TextBridge Pro 11 - okay for its time but clunky...

    So, research time... Now, mention of the ABBYY product has sprung up on this thread in association with the Epson V330 (sorry, but read some poor reviews; not just Amazon - negative scan issues for bulk users) and others, but the Epson V33 has had fairly favourable reviews and part of the package includes a cut-down version of a recent ABBYY product - FineReader 9.0 Sprint...

    I have far too much to scan into my system and the budget is tight but Amazon had this at a very good price, so...

    I scanned a 150 page book earlier today, two pages at a time, and as far as I can tell, faultlessly - the only quirk that is not clear from the info is that you must close down the scan to trigger the conversion - I scanned in groups of 30 pages and this cleared through within scant minutes - you can set it to pause for "X" seconds as you pick up the book and turn the page - a nice feature is the sideways opening lid - I can live with the most peculiar positioning of the USB port and the power jack-point (front right, adjacent to the controls) - integrates directly with WORD (I'm running 2003-SPK3 - daughter runs Office 2010 - both running WIN7 on a Dell laptop) and can create PDFs from a text - also has a screen-direct-capture feature but not tried that yet - if I have to acquire something like that I tend to copy/paste, but it may be useful for pre-scanned images...

    All-in-all, the Epson Perfection V33 does wot it sez on the tin... ;)

    (ps - no affiliation with any of these companies - just a happy user)
     
  14. Peccavi

    Peccavi Senior Member

    Does anyone have any recommendations for the best image recognition software to use with "typed reports" such as those from WW2?

    I use ABBYY Professional and it is excellent at picking up printed material within modern books but will not work at all well on WW2 typewriter material even if the document in question is very legible.

    Would save a lot of typing up of parts of these old reports to use in discussions on websites such as this.

    Ideas or experiences welcome.
     
  15. Peccavi

    Peccavi Senior Member

    To be precise I meant OCR - optical character recognition
     
  16. von Poop

    von Poop Adaministrator Admin

  17. PsyWar.Org

    PsyWar.Org Archive monkey

    ABBYY have a higher level professional product called "Recognition Server": http://www.frakturschrift.com/en:products:recognition_server
    It has a built in setting for typed pages and better yet for German Fraktur (Gothic) typefaces.

    I've been using it a lot over the last few weeks on Fraktur texts and it has done a great job. Haven't tried it on typed documents though to see if it is any improvement on the previous editions of ABBYY FineReader. My free trial is just about up so can't experiment anymore.

    Also the workflow can be unusual for those not familiar with this type of product.

    Basically you set up setting scripts which point to specific folders on your hard drive. Anything dropped in the relevant "in" folder gets OCR'd and sent to the "out" folder.
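
The "in" folder / "out" folder idea can be sketched in a few lines. This is not how Recognition Server works internally and uses none of its settings: `run_ocr` is a hypothetical placeholder for whatever OCR engine you actually have, and the folder names, the extra "done" folder and the list of image extensions are assumptions for the example.

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def run_ocr(image_path):
    """Hypothetical stand-in: replace with a call to a real OCR engine."""
    return f"[OCR text for {image_path.name} would go here]"


def process_inbox(in_dir, out_dir, done_dir):
    """OCR every image in in_dir, write text to out_dir, move images to done_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    done_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(in_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # leave anything that is not an image untouched
        target = out_dir / (image.stem + ".txt")
        target.write_text(run_ocr(image), encoding="utf-8")
        # Moving the source out of the inbox stops it being processed twice.
        shutil.move(str(image), str(done_dir / image.name))
        written.append(target)
    return written


if __name__ == "__main__":
    inbox = Path("in")
    inbox.mkdir(exist_ok=True)
    for path in process_inbox(inbox, Path("out"), Path("done")):
        print("wrote", path)
```

Run on a schedule, or in a loop with a short sleep, this gives the same drop-it-in-and-forget behaviour described above.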
     
  18. Andreas

    Andreas Working on two books

    Here's what I got from scanning a random page from my Malta files, dealing with air ops.

    The original file is attached below, for comparison. This is not a bad result, considering the poor quality of the original, but one may end up spending more time cleaning the file up than would have been spent typing it up.
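
To judge whether cleaning up beats retyping, one option is to hand-correct a short sample and measure how much of the OCR output survived. A rough sketch using Python's standard `difflib`, comparing word by word; the function name is invented here, and the score is only a similarity ratio, not a formal error rate.

```python
from difflib import SequenceMatcher


def word_similarity(ocr_text, corrected_text):
    """Similarity between OCR output and a corrected copy, from 0.0 to 1.0."""
    ocr_words = ocr_text.split()
    good_words = corrected_text.split()
    # autojunk=False stops very common words being ignored on long pages.
    matcher = SequenceMatcher(None, ocr_words, good_words, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    score = word_similarity("the qu1ck brown fox", "the quick brown fox")
    print(f"{score:.2f}")
```

A low score on a representative page suggests retyping may be the quicker route for that batch.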

    All the best

    Andreas
     

    Attached Files:

  19. von Poop

    von Poop Adaministrator Admin

    Discovered last week that online OCR seems to have improved immensely.
    Eg.
    http://www.onlineocr.net

    Not tried on many documents, but it does appear to do a better job than my version of ABBYY. Certainly near perfect on crisp printed text.
     
