Need to parse PDF Document inner text

    Hi everybody

    I need to parse PDF files and extract their text so that it can be indexed and searched. Could anyone help me find a component or some code that does this?


    I'm no expert, but this article should get you started:

    I created a program that does this and can also rename the PDF using the text parsed from it.

    It was originally designed to OCR TIFF images and file them by parsed text. However, an engineering company had PDFs that they were going to convert to TIFF images so they could OCR them and parse the text. Rather than doing it that way and risking OCR errors, I built in the ability to extract the PDF text directly and parse it. There are options to preserve formatting when parsing the extracted text or to pull it out without formatting. I noticed that the Code Project article mentioned above handles form fields; this utility pulls all text data, not just what is in the form fields. Its output is a standard CSV file. You can learn more about it at
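
To illustrate the "preserve formatting or not" option and the CSV output described above, here is a minimal Python sketch. It is not the utility's actual code. It assumes the text has already been pulled out of the PDF by whatever extraction library you choose, and the function names (text_to_csv_rows, rows_to_csv) and the two-column layout are hypothetical, chosen only for this example.

```python
import csv
import io


def text_to_csv_rows(text, preserve_formatting=False):
    """Turn already-extracted PDF text into (line_number, text) rows.

    Hypothetical helper: the extraction step itself is assumed to have
    happened elsewhere.
    """
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines entirely
        if preserve_formatting:
            # Keep leading indentation and inner spacing; drop trailing whitespace.
            value = line.rstrip()
        else:
            # Collapse every run of whitespace into a single space.
            value = " ".join(line.split())
        rows.append((line_number, value))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as standard CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["line", "text"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "  Invoice   No:  1234\n\nTotal,  due\n"
    print(rows_to_csv(text_to_csv_rows(sample)))
    print(rows_to_csv(text_to_csv_rows(sample, preserve_formatting=True)))
```

Keeping the original line number in each row makes it possible to trace an indexed hit back to its place in the extracted text; the csv module handles quoting for any field that contains a comma.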

