I used the iTextSharp library to convert the PDF files to text and went from there. Once you have a file as text, you can see how to parse it: the layout you see in the extracted text is the structure that document type will have every time. In my case the documents differ by vendor, so each vendor gets its own parser. That's the gist of it.
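
To make the "one parser per vendor" idea concrete, here is a minimal sketch in Python. It assumes the text has already been extracted from the PDF (the extraction step itself is not shown). The vendor names, field labels, and function names are all hypothetical, made up for illustration; your real layouts will differ.

```python
import re
from typing import Callable, Dict


def _first_group(pattern: str, text: str) -> str:
    """Return the first capture group of pattern in text, or '' if absent."""
    match = re.search(pattern, text)
    return match.group(1) if match else ""


def parse_acme(text: str) -> Dict[str, str]:
    # Hypothetical layout: "Invoice No: A-100" and "Total: 99.50"
    return {
        "invoice": _first_group(r"Invoice No:\s*(\S+)", text),
        "total": _first_group(r"Total:\s*([\d.,]+)", text),
    }


def parse_globex(text: str) -> Dict[str, str]:
    # Hypothetical layout: "INV#B-200" and "Amount Due 12.00"
    return {
        "invoice": _first_group(r"INV#(\S+)", text),
        "total": _first_group(r"Amount Due\s+([\d.,]+)", text),
    }


# One entry per vendor; adding a vendor means writing one more function.
PARSERS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "acme": parse_acme,
    "globex": parse_globex,
}


def parse_document(vendor: str, text: str) -> Dict[str, str]:
    """Pick the parser for this vendor and run it on the extracted text."""
    try:
        parser = PARSERS[vendor]
    except KeyError:
        raise ValueError(f"no parser registered for vendor {vendor!r}")
    return parser(text)
```

Calling `parse_document("acme", extracted_text)` returns a dict with the same keys no matter which vendor the file came from, so the rest of the code does not need to know about the layout differences.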