I don't know how well proper names are preserved, though (probably one of the most important search criteria), since the dictionary correction cannot be applied to them. There are also a dozen or so handwritten pages that need to be processed manually; the automated output for these pages is just garbage (cf. set 5). This still needs to be done.
The resulting PDF files are named like the originals with an "_ocr" suffix for easy sorting and comparison.
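As a rough illustration of the naming scheme, here is a minimal Python sketch that maps an original file name to its OCR counterpart and back. It assumes the "_ocr" suffix is placed directly before the file extension; the function names and the example file name "set_5.pdf" are made up for illustration and are not the actual file names.

```python
from pathlib import Path

OCR_SUFFIX = "_ocr"  # suffix mentioned in the text; placement before the extension is an assumption


def ocr_name(original: str) -> str:
    """Return the OCR file name for an original, e.g. 'x.pdf' -> 'x_ocr.pdf'."""
    p = Path(original)
    return str(p.with_name(p.stem + OCR_SUFFIX + p.suffix))


def original_name(ocr_file: str) -> str:
    """Return the original file name for an OCR file, e.g. 'x_ocr.pdf' -> 'x.pdf'."""
    p = Path(ocr_file)
    if not p.stem.endswith(OCR_SUFFIX):
        raise ValueError(f"{ocr_file!r} does not carry the {OCR_SUFFIX!r} suffix")
    return str(p.with_name(p.stem[: -len(OCR_SUFFIX)] + p.suffix))


if __name__ == "__main__":
    # "set_5.pdf" is a hypothetical file name used only for this example.
    print(ocr_name("set_5.pdf"))
    print(original_name("set_5_ocr.pdf"))
```

With this scheme, each original and its OCR version share a common prefix, so they end up next to each other in a sorted directory listing.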
I believe that copying and processing these files is OK, since the DefenseLINK website explicitly states that "Information presented on DefenseLINK is considered public information and may be distributed or copied unless otherwise specified." If anyone feels their privacy has been compromised, though, please drop me a short note and I'll remove the pages in question.
I hope the files are of some use.
Last changed: March 8th, 2006