Crawling PDF content within your MOSS 2007 environment requires a bit of configuration. I came across two posts from Microsoft that describe what needs to be done using Adobe Reader v.8.
- Indexing pdf documents with Adobe Reader v.8 and MOSS 2007 [Filter Central]

Of course, Adobe has since released Adobe Reader v.9. The steps are essentially the same, but I wanted to post mine for reference (at least until Adobe Reader 10 or the next version of MOSS is released).

1. Download and install Adobe Reader v.9.
2. Navigate to 'Configure Search Settings' in your Shared Services Administration application.
3. Click 'Manage File Types' to show the file types included in the content index, then add pdf as a new file type.
4. Note the PDF filter GUID. Under Adobe Reader v.9 it is still {E8978DA6-047F-4E3D-9C78-CDBE46041603}.
5. Set the (Default) value of HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf to the GUID from Step 4.
6. Set the (Default) value of HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf to the GUID from Step 4.
7. Add 'C:\Program Files\Adobe\Reader 9.0\Reader' (or the installation directory you used in Step 1) to your PATH environment variable.
8. Recycle the search service from the command prompt by running 'net stop osearch' and then 'net start osearch'.
9. Recycle IIS from the command prompt with 'iisreset /noforce'. I'm not sure this is required, but I found it handy.
10. Save a PDF icon image to the 'C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\TEMPLATE\IMAGES' folder.
11. Edit DOCICON.XML, which is located in 'C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\TEMPLATE\XML'.
12. Add <Mapping Key="pdf" Value="NameofIconFile.gif"/> inside the <ByExtension> node, replacing NameofIconFile.gif with the file name of your icon, and save the file.
13. Perform a full crawl on your content sources.
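If you need to apply the two registry changes on more than one server, a .reg file can save some clicking. Below is a minimal sketch that generates one. It assumes that the GUID goes in each `.pdf` key's (Default) value, as in Microsoft's v.8 instructions; `@` is the .reg syntax for that value. The function names `build_reg_file` and `write_reg_file` are hypothetical helpers, not part of any SharePoint tooling.

```python
import uuid

# GUID of the Adobe PDF IFilter, as listed in the steps above.
PDF_FILTER_GUID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"

# The two .pdf filter-extension keys from the steps above.
FILTER_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup"
    r"\ContentIndexCommon\Filters\Extension\.pdf",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions"
    r"\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
]


def build_reg_file(guid: str = PDF_FILTER_GUID) -> str:
    """Return .reg text that sets each key's (Default) value to the GUID."""
    # uuid.UUID raises ValueError for a malformed GUID; braces are accepted.
    normalized = "{" + str(uuid.UUID(guid)).upper() + "}"
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in FILTER_KEYS:
        lines.append(f"[{key}]")
        lines.append(f'@="{normalized}"')  # "@" means the (Default) value
        lines.append("")
    return "\r\n".join(lines)


def write_reg_file(path: str, guid: str = PDF_FILTER_GUID) -> None:
    """Write the .reg text as UTF-16 with a BOM, the encoding regedit exports."""
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(build_reg_file(guid))
```

For example, `write_reg_file("pdf_ifilter.reg")` followed by `regedit /s pdf_ifilter.reg` from an elevated prompt would import both values. Back up the registry before you try this.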
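The PATH step is easy to get wrong by adding the Reader directory twice or leaving a stray semicolon. Here is a small sketch that computes the new PATH string. It assumes Windows conventions: entries are separated by ';' and compared case-insensitively. `add_to_path` and `READER_DIR` are hypothetical names, and the default directory is the one from the install step.

```python
READER_DIR = r"C:\Program Files\Adobe\Reader 9.0\Reader"


def add_to_path(path_value: str, directory: str = READER_DIR) -> str:
    """Return path_value with directory appended, unless it is already there."""
    wanted = directory.rstrip("\\").lower()
    for entry in path_value.split(";"):
        # Compare case-insensitively and ignore a trailing backslash.
        if entry.strip().rstrip("\\").lower() == wanted:
            return path_value  # already present: leave PATH unchanged
    base = path_value.rstrip(";")  # avoid producing ";;"
    return f"{base};{directory}" if base else directory
```

The sketch only builds the string. Applying it to the system PATH, for example through the Environment Variables dialog in System Properties, is still a manual step.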
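You will probably repeat the recycle steps whenever you tweak the configuration, so it can help to script them. Below is a sketch that runs the same commands listed above, in the same order. It assumes it is started from an elevated prompt on the server. It defaults to a dry run that only returns the command lines, and the IIS reset can be switched off because it may not be required. `recycle` and `recycle_commands` are hypothetical names.

```python
import subprocess


def recycle_commands(include_iis: bool = True) -> list:
    """Command lines from the steps above, in the order they are run."""
    commands = [["net", "stop", "osearch"], ["net", "start", "osearch"]]
    if include_iis:
        # The IIS reset may not be strictly required.
        commands.append(["iisreset", "/noforce"])
    return commands


def recycle(dry_run: bool = True, include_iis: bool = True) -> list:
    """Run the commands, or only list them when dry_run is True."""
    executed = []
    for cmd in recycle_commands(include_iis):
        if not dry_run:
            # check=True raises CalledProcessError and stops the sequence
            # if a command returns a non-zero exit code.
            subprocess.run(cmd, check=True)
        executed.append(" ".join(cmd))
    return executed
```

Call `recycle(dry_run=False)` to actually stop and start the services.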
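The DOCICON.XML edit can also be scripted. This sketch uses Python's ElementTree to add the pdf `<Mapping>` under `<ByExtension>`, or to update it if one already exists. It assumes `<ByExtension>` is a direct child of the root element and that the file uses no XML namespace. `add_pdf_mapping` is a hypothetical helper, and `NameofIconFile.gif` is the same placeholder used in the steps.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(docicon_xml: str, icon_file: str) -> str:
    """Add (or update) a pdf <Mapping> under <ByExtension>."""
    # Keep existing comments instead of silently dropping them.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(docicon_xml, parser=parser)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element under the root element")
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            mapping.set("Value", icon_file)  # already mapped: just update it
            break
    else:
        ET.SubElement(by_ext, "Mapping", Key="pdf", Value=icon_file)
    return ET.tostring(root, encoding="unicode")
```

The returned string has no XML declaration. Back up DOCICON.XML first, and if you write the result back, consider `ElementTree.write(..., xml_declaration=True)`.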