PDF indexing in SharePoint 2007
January 24, 2008 – 14:22 by Hannes Van de Vel
Introduction
By default, SharePoint 2007 Search indexes only the metadata of a PDF document. Once a PDF IFilter is installed and configured, Search also indexes the contents of the document, so users can find documents based on the text inside them. This is called full-text indexing.
[Indexing Server]: the server(s) in the SharePoint Farm that has/have the “Indexing” Role assigned. In a small farm this can be a single server for all roles.
[Web Front End Server]: the server(s) in the SharePoint Farm that has/have the “Web Front End” Role assigned. In a small farm this can be a single server for all roles.
Windows SharePoint Services 3.0
[Indexing Server]
- Install the PDF IFilter (see below for a list of available IFilters)
- Add the .pdf file type to the index list:
- Open the Registry Editor (Start > Run > regedit)
- Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\<GUID>\Gather\Search\Extensions\ExtensionList
- Add a new String Value
- Value name: 38
- Value data: pdf
- [This step only applies to 64 bit servers (Foxit x64 PDF IFilter)]
- Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
- Change the (Default) key value
- Old value: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
- New value: {987F8D1A-26E6-4554-B007-6B20E2680632}
- Perform an iisreset
- Perform a Full Update on the Search content indexes
- Open a Command Prompt on the Indexing Server
- net stop spsearch
- net start spsearch
- cd "C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\BIN"
- stsadm.exe -o spsearch -action fullcrawlstop
- stsadm.exe -o spsearch -action fullcrawlstart
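The registry steps above can also be prepared as a .reg file instead of being typed into the Registry Editor. The following is only a minimal sketch in Python, not part of the original procedure: the function name `build_reg_text` and its parameters are hypothetical, the GUID has to be looked up on your own server, and the value name 38 is assumed to be unused in your ExtensionList key. Review the generated text before importing it.

```python
# Sketch: build the text of a .reg file for the registry steps above.
# The GUID differs per installation and must be supplied by the caller.
# The value name "38" is the one used in this article and is assumed
# to be free in the ExtensionList key.

SEARCH_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools"
    r"\Web Server Extensions\12.0\Search"
)
# "New value" from the 64-bit step of this article.
NEW_FILTER_CLSID = "{987F8D1A-26E6-4554-B007-6B20E2680632}"


def build_reg_text(app_guid, value_name="38", foxit_x64=False):
    """Return .reg file text that adds pdf to the extension list.

    With foxit_x64=True the default value of the .pdf filter key is
    also set, as described in the 64-bit-only step.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + SEARCH_KEY + "\\Applications\\" + app_guid
        + "\\Gather\\Search\\Extensions\\ExtensionList]",
        '"%s"="pdf"' % value_name,
    ]
    if foxit_x64:
        lines += [
            "",
            "[" + SEARCH_KEY
            + "\\Setup\\ContentIndexCommon\\Filters\\Extension\\.pdf]",
            '@="%s"' % NEW_FILTER_CLSID,
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "<GUID>" is a placeholder, exactly as in the registry path above.
    print(build_reg_text("<GUID>", foxit_x64=True))
```

Save the printed text to a file with a .reg extension and import it on the Indexing Server, then continue with the iisreset and the full crawl.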
[Web Front End Server]
- Copy the ICPDF.GIF file to “C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images”
- Edit the file C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\Template\Xml\DOCICON.XML
- Add an entry for the .pdf extension
<Mapping Key="pdf" Value="icpdf.gif"/>
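If you have several Web Front End Servers, the DOCICON.XML change can be scripted. The sketch below is an illustration only: the helper name `add_pdf_mapping` is hypothetical, and it assumes that DOCICON.XML keeps its extension mappings inside a ByExtension element. It works on the XML text, so reading and writing the actual file (and making a backup first) is left to you.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(xml_text):
    """Return DOCICON.XML text with a pdf -> icpdf.gif mapping added.

    Assumes the mappings live under a <ByExtension> element. If a pdf
    mapping already exists, the input is returned unchanged.
    """
    root = ET.fromstring(xml_text)
    by_extension = root.find("ByExtension")
    if by_extension is None:
        raise ValueError("no <ByExtension> element found")

    for mapping in by_extension.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            return xml_text  # nothing to do

    ET.SubElement(by_extension, "Mapping",
                  {"Key": "pdf", "Value": "icpdf.gif"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Tiny stand-in for the real file, for demonstration only.
    sample = (
        "<DocIcons><ByExtension>"
        '<Mapping Key="doc" Value="icdoc.gif"/>'
        "</ByExtension></DocIcons>"
    )
    print(add_pdf_mapping(sample))
```

Note that ElementTree drops XML comments and the XML declaration when it rewrites a document, so for a one-off change editing the file by hand is just as good.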
Microsoft Office SharePoint Server 2007
[Indexing Server]
- Install the PDF IFilter (see below for a list of available IFilters)
- Add the .pdf file type to the index list:
- Go to Central Administration, then to the Shared Services Administration Web of the current SSP, go to Search Settings and then to File Types
- Add a new file type pdf
- [This step only applies to 64 bit servers (Foxit x64 PDF IFilter)]
- Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
- Change the (Default) key value
- Old value: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
- New value: {987F8D1A-26E6-4554-B007-6B20E2680632}
- Perform an iisreset
- Perform a Full Update on the Search content indexes
- Open a Command Prompt on the Indexing Server
- net stop osearch
- net start osearch
- Go to Central Administration, then to the Shared Services Administration Web of the current SSP, go to Search Settings and start a full crawl of all locations containing PDF files
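The command-line part differs slightly between the two products: WSS 3.0 uses the spsearch service and starts the crawl with stsadm, while MOSS 2007 uses the osearch service and starts the crawl from Central Administration. As a small summary sketch (the function name `restart_commands` is hypothetical, and the commands are exactly those listed in this article), the sequence per product can be written down like this:

```python
def restart_commands(product):
    """Return the commands to run on the Indexing Server, in order.

    product is "wss" (Windows SharePoint Services 3.0) or
    "moss" (Microsoft Office SharePoint Server 2007).
    """
    services = {"wss": "spsearch", "moss": "osearch"}
    if product not in services:
        raise ValueError("product must be 'wss' or 'moss'")

    service = services[product]
    commands = [
        "iisreset",
        "net stop " + service,
        "net start " + service,
    ]
    if product == "wss":
        # Run these from the ...\Web server extensions\12\BIN folder.
        commands += [
            "stsadm.exe -o spsearch -action fullcrawlstop",
            "stsadm.exe -o spsearch -action fullcrawlstart",
        ]
    # For MOSS the full crawl is started from Search Settings instead.
    return commands


if __name__ == "__main__":
    for product in ("wss", "moss"):
        print(product.upper())
        for command in restart_commands(product):
            print("  " + command)
```

The function only lists the commands and does not execute anything, so it is safe to run anywhere.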
[Web Front End Server]
- Copy the ICPDF.GIF file to “C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images”
- Edit the file C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\Template\Xml\DOCICON.XML
- Add an entry for the .pdf extension
<Mapping Key="pdf" Value="icpdf.gif"/>
Available IFilters
- free (always good!)
- single threaded (affects indexing performance)
- 32 bit only (applies to the [Indexing Server])
- free for desktops, servers require a license
- multi threaded, multi CPU (affects indexing performance, improved performance)
- 32 bit and 64 bit (IA64 currently being tested, applies to the [Indexing Server])
Conclusion
Using the above procedure for either WSS 3.0 or MOSS 2007, you can have the contents of your PDF documents indexed by SharePoint Search.
One Response to “PDF indexing in Sharepoint 2007”
This is great exactly what I needed was struggling trying to get the last piece of setting up PDF indexing on WSS for a client.
Thanks
By Michael Ryan on May 7, 2008