Archive for July, 2015

Script to recrawl all crawl errors

I’m currently working on a document management project that holds more than 10 million documents. During a full crawl we had a temporary network issue, which resulted in 340,000 crawl errors. I didn’t want to run another full crawl, since the full crawl itself did finish for all documents. Instead, I wanted the failed items to be picked up in the next incremental crawl. In Central Administration you can select the option “Recrawl the item in the next crawl” for each item that caused an error, but I obviously didn’t want to select this option manually for every error.

To automate this, I’ve created a PowerShell script that lists the errors and can optionally mark all of them for recrawl. The script is explained in its comments.

#------------------------------------------------------------------------------
# Provide parameters
#------------------------------------------------------------------------------
param (
    # Name of the search service application is mandatory
    [string] $SearchServiceApplicationName = $(throw "Please specify a search service application"),
    # By default, use all available content sources
    [string] $ContentSourceName = "",
    # By default only a list of the errors is shown
    [switch] $RecrawlErrors = $false
)

#------------------------------------------------------------------------------
# Ensure the SharePoint PowerShell Snapin is loaded
#------------------------------------------------------------------------------
if ((Get-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue) -eq $null) {
    Add-PSSnapin "Microsoft.SharePoint.PowerShell"
}

#------------------------------------------------------------------------------
# Set some constant values
#------------------------------------------------------------------------------
# The id of the error stating a document will be processed in the next crawl
[int] $errorIdRetryNextCrawl = 437
# The number of documents which should be retrieved per batch from the ssa
[int] $batchSize = 1000
# 2 stands for Errors
[int] $errorLevel = 2

#------------------------------------------------------------------------------
# Retrieve the search service application and crawl log
#------------------------------------------------------------------------------
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity $SearchServiceApplicationName
$crawlLog = New-Object Microsoft.Office.Server.Search.Administration.CrawlLog $ssa

#------------------------------------------------------------------------------
# Retrieve the content source for which the errors should be loaded
#------------------------------------------------------------------------------
# By default, use all content sources
[int] $contentSourceId = -1

# If a content source is provided, determine the ID
if ([string]::IsNullOrEmpty($ContentSourceName) -eq $false) {
    Write-Host "Retrieving content source with the name $ContentSourceName... " -NoNewline
    $contentSource = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity $ContentSourceName -ErrorAction SilentlyContinue

    if ($contentSource -eq $null) {
        Write-Host "Invalid content source provided" -ForegroundColor Red
        return
    }
    else {
        $contentSourceId = $contentSource.Id
        Write-Host "Content source found" -ForegroundColor Green
    }
}
else {
    Write-Host "No content source provided, all available content sources will be used" -ForegroundColor Yellow
}

#------------------------------------------------------------------------------
# Process the crawl errors per error type
#------------------------------------------------------------------------------
Write-Host ""
Write-Host "Checking errors from the crawl log"
$crawlLog.GetCrawlErrors($contentSourceId, 1) | ForEach-Object {
    Write-Host ([string]::Format("- {0}: {1}", $_.ErrorCount, $_.ErrorMessage))

    # Enable recrawl of errors for all errors except the recrawl on next crawl error
    if ($RecrawlErrors -and $_.ErrorID -ne $errorIdRetryNextCrawl) {
        Write-Host "`t- Marking the errors for recrawl on next crawl"

        # Get the first batch
        $processedItems = 0
        $errors = $crawlLog.GetCrawledUrls($false, $batchSize, "", $true, $contentSourceId, $errorLevel, $_.ErrorID, [datetime]::MinValue, [datetime]::MaxValue)

        do {
            Write-Host ([string]::Format("`t`t - Processing batch {0}/{1}... ", $processedItems, $processedItems + $batchSize)) -NoNewline

            # Recrawl the errors
            $errors | ForEach-Object {
                $crawlLog.RecrawlDocument($_.FullUrl) | Out-Null
            }
            Write-Host "done" -ForegroundColor Green
            $processedItems += $batchSize

            # Get the next batch
            $errors = $crawlLog.GetCrawledUrls($false, $batchSize, "", $true, $contentSourceId, $errorLevel, $_.ErrorID, [datetime]::MinValue, [datetime]::MaxValue)
        }
        while ($errors -ne $null)
        Write-Host ""
    }
}
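As a rough usage sketch: the script could be saved to a file and started from the SharePoint Management Shell as shown below. The file name Recrawl-CrawlErrors.ps1, the search service application name and the content source name are placeholders I made up for this example, so replace them with the values from your own farm.

```powershell
# Hypothetical file name and example names; adjust these to your own environment.

# Only list the crawl errors per error type, nothing is changed
.\Recrawl-CrawlErrors.ps1 -SearchServiceApplicationName "Search Service Application"

# List the errors of one content source and mark them for recrawl in the next crawl
.\Recrawl-CrawlErrors.ps1 -SearchServiceApplicationName "Search Service Application" -ContentSourceName "Local SharePoint sites" -RecrawlErrors
```

Running it once without -RecrawlErrors first is a cheap way to check how many errors of each type would be touched.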

Open PDF documents in your client application

July 2, 2015

When working with PDF files in SharePoint, most of the time they have to be opened in the browser, and most of the time that works. That is because an Adobe plugin in your browser checks whether the returned file has the content type application/pdf and, if so, opens the document in the browser.

For one of my customers this was not what they wanted. They wanted the option to open multiple PDF documents and show them next to each other on their screen (not in different browser windows). You can disable the Adobe browser plugin, but this will affect all PDFs you download, including those from other sources (the Internet etc.).

Okay, so how did we fix this? The most important part is that the Adobe plugin checks the content type (the MIME type) that is returned. So to start, we need to change it. Perform these steps on ALL of your front-end servers:

  1. Open Internet Information Services.
  2. On the GLOBAL level, navigate to the MIME types.
    The PublishingHttpModule handles the authorize request and also looks up the MIME type, based on the global settings. This means the change affects all web applications in your farm, whether you want it or not.
  3. Find the MIME type for the extension .pdf.
  4. Change the MIME type to application/pdf2.
  5. Perform an IISRESET.
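If you have several front-end servers, the steps above could also be scripted. This is a minimal sketch, assuming the WebAdministration module that ships with IIS is available on the server and that a MIME map entry for .pdf already exists on the global (applicationHost) level. Run it in an elevated PowerShell session on each front-end server.

```powershell
# Sketch only: assumes the IIS WebAdministration module is installed on this server
Import-Module WebAdministration

# The global (server level) MIME map entry for the .pdf extension
$filter = "system.webServer/staticContent/mimeMap[@fileExtension='.pdf']"

# Show the current MIME type before changing it
(Get-WebConfigurationProperty -PSPath 'MACHINE/WEBROOT/APPHOST' -Filter $filter -Name 'mimeType').Value

# Change the MIME type so the browser plugin no longer recognizes the file as a PDF
Set-WebConfigurationProperty -PSPath 'MACHINE/WEBROOT/APPHOST' -Filter $filter -Name 'mimeType' -Value 'application/pdf2'

# Restart IIS so the change takes effect
iisreset
```

To roll back, set the value to application/pdf again and perform another IISRESET.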

These actions prevent the PDF document from opening in the browser; instead, the browser shows the save dialog (which I don’t want).
Note: This dialog will only be shown for documents which haven’t been opened before. For other documents, you will have to clear your local browser cache!

If you want the document to open automatically in Adobe Reader or Writer, or whichever application you use to open PDF files, you can achieve this by updating your registry. This can of course also be set by a policy for your entire company. The following steps update your registry so that your PDF application opens automatically:

  • Start regedit on your client (or create a policy)
  • Navigate to HKEY_CURRENT_USER\Software\Microsoft\Windows\Shell
  • Navigate to AttachmentExecute or create this key if it doesn’t exist yet.
  • Navigate to {0002DF01-0000-0000-C000-000000000046} or create this key if it doesn’t exist yet.
  • Create a new Binary Value with the name AcroExch.Document.11

Note: The name of the binary value can differ within your company. AcroExch.Document.11 is used for Adobe Acrobat Reader. To check your value, open a command prompt and execute the command assoc .[extension], so in this case assoc .pdf
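The registry steps above could be scripted per user along these lines. Treat it as a sketch under assumptions: the post does not say what data the binary value should contain, so the example creates it empty, and the ProgID AcroExch.Document.11 is the one from the note, which may be different on your clients.

```powershell
# Check which ProgID is registered for .pdf on this client (assoc is a cmd built-in)
cmd /c assoc .pdf

# Assumption: the ProgID reported above is AcroExch.Document.11
$progId  = 'AcroExch.Document.11'
$keyPath = 'HKCU:\Software\Microsoft\Windows\Shell\AttachmentExecute\{0002DF01-0000-0000-C000-000000000046}'

# Create the key, including AttachmentExecute, if it doesn't exist yet
if (-not (Test-Path -Path $keyPath)) {
    New-Item -Path $keyPath -Force | Out-Null
}

# Create the binary value named after the ProgID
# Assumption: empty data, because the original steps only mention the name
New-ItemProperty -Path $keyPath -Name $progId -PropertyType Binary -Value ([byte[]]@()) -Force | Out-Null
```

Because the key lives under HKEY_CURRENT_USER, the script has to run in the context of each user, for example as a logon script, unless you distribute the setting through a policy.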

That’s it! All PDF documents stored within your SharePoint environment will now be opened within your client application.

Categories: Environment, SharePoint
Ben Prins

What I want to remember about SharePoint
