Archive

Posts Tagged ‘PowerShell’

Script to recrawl all crawl errors

I’m currently working on a document management project, which holds 10+ million documents. During a full crawl we had a temporary network issue, which resulted in 340.000 crawl errors. I didn’t want to do a new full crawl again, since the full crawl did finish with all documents. Instead, I want those items to be picked up in the next incremental crawl. Using Central Administration you can select the option “Recrawl the item in the next crawl” for each item which caused an error, but I obviously didn’t want to manually select this option for all errors.

To automate this, I’ve created a PowerShell script which can list the errors, but can also mark all errors automatically for the recrawl. The explanation of the script can be found in the comments of the script.

#——————————————————————————
# Provide parameters
#——————————————————————————
param (
   # Name of the search service application is mandatory
   [string] $SearchServiceApplicationName = $(throw “Please specify a search service application”),
  # By default, use all available content sources
   [string] $ContentSourceName = “”,
   # By default only a list of the errors is shown
   [switch] $RecrawlErrors = $false
)

#——————————————————————————
# Ensure the SharePoint PowerShell Snapin is loaded
#——————————————————————————
if ((Get-PSSnapin “Microsoft.SharePoint.PowerShell” -ErrorAction SilentlyContinue) -eq $null) {
    Add-PSSnapin “Microsoft.SharePoint.PowerShell”
}

#——————————————————————————
# Set some constant values
#——————————————————————————
# The id of the error stating a document will be processed in the next crawl
[int] $errorIdRetryNextCrawl = 437
# The number of documents which should be retrieved per batch from the ssa
[int] $batchSize = 1000
# 2 stands for Errors
[int] $errorLevel = 2

#——————————————————————————
# Retrieve the seach service application and crawl log
#——————————————————————————
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity $SearchServiceApplicationName
$crawlLog = New-Object Microsoft.Office.Server.Search.Administration.CrawlLog $ssa

#——————————————————————————
# Retrieve the content source for which the errors should be loaded
#——————————————————————————
# Default use all content sources
[int] $contentSourceId = -1

# If a content source is provided, determine the ID
if([string]::IsNullOrEmpty($ContentSourceName) -eq $false) {
    write-host “Retrieving content source with the name $ContentSourceName… ” -NoNewline
    $contentSource = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity $ContentSourceName -ErrorAction SilentlyContinue

    if($contentSource -eq $null) {
       write-host “Invalid content source provided” -ForegroundColor Red
      return
   }
    else {
       $contentSourceId = $contentSource.Id
write-host “Content source found” -ForegroundColor Green
    }
}
else {
   write-host “No content source provided, all available content sources will be used”-ForegroundColor Yellow
}

#——————————————————————————
# Process the crawl errors per error type
#——————————————————————————
write-host “”
write-host “Checking errors from the crawl log”
$crawlLog.GetCrawlErrors($contentSourceId, 1) | ForEach-Object {
    write-host ([string]::Format(“- {0}: {1}”, $_.ErrorCount, $_.ErrorMessage ))

    # Enable recrawl of errors for all errors except the recrawl on next crawl error
    if($RecrawlErrors -and $_.ErrorID -ne $errorIdRetryNextCrawl) {
       write-host “`t- Marking the errors for recrawl on next crawl”
# Get the first batch
       $processedItems = 0
       $errors = $crawlLog.GetCrawledUrls($false, $batchSize, “”, $true, $contentSourceId, $errorLevel, $_.ErrorID, [datetime]::MinValue,[datetime]::MaxValue)

       DO{
          write-host ([string]::Format(“`t`t – Processing batch {0}/{1}… “, $processedItems, $processedItems + $batchSize)) -NoNewline
   
# Recrawl the errors
          $errors | ForEach-Object {
          $crawlLog.RecrawlDocument($_.FullUrl) | Out-Null
       }
       write-host “done” -ForegroundColor Green
       $processedItems += $batchSize

       # Get the next batch
   $errors = $crawlLog.GetCrawledUrls($false, $batchSize, “”, $true, $contentSourceId, $errorLevel, $_.ErrorID, [datetime]::MinValue,[datetime]::MaxValue)

    }
    while ($errors -ne $null)
       write-host “”
    }
}

Advertisements
Ben Prins

What I want to remember about SharePoint

blog.frederique.harmsze.nl

my world of work and user experiences

Bram de Jager - Coder, Speaker, Author

Office 365, SharePoint and Azure