Script to recrawl all crawl errors

I’m currently working on a document management project that holds more than 10 million documents. During a full crawl we had a temporary network issue, which resulted in 340,000 crawl errors. I didn’t want to start another full crawl, since the crawl itself did finish and had processed all documents. Instead, I wanted those items to be picked up in the next incremental crawl. In Central Administration you can select the option “Recrawl the item in the next crawl” for each item that caused an error, but I obviously didn’t want to select this option manually for every error.

To automate this, I’ve created a PowerShell script that lists the errors and can optionally mark all of them for recrawl. The script is explained in its comments.

#------------------------------------------------------------------------------
# Provide parameters
#------------------------------------------------------------------------------
param (
    # Name of the search service application is mandatory
    [string] $SearchServiceApplicationName = $(throw "Please specify a search service application"),
    # By default, use all available content sources
    [string] $ContentSourceName = "",
    # By default only a list of the errors is shown
    [switch] $RecrawlErrors = $false
)

#------------------------------------------------------------------------------
# Ensure the SharePoint PowerShell Snapin is loaded
#------------------------------------------------------------------------------
if ((Get-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue) -eq $null) {
    Add-PSSnapin "Microsoft.SharePoint.PowerShell"
}

#------------------------------------------------------------------------------
# Set some constant values
#------------------------------------------------------------------------------
# The id of the error stating a document will be processed in the next crawl
[int] $errorIdRetryNextCrawl = 437
# The number of documents which should be retrieved per batch from the ssa
[int] $batchSize = 1000
# 2 stands for Errors
[int] $errorLevel = 2

#------------------------------------------------------------------------------
# Retrieve the search service application and crawl log
#------------------------------------------------------------------------------
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity $SearchServiceApplicationName
$crawlLog = New-Object Microsoft.Office.Server.Search.Administration.CrawlLog $ssa

#------------------------------------------------------------------------------
# Retrieve the content source for which the errors should be loaded
#------------------------------------------------------------------------------
# Default use all content sources
[int] $contentSourceId = -1

# If a content source is provided, determine the ID
if ([string]::IsNullOrEmpty($ContentSourceName) -eq $false) {
    Write-Host "Retrieving content source with the name $ContentSourceName... " -NoNewline
    $contentSource = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity $ContentSourceName -ErrorAction SilentlyContinue

    if ($contentSource -eq $null) {
        Write-Host "Invalid content source provided" -ForegroundColor Red
        return
    }
    else {
        $contentSourceId = $contentSource.Id
        Write-Host "Content source found" -ForegroundColor Green
    }
}
else {
    Write-Host "No content source provided, all available content sources will be used" -ForegroundColor Yellow
}

#------------------------------------------------------------------------------
# Process the crawl errors per error type
#------------------------------------------------------------------------------
Write-Host ""
Write-Host "Checking errors from the crawl log"
$crawlLog.GetCrawlErrors($contentSourceId, 1) | ForEach-Object {
    Write-Host ([string]::Format("- {0}: {1}", $_.ErrorCount, $_.ErrorMessage))

    # Enable recrawl of errors for all errors except the recrawl on next crawl error
    if ($RecrawlErrors -and $_.ErrorID -ne $errorIdRetryNextCrawl) {
        Write-Host "`t- Marking the errors for recrawl on next crawl"

        # Get the first batch
        $processedItems = 0
        $errors = $crawlLog.GetCrawledUrls($false, $batchSize, "", $true, $contentSourceId, $errorLevel, $_.ErrorID, [datetime]::MinValue, [datetime]::MaxValue)

        do {
            Write-Host ([string]::Format("`t`t- Processing batch {0}/{1}... ", $processedItems, $processedItems + $batchSize)) -NoNewline

            # Recrawl the errors
            $errors | ForEach-Object {
                $crawlLog.RecrawlDocument($_.FullUrl) | Out-Null
            }
            Write-Host "done" -ForegroundColor Green
            $processedItems += $batchSize

            # Get the next batch with the same query; items marked for recrawl
            # are expected to no longer match this error ID
            $errors = $crawlLog.GetCrawledUrls($false, $batchSize, "", $true, $contentSourceId, $errorLevel, $_.ErrorID, [datetime]::MinValue, [datetime]::MaxValue)

            # GetCrawledUrls returns a DataTable, so also stop on an empty table
        } while ($errors -ne $null -and $errors.Rows.Count -gt 0)
        Write-Host ""
    }
}
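
As a usage sketch, assuming the script above is saved as Recrawl-CrawlErrors.ps1 (a hypothetical file name) and that the names of the search service application and content source below match your farm (they are placeholders, not values from this project), it could be called like this from the SharePoint Management Shell:

```powershell
# List the crawl errors only; nothing is changed
.\Recrawl-CrawlErrors.ps1 -SearchServiceApplicationName "Search Service Application"

# Mark the errors of one content source for recrawl in the next crawl
.\Recrawl-CrawlErrors.ps1 -SearchServiceApplicationName "Search Service Application" -ContentSourceName "Local SharePoint sites" -RecrawlErrors
```

Running the list-only variant first gives you an idea of how many items each error type contains before anything is marked.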


Open PDF documents in your client application

July 2, 2015

When working with PDF files in SharePoint, they usually open in the browser, and most of the time that works fine. This happens because the Adobe plugin in your browser checks whether the returned file has the content type application/pdf and, if so, opens the document within the browser.

For one of my customers this was not what they wanted. They wanted to be able to open multiple PDF documents and show them next to each other on their screen (not in different browser windows). You can disable the Adobe browser plugin, but this affects every PDF you download, including those from other sources (the Internet etc.).

OK, so how did we fix this? The most important part is that the Adobe plugin checks the content type (the MIME type) that is returned. So to start, we need to change this one. Perform these steps on ALL of your front-end servers:

  1. Open Internet Information Services.
  2. On the GLOBAL level, navigate to the MIME types.
    The PublishingHttpModule handles the authorize request and also looks up the MIME type, based on the global settings. This means the change applies to all web applications within your farm, whether you want it or not.
  3. Find the MIME type for the extension pdf.
  4. Change the MIME type to application/pdf2.
  5. Perform an IISRESET.

These actions prevent the PDF document from opening in the browser; instead you get the save dialog (which is not what I want).
Note: This dialog will only be shown for documents which haven’t been opened before. For other documents, you will have to clear your local browser cache!
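
The IIS steps above can probably also be scripted, which helps when you have several front-end servers. This is a minimal sketch, assuming the WebAdministration module that ships with IIS is available and that a .pdf entry already exists in the global MIME map. It is not part of the original solution, so try it on a non-production server first:

```powershell
Import-Module WebAdministration

# The server-level (global) settings are stored in applicationHost.config
$configPath = 'MACHINE/WEBROOT/APPHOST'
$filter = "system.webServer/staticContent/mimeMap[@fileExtension='.pdf']"

# Show the current MIME type for the .pdf extension
(Get-WebConfigurationProperty -PSPath $configPath -Filter $filter -Name 'mimeType').Value

# Change the MIME type so the browser plugin no longer recognises the response
Set-WebConfigurationProperty -PSPath $configPath -Filter $filter -Name 'mimeType' -Value 'application/pdf2'

# Restart IIS to apply the change
iisreset
```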

If you want the document to open automatically in Adobe Reader or Writer, or whichever application you use to open PDF files, you can achieve this by updating your registry. This can of course also be set by a policy for your entire company. Use the following steps to update your registry so that your PDF application opens automatically:

  • Start regedit on your client (or create a policy)
  • Navigate to HKEY_CURRENT_USER\Software\Microsoft\Windows\Shell
  • Navigate to AttachmentExecute or create this key if it doesn’t exist yet.
  • Navigate to {0002DF01-0000-0000-C000-000000000046} or create this key if it doesn’t exist yet.
  • Create a new Binary Value with the name AcroExch.Document.11

Note: The name of the binary value can differ per company. AcroExch.Document.11 is used for Adobe Acrobat Reader. To check your value, open a command prompt and execute the command assoc .[extension], so in this case assoc .pdf
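
For a single client, the registry steps could be scripted roughly as follows. This is a sketch under assumptions: both function names are made up for this example, the ProgID is taken from the output of assoc .pdf, and the binary value is created without any data, just as in the manual steps.

```powershell
# Extracts the ProgID from assoc output such as ".pdf=AcroExch.Document.11"
function Get-ProgIdFromAssocOutput([string] $assocOutput) {
    if ([string]::IsNullOrEmpty($assocOutput) -or -not $assocOutput.Contains('=')) {
        return $null
    }
    return ($assocOutput -split '=', 2)[1].Trim()
}

# Creates the (empty) binary value below the AttachmentExecute key
function Set-AutoOpenForProgId([string] $progId) {
    $key = 'HKCU:\Software\Microsoft\Windows\Shell\AttachmentExecute\{0002DF01-0000-0000-C000-000000000046}'
    if (-not (Test-Path -LiteralPath $key)) {
        # -Force also creates missing parent keys
        New-Item -Path $key -Force | Out-Null
    }
    New-ItemProperty -LiteralPath $key -Name $progId -PropertyType Binary -Value ([byte[]] @()) -Force | Out-Null
}
```

On a client where .pdf has an association, this could be called as Set-AutoOpenForProgId (Get-ProgIdFromAssocOutput (cmd /c assoc .pdf)). The cmd /c prefix is needed because assoc is a built-in command of cmd.exe, not a PowerShell cmdlet.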

That’s it! All PDF documents stored within your SharePoint environment will now be opened within your client application.

Categories: Environment, SharePoint

Cross-domain errors with SharePoint Apps

When building SharePoint Apps, JavaScript can be used to communicate with your SharePoint environment. Lately I’ve had a couple of questions about how this works with CORS (Cross-Origin Resource Sharing).

The problem people faced was that SharePoint was hosted on a URL like https://mytenant.sharepoint.com while the app itself was hosted on a URL like https://myapp.whatever.com. When developing Apps for SharePoint it’s a common best practice to use completely different domains for security purposes (app isolation).

Within SharePoint there is something called the Cross-Domain library. This is not a document library within SharePoint, but a JavaScript file (SP.RequestExecutor.js) which contains functions that allow you to perform CRUD operations within SharePoint from a different domain. It basically works as a proxy.

There are plenty of examples on how this library can be used, for example the article Access SharePoint 2013 data from apps using the cross-domain library on MSDN, but they still had issues getting it to work cross-domain.

The problem is that a lot of companies have added their SharePoint URL to the Trusted Sites or Local Intranet zone in their browser settings, but not the URL where the app is hosted. The cross-domain calls only work if BOTH URLs are in the same zone, or if neither is added to a zone at all. They will not work when the URLs are placed in different security zones.

Categories: SharePoint

Forcefully delete site collection

August 21, 2014

Today I found a site collection on a customer environment which gave a completely blank page when you opened it in a browser. It didn’t give a 404 (Not Found) error; it was just a blank page. I decided to figure out what was happening and found that an IISRESET had taken place during the creation of the site collection. Because of this, the site wasn’t completely provisioned. Well, if it wasn’t completely provisioned, I don’t need it. Nobody could have added content.

I found out that I couldn’t remove the site using Central Administration. When you navigate to the site collection using the “Delete a site collection” page, the details (right-hand side of the page) are not loaded and you cannot select the site collection. So I wanted to delete the site using PowerShell, but this gave me an error:

PS C:\Users\macaw> remove-spsite http://dms/case/P68430
Confirm
Are you sure you want to perform this action?
Performing the operation “Remove-SPSite” on target “http://dms/case/P68430“.
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is “Y”): Y
remove-spsite : <nativehr>0x80070003</nativehr><nativestack></nativestack>
At line:1 char:1
+ remove-spsite http://dms/case/P68430
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidData: (Microsoft.Share…mdletRemoveSite:SPCmdletRemoveSite) [Remove-SPSite], DirectoryNotFoundException
+ FullyQualifiedErrorId : Microsoft.SharePoint.PowerShell.SPCmdletRemoveSite

Apparently, the normal remove-spsite cmdlet cannot delete a site collection which is not fully provisioned, and this cmdlet doesn’t have a force flag. To forcefully delete the site collection, I used the SPContentDatabase.ForceDeleteSite method:

$siteUrl = "http://dms/case/P68430"
$site = Get-SPSite $siteUrl
$siteId = $site.Id
$siteDatabase = $site.ContentDatabase
# Arguments: site ID, delete AD accounts ($false), gradual delete ($false)
$siteDatabase.ForceDeleteSite($siteId, $false, $false)

Create lookup field using PowerShell and CSOM

May 19, 2014

For our projects we always try to avoid manual configuration, because it is a tedious and error-prone process when you work with a DTAP environment. For the same reason we try to script as much as possible for SharePoint Online projects. Lately we have been creating lookup fields in SharePoint Online, using PowerShell and CSOM. Creating fields this way is pretty easy, but connecting lookup fields forced us to think about casting the Microsoft.SharePoint.Client.Field object to a Microsoft.SharePoint.Client.FieldLookup object.

Within CSOM this can be done with the ClientRuntimeContext.CastTo method, but this is a generic method (it takes a type parameter T), and PowerShell has no simple syntax for calling generic methods. To call it anyway, you can use reflection: get the method definition and create a typed version of it with the MakeGenericMethod method.
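
The reflection pattern itself can be illustrated without SharePoint. The sketch below uses System.Linq.Enumerable.Repeat, a generic static method from the .NET base class library that is chosen here purely as a stand-in for CastTo: fetch the generic method definition, close it over a concrete type with MakeGenericMethod, and invoke it.

```powershell
# Get the open generic method definition Repeat<T>(T element, int count)
$repeatGeneric = [System.Linq.Enumerable].GetMethod("Repeat")

# Close the generic method over a concrete type, here string
$repeatString = $repeatGeneric.MakeGenericMethod([string])

# Static method: pass $null as the instance and the arguments as an array
$repeated = @($repeatString.Invoke($null, @("abc", 3)))

Write-Host ("Items returned: " + $repeated.Count)
```

For CastTo the differences are that it is an instance method, so the client context is passed as the first argument to Invoke, and that the type argument is FieldLookup.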

The full PowerShell script is provided below:

#-------------------------------------------------------------
# LOAD CLIENT ASSEMBLIES
#-------------------------------------------------------------
$clientAssembliesFolder = "D:\ClientAssemblies"
Add-Type -Path (Join-Path -Path $clientAssembliesFolder -ChildPath "Microsoft.SharePoint.Client.dll")
Add-Type -Path (Join-Path -Path $clientAssembliesFolder -ChildPath "Microsoft.SharePoint.Client.Runtime.dll")

#-------------------------------------------------------------
# INITIALIZE CONTEXT
#-------------------------------------------------------------
[string]$siteUrl = "https://[UseYourOwn].sharepoint.com/sites/Dev"
[string]$username = "admin@[UseYourOwn].onmicrosoft.com"
[string]$password = "[UseYourOwn]"
# $pwd is an automatic variable in PowerShell, so a different name is used here
$securePassword = $password | ConvertTo-SecureString -AsPlainText -Force
$context = New-Object Microsoft.SharePoint.Client.ClientContext($siteUrl)
$credentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($username, $securePassword)
$context.Credentials = $credentials

#-------------------------------------------------------------
# LOAD CASTTO FOR LOOKUPS
#-------------------------------------------------------------
$castToMethodGeneric = [Microsoft.SharePoint.Client.ClientContext].GetMethod("CastTo")
$castToMethodLookup = $castToMethodGeneric.MakeGenericMethod([Microsoft.SharePoint.Client.FieldLookup])

#-------------------------------------------------------------
# LOAD LISTS
#-------------------------------------------------------------
[string] $originalListTitle = "List1"
[string] $destinationListTitle = "List2"
$listOriginal = $context.Web.Lists.GetByTitle($originalListTitle)
$context.Load($listOriginal)
$listDestination = $context.Web.Lists.GetByTitle($destinationListTitle)
$context.Load($listDestination)
$context.ExecuteQuery() # This loads the necessary list ID

#-------------------------------------------------------------
# CREATE LOOKUP
#-------------------------------------------------------------
[string] $internalName = "LookupWithStaticName"
[string] $displayName = "LookupTest"
[string] $displayFieldForLookup = "Title"
[string] $lookupFieldXML = "<Field DisplayName=`"$internalName`" Type=`"Lookup`" />"
$option = [Microsoft.SharePoint.Client.AddFieldOptions]::AddFieldToDefaultView

$newLookupField = $listDestination.Fields.AddFieldAsXml($lookupFieldXML, $true, $option)
$context.Load($newLookupField)
# Cast the generic Field to a FieldLookup so the lookup properties can be set
$lookupField = $castToMethodLookup.Invoke($context, $newLookupField)
$lookupField.Title = $displayName
$lookupField.LookupList = $listOriginal.Id
$lookupField.LookupField = $displayFieldForLookup
$lookupField.Update()
$context.ExecuteQuery()

SharePoint 2013 warm-up script

For SharePoint on-premises platforms it’s good practice to use a warm-up script to avoid long loading times in the morning. By default IIS recycles the web application pools every night to clean up the memory, which is a good practice in itself. Todd Klindt has written a nice post about the Invoke-WebRequest cmdlet, which is available in PowerShell v3, and how to use it as the basis for your warm-up script.

I used it as a basis and created the script you find below. Important notes:

  • The script will load the start page of the root site collection of every web application.
  • Different types of web templates use different assemblies. If you want to preload all assemblies, make sure you load sites of the different types. The additionalUrls array in the script is used for that.
  • When you use multiple front-end servers, you want to schedule the script on all of them. Also make sure requests made on the server itself don’t go through the load balancer; you can do this by updating the hosts file.

#------------------------------------------------------
# Ensure the SharePoint Snapin has been loaded
#------------------------------------------------------
if ( (Get-PSSnapin -Name "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue) -eq $null ) {
    Add-PSSnapin "Microsoft.SharePoint.PowerShell"
}

#------------------------------------------------------
# Simple method to write status code with a colour
#------------------------------------------------------
function Write-Status([Microsoft.PowerShell.Commands.WebResponseObject] $response) {
    $foregroundColor = "DarkRed"
    if ($response.StatusCode -eq 200) {
        $foregroundColor = "DarkGreen"
    }
    Write-Host ([string]::Format("{0} (Status code: {1})", $response.StatusDescription, $response.StatusCode)) -ForegroundColor $foregroundColor
}

#------------------------------------------------------
# Warm-up all web applications
#------------------------------------------------------
Get-SPWebApplication | ForEach-Object {
    Write-Host ([string]::Format("WebApplication request fired for {0} [{1}]... ", $_.DisplayName, $_.Url)) -NoNewline
    Write-Status -response (Invoke-WebRequest $_.Url -UseDefaultCredentials -UseBasicParsing)
}

#------------------------------------------------------
# Since the root of web applications use different templates than other site collections, also load other sites of different
# types. This ensures their assemblies also get loaded in memory
#------------------------------------------------------
$additionalUrls = @("http://developmentserver/sites/search",
                    "http://developmentserver/site/teamsite")
$additionalUrls | ForEach-Object {
    Write-Host ([string]::Format("Additional web request fired for Url: {0}... ", $_)) -NoNewline
    Write-Status -response (Invoke-WebRequest $_ -UseDefaultCredentials -UseBasicParsing)
}
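
The hosts-file note above could be handled with a small helper like the one below. It is only a sketch: the function name and the host name intranet.contoso.local are made up for this example, it assumes the script runs elevated on the front-end server itself, and 127.0.0.1 may need to be replaced by the server's own IP address depending on your IIS bindings.

```powershell
# Adds a hosts entry that points a host name to the local server, so warm-up
# requests made on this server do not go through the load balancer
function Add-LocalHostsEntry([string] $hostName, [string] $hostsPath = "$env:SystemRoot\System32\drivers\etc\hosts") {
    $entry = "127.0.0.1`t$hostName"
    $existing = @(Get-Content -Path $hostsPath -ErrorAction SilentlyContinue)

    # Only append the entry when it is not present yet
    if ($existing -notcontains $entry) {
        Add-Content -Path $hostsPath -Value $entry
        return $true
    }
    return $false
}
```

A call such as Add-LocalHostsEntry "intranet.contoso.local" returns $true when the entry was added and $false when it was already there, so it is safe to run on every scheduled warm-up.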

 

 

Re-activating web features within web application

One of my projects is a huge SharePoint 2013 on-premises platform with 200,000+ (sub) sites. I’ve created a custom web template to ensure all sites are created the same way, with the same settings. A web template works very well for these environments, but when you update the template, the changes are not applied to existing sites. The web template is only applied when new sites are created.

I will not throw away and re-create all sites whenever we have updates; instead I re-activate certain features to ensure the updates are applied.

The script I’m using is as follows:

#---------------------------------------------------------------------
# Add SharePoint PowerShell Snapin
#---------------------------------------------------------------------
if ( (Get-PSSnapin -Name Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue) -eq $null ) {
    Add-PSSnapin Microsoft.SharePoint.PowerShell
}

#---------------------------------------------------------------------
# Set variables
#---------------------------------------------------------------------
$webApplicationUrl = "http://veemssdev02"
$featureIds = @("e4acfa03-b1e6-4eed-aeab-1bd17551aa59", "Macaw.SP2013.Intranet.InSite_AddDefaultPage_Web")

#---------------------------------------------------------------------
# Reactivate features
#---------------------------------------------------------------------
Get-SPWebApplication -Identity $webApplicationUrl | Get-SPSite -Limit All | Get-SPWeb -Limit All | ForEach-Object {
    Write-Host ([string]::Format("Testing web {0} [{1}]", $_.Title, $_.Url))

    foreach ($featureId in $featureIds) {
        # A feature can be specified by its ID or by its display name
        $feature = $_.Features | where {$_.DefinitionId -eq $featureId -or $_.Definition.DisplayName -eq $featureId}
        if ($feature -ne $null) {
            Write-Host ([string]::Format("`t- Feature {0} ({1}) found. Re-enabling the feature.", $feature.Definition.DisplayName, $feature.DefinitionId))
            Write-Host "`t`t- Disabling feature"
            Disable-SPFeature -Identity $featureId -Url $_.Url -Confirm:$false
            Write-Host "`t`t- Enabling feature"
            Enable-SPFeature -Identity $featureId -Url $_.Url -Confirm:$false -Force
        }
    }
}

When you do not want to re-activate features but want to enable new ones, you can use the same code; just remove the feature check (if($feature -ne $null)) and the Disable-SPFeature call.
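
As a sketch of that variation, assuming the same web application URL as above and a hypothetical feature name (Contoso.NewFeature_Web is made up for this example), the loop shrinks to:

```powershell
if ( (Get-PSSnapin -Name Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue) -eq $null ) {
    Add-PSSnapin Microsoft.SharePoint.PowerShell
}

$webApplicationUrl = "http://veemssdev02"
# Hypothetical feature name; replace it with your own feature ID or name
$featureIds = @("Contoso.NewFeature_Web")

Get-SPWebApplication -Identity $webApplicationUrl | Get-SPSite -Limit All | Get-SPWeb -Limit All | ForEach-Object {
    foreach ($featureId in $featureIds) {
        Write-Host ([string]::Format("Enabling feature {0} on {1}", $featureId, $_.Url))
        # -Force makes the activation run again if the feature is already active
        Enable-SPFeature -Identity $featureId -Url $_.Url -Confirm:$false -Force
    }
}
```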

Categories: PowerShell, SharePoint
Ben Prins

What I want to remember about SharePoint
