SharePoint: Get the Number of Pages in Each Document in a Library

18/10/2014

This blog post is about one of those questions you get asked where you wonder: is that even possible? The answer is pretty useless, and your time would probably be better spent on something else, but nevertheless you end up investigating it to prove to yourself that it can be done. The question I answer in this post is: “Can we count the pages in each individual document in a SharePoint document library in an automated way, to get the total sum, with either PowerShell or JavaScript?”

The short answer is of course yes; otherwise this blog post would not have seen the light of day. The long answer is the rest of the post.

My initial thought when I got the question from one of my colleagues, who got it on behalf of a client (I still wonder what the use case is), was that it must be pretty simple: all we have to do is use the Office Web Apps (OWA) preview service, as it gives us the total number of pages in a document. I knew the information was available to end users directly in SharePoint, so I just needed to find a way to automate it to get the total for all documents.

Unfortunately, it quickly turned out that calling OWA for this information is not really meant to be automated, at least not yet.

First step, figuring out how the preview pane works

I quickly dismissed solving this problem with PowerShell and went for a JavaScript solution. My first mission was to get an overview of how the preview pane interacts with the OWA server.
When a user clicks the “…” next to a document, a lot of things go on, as you can see if you inspect the requests in something like Fiddler. For my investigation, two requests are of particular interest.

First, a request to WopiFrame.aspx on your own SharePoint tenant (I tested this in Office 365 on a developer site, but I believe the concepts should be the same on-premises and on other plans):

https://.sharepoint.com/_layouts/15/WopiFrame.aspx?sourcedoc={08b71731-22cc-45b2-81cf-54881d903a14}&action=interactivepreview&wdSmallView=1

After this request, there is a request to Microsoft’s OWA server:

https://euc-word-view.officeapps.live.com/wv/docdatahandler.ashx?WOPIsrc=https%3A%2F%2F%2Esharepoint%2Ecom%2F%5Fvti%5Fbin%2Fwopi%2Eashx%2Ffiles%2F08b7173122cc45b281cf54881d903a14&access_token=&access_token_ttl=1413691112224&z=%2522%257B08B71731%252D22CC%252D45B2%252D81CF%252D54881D903A14%257D%252C1%2522&type=png&o15=1&ui=en-US&PdfMode=1

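In both requests the tenant name, which belongs directly in front of .sharepoint.com, has been left out and should be filled in with the name of your tenant. I have also removed the access_token value from the second request, as it is rather long.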
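As a minimal sketch, the first request can be assembled from a site URL and a file's unique ID. The function name buildWopiFrameUrl and the tenant "contoso" in the usage note are made up for illustration; only the path and query string are taken from the captured request above.

```javascript
// Hypothetical helper: builds the WopiFrame.aspx preview URL for one file.
// siteUrl  - root of the SharePoint site, e.g. the tenant URL (supplied by the caller)
// uniqueId - the file's GUID, with or without surrounding braces
function buildWopiFrameUrl(siteUrl, uniqueId) {
  // Normalise the GUID to the lower-case, brace-less form and re-add braces,
  // mirroring the sourcedoc value seen in the captured request.
  const id = uniqueId.replace(/[{}]/g, "").toLowerCase();
  // Drop any trailing slashes so the path is not doubled up.
  const base = siteUrl.replace(/\/+$/, "");
  return base +
    "/_layouts/15/WopiFrame.aspx?sourcedoc={" + id + "}" +
    "&action=interactivepreview&wdSmallView=1";
}
```

For example, buildWopiFrameUrl("https://contoso.sharepoint.com", "08b71731-22cc-45b2-81cf-54881d903a14") yields a URL with the same shape as the first request shown above.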

The WopiFrame.aspx page is interesting because it has an input element that contains the access token that we need to request the preview information from the OWA server.
I did some reflection on the code-behind of the page, but unfortunately all the methods needed for generating the access token are internal, so they are probably not meant to be used in your own code. The code does reveal, however, that the auth flow is an OAuth2 flow, like the one used elsewhere in Office 365. If you are interested in the code, take a look at the Microsoft.SharePoint.Utilities.SPWOPIHost class, in particular the GetAccessToken methods, in Microsoft.SharePoint.dll.

Besides the access token, WopiFrame.aspx also contains the access_token_ttl that we need to form the request to the OWA server. These two values are the important pieces we need to request the preview information, as the rest of the parameters in the URL can be obtained rather easily. Let’s run through the parameters.
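Reading the two values out of the WopiFrame.aspx body could look roughly like the sketch below. It assumes the page carries them in input elements named access_token and access_token_ttl; those element names are an assumption, not something confirmed above, and both function names are made up. A regular expression is used so that the sketch runs anywhere; inside a browser extension a DOM parser would be the more robust choice.

```javascript
// Returns the value attribute of the <input> whose name attribute equals `name`,
// or null when no such element is found. Attribute order does not matter.
function readInputValue(html, name) {
  const tags = html.match(/<input\b[^>]*>/gi) || [];
  for (const tag of tags) {
    const nameMatch = /\bname\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (nameMatch && nameMatch[1] === name) {
      const valueMatch = /\bvalue\s*=\s*["']([^"']*)["']/i.exec(tag);
      return valueMatch ? valueMatch[1] : null;
    }
  }
  return null;
}

// Assumption: the inputs are named "access_token" and "access_token_ttl".
function readWopiToken(html) {
  const accessToken = readInputValue(html, "access_token");
  const ttl = readInputValue(html, "access_token_ttl");
  if (!accessToken || !ttl) {
    throw new Error("access token fields not found in WopiFrame.aspx body");
  }
  return { accessToken: accessToken, accessTokenTtl: Number(ttl) };
}
```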

Parameter: Explanation
WOPIsrc: A pointer to the document you want the OWA server to return information about. The value is URL encoded, and it points to a special WOPI handler that needs the UniqueID of the file we want to process: https://.sharepoint.com/_vti_bin/wopi.ashx/files/. In my example the unique file ID is 08b7173122cc45b281cf54881d903a14.
access_token: The access token. We pass this token so that the OWA server can access files on our SharePoint server that would otherwise be inaccessible to the third party. The only reliable way of obtaining it that I have found is by reading the WopiFrame.aspx body.
access_token_ttl: The lifetime of the access token. In SharePoint Online the lifetime is 10 hours. The value in the request above (1413691112224) reads as an expiry time in milliseconds since the Unix epoch rather than as a duration.
z: I have not been able to figure out what this parameter is for, but as far as I can tell it is not needed; the request works just fine without it.
type: As far as I can tell, this tells OWA what type of preview image to generate.
o15: Unknown, but it looks like some sort of caller identifier.
ui: The UI language.
PdfMode: Optional parameter that is set to 1 when the file is a PDF.
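Putting the parameters above together, the second request could be assembled along these lines. This is a sketch under a few assumptions: the function name and option names are invented, the tenant in the usage note is a placeholder, type=png and o15=1 are simply copied from the captured request, and z is left out because the request reportedly works without it.

```javascript
// Hypothetical helper: builds the docdatahandler.ashx URL for one file.
// options.owaHost        - OWA host name, e.g. "euc-word-view.officeapps.live.com"
// options.siteUrl        - root URL of the SharePoint site
// options.uniqueId       - file GUID, in any of the usual notations
// options.accessToken    - token read from the WopiFrame.aspx body
// options.accessTokenTtl - ttl value read from the WopiFrame.aspx body
// options.ui             - optional UI language, defaults to "en-US"
// options.isPdf          - optional, adds PdfMode=1 when true
function buildDocDataUrl(options) {
  // The WOPI handler wants the GUID without braces or dashes.
  const fileId = options.uniqueId.replace(/[{}-]/g, "").toLowerCase();
  const wopiSrc = options.siteUrl.replace(/\/+$/, "") +
    "/_vti_bin/wopi.ashx/files/" + fileId;
  const params = [
    ["WOPIsrc", wopiSrc],
    ["access_token", options.accessToken],
    ["access_token_ttl", String(options.accessTokenTtl)],
    ["type", "png"], // copied from the captured request
    ["o15", "1"],    // copied from the captured request
    ["ui", options.ui || "en-US"],
  ];
  if (options.isPdf) {
    params.push(["PdfMode", "1"]);
  }
  const query = params
    .map(function (pair) { return pair[0] + "=" + encodeURIComponent(pair[1]); })
    .join("&");
  return "https://" + options.owaHost + "/wv/docdatahandler.ashx?" + query;
}
```

Note that encodeURIComponent leaves "." and "_" unescaped, whereas the captured request escapes them as %2E and %5F. Both forms decode to the same value.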

Wrapping it up in a Chrome Extension

With the above understanding of the OWA integration, I was able to build a quick and dirty Chrome Extension that can calculate the number of pages for each document in a document library.

The extension uses the SharePoint REST API to pull the UniqueID of every file in a document library, and then requests WopiFrame.aspx for each individual file with the correct query string. With the access token in hand, the extension calls the Office 365 OWA server on euc-word-view.officeapps.live.com for each file to get the preview information. Here’s an example of what the response returned by docdatahandler.ashx looks like:
[xml]
<?xml version="1.0" encoding="utf-8"?>
<docdata>
<document pages="1" hasIds="false" hasComments="false" dypInch="294912" dxpInch="294912">
<pageset count="128" height="3244032" width="2506752" />
</document><status>Success</status><dialog><title /><description /><errorId>00000000-0000-0000-0000-000000000000, 20141018181753</errorId></dialog>
</docdata>
[/xml]
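To pull the numbers out of a response like the one above, something along these lines would do. The post does not say which attribute carries the page count, so this sketch returns both pages (on document) and count (on pageset) and leaves the choice to the caller. The function names are made up, and regular expressions are used only to keep the sketch free of browser dependencies; in an extension, DOMParser would be the more natural tool.

```javascript
// Returns the value of `attribute` on the first <element ...> tag, or null.
function readAttribute(xml, element, attribute) {
  const tag = new RegExp("<" + element + "\\b[^>]*>", "i").exec(xml);
  if (!tag) {
    return null;
  }
  const attr = new RegExp("\\b" + attribute + '\\s*=\\s*"([^"]*)"', "i").exec(tag[0]);
  return attr ? attr[1] : null;
}

// Converts an attribute string to a number, keeping null for "not present".
function toNumberOrNull(value) {
  return value === null ? null : Number(value);
}

// Extracts the status text and both candidate page figures from a docdata response.
function parseDocData(xml) {
  const status = /<status>([^<]*)<\/status>/i.exec(xml);
  return {
    status: status ? status[1] : null,
    documentPages: toNumberOrNull(readAttribute(xml, "document", "pages")),
    pagesetCount: toNumberOrNull(readAttribute(xml, "pageset", "count")),
  };
}
```

Checking status before using the numbers seems sensible, since the response also carries a dialog element with an error ID.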

The last step, calling the Office 365 OWA server, is the reason I built this as a Chrome Extension: had I built it as a SharePoint App, that request would have been blocked by cross-domain restrictions.

The last thing I would like to draw attention to is that calling the SharePoint REST API without an Accept header now returns the minimal OData response. That is good if you want to save bandwidth, but by default it doesn’t return the UniqueID of the SPFile, so I had to use odata=verbose to get the information I needed.
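A request for the file IDs with the verbose Accept header could be prepared roughly as follows. The post does not say which endpoint the extension calls, so the use of GetFolderByServerRelativeUrl (which lists only the files directly in that folder) is an assumption, as are the function names and the "contoso" tenant in the usage note.

```javascript
// Hypothetical helper: describes the REST request for the files in one folder.
// Returns the URL and headers rather than sending anything, so the caller can
// pass them to fetch or XMLHttpRequest.
function buildFilesRequest(siteUrl, folderServerRelativeUrl) {
  // Single quotes inside an OData string literal are escaped by doubling them.
  const folder = folderServerRelativeUrl.replace(/'/g, "''");
  return {
    url: siteUrl.replace(/\/+$/, "") +
      "/_api/web/GetFolderByServerRelativeUrl('" + encodeURI(folder) + "')/Files" +
      "?$select=Name,UniqueId",
    // odata=verbose makes the response include UniqueId, as described above.
    headers: { Accept: "application/json;odata=verbose" },
  };
}

// Picks name and unique ID out of a verbose response body ({ d: { results: [...] } }).
function extractUniqueIds(body) {
  return body.d.results.map(function (file) {
    return { name: file.Name, uniqueId: file.UniqueId };
  });
}
```

In the extension this could be used as fetch(request.url, { headers: request.headers, credentials: "include" }), followed by extractUniqueIds on the parsed JSON, with request built for example from buildFilesRequest("https://contoso.sharepoint.com", "/Shared Documents").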

If you are interested in seeing how I hacked the app together, feel free to take a look at my GitHub repo. Beware that this app is just a prototype: I got kinda fed up with the whole idea once I figured out that it was possible, so I didn’t take the time to turn it into a nice product.