When Data Export Fails

Kaspars Ozols 23.08.2015 17.00.00

The problem

To export all the pages in one go, I ran the export on the feature root page. Unfortunately, the process always ended with this error:

[Exporting content 2069] Failed to export property MainBody, exception: Invalid URI: The Uri string is too long.

At first glance, the error message seemed very informative: it even contained the page ID and the property name. In the end it turned out to be misleading precisely because it was so detailed.

I looked up the page by the ID from the error message and expected to see something interesting in the MainBody property. There was nothing interesting. To be more precise, there was nothing at all: the property was completely empty. I even checked whether the page had older versions with content in MainBody, but only this single page version existed.

At this point I started to scratch my head, as I could not find anything wrong with this page. As I didn't want to spend any more time on this issue, I thought it would be faster to re-create this single page manually. So I just deleted the page and ran the Export data operation again.

Bam! The same message occurred, but this time with a different page ID.

And again, I checked the page by ID and found that its MainBody property was empty. Obviously, the cause was not the pages named in the error message but something else. The error message was simply wrong and misleading.

Digging Deeper

My next step was to look into the EPiServer log file for more details, where I found the full error message with the call stack:

15-08-14 13:20:38,070 [51] ERROR EPiServer.Core.Transfer.TransferLogger: Export/import Exception: 
System.UriFormatException: Invalid URI: The Uri string is too long.
at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
at System.UriBuilder..ctor(String uri)
at EPiServer.UrlBuilder.Init(String url)
at EPiServer.Core.Html.StringParsing.UrlFragmentHandler.ExtractPermanentLink(FragmentParserContext context, AttributeFragment attribute, String permanentLink, String prefix, String suffix)
at EPiServer.Core.Transfer.UrlForExportFragmentHandler.ParsePossibleLink(FragmentParserContext context, AttributeFragment attribute, String attributeValue, String prefix, String suffix)
at EPiServer.Core.Html.StringParsing.UrlFragmentHandler.TryToExtractLinkFromAttribute(FragmentParserContext context, AttributeFragment attribute)
at EPiServer.Core.Html.StringParsing.UrlFragmentHandler.ParseAttribute(FragmentParserContext context, ElementFragment parentFragment, AttributeFragment attribute)
at EPiServer.Core.Html.StringParsing.FragmentParser.ParseAttribute(FragmentParserContext context, ElementFragment fragment, AttributeFragment attribute)
at EPiServer.Core.Html.StringParsing.FragmentParser.ParseAttributes(FragmentParserContext context, ElementFragment fragment)
at EPiServer.Core.Html.StringParsing.FragmentParser.ParseStaticElement(FragmentParserContext context, ElementFragment element)
at EPiServer.Core.Html.StringParsing.FragmentParser.ParseElement(FragmentParserContext context, ElementFragment element)
at EPiServer.Core.Html.StringParsing.FragmentParser.ProcessFragments(FragmentParserContext context, Boolean returnOnEndElement)
at EPiServer.Core.Html.StringParsing.FragmentParser.Parse(String html, FragmentParserMode parserMode, Boolean evaluateHash)
at EPiServer.Core.Transfer.PropertyXhtmlTransform.ExportEventHandler(Object sender, TransformPropertyEventArgs e)
at System.EventHandler`1.Invoke(Object sender, TEventArgs e)
at EPiServer.Enterprise.DataExporter.OnExportProperty(TransformPropertyEventArgs e)
at EPiServer.Core.Transfer.ContentTransfer.ExportProperties(RawContent contextContent, RawProperty[] properties, String masterLanguage)
System.UriFormatException: Invalid URI: The Uri string is too long.
at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
at System.UriBuilder..ctor(String uri)
at EPiServer.UrlBuilder.Init(String url)
at EPiServer.Core.Html.StringParsing.UrlFragmentHandler.ExtractPermanentLink(FragmentParserContext context, AttributeFragment attribute, String permanentLink, String prefix, String suffix)
at EPiServer.Core.Transfer.UrlForExportFragmentHandler.ParsePossibleLink(FragmentParserContext context, AttributeFragment attribute, String attributeValue, String prefix, String suffix)
at EPiServer.Core.Html.StringParsing.UrlFragmentHandler.TryToExtractLinkFromAttribute(FragmentParserContext context, AttributeFragment attribute)
at EPiServer.Core.Html.StringParsing.UrlFragmentHandler.ParseAttribute(FragmentParserContext context, ElementFragment parentFragment, AttributeFragment attribute)
at EPiServer.Core.Html.StringParsing.FragmentParser.ParseAttribute(FragmentParserContext context, ElementFragment fragment, AttributeFragment attribute)
at EPiServer.Core.Html.StringParsing.FragmentParser.ParseAttributes(FragmentParserContext context, ElementFragment fragment)
at EPiServer.Core.Html.StringParsing.FragmentParser.ParseStaticElement(FragmentParserContext context, ElementFragment element)
at EPiServer.Core.Html.StringParsing.FragmentParser.ParseElement(FragmentParserContext context, ElementFragment element)
at EPiServer.Core.Html.StringParsing.FragmentParser.ProcessFragments(FragmentParserContext context, Boolean returnOnEndElement)
at EPiServer.Core.Html.StringParsing.FragmentParser.Parse(String html, FragmentParserMode parserMode, Boolean evaluateHash)
at EPiServer.Core.Transfer.PropertyXhtmlTransform.ExportEventHandler(Object sender, TransformPropertyEventArgs e)
at System.EventHandler`1.Invoke(Object sender, TEventArgs e)
at EPiServer.Enterprise.DataExporter.OnExportProperty(TransformPropertyEventArgs e)
at EPiServer.Core.Transfer.ContentTransfer.ExportProperties(RawContent contextContent, RawProperty[] properties, String masterLanguage)

The call stack did not immediately give a clue about what went wrong. At least now I knew in which assembly and class the error occurred.

Digging Even Deeper

It was time to bring out the big guns. I decompiled the EPiServer.dll file using a tool called Reflector Pro. Using this great tool, I was able to generate PDB files and debug through the EPiServer code to see what was going on.

I set a breakpoint in the ParsePossibleLink method and ran the export process again. Now I was able to step through all the links being parsed. It didn't take long before I finally found the culprit.

Debug, ParsePossibleLink, attributeValue="data:image/png;base64,..."

Eureka!

Once I saw that the link being parsed started with "data:image/png;base64,..." it hit me like a ton of bricks. Apparently there was an inline image somewhere that the EPiServer export could not handle. The stack trace shows the attribute value ending up in System.Uri, which in the .NET Framework rejects strings longer than 65,519 characters. Inline images can be quite big, so it is no wonder that UrlFragmentHandler, which is designed to work with HTTP URLs, could not handle them.
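To get a feeling for how quickly an inline image becomes "too long", here is a small back-of-the-envelope sketch in Python. It is not EPiServer code. It only assumes the documented .NET Framework limit of 65,519 characters for System.Uri and the fact that base64 turns every 3 bytes into 4 characters. The names `ASSUMED_URI_LIMIT`, `data_uri_length` and `make_data_uri` are made up for this illustration.

```python
import base64

# Assumption: System.Uri in the .NET Framework rejects strings longer
# than 65,519 characters. This is the only external fact used here.
ASSUMED_URI_LIMIT = 65519


def data_uri_length(image_size_bytes: int, mime_type: str = "image/png") -> int:
    """Return the length of a base64 data: URI for a payload of the given size."""
    prefix = f"data:{mime_type};base64,"
    # Base64 encodes every group of 3 bytes as 4 characters and pads the last group.
    encoded_length = 4 * ((image_size_bytes + 2) // 3)
    return len(prefix) + encoded_length


def make_data_uri(payload: bytes, mime_type: str = "image/png") -> str:
    """Build a real data: URI, used here only to cross-check the arithmetic."""
    return f"data:{mime_type};base64," + base64.b64encode(payload).decode("ascii")


if __name__ == "__main__":
    for size in (1_000, 48_000, 64_000):
        length = data_uri_length(size)
        verdict = "too long" if length > ASSUMED_URI_LIMIT else "fits"
        print(f"{size} byte image -> {length} character URI ({verdict})")
```

Under these assumptions, an image only has to be a little under 50 KB before its data: URI crosses the limit, which is a very ordinary size for a screenshot pasted from a document.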

Now that I knew what to look for, it was easy to find the problematic content. The page with the problematic content had no connection to the pages reported in the initial error message, and it did indeed contain some inline images. In TinyMCE they looked exactly like regular images, except for the image URL.

Inline image

After talking with the editors, it turned out that they had copied some of the content directly from an existing Word document. This content contained text together with some images. The content was copied using LibreOffice. By default, when you copy image content from LibreOffice, it is embedded using the "data:" URI scheme. This is not obvious, because once pasted into TinyMCE it looks just like a regular image from the editor's perspective.
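Since inline images are invisible to editors, a quick scan of the stored markup is one way to find them without stepping through a debugger. The following is a minimal Python sketch, not part of EPiServer. It assumes you have already pulled the XHTML property values out of the site into a mapping of page ID to HTML string. How you get them is up to you, and the `pages` input and the names `DataUriFinder` and `find_data_uris` are hypothetical.

```python
from html.parser import HTMLParser


class DataUriFinder(HTMLParser):
    """Collect (tag, attribute, length) for every attribute holding a data: URI."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        for name, value in attrs:
            if value and value.strip().lower().startswith("data:"):
                self.hits.append((tag, name, len(value)))


def find_data_uris(pages):
    """Return {page_id: hits} for every page whose HTML contains a data: URI.

    `pages` is a hypothetical mapping of page ID -> HTML string.
    """
    report = {}
    for page_id, html in pages.items():
        finder = DataUriFinder()
        finder.feed(html or "")
        finder.close()
        if finder.hits:
            report[page_id] = finder.hits
    return report


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the input and output.
    sample = {
        101: '<p><img src="data:image/png;base64,AAAA" alt="pasted" /></p>',
        102: '<p><img src="/globalassets/photo.png" alt="uploaded" /></p>',
    }
    for page_id, hits in find_data_uris(sample).items():
        print(page_id, hits)
```

The reported length is useful for sorting: the longest values are the ones most likely to break the export.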

Why is this bad and how to fix it?

Data export failure is not the only reason to fix this. Inline images are bad in general for several reasons. The most important are that they slow down page load (the image data is downloaded together with the HTML markup) and that the image cannot be cached separately from the page. Inline images are appropriate only for small icons, where an extra request would cost more than the extra bytes in the markup. But that is another story - at this point I had to fix the data export, not concentrate on optimizations.

As happens in many cases, the problem was much more complex than the solution - I just had to upload all the images again and replace the inline images with their uploaded counterparts.
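I did this by hand, but with many affected pages the same idea could be scripted. The Python sketch below is only an illustration of the approach, not something I ran against EPiServer: it pulls base64 inline images out of an HTML string, hands the decoded bytes to a `store_image` callback and rewrites the `src` attribute to whatever URL the callback returns. The callback, the function name `replace_inline_images` and the URLs in the example are all hypothetical, and the actual upload to the media library is left out.

```python
import base64
import re

# Matches src="data:image/<subtype>;base64,<payload>" with either quote style.
INLINE_IMAGE = re.compile(
    r"""src=(["'])data:image/([a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=\s]+)\1"""
)


def replace_inline_images(html, store_image):
    """Replace each inline image with the URL returned by store_image.

    store_image(image_bytes, subtype) is a caller-supplied (hypothetical)
    callback that saves the image somewhere and returns its new URL.
    """

    def swap(match):
        quote, subtype, payload = match.groups()
        # b64decode ignores whitespace that editors may have left in the payload.
        image_bytes = base64.b64decode(payload)
        url = store_image(image_bytes, subtype.lower())
        return f"src={quote}{url}{quote}"

    return INLINE_IMAGE.sub(swap, html)


if __name__ == "__main__":
    saved = []

    def store_in_memory(image_bytes, subtype):
        # Stand-in for a real upload; the returned path is made up.
        saved.append(image_bytes)
        return f"/hypothetical-uploads/image{len(saved)}.{subtype}"

    before = '<p><img src="data:image/png;base64,aGVsbG8=" alt="pasted" /></p>'
    print(replace_inline_images(before, store_in_memory))
```

A regular expression is enough here because the pattern being replaced is very narrow. For anything beyond swapping the `src` value, a real HTML parser would be the safer choice.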

After fixing all the images, I finally got this joyful message:

Export completed without errors or warnings

I have reported this to EPiServer support and they have added this issue to the bug list: http://world.episerver.com/support/Bug-list/bug/129012