C# – Convert HTML to PDF using Pechkin (WkHtmlToPdf)

There are many C# libraries for creating PDF files, but I needed one that could convert an HTML file (with CSS and images) into a PDF document. While searching for a solution I found good reviews of a free library called Pechkin, so I decided to give it a try.

While using it I ran into a few small problems, but in the end I was satisfied with the result. This post details my experience with the library.

Pechkin is a .NET wrapper for another library called WkHtmlToPdf, which uses the WebKit engine to convert HTML pages to PDF, and that alone is a pretty good reason to give it a try. If you are not yet convinced, below is a description of WebKit taken from Wikipedia:

WebKit is a layout engine software component designed to allow web browsers to render web pages. It powers Google’s Chrome web browser versions up to 27, and Apple’s Safari web browser applications. As of November 2012 it has the most market share of any layout engine at over 40% of the browser market share — ahead of both the Trident engine used by Internet Explorer, and the Gecko engine used by Firefox.

It is also used as the basis for the experimental browser included with the Amazon Kindle e-book reader, as well as the default browser in the Apple iOS, Android, BlackBerry 10, and Tizen mobile operating systems.

To test the library I created a simple web application that reads the HTML content at a given URL and generates a PDF file. You can download it from Codeplex: https://pechkinwebtest.codeplex.com/. After trying it with different URLs, this is what I found:

Problems and Solutions:

  1. GIF images are not supported. I couldn’t find a solution for this one.
  2. With the original Pechkin library, the DLLs used for rendering the PDF remain loaded in memory:
    • If used inside a web application, the libraries that generate the PDF stay loaded in memory indefinitely, even after a new deployment. For example, if you run the site using the Visual Studio built-in web server and generate a PDF file, the libraries are loaded into memory. If you then rebuild and run the site using the same built-in server, you will get errors like the one below.

      Could not copy “C:\WebSite\libgcc_s_dw2-1.dll” to “bin\libgcc_s_dw2-1.dll”. Exceeded retry count of 10.

      So I had to stop and restart the server. You’ll get the same behavior when deploying to IIS. More details about this issue here: https://github.com/gmanny/Pechkin/issues/12

    • Solution: Someone forked the original project and solved this issue, so make sure to use the fork at https://github.com/tuespetre/Pechkin and not the original project at https://github.com/gmanny/Pechkin
  3. Images under HTTPS are not rendered inside the PDF. Solution: Download and install OpenSSL on the server (More details here: http://www.openssl.org/related/binaries.html)

How to use it:

  1. Download and compile this fork of the Pechkin project: https://github.com/tuespetre/Pechkin (this solves the DLL hanging problem :)). Or, if you’re lazy, you can use the prebuilt DLLs included in the Codeplex download mentioned at the end of this post.
  2. Inside your web application solution, add references to Common.Logging.dll and Pechkin.dll.
  3. Add the native DLLs that ship with Pechkin to the root of your solution and, in the Properties window, set Copy to Output Directory » Copy always for each of them.
  4. Now you can start generating PDF files from HTML strings. Below is a very simple example that does this:
    // Transform the HTML into a PDF ('html' is a string holding the HTML markup)
    var pechkin = Factory.Create(new GlobalConfig());
    var pdf = pechkin.Convert(new ObjectConfig()
        .SetLoadImages(true)
        .SetZoomFactor(1.5)
        .SetPrintBackground(true)
        .SetScreenMediaType(true)
        .SetCreateExternalLinks(true), html);

    // Return the PDF file to the browser as a download
    Response.Clear();
    Response.ClearContent();
    Response.ClearHeaders();

    Response.ContentType = "application/pdf";
    Response.AddHeader("Content-Disposition", string.Format("attachment; filename=test.pdf; size={0}", pdf.Length));
    Response.BinaryWrite(pdf);

    Response.Flush();
    Response.End();
    
  5. After publishing to IIS, select the application pool for your web site, then right-click and select Advanced Settings... There, make sure Enable 32-Bit Applications is set to True, otherwise you will get this error:

    Could not load file or assembly ‘Pechkin’ or one of its dependencies. An attempt was made to load a program with an incorrect format.

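The “incorrect format” error in step 5 usually means a 64-bit process tried to load a 32-bit DLL. Below is a minimal sketch of a guard that fails early with a clearer message. It assumes the Pechkin DLLs are 32-bit, which the fix in step 5 suggests; the class and method names are hypothetical and are not part of Pechkin.

```csharp
using System;

public static class PechkinEnvironment
{
    // True when the current process is 32-bit (pointers are 4 bytes wide).
    public static bool Is32BitProcess()
    {
        return IntPtr.Size == 4;
    }

    // Hypothetical guard: call it once before the first conversion.
    // Assumption: the native DLLs used by Pechkin are 32-bit.
    public static void EnsureCanLoadNativeLibraries()
    {
        if (!Is32BitProcess())
        {
            throw new InvalidOperationException(
                "This process is 64-bit, but the Pechkin DLLs are assumed to be 32-bit. " +
                "Set 'Enable 32-Bit Applications' to True on the IIS application pool.");
        }
    }
}
```

Calling PechkinEnvironment.EnsureCanLoadNativeLibraries() at application start would surface the configuration problem before any PDF is requested.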

The source code of the test application, together with the Pechkin DLLs and the OpenSSL installer (Win32OpenSSL_Light-1_0_1e.exe) can be downloaded from Codeplex.


EncodingConverter Console Application

Here you can download a Windows console application that changes the encoding of text files. For example, you can convert the content of a file from a UTF encoding to an ANSI encoding.

How to use it:

  1. Download the application
  2. Open command prompt
  3. Launch the application passing the following arguments separated with a space:
    1. The path to the folder that contains the files to be converted. Example: C:\Files
    2. A file pattern (see the next link for details about the available wildcard characters http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/find_c_search_where.mspx?mfr=true). Example: *.csv
    3. The desired encoding. Example: ANSI

For example, suppose you call the application with the arguments C:\Files *.csv ANSI. In this case, the application will convert all files with the .csv extension located under the folder C:\Files to ANSI encoding.
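For readers curious how such a conversion can be done in .NET, here is a minimal sketch. It is not the application’s actual source code: the class and method names are hypothetical, it assumes only the top-level folder is searched, and it treats ANSI as Encoding.Default (the system ANSI code page on the .NET Framework).

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingConverterSketch
{
    // Re-encodes every file in 'folder' that matches 'pattern' (top-level only).
    // Returns the number of files converted.
    public static int ConvertFiles(string folder, string pattern, string encodingName)
    {
        Encoding target = ResolveEncoding(encodingName);
        string[] files = Directory.GetFiles(folder, pattern);
        foreach (string path in files)
        {
            // ReadAllText detects the source encoding from a byte order mark
            // and falls back to UTF-8 when there is none.
            string text = File.ReadAllText(path);
            // Characters the target encoding cannot represent are replaced with '?'.
            File.WriteAllText(path, text, target);
        }
        return files.Length;
    }

    // Assumption: "ANSI" means Encoding.Default. On .NET Core and later this is
    // UTF-8, and Windows code pages need the CodePages encoding provider registered.
    public static Encoding ResolveEncoding(string name)
    {
        if (string.Equals(name, "ANSI", StringComparison.OrdinalIgnoreCase))
        {
            return Encoding.Default;
        }
        return Encoding.GetEncoding(name);
    }
}
```

With this sketch, the example above would correspond to EncodingConverterSketch.ConvertFiles(@"C:\Files", "*.csv", "ANSI"). Since the files are overwritten in place, keeping a backup is a sensible precaution.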

The complete list of available encodings can be found on MSDN.
You can see the same list by launching the console application from the command prompt and passing the argument: HELP

Below is a list of the most common encodings that you can use:

Code Page   Name            Display Name
1200        utf-16          Unicode
1201        unicodeFFFE     Unicode (Big endian)
1250        windows-1250    Central European (Windows)
1251        windows-1251    Cyrillic (Windows)
1252        Windows-1252    Western European (Windows)
1253        windows-1253    Greek (Windows)
1254        windows-1254    Turkish (Windows)
1255        windows-1255    Hebrew (Windows)
1256        windows-1256    Arabic (Windows)
1257        windows-1257    Baltic (Windows)
1258        windows-1258    Vietnamese (Windows)
12000       utf-32          Unicode (UTF-32)
12001       utf-32BE        Unicode (UTF-32 Big endian)
20127       us-ascii        US-ASCII
65000       utf-7           Unicode (UTF-7)
65001       utf-8           Unicode (UTF-8)
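
A list like the one above can also be produced from code. The sketch below uses Encoding.GetEncodings() to enumerate the encodings known to the runtime. The class name is hypothetical and this is not necessarily how the application’s HELP output is built. The set returned depends on the runtime: the .NET Framework lists the Windows code pages, while newer runtimes list far fewer by default.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EncodingList
{
    // Returns one tab-separated row per encoding: code page, name, display name.
    public static List<string> Describe()
    {
        var rows = new List<string>();
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            rows.Add(string.Format("{0}\t{1}\t{2}", info.CodePage, info.Name, info.DisplayName));
        }
        return rows;
    }
}
```

Printing the result is a short loop: foreach (string row in EncodingList.Describe()) Console.WriteLine(row);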

Finally, if you want to see the source code, you can find it at http://encodingconverterconsoleapplication.codeplex.com/