C# – Convert HTML to PDF using Pechkin (WkHtmlToPdf)

There are a lot of C# libraries that can create PDF files, but I needed one that could convert an HTML file (with CSS and images) into a PDF document. While searching for a solution I found good reviews for a free library called Pechkin, so I decided to give it a try.

While using it I stumbled upon a few small problems, but in the end I was satisfied with the result. Hence this post, where I detail my experience with this library.

Pechkin is a .NET wrapper for another library, WkHtmlToPdf, which uses the WebKit engine to convert HTML pages to PDF… and that alone is a good reason to give it a try. If you're not yet convinced, below is a description of WebKit taken from Wikipedia:

WebKit is a layout engine software component designed to allow web browsers to render web pages. It powers Google’s Chrome web browser versions up to 27, and Apple’s Safari web browser applications. As of November 2012 it has the most market share of any layout engine at over 40% of the browser market share — ahead of both the Trident engine used by Internet Explorer, and the Gecko engine used by Firefox.

It is also used as the basis for the experimental browser included with the Amazon Kindle e-book reader, as well as the default browser in the Apple iOS, Android, BlackBerry 10, and Tizen mobile operating systems.

To test the library I created a simple web application that reads the HTML content at a given URL and generates a PDF file. You can download it from Codeplex: https://pechkinwebtest.codeplex.com/. After trying it with different URLs, I ran into the following:

Problems and Solutions:

  1. GIF images are not supported. I couldn't find any solution for this one.
  2. With the original Pechkin library, the DLLs used for rendering the PDF remain loaded in memory:
    • If used inside a web application, the libraries that generate the PDF stay loaded in memory indefinitely, even after a new deploy. For example, if you run the site using the Visual Studio built-in web server and generate a PDF file, the libraries get loaded into memory. If you then rebuild and run the site using the same built-in server, you will get errors like the one below.

      Could not copy “C:\WebSite\libgcc_s_dw2-1.dll” to “bin\libgcc_s_dw2-1.dll”. Exceeded retry count of 10.

      So I had to stop and restart the server. You'll get the same behavior when deploying to IIS. More details about this issue here: https://github.com/gmanny/Pechkin/issues/12

    • Solution: someone forked the original project and solved this issue, so make sure to use the fork at https://github.com/tuespetre/Pechkin and not the original project at https://github.com/gmanny/Pechkin
  3. Images served over HTTPS are not rendered inside the PDF. Solution: download and install OpenSSL on the server (more details here: http://www.openssl.org/related/binaries.html)
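Since GIF images and images served over HTTPS are the two cases above where pictures silently go missing, it can help to scan the HTML before converting it. The sketch below is a hypothetical pre-flight helper: PdfImageCheck and FindProblemImages are names made up for this example and are not part of Pechkin. It uses a simple regular expression rather than a real HTML parser, so treat its output as a hint only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Pechkin): lists <img> sources that, per the
// problems described above, are likely to be missing from the generated PDF.
public static class PdfImageCheck
{
    // Matches src="..." or src='...' inside <img ...> tags.
    // A regex is only a rough heuristic, not a full HTML parser.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    public static List<string> FindProblemImages(string html)
    {
        var problems = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            string src = m.Groups[1].Value;

            // Ignore any query string when checking the file extension.
            string path = src.Split('?')[0];

            if (path.EndsWith(".gif", StringComparison.OrdinalIgnoreCase))
                problems.Add("GIF (not supported): " + src);
            else if (src.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
                problems.Add("HTTPS (needs OpenSSL on the server): " + src);
        }
        return problems;
    }
}
```

Calling it on the HTML string just before the conversion, and logging whatever it returns, makes it easier to tell a missing image apart from a rendering bug.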

How to use it:

  1. Download and compile this fork of the Pechkin project: https://github.com/tuespetre/Pechkin (this solves the DLL hanging problem :)). Or, if you're lazy, you can download the needed DLLs here.
  2. Inside your web application solution, add references to Common.Logging.dll and Pechkin.dll.
  3. Add the other DLLs that come with Pechkin to the root of your solution and, in the Properties window, set Copy to Output Directory » Copy always.
  4. Now you can start generating PDF files from HTML strings. Below is a very simple example that does this:
    // Assumes the Pechkin namespace is imported (using Pechkin;).
    // html holds the markup to convert, for example:
    var html = "<html><body><h1>Hello, PDF</h1></body></html>";

    //Transform the HTML into PDF
    var pechkin = Factory.Create(new GlobalConfig());
    var pdf = pechkin.Convert(new ObjectConfig()
                            .SetLoadImages(true)
                            .SetZoomFactor(1.5)
                            .SetPrintBackground(true)
                            .SetScreenMediaType(true)
                            .SetCreateExternalLinks(true), html);

    //Return the PDF file to the browser as a download
    Response.Clear();
    Response.ClearContent();
    Response.ClearHeaders();

    Response.ContentType = "application/pdf";
    Response.AddHeader("Content-Disposition", string.Format("attachment;filename=test.pdf; size={0}", pdf.Length));
    Response.BinaryWrite(pdf);

    Response.Flush();
    Response.End();
    
  5. After publishing to IIS, select the application pool for your website, right-click it and choose Advanced Settings... There, make sure that Enable 32-Bit Applications is set to True; otherwise you will get this error:

    Could not load file or assembly ‘Pechkin’ or one of its dependencies. An attempt was made to load a program with an incorrect format.

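Step 5 boils down to one requirement: the process must run as 32-bit. A hypothetical guard like the one below (PechkinGuard and EnsureRunningAs32Bit are made-up names, not part of Pechkin) could turn the cryptic "incorrect format" error into a clearer message. This is a sketch under one assumption worth verifying in your setup: the check has to run in a method that does not itself reference any Pechkin types, because, as far as I understand, the load failure happens as soon as such a method is compiled.

```csharp
using System;

// Hypothetical guard (not part of Pechkin): fails early with a readable message
// instead of "An attempt was made to load a program with an incorrect format."
public static class PechkinGuard
{
    public static void EnsureRunningAs32Bit()
    {
        // Environment.Is64BitProcess is available from .NET Framework 4.0 onwards.
        if (Environment.Is64BitProcess)
        {
            throw new InvalidOperationException(
                "The Pechkin/WkHtmlToPdf DLLs must be loaded into a 32-bit process. " +
                "In IIS, set 'Enable 32-Bit Applications' to True on the application pool; " +
                "in a console application, target x86.");
        }
    }
}
```

For example, call PechkinGuard.EnsureRunningAs32Bit() at the start of a button handler and move the Pechkin calls into a separate method invoked after it.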

The source code of the test application, together with the Pechkin DLLs and the OpenSSL installer (Win32OpenSSL_Light-1_0_1e.exe) can be downloaded from Codeplex.
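Because Pechkin receives the HTML as a string, relative links such as href="styles.css" or src="img/sign.jpg" most likely have no page address to be resolved against; a couple of the comments below run into exactly this with CSS files and local images. A hypothetical helper (HtmlLinkFixer.MakeLinksAbsolute, not part of Pechkin) could rewrite them to absolute URLs before calling Convert. It is regex-based, so it only handles quoted src/href attributes and does not touch url(...) references inside CSS.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Pechkin): rewrites relative src/href values
// to absolute URLs resolved against a base address you supply.
public static class HtmlLinkFixer
{
    // Captures the attribute name, the quote character and the quoted value.
    private static readonly Regex Attr = new Regex(
        @"\b(src|href)\s*=\s*([""'])([^""']*)\2",
        RegexOptions.IgnoreCase);

    public static string MakeLinksAbsolute(string html, Uri baseUri)
    {
        return Attr.Replace(html, m =>
        {
            string value = m.Groups[3].Value;
            Uri ignored;

            // Leave empty values, in-page anchors and absolute URLs (http, https,
            // file, mailto, ...) untouched. Values starting with "/" are always
            // resolved against the base.
            bool keep = value.Length == 0
                || value.StartsWith("#", StringComparison.Ordinal)
                || (!value.StartsWith("/", StringComparison.Ordinal)
                    && Uri.TryCreate(value, UriKind.Absolute, out ignored));
            if (keep)
                return m.Value;

            var resolved = new Uri(baseUri, value);
            string quote = m.Groups[2].Value;
            return m.Groups[1].Value + "=" + quote + resolved.AbsoluteUri + quote;
        });
    }
}
```

For a website, the base would be the page's own URL (for example http://localhost/site/). For local files in a console application, a base such as new Uri(AppDomain.CurrentDomain.BaseDirectory) should turn img/sign.jpg into a file:/// URL, in line with the file:/// suggestion made in the comments.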


76 comments on “C# – Convert HTML to PDF using Pechkin (WkHtmlToPdf)”

    • In the past I used iTextSharp, but it has a big problem: it's not able to convert non-trivial HTML (for example HTML that uses CSS classes). That was why I was searching for another library, and that's how I found Pechkin, which can convert complex HTML files into good-quality PDFs. The result obtained with Pechkin is superior because it uses the WebKit layout engine, which is also used by browsers like Chrome and Safari.

      • Hello Mighty. Thanks for the suggestion. I have a problem with the images: I need to dynamically increase and decrease their coordinates before converting to PDF. Will this library support that too? Thanks in advance

      • Hello Akhil, this component converts HTML into PDF. The nice part is that you can use CSS to position the HTML elements, just as you would in a normal browser. So, to change the coordinates of the images dynamically, one idea could be to inject the needed CSS into the head of the HTML. In short: the Pechkin library doesn't offer methods for manipulating the HTML elements directly from C#, but it can interpret CSS, so you can use CSS to style your HTML.
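The reply above suggests injecting CSS into the head of the HTML instead of manipulating elements from C#. A minimal sketch of that idea follows; HtmlCssInjector and InjectCss are hypothetical names (not part of Pechkin), and the string search assumes a plain closing </head> tag.

```csharp
using System;

// Hypothetical helper (not part of Pechkin): adds a <style> block to the HTML
// before it is passed to the converter.
public static class HtmlCssInjector
{
    public static string InjectCss(string html, string css)
    {
        string style = "<style type=\"text/css\">" + css + "</style>";

        // Preferred: place the style block right before the closing </head> tag.
        int headEnd = html.IndexOf("</head>", StringComparison.OrdinalIgnoreCase);
        if (headEnd >= 0)
            return html.Insert(headEnd, style);

        // No </head> found: prepend the block so the rules are still present.
        return style + html;
    }
}
```

For example, HtmlCssInjector.InjectCss(html, "img.logo { width: 200px; margin-left: 50px; }") would resize and shift images with the (hypothetical) class logo, with the values computed in C# before the string is handed to pechkin.Convert.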

  1. Thanks for posting this, very helpful indeed :). I have two problems and wanted to ask if anyone has managed to solve them:

    1) The biggest one, and a common one: how do you deal with page breaks? For instance, if a table is two pages long, a row might get split in half, and the same goes for an image, text, whatever.
    2) I run a console application and everything works, except that I get two lines written:
    “Qt: Could not initialize OLE (error 80010106)
    0x5175888”. The PDF gets generated and everything is OK, but I'm worried about this.

    Has anyone come across these before? I would appreciate some help. Thanks

    • This may or may not help (not sure): when I got weird results converting with Aspose.PDF and iTextSharp (both failed miserably at this task), I ran my HTML through the W3C HTML validator and found that I had some non-standard garbage in my HTML. The validator is here:
      http://validator.w3.org/check

      Also, for those looking to change the HTML dynamically before converting it, I use HTML Agility Pack. It's also free and works great. I'm using the combination of these two toolkits to generate PDF invoices from a website dynamically.

    • Vlad,
      Did you ever get a reply or a solution for dealing with page breaks? I am currently having an issue with images breaking across pages.

    • Can you give me more details about what you’d like to do? With Pechkin you can create a PDF file by passing to the library a string containing HTML markup. GET and POST are HTTP request methods and are not directly related to the creation of the PDF. The library might make HTTP GET requests in case the HTML string you pass contains links to other resources (for example images). Otherwise, if the HTML string is very simple and doesn’t have links to external resources, no HTTP requests will be made.

  2. Hi there, thanks for the great article.
    I am using Pechkin.Synchronized and having trouble with CSS; however, your test application converted the same webpage perfectly. Any idea? Thanks a lot.

    • Hi Henry, could it be that the link to the CSS file is not fully qualified, something like:

      <link rel="stylesheet" type="text/css" href="styles.css" media="screen" />

      instead of something like this:

      <link rel="stylesheet" type="text/css" href="http://localhost/styles.css" media="screen" />

      Another idea: check whether the CSS file is served over HTTPS; in that case you first need to install OpenSSL http://www.openssl.org/related/binaries.html

      • Madalina, thank you very much for the prompt reply and suggestion. The link to the CSS is complete. It was something else: the page I generated on the fly had nested body tags; once those were removed, it's perfect. Thanks again. This article has been very helpful.

      • Hi, Madalina, sorry to bother you again.
        Now that my site works fine on a non-secured development site, I deployed the project to a secured site and I do have issues with CSS and images served over HTTPS. I have installed OpenSSL on the server (which still runs Windows Server 2003); however, the PDF conversion still misses them.
        Did I miss any step? Thanks.

      • Hi Henry, you don't bother me at all. If I remember correctly, when we published to production we had this issue and we solved it with a Windows restart 🙂 I really hope this helps because I don't have other ideas. Please let me know if it works. Thanks.

      • Madalina, thanks for the reply and suggestion. I can't restart the server now since there are many sites running on it. I will surely let you know when I have a chance to restart it. Thanks again.

  3. Quick question: I am trying to do this in VB.NET and I am getting the error: 'Factory' and 'GlobalConfig' are not declared. This is in an ASP.NET 4.0 web application.

  4. Quick question: WHY are you using a proxy? I get:

    Error message: The remote server returned an error: (407) Proxy Authentication Required.
    ——————————————————————————–
    Stack trace: at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request) at System.Net.WebClient.DownloadString(Uri address) at System.Net.WebClient.DownloadString(String address) at Pechkin_WebTest.Default.btnGetPdt_Click(Object sender, EventArgs e)

  5. Pingback: Getting Constructor on type XX not found. Using Pechkin plugin - Tech Forum Network

  6. Hi Madalina,

    Thank you for the post!!

    I was using Pechkin. I used the web test application to convert HTML generated by my application into PDF, and it works perfectly fine. However, what I have is not a web application: I generate a report in HTML format from a standalone application written only in C# (no ASP.NET). I want to be able to save this report as PDF, but I keep getting a runtime error when I use the HttpResponse object with the lines below.

    Response.Clear();
    Response.ClearContent();
    Response.ClearHeaders();

    “Response is not available in this context.” Any ideas how to resolve this? Is there a way around it? Please let me know.

    Thanks,
    Jay

    • Hi Jay,

      Sorry for the late reply. My suggestion would be to save the PDF to the file system using the .NET framework method File.WriteAllBytes(string path, byte[] bytes). This means that, at step 4, as described in this post, you should use something like this:

      // Requires "using System.IO;" (for File) and the Pechkin namespace (using Pechkin;)
      var pechkin = Factory.Create(new GlobalConfig());
      var pdf = pechkin.Convert(new ObjectConfig()
                              .SetLoadImages(true)
                              .SetZoomFactor(1.5)
                              .SetPrintBackground(true)
                              .SetScreenMediaType(true)
                              .SetCreateExternalLinks(true), html);

      //Save the PDF file to the disk
      File.WriteAllBytes("C:\\MyFile.pdf", pdf);
      

      I hope it helps. Please let me know if this is not what you were looking for.

      Thanks,
      Madalina

  7. Hi Mightymada,
    It looks very promising. But when I convert, the images are not converted and don't show up in the PDF. Can you tell me how to fix this?

    Thanks
    Chandan

    • Hi Chandan,

      Usually the images don’t appear if they are GIFs (the GIF format is not supported) or if the images are under https (in this case you have to install OpenSSL to solve this issue).

      Hope it helps.

      All the best,
      Madalina

      • Hi Madalina,
        Thanks for your quick reply. Currently, when I use this, I get the error “Could not load file or assembly ‘Pechkin’ or one of its dependencies. An attempt was made to load a program with an incorrect format.”
        I am using all the DLLs, and my IIS application pool has Enable 32-Bit Applications set to True. I am using 64-bit Windows 7 and VS 2012.
        Can you please tell me how to fix this problem?
        Thanks
        Chandan

      • Hi Chandan,

        You’re welcome.
        I'm not sure what the problem could be, but I'll try a lucky guess :). The Pechkin library comes with 5 other DLLs that need to exist in the root of your site. First, check that those DLLs exist under the root folder.
        Then, make sure that you set the property Copy to Output Directory » Copy always.

        All the best,
        Madalina

    • Hi Alex,
      just to let you know, I had a minor issue with JavaScript and found a workaround.
      I wanted to hide a div using JavaScript, so I inject some JS code on the fly when building my HTML, to make it ready for PDF generation.

      I have the following:

      .disable {
        display: none;
      }


      …

      // getting the div
      var item = document.getElementById("mydiv");

      // Adding the CSS class "disable" via classList did not work, but changing the style
      // property directly worked. It probably comes down to the fact that classList is an HTML5 property.

      // item.classList.add("disable"); // did not work
      item.style.display = "none"; // worked fine

      Hope it helps 🙂

  8. Hi Madalina,
    Thanks for the great post. I have a simple image tag in my HTML like

    The img folder is in the debug\bin folder and contains sign.jpg. However, the image does not show up.
    My app is a console app and everything shows up in the target .pdf file except the image.
    Any idea why?

    Following is my code:

    var html = ReadFile(htmlFullPathFileName).ToString();

    var pechkin = Factory.Create(new GlobalConfig());
    var pdf = pechkin.Convert(new ObjectConfig()
        .SetLoadImages(true).SetZoomFactor(1.5)
        .SetPrintBackground(true)
        .SetScreenMediaType(true)
        .SetCreateExternalLinks(true), html);

    Stream.Write(pdf, 0, pdf.Length);

    Thanks in advance 🙂

    • Hi Afshin,

      The problem is that, inside the HTML you’ll have to specify the full path to the image. Try something like this:

      <img src='file:///C:/yourApplication/debug/bin/sign.jpg'>
      

      You’ll have to replace C:/yourApplication/debug/bin/sign.jpg with the actual file path on your computer.

      Please let me know if it works, I didn’t try the code.

      Thanks!

  9. Thanks, it rocks 🙂
    One quick question though: does the library work with JavaScript as well?
    Say you have JavaScript that disables or enables some divs of your HTML based on some criteria in your application.

    Thanks a lot 🙂

    • I'm glad it works. I just did a quick check and it seems that simple JavaScript works. For example, if you convert the HTML below to a PDF, you will not see the content of the div with the id “toHide”.

      <html>
      <body>
         <div id='toHide'>Text to hide</div>
         Some other text.
         <script type='text/javascript'>document.getElementById('toHide').style.display='none';</script>
      </body>
      </html>
      
      • Beautiful 🙂

        Yes, you're right. For example, HTML5 properties like classList did not work. However, as you rightly pointed out, setting the style property works fine.

        Me and my questions again:

        I have not yet plugged it into my web app running on IIS. Apart from 32-bit support, is there any other consideration I need to take into account?

        Thanks Madalina 🙂

  10. Hello ,

    I have downloaded the code from Codeplex and checked the application, and it's working fine. But when I try my own HTML, some CSS is not applied, like background color etc. Please check and let me know how I can use CSS.

    Thanks & Regards,
    Ashish Rathod

  11. Hi all, I used the same library to convert HTML to PDF, but images and the watermark are not loading in the generated PDF. Please suggest something; it's critical.

  12. Hi, after the final statements of the class:

    Response.BinaryWrite(result);
    Response.Flush();
    Response.End();
    I'd like to redirect to another view using return RedirectToAction(“End”) in order to display a final comment to the user. How can I achieve that? If I put RedirectToAction(“End”) just after the Response.End(); statement, there is no redirect. I know that I can't have an HTTP 200 (Success) and an HTTP 302 (Redirect) in the same request, but nevertheless I'd like to achieve the goal somehow: after the button in the view is pressed, the PDF file is created (and available to the user), and then the new view is loaded.

  13. Hi, I'm trying to open the solution in VS 2008, but I can't since it was created with a newer version. I need the DLLs built for .NET 2.0. Please help! Thanks 🙂

  14. Getting an error at this line:
    var pdf = pechkin.Convert(new ObjectConfig()
        .SetLoadImages(true).SetZoomFactor(1.5)
        .SetPrintBackground(true)
        .SetScreenMediaType(true)
        .SetCreateExternalLinks(true), html);
    The name 'html' does not exist in the current context

  15. If I keep authentication mode="Windows" in the web.config file, the request just keeps loading and never converts the HTML to PDF, and I have to recycle the app pool in IIS. However, if I keep authentication mode="Forms", it works fine. Does this component not work with the "Windows" authentication mode?

    I tried NuGet packages like Pechkin, Pechkin.Synchronized and TuesPechkin (with TuesPechkin.Wkhtmltox.Win32), but none of these packages works on the server.

    For me, these precompiled DLLs work on the server, and only in Forms authentication mode.

  16. Hi, I find this really interesting! I create my report with iTextSharp; once that is done, I need to add the HTML code converted with Pechkin. How could I do that?
    Thanks

  17. Hi,
    I used this code, but it hangs at this line:
    var pdf = pec.Convert(new ObjectConfig()
        .SetLoadImages(true).SetZoomFactor(1.5)
        .SetPrintBackground(true)
        .SetScreenMediaType(true)
        .SetCreateExternalLinks(true), data);

    After this, nothing runs and the page keeps showing the loading sign.

    Please tell me why it stops at that line and how to solve it.

    Thanks

  18. Hi Madalina,
    Thank you so much for this excellent post; it is very helpful for me. Though I am a big fan of iTextSharp, it's not good for creating PDFs from HTML (CSS / images). Previously I was using Spire.PDF to generate PDFs in ASP.NET, but it's not free. Then I found the Pechkin DLL and landed here. It's awesome, and I am using it in many of my projects.

    If you don't mind, can I share it on my blog (http://codepedia.info)? Thanks again for providing the Pechkin DLL.
    Keep sharing.

    • Hi Satinder,
      thanks for your kind words and for sharing your experience with PDF creation. Please feel free to share this post on your blog.
      Happy coding 🙂
      Madalina
