HTML Conversion

Essential DocIO converts HTML files into Word documents and vice versa. It supports only HTML files that validate against either the XHTML 1.0 Strict or the XHTML 1.0 Transitional schema.

The following code example shows how to convert an HTML file into a Word document.

//Loads the HTML document against transitional schema validation
WordDocument document = new WordDocument("Sample.html", FormatType.Html, XHTMLValidationType.Transitional);
//Saves the Word document
document.Save("HTMLtoWord.docx", FormatType.Docx);
//Closes the document
document.Close();

'Loads the HTML document against transitional schema validation
Dim document As New WordDocument("Sample.html", FormatType.Html, XHTMLValidationType.Transitional)
'Saves the Word document
document.Save("HTMLtoWord.docx", FormatType.Docx)
'Closes the document
document.Close()

The following code example shows how to convert a Word document into HTML.

//Loads the template document
WordDocument document = new WordDocument("Template.docx", FormatType.Docx);
//Saves the document as an HTML file
document.Save("WordToHtml.html", FormatType.Html);
//Closes the document
document.Close();

'Loads the template document
Dim document As New WordDocument("Template.docx", FormatType.Docx)
'Saves the document as an HTML file
document.Save("WordToHtml.html", FormatType.Html)
'Closes the document
document.Close()

Customization settings

Essential DocIO provides settings to customize both HTML to Word and Word to HTML conversion.

Customizing the HTML to Word conversion

When converting HTML to Word, Essential DocIO lets you do the following:

  • Validate the HTML string against the XHTML 1.0 Strict or Transitional schema.
  • Insert the HTML string at a specified position in the document body.
  • Append the HTML string to a specified paragraph.

The following code example shows how to customize the HTML to Word conversion.

//Loads the template document
WordDocument document = new WordDocument("Template.docx");
//HTML string to be inserted
string htmlstring = "<p><b>This text is inserted as HTML string.</b></p>";
//Validates the HTML string
bool isValidHtml = document.LastSection.Body.IsValidXHTML(htmlstring, XHTMLValidationType.Transitional);
//Inserts the HTML string only when it passes validation
if (isValidHtml)
{
    //Inserts the HTML string as the first item of the paragraph at index 2
    document.Sections[0].Body.InsertXHTML(htmlstring, 2, 0);
    //Appends the HTML string to the first paragraph in the document
    document.Sections[0].Body.Paragraphs[0].AppendHTML(htmlstring);
}
//Saves and closes the document
document.Save("Sample.docx");
document.Close();

'Loads the template document
Dim document As New WordDocument("Template.docx")
'HTML string to be inserted
Dim htmlstring As String = "<p><b>This text is inserted as HTML string.</b></p>"
'Validates the HTML string
Dim isValidHtml As Boolean = document.LastSection.Body.IsValidXHTML(htmlstring, XHTMLValidationType.Transitional)
'Inserts the HTML string only when it passes validation
If isValidHtml Then
    'Inserts the HTML string as the first item of the paragraph at index 2
    document.Sections(0).Body.InsertXHTML(htmlstring, 2, 0)
    'Appends the HTML string to the first paragraph in the document
    document.Sections(0).Body.Paragraphs(0).AppendHTML(htmlstring)
End If
'Saves and closes the document
document.Save("Sample.docx")
document.Close()

NOTE

  1. Inserting an XHTML string is not supported in Silverlight, Windows Phone, and Xamarin applications.
  2. XHTML validation against the XHTML 1.0 Strict and Transitional schemas is not supported in Windows Store applications.
  3. XHTMLValidationType.Transitional: The default validation type when importing an HTML file.
  4. XHTMLValidationType.None: Validates the HTML file against the XHTML format without performing any schema validation.
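
Because DocIO accepts only XHTML input, it can be useful to pre-check that an HTML string is at least well-formed XML before handing it to DocIO. The following is a minimal sketch that uses only System.Xml.Linq from the .NET base class library. The class name XhtmlPreCheck and the method name IsWellFormedXml are hypothetical, and the check is an assumption-level stand-in: it does not reproduce DocIO's IsValidXHTML or any schema validation.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of DocIO: checks XML well-formedness only.
static class XhtmlPreCheck
{
    // Returns true when the markup parses as a well-formed XML document.
    // On failure, 'error' carries the parser message (line and position included).
    public static bool IsWellFormedXml(string markup, out string error)
    {
        if (markup == null) throw new ArgumentNullException(nameof(markup));
        try
        {
            XDocument.Parse(markup);
            error = string.Empty;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    static void Main()
    {
        // Well-formed: every element is closed and properly nested.
        string good = "<p><b>This text is inserted as HTML string.</b></p>";
        // Not well-formed: the <b> element is never closed.
        string bad = "<p><b>This text is inserted as HTML string.</p>";

        Console.WriteLine(IsWellFormedXml(good, out _));
        Console.WriteLine(IsWellFormedXml(bad, out string message));
        Console.WriteLine(message);
    }
}
```

Note that XDocument.Parse expects a single root element, so a fragment made of several sibling paragraphs has to be wrapped in one container element before this check. HTML named entities such as &nbsp; are also rejected unless they are declared, which is stricter than what a browser accepts.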

Customizing the Word to HTML conversion

You can customize the Word to HTML conversion with the following options:

  • Extract the images used in the document to a specified directory
  • Specify whether to export the headers and footers of the Word document to the HTML
  • Specify whether to export text input form fields as editable fields or as plain text
  • Specify the CSS style sheet type and its file name

NOTE

When exporting headers and footers, DocIO places the header content of the first section at the top of the HTML file and the footer content of the first section at the end of the HTML file.

The following code sample shows how to customize Word to HTML conversion.

//Loads an existing document
WordDocument document = new WordDocument("Template.docx");
HTMLExport export = new HTMLExport();
//The images in the input document are copied to this folder
document.SaveOptions.HtmlExportImagesFolder = @"D:\Data\";
//The headers and footers in the input are exported
document.SaveOptions.HtmlExportHeadersFooters = true;
//Exports the text form fields as editable
document.SaveOptions.HtmlExportTextInputFormFieldAsText = false;
//Sets the style sheet type
document.SaveOptions.HtmlExportCssStyleSheetType = CssStyleSheetType.External;
//Sets the name of the style sheet file
document.SaveOptions.HtmlExportCssStyleSheetFileName = "UserDefinedFileName.css";
//Saves the document as an HTML file
export.SaveAsXhtml(document, "WordtoHtml.html");
//Closes the document
document.Close();

'Loads an existing document
Dim document As New WordDocument("Template.docx")
Dim export As New HTMLExport()
'The images in the input document are copied to this folder
document.SaveOptions.HtmlExportImagesFolder = "D:\Data\"
'The headers and footers in the input are exported
document.SaveOptions.HtmlExportHeadersFooters = True
'Exports the text form fields as editable
document.SaveOptions.HtmlExportTextInputFormFieldAsText = False
'Sets the style sheet type
document.SaveOptions.HtmlExportCssStyleSheetType = CssStyleSheetType.External
'Sets the name of the style sheet file
document.SaveOptions.HtmlExportCssStyleSheetFileName = "UserDefinedFileName.css"
'Saves the document as an HTML file
export.SaveAsXhtml(document, "WordtoHtml.html")
'Closes the document
document.Close()

Supported and unsupported items

The following table lists the support status of document elements and attributes in Word to HTML and HTML to Word conversions.

Document Element        Attribute                             Support Status  Notes
----------------------  ------------------------------------  --------------  -----
Bookmark                Id                                    Yes             -
Border                  Color                                 Yes             -
                        Distance from text                    Yes             -
                        Line style                            Partial         Some line styles are rendered as solid.
                        Line width                            Yes             -
Document Properties     -                                     Yes             -
Field                   -                                     Yes             -
Footnotes and Endnotes  -                                     Yes             -
Form Field              Text input, checkbox, and combo box   Yes             -
Header / Footer         Different per section                 Partial         Only odd header of the first section is preserved in HTML export.
Hyperlink               External URL                          Yes             -
                        Local                                 Yes             -
Image                   Inline                                Yes             -
                        Scale                                 Yes             -
List                    Custom bullets                        Yes             -
                        Multi-level                           Yes             -
                        Numbered                              Yes             -
                        Restart numbering                     Yes             -
                        Standard bullets                      Yes             -
Comment                 -                                     No              -
Symbols                 -                                     Yes             -
Paragraph               Alignment                             Yes             -
                        Borders                               Yes             See Border for more details.
                        Keep lines and paragraphs together    Yes             -
                        Paragraph indents                     Yes             -
                        Line spacing                          Yes             -
                        Page break before                     Yes             -
                        Shading                               Yes             See Shading for more details.
                        Spacing before and after              Yes             -
Shading                 Background color                      Partial         Solid background colors are supported.
                        Foreground color                      Partial         Solid foreground color is used when background color is auto.
Styles                  Paragraph styles                      Yes             -
                        Character styles                      Yes             -
                        List styles                           Yes             -
Table                   Alignment                             Yes             -
                        Cell margins                          Yes             -
                        Column widths                         Yes             -
                        Indent from left                      Yes             -
                        Preferred width                       Yes             -
                        Spacing between cells                 Yes             -
                        Borders                               Partial         See Border for more details.
                        Shading                               Partial         See Shading for more details.
Nested Table            -                                     Yes             -
Table Cell              Borders                               Partial         See Border for more details.
                        Cell margins                          Yes             -
                        Horizontal merge                      Yes             -
                        Shading                               Partial         See Shading for more details.
                        Vertical alignment                    Yes             -
                        Vertical merge                        Yes             -
Table Row               Height                                Yes             -
                        Padding                               Yes             -
Text                    All caps                              Yes             -
                        Bold                                  Yes             -
                        Character spacing                     Yes             -
                        Color                                 Yes             -
                        Emboss                                Partial         Rendered as bold.
                        Engrave                               Partial         Rendered as bold.
                        Font                                  Yes             -
                        Hidden                                Yes             -
                        Highlighting                          Yes             -
                        Imprint                               Partial         Rendered as bold.
                        Italic                                Yes             -
                        Line breaks                           Yes             -
                        Outline                               Partial         Rendered as bold.
                        Page breaks                           Yes             -
                        Shading                               Partial         See Shading for more details.
                        Small caps                            Yes             -
                        Special symbols                       Yes             -
                        Strike out                            Yes             -
                        Subscript / Superscript               Yes             -
                        Underline                             Partial         Underline types and colors are ignored.