Word to HTML and HTML to Word Conversions

16 Oct 2020

Essential DocIO converts an HTML file into a Word document and vice versa. Word documents in the DOCX, DOTX, DOCM, and DOTM formats can be converted to HTML.

The Word library (DocIO) uses XmlReader to parse the input HTML. The input HTML must therefore be well-formed XML (every opening tag needs a matching closing tag), even when the XHTMLValidationType parameter is set to XHTMLValidationType.None.
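
Because the input has to be well-formed XML, it can help to check a file before handing it to DocIO. The following is a minimal sketch that uses only the JDK's standard XML parser, not DocIO itself. The class and method names (HtmlWellFormedness, isWellFormed) are hypothetical, and passing this check does not guarantee that DocIO will accept the content.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of DocIO): checks XML well-formedness only.
final class HtmlWellFormedness {

    // Returns true when the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String html) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Feature of the JDK's built-in parser: do not download external DTDs.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(html)));
            return true;
        } catch (SAXException e) {
            // Unclosed or mismatched tags end up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser could not be used", e);
        }
    }
}
```

For example, `<p>Hello<br></p>` is rejected because the `br` element is never closed, while `<p>Hello<br/></p>` is accepted.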

XHTML Validation

HTML content is validated against a Document Type Definition (DTD), which is a set of markup declarations that defines a document type for an SGML-family markup language (GML, SGML, XML, HTML).

XHTML validation types

Essential DocIO supports the following XHTML validation types when importing HTML content.

| XHTML validation type | Description |
|---|---|
| XHTMLValidationType.None | Performs no schema validation, but the given HTML content must still meet the XHTML 1.0 format. |
| XHTMLValidationType.Transitional | Allows several attributes within the tags. |
| XHTMLValidationType.Strict | Does not allow the attributes inside the tags. |
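
One possible way to pick a validation type is to look at the DOCTYPE the input declares. The sketch below is a heuristic under stated assumptions, not documented DocIO behavior: the class XhtmlDoctypeSniffer and its method are hypothetical, and the returned string is only a suggestion that the caller would map to the matching XHTMLValidationType value. The two public identifiers are the standard XHTML 1.0 Strict and Transitional ones.

```java
// Hypothetical helper (not part of DocIO): suggests a validation type
// based on the XHTML 1.0 public identifier found in the DOCTYPE.
final class XhtmlDoctypeSniffer {

    static final String STRICT_ID = "-//W3C//DTD XHTML 1.0 Strict//EN";
    static final String TRANSITIONAL_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN";

    // Returns "Strict", "Transitional", or "None" (no recognized DOCTYPE).
    static String suggestValidationType(String html) {
        int start = html.indexOf("<!DOCTYPE");
        if (start < 0) {
            return "None";
        }
        // Look only at the DOCTYPE declaration, not at the rest of the document.
        int end = html.indexOf('>', start);
        String doctype = end < 0 ? html.substring(start) : html.substring(start, end + 1);
        if (doctype.contains(STRICT_ID)) {
            return "Strict";
        }
        if (doctype.contains(TRANSITIONAL_ID)) {
            return "Transitional";
        }
        return "None";
    }
}
```

Content without a DOCTYPE, or with one the helper does not recognize, falls back to "None", which corresponds to the least restrictive option in the table above.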

The following code example shows how to convert the HTML file into a Word document.

//Load the HTML document.
WordDocument document = new WordDocument("Input.html", FormatType.Html);
//Set the XHTML validation type to None.
document.setXHTMLValidateOption(XHTMLValidationType.None);
//Save the Word document.
document.save("HTMLtoWord.docx", FormatType.Docx);
//Close the document.
document.close();

The following code example shows how to convert the Word document into HTML.

//Load the template document.
WordDocument document = new WordDocument("Template.docx", FormatType.Docx);
//Save the document as an HTML file.
document.save("WordToHtml.html", FormatType.Html);
//Close the document. 
document.close();

Supported and unsupported items

The following document elements and attributes are supported by DocIO in Word to HTML and HTML to Word conversions.

| Document Element | Attribute | Support Status | Notes |
|---|---|---|---|
| Bookmark | Id | Yes | - |
| Border | Color | Yes | - |
| | Distance from text | Yes | - |
| | Line style | Partial | Some line styles are rendered as solid. |
| | Line width | Yes | - |
| Document Properties | | Yes | - |
| Field | | Yes | - |
| Footnotes and Endnotes | | No | - |
| Form Field | Text input, Checkbox and combo box | Yes | - |
| Header / Footer | Different per section | Partial | Only odd header of the first section is preserved in HTML export. |
| Hyperlink | External URL | Yes | - |
| | Local | Yes | - |
| Image | Inline | Yes | - |
| | Scale | Yes | - |
| List | Custom bullets | Yes | - |
| | Multi-level | Yes | - |
| | Numbered | Yes | - |
| | Restart numbering | Yes | - |
| | Standard bullets | Yes | - |
| Comment | | No | - |
| Symbols | | Yes | - |
| Paragraph | Alignment | Yes | - |
| | Borders | Yes | See Borders for more details. |
| | Keep lines and paragraphs together | Yes | - |
| | Paragraph Indents | Yes | - |
| | Line spacing | Yes | - |
| | Page break before | Yes | - |
| | Shading | Yes | See Shading for more details. |
| | Spacing before and after | Yes | - |
| Shading | Background color | Partial | Solid background colors are supported. |
| | Foreground color | Partial | Solid foreground color is used when background color is auto. |
| Styles | Paragraph styles | Yes | - |
| | Character styles | Yes | - |
| | List styles | Yes | - |
| Table | Alignment | Yes | - |
| | Cell margins | Yes | - |
| | Column widths | Yes | - |
| | Indent from left | Yes | - |
| | Preferred width | Yes | - |
| | Spacing between cells | Yes | - |
| | Borders | Partial | See Borders for more details. |
| | Shading | Partial | See Shading for more details. |
| Nested Table | | Yes | - |
| Table Cell | Borders | Partial | See Borders for more details. |
| | Cell margins | Yes | - |
| | Horizontal merge | Yes | - |
| | Shading | Partial | See Shading for more details. |
| | Vertical alignment | Yes | - |
| | Vertical merge | Yes | - |
| Table Row | Height | Yes | - |
| | Padding | Yes | - |
| Text | All caps | Yes | - |
| | Bold | Yes | - |
| | Character spacing | Yes | - |
| | Color | Yes | - |
| | Emboss | Partial | Rendered as bold. |
| | Engrave | Partial | Rendered as bold. |
| | Font | Yes | - |
| | Hidden | Yes | - |
| | Highlighting | Yes | - |
| | Imprint | Partial | Rendered as bold. |
| | Italic | Yes | - |
| | Line breaks | Yes | - |
| | Outline | Partial | Rendered as bold. |
| | Page breaks | Yes | - |
| | Shading | Partial | See Shading for more details. |
| | Small caps | Yes | - |
| | Special symbols | Yes | - |
| | Strike out | Yes | - |
| | Subscript / Superscript | Yes | - |
| | Underline | Partial | Underline types and colors are ignored. |