Word to HTML and HTML to Word Conversions
18 Nov 2018, 2 minutes to read
Essential DocIO converts HTML files into Word documents and converts Word documents (DOCX, DOTX, DOCM, and DOTM) into HTML format.
The Word library (DocIO) uses XmlReader to parse the content of the input HTML. The input HTML must therefore be well-formed XML (every opening tag needs a matching closing tag), even if you specify the XHTMLValidationType parameter as XHTMLValidationType.None.
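Since the input must be well-formed XML, it can be useful to check a file before passing it to DocIO. The following is a minimal sketch of such a pre-check. It uses only the JDK's built-in XML parser, not DocIO, so its verdict may differ from DocIO's own parser in edge cases. The class name `WellFormedCheck` and the method `isWellFormed` are hypothetical names introduced for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of DocIO): checks XML well-formedness only.
final class WellFormedCheck {

    static boolean isWellFormed(String html) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: the JDK's default (Xerces-based) parser is in use.
            // This feature stops the parser from downloading the DTD named
            // in a DOCTYPE declaration.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps stderr quiet.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(html)));
            return true;
        } catch (SAXException | IOException e) {
            // Parsing failed, for example because of an unclosed tag.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("XML parser not configurable", e);
        }
    }
}
```

For example, `<p>Hello<br></p>` fails this check because `<br>` is never closed, while `<p>Hello<br/></p>` passes.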
XHTML Validation
HTML content is validated against a Document Type Definition (DTD), which is a set of markup declarations that defines a document type for an SGML-family markup language (GML, SGML, XML, HTML).
XHTML validation types
Essential DocIO supports the following XHTML validation types when importing HTML content.
XHTML validation type | Description |
---|---|
XHTMLValidationType.None | Performs no schema validation, but the given HTML content must still meet the XHTML 1.0 format. |
XHTMLValidationType.Transitional | Allows several attributes within the tags. |
XHTMLValidationType.Strict | Does not allow those attributes within the tags. |
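One way to choose among these validation types is to look at the DOCTYPE that the HTML file itself declares. The sketch below is an illustration only, not DocIO behavior: it matches the two public identifiers defined by XHTML 1.0 and returns the name of the corresponding validation type as a plain string. The class `DoctypeSniffer` and the method `suggestValidationType` are hypothetical names, and falling back to "None" is an assumption made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of DocIO): reads the declared DOCTYPE.
final class DoctypeSniffer {

    // Captures the public identifier of a declaration such as
    // <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "...">
    private static final Pattern DOCTYPE_PUBLIC_ID = Pattern.compile(
        "<!DOCTYPE\\s+html\\s+PUBLIC\\s+\"([^\"]+)\"",
        Pattern.CASE_INSENSITIVE);

    static String suggestValidationType(String html) {
        Matcher matcher = DOCTYPE_PUBLIC_ID.matcher(html);
        if (matcher.find()) {
            String publicId = matcher.group(1);
            if (publicId.equals("-//W3C//DTD XHTML 1.0 Strict//EN")) {
                return "Strict";
            }
            if (publicId.equals("-//W3C//DTD XHTML 1.0 Transitional//EN")) {
                return "Transitional";
            }
        }
        // Assumption: no recognized DOCTYPE means no schema validation.
        return "None";
    }
}
```

The returned string can then be mapped to the matching XHTMLValidationType value before the document is loaded.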
The following code example shows how to convert the HTML file into a Word document.
//Load the HTML document against the validation type none.
WordDocument document = new WordDocument("Input.html", FormatType.Html);
document.setXHTMLValidateOption(XHTMLValidationType.None);
//Save the Word document.
document.save("HTMLtoWord.docx", FormatType.Docx);
//Close the document.
document.close();
The following code example shows how to convert the Word document into HTML.
//Load the template document.
WordDocument document = new WordDocument("Template.docx", FormatType.Docx);
//Save the document as an HTML file.
document.save("WordToHtml.html", FormatType.Html);
//Close the document.
document.close();
Supported and unsupported items
The following table lists the support status of document elements and attributes in DocIO's Word to HTML and HTML to Word conversions.
Document Element | Attribute | Support Status | Notes |
---|---|---|---|
Bookmark | Id | Yes | - |
Border | Color | Yes | - |
Border | Distance from text | Yes | - |
Border | Line style | Partial | Some line styles are rendered as solid. |
Border | Line width | Yes | - |
Document Properties | - | Yes | - |
Field | - | Yes | - |
Footnotes and Endnotes | - | No | - |
Form Field | Text input, check box, and combo box | Yes | - |
Header / Footer | Different per section | Partial | Only the odd header of the first section is preserved in HTML export. |
Hyperlink | External URL | Yes | - |
Hyperlink | Local | Yes | - |
Image | Inline | Yes | - |
Image | Scale | Yes | - |
List | Custom bullets | Yes | - |
List | Multi-level | Yes | - |
List | Numbered | Yes | - |
List | Restart numbering | Yes | - |
List | Standard bullets | Yes | - |
Comment | - | No | - |
Symbols | - | Yes | - |
Paragraph | Alignment | Yes | - |
Paragraph | Borders | Yes | See Border for more details. |
Paragraph | Keep lines and paragraphs together | Yes | - |
Paragraph | Paragraph indents | Yes | - |
Paragraph | Line spacing | Yes | - |
Paragraph | Page break before | Yes | - |
Paragraph | Shading | Yes | See Shading for more details. |
Paragraph | Spacing before and after | Yes | - |
Shading | Background color | Partial | Solid background colors are supported. |
Shading | Foreground color | Partial | Solid foreground color is used when background color is auto. |
Styles | Paragraph styles | Yes | - |
Styles | Character styles | Yes | - |
Styles | List styles | Yes | - |
Table | Alignment | Yes | - |
Table | Cell margins | Yes | - |
Table | Column widths | Yes | - |
Table | Indent from left | Yes | - |
Table | Preferred width | Yes | - |
Table | Spacing between cells | Yes | - |
Table | Borders | Partial | See Border for more details. |
Table | Shading | Partial | See Shading for more details. |
Nested Table | - | Yes | - |
Table Cell | Borders | Partial | See Border for more details. |
Table Cell | Cell margins | Yes | - |
Table Cell | Horizontal merge | Yes | - |
Table Cell | Shading | Partial | See Shading for more details. |
Table Cell | Vertical alignment | Yes | - |
Table Cell | Vertical merge | Yes | - |
Table Row | Height | Yes | - |
Table Row | Padding | Yes | - |
Text | All caps | Yes | - |
Text | Bold | Yes | - |
Text | Character spacing | Yes | - |
Text | Color | Yes | - |
Text | Emboss | Partial | Rendered as bold. |
Text | Engrave | Partial | Rendered as bold. |
Text | Font | Yes | - |
Text | Hidden | Yes | - |
Text | Highlighting | Yes | - |
Text | Imprint | Partial | Rendered as bold. |
Text | Italic | Yes | - |
Text | Line breaks | Yes | - |
Text | Outline | Partial | Rendered as bold. |
Text | Page breaks | Yes | - |
Text | Shading | Partial | See Shading for more details. |
Text | Small caps | Yes | - |
Text | Special symbols | Yes | - |
Text | Strike out | Yes | - |
Text | Subscript / Superscript | Yes | - |
Text | Underline | Partial | Underline types and colors are ignored. |