Transform HTML from WordPress, Drupal, or any CMS into clean, structured Portable Text JSON. Perfect for large-scale CMS migrations and legacy content modernization.
Your content is trapped in HTML: thousands of articles wrapped in <div> soup, inline styles, and deprecated tags from 2010. Maybe it's a WordPress export, a Drupal database dump, or years of accumulated HTML from various CMSes. Now you're moving to Sanity, and that HTML needs to become clean, structured Portable Text.
Manual conversion doesn't scale. A single HTML article with nested divs, custom classes, and inline styles can take 45+ minutes to clean and convert. For a site with 1,000 pages, that's at least 750 hours of mind-numbing work.
Our HTML to Portable Text converter automatically transforms messy HTML into clean Portable Text JSON. It strips unnecessary markup, preserves semantic meaning, and outputs properly structured content ready for Sanity. No more manual cleaning, no more lost structure, no more migration nightmares.
HTML was designed for browsers, not content management. Over the years, your HTML content accumulates cruft: inline styles from old editors, wrapper divs from ancient themes, class names from deprecated frameworks. This presentation-focused markup makes content migration a nightmare.
Here's what typical CMS HTML looks like after years of accumulation:
<div class="post-content">
  <div class="wrapper">
    <p style="margin-bottom: 20px; font-size: 16px;">
      This is <span style="font-weight: bold;">important</span> content with
      <a href="/old-link" class="internal-link" target="_blank">a link</a>.
    </p>
    <div class="list-wrapper">
      <ul style="margin-left: 40px;">
        <li>First item</span></li>
        <li>Second item</li>
      </ul>
    </div>
  </div>
</div>

Versus clean Portable Text structure:
[
  {
    "_key": "e29133e865ed",
    "children": [
      { "_type": "span", "marks": [], "text": "This is important content with ", "_key": "d699f9245ab4" },
      { "_type": "span", "marks": ["55acd23526eb"], "text": "a link", "_key": "469a7aea1534" },
      { "_type": "span", "marks": [], "text": ".", "_key": "696bae35f57a" }
    ],
    "markDefs": [
      { "_key": "55acd23526eb", "_type": "link", "href": "/old-link" }
    ],
    "_type": "block",
    "style": "normal"
  },
  {
    "_key": "0fad85c9fa76",
    "children": [
      { "_type": "span", "marks": [], "text": "First item", "_key": "5ae5410aca70" }
    ],
    "markDefs": [],
    "_type": "block",
    "style": "normal",
    "level": 1,
    "listItem": "bullet"
  },
  {
    "_key": "1c19a9679c17",
    "children": [
      { "_type": "span", "marks": [], "text": "Second item", "_key": "95488c890deb" }
    ],
    "markDefs": [],
    "_type": "block",
    "style": "normal",
    "level": 1,
    "listItem": "bullet"
  }
]

The converter strips the cruft and preserves the meaning.
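To make the HTML-to-Portable-Text mapping concrete, here is a minimal Python sketch built on the standard library's `html.parser`. It is not the converter's actual implementation: the names `PortableTextSketch`, `html_to_portable_text`, and `new_key` are hypothetical, and the sketch only understands `<p>`, `<ul>`/`<ol>`, `<li>`, and `<a href>`. Wrapper divs, spans, and style or class attributes are simply ignored, which is roughly how the sample above loses its cruft.

```python
import re
from html.parser import HTMLParser
from uuid import uuid4


def new_key() -> str:
    """Hypothetical helper: any string unique within the document works as a _key."""
    return uuid4().hex[:12]


class PortableTextSketch(HTMLParser):
    """Toy converter (assumption: only <p>, <ul>/<ol>, <li>, <a href> matter)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.blocks = []       # finished Portable Text blocks
        self.block = None      # block currently being filled
        self.list_stack = []   # "bullet" or "number" for each open list
        self.link_key = None   # markDef key of the currently open <a>

    def _open_block(self) -> None:
        self._close_block()
        self.block = {"_type": "block", "_key": new_key(), "style": "normal",
                      "markDefs": [], "children": []}
        if self.list_stack:
            self.block["listItem"] = self.list_stack[-1]
            self.block["level"] = len(self.list_stack)

    def _close_block(self) -> None:
        if self.block is not None:
            spans = self.block["children"]
            if spans:
                # Trim whitespace at the edges of the block only.
                spans[0]["text"] = spans[0]["text"].lstrip()
                spans[-1]["text"] = spans[-1]["text"].rstrip()
            self.block["children"] = [s for s in spans if s["text"]]
            if self.block["children"]:
                self.blocks.append(self.block)
        self.block = None
        self.link_key = None

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._close_block()
            self.list_stack.append("bullet" if tag == "ul" else "number")
        elif tag in ("p", "li"):
            self._open_block()
        elif tag == "a" and self.block is not None:
            href = dict(attrs).get("href")
            if href:
                self.link_key = new_key()
                self.block["markDefs"].append(
                    {"_key": self.link_key, "_type": "link", "href": href})
        # Any other tag (div, span, ...) and all style/class attributes are ignored.

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            self._close_block()
            if self.list_stack:
                self.list_stack.pop()
        elif tag in ("p", "li"):
            self._close_block()
        elif tag == "a":
            self.link_key = None

    def handle_data(self, data):
        if self.block is None:
            return  # text outside a supported block is dropped in this sketch
        text = re.sub(r"\s+", " ", data)  # collapse runs of whitespace
        marks = [self.link_key] if self.link_key else []
        spans = self.block["children"]
        if spans and spans[-1]["marks"] == marks:
            spans[-1]["text"] += text  # merge adjacent text with identical marks
        else:
            spans.append({"_type": "span", "_key": new_key(),
                          "marks": marks, "text": text})


def html_to_portable_text(html: str) -> list:
    """Parse an HTML fragment and return a list of Portable Text blocks."""
    parser = PortableTextSketch()
    parser.feed(html)
    parser.close()
    parser._close_block()  # flush a block left open by unclosed markup
    return parser.blocks
```

Calling `html_to_portable_text()` on the sample HTML above should yield three blocks with the same shape as the JSON shown, though with freshly generated keys. Bold text, headings, nested lists, images, and malformed markup beyond a stray closing tag are all out of scope for this sketch.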
With roughly 40% of websites running on WordPress, millions of sites need modern content infrastructure. WordPress stores post content as HTML in its database, which makes a Sanity migration complex.
Organizations running Drupal, Joomla, or custom CMSes from the 2000s have massive HTML content stores that need modernization.
Companies moving from static HTML sites or Jekyll/Hugo to Sanity need to convert years of accumulated HTML content.
Enterprises consolidating multiple CMSes into Sanity face diverse HTML formats from different systems.
Our converter doesn't just strip tags. It understands HTML semantics.
Challenge: Shortcodes, Gutenberg blocks, theme-specific markup
Solution: Intelligent pattern recognition and extraction
Challenge: Style attributes everywhere breaking structure
Solution: Strip presentation, preserve semantics
Challenge: Excessive wrapper elements obscuring content
Solution: Recursive unwrapping while maintaining hierarchy
Challenge: Character encoding problems from old systems
Solution: Automatic encoding detection and normalization
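As an illustration of the encoding challenge, here is a small Python sketch of one common normalization strategy. It is a deliberately simple fallback, not true automatic detection and not the converter's own method: it assumes that anything which fails to decode as UTF-8 is Windows-1252, which is a frequent but not universal situation in old CMS exports. The function name `normalize_legacy_text` is hypothetical.

```python
import unicodedata


def normalize_legacy_text(raw: bytes) -> str:
    """Decode bytes from a legacy export and normalize the resulting text."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is Windows-1252 (cp1252).
        # Bytes that cp1252 leaves undefined become U+FFFD instead of raising.
        text = raw.decode("cp1252", errors="replace")
    # Old editors often leave non-breaking spaces where plain spaces belong.
    text = text.replace("\u00a0", " ")
    # NFC composes sequences such as "e" + combining accent into one code point.
    return unicodedata.normalize("NFC", text)
```

Trying UTF-8 first is the safer order: valid UTF-8 rarely occurs by accident, whereas almost any byte sequence decodes without error as Windows-1252, so the reverse order would silently produce mojibake.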
We don't just strip HTML. We understand and preserve the semantic meaning of your content.
No unnecessary nesting, no empty spans, no redundant marks. Just clean, efficient Portable Text.
Handles massive documents, malformed HTML, and legacy encoding issues that enterprise migrations encounter.
Stop wasting 45 minutes per article on manual content migration. Convert ChatGPT, Claude, and Markdown files directly to Sanity's Portable Text format with one click.
Convert rich text from Google Docs, Word, Notion, or any WYSIWYG editor to Sanity-ready Portable Text JSON. No more losing formatting during content migration.
© Copyright 2026 ContentWrap. All Rights Reserved.