Ticket ID: 000062
Status: resolved
Priority: ???
Assigned to: Gerry
Reported by: davidchambers
Component:

This is a very strange bug.

Markdown:

<p>Paragraph.</p>
<ul>
<li>
<p>Paragraph.</p>
</li>
</ul>

Desired HTML output:

<p>Paragraph.</p>
<ul>
<li>
<p>Paragraph.</p>
</li>
</ul>

Actual HTML output:

<p>Paragraph.</p>
<ul>
<li>
<p>Paragraph.</p>

<p></li>
</ul></p>

I've simplified the input text as much as possible – correct behaviour results if either paragraph is removed, or if the first paragraph is replaced by its raw Markdown equivalent (i.e. "Paragraph.\n" rather than "<p>Paragraph.</p>").

The input text is valid HTML and should be left as is.

Comments

By Waylan on 7/14/2010

First, let's run the given sample through the parser:

>>> md = markdown.Markdown()
>>> md.convert("<p>foo</p>\n<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>")
u'<p>foo</p>\n<ul>\n<li>\n<p>bar</p>\n\n<p></li>\n</ul></p>'
>>> md.htmlStash.rawHtmlBlocks
[(u'<p>foo</p>\n<ul>\n<li>\n<p>bar</p>', False), (u'</li>', False), (u'</ul>', False)]

Notice that the raw HTML is seen as three separate blocks, and the last two contain closing tags only. That's why they are getting wrapped in a <p> tag.
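To make "closing tags only" concrete, here is a minimal sketch that checks whether a stashed fragment is balanced on its own. It uses only the standard library's html.parser; the names _BalanceChecker and is_balanced are hypothetical helpers, not part of Python-Markdown, and the check assumes the fragment contains no void elements such as <br>.

```python
from html.parser import HTMLParser


class _BalanceChecker(HTMLParser):
    """Tracks open tags and flags any closing tag without a matching opener."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # A closing tag with nothing (or the wrong tag) open before it.
            self.balanced = False


def is_balanced(fragment):
    """Return True if every tag opened in fragment is also closed in it."""
    checker = _BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.balanced and not checker.stack


# The three blocks from the stash above: none of them is balanced by itself.
blocks = ["<p>foo</p>\n<ul>\n<li>\n<p>bar</p>", "</li>", "</ul>"]
for block in blocks:
    print(repr(block), is_balanced(block))
```

Each of the three stashed blocks fails this check, while the complete list "<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>" passes it.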

Now let's remove that first paragraph from the input text:

>>> md.reset()
>>> md.convert("<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>")
u'<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>'
>>> md.htmlStash.rawHtmlBlocks
[(u'<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>', False)]

This time everything is seen as one block, so it works correctly. The trick is to figure out why the raw HTML parser is thrown off by the first paragraph, so that it doesn't include the closing tags on the last two lines in the same block, and then to fix it.

Now one final test:

>>> md.reset()
>>> md.convert("<p>foo</p>\n\n<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>")
u'<p>foo</p>\n\n<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>'
>>> md.htmlStash.rawHtmlBlocks
[(u'<p>foo</p>', False), (u'<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>', False)]

Here I added a blank line between the first paragraph and the list, and everything works as it should. In the input that breaks, I suspect that the parser is matching the first <p> with the last </p> and completely ignoring the </p><ul><li><p> in the middle. So to fix it, we need the parser to stop on the first </p>, then restart on the list. It doesn't really matter whether everything ends up in one block or two, as long as each block contains a valid block of HTML.
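The "stop on the first </p>, then restart" idea can be sketched as below. This is not the actual Python-Markdown fix, only an illustration under simplifying assumptions: tags are matched with a naive regular expression, and comments, self-closing tags, and ">" inside attribute values are not handled. The names TAG_RE, first_block_end and split_blocks are hypothetical.

```python
import re

# Naive tag matcher: group 1 is "/" for a closing tag, group 2 is the tag name.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def first_block_end(text):
    """Return the index just past the tag that closes the first element.

    Returns -1 if text does not start with an opening tag, or if the
    element is never closed.
    """
    depth = 0
    name = None
    for match in TAG_RE.finditer(text):
        closing = match.group(1) == "/"
        tag = match.group(2).lower()
        if name is None:
            if closing or match.start() != 0:
                return -1
            name = tag
        if tag != name:
            continue  # only tags with the same name affect nesting depth
        depth += -1 if closing else 1
        if depth == 0:
            return match.end()
    return -1


def split_blocks(text):
    """Split text into consecutive top-level elements, one per block."""
    blocks = []
    rest = text.lstrip()
    while rest:
        end = first_block_end(rest)
        if end == -1:
            blocks.append(rest)  # give up and keep the remainder as one block
            break
        blocks.append(rest[:end])
        rest = rest[end:].lstrip()
    return blocks


print(split_blocks("<p>foo</p>\n<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>"))
# → ['<p>foo</p>', '<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>']
```

On the failing input this produces the same two blocks that the blank-line version stashes, so neither block consists of closing tags only.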

Resolution

fixed
