Tweaking RDiscount
As I mentioned in my previous post, I’d had some issues with Pygments and RDiscount. I have been using Pygments to highlight the code blocks on my blog. Unfortunately, as soon as a code block used the linenos option, none of the Markdown after it was parsed any more.
The Discount library, upon which RDiscount is based, can output a debug tree showing the different blocks within the document: paragraphs, quotes, code blocks and so on. Using this, I was able to determine that the HTML block containing the highlighted code was the last block being detected.
The Pygments-highlighted source with line numbers is actually rendered as a table, which closes with
</td></tr></table>
and herein lay the problem. The Discount library detects the closing HTML tag with the following function:
static Line *
htmlblock(Paragraph *p, char *tag)
{
    Line *t = p->text, *ret;
    int closesize;
    char close[MAXTAG+4];

    if ( selfclose(t, tag) || (strlen(tag) >= MAXTAG) ) {
        ret = t->next;
        t->next = 0;
        return ret;
    }

    closesize = sprintf(close, "</%s>", tag);   /* e.g. "</table>", closesize == 8 */

    for ( ; t ; t = t->next) {
        if ( strncasecmp(T(t->text), close, closesize) == 0 ) {   /* line must START with the tag */
            ret = t->next;
            t->next = 0;   /* end the HTML block after this line */
            return ret;
        }
    }
    return 0;   /* no closing line found */
}
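As you might be able to tell, the strncasecmp() comparison inside the loop only checks whether a line begins with the closing HTML tag. In the Pygments output, the closing </table> was not on its own line. With closesize set to 8 for </table>, strncasecmp() compared only the first eight characters of the line, </td></t, and found no match. The loop therefore never found a closing line, which is why the HTML block was the last block detected.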
Testing with a pre-Pygmentised file, with the </table> pushed onto a new line of its own, confirmed that this was the problem.
But how do we fix it?
Well, I started out looking at C's string-searching functions, and of course strstr() was the first candidate. Replacing the strncasecmp() comparison in htmlblock() with
if ( strstr(T(t->text), close) != NULL ) {
seemed like a good idea, but running the RDiscount test suite soon revealed how naive that solution was. With my change, text such as </table> appearing anywhere in a line was matched as the end of the block, which broke the HTML code examples. This wasn’t good enough.
I spent much longer than I really should have trying other solutions before having the critical eureka moment. I should add that by this point I’d learnt of strcasestr(), a case-insensitive version of strstr() (a GNU/BSD extension rather than standard C) that is a better match for the case-insensitive strncasecmp() originally used.
At any rate, my eureka moment was realising that I was only ever going to need to search a string of closing tags, and that these closing tags would never be preceded by white space. Initially I was thinking of using the isspace() macro from ctype.h, but a far simpler solution also struck me.
A closing tag, or a string of closing tags, will always start with <, so provided the line started with < and also contained the relevant closing tag, I could reasonably safely assume the HTML block was being closed.
So next I changed the strncasecmp() comparison in htmlblock() to
if ( T(t->text)[0] == '<' && strcasestr(T(t->text), close) != NULL ) {
and re-ran the RDiscount test suite to see whether the tests passed. They did. Next I added a snippet of the Pygments table, followed by some extra Markdown, to one of the test files, updated the expected output file to match, and re-ran the test suite. Again, everything passed.
Once this was all done, I committed my changes to a branch of my RDiscount fork and pushed them up to GitHub.
As a final test, I made the same change to the Discount source and re-ran it over the file to output the debug tree again. I was pleased to see all the right blocks of the file being shown. Lovely.
Update
My fork has since been merged into master, and the fix has been refined further.