Friday, March 1, 2024

Producing Pandoc Heading Identifiers In ColdFusion


Over on my Feature Flags book website, I'm using my book's Markdown content to generate the HTML for the site. I then use jSoup to inject a table of contents (TOC), which requires that I insert an identifier into each header element. And, now that I'm trying to use Pandoc to generate an EPUB (electronic book) version, I need to make sure that my ColdFusion-based header identifiers match the ones that Pandoc will generate in the final EPUB.

The Pandoc documentation on “Headings and Sections” describes the algorithm that it uses to generate the heading identifiers:

  • Remove all formatting, links, etc.
  • Remove all footnotes.
  • Remove all non-alphanumeric characters, except underscores, hyphens, and periods.
  • Replace all spaces and newlines with hyphens.
  • Convert all alphabetic characters to lowercase.
  • Remove everything up to the first letter (identifiers may not begin with a number or punctuation mark).
  • If nothing is left after this, use the identifier “section”.
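As a quick, language-neutral cross-check of these rules, here is a minimal Python sketch. Like the ColdFusion function later in this post, it assumes formatting, links, and footnotes have already been stripped, so only the last five rules are implemented. It also assumes that "first letter" means any Unicode alphabetic character (via `str.isalpha()`). That is my reading of the rule, not something verified against Pandoc's source. The name `generate_identifier` is illustrative only.

```python
import re


def generate_identifier(heading: str) -> str:
    """Approximate Pandoc's heading-identifier rules for a plain-text heading.

    Assumption: formatting, links, and footnotes were already removed.
    """
    # Convert all alphabetic characters to lowercase.
    identifier = heading.strip().lower()

    # Replace runs of whitespace (spaces and newlines) with a hyphen.
    identifier = re.sub(r"\s+", "-", identifier)

    # Remove everything except word characters (letters, digits, underscore),
    # periods, and hyphens. For str patterns, Python's \w is Unicode-aware.
    identifier = re.sub(r"[^\w.-]+", "", identifier)

    # Remove everything up to the first alphabetic character (assumption:
    # "letter" means any character for which str.isalpha() is True).
    first_letter = next(
        (i for i, ch in enumerate(identifier) if ch.isalpha()),
        len(identifier),
    )
    identifier = identifier[first_letter:]

    # If nothing is left, fall back to "section".
    return identifier or "section"
```

On Pandoc's sample headings, `generate_identifier("3. Applications")` returns `"applications"`, and `generate_identifier("33")` falls back to `"section"`.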

The Pandoc documentation also provides a set of sample headings along with the identifiers that it will generate for them. We can use these samples to test our ColdFusion algorithm. And, of course, we'll make ample use of Regular Expressions to solve this problem.

In the following ColdFusion code, we're looping over the samples provided by Pandoc and asserting that our ColdFusion-generated identifier matches the expected identifier:

<cfscript>

	// These values are provided in the Pandoc documentation on Headings and Sections.
	assertions = [
		{
			heading: "Heading identifiers in HTML",
			identifier: "heading-identifiers-in-html"
		},
		{
			heading: "Maître d'hôtel",
			identifier: "maître-dhôtel"
		},
		{
			heading: "*Dogs*?--in *my* house?",
			identifier: "dogs--in-my-house"
		},
		{
			heading: "[HTML], [S5], or [RTF]?",
			identifier: "html-s5-or-rtf"
		},
		{
			heading: "3. Applications",
			identifier: "applications"
		},
		{
			heading: "33",
			identifier: "section"
		}
	];

	// Let's test the Pandoc header assertions against our ColdFusion algorithm, yay!
	for ( assertion in assertions ) {

		identifier = generateIdentifier( assertion.heading );

		writeOutput("
			<p>
				Heading: #encodeForHtml( assertion.heading )# <br />
				Expected: #encodeForHtml( assertion.identifier )# <br />
				Received: #encodeForHtml( identifier )# <br />
				Pass: <b>#yesNoFormat( assertion.identifier == identifier )#</b>
			</p>
		");

	}

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I generate a Pandoc section identifier (ie, URL anchor) from the given heading text.
	* 
	* ASSUMPTION: For this demo, I'm assuming that all formatting, links, and footnotes
	* have already been removed and that we're dealing with plain-text header values.
	*/
	public string function generateIdentifier( required string heading ) {

		var identifier = heading
			.trim()
			// Convert all alphabetic characters to lowercase.
			.lcase()

			// Replace all spaces and newlines with hyphens.
			.reReplace( "\s+", "-", "all" )

			// Remove all non-alphanumeric characters, except underscores, hyphens,
			// and periods.
			.reReplace( "[^\w.-]+", "", "all" )

			// Remove everything up to the first letter (identifiers may not begin with
			// a number or punctuation mark).
			.reReplace( "^[^a-z]+", "" )
		;

		// If nothing is left after this, use the identifier "section".
		if ( ! identifier.len() ) {

			return( "section" );

		}

		return( identifier );

	}

</cfscript>
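It can help to trace one of Pandoc's samples through the pipeline one step at a time to see how each `.reReplace()` call contributes. Below is a small Python port that illustrates the steps; it is not the ColdFusion code itself. The `STEPS` list and `trace()` helper are hypothetical names introduced here. One caveat: Python's `re` treats `\w` as Unicode-aware for str patterns, so its behavior on accented characters may not match every CFML regex engine.

```python
import re

# Each step mirrors one operation in the ColdFusion generateIdentifier() function.
STEPS = [
    ("trim + lowercase", lambda s: s.strip().lower()),
    ("whitespace -> hyphen", lambda s: re.sub(r"\s+", "-", s)),
    ("drop disallowed chars", lambda s: re.sub(r"[^\w.-]+", "", s)),
    ("drop leading non-letters", lambda s: re.sub(r"^[^a-z]+", "", s)),
]


def trace(heading: str) -> list:
    """Return (step name, value after that step) pairs for the given heading."""
    results = []
    value = heading
    for name, step in STEPS:
        value = step(value)
        results.append((name, value))
    return results


if __name__ == "__main__":
    for name, value in trace("*Dogs*?--in *my* house?"):
        print(f"{name:<26} {value}")
```

For this heading, the asterisks and question marks survive the first two steps and only disappear at the "drop disallowed chars" step. The final step changes nothing, because the value already starts with a letter.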

As a general rule, when using Regular Expressions to solve a problem, always move the “convert to lowercase” step as high up in the algorithm as you can. That way, you can simplify your patterns by using [a-z] instead of [a-zA-Z]. You can also use .reReplace() instead of .reReplaceNoCase(), which is likely to be more efficient.
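Here's a small illustration of why the ordering matters. Python stands in for ColdFusion here, and the variable names are made up. If the "remove leading non-letters" step runs before lowercasing, `[^a-z]` treats the uppercase "A" as a non-letter and strips it along with the number:

```python
import re

heading = "3. Applications"

# Stripping leading non-letters BEFORE lowercasing: "A" is not in [a-z],
# so the pattern also consumes the first letter of the word.
too_early = re.sub(r"^[^a-z]+", "", heading)

# Lowercasing first lets the simple [a-z] class behave as intended.
lowercased_first = re.sub(r"^[^a-z]+", "", heading.lower())

# Without lowercasing, the character class has to cover both cases.
both_cases = re.sub(r"^[^a-zA-Z]+", "", heading)

print(too_early)
print(lowercased_first)
print(both_cases)
```

The three results are `pplications`, `applications`, and `Applications`. The third variant is the alternative the rule tries to avoid: widening every character class to cover both cases.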

In this ColdFusion code, I've used Pandoc's description of each step as a comment so that you can see how each RegEx pattern maps to Pandoc's intended outcome. If Regular Expressions seem like a foreign language to you, check out my video presentation on basic pattern usage. Once you start using patterns, you'll find that they improve the quality of your developer life.

With that said, if we run this ColdFusion code, we get the following output:

Output of header identifier assertions showing that ColdFusion generated the correct values.

As you can see, the heading identifiers generated by our ColdFusion Regular Expression replacements match the identifier assertions provided by Pandoc. At this point, I can update my Feature Flags site logic without worrying about the inter-chapter links breaking when I generate my EPUB.

Note: My Feature Flags website uses Flexmark to convert from Markdown to HTML in ColdFusion (during site bootstrapping and initialization), which is why the two algorithms have to be aligned. This way, I neither need to install Pandoc on my server nor commit the generated HTML to my source control.

Want to use code from this post?
Check out the license.


