Saturday, April 20, 2024

Maintaining White Space Using jSoup And ColdFusion


jSoup is a Java library for parsing and manipulating HTML strings. For the past few years, I've been using jSoup to clean up and normalize my blog posts. And now, I'm looking to use jSoup to help me transform and cache GitHub Gists. At the time of this writing, Gist code is rendered in an HTML <table> with cells that use white-space: pre as the means of controlling white space output. jSoup doesn't parse the CSS; so, it doesn't understand that it needs to maintain this white space when serializing the document back into HTML. If we want to keep this white space in the resultant document, we have to disable pretty printing.

ASIDE: jSoup will naturally maintain white space that's contained within a <pre> tag. However, that doesn't apply to elements using white-space: pre CSS properties.

The pretty print settings control how white space is handled within the .html() and .text() methods. These methods can be used to access parts of the jSoup Document Object Model (DOM); and, are used internally during the serialization process.

The pretty print settings are defined at the Document level and can be accessed at:

doc.outputSettings()

This object provides a getter / setter for the pretty printing:

outputSettings.prettyPrint( [ boolean ] )
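
As a minimal sketch of that getter / setter pair (assuming Lucee CFML and the same jsoup-1.16.1.jar path used in the demo further down; the "Hello" markup and the settings variable are just placeholders for illustration), calling the method with no arguments reads the current flag and calling it with a boolean changes it:

```cfml
<cfscript>

	// ASSUMPTION: the jSoup JAR sits next to this template, as in the demo below.
	doc = createObject( "java", "org.jsoup.Jsoup", [ expandPath( "./jsoup-1.16.1.jar" ) ] )
		.parseBodyFragment( "<p>Hello</p>" )
	;

	// The output settings object is owned by the document.
	settings = doc.outputSettings();

	// Called with no arguments, prettyPrint() acts as the getter.
	writeOutput( "Before: " & settings.prettyPrint() & "<br />" );

	// Called with a boolean, prettyPrint() acts as the setter. It returns the settings
	// object, which is what makes method chaining possible.
	settings.prettyPrint( false );

	writeOutput( "After: " & settings.prettyPrint() & "<br />" );

</cfscript>
```

How the boolean is rendered as a string depends on the CFML engine, so treat the output as a quick sanity check and not as a formatted report.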

In order to disable pretty printing and maintain the original white space, we have to invoke this method with (false) before we serialize our document. To see this in action, I'm going to parse a Paragraph tag that contains leading and trailing white space. Then, I'm going to serialize the resultant document: once with pretty printing and then once after pretty printing has been disabled:

<cfscript>

	// Note that our inner content is surrounded by leading / trailing spaces.
	input = "<p>     Some content with spaces     </p>";

	doc = javaNew( "org.jsoup.Jsoup" )
		.parseBodyFragment( input )
	;

	// Let's update the document content (to demonstrate that we have reason to parse and
	// then re-serialize the content).
	doc.selectFirst( "p" )
		.attr( "data-edited", "true" )
	;

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	// By default, pretty printing is enabled within the document. This means, when we go
	// to serialize the document as HTML, it will normalize all the text. Which means,
	// any "unnecessary" leading / trailing spaces will be trimmed.
	writeOutput( "<h2> Pretty Print Enabled </h2>" );
	renderDocumentAsPre( doc );

	// When we disable pretty printing, jSoup will leave all the text nodes AS IS, even if
	// they aren't strictly necessary.
	doc.outputSettings()
		.prettyPrint( false )
	;

	writeOutput( "<h2> Pretty Print Disabled </h2>" );
	renderDocumentAsPre( doc );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I render the given jSoup document as escaped markup within PRE tags.
	*/
	public void function renderDocumentAsPre( required any doc ) {

		writeOutput(
			"<pre>" &
				encodeForHtml( doc.body().html() ) &
			"</pre>"
		);

	}


	/**
	* I create a new Java class wrapper using the jSoup JAR files.
	*/
	public any function javaNew( required string className ) {

		var jarPaths = [
			expandPath( "./jsoup-1.16.1.jar" )
		];

		return( createObject( "java", className, jarPaths ) );

	}

</cfscript>

Essentially, this ColdFusion code is taking the jSoup DOM and calling .html() on it in order to serialize the DOM back into an HTML string. It's doing this twice, once before and once after the pretty printing has been disabled. And, when we run this ColdFusion code, we get the following output:

As you can see, the first serialization of the jSoup DOM resulted in stripped-out white space. However, after we disabled pretty printing, the second serialization of the jSoup DOM leaves our white space intact.
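
To tie this back to the Gist use case, here is a rough sketch of the same technique applied to a table cell that relies on white-space: pre. The markup is a hypothetical stand-in, not actual Gist output, and the code again assumes Lucee CFML with jsoup-1.16.1.jar sitting next to the template:

```cfml
<cfscript>

	// HYPOTHETICAL markup, loosely modeled on a table cell styled with white-space: pre.
	input = '<table><tr><td style="white-space: pre">    indented code line</td></tr></table>';

	// ASSUMPTION: same JAR location as in the demo above.
	doc = createObject( "java", "org.jsoup.Jsoup", [ expandPath( "./jsoup-1.16.1.jar" ) ] )
		.parseBodyFragment( input )
	;

	// jSoup doesn't read the inline CSS, so we disable pretty printing in order to keep
	// the leading spaces inside the cell when serializing.
	doc.outputSettings()
		.prettyPrint( false )
	;

	writeOutput(
		"<pre>" &
			encodeForHtml( doc.body().html() ) &
		"</pre>"
	);

</cfscript>
```

Since pretty printing is a document-level setting, this turns off white space normalization for the entire serialized document and not just for the styled cells.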

Want to use code from this post?
Check out the license.


