Saturday, May 18, 2024

HTML table extraction case study


You've probably heard and read dozens of stories about generics in Go, most of them about ordinary slices and maps, but haven't yet thought of a fun way to apply this feature. Let's implement the peer of pandas.read_html, which maps HTML tables into slices of structs! If it's achievable even with Rust, why shouldn't it be with Go?! This essay will show you an exciting mixture of reflection and generics to reach concise external APIs for your libraries.

First, let's look at the direct inspiration for this article: Pandas, the most popular interactive data analysis library. Reading HTML tables seems to be so common that it's deemed a commodity and thus works out of the box:

Example of using pandas.read_html from a Jupyter notebook.

To follow the idiomatic table parsing example, let's aim at taking the S&P 500 list from Wikipedia and turning it into a slice of Ticker instances, where we annotate every field with the name of a table header:

https://nf-x.medium.com/media/d04c4e21f4e4bdbb92810e8dd816c11e
Things can be concise in Go, too
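A minimal sketch of the idea, assuming the table has already been extracted into a header row and string cells (HTML parsing is left out). The `header` tag key, the `mapRows` helper, and the sample rows are assumptions made for illustration, not the API of any particular library:

```go
package main

import (
	"fmt"
	"reflect"
)

// Ticker annotates each field with the table header it is filled from.
// The `header` tag key is an assumption made for this sketch.
type Ticker struct {
	Symbol   string `header:"Symbol"`
	Security string `header:"Security"`
	CIK      string `header:"CIK"`
}

// mapRows is a hypothetical helper: it copies cells from already-extracted
// rows into the exported string fields of T whose `header` tag names a column.
func mapRows[T any](header []string, rows [][]string) ([]T, error) {
	var zero T
	typ := reflect.TypeOf(zero)
	if typ == nil || typ.Kind() != reflect.Struct {
		return nil, fmt.Errorf("%T is not a struct", zero)
	}
	// Column name -> column index in the table.
	columns := map[string]int{}
	for i, name := range header {
		columns[name] = i
	}
	// Struct field index -> column index.
	fields := map[int]int{}
	for i := 0; i < typ.NumField(); i++ {
		f := typ.Field(i)
		name, ok := f.Tag.Lookup("header")
		if !ok || !f.IsExported() {
			continue
		}
		if f.Type.Kind() != reflect.String {
			return nil, fmt.Errorf("field %s must be a string", f.Name)
		}
		col, found := columns[name]
		if !found {
			return nil, fmt.Errorf("column %q not found", name)
		}
		fields[i] = col
	}
	out := make([]T, 0, len(rows))
	for _, row := range rows {
		var item T
		v := reflect.ValueOf(&item).Elem()
		for fieldIdx, col := range fields {
			if col < len(row) { // tolerate short rows
				v.Field(fieldIdx).SetString(row[col])
			}
		}
		out = append(out, item)
	}
	return out, nil
}

func main() {
	// Invented sample data standing in for a parsed table.
	header := []string{"Symbol", "Security", "Sector", "CIK"}
	rows := [][]string{
		{"AAA", "Example Corp A", "Industrials", "0000000001"},
		{"BBB", "Example Corp B", "Health Care", "0000000002"},
	}
	tickers, err := mapRows[Ticker](header, rows)
	if err != nil {
		panic(err)
	}
	for _, t := range tickers {
		fmt.Printf("%s\t%s\t%s\n", t.Symbol, t.Security, t.CIK)
	}
}
```

Columns without a matching field (here `Sector`) are simply ignored, while a tagged field without a matching column is reported as an error. Other policies are equally reasonable.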

The thing to pay attention to here is [Ticker] in "NewSliceFromURL[Ticker](URL)." This Go 1.18+ feature, called a type parameter, is our preferred way to tell NewSliceFromURL the name of the type, and reflection then helps us discover the names of the headers. Before generics, you might have written a similar API as "NewSliceFromURL(Ticker{}, URL)," though I always found it rather confusing:

Why do we need to pass an empty instance of the type if our goal is to pass just the type?
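The two API shapes can be put side by side in a small sketch. The function names `headersFromValue` and `headersFor`, as well as the `header` tag key, are hypothetical and exist only to show the difference at the call site:

```go
package main

import (
	"fmt"
	"reflect"
)

// Ticker uses a `header` tag key, which is an assumption of this sketch.
type Ticker struct {
	Symbol   string `header:"Symbol"`
	Security string `header:"Security"`
}

// headersOfType lists the `header` tags of a struct type, in field order.
// It returns nil for anything that is not a struct.
func headersOfType(typ reflect.Type) []string {
	if typ == nil || typ.Kind() != reflect.Struct {
		return nil
	}
	var out []string
	for i := 0; i < typ.NumField(); i++ {
		if name, ok := typ.Field(i).Tag.Lookup("header"); ok {
			out = append(out, name)
		}
	}
	return out
}

// Pre-generics shape: the caller builds a throwaway value
// whose only purpose is to carry its type.
func headersFromValue(sample any) []string {
	return headersOfType(reflect.TypeOf(sample))
}

// Go 1.18+ shape: the type is named directly and no value is created
// by the caller.
func headersFor[T any]() []string {
	// Taking the element type of *T works even when T is an interface.
	return headersOfType(reflect.TypeOf((*T)(nil)).Elem())
}

func main() {
	fmt.Println(headersFromValue(Ticker{})) // prints [Symbol Security]
	fmt.Println(headersFor[Ticker]())       // prints [Symbol Security]
}
```

Both variants rely on the same reflection underneath. Only the call site changes, and with the type parameter it states the intent without a dummy argument.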

Having spent a number of years writing Java code, I've gotten used to the concept of "object mapping" from libraries like Jackson. But this blog is about Go, and you've probably landed here to figure out how to achieve a similar thing. You may have assumed that Go generics "just work," but your level of enjoyment depends on your affinity to other programming ecosystems. As of the time of this writing, methods cannot yet have type parameters of their own, which "opens the flood gates of creativity" for API design. Here's the illustration:

Removing the need for the public API to initialize an empty struct, thanks to a type parameter

It looks a bit like magic, but here's a simplified way of thinking about it: when we call "NewSliceFromURL[Ticker]()," the compiler substitutes type parameter references with the actual type, so dummy T in the feeder[T] type becomes dummy Ticker. Still hard to follow, but exciting? Please read a couple of introductory articles (or more advanced ones).
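The substitution can be sketched as follows. Only the names feeder[T] and dummy come from the text above. The `headers` method, the `HeadersOf` entry point, and the `header` tag key are assumptions made to keep the example small:

```go
package main

import (
	"fmt"
	"reflect"
)

// Ticker uses a `header` tag key, which is an assumption of this sketch.
type Ticker struct {
	Symbol   string `header:"Symbol"`
	Security string `header:"Security"`
}

// feeder carries the type parameter on the type itself, because a method
// cannot declare type parameters of its own: something like
// `func (p *parser) slice[T any]() []T` does not compile.
type feeder[T any] struct {
	dummy T // never populated; it only gives reflection access to T
}

// headers is an ordinary method, yet it can still see T through the receiver.
// For feeder[Ticker], f.dummy has the type Ticker.
func (f *feeder[T]) headers() ([]string, error) {
	typ := reflect.TypeOf(f.dummy)
	if typ == nil || typ.Kind() != reflect.Struct {
		return nil, fmt.Errorf("type parameter must be a struct")
	}
	var out []string
	for i := 0; i < typ.NumField(); i++ {
		if name, ok := typ.Field(i).Tag.Lookup("header"); ok {
			out = append(out, name)
		}
	}
	return out, nil
}

// HeadersOf is a hypothetical public entry point: the generic function
// creates the generic helper so the caller never sees an empty struct.
func HeadersOf[T any]() ([]string, error) {
	f := &feeder[T]{}
	return f.headers()
}

func main() {
	names, err := HeadersOf[Ticker]()
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [Symbol Security]
}
```

The zero value of T lives inside the unexported helper, so the "empty instance" still exists, but it is an implementation detail rather than part of the public signature.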
