Monday, July 15, 2024

Remove non-printable characters from a string in Go (Golang)



When working with external data or user input, it’s often a good idea to remove invisible characters that can cause problems. These characters are “non-printable”: they don’t occupy a space in printing and fall under the Other or Separator category in the Unicode standard. Examples of non-printable characters are:

  • Whitespace (except the ASCII space character)
  • Tabs
  • Line breaks
  • Carriage returns
  • Control characters

To remove non-printable characters from a string in Go, you should iterate over the string and check whether a given rune is printable using the unicode.IsPrint() function. If it is not, the rune should be skipped; otherwise, it should be added to the new string.
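The manual loop described above can be sketched as follows. The helper name removeNonPrintable is hypothetical, chosen only for illustration, and using strings.Builder to assemble the result is one possible choice rather than a requirement.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// removeNonPrintable is a hypothetical helper name used for this sketch.
// It keeps only the runes for which unicode.IsPrint reports true.
func removeNonPrintable(s string) string {
	var b strings.Builder
	b.Grow(len(s)) // the result is never longer than the input
	for _, r := range s {
		if unicode.IsPrint(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// The input contains a no-break space, a zero-width space and a newline.
	fmt.Println(removeNonPrintable("b\u00a0e\u200bhind\n")) // prints "behind"
}
```

Ranging over a string yields runes rather than bytes, so multi-byte characters such as U+00A0 are checked as a whole.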

Instead of iterating and manually building a new string in a for loop, you can use strings.Map(), which returns a copy of the string with all characters modified according to the mapping function. The best part is that a character is dropped if the mapping function returns a negative value for it. So, we can return -1 for a non-printable character and the unmodified rune if unicode.IsPrint() returns true. See the following example:

package main

import (
    "fmt"
    "strings"
    "unicode"
)

func main() {
    // The input contains a no-break space (U+00A0), a zero-width space (U+200B)
    // and a trailing newline.
    text := "b\u00a0e\u200bhind\n"

    fmt.Println(text)
    fmt.Println(len(text)) // length in bytes, not runes
    fmt.Println("---")

    // Returning -1 from the mapping function drops the rune.
    text = strings.Map(func(r rune) rune {
        if unicode.IsPrint(r) {
            return r
        }
        return -1
    }, text)

    fmt.Println(text)
    fmt.Println(len(text))
}

Output

b e​hind

12
---
behind
6

The unicode.IsPrint() function returns true for:

  • letters
  • marks
  • numbers
  • punctuation
  • symbols
  • the ASCII space character

There’s also a function, unicode.IsGraphic(), that works almost the same, except that it returns true for all space characters in the category Zs of the Unicode standard.
