Bleach is deprecated; here is how to come close to replicating bleach.clean() using the nh3 version of .clean().
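import nh3


def clean_string(string: str) -> str:
    # The arguments passed to nh3.clean() below mirror
    # the default values of bleach.clean().
    return nh3.clean(
        string,
        tags={
            "a",
            "abbr",
            "acronym",
            "b",
            "blockquote",
            "code",
            "em",
            "i",
            "li",
            "ol",
            "strong",
            "ul",
        },
        attributes={
            "a": {"href", "title"},
            "abbr": {"title"},
            "acronym": {"title"},
        },
        url_schemes={"http", "https", "mailto"},
        # nh3 adds rel="noopener noreferrer" to links by default; None turns that off.
        link_rel=None,
    )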
The big difference is that, unlike bleach, which "safes" (escapes) disallowed HTML, nh3 removes the offending tags altogether. Read the comments below to see what this means.
Results:
>>> input_from_user = """<b>
<img src="">
I am not attempting to XSS you <a href="https://instance.com">Hyperlink</a>
</b>"""
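>>>
>>> # By default, bleach "safes" (escapes) the disallowed tag
>>> # rather than removing it.
>>> bleach.clean(input_from_user)
'<b>&lt;img src=""&gt;I am not attempting to XSS you <a href="https://instance.com">Hyperlink</a></b>'
>>>
>>> # In contrast, nh3 removes the offending tag entirely
>>> # while preserving whitespace.
>>> clean_string(input_from_user)
'<b>\n\nI am not attempting to XSS you <a href="https://instance.com">Hyperlink</a>\n</b>'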
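To make the "safing versus stripping" distinction concrete, here is a toy sketch that uses only the Python standard library. It is an illustration of the two strategies, not a real sanitizer, and not how bleach or nh3 work internally: `ToySanitizer`, `toy_clean`, and the two-tag `ALLOWED_TAGS` set are hypothetical names made up for this example, and the code does no attribute or URL-scheme filtering at all.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"a", "b"}  # tiny allowlist, for illustration only


class ToySanitizer(HTMLParser):
    """Toy model only: it does NOT filter attributes or URL schemes."""

    def __init__(self, escape_disallowed: bool):
        super().__init__(convert_charrefs=True)
        self.escape_disallowed = escape_disallowed
        self.parts: list[str] = []

    def _emit(self, tag: str, raw: str) -> None:
        if tag in ALLOWED_TAGS:
            self.parts.append(raw)  # allowed tag: keep it as written
        elif self.escape_disallowed:
            # "Safing": keep the tag's text, but neutralise the angle brackets.
            self.parts.append(escape(raw, quote=False))
        # Otherwise we are stripping, so the disallowed tag is dropped.

    def handle_starttag(self, tag, attrs):
        self._emit(tag, self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit(tag, f"</{tag}>")

    def handle_data(self, data):
        # Text (including newlines) is always kept, re-escaped for safety.
        self.parts.append(escape(data, quote=False))


def toy_clean(markup: str, escape_disallowed: bool) -> str:
    parser = ToySanitizer(escape_disallowed)
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


sample = '<b>\n<img src="">\nhello</b>'
print(repr(toy_clean(sample, escape_disallowed=True)))   # escaping strategy
print(repr(toy_clean(sample, escape_disallowed=False)))  # stripping strategy
```

With `escape_disallowed=True` the `<img>` tag survives as inert text; with `escape_disallowed=False` it disappears and only the surrounding whitespace remains. For real input, use `clean_string()` above rather than anything like this toy.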
Benefits of switching to nh3 are:
- nh3 is actively maintained, while bleach is officially deprecated.
- I believe the nh3 approach of stripping tags, rather than safing (escaping) them, is more secure. The idea of safing is nice, but I’ve always wondered whether a creative attacker could find a way to exploit it. So I think it’s better to remove the offending tags altogether.
- The preservation of whitespace is really useful for keeping content submitted in a textarea intact. This is especially true for Markdown content.
- nh3 is a binding to the rust-ammonia project. They claim a 15x speed increase over bleach’s binding to the html5lib project. Even if that is a 3x exaggeration, that’s still a 5x speed increase.
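If you would rather measure the speed difference on your own content than take the 15x figure on trust, a small timing harness along these lines could help. This is a minimal sketch: `seconds_per_call` is a hypothetical helper name, the payload is made up, and `html.escape` is used purely as a stand-in so the snippet runs without any third-party packages. It produces no benchmark results for bleach or nh3 by itself.

```python
import timeit
from html import escape  # stand-in sanitizer so the sketch runs as-is


def seconds_per_call(sanitize, payload: str, number: int = 1000) -> float:
    """Average wall-clock seconds for one sanitize(payload) call."""
    total = timeit.timeit(lambda: sanitize(payload), number=number)
    return total / number


# Made-up payload; swap in content that looks like what your users submit.
payload = '<b>\n<img src="">\nHello <a href="https://example.com">Link</a>\n</b>' * 50

baseline = seconds_per_call(escape, payload)
print(f"html.escape: {baseline * 1e6:.1f} microseconds per call")
```

To compare the two libraries, pass `clean_string` and `bleach.clean` as the `sanitize` argument in place of `escape`, with both packages installed, and compare the two numbers.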