|
Following the removal of innerHTML manipulation, we no longer need these
sanitization functions.
I've reviewed every safeTags call site to make sure the outputs don't
end up anywhere unsafe, and that malicious input can't manipulate the DOM
or execute code. These values end up either as plain text
(textContent, innerText, createTextNode, title, option) or as a URL
path used to request assets from the server (encoded using encodeURI).
Not that safeTags was very effective to begin with: all that function
did was replace the '<' and '>' symbols with Unicode lookalikes. Even
its own comment suggested using fundamentally safer functions instead
of these hacks.
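A minimal sketch of the two sinks described above. `setLabel` and `assetUrl` are hypothetical helpers, not actual WebAO functions, and `TextSink` is a stand-in for `HTMLElement` so the sketch also runs outside a browser. Untrusted text is only ever assigned to `textContent`, and asset paths go through `encodeURI` before being appended to a base URL.

```typescript
// Minimal structural type standing in for HTMLElement (assumption for
// running outside a browser).
interface TextSink {
  textContent: string | null;
}

// Hypothetical helper: writes untrusted text as plain text.
// In the DOM, assigning textContent never parses markup, so "<b>"
// stays the literal string "<b>".
function setLabel(el: TextSink, untrusted: string): void {
  el.textContent = untrusted;
}

// Hypothetical helper: builds an asset URL from a base and a relative path.
// encodeURI percent-encodes spaces and non-ASCII characters, but leaves
// reserved characters such as "/", "?" and "#" untouched.
function assetUrl(base: string, path: string): string {
  return base + encodeURI(path);
}

const label: TextSink = { textContent: null };
setLabel(label, "<img src=x onerror=alert(1)>");
console.log(label.textContent); // prints the payload verbatim, as text

console.log(assetUrl("https://example.com/base/", "characters/Phoenix Wright/normal.png"));
// → https://example.com/base/characters/Phoenix%20Wright/normal.png
```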
Replace remaining uses of prepChat with unescapeChat, as we still need
to do the token substitution (like "<and>" to "&"). decodeChat was
handling Unicode escape sequences like \uXXXX, but I don't see a reason
for this: AO2 Client doesn't have this feature, and since WebSocket
text frames are strictly UTF-8, we don't need these encodings.
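A sketch of what an unescapeChat-style function reduces to once HTML escaping and \uXXXX decoding are gone: plain token substitution. Only "<and>" to "&" comes from this commit; the other entries in the hypothetical `TOKENS` table assume AO2's usual escapes for its reserved characters.

```typescript
// Hypothetical token table. "<and>" -> "&" is from the commit message;
// the rest assume AO2's escapes for its reserved characters (#, %, $).
const TOKENS: Readonly<Record<string, string>> = {
  num: "#",
  percent: "%",
  dollar: "$",
  and: "&",
};

// Token substitution only: no HTML escaping and no \uXXXX decoding,
// since the result is used as plain text and WebSocket text frames are
// already UTF-8. A single left-to-right regex pass means a decoded
// character is never re-scanned. Matching is case-sensitive.
function unescapeChat(s: string): string {
  return s.replace(/<(num|percent|dollar|and)>/g, (match: string, name: string) => TOKENS[name] ?? match);
}

console.log(unescapeChat("Tom <and> Jerry: 100<percent>")); // Tom & Jerry: 100%
```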
|
|
For whatever reason, WebAO decides to normalize almost every string
component in URLs, packets, and INI files to lower case.
First, the glaring issue. In URLs, this handling of paths is utterly
broken and corrupts data. By changing the case of characters, you change
the resource identity and break valid URLs. According to Section 6.2.2.1
of RFC 3986 (Case Normalization):
> When a URI uses components of the generic syntax, the component syntax
> equivalence rules always apply; namely, that the scheme and host are
> case-insensitive and therefore should be normalized to lowercase. For
> example, the URI <HTTP://www.EXAMPLE.com/> is equivalent to
> <http://www.example.com/>. The other generic syntax components are
> assumed to be case-sensitive unless specifically defined otherwise by
> the scheme (see Section 6.2.3)
Scheme and host _are_ case-insensitive. Path is _not_, and neither is
anything else. Section 6.2.3 doesn't define any case normalization for
the path component of the http and https schemes. Thus, example.com/item
and example.com/Item identify two different resources.
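For illustration, the WHATWG URL parser built into browsers and Node.js agrees with RFC 3986 on this point: it lowercases the scheme and host but leaves the path untouched. `normalizeUrl` is a hypothetical wrapper name, not existing WebAO code.

```typescript
// Hypothetical helper: lets the platform URL parser apply the only
// case normalization the standards allow (scheme and host).
function normalizeUrl(raw: string): string {
  return new URL(raw).href;
}

console.log(normalizeUrl("HTTP://www.EXAMPLE.com/Item"));
// → http://www.example.com/Item

// Paths that differ only in case stay distinct.
console.log(normalizeUrl("http://example.com/item") === normalizeUrl("http://example.com/Item"));
// → false
```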
I can only attribute this absurdity to the idiotic conventions of a
particular poorly designed file system. There's no reason to drag them
into our code. On such systems case doesn't matter anyway, and
normalization is their job, not that of server hosts, who end up having
to either rewrite the URL of every asset request, or mangle their asset
directory and then rewrite almost every INI config (and spam
"showname=Name" everywhere because the character directory now has to
be "name").
So, instead of working around a broken implementation with absurd
ad-hoc solutions, such as forcing everything to lower case on the server
side, this commit attempts to fix the root issue and make URL handling
conform to the relevant standards.
The situation is similar for strings within packets, although less
severe in practice. Case must be preserved; otherwise, data gets
corrupted for no reason. If normalization is needed, it should be done
at the call site that requires it (like a filtering function), not by
the parser.
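A minimal sketch of a parser that leaves case alone. `parsePacket` and the `Packet` shape are hypothetical, and the "HEADER#field#field#%" layout is assumed from the AO2 protocol; token unescaping is a separate step and is not done here.

```typescript
// Hypothetical packet shape.
interface Packet {
  header: string;
  fields: string[];
}

// Splits an AO2-style "HEADER#a#b#%" packet without touching letter case.
// Any case-folding is left to the caller that actually needs it.
function parsePacket(raw: string): Packet {
  const body = raw.endsWith("#%") ? raw.slice(0, -2) : raw;
  const [header, ...fields] = body.split("#");
  return { header, fields };
}

const ct = parsePacket("CT#Phoenix#Hold It!#%");
console.log(ct.header, ct.fields); // CT [ 'Phoenix', 'Hold It!' ]
```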
As for INI files, this is a matter of opinion. While values absolutely
must not be normalized, a case can be made for keys and section names:
why not allow "Options", "options", or even "oPtiOnS"? It's more
convenient, and matches a platform quirk of Windows (which Qt
unfortunately inherits in AO2 Client). I don't think there's a good
reason to allow such leniency in parsing, and removing superfluous
normalization is the better move: fewer data transformations, less
ambiguity, more strictness. In practice, INIs tend to be well-formed, and
it's good discipline to write them this way.
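To make "strict" concrete, here is a minimal sketch of an INI parser that keeps section names, keys, and values exactly as written, so [Options] and [options] are different sections. `parseIni` is hypothetical, not WebAO's actual parser, and it only recognizes ';' comments.

```typescript
type Ini = Map<string, Map<string, string>>;

// Hypothetical strict parser: no case-folding anywhere.
function parseIni(text: string): Ini {
  const ini: Ini = new Map();
  let section = new Map<string, string>();
  ini.set("", section); // keys that appear before the first section header
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith(";")) continue; // blank line or comment
    const header = /^\[(.*)\]$/.exec(line);
    if (header) {
      const name = header[1];
      section = ini.get(name) ?? new Map<string, string>();
      ini.set(name, section);
      continue;
    }
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a key=value line; ignored in this sketch
    section.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return ini;
}

const charIni = parseIni("[Options]\nshowname=Phoenix Wright\n");
console.log(charIni.get("Options")?.get("showname")); // Phoenix Wright
console.log(charIni.has("options")); // false
```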
In several places, case-folding does make sense: callwords, OOC
commands, CSS class names for areas, and character list filters would
behave oddly and inconveniently without it. In most places, however, it
only causes unnecessary breakage.
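Where folding is wanted, it can live at the call site instead of in the parser. A sketch for callwords, assuming simple substring matching; `matchesCallword` is a hypothetical name, not WebAO's implementation.

```typescript
// Hypothetical call site: callwords are one of the places where
// case-folding is wanted, so both sides are lowercased here, at the point
// of comparison, while the stored message keeps its original case.
function matchesCallword(message: string, callwords: readonly string[]): boolean {
  const haystack = message.toLowerCase();
  return callwords.some((word) => {
    const needle = word.trim().toLowerCase();
    return needle !== "" && haystack.includes(needle); // ignore empty callwords
  });
}

console.log(matchesCallword("Hey PHOENIX, over here!", ["phoenix"])); // true
```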
|