
Aberrations with character encoding in Chrome and Tomcat

What are the defaults, and how do you change them? The terrible things that happen when the charset is not specified

Marian C.
9 min read · Jan 23, 2022

If you deal only with English text, you are lucky: you rarely face character encoding problems, because ASCII characters are encoded with the same bytes in UTF-8, ISO-8859-1 and most other common charsets. If you also handle characters beyond the basic Latin alphabet, you have surely run into tedious encoding-related problems at some point.
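To see why English-only content escapes these problems, here is a minimal Java sketch using only the standard library (the class and method names are illustrative, not from the article). ASCII text yields identical bytes in UTF-8 and ISO-8859-1, while a word with an accented letter is corrupted as soon as the reading side assumes a different charset from the writing side.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingMismatch {

    // Encodes text with the writer's charset, then decodes the bytes with the
    // reader's charset, as happens when two sides disagree on the charset.
    static String roundTrip(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String english = "Hello";
        String french = "caf\u00E9"; // "cafe" with e-acute (U+00E9)

        // ASCII-only text: both charsets produce the same bytes, so a mismatch is invisible.
        boolean sameBytes = Arrays.equals(
                english.getBytes(StandardCharsets.UTF_8),
                english.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(sameBytes); // true

        // UTF-8 bytes C3 A9 read as ISO-8859-1 become two characters, U+00C3 and U+00A9.
        System.out.println(roundTrip(french, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        // The lone ISO-8859-1 byte E9 is invalid UTF-8 and is replaced with U+FFFD.
        System.out.println(roundTrip(french, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

The second line printed is the familiar "mojibake" pattern; the third shows the replacement character that appears when the mismatch goes the other way.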

According to modern specifications and recommendations, all characters transmitted on the Web should be encoded in UTF-8 (Unicode Transformation Format, 8-bit). In this post I examine how the leading browser, Chrome, and the most popular Java application server, Tomcat, comply with these modern Web standards. I look at character encodings in three places: static files, servlet responses, and data posted by a browser.
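UTF-8 is a variable-width encoding: each Unicode code point takes one to four bytes, and the 128 ASCII characters keep their single-byte ASCII values, which is why ASCII-only content survives most charset mix-ups. A small Java sketch (the class and method names are illustrative) that counts the UTF-8 bytes of a few sample characters:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {

    // Number of bytes UTF-8 needs to encode the given text.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only, so the result does not
        // depend on which encoding the compiler assumes for the file.
        String[] samples = {"A", "\u00E9", "\u0436", "\u20AC", "\uD83D\uDE00"};
        String[] names = {"Latin A", "e acute", "Cyrillic zhe", "euro sign", "grinning face emoji"};

        for (int i = 0; i < samples.length; i++) {
            String codePoint = Integer.toHexString(samples[i].codePointAt(0)).toUpperCase();
            System.out.println(names[i] + " (U+" + codePoint + "): "
                    + utf8Length(samples[i]) + " byte(s)");
        }
    }
}
```

The emoji is stored in Java as a surrogate pair of two `char` values, yet it is a single code point that UTF-8 encodes in four bytes.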

First, I experiment with Tomcat 8.5, which implements Servlet API 3.1. It is the latest Tomcat version that does not implement the modern Servlet API 4.0 (Servlet 5.0 is the same as 4.0 apart from the package rename from javax.servlet to jakarta.servlet). Then I show that the Servlet API-related problems of Tomcat 8.5 do not exist in Tomcat 10, the latest version at the time of writing, because it uses a feature of the newer Servlet API to override the API's obsolete defaults.
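The posted-data problem can be simulated without a servlet container. The sketch below is a stand-in, not Tomcat code: it assumes Java 10 or later (for the `Charset` overloads of `java.net.URLEncoder` and `java.net.URLDecoder`) and uses a hypothetical helper, `decodeFormValue`. It percent-encodes a form value as a browser does for a UTF-8 page, then decodes it once with ISO-8859-1, the Servlet API's default request encoding when the request declares no charset, and once with UTF-8.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormDecoding {

    // Hypothetical helper: decodes one percent-encoded form value using the
    // charset a server has chosen for the request.
    static String decodeFormValue(String encoded, Charset requestCharset) {
        return URLDecoder.decode(encoded, requestCharset);
    }

    public static void main(String[] args) {
        // A browser rendering a UTF-8 page submits form fields as UTF-8 bytes.
        String submitted = URLEncoder.encode("caf\u00E9", StandardCharsets.UTF_8);
        System.out.println(submitted); // caf%C3%A9

        // Server falls back to ISO-8859-1: the two UTF-8 bytes become two characters.
        System.out.println(decodeFormValue(submitted, StandardCharsets.ISO_8859_1));

        // Server configured for UTF-8: the original value survives.
        System.out.println(decodeFormValue(submitted, StandardCharsets.UTF_8));
    }
}
```

For reference, Servlet 4.0 added the `<request-character-encoding>` and `<response-character-encoding>` deployment-descriptor elements, along with `ServletContext.setRequestCharacterEncoding` and `ServletContext.setResponseCharacterEncoding`, so an application can declare UTF-8 once instead of relying on the ISO-8859-1 defaults.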

Note that the term character encoding has a synonym, charset. For a web developer both terms mean the same thing: …
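On the Java side, the value of an HTTP `charset` parameter is simply the name of a `java.nio.charset.Charset`, and the lookup is case-insensitive and accepts aliases. A minimal sketch (the `canonical` helper is hypothetical) that resolves a few labels to Java's canonical names:

```java
import java.nio.charset.Charset;

public class CharsetNames {

    // Hypothetical helper: resolves a charset label, such as one taken from a
    // Content-Type header, to Java's canonical name for that encoding.
    static String canonical(String label) {
        return Charset.forName(label).name();
    }

    public static void main(String[] args) {
        System.out.println(canonical("utf-8"));  // UTF-8
        System.out.println(canonical("utf8"));   // UTF-8 (alias)
        System.out.println(canonical("latin1")); // ISO-8859-1 (alias)
    }
}
```

Labels are not interpreted identically everywhere, though: the WHATWG Encoding Standard, which browsers follow, maps the labels `iso-8859-1` and `latin1` to windows-1252, whereas Java resolves them to true ISO-8859-1.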

