Measuring The Performance Of Typefaces For Users (Part 1)
About The Author
Thomas Bohm studied graphic communication design at college (BTEC, Leicester College, UK) and university (BA, Norwich University of the Arts, UK). Runs User …
Quick summary ↬
In this article, we will be discussing how typefaces work, ways to test typefaces, and broader typographic issues. Though this article is slightly high-end and academic, fear not. The content will be of interest, especially for committed typeface designers and typographers. If you like complex issues and problems, this is for you.
Our focus is on typefaces for reading large amounts of text and information in the most efficient, legible, pleasurable, comprehensible, and effective way possible. For instance, typefaces used for a novel, an academic paper in a journal, or a lengthy online article like this one that uses the Elena typeface, that you are reading now on this webpage. The questions that we will explore are:
Should typefaces be measured? There is no simple answer. The short answer: yes. The long answer: it is a difficult and imprecise task. We will discuss the pros and the cons, and I will show you what things are involved and how we could go about doing it.
A Very Short Introduction To Typefaces
For 100s of years, we have enjoyed using typefaces. These compiled systems for letters and symbols, which are representations of sounds and information, get a lot of use and are a large part of graphic communication.
The first movable type machine, and therefore the first printing press, was created by a man named Bi Sheng, who lived in Yingshan, China, from what we believe to be 970–1051 AD — over four full centuries before Johannes Gutenberg was even born. The moveable type, sculptured in a lead-based alloy — which is essentially metal blocks of letters and symbols that can be moved, arranged, and used for mass printing — was Johannes Gutenberg’s contribution. Fast forward to the early 1960s, phototypesetting systems appeared. These devices consisted of glass disks (one per typeface) that spun in front of a light source, which exposed characters onto light-sensitive paper. Later on, in the 1980s, type started to be used in a digital context in computers. And today, we still have type in a digital context, but it travels through cables, wirelessly on smartphones, and in virtual reality glasses in 3D.
There are many different classifications of typefaces. To name a few: sans serif, serif, slab serif, script, handwritten, display, ornamental, stencil, and monospace. In a way, technology also created new typeface classifications. Today, we even have mixed typefaces with elements of serif and sans serif, such as the Luc(as) de Groot’s typeface TheMix. This diversity adds to the difficulty and complexity of defining and testing typefaces.
Reasons To Measure The Performance Of Typefaces?
Because technology has made it possible to design typefaces easier than ever before, we seem to be reinventing “different types of wheels” that already get the job done. However, rather than reinventing these typefaces, maybe we can get some objective measures, learn from what works and what does not work, and then design objectively better wheels (typefaces).
If your aim is to produce a new typeface based on historical exemplars, tradition, or design, then fine, this is what you will be aiming for. Alternatively, if you want to do something new and expressive, or that has never been done before, then fine, of course. However, some contexts, situations, and users need and demand highly functional typefaces.
As I briefly mentioned, measuring a typeface’s effectiveness is difficult. Since many new typefaces are not supplied with any objective concrete testing data, how do we determine how well they work and where they succeed or fail?
Should We Measure The Typeface Alone, And/Or The Context And Environment That The Typeface Is Used In?
When considering the questions above, we can see that this is a large and complex issue. There are many different types of information, situations in which information is used, types of environments, and there are many different categories of people. Here are some extreme examples:
More after jump! Continue reading below ↓
“TypeScript in 50 Lessons”, our shiny new guide to TypeScript. With detailed code walkthroughs
to be dangerous.
Jump to table of contents ↬
Measuring Typefaces And The Resulting Performance Data
One of the reasons why measuring a typeface’s effectiveness is difficult is that we cannot accurately measure what goes on in people’s minds. Many factors are invisible, as Paul Luna — a professor at the University of Reading’s Department of Typography & Graphic Communication — mentions in this video . In addition, Robert Waller, information designer at the Simplification Centre states:
Furthermore, Ralf Hermann, director of Typography.Guru in his article says:
The observations expressed in these quotes demonstrate that testing typefaces involves many complex factors. Because of this complexity, it has to be carefully controlled and modified, but it may not even be worth the effort.
Consistency And Variables
When testing typefaces or a selection of typefaces against another, we need to keep the typographic design parameters and variables the same, so we do not introduce or change the previously tested type settings. One example is the difference between the typefaces’ x-height’s (the height of a lowercase x) of any two typefaces we are testing. It is unlikely that they will be the same, as x-heights differ greatly. Thus, one of the two typeface x-height’s will seem to be larger in size, although it may be the same point size in the software. I will show you more about typographic variables under the section “Specific Typographic Design Variables Affecting Performance” in the second part of this article.
Robert Waller mentions in “The Clear Print standard: arguments for a flexible approach” that “although both point size and x-height are specified, it is the point size (pt) that is most commonly quoted — and point size is a notoriously imprecise measure.” It is, however, more effective and accurate to set an x-height measurement and set the typefaces being compared to that same x-height measurement. The x-height using point sizes actually results in different sizes — and does not look inconsistent between different typefaces.
Notice on the 1st line that we can see that both typefaces are set to 26 pt in Adobe InDesign. However, if you look at the tops of the “erdana” you can see that they go slightly above the line, so the Verdana typeface is, in essence, larger than the Info Display typeface, even when they are both typeset at 26 points. On the 2nd line, both typefaces have been typeset to a consistent and accurate measurement of an x-height of 5.5 mm. Notice that while the x-height is the same for both typefaces on the 2nd line, it gives a different point size for each typeface. This is why point size is not an accurate way to measure typeface size and for testing and comparing two or more typefaces.
Additionally, how you use and typeset the typeface in the actual typographic design and layout (line length, typeface size, color, spacing, leading, and so on) is probably more important than the actual typeface used. Thus, you could use one of the world’s most legible typefaces, but if you typeset it with a leading of -7 points and a line length of 100 characters, it would be rendered nearly useless.
As you can see, we can’t use a singular factor to measure typefaces. Instead, we need to address multiple factors within the design system. They all have to work well together to bring an ideal and effective final presentation.
Do We Need To Decide On A Base Default Typeface To Standardized Test Typefaces Against?
I would like to make things more complicated. (Remember when I told you this article had some difficult and complex issues?) So as an example, let’s say that we want to test a serif typeface against another serif and then again a sans serif against another sans serif. One would think that one of the two serifs or one of the sans serifs would perform better than the other, right? Well maybe, but not quite. Now, let’s say that we have the previous person testing two serif typefaces and two sans serif typefaces. What would happen if someone else did the same test but then tested their serif and sans serif against a different serif and sans serif typefaces that the 1st person used. Well, the result is simply that two people tested a serif and a sans serif typeface against different serif and sans serif typefaces, and they are not cross comparable.
So, the question is: should we, as a community, decide on base typefaces to test against? So, for a serif, it is quite popular and common in academic journals to test against Times New Roman. So, for sans serif, Arial is again another popular base typeface typically used to test another sans serif against. Then for monospace, Courier?
Last but not least, we have 2 people previously testing typefaces, but what typographic design and typesetting settings and variables did they use? Once again, even more inconsistency is introduced because they would most definitely test their typefaces with different typographic designs and typesetting settings. Do we need to set a base/default typographic design and typesetting, so everyone tests and measures against the same thing?
The Difference Between Near-identical Typefaces: Two Brief Discussions
There are many typefaces, and many of them are very similar or are nearly identical to previous or contemporary versions available. Here are some examples:
Note: For more information, see my article “No more new similar typefaces for extended reading, please!”
If we look at a typeface like Garamond, we can see that there are many versions of Garamond — all with slightly different interpretations of what the ultimate or most accurate version of Garamond is. Furthermore, they are all designed for slightly different uses, contexts, and technological choices:
Typeface designers and foundries supplying these versions of Garamond say theirs is the best, but which one is right? They were all designed for slightly different contexts and technological times. It would be interesting to find out what the performance differences are between these very similar typefaces.
Furthermore, if we compare a typeface like Minion Pro (which is quite robust and sturdy) against a typeface like Monotype Baskerville, we can observe that Minion Pro has more consistent stroke widths and slightly less personality than Monotype Baskerville. In contrast, Monotype Baskerville has more variance in stroke width, with more of a posh and sophisticated personality than Minion Pro. Therefore, how would these differences affect their performance? Perhaps there is no one right answer to these questions.
I certainly feel, in 2022, that we, as designers, researchers, and academics, have certainly built up some fairly sensible and reasonable conclusions based on previous research and previous arguments over the last years. Nevertheless, the questions seem to remain around — what are the characteristics that make typefaces work a little bit better, and what would be “more advisable?”
Kai Bernau’s Neutral Typeface
Neutral, a typeface designed by Kai Bernau, is an interesting example of how an ideal utopian legible typeface might look and be like. We can see in the image below that Bernau has analyzed the fine and very subtle characteristics that are common in neutral typefaces, such Univers, Helvetica, TheSans, and so on.
The term “neutral” basically refers to a typeface design that does not say much or does not say anything and that has a very “no style/anonymous” feel and voice — like the color grey. Think of it like speaking to someone with little personality or who has a not-obvious personality. In his design, Bernau is attempting to find out what an almost merged letter skeleton of all these neutral typefaces would look like when comparing all these typefaces.
Kai Bernau’s Neutral typeface began as a graduation project at KABK (the Academy of Art, The Hague), taking inspiration from typefaces that seem ageless, remain fresh and relevant even decades after they were designed. His project was constructed based on a set of parameters derived by measuring and averaging a number of popular 20th-century sans serif fonts (like Helvetica, Univers, Frutiger, Meta, and TheSans), trying to design the ultimate “neutral” font. It is a very interesting idea that builds on previous best practices to find an optimal solution. This is much more like the conceptual typeface design that we need to see in the future. For more, see the “An Idea of a typeface” article by Kai Bernau.
Can A Utopian Highly Legible Typeface Exist?
Bernau’s typeface aims for neutrality and utopian legibility. However, if I asked the question: what would the world’s most legible sans serif, serif, or slab serif typeface look like? How much better than a typically good highly legible sans serif, serif, or slab serif typeface would it be? Furthermore, is it even worth the effort?
Whatever your thoughts are, it is the designer’s job to design things that work and read well. Adding to this conversation, Sofie Beier (a legibility expert) says:
Perhaps, we should consider developing a central list of the elements, research, data, sources, and aspects to create legible and usable typefaces, so we can easily choose? This may lead to better typeface design decisions, choices, and better typefaces in the future.
Changing The Change: From What To What?
Another reason why we need to measure typefaces and know how well they work is highlighted by David Sless, information design pioneer and director of the Communication Research Institute in Australia, in his article “Changing the change: from what to what?”:
This is one of the reasons we need to measure typefaces and know how well they perform. That way, when we design new ones in the future, we can learn from past data and then use that knowledge in future typefaces, rather than relying on a bit of research and personal self-expression.
Redesigning typefaces makes us end up in the same place (whether good or bad), and we are not necessarily making and designing better typefaces. Although typeface design provides us with both the aesthetic appeal to meet the functional needs, it is the functional need and its functional aspect that is frequently missing.
With the thoughts mentioned above, I would like to raise another debate, because I know that typographic discussions and debates are usually beneficial and productive for all involved.
This is a question that is not easily answered. It depends on what position you decide to take. Kris Sowersby, director of Klim Type Foundry argues that typefaces are not tools in his article “A typeface is not a tool”:
Though Sowersby points out a functional aspect, he makes no mention that typefaces are used to achieve certain and precise responses and effects from users’ behaviors and emotional responses. For example, I might choose typefaces specifically because of their legibility — when a typeface is considered legible and easy-to-read by people with less-than-good eyesight. And so a well-designed typeface (or tool) is crucial.
Another reason I may choose a typeface is to bring a certain “more ideal” and to bring a “more specific” response and behavior from people. So, I may also choose a typeface as a tool for better communication with specific audiences. This is similar to why we choose a hammer over another, even though they all do the same job. There are 100s of different types of hammers, but builders do seem to have an “emotional favorite.”
In addition, typefaces can be more or less legible on different screens and monitor resolutions because they can be rendered with varying degrees of quality and sharpness.
Let us move on to a more precise and probably more important aspect, and that is testing data value.
The Two Types Of Testing Data: “Subjective” And “Objective”
When testing, there are two types of testing data: subjective (meaning mainly coming from a personal view and opinion) and objective (coming from a result from reality or the ability to do or not do something). They are valuable in their own ways. However, an objective measurement may be more desirable. It is important to know the difference between the two. Below is a brief description of both as it applies to our topic:
A subjective measure:
A user says: “I can read this typeface better.” This may be the case and what the person feels. However, if the measurement, in this case, is “better,” then the questions are: how much better, what kind of a measure and how accurate a measure is “better,” and how much better (than what) is it? However, what one person likes may not be what another one likes. Is it better because I said so? They may not be able to describe or know why they like it, but they just say: “I like it.” Because this measure is based on what the person feels, it is not accurately measurable.
An objective measure:
A user identified a letter correctly and within a certain timeframe. The data is either correct or incorrect, they could or could not do it, and they did or did not do it in a measurable recorded time span.
Kevin Larson, principal researcher on Microsoft’s Advanced Reading Technologies team, explains:
David Sless also states in his article “Choosing the right method for testing:”
David Sless continues the discussion by adding [slightly reworded and edited by me]:
In summary, the most important question is: what do we want the users to do? Based on the previous examples and discussions in this article, we can see that not all data or information gained is necessarily useful or accurate.
What Do We Want People To Do With Highly Legible Typefaces?
Consulting Sless’ article “Changing the change: from what to what?” again:
Let’s try to outline what we want people to do with highly legible typefaces for extended reading, like reading large amounts of information, effectively and precisely:
More Coming Up In Part 2
Let us dive into more of these amazing complex issues (as I said they would be) in the second part of this article — available next Friday, June 10th. We will look deeply at how we can test typefaces and how to get the best out of it. Stay tuned!
(ah, vf, yk, il, nl)
This content was originally published here.