Expand description
Mapping complex text to a sequence of positioned glyphs.
Shaping is the process of converting a sequence of character clusters into a sequence of glyph clusters with respect to the rules of a particular writing system and the typographic features available in a font. The shaper operates on one item at a time where an item is a run of text with a single script, language, direction, font, font size, and set of variation/feature settings. The process of producing these runs is called itemization and is out of scope for this crate.
§Building the shaper
All shaping in this crate takes place within the purview of a
ShapeContext
. This opaque struct manages internal LRU caches and scratch
buffers that are necessary for the shaping process. Generally, you’ll
want to keep an instance that persists for more than one layout pass as
this amortizes the cost of allocations, reduces contention for the global
heap and increases the hit rate for the internal acceleration structures. If
you’re doing multithreaded layout, you should keep a context per thread.
The only method available on the context is builder
which takes a type that can be converted into a FontRef
as an argument
and produces a ShaperBuilder
that provides options for configuring and
building a Shaper
.
Here, we’ll create a context and build a shaper for Arabic text at 16px:
// let font = ...;
let mut context = ShapeContext::new();
let mut shaper = context.builder(font)
.script(Script::Arabic)
.direction(Direction::RightToLeft)
.size(16.)
.build();
You can specify feature settings by calling the features
method with an iterator that yields a sequence of values that are convertible
to Setting<u16>
. Tuples of (&str, u16) will work in a pinch. For example,
you can enable discretionary ligatures like this:
// let font = ...;
let mut context = ShapeContext::new();
let mut shaper = context.builder(font)
.script(Script::Latin)
.size(14.)
.features(&[("dlig", 1)])
.build();
A value of 0
will disable a feature while a non-zero value will enable it.
Some features use non-zero values as an argument. The stylistic alternates
feature, for example, often offers a collection of choices per glyph. The argument
is used as an index to select among them. If a requested feature is not present
in a font, the setting is ignored.
Font variation settings are specified in a similar manner with the
variations
method but take an f32
to define the value within the variation space for the requested axis:
// let font = ...;
let mut context = ShapeContext::new();
let mut shaper = context.builder(font)
.script(Script::Latin)
.size(14.)
.variations(&[("wght", 520.5)])
.build();
See ShaperBuilder
for available options and default values.
§Feeding the shaper
Once we have a properly configured shaper, we need to feed it some
clusters. The simplest approach is to call the add_str
method with a string:
shaper.add_str("a quick brown fox?");
You can call add_str
multiple times to add a sequence
of text fragments to the shaper.
This simple approach is certainly reasonable when dealing with text consisting of a single run on one line with a font that is known to contain all the necessary glyphs. A small text label in a UI is a good example.
For more complex scenarios, the shaper can be fed a single cluster at a time. This method allows you to provide:
- accurate source ranges per character even if your runs and items span multiple non-contiguous fragments
- user data per character (a single
u32
) that can be used, for example, to associate each resulting glyph with a style span - boundary analysis per character, carrying word boundaries and line break opportunities through the shaper.
This also provides a junction point for inserting a font fallback mechanism.
All of this is served by the functionality in the
text::cluster
module.
Let’s see a somewhat contrived example that demonstrates the process:
use swash::text::cluster::{CharCluster, CharInfo, Parser, Token};
let mut shaper = context.builder(font)
.script(Script::Latin)
.build();
// We'll need the character map for our font
let charmap = font.charmap();
// And some storage for the cluster we're working with
let mut cluster = CharCluster::new();
// Now we build a cluster parser which takes a script and
// an iterator that yields a Token per character
let mut parser = Parser::new(
Script::Latin,
"a quick brown fox?".char_indices().map(|(i, ch)| Token {
// The character
ch,
// Offset of the character in code units
offset: i as u32,
// Length of the character in code units
len: ch.len_utf8() as u8,
// Character information
info: ch.into(),
// Pass through user data
data: 0,
})
);
// Loop over all of the clusters
while parser.next(&mut cluster) {
// Map all of the characters in the cluster
// to nominal glyph identifiers
cluster.map(|ch| charmap.map(ch));
// Add the cluster to the shaper
shaper.add_cluster(&cluster);
}
Phew! That’s quite a lot of work. It also happens to be exactly what
add_str
does internally.
So why bother? As mentioned earlier, this method allows you to customize
the per-character data that passes through the shaper. Is your source text in
UTF-16 instead of UTF-8? No problem. Set the offset
and
len
fields of your Token
s to appropriate values. Are you shaping
across style spans? Set the data
field to the index of your span so
it can be recovered. Have you used the
Analyze
iterator to generate
CharInfo
s containing boundary analysis? This
is where you apply them to the info
fields of your Token
s.
That last one deserves a quick example, showing how you might build a cluster parser with boundary analysis:
use swash::text::{analyze, Script};
use swash::text::cluster::{CharInfo, Parser, Token};
let text = "a quick brown fox?";
let mut parser = Parser::new(
Script::Latin,
text.char_indices()
// Call analyze passing the same text and zip
// the results
.zip(analyze(text.chars()))
// Analyze yields the tuple (Properties, Boundary)
.map(|((i, ch), (props, boundary))| Token {
ch,
offset: i as u32,
len: ch.len_utf8() as u8,
// Create character information from properties and boundary
info: CharInfo::new(props, boundary),
data: 0,
}),
);
That leaves us with font fallback. This crate does not provide the infrastructure
for such, but a small example can demonstrate the idea. The key is in
the return value of the CharCluster::map
method which describes the
Status
of the mapping operation. This function
will return the index of the best matching font:
use swash::FontRef;
use swash::text::cluster::{CharCluster, Status};
fn select_font<'a>(fonts: &[FontRef<'a>], cluster: &mut CharCluster) -> Option<usize> {
let mut best = None;
for (i, font) in fonts.iter().enumerate() {
let charmap = font.charmap();
match cluster.map(|ch| charmap.map(ch)) {
// This font provided a glyph for every character
Status::Complete => return Some(i),
// This font provided the most complete mapping so far
Status::Keep => best = Some(i),
// A previous mapping was more complete
Status::Discard => {}
}
}
best
}
Note that CharCluster
maintains internal composed and decomposed sequences
of the characters in the cluster so that it can select the best form for each
candidate font.
Since this process is done during shaping, upon return we compare the selected font with our current font and if they’re different, we complete shaping for the clusters submitted so far and continue the process by building a new shaper with the selected font. By doing manual cluster parsing and nominal glyph mapping outside the shaper, we can implement per-cluster font fallback without the costly technique of heuristically shaping runs.
§Collecting the prize
Finish up shaping by calling Shaper::shape_with
with a closure that will be
invoked with each resulting GlyphCluster
. This structure contains borrowed data
and thus cannot be stored directly. The data you extract from each cluster and the
method in which you store it will depend entirely on the design of your text layout
system.
Please note that, unlike HarfBuzz, this shaper does not reverse runs that are in right-to-left order. The reasoning is that, for correctness, line breaking must be done in logical order and reversing runs should occur during bidi reordering.
Also pertinent to right-to-left runs: you’ll need to ensure that you reverse clusters and not glyphs. Intra-cluster glyphs must remain in logical order for proper mark placement.
Modules§
- Glyph cluster modeling– output from the shaper.
Structs§
- Context that manages caches and transient buffers for shaping.
- Maps character clusters to positioned glyph clusters according to typographic rules and features.
- Builder for configuring a shaper.
Enums§
- Text direction.