fast_srgb8

Function f32_to_srgb8

pub fn f32_to_srgb8(f: f32) -> u8

Converts a linear f32 RGB component to an 8-bit sRGB value.

If you have to do this for many values simultaneously, use f32x4_to_srgb8, which will compute 4 results at once (using SIMD instructions if available).

Inputs less than 0.0 or greater than 1.0 are clamped to that range. NaN input is treated as identical to 0.0.
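The input handling described above can be sketched as a standalone helper. This is only an illustration: `clamp_unit` is a hypothetical name and not part of this crate, and the comparison order is borrowed from the reference implementation in the Details section rather than from the crate's optimized code.

```rust
/// Hypothetical helper (not part of this crate): clamps `f` to `[0.0, 1.0]`
/// and maps NaN to `0.0`.
fn clamp_unit(f: f32) -> f32 {
    // `f > 0.0` is false for NaN, so the negated comparison sends NaN,
    // negative values, and both zeros to 0.0.
    if !(f > 0.0) {
        0.0
    } else if f < 1.0 {
        f
    } else {
        // Covers 1.0, anything larger, and +infinity.
        1.0
    }
}
```

Note that the standard library's `f32::clamp` would not give this behavior on its own, because it returns NaN when its input is NaN.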

§Details

Conceptually, this is an optimized (and slightly approximated; see the “Approximation” section below) version of the following “reference implementation”:

// Conceptually equivalent (but see below)
fn to_srgb_reference(f: f32) -> u8 {
    let v = if !(f > 0.0) {
        0.0
    } else if f <= 0.0031308 {
        12.92 * f
    } else if f < 1.0 {
        1.055 * f.powf(1.0 / 2.4) - 0.055
    } else {
        1.0
    };
    (v * 255.0 + 0.5) as u8
}

This crate’s implementation uses a small lookup table (a [u32; 104], around 6.5 cache lines) and avoids calling powf, which as an added bonus means it works in no_std. In practice it is many times faster than the powf-based alternative.
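The "around 6.5 cache lines" figure can be checked with a little arithmetic. The sketch below assumes 64-byte cache lines, which is common but not universal; the constant and function names are made up for illustration and do not appear in the crate.

```rust
use std::mem::size_of;

/// Assumed cache line size in bytes (an assumption; 64 is typical on x86-64).
const ASSUMED_CACHE_LINE_BYTES: usize = 64;

/// Size in bytes of a table with the shape described above.
const TABLE_BYTES: usize = size_of::<[u32; 104]>();

/// Number of assumed-size cache lines such a table would span.
fn table_cache_lines() -> f64 {
    TABLE_BYTES as f64 / ASSUMED_CACHE_LINE_BYTES as f64
}
```

With 4-byte entries the table is 416 bytes, which is 6.5 lines of 64 bytes each.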

Additionally, it is fairly amenable to a SIMD implementation (everything is easily parallelized aside from the table lookup), so a 4-wide implementation is also provided as f32x4_to_srgb8.

§Approximation

Note that the results are not bitwise identical to those of the to_srgb_reference function above; they are just very close. The maximum error is 0.544403, for an input of 0.31152344, where error is computed as the absolute difference between the rounded integer and the “exact” value.
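One way to measure error in the sense described above is sketched below. It is an assumption-laden illustration, not the crate's own test harness: the function names are hypothetical, the "exact" value is taken to be the reference formula evaluated in f64 without rounding, and the sweep uses evenly spaced inputs rather than every f32.

```rust
/// "Exact" (unrounded) sRGB encoding on the 0..=255 scale, computed in f64.
/// Assumption: this mirrors the reference formula, minus the final rounding.
fn srgb_exact_255(f: f32) -> f64 {
    let f = f as f64;
    let v = if !(f > 0.0) {
        0.0
    } else if f <= 0.0031308 {
        12.92 * f
    } else if f < 1.0 {
        1.055 * f.powf(1.0 / 2.4) - 0.055
    } else {
        1.0
    };
    v * 255.0
}

/// Absolute difference between a rounded 8-bit result and the exact value.
fn srgb8_error(input: f32, result: u8) -> f64 {
    (result as f64 - srgb_exact_255(input)).abs()
}

/// Largest error of `convert` over `steps + 1` evenly spaced inputs in
/// [0, 1], returned together with the input that produced it.
fn max_error(convert: impl Fn(f32) -> u8, steps: u32) -> (f64, f32) {
    let mut worst = (0.0_f64, 0.0_f32);
    for i in 0..=steps {
        let input = i as f32 / steps as f32;
        let err = srgb8_error(input, convert(input));
        if err > worst.0 {
            worst = (err, input);
        }
    }
    worst
}
```

A correctly rounding converter can never exceed 0.5 under this measure. Passing this crate's function as `convert` should give a figure in the neighborhood of the one quoted above, although an evenly spaced sweep will not necessarily land on the worst-case input.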

This almost certainly meets requirements for graphics: the DirectX spec mandates that compliant implementations of this function have a maximum error of less than “0.6 ULP on the integer side”. Ours is ~0.54, which is within the requirement.

This means this function is probably at least as accurate as whatever your GPU driver and/or hardware does for sRGB framebuffers and such. That is very likely true even if it isn’t using DirectX: its spec tends to be descriptive of what’s commonly available, especially in cases like this one (which is most cases) where it’s the only spec that bothers to state a requirement.

Additionally, because this function rounds its result to a u8, for the vast majority of inputs it returns a result identical to that of the reference implementation.

To be completely clear (since it was brought up as a concern): despite this approximation, this function and srgb8_to_f32 are inverses of each other, and round-trip appropriately.
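The round-trip property can be illustrated with reference formulas alone. The sketch below reads "round trip" as u8 to f32 and back to u8, which is an interpretation rather than something stated above. `srgb8_to_f32_reference` is a hypothetical stand-in built from the standard sRGB decoding formula, not the crate's srgb8_to_f32; the same check could be run against the crate's own pair of functions.

```rust
/// Reference encoder, as given in the Details section.
fn to_srgb_reference(f: f32) -> u8 {
    let v = if !(f > 0.0) {
        0.0
    } else if f <= 0.0031308 {
        12.92 * f
    } else if f < 1.0 {
        1.055 * f.powf(1.0 / 2.4) - 0.055
    } else {
        1.0
    };
    (v * 255.0 + 0.5) as u8
}

/// Hypothetical reference decoder (standard sRGB formula), standing in for
/// the crate's srgb8_to_f32.
fn srgb8_to_f32_reference(c: u8) -> f32 {
    let c = c as f32 / 255.0;
    if c <= 0.04045 {
        c / 12.92
    } else {
        ((c + 0.055) / 1.055).powf(2.4)
    }
}

/// True if decoding `c` and re-encoding the result gives `c` back.
fn round_trips(c: u8) -> bool {
    to_srgb_reference(srgb8_to_f32_reference(c)) == c
}
```

Because there are only 256 possible inputs, the property can be checked exhaustively, for example with `(0..=255u8).all(round_trips)`.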