Intro

Hey everyone, welcome to my Numbers in Rust!

This is a one-stop-shop for all things numbers as they relate to Rust and includes foundational CS information when we think it would be helpful. It is a living document and will continue to grow until it lives up to its goal. If you would like to help, please feel free to contribute via Issues, Pull-Requests, and Discussions on the repo!

Number Representation

In computer science, everything (at the moment) is represented by 1's and 0's, stored in groupings such as bytes (8 bits).
Each bit represents a power of 2, so a group of 8 bits representing the number 130 may look like this.

Bits10000010
Base 10 (Decimal)1286432168421
Base 2 (Binary)2726252423222120

128 + 0 + 0 + 0 + 0 + 0 + 2 + 0 == 130 The above representation is that of an unsigned integer with 8-bits, i.e. Rust's u8.

For simplicity, we'll only be looking at the 8-bit variants of integers in this major section. You can apply these topics to the larger variants by extending the number of bits towards the left-hand side, i.e. 256, 512, etc.

Unsigned Integers

In computer science, everything (at the moment) is represented by 1's and 0's, stored in groupings such as bytes (8 bits).
Each bit represents a power of 2, so a group of 8 bits representing the number 130 may look like this.

Bits10000010
Base 10 (Decimal)1286432168421
Base 2 (Binary)2726252423222120

128 + 0 + 0 + 0 + 0 + 0 + 2 + 0 == 130 The above representation is that of an unsigned integer with 8-bits, i.e. Rust's u8.

For simplicity, we'll only be looking at the 8-bit variants of integers in this major section. You can apply these topics to the larger variants by extending the number of bits towards the left-hand side, i.e. 256, 512, etc.

Rust makes viewing these representations fairly straightforward using one of its convenient formatting


#![allow(unused)]
fn main() {
println!("{:b}", 128);
// 10000000


println!("{:08b}", 1); // packed with 0's, up to 8 places
// 00000001
}

The largest number representable by u8 (all bits set to 1) is 255, and the lowest number (all bits set to 0) is 0. If we tried to represent something larger, say 256, this would happen.


#![allow(unused)]
fn main() {
let num: u8 = 256;
println!("{}", num);

// 
 --> src/main.rs:2:19
  |
2 |     let num: u8 = 256;
  |                   ^^^
  |
  = note: `#[deny(overflowing_literals)]` on by default
  = note: the literal `256` does not fit into the type `u8` whose range is `0..=255`
// rustc's error messages are really thoughtful, thanks compiler team!

}

The two notes above are especially helpful because they

a. try to provide as much detail as possible to the context of the error, possibly helping you fix the issue immediately, and

b. give you keywords to search for more in-depth information (overflowing_literals).

These numbers are unsigned because they do not contain a sign bit, a bit used to indicate whether a number is positive or negative. Therefore unsigned integers can only represent positive values. This makes them perfect for modeling situations where negative values don't exist, like a shopping cart; you can't have -1 headphones in your cart, right? As you'll see in the section on signed integers, the maximum number you can represent in a u8 is larger than that of an i8. This is true for unsigned type to their signed counterpart.

Signed Integers

There are different ways to represent signed integers, Rust uses a method called two's complement. Ben Eater has a video on two's complement where he does an amazing job of explaining things. The next few sub-sections are merely a recap of his video so feel free to skip it if you're familiar with the concept. I'm including this as a convenient reference when looking at how Rust handles integer overflow.

Unsigned Operations

To add two unsigned integers, convert them to their binary representations and add them as you would with pen and paper, by carrying the bit to the next .

128 = 10000000
+ 2 = 00000010
---------------
130 = 10000010

Introducing a Sign Bit, Signed Magnitude

A simple way to represent positive and negative numbers is by adding a sign bit. In this case, 0 represents positive integers while 1 represents negative integers.

64 with sign bit
0 1 0 0 0 0 0 0

-64 with sign bit
1 1 0 0 0 0 0 0

You'll notice we've lost an entire power of 2 by introducing a sign bit. The range we can represent with 8 bits has gone from 0..=255 to -127..=127.

While that's the cost to represent negatives, we've also lost the ability to add/subtract bits how we did previously, at least when negative numbers are involved.

8 + (-8)
  0 0 0 0 1 0 0 0
+ 1 0 0 0 1 0 0 0
-----------------
  1 0 0 1 0 0 0 0  == -16 ???

One's Complement

Sign bit and mirrored bits

24 in 8-bit ones-complement
 +/-  64 32  16  8  4  2  1
  0   0   0   1  1  0  0  0

-24 in 8-bit ones-complement
 +/-  64  32  16  8  4  2  1
  1   1   1   0  0  1  1  1

Almost, but not quite there. The following scenarios still happen.

There are two distinct possibilities for 0.
 +/-  64 32  16  8  4  2  1
  0   0   0   0  0  0  0  0  =  0
  1   0   0   0  0  0  0  0  = -0

!!!!!!!!Fix the next example

Operations with two negative numbers are off-by-one
(-8) + (-10)  
  1 1 1 1 0 1 1 1  // -8
+ 1 1 1 1 0 1 0 1  // -10
-----------------
  1 1 1 1 1 1 0 0  == -1


Two's Complement

One's complement, but with -0 removed, has the effect of converting the largest bit to a negative.

24 as u8
 128  64 32  16  8  4  2  1
  0   0   0   1  1  0  0  0

24 as i8
-128  64  32  16  8  4  2  1
  0    0   0   1  1  0  0  0
  
Bit Addition/Subtraction is restored!
You also now know why the min and max values representable with twos-complement are off-by-one!

Flipping Signs with Two's Complement

  1. Invert the bits
  2. Add 1
24 as u8
 128  64 32  16  8  4  2  1
  0   0   0   1  1  0  0  0
  
1) Invert the bits
  1   1   1   0  0  1  1  1
2) Add 1 (carry bits forward)
  1   1   1   0  1  0  0  0
  
-24 as i8
-128  64  32  16  8  4  2  1
  1   1   1   0  1  0  0  0

-128 + 64 + 32 + 8 
-128 + 104
-24
  

There you have it, this is how signed integers are represented in Rust.

Floats

Floats in Rust follow the 2008 revision of the IEEE-754 standard on single-precision floating-points(e.g. f32, float, binary32) and double-precision floating-points(e.g. f64, double, binary64) representations of floating-points.

Don't use floats for financial operations

 1/2, 1/4, 1/8, 1/16, 1/32

Better precision -> smaller range

larger range -> worse precision

Mantissa vs exponent

//TODO: Insert photos of float binary representations

23 binary digits of precision for f32

56 binary digits of precision for f64

Endianness

Won't go into too much detail here, but it wouldn't feel write to have something with this much about data representation and not mention endianness. Endianness is basically the ordering of bytes.

In all the examples I've displayed, you'll notice the Most Significant Byte (MSB) is on the left-hand side with the Least-Significant-Bit (LSB) on the right-hand side. If we were to expand this from 8-bit to 16-bit it would look like this


#![allow(unused)]
fn main() {
//TODO: add rust-examples of how to see these representations
}

  |---------------------------------------------------------unsigned 32-bit integer------------------------------------------------- -|
S |-------------------------------2 bytes---------------------------|-------------------------------2 bytes---------------------------| E
T |---------------8-bits -----------------| -----------8-bits-------|---------------8-bits------------------| -----------8-bits-------|
A |-------nibble--------|------nibble-----|----nibble---|---nibble--|-------nibble--------|------nibble-----|----nibble---|---nibble--| N
R |32768 16384 8192 4096|2048 1024 512 256|128 64 32  16| 8  4  2  1| 32768 16384 8192 4096|2048 1024 512 256|128 64 32 16| 8  4  2  1|  
T |  1     0     1    0 |  1    0   1   0 | 1   0  1   1| 1  0  1  1|  1     1     0    0 |  1    1   0   0 | 1   1  0   1| 1  1  0  1| D
  |---------------------|-----------------|-------------|-----------|---------------------|-----------------|-------------|-----------| 
0x|           A         |         A       |       B           B     |           C         |        C        |     D       |    D      |

This is referred to as Big-Endian or Network Byte Order (as it's the ordering used for network traffic).

The opposite ordering is called Little-Endian, and looks like this.

  |---------------------------------------------------------unsigned 32-bit integer------------------------------------------------- -|
S |-------------------------------2 bytes---------------------------|-------------------------------2 bytes---------------------------| E
T |---------------8-bits -----------------| -----------8-bits-------|---------------8-bits------------------| -----------8-bits-------|
A |-------nibble--------|------nibble-----|----nibble---|---nibble--|-------nibble--------|------nibble-----|----nibble---|---nibble--| N
R |32768 16384 8192 4096|2048 1024 512 256|128 64 32  16| 8  4  2  1| 32768 16384 8192 4096|2048 1024 512 256|128 64 32 16| 8  4  2  1|  
T |  1     1     0    1 |  1    1   0   1 | 1   1  0   0| 1  1  0  0|  1     0     1    1 |  1    0   1   1 | 1   0  1   0| 1  0  1  0| D
  |---------------------|-----------------|-------------|-----------|---------------------|-----------------|-------------|-----------| 
0x|          D          |        D        |      C      |    C      |           B         |         B       |       A     |     A     |

However, many of the most popular architectures and operating systems are Little-Endian, where the LSB would be on the left-hand side and MSB on the right-hand side. This means that little-endian architectures need to convert the byte ordering of network traffic in order to use it.

This is a bit of a black-hole of a subject so we won't go much further into it. Just know that this is an annoying thing to be aware of if you're ever implementing things where byte ordering is a thing.

The Rust standard library has numerical methods to identify, enforce, and referse byte orderings. Just look for methods with letters like le (little-endian), be (big-endian), and ne (native endian).

Example of endianness and compatibility Endianness, alignment, and char signage incompatibility

isize and usize

According to The Rust Reference, these integer types are machine-dependent and use the same number of bits as that of a pointer in the machine. This makes usize and isize proportional to the size of the machine's address space, and this is determined by its architecture (x86-64, etc). These types can be 16-bit, 32-bit, or 64-bit depending on the machine. However, due to many pieces of Rust code assuming sizes of 32-bit or 64-bit (common pointer sizes in modern architectures), 16-bit support is limited. The primary concerns around machine-dependent types are that of portability and security. By using isize or usize at inappropriate points in your code, you're introducing variation in how your code could operate. These could manifest as overflows on machines with 16-bit pointers due to the lack of support as mentioned at the beginning of this section or something less obvious (which is arguably worse).

So when should I use them? After some a few posts on the Rust User Forum, reddit, and stackoverflow, their usage tends to gravitate around two things:

  • indexing into an array
  • offsets

Even when it comes to the above, usize appears to be the dominant type used in these cases, with few general use-cases for isize.

If you have any opinions on this or any major use-cases you think would be useful for others, please feel free to open an issue/pull-request/discussion!

You could argue that this narrow scope actually leads to further introspection when it comes to what you're writing. It forces you to think more deeply about the bounds of the operations you'll be performing, and in doing so, make it easier to select the most appropriate type for the situation. From a holistic view, this makes your logic more robust and better aligned with the spirit of Rust.

Basically, stick to explicit integer types as much as possible and be wary when using machine-dependent integers outside of the narrow scope above.

Originally called int/uint, but was one of the early RFCs. https://github.com/rust-lang/rfcs/blob/master/text/0544-rename-int-uint.md

Operators

Operators are made available to different types and structures via certain traits (Add, Sub, Rem, etc). However, these traits (and therefore operators) have different implementations depending on the type. These implementations may have general norms and exceptions depending on the type and the operation. This section will help you determine if you can use a regular operator for an operation or if you should use an overflow-specific method.

https://github.com/rust-lang/rfcs/pull/560

Add (+)

Sub (-)

Mul (*)

/

Rem (%)

Converting with From

For numeric types, From is only implemented for lossless conversions.


#![allow(unused)]
fn main() {
let ten_as_i64 = i64::from(10_i32) // Ok!
let ten_as_i32 = i32::from(10_i64) // Panics!
}

Casting with As

as works for both lossless and lossy conversions.

When casting from one integer type to another integer type, with the same number of bits, it is a no-op, as in nothing happens. The underlying bits have not changed, merely the lens which these bytes are interpreted changes from that of a u128 to an i128. Unlike From, the true value of the integer is not maintained and the bits are merely interpreted as the destination type's encoding.


#![allow(unused)]
fn main() {
println!("128_u8 `as` i8 becomes {}", 128_u8 as i8);
// 128_u8 `as` i8 becomes -128
}

Rust Reference on Type Casting

Rust Reference on Casting Semantics

https://doc.rust-lang.org/stable/rust-by-example/types/cast.html

Overflow

Rust Reference on Overflow

Integer operators will panic when they overflow when compiled in debug mode. The -C debug-assertions and -C overflow-checks compiler flags can be used to control this more directly. The following things are considered to be overflow:

  • When +, * or - create a value greater than the maximum value, or less than the minimum value that can be stored. This includes unary - on the smallest value of any signed integer type.
  • Using / or %, where the left-hand argument is the smallest integer of a signed integer type and the right-hand argument is -1.
  • Using << or >> where the right-hand argument is greater than or equal to the number of bits in the type of the left-hand argument, or is negative.

The following is a paraphrasing from Myths and Legends about Integer Overflow in Rust

Rust's standard library provides four additional methods to handle bit overflows. These explicit implementations give you precise control over your numerical operations to ensure a defined behavior.

  • wrapping_<add/sub/mul/div/etc...>
  • saturating_<add/sub/mul/div/etc...>
  • overflowing_<add/sub/mul/div/etc...>
  • checked_<add/sub/mul/div/etc...>
  • unchecked_<add/sub/mul/div/etc...> // nightly-only (unchecked_math)

These methods will prevent panicking when overflow occurs and let you overflow purposefully if that's your intention (like in hashing algorithms and ring buffers).


#![allow(unused)]
fn main() {
//TODO: Elaborate on these further, especially in regard to signed integers`
}
  • wrapping_.. returns the straight two's complement result
  • saturating_.. returns the largest/smallest value (as appropriate) of the type when overflow occurs
  • overflowing_.. returns the two's complement result along with a boolean indicating if overflow occured
  • checked_.. returns an Option that's None if overflowing occurs
  • unchecked_.. assumes overflow cannot occur. Results in undefined behavior when result > <int>::MAX || result < <int>::MIN

#![allow(unused)]
fn main() {
//TODO: examples
}

Rust vs C/C++ on overflows

Reddit: Why are i32s the Fastest?

Wrapping

Saturating

Overflowing

Checked

Unchecked

Bitwise Operations

AND (&)

OR (|)

XOR (^)

NOT (!)

L-Shift (<<)

R-Shift (>>)

Constants

As great as it is to know what happens under the hood, it would be nice to have convenient ways to...

  • check what is the difference in precision for Pi as an f32 vs an f64
  • see what the minimum or maximum value for a given type is, like i128 or u128.
  • represent infinity or negative infinity or NaN

That's where constants come in.

fun fact: in the source code, NAN is represented as 0.0_f32 / 0.0_f32, INFINITY as 1.0_f32 / 0.0_f32, and NEG_INFINITY as -1.0_f32 / 0.0_f32. Dividing by zero appears to have its uses :)


#![allow(unused)]

fn main() {
//TODO: Add examples
}

Check a given type's documentation to see what kind of constants they have available. For example, std::i128 and std::f32

Popular Crates

Maintained Crates only

General

  • rand: Random number generators and other randomness functionality.
  • num-bigint: Big integer implementation for Rust.
  • num-traits: Numeric traits for generic mathematics
  • num-integer: Integer traits and functions.
  • number-prefix: Library for numeric prefixes (kilo, giga, kibi).

Linear Algebra

Statistics

  • statrs: Statistical computing library for Rust.

Geometry

Data Processing

  • polars: DataFrame Library based on Apache Arrow