Version: Nightly

String

struct String

Represents a mutable string.

This is Mojo's primary text representation, designed to efficiently handle UTF-8 encoded text while providing a safe and ergonomic interface for string manipulation.

You can create a String by assigning a string literal to a variable or with the String constructor:

# From string literals (String type is inferred)
var hello = "Hello"

# From String constructor
var world = String("World")
print(hello, world) # "Hello World"

You can convert many Mojo types to a String, because they implement the Writable trait:

var int: Int = 42
print(String(int)) # "42"

If you have a custom type you want to convert to a string, you can implement the Writable trait like this:

@fieldwise_init
struct Person(Writable):
    var name: String
    var age: Int

    def write_to(self, mut writer: Some[Writer]):
        t"{self.name} ({self.age})".write_to(writer)

var person = Person("Alice", 30)
print(String(person)) # => Alice (30)

Note that print() doesn't take a String argument. Instead, it accepts any type that conforms to the Writable trait (String conforms to this trait, which is why you can pass it to print()). That means it's more efficient to pass a Writable value directly to print() than to convert it to a String first. For example, floating-point types are also writable:

var float: Float32 = 3.14
print(float)

Be aware of the following characteristics when working with String:

  • UTF-8 encoding: Strings store UTF-8 encoded text, so byte length may differ from character count. Use text.count_codepoints() to get the codepoint count:

    var text = "café" # 4 Unicode characters
    print(text.byte_length()) # Prints 5 (é is 2 bytes in UTF-8)
    print(text.count_codepoints()) # Prints 4 (correct Unicode count)
  • Always mutable: You can modify strings in-place:

    var message = "Hello"
    message += " World" # In-place concatenation
    print(message) # "Hello World"

    If you want a compile-time immutable string, use comptime:

    comptime GREETING = "Immutable string" # Fixed at compile time
    GREETING = "Not gonna happen" # error: expression must be mutable in assignment
  • Value semantics: String assignment creates a copy, but it's optimized with copy-on-write so that the actual copying happens only if/when one of the strings is modified.

    var str1 = "Hello"
    var str2 = str1 # Currently references the same data
    str2 += " World" # Now str2 becomes a copy of str1
    print(str1) # "Hello"
    print(str2) # "Hello World"

More examples:

var text = "Hello"

# String properties and indexing
print(text.byte_length()) # 5
print(text[byte=1]) # e (byte slice)
print(text[byte=text.byte_length() - 1]) # o (last character)

# In-place concatenation
text += " World"
print(text)

# Searching and checking
if "World" in text:
    print("Found 'World' in text")

var pos = text.find("World")
if pos != -1:
    print("'World' found at position:", pos)

# String replacement
var replaced = text.replace("Hello", "Hi") # "Hi World"
print(replaced)

# String formatting
var name = "Alice"
var age = 30
var formatted = "{} is {} years old".format(name, age)
print(formatted) # "Alice is 30 years old"

Related functions:

Related types:

  • StringSlice: A non-owning view of string data, which can be either mutable or immutable.
  • StaticString: An alias for an immutable constant StringSlice.
  • StringLiteral: A string literal. String literals are compile-time values.

Implemented traits

AnyType, Boolable, Comparable, ConvertibleFromPython, Copyable, Defaultable, Equatable, FloatableRaising, Hashable, ImplicitlyCopyable, ImplicitlyDeletable, IntableRaising, Movable, PathLike, Writable, Writer

comptime members

ASCII_LETTERS

comptime ASCII_LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

All ASCII letters (lowercase and uppercase).

ASCII_LOWERCASE

comptime ASCII_LOWERCASE = "abcdefghijklmnopqrstuvwxyz"

All lowercase ASCII letters.

ASCII_UPPERCASE

comptime ASCII_UPPERCASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

All uppercase ASCII letters.

DIGITS

comptime DIGITS = "0123456789"

All decimal digit characters.

FLAG_HAS_NUL_TERMINATOR

comptime FLAG_HAS_NUL_TERMINATOR = Int(1) << (bit_width_of[DType.int]() - 3)

Flag indicating string has accessible nul terminator.

FLAG_IS_INLINE

comptime FLAG_IS_INLINE = Int(1) << (bit_width_of[DType.int]() - 1)

Flag indicating string uses inline (SSO) storage.

FLAG_IS_REF_COUNTED

comptime FLAG_IS_REF_COUNTED = Int(1) << (bit_width_of[DType.int]() - 2)

Flag indicating string uses reference-counted storage.

HEX_DIGITS

comptime HEX_DIGITS = "0123456789abcdefABCDEF"

All hexadecimal digit characters.

INLINE_CAPACITY

comptime INLINE_CAPACITY = (bit_width_of[DType.int]() // 8) * 3 - 1

Maximum bytes for inline (SSO) string storage.

INLINE_LENGTH_MASK

comptime INLINE_LENGTH_MASK = Int(31) << (bit_width_of[DType.int]() - 8)

Bit mask for extracting inline string length.

INLINE_LENGTH_START

comptime INLINE_LENGTH_START = (bit_width_of[DType.int]() - Int(8))

Bit position where inline length field starts.

OCT_DIGITS

comptime OCT_DIGITS = "01234567"

All octal digit characters.

PRINTABLE

comptime PRINTABLE = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\v\f"

All printable ASCII characters.

PUNCTUATION

comptime PUNCTUATION = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"

All ASCII punctuation characters.

REF_COUNT_SIZE

comptime REF_COUNT_SIZE = size_of[Atomic[DType.int]]()

Size of the reference count prefix for heap strings.
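As a worked example, on a 64-bit target (where bit_width_of[DType.int]() is 64) the bit-layout constants above evaluate as follows. Reading them together as a description of the top byte of the string header is an interpretation of these values, not something this reference states:

$$
\begin{aligned}
\texttt{INLINE\_CAPACITY} &= \tfrac{64}{8} \cdot 3 - 1 = 23 \\
\texttt{INLINE\_LENGTH\_START} &= 64 - 8 = 56 \\
\texttt{INLINE\_LENGTH\_MASK} &= 31 \cdot 2^{56} \quad \text{(bits 56 to 60)} \\
\texttt{FLAG\_HAS\_NUL\_TERMINATOR} &= 2^{61} \\
\texttt{FLAG\_IS\_REF\_COUNTED} &= 2^{62} \\
\texttt{FLAG\_IS\_INLINE} &= 2^{63}
\end{aligned}
$$

Since 31 = 2^5 - 1, the length mask covers 5 bits, which is enough for any inline length up to INLINE_CAPACITY (23), and the three flags occupy the remaining 3 bits of the top byte.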

Methods

__init__

def __init__(out self)

Construct an empty string.

def __init__(out self, *, capacity: Int)

Construct an empty string with a given capacity.

Args:

  • capacity (Int): The capacity of the string to allocate.
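A minimal sketch of the capacity constructor, written as a top-level snippet like the other examples on this page. It assumes that capacity() reports at least the requested number of bytes right after construction:

```mojo
from std.testing import assert_equal, assert_true

# Reserve space for 64 bytes up front; the string itself starts out empty.
var buf = String(capacity=64)
assert_equal(buf.byte_length(), 0)
assert_true(buf.capacity() >= 64)

# Append 10 bytes, which stays within the reserved capacity.
for _ in range(10):
    buf += "x"
assert_equal(buf.byte_length(), 10)
```

Reserving capacity is mainly useful when you know roughly how large a string will grow, since it can avoid repeated reallocation while appending.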

@implicit def __init__(out self, data: StringSlice[StaticConstantOrigin])

Construct a String from a StaticString without allocating.

Args:

  • data (StringSlice[StaticConstantOrigin]): The static constant string to refer to.

@implicit def __init__(out self, data: StringLiteral)

Construct a String from a StringLiteral without allocating.

Args:

  • data (StringLiteral): The static constant string to refer to.

def __init__(out self, tstring: TString)

Construct a string from a TString (template string).

This constructor enables explicit conversion from TString to String.

Args:

  • tstring (TString): The TString to convert to a String.

def __init__(out self, *, unsafe_from_utf8: Span[UInt8])

Construct a string by copying the data. This constructor is explicit because it can involve memory allocation.

Consider using the String(from_utf8=...) or String(from_utf8_lossy=...) constructors instead, as they are safer alternatives to the unsafe_from_utf8 constructor.

Safety: unsafe_from_utf8 MUST be valid UTF-8 encoded data.

Args:

  • unsafe_from_utf8 (Span[UInt8]): The utf8 bytes to copy.

def __init__(out self, *, from_utf8_lossy: Span[UInt8])

Construct a string from a span of bytes, including invalid UTF-8.

Since String is guaranteed to be valid UTF-8, invalid UTF-8 sequences are replaced with the U+FFFD replacement character (�).

Examples:

from std.testing import assert_equal

# Valid UTF-8 sequence
var fire_emoji_bytes = [Byte(0xF0), 0x9F, 0x94, 0xA5]
var fire_emoji = String(from_utf8_lossy=fire_emoji_bytes)
assert_equal(fire_emoji, "🔥")

# Invalid UTF-8 sequence
# "mojo<invalid sequence>"
var mojo_bytes = [Byte(0x6D), 0x6F, 0x6A, 0x6F, 0xF0, 0x90, 0x80]
var mojo = String(from_utf8_lossy=mojo_bytes)
assert_equal(mojo, "mojo�")

Args:

  • from_utf8_lossy (Span[UInt8]): The bytes to convert to a string.

def __init__(out self, *, from_utf8: Span[UInt8])

Construct a string from a span of bytes, raising an error if the data is not valid UTF-8.

Args:

  • from_utf8 (Span[UInt8]): The bytes to convert to a string.

Raises:

An error if the data is not valid UTF-8.
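A minimal sketch contrasting this strict constructor with from_utf8_lossy. The byte values are chosen for illustration: 0x80 is a UTF-8 continuation byte, so it is invalid when it appears on its own:

```mojo
from std.testing import assert_equal, assert_raises

# Valid UTF-8 bytes decode normally.
var ok_bytes = [Byte(0x6D), 0x6F, 0x6A, 0x6F]  # "mojo"
assert_equal(String(from_utf8=ok_bytes), "mojo")

# A lone continuation byte is invalid UTF-8, so this constructor raises
# instead of substituting U+FFFD as from_utf8_lossy would.
var bad_bytes = [Byte(0x6D), 0x6F, 0x80]
with assert_raises():
    _ = String(from_utf8=bad_bytes)
```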

def __init__[*Ts: Writable](out self, *args: *Ts.values, *, sep: StringSlice[StaticConstantOrigin] = StringSlice(""), end: StringSlice[StaticConstantOrigin] = StringSlice(""))

Construct a string by concatenating a sequence of Writable arguments.

Examples:

Construct a String from several Writable arguments:

var string = String(1, 2.0, "three", sep=", ")
print(string) # "1, 2.0, three"

Forwarding from another variadic function:

def variadic_pack_to_string[*Ts: Writable](*args: *Ts) -> String:
    return String(*args)

_ = variadic_pack_to_string(1, ", ", 2.0, ", ", "three")

Parameters:

  • *Ts (Writable): Types of the provided argument sequence.

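Args:

  • *args (*Ts.values): The Writable arguments to concatenate.
  • sep (StringSlice[StaticConstantOrigin]): The separator written between arguments. Defaults to an empty string.
  • end (StringSlice[StaticConstantOrigin]): The string written after the last argument. Defaults to an empty string.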

def __init__(out self, *, unsafe_uninit_length: Int)

Construct a String with the specified length, with uninitialized memory. This is unsafe, as it relies on the caller initializing the elements with unsafe operations, not assigning over the uninitialized data.

Args:

  • unsafe_uninit_length (Int): The number of bytes to allocate.

def __init__(out self, *, unsafe_from_utf8_ptr: UnsafePointer[Int8])

Creates a string from a UTF-8 encoded nul-terminated pointer.

Safety:

  • unsafe_from_utf8_ptr MUST be valid UTF-8 encoded data.
  • unsafe_from_utf8_ptr MUST be null terminated.

Args:

  • unsafe_from_utf8_ptr (UnsafePointer[Int8]): A pointer to null-terminated bytes encoded in UTF-8.

def __init__(out self, *, unsafe_from_utf8_ptr: UnsafePointer[UInt8])

Creates a string from a UTF-8 encoded nul-terminated pointer.

Safety:

  • unsafe_from_utf8_ptr MUST be valid UTF-8 encoded data.
  • unsafe_from_utf8_ptr MUST be null terminated.

Args:

  • unsafe_from_utf8_ptr (UnsafePointer[UInt8]): An UnsafePointer[Byte] of null-terminated bytes encoded in UTF-8.

def __init__(out self, *, copy: Self)

Copy initialize the string from another string.

Args:

  • copy (Self): The string to copy.

def __init__(out self, *, py: PythonObject)

Construct a String from a PythonObject.

Args:

  • py (PythonObject): The Python object to convert.

Raises:

An error if the conversion failed.

__del__

def __del__(deinit self)

Destroy the string data.

__bool__

def __bool__(self) -> Bool

Checks if the string is not empty.

Returns:

Bool: True if the string length is greater than zero, and False otherwise.

__getitem__

def __getitem__[I: Indexer, //](self, *, byte: I) -> StringSlice[origin_of(self)]

Gets the character that starts at the specified byte index.

This performs byte-level indexing, not character (codepoint) indexing. For strings containing multi-byte UTF-8 characters, byte must fall on a codepoint boundary, and the entire codepoint that starts there is returned. Aborts if byte does not fall on a codepoint boundary.

Parameters:

  • I (Indexer): A type that can be used as an index.

Args:

  • byte (I): The byte index (0-based).

Returns:

StringSlice[origin_of(self)]: A StringSlice containing the byte at the specified position, or the whole codepoint that starts there if it is a multi-byte character.

def __getitem__(self, *, byte: IntLiteral) -> StringSlice[origin_of(self)]

Gets the character that starts at the specified byte index.

This performs byte-level indexing, not character (codepoint) indexing. For strings containing multi-byte UTF-8 characters, byte must fall on a codepoint boundary, and the entire codepoint that starts there is returned. Aborts if byte does not fall on a codepoint boundary.

Args:

  • byte (IntLiteral): The byte index (0-based).

Returns:

StringSlice[origin_of(self)]: A StringSlice containing the byte at the specified position, or the whole codepoint that starts there if it is a multi-byte character.

def __getitem__(self, *, byte: ContiguousSlice) -> StringSlice[origin_of(self)]

Gets a substring at the specified byte positions.

This performs byte-level slicing, not character (codepoint) slicing. The start and end positions are byte indices. For strings containing multi-byte UTF-8 characters, slicing at byte positions that do not fall on codepoint boundaries will abort.

Args:

  • byte (ContiguousSlice): A slice that specifies byte positions of the new substring.

Returns:

StringSlice[origin_of(self)]: A StringSlice containing the bytes in the specified range.

def __getitem__[I: Indexer, //](self, *, codepoint: I) -> StringSlice[origin_of(self)]

Gets the codepoint at the specified codepoint index.

Parameters:

  • I (Indexer): A type that can be used as an index.

Args:

  • codepoint (I): The codepoint index.

Returns:

StringSlice[origin_of(self)]: A StringSlice view containing the unicode codepoint at the specified position.

def __getitem__(self, *, codepoint: ContiguousSlice) -> StringSlice[origin_of(self)]

Gets a substring at the specified codepoint positions.

Args:

  • codepoint (ContiguousSlice): A slice that specifies codepoint positions of the new substring.

Returns:

StringSlice[origin_of(self)]: A StringSlice containing the codepoints in the specified range.

def __getitem__(self, *, grapheme: T) -> StringSlice[origin_of(self)]

Gets the grapheme cluster at the specified grapheme index.

Args:

  • grapheme (T): The grapheme index.

Returns:

StringSlice[origin_of(self)]: A StringSlice view containing the unicode grapheme at the specified position.
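The difference between byte and codepoint indexing shows up as soon as a string contains a multi-byte character. A minimal sketch, based on the behavior described for the byte and codepoint overloads above:

```mojo
from std.testing import assert_true

var s = "café"  # "é" takes 2 bytes in UTF-8, so s is 5 bytes long

# Codepoint indexing counts characters: index 3 is the fourth codepoint.
assert_true(s[codepoint=3] == "é")

# Byte indexing counts bytes. Byte 3 is where "é" starts, so the whole
# 2-byte codepoint is returned. s[byte=4] would abort, because byte 4
# falls in the middle of "é".
assert_true(s[byte=3] == "é")
assert_true(s[byte=0] == "c")
```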

__lt__

def __lt__(self, rhs: Self) -> Bool

Compare this String to the RHS using LT comparison.

Args:

  • rhs (Self): The other String to compare against.

Returns:

Bool: True if this String is strictly less than the RHS String and False otherwise.

__eq__

def __eq__(self, rhs: Self) -> Bool

Checks whether two Strings have the same value.

Args:

  • rhs (Self): The rhs of the operation.

Returns:

Bool: True if the Strings are equal and False otherwise.

def __eq__(self, other: StringSlice) -> Bool

Checks whether this String has the same value as a StringSlice.

Args:

  • other (StringSlice): The string slice to compare against.

Returns:

Bool: True if the values are equal and False otherwise.

__ne__

def __ne__(self, other: StringSlice) -> Bool

Checks whether this String has a different value from a StringSlice.

Args:

  • other (StringSlice): The string slice to compare against.

Returns:

Bool: True if the values are not equal and False otherwise.

__contains__

def __contains__(self, substr: StringSlice) -> Bool

Returns True if the substring is contained within the current string.

Args:

  • substr (StringSlice): The substring to search for.

Returns:

Bool: True if the string contains the substring.

__add__

def __add__(self, other: StringSlice) -> Self

Creates a string by appending a string slice at the end.

Args:

  • other (StringSlice): The string slice to append.

Returns:

Self: The newly constructed string.

__mul__

def __mul__(self, n: Int) -> Self

Concatenates the string n times.

Args:

  • n (Int): The number of times to concatenate the string.

Returns:

Self: The string concatenated n times.
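A short sketch of repetition with the * operator, for example to build a separator line:

```mojo
from std.testing import assert_equal

var rule = String("-") * 10
assert_equal(rule, "----------")

var pattern = String("ab") * 3
assert_equal(pattern, "ababab")
```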

__radd__

def __radd__(self, other: StringSlice) -> Self

Creates a string by prepending another string slice to the start.

Args:

  • other (StringSlice): The string slice to prepend.

Returns:

Self: The newly constructed string.

__iadd__

def __iadd__(mut self, other: StringSlice)

Appends another string slice to this string.

Args:

  • other (StringSlice): The string slice to append.

write

static def write[*Ts: Writable](*args: *Ts.values, *, sep: StringSlice[StaticConstantOrigin] = StringSlice(""), end: StringSlice[StaticConstantOrigin] = StringSlice("")) -> Self

Construct a string by concatenating a sequence of Writable arguments.

Parameters:

  • *Ts (Writable): Types of the provided argument sequence.

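Args:

  • *args (*Ts.values): The Writable arguments to concatenate.
  • sep (StringSlice[StaticConstantOrigin]): The separator written between arguments. Defaults to an empty string.
  • end (StringSlice[StaticConstantOrigin]): The string written after the last argument. Defaults to an empty string.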

Returns:

Self: A string formed by formatting the argument sequence.

def write[*Ts: Writable](mut self, *args: *Ts.values)

Appends a sequence of Writable arguments to this String.

Parameters:

  • *Ts (Writable): Types of the provided argument sequence.

Args:

  • *args (*Ts.values): Sequence of arguments to write to this Writer.

def write[T: Writable](mut self, value: T)

Appends a single Writable argument to this String.

Parameters:

  • T (Writable): The type of the value to write, which must implement Writable.

Args:

  • value (T): The Writable argument to write.

static def write[T: Writable](value: T) -> Self

Creates a new String from a single Writable argument.

Parameters:

  • T (Writable): The type of the value to write, which must implement Writable.

Args:

  • value (T): The Writable argument to write.

Returns:

Self: A new String containing the written value.
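A minimal sketch of building a string incrementally with the instance write methods. Unlike the String(...) constructor with sep, the variadic write inserts nothing between arguments:

```mojo
from std.testing import assert_equal

var msg = String()
# Each argument is written in order, with no separator.
msg.write("x = ", 42, ", y = ", 1.5)
assert_equal(msg, "x = 42, y = 1.5")

# A single value can be written the same way.
msg.write("!")
assert_equal(msg, "x = 42, y = 1.5!")
```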

capacity

def capacity(self) -> Int

Get the current capacity of the String's internal buffer.

Returns:

Int: The number of bytes that can be stored before reallocation is needed.

write_string

def write_string(mut self, string: StringSlice)

Write a StringSlice to this String.

Args:

  • string (StringSlice): The StringSlice to write to this String.

append

def append(mut self, codepoint: Codepoint)

Append a codepoint to the string.

Args:

  • codepoint (Codepoint): The codepoint to append.
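A short sketch of appending a non-ASCII codepoint. It shows that append writes the codepoint's full UTF-8 encoding, so the byte length grows by 2 while the codepoint count grows by 1:

```mojo
from std.testing import assert_equal

var word = String("caf")
word.append(Codepoint.ord("é"))  # U+00E9, encoded as 2 bytes in UTF-8
assert_equal(word, "café")
assert_equal(word.byte_length(), 5)
assert_equal(word.count_codepoints(), 4)
```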

__iter__

def __iter__(self) -> CodepointSliceIter[origin_of(self)]

Iterate over the string, returning immutable references.

Returns:

CodepointSliceIter[origin_of(self)]: An iterator of references to the string elements.

__reversed__

def __reversed__(self) -> CodepointSliceIter[origin_of(self), False]

Iterate backwards over the string, returning immutable references.

Returns:

CodepointSliceIter[origin_of(self), False]: A reversed iterator of references to the string elements.

__fspath__

def __fspath__(self) -> Self

Return the file system path representation (just the string itself).

Returns:

Self: The file system path representation as a string.

write_to

def write_to(self, mut writer: T)

Formats this string to the provided Writer.

Args:

  • writer (T): The object to write to.

write_repr_to

def write_repr_to(self, mut writer: T)

Writes the repr of this string to the provided Writer.

Notes: Mojo's repr always prints single quotes (') at the start and end of the repr. Any single quote inside the string is escaped (\').

Args:

  • writer (T): The object to write to.

join

def join[T: Copyable & Writable](self, elems: Span[T]) -> Self

Joins string elements using the current string as a delimiter. Defaults to writing to the stack if total bytes of elems is less than buffer_size, otherwise will allocate once to the heap and write directly into that. The buffer_size defaults to 4096 bytes to match the default page size on arm64 and x86-64.

Notes:

  • Defaults to writing directly to the string if the bytes fit in an inline String, otherwise will process it by chunks.
  • The buffer_size defaults to 4096 bytes to match the default page size on arm64 and x86-64, but you can increase this if you're joining a very large List of elements to write into the stack instead of the heap.

Parameters:

  • T (Copyable & Writable): The type of the elements. Must implement the Copyable and Writable traits.

Args:

  • elems (Span[T]): The input values.

Returns:

Self: The joined string.
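A minimal sketch of join. It assumes a List converts implicitly to the Span[T] parameter (as list literals do when passed to Span arguments elsewhere on this page), and shows that any Copyable and Writable element type works, not only strings:

```mojo
from std.testing import assert_equal

# Assumes List[Int] converts implicitly to Span[Int] for the call.
var numbers = [1, 2, 3]
assert_equal(String(", ").join(numbers), "1, 2, 3")

var words = [String("red"), String("green"), String("blue")]
assert_equal(String("-").join(words), "red-green-blue")
```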

bytes

def bytes(self) -> BytesIter[origin_of(self)]

Returns an iterator over the raw UTF-8 bytes of this string.

Unlike codepoints() and graphemes(), this iterator operates at the byte level and yields individual Byte values without interpreting multi-byte UTF-8 sequences.

Examples:

Iterate over the bytes of an ASCII string:

from std.testing import assert_equal, assert_raises

var s = String("abc")
var iter = s.bytes()
assert_equal(next(iter), Byte(ord("a")))
assert_equal(next(iter), Byte(ord("b")))
assert_equal(next(iter), Byte(ord("c")))
with assert_raises():
    _ = next(iter) # raises StopIteration

Multi-byte UTF-8 sequences are yielded as individual bytes:

from std.testing import assert_equal

# "é" is encoded in UTF-8 as two bytes: 0xC3 0xA9.
var s = String("é")
var iter = s.bytes()
assert_equal(next(iter), Byte(0xC3))
assert_equal(next(iter), Byte(0xA9))

Returns:

BytesIter[origin_of(self)]: An iterator type that returns successive Byte values stored in this string.

codepoints

def codepoints(self) -> CodepointsIter[origin_of(self)]

Returns an iterator over the Codepoints encoded in this string.

Examples:

Iterate over the characters in a string:

from std.testing import assert_equal, assert_raises

var s = "abc"
var iter = s.codepoints()
assert_equal(iter.__next__(), Codepoint.ord("a"))
assert_equal(iter.__next__(), Codepoint.ord("b"))
assert_equal(iter.__next__(), Codepoint.ord("c"))
with assert_raises():
    _ = iter.__next__() # raises StopIteration

codepoints() iterates over Unicode codepoints, and supports multibyte codepoints:

from std.testing import assert_equal, assert_raises

# A visual character composed of a combining sequence of 2 codepoints.
var s = "á"
assert_equal(s.byte_length(), 3)

var iter = s.codepoints()
assert_equal(iter.__next__(), Codepoint.ord("a"))
# U+0301 Combining Acute Accent
assert_equal(iter.__next__().to_u32(), 0x0301)
with assert_raises():
    _ = iter.__next__() # raises StopIteration

Returns:

CodepointsIter[origin_of(self)]: An iterator type that returns successive Codepoint values stored in this string.

codepoint_slices

def codepoint_slices(self) -> CodepointSliceIter[origin_of(self)]

Returns an iterator over single-character slices of this string.

Each returned slice points to a single Unicode codepoint encoded in the underlying UTF-8 representation of this string.

Examples:

Iterate over the character slices in a string:

from std.testing import assert_equal, assert_raises, assert_true

var s = "abc"
var iter = s.codepoint_slices()
assert_true(iter.__next__() == "a")
assert_true(iter.__next__() == "b")
assert_true(iter.__next__() == "c")
with assert_raises():
    _ = iter.__next__() # raises StopIteration

Returns:

CodepointSliceIter[origin_of(self)]: An iterator of references to the string elements.

codepoint_slices_reversed

def codepoint_slices_reversed(self) -> CodepointSliceIter[origin_of(self), False]

Iterates backwards over the string, returning single-character slices.

Each returned slice points to a single Unicode codepoint encoded in the underlying UTF-8 representation of this string, starting from the end and moving towards the beginning.

Returns:

CodepointSliceIter[origin_of(self), False]: A reversed iterator of references to the string elements.
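A minimal sketch that reverses a string by codepoints. Because each step yields a whole codepoint, the 2-byte "ï" stays intact instead of having its bytes swapped:

```mojo
from std.testing import assert_equal

var s = "naïve"
var backwards = String()
for c in s.codepoint_slices_reversed():
    backwards += c  # c is a StringSlice holding one codepoint
assert_equal(backwards, "evïan")
```

Note that this reverses codepoints, not grapheme clusters; a character built from a base letter plus a combining mark would be split apart. Use graphemes_reversed() for that case.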

graphemes

def graphemes(self) -> GraphemeSliceIter[origin_of(self)]

Return an iterator over the grapheme clusters in this string.

A grapheme cluster is what a user would typically think of as a single "character" on screen.

Returns:

GraphemeSliceIter[origin_of(self)]: An iterator yielding each grapheme cluster as a StringSlice.

graphemes_reversed

def graphemes_reversed(self) -> GraphemeSliceIter[origin_of(self), False]

Return an iterator over the grapheme clusters in this string, yielding them in reverse order.

See graphemes() for the definition of a grapheme cluster.

Returns:

GraphemeSliceIter[origin_of(self), False]: A reverse iterator yielding each grapheme cluster as a StringSlice.

grapheme_indices

def grapheme_indices(self) -> GraphemeIndicesIter[origin_of(self)]

Return an iterator over grapheme clusters paired with their byte offsets.

Each yielded element is a Tuple[Int, StringSlice] where the first element is the byte offset at which the grapheme begins.

Returns:

GraphemeIndicesIter[origin_of(self)]: An iterator yielding (byte_offset, grapheme) pairs.
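A minimal sketch that collects the byte offsets yielded by grapheme_indices(). It assumes segmentation follows Unicode UAX #29, where a CR LF pair forms a single grapheme cluster:

```mojo
from std.testing import assert_equal

var s = "a\r\nb"  # bytes: "a" at 0, "\r" at 1, "\n" at 2, "b" at 3
var offsets = List[Int]()
for pair in s.grapheme_indices():
    offsets.append(pair[0])  # byte offset where each grapheme begins

# "\r\n" is one cluster, so there are 3 graphemes starting at 0, 1, and 3.
assert_equal(len(offsets), 3)
assert_equal(offsets[0], 0)
assert_equal(offsets[1], 1)
assert_equal(offsets[2], 3)
```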

split_at_grapheme

def split_at_grapheme(self, n: Int) -> Tuple[StringSlice[origin_of(self)], StringSlice[origin_of(self)]]

Split this string at the n-th grapheme-cluster boundary.

Args:

  • n (Int): The grapheme-cluster boundary at which to split. Must be non-negative.

Returns:

Tuple[StringSlice[origin_of(self)], StringSlice[origin_of(self)]]: A tuple (prefix, suffix) of StringSlices. The prefix covers grapheme clusters [0, n) and the suffix covers the rest.
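A minimal sketch of splitting two flag emoji apart. Each flag is a pair of 4-byte regional-indicator codepoints; assuming UAX #29 segmentation, each pair is one grapheme cluster, so splitting at grapheme 1 never cuts a flag in half:

```mojo
from std.testing import assert_true

var flags = "🇺🇸🇫🇷"  # 4 codepoints, 2 grapheme clusters
var parts = flags.split_at_grapheme(1)
assert_true(parts[0] == "🇺🇸")  # prefix: grapheme clusters [0, 1)
assert_true(parts[1] == "🇫🇷")  # suffix: the rest
```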

count_graphemes

def count_graphemes(self) -> Int

Count the number of grapheme clusters in this string.

This is an O(n) operation.

Returns:

Int: The number of grapheme clusters.
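The three ways of measuring a string's length can all differ. A minimal sketch, assuming UAX #29 segmentation where a regional-indicator pair (a flag emoji) forms one grapheme cluster:

```mojo
from std.testing import assert_equal

# "a", the US flag (two 4-byte regional-indicator codepoints), then "b".
var s = "a🇺🇸b"
assert_equal(s.byte_length(), 10)
assert_equal(s.count_codepoints(), 4)
assert_equal(s.count_graphemes(), 3)

# graphemes() yields the flag as a single slice.
var clusters = List[String]()
for g in s.graphemes():
    clusters.append(String(g))
assert_equal(clusters[1], "🇺🇸")
```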

unsafe_ptr

def unsafe_ptr(self) -> UnsafePointer[UInt8, origin_of(self)]

Retrieves a pointer to the underlying memory.

Returns:

UnsafePointer[UInt8, origin_of(self)]: The pointer to the underlying memory.

unsafe_ptr_mut

def unsafe_ptr_mut(mut self, var capacity: Int = Int(0)) -> UnsafePointer[UInt8, origin_of(self)]

Retrieves a mutable pointer to the unique underlying memory. If capacity is larger than the existing capacity, the string is reallocated to the new capacity, allowing you to write more data.

Args:

  • capacity (Int): The new capacity of the string.

Returns:

UnsafePointer[UInt8, origin_of(self)]: The pointer to the underlying memory.

as_c_string_slice

def as_c_string_slice(mut self) -> CStringSlice[origin_of(self)]

Return a CStringSlice to the underlying memory of the string.

Returns:

CStringSlice[origin_of(self)]: The CStringSlice of the string.

as_bytes

def as_bytes(self) -> Span[UInt8, origin_of(self)]

Returns a contiguous slice of the bytes owned by this string.

Returns:

Span[UInt8, origin_of(self)]: A contiguous slice pointing to the bytes owned by this string.
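A short sketch showing that as_bytes() exposes the raw UTF-8 encoding, so its length matches byte_length() rather than the codepoint count:

```mojo
from std.testing import assert_equal

var s = "é!"
var raw = s.as_bytes()
assert_equal(len(raw), 3)  # same as s.byte_length()
assert_equal(raw[0], Byte(0xC3))  # "é" is encoded as 0xC3 0xA9
assert_equal(raw[1], Byte(0xA9))
assert_equal(raw[2], Byte(ord("!")))
```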

unsafe_as_bytes_mut

def unsafe_as_bytes_mut(mut self) -> Span[UInt8, origin_of(self)]

Returns a mutable contiguous slice of the bytes owned by this string. This name has a _mut suffix so the as_bytes() method doesn't have to guarantee mutability.

Safety:

  • Any mutation of the byte slice must uphold UTF-8 validity of the overall string.

Returns:

Span[UInt8, origin_of(self)]: A contiguous slice pointing to the bytes owned by this string.

as_string_slice

def as_string_slice(self) -> StringSlice[origin_of(self)]

Returns a string slice of the data owned by this string.

Returns:

StringSlice[origin_of(self)]: A string slice pointing to the data owned by this string.

byte_length

def byte_length(self) -> Int

Get the string length in bytes.

Returns:

Int: The length of this string in bytes.

count_codepoints

def count_codepoints(self) -> Int

Calculates the length in Unicode codepoints encoded in the UTF-8 representation of this string.

This is an O(n) operation, where n is the length of the string, as it requires scanning the full string contents.

Examples:

Query the length of a string, in bytes and Unicode codepoints:

from std.testing import assert_equal

var s = StringSlice("ನಮಸ್ಕಾರ")
assert_equal(s.count_codepoints(), 7)
assert_equal(s.byte_length(), 21)

Strings containing only ASCII characters have the same byte and Unicode codepoint length:

from std.testing import assert_equal

var s = StringSlice("abc")
assert_equal(s.count_codepoints(), 3)
assert_equal(s.byte_length(), 3)

The character length of a string with visual combining characters is the length in Unicode codepoints, not grapheme clusters:

from std.testing import assert_equal

var s = StringSlice("á")
assert_equal(s.count_codepoints(), 2)
assert_equal(s.byte_length(), 3)

Notes: This method needs to traverse the whole string to count, so it has a performance hit compared to using the byte length.

Returns:

Int: The length in Unicode codepoints.

set_byte_length

def set_byte_length(mut self, new_len: Int)

Set the byte length of the String.

This is an internal helper method that updates the length field.

Args:

  • new_len (Int): The new byte length to set.

count

def count(self, substr: StringSlice) -> Int

Return the number of non-overlapping occurrences of substring substr in the string.

If substr is empty, returns the number of empty strings between characters, which is the length of the string plus one.

Args:

  • substr (StringSlice): The substring to count.

Returns:

Int: The number of occurrences of substr.
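A short sketch of what "non-overlapping" means in practice, using an ASCII string so that byte and character positions coincide:

```mojo
from std.testing import assert_equal

var s = "banana"
assert_equal(s.count("an"), 2)
# "ana" matches at index 1; the search resumes at index 4, so the
# overlapping "ana" at index 3 is not counted.
assert_equal(s.count("ana"), 1)
# An empty substring matches between every character: length + 1.
assert_equal(s.count(""), 7)
```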

find

def find(self, substr: StringSlice, start: Int = Int(0)) -> Int

Finds the offset in bytes of the first occurrence of substr starting at start. If not found, returns -1.

Args:

  • substr (StringSlice): The substring to find.
  • start (Int): The offset in bytes from which to find.

Returns:

Int: The offset in bytes of substr relative to the beginning of the string.

rfind

def rfind(self, substr: StringSlice, start: Int = Int(0)) -> Int

Finds the offset in bytes of the last occurrence of substr starting at start. If not found, returns -1.

Args:

  • substr (StringSlice): The substring to find.
  • start (Int): The offset in bytes from which to find.

Returns:

Int: The offset in bytes of substr relative to the beginning of the string.
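A short sketch of find and rfind on a string with two matches; the start argument of find skips past the first one:

```mojo
from std.testing import assert_equal

var s = "hello world, hello"
assert_equal(s.find("hello"), 0)
assert_equal(s.rfind("hello"), 13)
# Begin searching at byte 1, past the first match.
assert_equal(s.find("hello", 1), 13)
assert_equal(s.find("xyz"), -1)
```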

isspace

def isspace(self) -> Bool

Determines whether every character in this String is a Python whitespace character. This corresponds to Python's universal separators: " \t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".

Returns:

Bool: True if the whole String is made up of whitespace characters listed above, otherwise False.
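A short sketch using characters from the separator list above:

```mojo
from std.testing import assert_true, assert_false

assert_true(String(" \t\n").isspace())  # only separators
assert_false(String(" a ").isspace())   # contains a letter
```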

split

def split(self, sep: StringSlice) -> List[StringSlice[origin_of(self)]]

Split the string by a separator.

Examples:

# Splitting on a space
_ = StringSlice("hello world").split(" ") # ["hello", "world"]
# Splitting adjacent separators
_ = StringSlice("hello,,world").split(",") # ["hello", "", "world"]
# Splitting with starting or ending separators
_ = StringSlice(",1,2,3,").split(",") # ['', '1', '2', '3', '']
# Splitting with an empty separator
_ = StringSlice("123").split("") # ['', '1', '2', '3', '']

Args:

  • sep (StringSlice): The string to split on.

Returns:

List[StringSlice[origin_of(self)]]: A List of StringSlices containing the input split by the separator.

def split(self, sep: StringSlice, maxsplit: Int) -> List[StringSlice[origin_of(self)]]

Split the string by a separator.

Examples:

# Splitting with maxsplit
_ = StringSlice("1,2,3").split(",", maxsplit=1) # ['1', '2,3']
# Splitting with starting or ending separators
_ = StringSlice(",1,2,3,").split(",", maxsplit=1) # ['', '1,2,3,']
# Splitting with an empty separator
_ = StringSlice("123").split("", maxsplit=1) # ['', '123']

Args:

  • sep (StringSlice): The string to split on.
  • maxsplit (Int): The maximum number of splits to perform.

Returns:

List[StringSlice[origin_of(self)]]: A List of StringSlices containing the input split by the separator.

def split(self, sep: NoneType = None) -> List[StringSlice[origin_of(self)]]

Split the string on runs of whitespace.

Examples:

# Splitting an empty string or filled with whitespaces
_ = StringSlice(" ").split() # []
_ = StringSlice("").split() # []

# Splitting a string with leading, trailing, and middle whitespaces
_ = StringSlice(" hello world ").split() # ["hello", "world"]
# Splitting adjacent universal newlines:
_ = StringSlice(
"hello \t\n\v\f\r\x1c\x1d\x1e\x85world"
).split() # ["hello", "world"]

Args:

  • sep (NoneType): None.

Returns:

List[StringSlice[origin_of(self)]]: A List of StringSlices containing the input split on whitespace.

def split(self, sep: NoneType = None, *, maxsplit: Int) -> List[StringSlice[origin_of(self)]]

Split the string on runs of whitespace.

Examples:

# Splitting with maxsplit
_ = StringSlice("1 2 3").split(maxsplit=1) # ['1', '2 3']

Args:

  • sep (NoneType): None.
  • maxsplit (Int): The maximum number of splits to perform.

Returns:

List[StringSlice[origin_of(self)]]: A List of StringSlices containing the input split on whitespace.

splitlines

def splitlines(self, keepends: Bool = False) -> List[StringSlice[origin_of(self)]]

Split the string at line boundaries. This corresponds to Python's universal newlines: "\r\n" and "\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".

Args:

  • keepends (Bool): If True, line breaks are kept in the resulting strings.

Returns:

List[StringSlice[origin_of(self)]]: A List of StringSlices containing the input split by line boundaries.
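A minimal sketch with mixed line endings, following Python's splitlines semantics that this method mirrors. "\r\n" counts as a single line break:

```mojo
from std.testing import assert_equal, assert_true

var text = "one\ntwo\r\nthree"
var lines = text.splitlines()
assert_equal(len(lines), 3)
assert_true(lines[0] == "one")
assert_true(lines[1] == "two")
assert_true(lines[2] == "three")

# With keepends=True, each line keeps its own terminator.
var with_ends = text.splitlines(keepends=True)
assert_true(with_ends[1] == "two\r\n")
```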

replace

def replace(self, old: StringSlice, new: StringSlice) -> Self

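Return a copy of the string with all occurrences of substring old replaced by new.

Args:

  • old (StringSlice): The substring to replace.
  • new (StringSlice): The substring to insert in its place.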

Returns:

Self: The string where all occurrences of old are replaced with new.
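A short sketch showing that replace returns a new String and leaves the original untouched:

```mojo
from std.testing import assert_equal

var path = "a/b/c"
assert_equal(path.replace("/", "::"), "a::b::c")
assert_equal(path, "a/b/c")  # the original is unchanged
```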

strip

def strip(self, chars: StringSlice) -> StringSlice[origin_of(self)]

Return a copy of the string with leading and trailing characters removed.

Args:

  • chars (StringSlice): A set of characters to be removed. Defaults to whitespace.

Returns:

StringSlice[origin_of(self)]: A copy of the string with no leading or trailing characters.

def strip(self) -> StringSlice[origin_of(self)]

Return a copy of the string with leading and trailing whitespaces removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".

Returns:

StringSlice[origin_of(self)]: A slice of the string with no leading or trailing whitespace.

rstrip

def rstrip(self, chars: StringSlice) -> StringSlice[origin_of(self)]

Return a slice of the string with trailing characters removed.

Args:

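  • chars (StringSlice): The set of characters to remove; each character in chars is removed individually rather than as a substring. Use the no-argument overload to strip whitespace.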

Returns:

StringSlice[origin_of(self)]: A slice of the string without the trailing characters in chars.

def rstrip(self) -> StringSlice[origin_of(self)]

Return a slice of the string with trailing whitespace removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".

Returns:

StringSlice[origin_of(self)]: A slice of the string with no trailing whitespace.

lstrip

def lstrip(self, chars: StringSlice) -> StringSlice[origin_of(self)]

Return a slice of the string with leading characters removed.

Args:

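  • chars (StringSlice): The set of characters to remove; each character in chars is removed individually rather than as a substring. Use the no-argument overload to strip whitespace.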

Returns:

StringSlice[origin_of(self)]: A slice of the string without the leading characters in chars.

def lstrip(self) -> StringSlice[origin_of(self)]

Return a slice of the string with leading whitespace removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".

Returns:

StringSlice[origin_of(self)]: A slice of the string with no leading whitespace.
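The sketch below contrasts strip(), lstrip(), and rstrip() and their two overloads on the same input. Because chars is treated as a set, passing "- " strips any mix of spaces and dashes from the ends. The results are slices of s, so they are printed directly; brackets make the remaining whitespace visible.

```mojo
def main():
    var s = String("  --hello--  ")

    # No-argument overloads remove ASCII whitespace only.
    print("[", s.strip(), "]", sep="")   # [--hello--]
    print("[", s.lstrip(), "]", sep="")  # [--hello--  ]
    print("[", s.rstrip(), "]", sep="")  # [  --hello--]

    # With chars, every character in the set is stripped, in any order.
    print("[", s.strip("- "), "]", sep="")   # [hello]
    print("[", s.lstrip("- "), "]", sep="")  # [hello--  ]
```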

__hash__

def __hash__[H: Hasher](self, mut hasher: H)

Updates hasher with the underlying bytes.

Parameters:

  • H (Hasher): The hasher type.

Args:

  • hasher (H): The hasher instance.
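Because only the underlying bytes are fed to the hasher, two strings with equal contents hash identically regardless of how they were built. This sketch assumes the builtin hash() function, which hashes a value through its __hash__ method with the default hasher.

```mojo
def main():
    var a = String("mojo")
    var b = String("mo") + "jo"  # built differently, same bytes

    # Assumes the builtin hash(), which dispatches to String.__hash__.
    print(hash(a) == hash(b))  # True
```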

lower

def lower(self) -> Self

Returns a copy of the string with all cased characters converted to lowercase.

Returns:

Self: A new string where cased letters have been converted to lowercase.

upper

def upper(self) -> Self

Returns a copy of the string with all cased characters converted to uppercase.

Returns:

Self: A new string where cased letters have been converted to uppercase.
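A short sketch of lower() and upper() on mixed input; only cased letters change, while digits and punctuation are copied unchanged:

```mojo
def main():
    var s = String("Hello, Mojo 2!")

    # Uncased characters (digits, punctuation, spaces) pass through as-is.
    print(s.upper())  # HELLO, MOJO 2!
    print(s.lower())  # hello, mojo 2!
```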

startswith

def startswith(self, prefix: StringSlice, start: Int = Int(0), end: Int = Int(-1)) -> Bool

Checks if the string starts with the specified prefix between start and end positions. Returns True if found and False otherwise.

The start and end positions must be offsets given in bytes, and must be codepoint boundaries.

Args:

  • prefix (StringSlice): The prefix to check.
  • start (Int): The start offset in bytes from which to check.
  • end (Int): The end offset in bytes from which to check.

Returns:

Bool: True if self[byte=start:end] starts with prefix, False otherwise.

endswith

def endswith(self, suffix: StringSlice, start: Int = Int(0), end: Int = Int(-1)) -> Bool

Checks if the string ends with the specified suffix between start and end positions. Returns True if found and False otherwise.

The start and end positions must be offsets given in bytes, and must be codepoint boundaries.

Args:

  • suffix (StringSlice): The suffix to check.
  • start (Int): The start offset in bytes from which to check.
  • end (Int): The end offset in bytes from which to check.

Returns:

Bool: True if self[byte=start:end] ends with suffix, False otherwise.
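The sketch below shows both methods with and without the optional byte offsets. Since start and end are byte offsets, the last two calls inspect only a sub-range of the string (all characters here are ASCII, so every offset is a codepoint boundary).

```mojo
def main():
    var path = String("src/main.mojo")

    print(path.startswith("src/"))  # True
    print(path.endswith(".mojo"))   # True

    # start and end restrict the check to path[byte=start:end].
    print(path.startswith("main", 4, 8))  # True: path[byte=4:8] is "main"
    print(path.endswith("main", 0, 8))    # True: path[byte=0:8] is "src/main"
```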

removeprefix

def removeprefix(self, prefix: StringSlice, /) -> StringSlice[origin_of(self)]

Returns a slice of the string with the prefix removed, if it was present.

Examples:

print(String('TestHook').removeprefix('Test')) # 'Hook'
print(String('BaseTestCase').removeprefix('Test')) # 'BaseTestCase'

Args:

  • prefix (StringSlice): The prefix to remove from the string.

Returns:

StringSlice[origin_of(self)]: string[byte=prefix.byte_length():] if the string starts with the prefix string, or the entire original string otherwise.

removesuffix

def removesuffix(self, suffix: StringSlice, /) -> StringSlice[origin_of(self)]

Returns a slice of the string with the suffix removed, if it was present.

Examples:

print(String('TestHook').removesuffix('Hook')) # 'Test'
print(String('BaseTestCase').removesuffix('Test')) # 'BaseTestCase'

Args:

  • suffix (StringSlice): The suffix to remove from the string.

Returns:

StringSlice[origin_of(self)]: string[byte=:(self.byte_length()-suffix.byte_length())] if the string ends with the suffix string, or the entire original string otherwise.

__int__

def __int__(self) -> Int

Parses the given string as a base-10 integer and returns that value. If the string cannot be parsed as an int, an error is raised.

Returns:

Int: An integer value that represents the string, or otherwise raises.

Raises:

If the string cannot be parsed as a base-10 integer.

__float__

def __float__(self) -> Float64

Parses the string as a floating-point number and returns that value. If the string cannot be parsed as a float, an error is raised.

Returns:

Float64: A float value that represents the string, or otherwise raises.

Raises:

If the string cannot be parsed as a floating-point number.
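Both conversions raise on malformed input, so callers typically wrap them in try/except. This minimal sketch assumes that Int(...) and Float64(...) accept a String by dispatching to the __int__ and __float__ methods above.

```mojo
def main():
    try:
        # Assumes Int(...) and Float64(...) call String.__int__ / __float__.
        var n = Int(String("42"))
        var x = Float64(String("2.5"))
        print(n + 1)  # 43
        print(x * 2)  # 5.0

        # "4two" is not a base-10 integer, so this call raises.
        _ = Int(String("4two"))
        print("not reached")
    except e:
        print("parse error:", e)
```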

format

def format[*Ts: Writable](self, *args: *Ts.values) -> Self

Produce a formatted string using the current string as a template.

The template, or "format string", can contain literal text and/or replacement fields delimited with curly braces ({}). Returns a copy of the format string with the replacement fields replaced by the string representations of args.

For more information, see the discussion in the format module.

Example:

# Manual indexing:
print("{0} {1} {0}".format("Mojo", 1.125)) # Mojo 1.125 Mojo
# Automatic indexing:
print("{} {}".format(True, "hello world")) # True hello world

Parameters:

  • *Ts (Writable): The types of substitution values that implement Writable.

Args:

  • *args (*Ts.values): The substitution values.

Returns:

Self: The template with the given values substituted.

Raises:

If the operation fails.

is_ascii_digit

def is_ascii_digit(self) -> Bool

Returns True if all characters in the string are ASCII digits and the string contains at least one character.

Note that this currently only works with ASCII strings.

Returns:

Bool: True if all characters are digits and it's not empty else False.

isupper

def isupper(self) -> Bool

Returns True if all cased characters in the string are uppercase and there is at least one cased character.

Returns:

Bool: True if all cased characters in the string are uppercase and there is at least one cased character, False otherwise.

islower

def islower(self) -> Bool

Returns True if all cased characters in the string are lowercase and there is at least one cased character.

Returns:

Bool: True if all cased characters in the string are lowercase and there is at least one cased character, False otherwise.

is_ascii_printable

def is_ascii_printable(self) -> Bool

Returns True if all characters in the string are ASCII printable.

Note that this currently only works with ASCII strings.

Returns:

Bool: True if all characters are printable else False.
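The sketch below exercises is_ascii_digit(), isupper(), islower(), and is_ascii_printable() on a few inputs, following the rules stated above: an empty string is not a digit string, uncased characters are ignored by isupper() and islower(), and control characters such as tab are not printable.

```mojo
def main():
    print(String("12345").is_ascii_digit())  # True
    print(String("12a45").is_ascii_digit())  # False
    print(String("").is_ascii_digit())       # False: the string is empty

    # Uncased characters such as spaces and digits don't affect the result.
    print(String("ABC 123").isupper())  # True
    print(String("Abc").isupper())      # False
    print(String("abc 123").islower())  # True

    # Tab (0x09) is a control character, not a printable one.
    print(String("hello world").is_ascii_printable())  # True
    print(String("tab\there").is_ascii_printable())    # False
```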

ascii_rjust

def ascii_rjust(self, width: Int, fillchar: StringSlice[StaticConstantOrigin] = StringSlice(" ")) -> Self

Returns the string right justified in a string of specified width.

Pads the string on the left with the specified fill character so that the total (byte) length of the resulting string equals width. If the original string is already longer than or equal to width, returns the original string unchanged.

Examples:

var s = String("hello")
print(s.ascii_rjust(10)) # "     hello"
print(s.ascii_rjust(10, "*")) # "*****hello"
print(s.ascii_rjust(3)) # "hello" (no padding)

Args:

  • width (Int): The total width (in bytes) of the resulting string. This is not the amount of padding, but the final length of the returned string.
  • fillchar (StringSlice[StaticConstantOrigin]): The padding character to use (defaults to space). Must be a single-byte character.

Returns:

Self: A right-justified string of (byte) length width, or the original string if its length is already greater than or equal to width.

ascii_ljust

def ascii_ljust(self, width: Int, fillchar: StringSlice[StaticConstantOrigin] = StringSlice(" ")) -> Self

Returns the string left justified in a string of specified width.

Pads the string on the right with the specified fill character so that the total (byte) length of the resulting string equals width. If the original string is already longer than or equal to width, returns the original string unchanged.

Examples:

var s = String("hello")
print(s.ascii_ljust(10)) # "hello     "
print(s.ascii_ljust(10, "*")) # "hello*****"
print(s.ascii_ljust(3)) # "hello" (no padding)

Args:

  • width (Int): The total width (in bytes) of the resulting string. This is not the amount of padding, but the final length of the returned string.
  • fillchar (StringSlice[StaticConstantOrigin]): The padding character to use (defaults to space). Must be a single-byte character.

Returns:

Self: A left-justified string of (byte) length width, or the original string if its length is already greater than or equal to width.

ascii_center

def ascii_center(self, width: Int, fillchar: StringSlice[StaticConstantOrigin] = StringSlice(" ")) -> Self

Returns the string center justified in a string of specified width.

Pads the string on both sides with the specified fill character so that the total (byte) length of the resulting string equals width. If the padding needed is odd, the extra character goes on the right side. If the original string is already longer than or equal to width, returns the original string unchanged.

Examples:

var s = String("hello")
print(s.ascii_center(10)) # "  hello   "
print(s.ascii_center(11, "*")) # "***hello***"
print(s.ascii_center(3)) # "hello" (no padding)

Args:

  • width (Int): The total width (in bytes) of the resulting string. This is not the amount of padding, but the final length of the returned string.
  • fillchar (StringSlice[StaticConstantOrigin]): The padding character to use (defaults to space). Must be a single-byte character.

Returns:

Self: A center-justified string of length width, or the original string if its length is already greater than or equal to width.

resize

def resize(mut self, length: Int, fill_byte: UInt8 = UInt8(0))

Resize the string to a new length. Panics if length does not lie on a codepoint boundary or if fill_byte is non-ASCII (>= 128).

Notes: If the new length is greater than the current length, the string is extended by the difference, and the new bytes are initialized to fill_byte.

Args:

  • length (Int): The new length of the string.
  • fill_byte (UInt8): The byte to fill any new space with. Must be a valid single-byte UTF-8 character, that is, an ASCII byte.

def resize(mut self, *, unsafe_uninit_length: Int)

Resizes the string to the given new size leaving any new data uninitialized. Panics if the new length does not lie on a codepoint boundary.

If the new size is smaller than the current one, bytes at the end are discarded. If the new size is larger than the current one, the string is extended and the new data is left uninitialized.

Args:

  • unsafe_uninit_length (Int): The new size.
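A minimal sketch of the safe resize(length, fill_byte) overload, growing a string with a fill byte and then shrinking it. The example text is pure ASCII, so every length is a codepoint boundary; the unsafe_uninit_length overload is deliberately not shown because it leaves bytes uninitialized.

```mojo
def main():
    var s = String("abc")

    # Growing: new bytes are set to fill_byte, which must be ASCII (< 128).
    s.resize(6, fill_byte=UInt8(ord(".")))
    print(s)  # abc...

    # Shrinking: bytes past the new length are dropped. The new length
    # must fall on a codepoint boundary (always true for ASCII text).
    s.resize(2)
    print(s)  # ab
```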

reserve

def reserve(mut self, new_capacity: Int)

Reserves the requested capacity.

Notes: If the current capacity is greater than or equal to new_capacity, this is a no-op. Otherwise, the storage is reallocated and the data is moved.

Args:

  • new_capacity (Int): The new capacity in stored bytes.
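A typical use of reserve() is to size the buffer once before a loop of appends. In this sketch the ten one-byte appends fit within the 64 reserved bytes, so the loop itself should not need to reallocate.

```mojo
def main():
    var s = String()

    # Reserve room for 64 bytes up front; the appends below total 10 bytes.
    s.reserve(64)
    for i in range(10):
        s += String(i)
    print(s)  # 0123456789
```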