ADR 13: Time-based alphanumeric id generation

Status

Accepted

Context

Current State

Payment IDs include a 9-character alphanumeric portion, which the orchestration currently generates randomly. A non-random pattern is preferable, although the current chance of collision is very low.

The Request

The product ask is to generate only upper-case alphanumeric characters for the 9-character portion.

Problems

  • Moving to upper case takes the alphabet from base 62 to base 36, which shrinks the 9-character ID space by a factor of about 133. Collisions would still be rare, but could occur 1-2 times per month.
  • Random IDs carry no context, such as sequence or source.
  • A source system will continue creating and originating payments for a time, so we can't control the upper or lower bounds of the ID range.
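A rough way to see the effect of dropping lower-case letters is to compare the two 9-character ID spaces and apply a birthday-style estimate. The sketch below assumes uniformly random IDs; the function names and any pool sizes passed in are hypothetical, not figures from our data.

```python
# Sketch under stated assumptions: IDs are drawn uniformly at random, and the
# pool sizes passed in are hypothetical inputs, not measured figures.

ID_LENGTH = 9


def id_space(alphabet_size: int, length: int = ID_LENGTH) -> int:
    """Number of distinct IDs of `length` characters over the alphabet."""
    return alphabet_size ** length


def expected_collisions(new_ids: int, existing_ids: int, space: int) -> float:
    """Birthday-style estimate of expected duplicate pairs.

    Counts (new, existing) pairs plus (new, new) pairs, each colliding with
    probability 1/space. Accurate only while the ID counts are far below `space`.
    """
    pairs = new_ids * existing_ids + new_ids * (new_ids - 1) / 2
    return pairs / space


BASE62_SPACE = id_space(62)  # mixed-case alphanumeric
BASE36_SPACE = id_space(36)  # upper-case alphanumeric only

# Shrinking the alphabet makes the space about 133 times smaller, so the
# expected collision count for the same volume grows by the same factor.
print(BASE62_SPACE / BASE36_SPACE)
```

For instance, `expected_collisions(6_000_000, existing_ids, BASE36_SPACE)` would estimate one month at roughly 6 million new IDs, where `existing_ids` must be replaced with the actual number of stored IDs.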

Objectives

  • Add a partition or sequence to the IDs, giving context about the creation time and/or originating node.
  • Reduce or eliminate the chance of collisions.
  • Generate each ID atomically.

Decision

The orchestration nodes will generate a 9-character Base36 ID structured as follows:

  • Timestamp (6 characters):
    • Encoded in Base36, representing the number of seconds since the custom epoch (2024).
    • Six Base36 characters hold 36^6 ≈ 2.18 billion seconds, about 69 years, so the field covers 60 years of operation with margin at one-second granularity.
  • Worker ID (1 character):
    • A single Base36 character uniquely identifies each worker node. This supports up to 36 nodes.
  • Counter (2 characters):
    • Each worker node will have a 2-character Base36 counter to differentiate IDs generated within the same second.
    • This counter supports 1,296 unique IDs per second for each worker node.
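The field sizes above follow from powers of 36. A minimal Python sketch of the Base36 encoding and the resulting capacities, assuming the upper-case alphabet 0-9 followed by A-Z (the helper names are illustrative):

```python
import string

# Upper-case Base36 alphabet: "0"-"9" then "A"-"Z".
ALPHABET = string.digits + string.ascii_uppercase


def to_base36(value: int, width: int) -> str:
    """Encode a non-negative integer as exactly `width` Base36 characters."""
    if not 0 <= value < 36 ** width:
        raise ValueError(f"{value} does not fit in {width} Base36 characters")
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 36)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def from_base36(text: str) -> int:
    """Decode Base36; Python's int() supports bases up to 36 natively."""
    return int(text, 36)


# Capacity of each field.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
TIMESTAMP_VALUES = 36 ** 6  # 2,176,782,336 seconds
WORKER_VALUES = 36 ** 1     # 36 worker nodes
COUNTER_VALUES = 36 ** 2    # 1,296 IDs per second per worker

print(f"timestamp range: {TIMESTAMP_VALUES / SECONDS_PER_YEAR:.1f} years")  # about 69
```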

ID Format

[Timestamp][WorkerID][Counter]
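One way a worker node could produce IDs in this layout is sketched below. It is an illustration, not the ADR's referenced code sample. It assumes a UTC epoch of January 1, 2024, an externally assigned worker ID, and two behaviors this ADR does not specify: waiting for the next second once all 1,296 counter values are used, and failing if the clock moves backwards. `IdGenerator` and its parameters are hypothetical.

```python
import threading
import time
from datetime import datetime, timezone

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Assumption: the Decision names a custom 2024 epoch; midnight UTC on
# January 1, 2024 is used here for illustration only.
EPOCH = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp())


def to_base36(value: int, width: int) -> str:
    """Encode a non-negative integer as exactly `width` Base36 characters."""
    if not 0 <= value < 36 ** width:
        raise ValueError(f"{value} does not fit in {width} Base36 characters")
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 36)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


class IdGenerator:
    """Hypothetical per-node generator for [Timestamp][WorkerID][Counter] IDs."""

    COUNTER_LIMIT = 36 ** 2  # 1,296 IDs per second per worker

    def __init__(self, worker_id: int, clock=time.time):
        if not 0 <= worker_id < 36:
            raise ValueError("worker_id must be between 0 and 35")
        self._worker = to_base36(worker_id, 1)
        self._clock = clock            # injectable so tests can fix the time
        self._lock = threading.Lock()  # serializes next_id() within this node
        self._last_second = -1
        self._counter = 0

    def _seconds_since_epoch(self) -> int:
        return int(self._clock()) - EPOCH

    def next_id(self) -> str:
        with self._lock:
            second = self._seconds_since_epoch()
            if second < self._last_second:
                # Clock moved backwards: fail rather than risk reusing an ID.
                raise RuntimeError("clock moved backwards")
            if second == self._last_second and self._counter >= self.COUNTER_LIMIT:
                # All counter values for this second are used: wait for the next second.
                while second <= self._last_second:
                    time.sleep(0.001)
                    second = self._seconds_since_epoch()
            if second != self._last_second:
                # New second: restart the counter.
                self._last_second = second
                self._counter = 0
            counter = self._counter
            self._counter += 1
            return to_base36(second, 6) + self._worker + to_base36(counter, 2)
```

With a clock fixed at 155,631,747 seconds after the assumed epoch, `IdGenerator(3, clock=...)` returns `2KNQ83300`, `2KNQ83301`, `2KNQ83302`, and so on. The lock makes the read-check-increment sequence a single atomic step within the node.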

Example

2KNQ8330Z
2KNQ83310
2KNQ83311
2KNQ83312
  • 2KNQ83: The seconds elapsed since the custom epoch (2024), in Base36 (6 characters).
  • 3: The worker ID (in this case, worker node 3).
  • 0Z, 10, 11, 12: Consecutive counter values (35 to 38 in decimal) for IDs generated within the same second, in Base36 using 2 characters.
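Because each field has a fixed width, an ID can be split back into its parts, which is how the worker and creation second can be read from a stored record. A sketch, again assuming a January 1, 2024 UTC epoch; `parse_id`, `IdParts`, and `created_at` are hypothetical names, and `2KNQ83A05` is a made-up ID.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

ALPHABET = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")

# Assumption: midnight UTC on January 1, 2024 stands in for the custom epoch.
EPOCH = datetime(2024, 1, 1, tzinfo=timezone.utc)


class IdParts(NamedTuple):
    seconds_since_epoch: int
    worker_id: int
    counter: int


def parse_id(payment_id: str) -> IdParts:
    """Split a 9-character [Timestamp][WorkerID][Counter] ID into its fields."""
    if len(payment_id) != 9 or not set(payment_id) <= ALPHABET:
        raise ValueError(f"not a 9-character upper-case Base36 ID: {payment_id!r}")
    return IdParts(
        seconds_since_epoch=int(payment_id[:6], 36),  # characters 1-6
        worker_id=int(payment_id[6], 36),             # character 7
        counter=int(payment_id[7:], 36),              # characters 8-9
    )


def created_at(parts: IdParts) -> datetime:
    """Second-level creation time implied by the timestamp field."""
    return EPOCH + timedelta(seconds=parts.seconds_since_epoch)


print(parse_id("2KNQ83A05"))
# IdParts(seconds_since_epoch=155631747, worker_id=10, counter=5)
```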

A code sample provides a working example.

Consequences

Positive

  • The move to upper-case alphanumeric characters no longer affects the chance of collision.
  • The design guarantees no collisions among its own IDs across 36 nodes generating up to 1,296 IDs per second per node, provided each node has a distinct worker ID and its clock never moves backwards.
  • Time-based partitioning yields IDs that are ordered by creation second (sequential, though not contiguous) and unique across the 60-year operational period.
  • Atomic generation at the node level ensures thread safety within each worker.
  • The ID will indicate the worker that created the record.
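Time ordering works with plain string comparison because every field is fixed-width and the characters 0-9 precede A-Z in ASCII. The check below, a sketch with hypothetical helper names, builds IDs from random field values and confirms that sorting the strings matches sorting the numeric (seconds, worker, counter) tuples.

```python
import random

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(value: int, width: int) -> str:
    """Encode a non-negative integer as exactly `width` Base36 characters."""
    if not 0 <= value < 36 ** width:
        raise ValueError(f"{value} does not fit in {width} Base36 characters")
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 36)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def make_id(seconds: int, worker: int, counter: int) -> str:
    """Assemble [Timestamp][WorkerID][Counter] from numeric fields."""
    return to_base36(seconds, 6) + to_base36(worker, 1) + to_base36(counter, 2)


# The alphabet is already in ASCII order, so character comparison matches
# digit-value comparison.
assert list(ALPHABET) == sorted(ALPHABET)

# Random field values: sorting the strings should give the same order as
# sorting the (seconds, worker, counter) tuples numerically.
rng = random.Random(13)
fields = [(rng.randrange(36 ** 6), rng.randrange(36), rng.randrange(36 ** 2))
          for _ in range(1_000)]
assert sorted(make_id(*f) for f in fields) == [make_id(*f) for f in sorted(fields)]
```

Within a single second this order is by worker first, so IDs from different nodes in the same second are not in true creation order.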

Negative

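  • The 9-character limit constrains the number of worker nodes and the IDs generated per second per worker. This is sufficient for the current requirement of 6 million IDs per month, which averages about 2.3 IDs per second over a 30-day month against a combined capacity of 46,656 IDs per second (36 workers × 1,296).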
  • The scheme is more complex than random generation.
  • A small chance of collision remains with existing IDs, including those the source system continues to create.
    • A collision scenario should be tested to confirm its impact is minimal.
  • The worker limit of 36 is reasonable even for future load. If we need to exceed 36 workers, we have these options:
    • Route ID creation to only 36 designated workers.
      • For example, only the outbound workers generate IDs.
    • Treat the 3 worker + counter characters as a single 15-bit integer. Three Base36 characters hold 36^3 = 46,656 values, fewer than the 65,536 a 16-bit integer needs. For example:
      • Allocate 6 bits for the Worker ID (64 possible workers).
      • Allocate 9 bits for the Counter (512 possible IDs per second per worker).
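Because three Base36 characters hold only 36^3 = 46,656 values, a 16-bit worker + counter field (65,536 values) would not fit. The sketch below assumes a 15-bit split of 6 worker bits and 9 counter bits; `pack_suffix` and `unpack_suffix` are hypothetical names.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

WORKER_BITS = 6   # 64 workers
COUNTER_BITS = 9  # 512 IDs per second per worker

# Three Base36 characters hold 36**3 = 46,656 values: enough for 15 bits
# (32,768) but not 16 bits (65,536).
assert 2 ** (WORKER_BITS + COUNTER_BITS) <= 36 ** 3 < 2 ** 16


def to_base36(value: int, width: int) -> str:
    """Encode a non-negative integer as exactly `width` Base36 characters."""
    if not 0 <= value < 36 ** width:
        raise ValueError(f"{value} does not fit in {width} Base36 characters")
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 36)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def pack_suffix(worker: int, counter: int) -> str:
    """Pack worker and counter bits into the 3 characters after the timestamp."""
    if not 0 <= worker < 1 << WORKER_BITS:
        raise ValueError("worker out of range")
    if not 0 <= counter < 1 << COUNTER_BITS:
        raise ValueError("counter out of range")
    return to_base36((worker << COUNTER_BITS) | counter, 3)


def unpack_suffix(suffix: str) -> tuple[int, int]:
    """Inverse of pack_suffix: returns (worker, counter)."""
    value = int(suffix, 36)
    return value >> COUNTER_BITS, value & ((1 << COUNTER_BITS) - 1)


print(pack_suffix(63, 511))  # PA7, the largest packed value (32,767)
```

A bit-field split leaves part of the 46,656 values unused. A mixed-radix layout such as `worker * 729 + counter` would use all of them, giving 64 workers with 729 IDs per second each (64 × 729 = 46,656), at the cost of slightly less obvious arithmetic.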

Also Considered

Generate in a SQL Server stored procedure

  • We prefer not to add more workload to SQL Server.
  • IDs are already generated in the app, so keeping generation there is less invasive.

Perform a lookup to avoid collisions

  • Requires careful timing around migrations.
  • Affects archiving efforts and would require storing ranges of existing IDs that could conflict with newly generated IDs.
  • The added cost doesn't seem worth it given the low collision risk that remains.