ADR 13: Time-based alphanumeric ID generation

Status

Accepted

Context

Current State

Payment IDs include a 9-character alphanumeric portion. The orchestration currently generates this portion randomly. A non-random pattern would be preferable, although the chance of collisions is currently very low.

The Request

The product ask is to use only upper-case alphanumeric characters for the 9-character portion.

Problems

Moving to upper-case only takes us from Base62 (62^9 ≈ 1.35 × 10^16 possible values) to Base36 (36^9 ≈ 1.02 × 10^14 possible values). Collisions will still be rare, but could occur 1-2 times per month.
The random ID pattern gives no context about the ID, such as sequence or source.
A source system will continue to create and originate payments for some time, so we cannot control the upper or lower bounds of the IDs in use.
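The effect of the smaller keyspace can be estimated with a birthday-style approximation. The sketch below is illustrative only: it assumes IDs are drawn uniformly at random, uses the 6 million IDs per month cited later in this ADR, and uses a hypothetical count of already-issued IDs (`EXISTING_IDS`), which is not a figure from this document.

```python
def keyspace(base: int, length: int = 9) -> int:
    """Number of distinct IDs of the given length in the given base."""
    return base ** length


def expected_collisions(new_ids: int, existing_ids: int, space: int) -> float:
    """Approximate expected collisions when adding new_ids random IDs.

    Counts collisions against already-issued IDs plus collisions among
    the new IDs themselves (birthday approximation).
    """
    against_existing = new_ids * existing_ids / space
    among_new = new_ids * (new_ids - 1) / (2 * space)
    return against_existing + among_new


MONTHLY_IDS = 6_000_000     # monthly volume cited in this ADR
EXISTING_IDS = 20_000_000   # hypothetical: number of IDs already issued

for base in (62, 36):
    space = keyspace(base)
    estimate = expected_collisions(MONTHLY_IDS, EXISTING_IDS, space)
    print(f"base {base}: {space:,} values, ~{estimate:.3f} collisions/month")
```

Under these assumptions the Base36 estimate is on the order of one collision per month and grows as more IDs accumulate, while the Base62 estimate is about two orders of magnitude smaller.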

Objectives

Add a partition or sequence to the IDs. This could give us context about the time and/or the originating node.
Reduce or eliminate the chance of collisions.
Generate the id atomically.

Decision

The orchestration nodes will generate a 9-character Base36 ID structured as follows:

Timestamp (6 characters):
- Encoded in Base36, representing the number of seconds since the custom epoch (2024).
- Six Base36 characters hold 36^6 = 2,176,782,336 seconds, or roughly 69 years, which covers the 60-year operational period at one-second granularity.
Worker ID (1 character):
- A single Base36 character uniquely identifies each worker node. This supports up to 36 nodes.
Counter (2 characters):
- Each worker node will have a 2-character Base36 counter to differentiate IDs generated within the same second.
- This counter supports 1,296 unique IDs per second for each worker node.

ID Format

[Timestamp][WorkerID][Counter]
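A minimal sketch of a generator for this format follows. It is not the code sample referenced by this ADR. The names `IdGenerator`, `to_base36`, and the injectable `clock` parameter are hypothetical, and the epoch is assumed to be midnight UTC on January 1, 2024. The sketch also assumes two behaviours the ADR does not specify: when the counter is exhausted within a second the generator waits for the next second, and if the wall clock moves backwards the generator keeps using its last seen second.

```python
import threading
import time
from datetime import datetime, timezone

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Assumption: the custom epoch is 2024-01-01T00:00:00Z.
EPOCH = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp())


def to_base36(value: int, width: int) -> str:
    """Encode a non-negative integer as fixed-width, zero-padded Base36."""
    if value < 0 or value >= 36 ** width:
        raise ValueError(f"{value} does not fit in {width} Base36 characters")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class IdGenerator:
    """Generates [Timestamp:6][WorkerID:1][Counter:2] IDs for one worker."""

    def __init__(self, worker_id: int, clock=time.time):
        if not 0 <= worker_id < 36:
            raise ValueError("worker_id must be in 0..35")
        self._worker = to_base36(worker_id, 1)
        self._clock = clock
        self._lock = threading.Lock()  # makes next_id() atomic per worker
        self._last_second = -1
        self._counter = 0

    def next_id(self) -> str:
        with self._lock:
            while True:
                # Never move backwards, even if the wall clock does.
                second = max(int(self._clock()) - EPOCH, self._last_second)
                if second != self._last_second:
                    # New second: restart the counter at zero.
                    self._last_second, self._counter = second, 0
                    break
                if self._counter < 36 ** 2 - 1:
                    self._counter += 1
                    break
                # All 1,296 counter values used: wait for the next second.
                time.sleep(0.001)
            return (to_base36(self._last_second, 6)
                    + self._worker
                    + to_base36(self._counter, 2))
```

A worker would create one generator at startup, for example `IdGenerator(worker_id=3)`, and call `next_id()` for each new payment. Holding the lock while waiting is deliberate in this sketch, since no other thread on the same worker can obtain an ID until the second rolls over.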

Example

2KNQ8300Z
2KNQ83010
2KNQ83011
2KNQ83012

2KNQ83: The seconds since the custom epoch, encoded in Base36 (6 characters).
0: The worker ID (in this case, worker node 0), as a single Base36 character.
0Z, 10, 11, 12: The counter value for each generated ID (in Base36, using 2 characters).

A code sample provides a working example.
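Separately from that sample, the sketch below shows how an ID could be split back into its fields using the fixed widths from the ID Format ([Timestamp: 6][WorkerID: 1][Counter: 2]). The names `ParsedId` and `parse_id` are hypothetical and not part of this ADR.

```python
from typing import NamedTuple

ALPHABET = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")


class ParsedId(NamedTuple):
    seconds_since_epoch: int  # first 6 characters
    worker_id: int            # 7th character
    counter: int              # last 2 characters


def parse_id(payment_id: str) -> ParsedId:
    """Split a 9-character upper-case Base36 ID into its three fields."""
    if len(payment_id) != 9 or not set(payment_id) <= ALPHABET:
        raise ValueError(f"not a 9-character upper-case Base36 ID: {payment_id!r}")
    return ParsedId(
        seconds_since_epoch=int(payment_id[:6], 36),
        worker_id=int(payment_id[6], 36),
        counter=int(payment_id[7:], 36),
    )


for sample in ("2KNQ8300Z", "2KNQ83010", "2KNQ83011", "2KNQ83012"):
    print(sample, parse_id(sample))
```

Applied to the four sample IDs, the timestamp and worker fields are identical and only the counter field advances.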

Consequences

Positive

The requirement to move to upper-case alphanumeric characters will no longer affect the chance of collision.
Among newly generated IDs, the design ensures no collisions across up to 36 nodes, each generating up to 1,296 IDs per second.
Time-based partitioning provides sequential (time-ordered) but non-contiguous IDs that are unique across the 60-year operational period.
Atomic generation at the node level ensures thread safety within each worker.
The ID will indicate the worker that created the record.
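The capacity figures used in this ADR can be checked with a few lines of arithmetic. The sketch below assumes a 365.25-day year and a 30-day month for the conversions. The 6 million IDs per month is the volume cited in this ADR, and the variable names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
SECONDS_PER_MONTH = 30 * 24 * 3600

timestamp_values = 36 ** 6                    # 6 Base36 characters of seconds
years_covered = timestamp_values / SECONDS_PER_YEAR

ids_per_second_per_node = 36 ** 2             # 2 Base36 counter characters
max_nodes = 36                                # 1 Base36 worker character
ids_per_second_cluster = max_nodes * ids_per_second_per_node

monthly_volume = 6_000_000                    # volume cited in this ADR
average_ids_per_second = monthly_volume / SECONDS_PER_MONTH

print(f"timestamp range: {timestamp_values:,} s (~{years_covered:.1f} years)")
print(f"capacity: {ids_per_second_per_node:,}/s per node, "
      f"{ids_per_second_cluster:,}/s across {max_nodes} nodes")
print(f"average demand: ~{average_ids_per_second:.2f} IDs/s")
```

Average demand is a few IDs per second, far below the 1,296 per second that a single node can issue. Bursts are not modelled here.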

Negative

The 9-character limit constrains the number of worker nodes and IDs generated per second per worker. However, this is sufficient given the current system's requirements of generating 6 million IDs per month.
The complexity is higher than that of random generation.
A small chance of collision with the existing, randomly generated IDs will remain.
- A collision scenario should be tested to ensure minimal impact.
The worker limit of 36 is reasonable even for future load. If we need to exceed 36 workers, we have these options:
- Route id creation to only 36 specific workers.
  - We only generate ids with outbound, for example.
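- Treat the 3 characters for worker + counter as a single packed integer. Three Base36 characters hold 36^3 = 46,656 values, fewer than the 65,536 of a full 16-bit integer, so a 6-bit/10-bit split does not fit entirely. Either side can be kept at its full size:
  - Allocate 10 bits for the Counter (1,024 possible IDs per second per worker), which leaves room for 45 workers.
  - Allocate 6 bits for the Worker ID (64 possible workers), which leaves room for 729 IDs per second per worker.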
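The packed worker + counter option can be sketched as follows. This is one possible encoding, not a decided design: `pack_suffix` and `unpack_suffix` are hypothetical names, and `counter_size` is an assumed parameter that sets how many counter values each worker gets within the 36^3 = 46,656 values available in three characters.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SUFFIX_SPACE = 36 ** 3  # values that fit in 3 Base36 characters


def pack_suffix(worker_id: int, counter: int, counter_size: int = 1024) -> str:
    """Pack worker and counter into 3 Base36 characters."""
    max_workers = SUFFIX_SPACE // counter_size  # 45 for 1024, 64 for 729
    if not 0 <= worker_id < max_workers:
        raise ValueError(f"worker_id must be in 0..{max_workers - 1}")
    if not 0 <= counter < counter_size:
        raise ValueError(f"counter must be in 0..{counter_size - 1}")
    value = worker_id * counter_size + counter
    chars = []
    for _ in range(3):
        value, remainder = divmod(value, 36)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def unpack_suffix(suffix: str, counter_size: int = 1024) -> tuple[int, int]:
    """Return (worker_id, counter) from a 3-character packed suffix."""
    return divmod(int(suffix, 36), counter_size)


print(pack_suffix(44, 1023))                  # last worker, 1,024-value counter
print(pack_suffix(63, 728, counter_size=729)) # last worker, 64-worker layout
```

One trade-off of packing is that the worker is no longer readable as a single character of the ID. It has to be recovered with `unpack_suffix`, using the same `counter_size` that was used to generate the ID.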

Also Considered

Generate in a SQL Server stored procedure

We prefer not to add more workload to SQL Server.
We already generate IDs in the app, so keeping generation there is less invasive.

Perform a lookup to avoid collisions

Requires timing considerations for migrations.
Affects archive efforts and would require storing the ID ranges that conflict with potential new IDs.
Does not seem worth the effort given the low collision risk that will remain.