ADR 13: Time-based alphanumeric id generation¶
Status¶
Accepted
Context¶
Current State¶
Payment ids contain a 9-character alphanumeric portion. The orchestration currently generates this portion randomly. The chance of collisions is very low, but a non-random pattern is preferable.
The Request¶
The product ask is to use only upper-case alphanumeric characters for the 9-character portion.
Problems¶
- Moving to upper case only takes us from Base62 to Base36. Collisions will still be rare, but could occur 1-2 times per month.
- The random id pattern gives no context about the id, such as sequence or source.
- A source system is still creating and originating payments for a time, so we can't control the upper or lower bounds.
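The scale of the first problem can be estimated with a birthday-style approximation. The sketch below is illustrative only: the function name and the number of ids already in circulation are assumptions, while the monthly volume of 6 million is the figure quoted under Consequences.

```python
def expected_collisions(existing_ids: int, new_ids: int,
                        alphabet_size: int, length: int = 9) -> float:
    """Approximate expected collisions when random ids are added to an existing set."""
    space = alphabet_size ** length
    # New ids that clash with ids already issued.
    against_existing = new_ids * existing_ids / space
    # New ids that clash with each other (birthday approximation).
    among_new = new_ids * (new_ids - 1) / (2 * space)
    return against_existing + among_new


MONTHLY_IDS = 6_000_000      # volume quoted under Consequences
EXISTING_IDS = 20_000_000    # assumption, for illustration only

for base in (62, 36):
    estimate = expected_collisions(EXISTING_IDS, MONTHLY_IDS, base)
    print(f"Base{base}: about {estimate:.3f} expected collisions per month")
```

The estimate grows with the number of ids already issued, so the real figure depends on how many payments exist when the change ships.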
Objectives¶
- Add a partition or sequence to the ids. This could give us context about the time and/or the originating node.
- Reduce or eliminate the chance of collisions.
- Generate the id atomically.
Decision¶
The orchestration nodes will generate a 9-character Base36 ID structured as follows:
- Timestamp (6 characters):
  - Encoded in Base36, representing the number of seconds since the custom epoch (2024).
  - Six Base36 characters hold 36^6 = 2,176,782,336 seconds, roughly 69 years. This covers the planned 60 years of operation at one-second granularity.
- Worker ID (1 character):
  - A single Base36 character uniquely identifies each worker node. This supports up to 36 nodes.
- Counter (2 characters):
  - Each worker node will have a 2-character Base36 counter to differentiate IDs generated within the same second.
  - This counter supports 1,296 (36^2) unique IDs per second for each worker node.
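The capacity figures for this layout follow directly from the field widths. The short calculation below is a sanity check rather than part of the design; the variable names are illustrative, and the 30-day month used for the average rate is an assumption.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

timestamp_space = 36 ** 6              # distinct seconds in 6 Base36 characters
years_covered = timestamp_space / SECONDS_PER_YEAR

max_workers = 36 ** 1                  # one Base36 character
ids_per_second_per_worker = 36 ** 2    # two Base36 characters
ids_per_second_total = max_workers * ids_per_second_per_worker

# Average demand implied by 6 million ids per month (figure from Consequences),
# assuming a 30-day month.
average_ids_per_second = 6_000_000 / (30 * 24 * 3600)

print(f"timestamp range: {timestamp_space:,} s (~{years_covered:.1f} years)")
print(f"peak capacity:   {ids_per_second_total:,} ids/s across {max_workers} workers")
print(f"average demand:  {average_ids_per_second:.2f} ids/s")
```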
ID Format
[Timestamp][WorkerID][Counter]
Example
2KNQ8300Z
2KNQ83010
2KNQ83011
2KNQ83012
- 2KNQ83: Represents the seconds since the custom epoch in Base36 (6 characters).
- 0: The worker ID (in this case, worker node 0).
- 0Z, 10, 11, 12: The counter value for each generated ID (in Base36, using 2 characters).
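To check how an ID splits under the 6/1/2 character layout defined in the Decision, it can be decoded with Python's built-in Base36 parsing. `parse_payment_id` is a hypothetical helper name used only for this illustration.

```python
def parse_payment_id(id_portion: str) -> tuple[int, int, int]:
    """Split a 9-character ID into (seconds since epoch, worker id, counter)."""
    if len(id_portion) != 9:
        raise ValueError("expected exactly 9 characters")
    seconds = int(id_portion[:6], 36)    # [Timestamp], 6 characters
    worker = int(id_portion[6], 36)      # [WorkerID], 1 character
    counter = int(id_portion[7:], 36)    # [Counter], 2 characters
    return seconds, worker, counter


for sample in ("2KNQ8300Z", "2KNQ83010", "2KNQ83011", "2KNQ83012"):
    print(sample, parse_payment_id(sample))
```

Decoded this way, all four sample IDs share the same timestamp and worker, and their counters run 35, 36, 37, 38.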
A code sample provides a working example.
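Separately from that sample, the scheme can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the orchestration's implementation: the class and function names are hypothetical, the epoch is assumed to be 2024-01-01 00:00:00 UTC, and the behavior when a worker exhausts its counter within one second (borrowing the next second) is a choice made for this sketch that the ADR does not specify.

```python
import threading
import time
from datetime import datetime, timezone

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Assumption: the custom epoch is midnight UTC on 1 January 2024.
EPOCH = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp())


def to_base36(value: int, width: int) -> str:
    """Encode a non-negative integer as fixed-width, upper-case Base36."""
    if value < 0 or value >= 36 ** width:
        raise ValueError(f"{value} does not fit in {width} Base36 characters")
    chars = []
    for _ in range(width):
        value, remainder = divmod(value, 36)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


class PaymentIdGenerator:
    """Hypothetical per-worker generator: [6 timestamp][1 worker][2 counter]."""

    COUNTER_LIMIT = 36 ** 2  # 1,296 ids per second per worker

    def __init__(self, worker_id: int, clock=time.time):
        if not 0 <= worker_id < 36:
            raise ValueError("worker_id must be in 0..35")
        self._worker = to_base36(worker_id, 1)
        self._clock = clock
        self._lock = threading.Lock()
        self._last_second = -1
        self._counter = 0

    def next_id(self) -> str:
        # The lock makes the read-increment-format sequence atomic per worker.
        with self._lock:
            second = int(self._clock()) - EPOCH
            # Never step backwards if the wall clock is adjusted.
            second = max(second, self._last_second)
            if second == self._last_second:
                self._counter += 1
                if self._counter >= self.COUNTER_LIMIT:
                    # Counter exhausted: borrow the next second (sketch-only choice).
                    second += 1
                    self._counter = 0
            else:
                self._counter = 0
            self._last_second = second
            return to_base36(second, 6) + self._worker + to_base36(self._counter, 2)


generator = PaymentIdGenerator(worker_id=3)
print([generator.next_id() for _ in range(3)])
```

State is kept in memory only, so a worker that restarts within the same second would reuse counter values; a real implementation would need to decide how to handle that case.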
Consequences¶
Positive¶
- The requirement to move to upper-alphanumeric will no longer affect the chance of collision.
- The design ensures no collisions across 36 nodes generating up to 1,296 IDs per second per node.
- Time-based partitioning provides sequential, non-adjacent IDs that are unique across the 60-year operational period.
- Atomic generation at the node level ensures thread safety within each worker.
- The ID will indicate the worker that created the record.
Negative¶
- The 9-character limit constrains the number of worker nodes and IDs generated per second per worker. However, this is sufficient given the current system's requirements of generating 6 million IDs per month.
- The complexity is higher than that of random generation.
- We will still have a small chance of collision due to the existing ids.
- A collision scenario should be tested to ensure minimal impact.
- The worker limit of 36 is reasonable even for future load. If we need to exceed 36 workers, we have these options:
  - Route id creation to only 36 specific workers.
    - For example, we generate ids only with outbound.
  - Treat the 3 characters for worker + counter as a 16-bit integer as follows:
    - Allocate 6 bits for the Worker ID (64 possible workers).
    - Allocate 10 bits for the Counter (1,024 possible IDs per second per worker).
    - Three Base36 characters hold only 36^3 = 46,656 values, fewer than the 65,536 of a full 16-bit integer, so the packed value must stay below 46,656. With a 10-bit counter, that leaves 45 workers (0 to 44) with the full counter range.
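The bit-packing option can be sketched as follows. This is an illustration under assumptions, not a decided design: the function names are hypothetical and the worker ID is assumed to occupy the high bits. The range check matters because three Base36 characters hold 36^3 = 46,656 values, which is less than the 65,536 values of a 16-bit integer.

```python
WORKER_BITS = 6
COUNTER_BITS = 10
SUFFIX_SPACE = 36 ** 3  # 46,656 values fit in three Base36 characters
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def pack_suffix(worker_id: int, counter: int) -> str:
    """Pack worker id (high bits) and counter (low bits) into 3 Base36 characters."""
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id does not fit in 6 bits")
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter does not fit in 10 bits")
    value = (worker_id << COUNTER_BITS) | counter
    if value >= SUFFIX_SPACE:
        raise ValueError("packed value does not fit in 3 Base36 characters")
    chars = []
    for _ in range(3):
        value, remainder = divmod(value, 36)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def unpack_suffix(suffix: str) -> tuple[int, int]:
    """Recover (worker_id, counter) from a 3-character suffix."""
    value = int(suffix, 36)
    return value >> COUNTER_BITS, value & ((1 << COUNTER_BITS) - 1)


print(pack_suffix(44, 1023), unpack_suffix(pack_suffix(44, 1023)))
```

Under this layout, workers 0 to 44 get the full 1,024-value counter; higher worker IDs overflow the three characters for some or all counter values.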
Also Considered¶
Generate in a SQL server proc¶
- Prefer not to add more workload to SQL
- We already generate ids in the app, so keeping generation there is less invasive.
Perform a lookup to avoid collisions¶
- Requires timing consideration for migrations
- Affects archive efforts and would require storage of ranges that conflict with potential ids.
- Doesn't seem worth the low collision risk that we will have.