Range

A business needs to bucket sets of consecutive, ordinal, data for aggregation purposes like age group, salary band, period, life cycle phase, and so on. This is different from grouping non-consecutive, nominal, data into buckets, such as hierarchies.

You need to store a complete definition of all buckets and the values they contain, while allowing for minimum and maximum values and gaps between buckets. Bucket boundaries can change regularly, while the precision and scale of boundaries changes at a slower rate.

Create a range with an inclusive lower boundary and an exclusive upper boundary, also known as [closed-open). Two boundaries allow you to handle extremities and gaps between ranges. System maximums for the extremities can cause overflow errors when used in calculations. Negative and positive infinities need special values; check if your chosen infinity values need three-valued logic; i.e. NULLs.

Modify each boundary name to show the inclusive or exclusive state, e.g. since/until for periods and from/up to for numbers. Boundaries hold real data values or the data's sorting values: e.g. January through December are ordinal if you use numbers 1 through 12 to sort them. Identify the correct data type, and its precision and scale, or grain.

Enforce uniqueness if overlap between ranges cannot be tolerated; this is not easy.

The exclusive upper boundary removes +/-1 calculations, except for ranges the size of a single data point; the grain. This allows data types to change to a finer grain — DECIMAL(6,2) becomes DECIMAL(8,4) — without having to update code or values, because the niggling +/-1 is obsolete[1].

The Range pattern cannot use the ANSI-SQL BETWEEN operator, which would be used to compare a single data point against a [closed-closed] range. You can write out the greater-than-or-equal-to and less-than logic and most RDBMS allow overloaded, deterministic functions as an alternative. Three-valued logic, if needed, would invalidate BETWEEN as well.

As a final consequence, joins between consecutive ranges are straight equi-joins without calculations, so both sides of the join can benefit from indexes. And you can query a continuous chain of ranges with a recursive CTE.[2]

Principle Trade-offs
Extend your understanding>Know your limits
Stay relevant<Build enduring
Do right=Share

Footnotes    (↑ returns to text)
  1. If you like a paper copy better…
  2. See also Common Table Expressions and Hierarchical Query

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="">