Numeric data overflow issue in Redshift profiler ingestion. #16697

Open
mvpkarthik1 opened this issue Jun 18, 2024 · 0 comments
Labels: bug (Something isn't working), profiler


The after_etl_ts column holds the value 1694590129000000, and the column type is INT8 (BIGINT) in Redshift.

During profiler ingestion, OpenMetadata runs the query below.

SELECT
AVG(after_etl_ts) AS mean,
COUNT(after_etl_ts) AS valuesCount,
COUNT(DISTINCT after_etl_ts) AS distinctCount,
MIN(after_etl_ts) AS min,
MAX(after_etl_ts) AS max,
SUM(CASE WHEN after_etl_ts IS NULL THEN 1 ELSE 0 END) AS nullCount,
STDDEV_POP(after_etl_ts) AS stddev,
SUM(CAST(after_etl_ts AS BIGINT)) AS sum
FROM
bronze.logbased5_tpch1_customer_copy
LIMIT 1;

The column contains 1694590129000000 in all 277087 rows.

-------------------------------------------------- PROBLEM --------------------------------------------------

sqlalchemy.exc.InternalError: (psycopg2.errors.InternalError_) Numeric data overflow (addition)

DETAIL:


error: Numeric data overflow (addition)

code: 1058

context:

query: 18299531

location: numeric_bound.cpp:183

process: query0_103_18299531 [pid=27173]


[SQL: /* {"app": "OpenMetadata", "version": "1.2.5.0"} */

The after_etl_ts column is of type BIGINT (INT8 in Redshift), and the issue is that the sum of a large number of large values exceeds the BIGINT range. With 1694590129000000 in each of the 277,087 rows, the sum is:
Sum = 1694590129000000 × 277087 = 469,548,895,074,223,000,000

This exceeds the maximum BIGINT value in Redshift, 9,223,372,036,854,775,807, by a factor of about 50. Note that the CAST(after_etl_ts AS BIGINT) in the profiler's SUM does not help, because the column is already BIGINT.
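The overflow can be sanity-checked with plain integer arithmetic, using the figures from this issue (Python integers are arbitrary precision, so they do not overflow the way Redshift's BIGINT accumulator does):

```python
rows = 277_087
value = 1_694_590_129_000_000          # after_etl_ts, identical in every row
bigint_max = 9_223_372_036_854_775_807  # 2**63 - 1, Redshift BIGINT maximum

total = value * rows
print(f"sum         = {total}")
print(f"BIGINT max  = {bigint_max}")
print(f"overflow by ≈ {total / bigint_max:.1f}x")

# The true sum cannot be represented as a 64-bit signed integer.
assert total > bigint_max
```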

-------------------------------------------------- SOLUTION --------------------------------------------------
To avoid the Numeric data overflow (addition) error with such large sums, cast the column to DECIMAL for the summation and related aggregations. Here’s the revised query:

SELECT
AVG(CAST(after_etl_ts AS DECIMAL(38, 0))) AS mean,
COUNT(after_etl_ts) AS valuesCount,
COUNT(DISTINCT after_etl_ts) AS distinctCount,
MIN(after_etl_ts) AS min,
MAX(after_etl_ts) AS max,
SUM(CASE WHEN after_etl_ts IS NULL THEN 1 ELSE 0 END) AS nullCount,
STDDEV_POP(CAST(after_etl_ts AS DECIMAL(38, 0))) AS stddev,
SUM(CAST(after_etl_ts AS DECIMAL(38, 0))) AS sum
FROM
bronze.logbased5_tpch1_customer_copy
LIMIT 1;

The revised query runs successfully when executed directly against the database.

Casting to DECIMAL(38, 0):
This cast is needed for AVG, STDDEV_POP, and SUM so the aggregations can handle large intermediate values without overflow; DECIMAL(38, 0) supports integers with up to 38 digits of precision, Redshift's maximum.
SUM with DECIMAL(38, 0):
This ensures the sum accumulator can hold very large totals; the roughly 4.7 × 10^20 total here fits easily within 38 digits.
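The same failure mode can be reproduced locally with SQLite, whose SUM() also accumulates integers in 64 bits and raises an "integer overflow" error when the total exceeds that range. Casting to a wider type before summing (REAL here, standing in for Redshift's DECIMAL(38, 0)) lets the aggregation complete. This is a sketch of the principle, not OpenMetadata profiler code:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (after_etl_ts INTEGER)")
# Three values near the signed 64-bit limit: adding any two already overflows.
con.executemany("INSERT INTO t VALUES (?)", [(9_000_000_000_000_000_000,)] * 3)

# Integer SUM overflows, analogous to Redshift's BIGINT accumulator.
try:
    con.execute("SELECT SUM(after_etl_ts) FROM t").fetchone()
    overflowed = False
except sqlite3.OperationalError:  # "integer overflow"
    overflowed = True
print("integer SUM overflowed:", overflowed)

# Casting to a wider type first lets the aggregation succeed.
(total,) = con.execute(
    "SELECT SUM(CAST(after_etl_ts AS REAL)) FROM t"
).fetchone()
print("cast SUM:", total)
```

The trade-off is the same as in the Redshift fix: the wider accumulator avoids the overflow at a small cost in aggregation performance (and, for REAL, exactness), which is acceptable for profiler statistics.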

@ulixius9 ulixius9 added bug Something isn't working profiler labels Jun 19, 2024