Table Aggregations on a whole Table / Query limit of 1000

Hi all,

I’d like to run aggregations on a complete table. Unfortunately, they are only possible on queries, and queries are still limited to 1000 records.
This makes min, max, average, sum, etc. useless whenever the table can have more than 1000 entries.

Please allow aggregations on a complete table.

I also find limiting queries to 1000 records questionable in general.

A while ago I was asked why I prefer our own database via the SQL Connector over Tulip Tables.
This is one major reason!

Thanks

Thorsten

Hey @thorsten.langner -

Great suggestion! This is one that we are constantly discussing internally. This arbitrary limit is in place to try to safeguard users from themselves. Queries and aggregations can take many seconds when looking at a table with tens of millions of rows, and we wanted to remove the need for users to understand the intricacies of data science work in order to sort or filter their tables in a user-facing application, and to ensure that the performance of the query they make is consistent regardless of its size.

This is something we occasionally run into as well, and it is something we are looking to unblock, but it's something we need to do with caution so that users unfamiliar with what is going on in the background have visibility into the expected performance of their applications.

Pete

Hey @Pete_Hartnett,

Thanks for your answer.
On the one hand, I completely agree with your concerns; on the other hand, I am really worried about receiving incorrect data without any warning.

At the moment it is very likely that you get return values without knowing whether they refer to all entries or whether there are more than 1000. In that case the data is not only useless, but potentially dangerous.
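
To make the risk concrete, here is a rough sketch of the kind of guard we currently have to build ourselves. It is only an illustration assuming a record-fetch endpoint of the Tables API; the base URL, table ID, endpoint path, parameter names, and credentials below are placeholders, not confirmed API details.

```python
import requests

# Illustrative sketch only: the base URL, table ID, endpoint path, and
# parameter names are assumptions/placeholders, not a confirmed Tulip API.
BASE_URL = "https://your-instance.tulip.co/api/v3"
TABLE_ID = "your-table-id"
AUTH = ("api-key", "api-secret")  # placeholder credentials

QUERY_LIMIT = 1000  # the limit discussed in this thread


def fetch_query_page(limit=QUERY_LIMIT, offset=0):
    """Fetch one page of records (assumed endpoint and parameter names)."""
    resp = requests.get(
        f"{BASE_URL}/tables/{TABLE_ID}/records",
        params={"limit": limit, "offset": offset},
        auth=AUTH,
    )
    resp.raise_for_status()
    return resp.json()


records = fetch_query_page()
if len(records) >= QUERY_LIMIT:
    # A full page means there may be rows the aggregation never saw.
    print("WARNING: query hit the 1000-record limit; aggregates may be incomplete.")
```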


Totally makes sense. This feedback is incredibly valuable. I will log it and make sure it is tracked as we continue internal discussions on how to approach this limitation.

Pete


Hey @Pete_Hartnett

At present, are queries still limited to 1000 records? :fearful:

The limit is still 1000 records at this time. This remains a very active topic internally at Tulip and here in the forum, but there’s no date for when we would raise the limit or deploy new features or settings related to the 1000-record limit.

Hi, any update on this topic? We have some internal issues because of this query limit, and now we need to change some things and build workarounds to avoid it. Hopefully this query limit gets increased soon.


@RussWaddell Hey following up here as well. Thanks!

Hi @emendoza can you explain more about your use case? How many records are you looking to query?

Hi Pete, based on our current queries, up to 2k records would be great for now; however, having the capability to query up to 5k would be awesome!

Thanks

Even 5K is a relatively small number for high-volume production or (soon) collecting lots of high-speed machine data with automations. The focus should probably be on limiting the number of records returned from a query, and optimizing aggregates, rather than overly limiting the number of raw table records that might be involved in calculating those aggregates. Ideally, couldn’t most of that work be delegated to the data store itself?

Exactly.

The query limit must be the same as the limit on the number of records in a table. As long as the query limit is smaller, I run the risk of getting wrong results.

This is always a huge issue, because the user will not notice that something is wrong.

@pete Why is the use case a question? It is an issue for any use case.
Every time you do any aggregation, you get wrong results as soon as the table has more than 1000 entries.
This is dangerous for any serious productive usage.
Even sorting and querying only one record (as a workaround for min and max) is not reliable, because null values will be returned as the “max” (to me that’s misleading).

So as long as a table could possibly have more than 1000 entries at some point, you cannot use aggregations, because at some point you will get wrong results. And we have to be able to rely on the results.
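
As an illustration of the workaround this forces on us: a minimal sketch that pages through all records and computes min/max client-side, skipping null values. It reuses the assumed fetch_query_page helper from the earlier sketch, and the field name "value" is just a placeholder for whatever column you aggregate on.

```python
def client_side_min_max(field="value", page_size=100):
    """Compute min/max over the whole table by paginating through all records.

    Assumes the fetch_query_page(limit, offset) helper sketched above;
    "value" is a placeholder for the column being aggregated.
    """
    lo, hi, offset = None, None, 0
    while True:
        page = fetch_query_page(limit=page_size, offset=offset)
        if not page:
            break
        for record in page:
            v = record.get(field)
            if v is None:  # skip nulls instead of reporting them as the "max"
                continue
            lo = v if lo is None else min(lo, v)
            hi = v if hi is None else max(hi, v)
        offset += page_size
    return lo, hi
```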


Completely agree. Getting inaccurate data is the worst possible outcome.


@thorsten.langner In this case, I ask about the use case to better understand the performance requirements. For example, does the result need to be immediate (less than 1 second) or is a longer response time acceptable? Our customers have a wide range of use cases, so the extra context is always helpful. We agree that, in all cases, the data must be accurate and are reviewing this now.


Sorry for the late reply, I just didn’t see your post:

Here are some use cases:

  • Show all inventory items in an interactive table that are not marked as archived (status)
    Filtering in the app should be additional, so it’s possible to add those filters with an “or”
  • Find an order with the max or min value
  • Find the average cost of a quality report in a year
  • Show all unique materials of a storage list (we have a storage list with ~1,500 unique materials and ~10,000 entries in total)

In some cases the performance is more relevant, in others less.
But:

  • Performance is one main criterion for user acceptance
  • SQL databases deliver those results in milliseconds, even with 100,000 entries. This is one additional aspect to consider when deciding between Tulip Tables and an SQL database. Aggregations were released to increase the possibilities of Tables and reduce the need for database connections.

I wonder: what is the benefit of slower results, or what is the cost of good performance?
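
To illustrate the comparison: a minimal, self-contained sketch using SQLite from Python (table and column names are made up) that computes the same kind of aggregates over 100,000 rows, typically in a few milliseconds.

```python
import random
import sqlite3
import time

# Made-up table and column names, purely to illustrate the performance point.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, cost REAL)")
con.executemany(
    "INSERT INTO orders (cost) VALUES (?)",
    [(random.uniform(1, 500),) for _ in range(100_000)],
)

start = time.perf_counter()
count, lo, hi, avg = con.execute(
    "SELECT COUNT(*), MIN(cost), MAX(cost), AVG(cost) FROM orders"
).fetchone()
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"count={count}, min={lo:.2f}, max={hi:.2f}, avg={avg:.2f} "
      f"({elapsed_ms:.1f} ms)")
```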

Hi @pete
time goes by, but the limitation is still there.
So, if you (Tulip) can’t give us a solution, give us a workaround.

In my case (using the table aggregation (TA) for filtering tasks against stored finished tasks), I reached 1000 unique records of finished tasks within the last 2 months, and then the interactive table filter based on the TA no longer works; performance isn’t very important here.
So for me a workaround could be to start the aggregation from the newest entry backward and not from the first entry in the table.

So please give us more frequent updates on the state of this process (the last update was 3 months ago).

Regards Chris

@ChrisF

If you change the sort of the query, the limit should apply to the first 1000 rows that the query pulls. This should resolve your issue.
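
For anyone who wants the equivalent outside the app editor, a rough sketch of the same idea: sort descending by creation date so the 1000-record window covers the newest entries. It reuses the assumed BASE_URL, TABLE_ID, and AUTH placeholders from the earlier sketch; the sortBy/sortDir parameter names and the _createdAt field are likewise assumptions for illustration, not confirmed API details.

```python
def fetch_newest_records(limit=1000):
    """Fetch the newest records first, so the 1000-record window covers
    the most recent entries rather than the oldest ones.

    The sortBy/sortDir parameters and the _createdAt field name are
    assumptions for illustration, not confirmed API details.
    """
    resp = requests.get(
        f"{BASE_URL}/tables/{TABLE_ID}/records",
        params={
            "limit": limit,
            "sortBy": "_createdAt",  # assumed system field for creation time
            "sortDir": "desc",       # newest first
        },
        auth=AUTH,
    )
    resp.raise_for_status()
    return resp.json()
```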

We just ran into another issue where we needed the number of records.
Unfortunately, the aggregation always returns 1000 :upside_down_face:

So @pete … is there any update on this issue?

This is a critical feature gap. It leads to incorrect aggregation results, which is a no-go. Of course we can think about workarounds (e.g. downloading all table data and performing the aggregation outside of Tulip), but then, why are we using Tulip for basic things?


I must have missed this, since I have been using lots of aggregations and never thought to check whether they were pulling in all records in the table vs. just the top 1000. Most should be fine based on the sorting and the number of records in the tables, but a few may have been displaying incorrect info… I’ll need to investigate.

I understand the performance concerns; however, developers will quickly determine where aggregations can and can’t be used based on these performance issues while testing apps.

Is there a risk to our instances as a whole when one station is utilizing an aggregation from a massive table, or does it just cause the one station to slow down/freeze?