Custom Widget Race Conditions

Hello All!
Can someone please help me understand the race condition in this custom widget?

We use this to iterate over a list of id values that are returned by an HTTP connector request.
The event fired in the forEach callback is used to make a long-running POST for each id in the array.

Once in a while, we receive a redundant POST request in the receiving system for a single element in the array. Someone has suggested to me that a race condition in the widget is the cause.

To reiterate, once in a while we receive four POST requests for a GET response with three ids returned. The cause is apparently a race condition in this widget. Much thanks to the person who can help me understand as I do not at this time.
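
For reference, the pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual widget: `fireEvent` here is a local stand-in that only records calls, and the `ids` array and the `onEach` event name are assumed for the example.

```javascript
// Local stand-in for the host-provided fireEvent (assumption): records each call.
const fired = [];
function fireEvent(eventName, payload) {
  fired.push({ eventName, payload });
}

// Assumed shape of the connector response: a plain array of id values.
const ids = [101, 102, 103];

// One event per id, fired back to back with no waiting in between.
ids.forEach((id) => fireEvent("onEach", id));

console.log(fired.length); // 3
```

On its own, the loop calls its callback once per element, so three ids produce three `fireEvent` calls at the widget level.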

Have you checked how the widget gets initiated? The typical problem with this kind of use - which we have also run into on many occasions - is that there is currently no reliable way to explicitly call a function provided by the custom widget… because it has been designed primarily as an augmentation for custom UI rather than as a function provider.

Hey,

Something I have learned about those widgets is that they are not synchronized with Tulip.
This means that your for loop may finish in 4-20 ms and launch all of its events to Tulip while Tulip is still trying to figure out the first one.

My recommendation for those loops:
-Reset your trigger in Tulip, in the “After” Trigger.
-Add a cursor value (integer), initialize at 0.
-When Trigger is on, launch your onEach event with the cursor as index
-In Tulip, at the end of your trigger list for onEach, Increment your cursor by 1.
-Reset your Cursor in Tulip, in the “After” Trigger.

With this, you are forcing your custom widget to wait for Tulip before looping to the next iteration.
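
A minimal sketch of the cursor handshake described above, under stated assumptions: `makeHost` is a hypothetical model of the app side (it is not a Tulip API), and the cursor change is delivered to the widget through a plain callback. The point is only to show that the widget fires the next event after the host has finished the previous one.

```javascript
// Hypothetical model of the host app: it handles one "onEach" event,
// then increments the cursor as the last action of its trigger list.
function makeHost(onCursorChange) {
  const state = { cursor: 0, handled: [], inFlight: 0, maxInFlight: 0 };
  return {
    state,
    handleOnEach(id) {
      state.inFlight += 1;
      state.maxInFlight = Math.max(state.maxInFlight, state.inFlight);
      state.handled.push(id); // stand-in for the long-running POST
      state.inFlight -= 1;
      state.cursor += 1; // "increment your cursor by 1"
      onCursorChange(state.cursor); // the widget observes the new cursor
    },
  };
}

// Widget side: fire the event for ids[cursor] only when the cursor changes.
function runLoop(ids) {
  const host = makeHost((cursor) => step(cursor));
  function step(cursor) {
    if (cursor < ids.length) {
      host.handleOnEach(ids[cursor]);
    }
  }
  step(0); // the trigger turning on starts the loop at index 0
  return host.state;
}

const result = runLoop([101, 102, 103]);
console.log(result.handled, result.maxInFlight);
```

In this model at most one event is in flight at a time, because the widget never advances until the host has incremented the cursor.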

KR,
Kevin

Thank you Kevin,

What you describe is effectively the implementation of the “Looper v2” widget available in the library. We are planning on implementing this per support.

The issue that we are dealing with specifically is that support is fixated on the idea that a race condition in this widget is the cause of our issue.

I’ll reiterate the issue: we are receiving four requests (from the “onEach” event) for an array with three elements (one of the requests is exceeding our connector timeout and being retried). Support continues to stand on the idea that this is being caused by some race condition in the widget. Pretty sure it’s not.

Your response echoes the fact that the race condition, if it does exist, is caused by the inner workings of Tulip, and as far as plain old JavaScript goes the widget is sound.

It’s my hope that Tulip publishes better documentation regarding the behavior of trigger queues and connectors so that when we utilize a supported feature like custom widgets we can build robust, reliable solutions with them.

I’m leaving this thread open without a marked solution, as I would also like to hear from others regarding the existence (or not!) of a race condition in the widget provided.

Hi @eric.alwine,

Your need for additional documentation is heard - the team here can work towards providing additional insight into how Custom Widgets perform. More specifically, what types of content (in addition to behavior of trigger queues) would be helpful?

In the meantime, we will investigate the issue found in this custom widget.

Sincerely,

Jake

Hi Jake,

Thank you, I am available to answer any questions as understanding the race condition that is apparently in this is my primary goal here.

Documentation Content? Happy to.

  • Connector behavior on timeout: I was under the impression (through experience) that the connectors fail on timeout; our current issue suggests that the request is retried at least once. I cannot find any documentation regarding what to expect. This is the answer we seek in the support case we have open. “Race condition in custom widget, use looper v2” is the answer we got.

  • Trigger Queue Behavior: A higher-level description of how trigger processing behaves would be useful for advanced users building custom widgets. I am operating under the assumption that the triggers for an app are processed through some form of single-threaded worker queue.

    For example, in Kevin’s response he asserts that the forEach in our widget is rapidly registering all of the event trigger calls in the queue before the first has started processing. I agree that this is a likely scenario, but I disagree that it would cause the symptom we have (a race condition does not explain why we are infrequently receiving four requests on an array with three elements).

    I recognize that this is way more info than the average user needs, but it would be helpful to have a concrete reference for what to expect given that custom widgets are a tool that allows us to push the limits of the platform.

  • Custom Widget Dev Docs: I appreciate that Tulip is a low-code platform and the documentation by and large does a fantastic job of delivering the appropriate level of information. When it comes to custom widgets, it would be nice to have a more “normal” level of detail in the documentation, given that the users likely have development experience if they are building custom widgets. What are best practices for fireEvent? Best practices for registering a change listener with getValue? Security concerns? How does trigger processing interact with fireEvent? What is the lifecycle for variables? (The existing docs do mention that you need to anticipate a certain level of instability in the bindings.)
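
To make the trigger-queue scenario from the second bullet concrete, here is a minimal sketch. It assumes, as stated above, that triggers are processed through a single FIFO worker queue; `fireEvent` and `processNext` are local stand-ins for illustration and say nothing about how Tulip is actually implemented.

```javascript
// Assumption: the host processes triggers through a single FIFO worker queue.
const queue = [];
const log = [];

// Stand-in for fireEvent: enqueueing is instant, processing happens later.
function fireEvent(eventName, payload) {
  queue.push({ eventName, payload });
  log.push(`enqueue ${payload}`);
}

// Stand-in for the worker: handles exactly one queued event per call.
function processNext() {
  const event = queue.shift();
  log.push(`process ${event.payload}`);
}

// The synchronous loop finishes before the worker touches the first event.
[101, 102, 103].forEach((id) => fireEvent("onEach", id));
const depthBeforeFirstProcess = queue.length;

while (queue.length > 0) {
  processNext();
}

console.log(depthBeforeFirstProcess); // 3
console.log(log.join(" | "));
```

In this model all three events are queued before any is processed, yet each is still processed exactly once, so queue depth alone would not produce a fourth request.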

If there are conversations or resources out there that shed light on the subjects above that have eluded me I’ll thank you for a link. I’ve found information to be sparse on these subjects.

Hi @eric.alwine,

After conferring with the team, here’s our response:

“Since your CW doesn’t have any wait functionality, it just runs fireEvent and could create multiple requests in flight. The reason that the team has pushed towards Looper V2 is that it builds in logic to allow it to wait.”

Sincerely,

Jake

Hi Jake,

Respectfully, that still does not explain how fireEvent gets called four times on an array with three elements, which is what the team is implying is happening.

  1. I have significant evidence that the connector is timing out and retrying on one element.
  2. I am ultimately looking for information regarding connector timeout behavior that would corroborate #1.
  3. Literally every person we have talked to is fixated on the custom widget and will not provide an ANSWER ABOUT THE CONNECTOR BEHAVIOR.

Happy to put in Looper v2 if only to diagnostically eliminate the custom widget so we can make some actual progress and get our answer about the connector behavior.

That is, however, the client’s call. They are not keen on spending the effort putting in Looper v2 unless it’s going to solve the problem, and I’m not confident that it will.

Still hoping someone here can convince me.

Best,

An Update:
I’d like to recognize that @jakerigos’s answer from the Tulip team does explain how a race condition can occur in the custom widget I provided. Thank you for helping me understand why the team fixates on race conditions.

Here’s the thing, though. Even if the Array.forEach call is outpacing the trigger queue (which, I admit, does constitute a race condition), one of the following must be true:

  • JavaScript’s Array.forEach() is calling its callback an extra time (unlikely!)
  • The Tulip trigger queue is erroneously duplicating a trigger event. If this is what the team means by “sending multiple requests in flight”, please clarify.
  • The problem is not in the custom widget

Please keep in mind the symptom! Our API (NetSuite) is receiving an extra POST request once in a while. I have proof that the array had three elements and we received four POST requests. I also have response times on the incoming requests and can show that one of the four exceeded the connector timeout in every case.
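
The check described above can be sketched as a small script over the receiving system’s request log. Everything here is hypothetical: the record shape, the field names, the sample values, and the `TIMEOUT_MS` figure are made up for illustration and are not taken from the actual logs.

```javascript
// Assumed connector timeout in milliseconds (illustrative value only).
const TIMEOUT_MS = 30000;

// Hypothetical records for the POSTs the API received for one run.
const received = [
  { id: 101, durationMs: 1200 },
  { id: 102, durationMs: 31000 },
  { id: 102, durationMs: 1500 },
  { id: 103, durationMs: 900 },
];

// Group requests by id and report ids that were received more than once,
// noting whether any request in the group ran longer than the timeout.
function findDuplicates(records, timeoutMs) {
  const byId = new Map();
  for (const record of records) {
    if (!byId.has(record.id)) byId.set(record.id, []);
    byId.get(record.id).push(record);
  }
  const report = [];
  for (const [id, group] of byId) {
    if (group.length > 1) {
      report.push({
        id,
        count: group.length,
        anyOverTimeout: group.some((r) => r.durationMs > timeoutMs),
      });
    }
  }
  return report;
}

const report = findDuplicates(received, TIMEOUT_MS);
console.log(report);
```

If every duplicated id also has a request that exceeded the timeout, that pattern is at least consistent with a retry; duplicates among fast requests would point elsewhere.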

Yet support continues to insist that the cause of the problem is a race condition in the custom widget.

Despite the deviation from the subject that the title of this post suggests, I am reserving my solution for the person who can definitively show me that a race condition is the cause of our occasional extra POST.

Hey @eric.alwine, please excuse a brief digression from the troubleshooting: I lead the Customer Education team here, and I wanted to chime in to say thank you for taking the time to write out these detailed and generous descriptions of your frustrations. We care a lot about writing documentation at a granularity that can help folks self-diagnose and resolve their problems, and what you’ve identified here are definite gaps. As you suggested, we have a lot of opportunity to better describe expected behavior for folks who are coming from a more technical perspective. The good news is that we’re actively working on improving this with the Developer Docs (much more coming here soon!), and we know this is a segment of users we currently underserve. This post has generated a lot of conversation behind the scenes that doesn’t come across on the forum.

So this post is mostly just a thank you and a shout out. We only get this right in partnership with our users, and I appreciate the time and care you’ve taken to share with us.

@John
Thank you, happy to be involved. Tulip is a great product and I love building solutions with it!

My mission with this post is as stated, to explicitly understand the “race condition” concern so that I can be confident that I’ve not built anything that will misbehave, especially since custom widgets are such a “red herring” when it comes to support cases regarding apps that use them.

A “How NOT to build custom widgets” detailing all the things I’m sure your internal team knows well about the pitfalls of custom widgets would go a long way!

I wanted to clear up some of the confusion here and provide clarification on the questions in this thread:

  • Connector Timeout Behavior

    • By design, there is no retry logic at the Connector level. If it does exist, this would be a bug.
    • Because we believe that migrating to Looper v2 might resolve this issue, will resolve similar known issues, and will make debugging easier, we recommend moving to Looper v2.
  • Trigger Queue Behavior

    • Having the trigger event queue going against the grain of the JS event queue within your custom widget leads to a few race conditions. This is the same structure as the LooperV1 widget. See comments above from @sebme and @kefl who provided some good additional insight about the potential issue here.
  • LooperV2 Widget

    • Custom widgets like Looper v1 (such as the one in question) do not have their order of operations synchronized and contemporaneous with the execution flow of triggers in Tulip.
    • The reason Looper v2 was created was to solve a number of race conditions and bugs that resulted from that abstract fact, and align it with Tulip Triggers.
      • As a result there are a few benefits to migrating to Looper V2:
        • Align the internal execution of Custom Widgets with the internal execution of Triggers.
        • No longer be subject to the various race conditions of Looper V1-esque architecture
        • Make it easier to debug the widget when there are issues.
      • This does not guarantee that the specific issue will be resolved. For this reason, we are still exhaustively investigating internally to prove out our hypothesis of the root cause for the behavior that you are seeing.

In summary, LooperV2 solves many race conditions, and similar known bugs from LooperV1 (and any similar custom widget following that class of architecture).

It does so by aligning its behavior with that of triggers, and it’s an important migration to undergo generally for debugging and reliability.

@eric.alwine thanks for the information and thorough investigation on this! We are still working to further improve custom widgets and our documentation.

@aaronstone thank you for taking the time to help us understand the constraints of the platform at this level. Shout out to Mark as well.

It turns out that there are limitations to the throughput in the trigger queue that have been observed to manifest in duplicate trigger events.

This is effectively what @jakerigos communicated, and was at the core of @kefl’s and @sebme’s contributions.

So, for a straightforward, intuitively written piece of JavaScript code like Looper v1 (or the widget we are discussing here), knowledge of the caveats in the supporting infrastructure is critical if a developer is to have any hope of avoiding this mistake. In any other context, it would be ludicrous to assume that an asynchronous call inside of a forEach would result in extra calls.

@John, this is the sort of concept that definitely deserves its own chapter in your Dev Docs.

Big thanks to everyone involved on this thread. As @aaronstone mentioned it is still undetermined that a race condition is the cause of our duplicate POST, but we have logging enabled and will be able to determine definitively what is happening soon. Given the new knowledge regarding duplicate trigger events, I’m inclined to agree that it’s more likely that we are seeing a race condition issue rather than an errant retry.

Will follow up with a solution when we have the answer.

Cheers

@kefl how did you come to this understanding? Was it explained to you by someone, did you come across a thread here, did you read the looper v2 implementation and infer the cause?

@eric.alwine Well, I just realized that the official looper widget in our instance is still the v1. My previous answer was a description of the development of our own custom looper, which seems to be similar to Looper V2 :sweat_smile:.
The reason I wanted to update it was a mismatch between the visual bar progression queue and the actual Interactive table actions.

I tend to customize all the official custom widgets to add inputs, change their behavior, or make the code more robust. For example, in the Stopwatch I don’t like how the interval is started and cleared through this.t.

KR,
Kevin

Hey @eric.alwine -

Pete here, I am responsible for Connectors at Tulip. Retrying certainly isn’t the expected behavior for connector functions, so we have been doing some investigation to see if we could reproduce this. Based on that investigation, we think this may be tied to how Firefox automatically handles 408 response codes (timeouts).
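
A minimal sketch of how a transparent resend on a 408 could produce the reported symptom. This is a toy model under stated assumptions, not a description of Firefox or of Tulip connectors: `makeServer` and `sendWithAutoRetry` are hypothetical, and the model assumes the client layer resends exactly once when it sees a 408.

```javascript
// Hypothetical receiving system: records every request it gets.
function makeServer(slowIds) {
  const received = [];
  const timedOutOnce = new Set();
  return {
    received,
    handle(id) {
      received.push(id);
      // The first attempt for a slow id is answered with 408 (timeout).
      if (slowIds.has(id) && !timedOutOnce.has(id)) {
        timedOutOnce.add(id);
        return 408;
      }
      return 200;
    },
  };
}

// Hypothetical client layer that silently resends once on a 408.
function sendWithAutoRetry(server, id) {
  let status = server.handle(id);
  if (status === 408) {
    status = server.handle(id);
  }
  return status;
}

const server = makeServer(new Set([102]));
const statuses = [101, 102, 103].map((id) => sendWithAutoRetry(server, id));

console.log(statuses); // what the caller sees: one status per id
console.log(server.received); // what the receiving system sees
```

In this model the caller sees three successful calls while the receiving system logs four requests, with the extra one belonging to the id that timed out.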

Can you share a little more information on where you are running your app? Are you using the browser Tulip player or the downloaded Player? Dev Mode? What browsers are you using?

We are actively discussing internally what we could do to address this unintended behavior, but we want to make sure we understand the root cause before we take more action.

Thanks
Pete

@Pete_Hartnett, my man.

This client does use the browser player, and I believe their browser of choice may be Firefox. It’s not uniform across the board, however, and they also use the Player.

I’ll find out specifics and report back.

Looking into the 408 retry browser behavior now. Please keep me in the loop as your thought process evolves.

Hey @eric.alwine -

This information would be great. We have blockers in the product that won’t allow the Player to run in unsupported browsers, but in our testing we have discovered some cases where those are not blocking users. The team is investigating this as we speak, and will get these gaps filled.

Pete

@Pete_Hartnett
Confirmed. They are using Firefox at the stations where they need to use a browser instance.

I was unaware that Firefox is not supported, but I have now discovered and reviewed gxp-docs.tulip.co/lts6-rev1/overview/S_PLAYER.html (amazing, I only just found it). I have communicated this to the client, and they are coordinating the switch to Chrome.

If there is any reason we should wait, (perhaps to test your patch for the unsupported browser prevention) please let us know. The client does not mind assisting in whatever way we can.

Best,

Hey @eric.alwine -

I would recommend moving to Chrome if possible. The first step on our side will be just to disallow unsupported browsers (including Firefox). Today it isn’t clear whether there is a good path for us to fully support Firefox because of this behavior.

Pete