On-Prem Connector Host Resiliency

Hi Team,

We use an On-Prem Connector Host for each of our instances, and in general it works great. Every so often, however, we face issues with a connector host and it results in a loss of connectivity (and therefore machine data) for the duration of the outage.

It would be great to be able to deploy multiple OPCHs per instance, and to automatically fail over between them if an issue is detected with the primary CH. This would decrease disruption to our end-users and buy us valuable time to adequately troubleshoot and resolve issues.

Hey Dan -

Thanks for the suggestion. A few notes on how we have thought about approaching this, and a question for you -

  • One approach we have considered is adding a connector host state change as an event that can trigger an automation. This would allow you to do a whole lot more than assigning a new connector host, including sending an email to IT stakeholders, tracking any potential outages to a table or other external system, etc. (see the rough sketch after this list). The downside here is that it is build-it-yourself, but the upside is that you get tons of flexibility.
  • Another approach that we have considered is just building this as a point solution on top of the connector host UI, as you propose. Less flexible, but easier to configure.
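To make the first option concrete, here is a minimal sketch of what such an automation-triggered failover could look like. To be clear, the event payload, the reassignment endpoint, the connector host IDs, and the addresses below are all placeholders I made up for illustration, not real Tulip APIs:

```python
# Hypothetical handler for a "connector host state changed" automation trigger.
# Endpoints, payload fields, hostnames, and IDs are placeholders, not real APIs.
import smtplib
from email.message import EmailMessage

import requests

INSTANCE_URL = "https://example.tulip.co"      # placeholder instance URL
API_TOKEN = "replace-me"                       # placeholder credential
BACKUP_CONNECTOR_HOST_ID = "ch-backup-01"      # hypothetical backup host ID


def on_connector_host_state_change(event: dict) -> None:
    """React to a (hypothetical) connector-host-down event."""
    if event.get("newState") != "DISCONNECTED":
        return

    failed_host = event["connectorHostId"]

    # 1. Reassign affected connectors to the backup host (hypothetical endpoint).
    requests.post(
        f"{INSTANCE_URL}/api/v1/connector-hosts/{failed_host}/reassign",
        json={"targetConnectorHostId": BACKUP_CONNECTOR_HOST_ID},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )

    # 2. Notify IT stakeholders so the outage can be tracked and investigated.
    msg = EmailMessage()
    msg["Subject"] = f"Connector host {failed_host} down; failed over to backup"
    msg["From"] = "alerts@example.com"
    msg["To"] = "it-team@example.com"
    msg.set_content(f"Event payload: {event}")
    with smtplib.SMTP("mail.example.com") as smtp:
        smtp.send_message(msg)
```

The same trigger could just as easily write to a table or call an external monitoring system instead of (or in addition to) reassigning the host.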

Do you have a high level preference here as far as approach?

Both of these approaches come with the added challenge that the connector hosts will need to be the same version (with the same functionality) to ensure that all of the associated connectors will be supported. When creating functions, we automatically check what capabilities the assigned connector host has and block those that are not possible on the respective connector host; if we allowed redundancy, we would need to add a further layer of checking across both the primary and secondary connector host.
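As a rough illustration of that extra layer of checking (capability names here are invented), a redundant pair would only be able to safely expose the capabilities that both hosts support:

```python
# Illustrative only: validate a redundant pair before allowing a connector
# function. The capability names are made up for the example.
PRIMARY_CAPABILITIES = {"http", "sql", "opc-ua"}
SECONDARY_CAPABILITIES = {"http", "sql"}  # e.g. an older build without OPC UA


def allowed_capabilities(primary: set[str], secondary: set[str]) -> set[str]:
    """Only capabilities supported by *both* hosts are safe to expose;
    otherwise a failover could leave some connector functions unrunnable."""
    return primary & secondary


print(allowed_capabilities(PRIMARY_CAPABILITIES, SECONDARY_CAPABILITIES))
# {"http", "sql"} - any "opc-ua" functions would be blocked or flagged
```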

We do have a number of customers who are running OPCH instances within contained pods that monitor the connector host health and will automatically spin up new instances of the OPCH if the health of the connection to Tulip is poor, which addresses this problem fairly well. Not zero downtime, but well under one minute. I can connect you with some resources at those customers if you would be interested in their deployment details.
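For reference, the general watchdog pattern those deployments use looks roughly like the sketch below: poll the connector host's link to the Tulip instance and recreate the container when it looks unhealthy. The container name, instance URL, and thresholds are placeholders, and an orchestrator would typically express this as a failing liveness probe rather than an explicit restart loop:

```python
# Rough sketch of a connector host watchdog. Names, URLs, and thresholds
# are placeholders; adapt to your own deployment.
import subprocess
import time

import requests

INSTANCE_URL = "https://example.tulip.co"   # placeholder instance URL
CONTAINER_NAME = "tulip-connector-host"     # placeholder container name
FAILURES_BEFORE_RESTART = 3
POLL_INTERVAL_S = 10


def tulip_reachable() -> bool:
    """Cheap proxy for 'connection to Tulip is healthy': can we reach the instance?"""
    try:
        return requests.get(INSTANCE_URL, timeout=5).status_code < 500
    except requests.RequestException:
        return False


def main() -> None:
    consecutive_failures = 0
    while True:
        if tulip_reachable():
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= FAILURES_BEFORE_RESTART:
                # Recreate the connector host container; in a pod this would
                # instead surface as a liveness probe failure.
                subprocess.run(["docker", "restart", CONTAINER_NAME], check=False)
                consecutive_failures = 0
        time.sleep(POLL_INTERVAL_S)


if __name__ == "__main__":
    main()
```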

Pete


Hey Pete,

Thanks for the thorough response.

I think the first option (triggering an automation) is inherently preferable, provided that cutting over to a new CH is an available action. This would be particularly valuable for customers that may not have mature infrastructure monitoring tools outside of Tulip, though for us it would mostly be a matter of convenience to be able to configure alerts, etc. in Tulip.

I’ll get in touch to discuss HA options in more depth, as that is a preferable setup for us in most scenarios.