Sayan Banerjee – BMC Software | Blogs

Future Ready Operations: Enhancing TrueSight Operations Management with BMC Helix AIOps and Observability

Sayan Banerjee — Wed, 17 Apr 2024 13:30:38 +0000

In today’s continuously evolving IT environment, staying ahead sometimes means embracing both the tried and true and the cutting-edge. For organizations deeply invested in TrueSight Operations Management, the prospect of integrating advanced artificial intelligence (AI) capabilities might seem like a leap into the unknown. However, the future of operational excellence lies in blending existing strengths with the latest innovations. This is where BMC Helix AIOps and observability come into play, offering a bridge to the future. BMC Helix AIOps appears as the BMC Helix Service Monitoring tile on the BMC Helix portal landing page.

I have had several inquiries regarding the feasibility of using existing TrueSight Operations Management and connecting it to the award-winning functionality of BMC Helix AIOps and observability. Customers are asking if they can adopt the latest BMC Helix AIOps and observability tools without replacing TrueSight Operations Management with BMC Helix Operations Management.

This blog previews the best practice steps and use cases for connecting TrueSight Operations Management to BMC Helix AIOps using BMC Helix Intelligent Integrations. It also offers guidance to help TrueSight Operations Management customers who want to adopt both BMC Helix Operations Management and BMC Helix AIOps and observability to improve operational efficiency, reduce costs, and gain the benefits of a modern, highly available, cloud-native platform.

Working with available use cases

Use case 1— TrueSight Operations Management and BMC Helix Operations Management event data workflow using BMC Helix Intelligent Integrations

Download, install, and configure the TrueSight Operations Management connector.
Once the connection and integration are successful, send events to it.
Through the connector, the events are propagated to BMC Helix Operations Management.

As a next step, when we perform an event operation in TrueSight Operations Management, such as closing an event, we will see the corresponding event close in BMC Helix Operations Management.
Please note there is no back propagation, as of now, of event status from BMC Helix Operations Management to TrueSight Operations Management.
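
The one-way behavior described above can be sketched as a small model. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not part of any BMC API or of the connector itself. It simply mirrors the rule that status changes flow from TrueSight Operations Management to BMC Helix Operations Management and never the other way.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of one-way event propagation; not a BMC API. */
final class OneWayEventSync {
    enum Status { OPEN, CLOSED }

    // Event stores standing in for the two products (illustrative only).
    private final Map<String, Status> trueSight = new HashMap<>();
    private final Map<String, Status> helix = new HashMap<>();

    /** An event raised in TrueSight is forwarded to Helix through the connector. */
    void raiseInTrueSight(String eventId) {
        trueSight.put(eventId, Status.OPEN);
        helix.put(eventId, Status.OPEN);
    }

    /** Closing in TrueSight also closes the corresponding Helix event. */
    void closeInTrueSight(String eventId) {
        if (trueSight.containsKey(eventId)) {
            trueSight.put(eventId, Status.CLOSED);
            helix.put(eventId, Status.CLOSED);
        }
    }

    /** Closing in Helix stays local: there is no back propagation. */
    void closeInHelix(String eventId) {
        if (helix.containsKey(eventId)) {
            helix.put(eventId, Status.CLOSED);
        }
    }

    Status trueSightStatus(String eventId) { return trueSight.get(eventId); }

    Status helixStatus(String eventId) { return helix.get(eventId); }
}
```

In this model, an event closed only on the Helix side remains open in TrueSight, which is the situation to keep in mind when operators work in both consoles.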

Figure 1. Use case 1 flow (TSPS is TrueSight Presentation Server, a component of TrueSight Operations Management).

Figure 2. Use case 1 flow with pros and cons.

Use case 2—TrueSight Operations Management and BMC Helix Operations Management event data topology using BMC Helix Intelligent Integrations.

Download, install, and configure the TrueSight Operations Management connector.
Configuration Items (CIs) created manually in TrueSight Operations Management will get ingested into BMC Helix Operations Management.
In BMC Helix Operations Management, we will observe the CIs getting ingested. In BMC Helix Discovery, we will see these CIs as generic elements.
In BMC Helix Service Monitoring, we will not see these CIs as these are created manually in TrueSight Operations Management.
Please note there is no back propagation, as of now, of event status from BMC Helix Operations Management to TrueSight Operations Management.

Figure 3: Use case 2 flow.

Use case 3—TrueSight Operations Management with a configuration management database (CMDB) and service models integrated into BMC Helix Operations Management topology using BMC Helix Intelligent Integrations

Download, install, and configure the TrueSight Operations Management connector.
This is like use case 2; the only difference is that the service models are created in the CMDB and published from the CMDB to TrueSight Operations Management.
BMC Helix Discovery will show the same topology.
In BMC Helix Service Monitoring (BMC Helix AIOps), we will see the business service models.

Figure 4. Use case 3 flow.

Use case 4—TrueSight Operations Management monitoring vCenter

Add monitoring policy for vCenter KM in TrueSight Operations Management.
vCenter hierarchy builds up in TrueSight Operations Management.
We can see the vCenter CI in BMC Helix Discovery.
We observe that the hierarchy does not have multilevel topology as in TrueSight Operations Management.

There is no use case diagram for this.

This completes the best practice steps you can use when trying to integrate TrueSight Operations Management with BMC Helix Service Monitoring (AIOps) using BMC Helix Intelligent Integrations.

For each of the four use cases, we have recorded a five-minute video demonstration, which is available on request by emailing sayan_banerjee@bmc.com.

Conclusion

These use cases help IT teams that want to stay with TrueSight Operations Management while adopting BMC Helix Operations Management with AIOps and using BMC Helix Service Monitoring.

Tailoring Insights: Creating Personalized Dashboards for Users in a Multi-Tenant Environment

Sayan Banerjee — Fri, 05 Apr 2024 10:41:30 +0000

In a managed service provider (MSP)-centric environment, managing the diverse needs of multiple end users within a single tenant environment can be challenging. However, the benefits of personalized dashboards for these end users are significant. Each end user brings unique requirements and preferences for visualizing their information technology (IT) infrastructure within this domain. This is where the multi-tenancy dashboard becomes invaluable. Using tools like BMC Helix Access Controls, MSPs can now seamlessly create personalized dashboard views for individual end users, revolutionizing how data is accessed, analyzed, and utilized within a unified framework.

We’ve had inquiries regarding the feasibility and logistics of implementing personalized dashboards within a multi-tenant environment; one resounding question echoes: “Can it truly be done?” How do you navigate the process of setting up and managing dashboards tailored to multiple end users’ individual roles and preferences?

This blog is designed to address your concerns. We will guide you through the practical application of creating and managing dashboards with distinct users. The answer to the question is a resounding “Yes, it can be done!” We will equip you with the necessary steps and insights to make this process a reality in your multi-tenant environment.

Creating a User Group

Let’s start by creating a User group. For this blog, we will create a User group for the end customer, “XYZ Manufacturing.” In the next steps, you’ll discover how easy it is to create personalized dashboards based on your end user, empowering you to tailor the experience to their unique needs.

Here is a screenshot showing the group creation. As the BMC Helix tenant administrator, you will first go into the portal:

Figure 1. Main screen.

As the administrator for the XYZ Manufacturing company, you will then click on Add group and create the specific User group. The image below shows the administrator creating TestGroup1.

Figure 2. User groups.

Now that the administrator has created a new User group, they will need to add the permissions. The administrator simply clicks on Actions -> Assigned, and assigns the User to the Group:

Figure 3. User directory.

Next, the administrator will want to check the User’s role. This is easily done by going to the BMC Helix portal landing page, clicking the User access tab, going to the Users and keys page, and searching for the User.

Figure 4: Users and keys.

Now that the administrator sees the User, they can simply click on Actions -> User options to see which User groups and Roles are assigned to them. By selecting either the Groups or Roles assignment area, the administrator can quickly validate that they are assigned correctly.

Figure 5: User options.

Creating an Authorization Profile

One of the next things you will want to do as the administrator is to create an authorization profile for this User group (TestGroup1). By creating this authorization profile, you ensure that the Users in this Group only have access to the appropriate XYZ Manufacturing company information.

Here’s how you do this: Launch the BMC Helix Operations Management console from the BMC Helix portal landing page. Then, navigate to Administration -> Authorization profiles, and you will see the authorization profile Test_AutoProf1, which we have created for User group TestGroup1. In your case, the profile needs to be created using the “Create” button.

Figure 6. Authorization profiles.

The next step is to add and associate the User group we created.

Figure 7. Profile details.

In this case, we have selected Microsoft Windows Servers as the PATROL Solutions and then assigned a specific Device and Group. The Device name and Group name will vary depending on customer requirements.

Figure 8. Administration.

Figure 9. Administration detail.

Here we are showing the selection of the Group Windows Servers.

Figure 9a. Windows Servers.

Setup Complete

As you’ve seen so far, this has all been straightforward to implement. Let’s now log into the BMC Helix portal as the user and see how it shows up. Here’s the dashboard, with all the devices that the user has available to them displayed.

Figure 10. Main dashboard.

Based on the rules the administrator has put in place, this user can only see the two devices that were assigned.
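
Conceptually, the authorization profile acts as a filter between User groups and devices. The following is a minimal sketch of that idea, not the BMC Helix implementation: the `Profile` class, the `visibleDevices` method, and the device name `device-b` are hypothetical, while TestGroup1, Test_AutoProf1, and vl-pun-dombl107 are the names used in this walk-through.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of group-based device visibility; not a BMC API. */
final class AuthorizationSketch {

    /** An authorization profile ties User groups to a set of devices. */
    static final class Profile {
        final String name;
        final Set<String> userGroups;
        final Set<String> devices;

        Profile(String name, Set<String> userGroups, Set<String> devices) {
            this.name = name;
            this.userGroups = userGroups;
            this.devices = devices;
        }
    }

    /** Small helper so the sketch stays compatible with Java 8. */
    static Set<String> setOf(String... values) {
        return new LinkedHashSet<>(Arrays.asList(values));
    }

    /** Returns the devices granted by every profile that names one of the user's groups. */
    static Set<String> visibleDevices(Set<String> groupsOfUser, List<Profile> profiles) {
        Set<String> visible = new LinkedHashSet<>();
        for (Profile profile : profiles) {
            for (String group : groupsOfUser) {
                if (profile.userGroups.contains(group)) {
                    visible.addAll(profile.devices);
                    break; // one matching group is enough for this profile
                }
            }
        }
        return visible;
    }
}
```

A user who belongs only to TestGroup1 would see just the devices attached to Test_AutoProf1, and a user with no matching group would see none.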

The user can see deeper insights from the dashboard by clicking on one of the servers (for this example, we’ve clicked into the vl-pun-dombl107 server).

Figure 11. Deeper insights.

Once on the Device Details page, the user can click on the three dots beside the Device Name and then click on the “Launch Dashboard” pop-up to delve into all of the performance details:

Figure 12. Launching dashboard.

Detailed Dashboard View

Once the user has clicked on Launch Dashboard, the system will default to the same device and show the CPU utilization, Memory usage, Disk usage, Network bandwidth utilization, and related events for this device.

Figure 13. Performance details.

Conclusion

As you can see, creating and managing multiple end users with personalized dashboards is quite simple. Using BMC Helix User group and Authorization profiles, the administrator can easily create the views needed to support personalized dashboards based on the user profiles. We hope this process walk-through will provide you with the guidance you have been asking for as you create your dashboards in your environment.

We are also here to answer any questions you might have; please feel free to reach out to us:

Windows Service Automatic Restart Use Case

Sayan Banerjee — Fri, 09 Jun 2023 09:07:05 +0000

This use case will demonstrate how to restart any Windows service using BMC Helix Intelligent Automation with the TrueSight Orchestration connector to help manage all of your IT assets.

Use Case

When a Windows Service down alert comes into BMC Helix Operations Management, the service is restarted automatically using BMC Helix Intelligent Automation.

In this use case, we are using TrueSight Orchestration (TSO) as the automation tool, together with the out-of-the-box TSO event orchestration runbook adapters and modules.
The BMC Helix Intelligent Automation Policy triggers the TSO workflow to remotely restart the Windows service.
The Windows service gets restarted successfully and the event is closed in BMC Helix Operations Management.
For this use case, we are restarting the Print Spooler service on a Windows server as an example to demonstrate the flow.
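
To make the remediation step concrete, here is a rough sketch of what a remote restart of the Print Spooler service could look like at the command level. It is an assumption for illustration only: this post does not document how the TSO workflow performs the restart internally. The sketch builds the standard Windows `sc` commands for a remote host; the host name `winhost01` is hypothetical, and `Spooler` is the service name of the Print Spooler.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: builds Windows "sc" commands for a remote service restart. */
final class ServiceRestartSketch {

    /** Returns the stop command followed by the start command for the given service. */
    static List<List<String>> restartCommands(String host, String serviceName) {
        String server = "\\\\" + host; // sc expects the server as \\ServerName
        return Arrays.asList(
                Arrays.asList("sc", server, "stop", serviceName),
                Arrays.asList("sc", server, "start", serviceName));
    }

    public static void main(String[] args) {
        // Hypothetical host; "Spooler" is the Print Spooler service name.
        for (List<String> command : restartCommands("winhost01", "Spooler")) {
            System.out.println(String.join(" ", command));
        }
    }
}
```

A real workflow would also wait for the service to report a stopped state before starting it again, and would use credentials from the credential store rather than the caller's own session.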

The Event Orchestration Runbook Configuration

The adapters needed for this use case are shown below.

Reference: https://docs.bmc.com/docs/TruesightOrchestrationContent/231/installing-the-event-orchestration-runbook-1192637229.html

The modules needed for this use case are shown below.

Event Orchestration Module Configuration for BMC Helix Monitor

You will need to make changes for the attributes highlighted below.

Event Orchestration Credential Store Configuration

Please provide the correct Windows credentials, as shown below.

BMC Helix Intelligent Automation

The screens below show the BMC Helix Intelligent Automation Connector, which needs to be configured from the console.

Reference: https://docs.bmc.com/docs/helixintelligentautomation/232/configuring-the-truesight-orchestration-connector-1191808192.html

Policy Configuration: BMC Helix Intelligent Automation

You will need to configure the BMC Helix Intelligent Automation policy from the Policies tab.

Click on “Select Action” to get the page below.

Remember to click on “Sync Actions.”

Next, you can search for “Event_Orchestration Process Event” in the Search Actions field.

Follow the steps below to complete the policy configuration.

This completes the policy creation.

Please refer to the link below to download the presentation for this blog article.

Restart a Windows Service

We have recorded a video showing this use case in action, along with the required configuration. It is available on request by emailing the authors.

ipsita_priyadarshini@bmc.com

sayan_banerjee@bmc.com

Conclusion

Enterprise applications require these workflows to identify and remediate critical events happening in their infrastructure. This use case helps organizations manage the end-to-end infrastructure in an effective way with the least impact to the business.

Managing Java Memory Allocation for Your Web Application

Sayan Banerjee — Tue, 29 Mar 2022 08:18:53 +0000

Better performance and scalability of an application do not always depend on the amount of memory allocated. There are often occasions where allocating more memory has resulted in an adverse situation. This blog explores the adverse impact of performance degradation and explains how to create a balance while allocating memory to Java applications.

Enterprise applications sizing background

For this exercise, we will refer to our in-house IT operations product, which is being used for the end-to-end operations management of our own IT business unit as well as multiple enterprise-level customers. It’s important to understand the volumetrics that the solution will be monitoring. This will help us reverse engineer and map the numbers to our BMC-recommended system requirements, which are documented as part of the sizing and scalability guidelines. Once this mapping is established, it becomes less complicated to size the IT operations stack.

You can find additional information on presentation server sizing and scalability here and information on infrastructure management server sizing and scalability here.

Does the process need extra memory?

Once deployed, we should make it a practice to monitor the health of the stack. This will help us to understand whether any modifications are needed to meet changing business requirements. In addition, any performance degradation in the stack will proactively trigger an alarm, making it easier for us to remediate before end users are impacted.

Different layers of monitoring include:

Monitoring the logs from the application
Operating system monitoring
Java Virtual Machine (JVM) monitoring

The Java Development Kit (JDK) comes bundled with VisualVM, which helps us to get a holistic view of the running JVMs, including overall memory consumption and the respective threads associated with the same.
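
The same kind of information can also be read from inside a running JVM through the standard `java.lang.management` API. The sketch below is a minimal example, not part of our product: the class name `HeapSnapshot` and its helper methods are hypothetical, while the MXBean calls are standard JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Minimal sketch: reads heap usage and thread count from the current JVM. */
final class HeapSnapshot {

    /** Fraction of the max heap in use; NaN when the max is undefined (reported as -1). */
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return Double.NaN;
        }
        return (double) usedBytes / maxBytes;
    }

    /** Converts bytes to whole megabytes, preserving -1 for "undefined". */
    static long toMb(long bytes) {
        return bytes < 0 ? -1 : bytes / (1024 * 1024);
    }

    /** One-line summary of heap usage and live threads for this JVM. */
    static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        return String.format("heap used=%d MB, committed=%d MB, max=%d MB, live threads=%d",
                toMb(heap.getUsed()), toMb(heap.getCommitted()), toMb(heap.getMax()), threads);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging a line like this at intervals gives a simple trend of heap consumption that can be compared with what VisualVM shows.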

The above analysis will help us investigate further and may result in enhancing or reducing current resources allocated to the stack. We would need to map the findings according to the sizing documents referenced above and look for discrepancies, based on the specific process or thread that we are analyzing.

Can you increase the memory allocation for processes directly?

The answer is NO. We should always be mindful of making changes to the resources in the running stack. The reason is that there are multiple entities tightly coupled together in the stack (i.e., it’s not a standalone single-layer application), so resource changes to one entity will negatively or positively impact the related entities, leading to new issues in the stack and further degrading the performance of the application.

Example of garbage collection setting which worked for an application

Below are the Java 8 parameters that we would normally use, especially while tuning garbage collection. This is applicable to the Garbage-First Garbage Collector (G1 GC).

-XX:+DisableExplicitGC
-XX:G1NewSizePercent=<percent>
-XX:MaxGCPauseMillis=<milliseconds>
-XX:MaxMetaspaceSize=<size>
-XX:+UseCompressedOops
-XX:+UseStringDeduplication
-XX:+UseG1GC

With respect to the operations management application, we made changes based on our observation for the G1 GC parameters. Below are the properties that we considered before and after making the changes.

Before making the changes:

From G1 GC:

Option=-XX:+UseG1GC
Option=-XX:MaxGCPauseMillis=200
Option=-XX:ParallelGCThreads=20
Option=-XX:ConcGCThreads=5
Option=-XX:InitiatingHeapOccupancyPercent=70

After making the changes to the parallel GC:

Option=-Dsun.rmi.dgc.client.gcInterval=3600000
Option=-Dsun.rmi.dgc.server.gcInterval=3600000

Here, both intervals are set to 3,600,000 milliseconds, so we ran the collection every hour and performed a parallel garbage collection (GC). This helped us to reduce the CPU footprint while the G1 GC was executed. Overall process memory usage is also controlled by the fixed-interval GC cycle runs.

This may not work correctly if we don’t have proper heap settings. If the max heap is set too low, the GC may be invoked before the hourly interval elapses, running automatically instead of when we want it to run. Normally, increasing the max heap by 20 percent is a good starting point, after which we can confirm whether the GC is being invoked every hour.
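
One way to confirm how often collections actually run is to sample the collection counters that the JVM exposes through `GarbageCollectorMXBean`. The sketch below is an illustration under assumptions: the class `GcIntervalCheck` and the helper `exceedsExpected` are hypothetical, and which collector to watch (for G1, typically the old-generation collector rather than the frequent young-generation one) is left to the reader.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal sketch: samples GC counters to compare against an expected interval. */
final class GcIntervalCheck {

    /** Collector name mapped to collections since JVM start (-1 means undefined). */
    static Map<String, Long> collectionCounts() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    /**
     * True when more collections were observed in the window than the expected
     * interval allows, for example more than two in two hours with an hourly interval.
     */
    static boolean exceedsExpected(long observedCollections, long windowMillis, long expectedIntervalMillis) {
        if (expectedIntervalMillis <= 0) {
            throw new IllegalArgumentException("expectedIntervalMillis must be positive");
        }
        long expected = windowMillis / expectedIntervalMillis;
        return observedCollections > expected;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> entry : collectionCounts().entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue() + " collections");
        }
    }
}
```

Taking two samples a known time apart and passing the difference to `exceedsExpected` indicates whether the heap is small enough that collections are being triggered ahead of the hourly schedule.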

There have been instances where the application indicates that the process is running out of memory but internally the respective JVM is not using that much memory. In this case, the application process’s JVM needs a max heap allocation, but due to limited resource availability, the OS could not release the max to the JVM process. This results in an out-of-memory error due to incorrect heap settings and insufficient RAM available to the VM—it’s not a JVM process error.

Normally, we would see an exception similar to java.lang.OutOfMemoryError: unable to create new native thread, which indicates that Java demanded the memory chunk but there was insufficient RAM available on the server. In this case, adding extra heap space will not help.

In these kinds of scenarios, where the overall RAM of the machine is not properly allocated, the role of GC becomes critical. In addition, if the GC cycles are not run properly, objects pile up in the heap memory with both direct and indirect references. The cleanup can also take more CPU time when the GC executes.

Most of these would be inactive, or not-live, objects, but an inadequate or infrequent GC cycle leads to unnecessary heap consumption with these objects. This kind of issue leads us to modify some of the properties as shown above.

Below is a snapshot where the JVM required more memory, but it had reached max heap.

Figure 1. JVM reaches max heap.

The following shows the potential to go OutOfMemory (OOM) because of heap.

Figure 2. JVM nearing OutOfMemory and crashing.

Using BMC products to monitor these JVM processes, we get a memory graph that indicates that the JVM had 32 GB RAM, all of which has been utilized, so any further requests by the JVM processes cannot be handled by the OS and the JVM process crashes, as shown above.

Figure 3. JVM utilizing 32 GB of memory.

The above illustration shows that increasing the JVM heap does not always help to mitigate the OOM situation. There could be additional factors like how much RAM is available on the VM and whether the OS has enough memory to release to the JVM process.
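
A quick sanity check along these lines is to compare the configured max heap against the RAM that is realistically available to the process. The sketch below is only an illustration: the class `HeapFitCheck`, the 25 percent reserve for the OS and non-heap JVM memory, and the 32 GB figure used in `main` are assumptions for the example, not BMC sizing guidance.

```java
/** Minimal sketch: checks whether a max heap setting fits in the RAM of the VM. */
final class HeapFitCheck {

    /**
     * True when the max heap fits in physical RAM after setting aside a fraction
     * for the OS and for non-heap JVM memory (metaspace, thread stacks, native buffers).
     */
    static boolean heapFitsInRam(long maxHeapBytes, long physicalRamBytes, double reservedFraction) {
        if (reservedFraction < 0.0 || reservedFraction >= 1.0) {
            throw new IllegalArgumentException("reservedFraction must be in [0, 1)");
        }
        double usableBytes = physicalRamBytes * (1.0 - reservedFraction);
        return maxHeapBytes <= usableBytes;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long maxHeap = Runtime.getRuntime().maxMemory(); // effective max heap of this JVM
        long assumedRam = 32 * gb;                       // assumed VM size for the example
        System.out.println("max heap fits: " + heapFitsInRam(maxHeap, assumedRam, 0.25));
    }
}
```

If the check fails, raising the heap further only moves the failure from the JVM to the OS, which is the situation shown in the figures above.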

We had another instance from a customer where one of the JVM processes was running out of memory. The end impact was the application crashed, generating a memory dump file.

Figure 4. Snapshot from the memory dump.

The above stack showed where the OOM happened; drilling down more, we could see the actual issue.

Figure 5. The operation that triggered the issue.

This is again another scenario where an end user would be prompted to increase the JVM process allocation, which may resolve the problem for a couple of days, but it will eventually crash with the same error.
In this case, we had to handle this specific issue through code optimization.

Does the server have the right memory allocated?

Most virtual machine (VM) servers have shared resources. When there is a sudden chunk of memory needed for a server, there should not be scarcity in the VM pool.

Let’s keep in mind that CPU and memory are proprietary to the nodes or the VM itself, where the application is installed. Even before we install any specific application, we allocate CPU and memory to the VM.

Within the application, it’s up to the vendor (application developer) to determine how the CPU and memory would be allocated for the seamless performance of the solution. This is where, based on our sizing recommendation, we allocate memory to the running JVM processes.

But how do we make sure these allocations at the VM level are the best possible numbers we could imagine? Well, there is no straightforward answer to this, as this depends on monitoring the VM nodes using capabilities both inside and outside the solution.

On the IT operations solution end, BMC has come up with a knowledge module called the VMware vSphere Knowledge Modules (VSM KM). This specific KM is used to monitor the entire virtual center (VC) where our application is running, with respect to memory and CPU. These are metrics that reveal the health of the VC.

Figure 6. CPU utilization metric.

Figure 7. CPU ready time.

Figure 8. Memory balloon.

Figure 9. Memory used.

Using the virtual center monitoring tool

Monitoring the VC will help us to understand scenarios where the solution itself is down and we don’t have the built-in capability for VC monitoring. Based on our experience, we have isolated a few metrics and a performance chart, which help us to understand the overall health of the VC, as follows.

ESX overprovisioned

Memory overcommitment
CPU overcommitment

Memory ballooning

Should not happen if host is performing as per expectation

Storage latency

Response time of a command sent to the device; the desired time is within 15-20 milliseconds

VC dashboard

This will alert us to any open alarms regarding the health of the VC and respective VMs

Datastore

The datastore of the VM where our IT operations solution is installed should be in a healthy condition, with sufficient space available

Performance chart

Verify the performance chart for memory and CPU from the VC level
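
As a small worked example of the overcommitment metric listed above, the ratio can be computed as the total memory (or CPU) allocated to the VMs divided by the physical capacity of the host. The class and method names below are hypothetical, and the numbers in `main` are made up for illustration.

```java
/** Minimal sketch: computes an overcommitment ratio for a host. */
final class OvercommitSketch {

    /**
     * Sum of the resources allocated to the VMs divided by the host capacity.
     * A value above 1.0 means the host is overcommitted for that resource.
     */
    static double overcommitRatio(long[] allocatedPerVm, long hostCapacity) {
        if (hostCapacity <= 0) {
            throw new IllegalArgumentException("hostCapacity must be positive");
        }
        long total = 0;
        for (long allocated : allocatedPerVm) {
            total += allocated;
        }
        return (double) total / hostCapacity;
    }

    public static void main(String[] args) {
        // Made-up example: three VMs with 32, 32, and 64 GB on a 96 GB host.
        double ratio = overcommitRatio(new long[] {32, 32, 64}, 96);
        System.out.printf("memory overcommitment ratio: %.2f%n", ratio);
    }
}
```

The same calculation applies to CPU by passing vCPU counts and the number of physical cores.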

How these help

These monitoring metrics help us to identify the potential impact on the overall performance of the IT operations solution when sufficient CPU and memory are not allocated within the application, which in turn depends on their availability to the VM nodes themselves.

Conclusion

It’s very important to monitor and understand how much CPU and memory we are allocating to the VM nodes so that we can tune the allocation down to the lowest level possible, the individual JVM processes. Allocations made at the highest level, the VM nodes, will ultimately affect garbage collection performance.

For more information on this topic, please see our documentation on hardware requirements to support small, medium, and large environments here.