What is Data Sovereignty & how to achieve it

Data privacy and security are not new themes for businesses, especially those that take advantage of cloud-based solutions. However, as data privacy concerns continue to rise, many countries (and some states) are clamping down on how and where consumer data can be collected and stored.

If you’re operating across borders, the term ‘data sovereignty’ is likely on your radar. If not, now’s the time to reevaluate your privacy and security practices for collecting, storing, and managing consumer data across the countries where you do business.

In this post, we will discuss the concept of data sovereignty and highlight strategies to address challenges and manage compliance demands relating to data sovereignty requirements.

What is data sovereignty?

Data sovereignty is the principle that data is subject to the laws and regulations of the country in which it is collected.

This means, for example, if you operate out of the US and are collecting consumer data from somewhere in Europe, you will be subject to the GDPR. Or, if you operate out of Canada and are collecting consumer data from California, you will be subject to California’s CCPA, as well as the CPRA if you are using that data for targeted advertising purposes.

Data sovereignty laws vary from location to location, but they all require companies to develop sound business practices that keep them in control of their data and infrastructure. Companies are obligated to closely monitor compliance with all relevant regulations. Failure to comply can result in significant fines and loss of business.

What is required for data sovereignty?

Data sovereignty helps protect consumers’ personal and sensitive information by dictating how companies must govern and secure data that is collected, processed, and stored. But what exactly are the requirements for data sovereignty?

The primary requirement for data sovereignty comes down to obtaining and maintaining full control over how sensitive data is accessed and used. We can divide it into five components:

  1. Regulatory compliance: A complete understanding of the local and global laws that apply in every location where you and your clients operate. Ensure that your data practices comply with these regulations, including requirements for data storage, transfer, consent, and security.
  2. Data processing: A transparent overview of how you use and share data.
  3. Data access: A complete picture of how access levels are established and what’s done to prevent unauthorized access. This includes defining roles and permissions for data handling, implementing secure authentication protocols, and encrypting data during transit and storage. Data should only be accessible to authorized individuals or entities.
  4. Data protection: The security protocols in place on your side and your vendors’ side, including encryption of data at rest and in transit, to protect data from unauthorized access, breaches, and cyber threats (a minimal connection-level sketch follows this list).
  5. Infrastructure control: You should have full ownership of your data stack, being able to effect change and fix issues if and when they appear, as well as of the full lifecycle of the physical servers, storage systems, and network equipment where your data is hosted.
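
The data access and protection points above both mention encryption in transit. As a minimal illustration, the sketch below opens a PostgreSQL connection that refuses anything but a verified TLS session. It assumes psycopg2 is installed; the hostname, credentials, and CA certificate path are placeholders rather than anything from this article.

```python
# Minimal sketch: enforcing verified TLS for a PostgreSQL connection with psycopg2.
# The hostname, credentials, and CA certificate path are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="db.eu-central.example.com",  # endpoint kept inside the required jurisdiction
    dbname="customers",
    user="app_user",
    password="change-me",
    sslmode="verify-full",             # reject unencrypted or unverified connections
    sslrootcert="/etc/ssl/certs/internal-ca.pem",
)

with conn, conn.cursor() as cur:
    # Confirm this session is actually encrypted.
    cur.execute("SELECT ssl, version FROM pg_stat_ssl WHERE pid = pg_backend_pid();")
    print(cur.fetchone())

conn.close()
```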

Challenges of Data Sovereignty

The ease with which organizations are able to target and acquire foreign customers poses new data challenges. When operating in international markets, you need to rethink your strategies for managing consumer data in order to meet these challenges.

Let’s explore the four most pressing issues that arise.

Tightly scoped solutions

Since there is no single overarching set of guidelines and more than 100 countries have their own data sovereignty laws, companies that operate across multiple territories may struggle to understand which laws apply to them. This creates new compliance challenges: data privacy regulations such as the GDPR in the European Union, data localization laws in countries that require certain types of data to be stored within their borders, and other industry-specific requirements. These force companies to adopt more stringent solutions and strategies for managing consumer data, such as strict access controls, classifying consumer data based on sensitivity and regulatory requirements, maintaining detailed audit trails of processing activities, implementing a consent management system, and establishing an incident response plan.

Data interoperability

Data sovereignty also raises data interoperability issues: how you move data across countries and get data when and where you need it. Transferring data across international borders can be subject to restrictions and legal requirements.

Compliance with these requirements can be complex and may involve mechanisms like Standard Contractual Clauses (SCCs) or Binding Corporate Rules (BCRs) to ensure data protection during cross-border transfers.

Data sovereignty regulations in different jurisdictions can sometimes conflict with one another. For example, one country’s data privacy laws may require certain data protection measures, while another country’s laws may require the opposite. Organizations may face challenges in determining which regulations take precedence and how to achieve compliance.

Cloud providers

Cloud deployments can oftentimes be dispersed across multiple countries, possibly ones with different data sovereignty laws. Some of those regulations may further limit your choice of cloud provider by dictating where data is allowed to be processed.

When using cloud service providers or third-party vendors for data storage and processing, organizations must ensure that these vendors comply with data sovereignty regulations. Vendor contracts and service-level agreements (SLAs) should address compliance requirements.

Cost

Investing in a top-down culture of compliance can be a costly venture. You’ll need to implement additional processes to ensure the collection, storage, and processing of personally identifiable information (PII) are all carried out in compliance with applicable data sovereignty laws.

This can involve the deployment of monitoring tools, security assessments, and audits, all of which come with associated costs. Developing and implementing data handling procedures, including encryption and access controls, can require significant investments in technology and personnel. Plus, compliance efforts, documentation, and reporting obligations will also require dedicated personnel and resources.

The adjustments to existing business processes, such as customer data collection, storage, and processing practices, may involve changes in software, systems, and workflows, all of which can be costly.

Navigating the complexities of data sovereignty regulations often requires legal expertise. Organizations must invest in that expertise to ensure that their data-handling practices comply with local laws and regulations.

Non-compliance with data sovereignty regulations can result in significant penalties and fines, which can be financially damaging and harm an organization’s reputation.

Data sovereignty solutions

Companies need to tackle these data sovereignty concerns. But what options are available to help ease the burden of compliance?

For your business, think about how you can best achieve control over your data. This will help you ensure you’re meeting all applicable laws and regulations.

To start with, you can consider the following database deployment solutions.

Multi-cloud deployments

Multi-cloud deployments can help increase data sovereignty by giving you more control over database environments, infrastructure, and technologies. A multi-cloud setup offers the added flexibility that keeps companies from ceding control over data end-to-end. Organizations may avoid vendor lock-in by using multiple providers, giving them more negotiating power and flexibility in choosing the most suitable services. By distributing workloads across multiple cloud providers, organizations can reduce the risk associated with a single provider’s technical issues, service interruptions, or security breaches. This diversification enhances overall risk mitigation strategies.

In terms of implementing multi-cloud deployments, there are more and more sovereign clouds popping up that enable organizations to choose providers with data centers located in regions that align with their compliance needs, helping them adhere to regulations like GDPR, HIPAA, or industry-specific standards.

Hybrid cloud deployments

Cloud providers only offer data localization and geographical control in certain countries or regions. On the other hand, hybrid cloud deployments allow organizations to adapt to data sovereignty regulations in almost all countries by establishing a presence in specific regions, partnering with local data centers, or utilizing cloud providers with a strong local presence.

Similar to multi-cloud setups, hybrid cloud deployments offer mix-and-match flexibility, where you can choose which data is best fit for a particular environment. This gives you the choice to store certain sensitive data on-premises while utilizing the scalability and cost-efficiency of the cloud for non-sensitive data. This data segmentation approach allows organizations to comply with data sovereignty requirements while still taking advantage of public cloud benefits for other data types.

Sovereign DBaaS

Sovereign DBaaS is a newly emerging concept that combines the public cloud’s computing advantages with increased control over, and portability of, your data. This approach enables you to reliably scale open-source database operations without vendor or environment lock-in and without giving up control over your data to third parties.

With Sovereign DBaaS, you retain far greater ownership and independence, improved cloud cost predictability, and more cost-efficient deployment options. You stay in control of the security needs and frameworks relating to applicable regulatory and compliance requirements and can influence exactly where your data resides.

A Sovereign DBaaS implementation uses intelligent automation software with open-source tooling and scripts to replicate the DBaaS experience. It results in database scalability and reliability within environments that satisfy sovereignty requirements.

Wrapping up

As data privacy regulations continue to evolve, companies will need to rethink their database deployments and privacy practices to better protect consumer data and comply with data sovereignty laws.

Go to our Sovereign DBaaS resource page to learn more about getting more control over your data stack.

Importance of a cloud exit strategy and how to plan one

Nowadays, cloud computing has become an integral part of many companies’ operations. It offers flexibility, scalability, and cost-effectiveness. However, despite the numerous benefits, it is crucial for businesses to have a cloud exit strategy in place.

This strategy ensures a smooth transition from the cloud environment to another provider or even to an on-prem infrastructure if circumstances demand it. In this blog post, we will explore the significance of a cloud exit strategy and how to plan one.

What is a cloud exit strategy?

A cloud exit strategy is a set of actions put in place by an organization to transition its data, applications, and services from a cloud computing environment back to an on-prem infrastructure or to another cloud provider. It is essentially a contingency plan that ensures the organization can smoothly and efficiently disengage from its current cloud service provider if needed.

Top reasons you should have a cloud exit strategy

There are many reasons why an organization might need a cloud exit strategy. Let’s see some of them.

Cost savings

Depending on your usage, your traffic may keep increasing every day, and as demand grows, so do the costs. You might find yourself in a situation where monthly cloud expenses become so high that the negatives outweigh the benefits of operating in the cloud.

Compliance

Changes in regulations or legal requirements may demand something your current cloud provider can’t offer, so you will need to migrate to another provider or even to an on-premises environment to ensure compliance and data sovereignty.

Performance

You may need to move your systems to a better geo-distribution topology to increase performance, or to mitigate problems with the current cloud provider, such as degraded service quality, frequent network issues, or slowness caused by insufficient resources or poor hardware.

Data sovereignty

Data privacy regulations and compliance requirements continue to evolve. Organizations need to ensure their data is stored and processed in accordance with applicable laws. A cloud exit strategy allows businesses to regain control over their data and comply with any necessary regulations, such as data residency requirements.

DRP and cloud outages

A cloud provider going down entirely is unlikely, but the probability is not zero, and it is a risk you should account for in a Disaster Recovery Plan (DRP). In that case, a cloud exit strategy is what keeps you from prolonged downtime.

Cloud providers do offer many options for implementing a Disaster Recovery Plan, like high availability, redundancy, and backups, but sometimes that is not enough. And if you rely only on the tools offered by the cloud provider, you may fall into a vendor lock-in trap.

Vendor lock-in

Using products that are owned by or exist only in a specific cloud provider can become a significant problem if an organization wishes to change providers or if a provider goes out of business. This can be mitigated with a cloud exit strategy based on open-source technologies and standardized cloud environments.

How to plan a cloud exit strategy

A cloud exit strategy plan depends on business requirements, which differ for each company. As a basic approach, you can follow these steps:

1. Document the current environment

To know what to do, you first need to know what you have. As a starting point, collect information and document the current environment if you haven’t already. That means understanding the existing infrastructure, services, and dependencies, and documenting critical information such as data locations, dependencies, configurations, and contractual obligations.

2. Identify alternative solutions

Research and evaluate potential alternatives, such as moving to another cloud provider or transitioning back to an on-prem environment. Consider factors like cost, scalability, performance, security, and compatibility with existing systems.

3. Contractual and legal considerations

Review existing contracts and agreements with the current cloud provider, including terms related to termination, data ownership, and data retrieval. Understand any potential costs, penalties, or limitations associated with ending the relationship.

4. Data and application migration planning

Determine the migration strategy and roadmap. Assess data transfer methods, potential downtime or service interruptions, and compatibility between the existing and target environments. Plan for any necessary data transformation, restructuring, or reformatting.

5. Test the migration plan

Emulate the migration process in a test environment. Ensure that proper testing, validation, and verification procedures are in place to mitigate risks and minimize downtime when you run it in production.

By having a well-defined cloud exit strategy, organizations can minimize disruptions, maintain data integrity, and effectively transition from one cloud provider to another or from the cloud to an on-prem environment if necessary. However, it is not an easy task, so some help is useful, and this is where systems like ClusterControl come into play.

How ClusterControl supports a cloud exit strategy

ClusterControl is a management and monitoring system that helps to deploy, manage, monitor, and scale your databases from a user-friendly interface. You can automate many database tasks you must regularly perform, like adding new nodes, running backups and restores, and more.

It supports different database technologies in different environments, no matter where they are running, and it is not necessary to have all the database nodes in the same place.

[Image: ClusterControl dashboard displaying the health and operational status of four deployed clusters]

In case you are planning a cloud exit strategy, you can take advantage of all the ClusterControl features to help in the process.

Deployment and management

With ClusterControl, you can create your database cluster using the deploy feature. Simply select the option “Deploy” and follow the instructions that appear.

Here, you will find all your database clusters on the same platform, no matter where they are deployed or what technology you use. It supports database vendors like Percona, MariaDB, PostgreSQL, and more. You can clone your existing cluster and run it in a different provider with only a couple of clicks.

Also, you can add new nodes in a different provider and promote them after everything is in place to minimize downtime.

Note that if you already have a cluster running on server instances created in a provider or on-prem, you should select “Import Existing Server/Database” instead. If you want ClusterControl to create the instances in a cloud provider for you, use the “Deploy in the Cloud” feature, which relies on an AWS, Google Cloud, or Azure integration to create the VMs, so you don’t need to access the cloud provider’s management console at all.

Vendor lock-in

As ClusterControl deploys and uses open-source technology, you don’t need to worry about vendor lock-in. Everything that you run/deploy with ClusterControl, like backups or load balancers, is open-source software, so you can install it wherever you want, and you don’t need to use any specific cloud provider or environment.

Security

While ClusterControl can’t configure access to your database nodes beforehand, it uses SSH connections with SSH keys to manage your database clusters securely. So, even across different environments, connections and traffic between ClusterControl and the different providers remain safe.

It can manage your database users, so you can restrict access to your cluster from some specific sources. Also, you can create custom templates for deploying your databases, so you don’t need to depend on the provider configurations.

Audit and encryption features are available for some database technologies. You can enable and use them from the same platform and avoid depending on the provider’s equivalents, if it has them.

Monitoring

ClusterControl allows you to monitor your servers in real time with predefined dashboards to analyze some of the most common metrics. It allows you to customize the graphs available in the cluster, and you can enable agent-based monitoring to generate more detailed dashboards. You can also create alerts that inform you of events in your cluster or integrate with different services such as PagerDuty or Slack.

High availability and scalability

After you have the cluster managed by ClusterControl, you can easily add new database nodes, load balancers, or even run failover tasks from the same system and change or improve your database topology as you wish.

Auto-recovery

One of the most important features of ClusterControl is Auto-Recovery. It automatically recovers your cluster in case of failure and lets you know when that happens by sending notifications and raising alarms.

Backups

ClusterControl has many advanced backup management features that allow you not only to take different types of backups in different ways, but also to compress, encrypt, and verify them, and more.

The automatic backup verification tool is also useful to make sure that your backups are good to use if needed and avoid problems in the future.

S9S-Tools

S9S-Tools is a command-line client for interacting with, controlling, and managing database clusters through the ClusterControl platform. Starting from version 1.4.1, the installer script automatically installs this package on the ClusterControl node. You can also install it on another computer or workstation to manage the database cluster remotely. Communication between this client and ClusterControl is encrypted and secured through TLS. The command-line project is open source and publicly available on GitHub.

S9S-Tools opens the door to cluster automation, as you can easily integrate it with existing deployment automation tools like Ansible, Puppet, Chef, or Salt. A minimal example follows.
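
As a rough sketch of that kind of integration, the snippet below shells out to the s9s client from Python, something you could drop into an existing automation script. The `s9s cluster --list --long` invocation reflects the public s9s documentation, but treat the exact flags and output format as assumptions to verify against your installed version.

```python
# Rough sketch: driving the s9s command-line client from a Python automation script.
# The command and flags follow the public s9s documentation; verify them against
# the version installed alongside your ClusterControl controller.
import subprocess

def list_clusters() -> str:
    """Return the human-readable cluster list known to the ClusterControl controller."""
    result = subprocess.run(
        ["s9s", "cluster", "--list", "--long"],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI reports an error
    )
    return result.stdout

if __name__ == "__main__":
    print(list_clusters())
```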

Conclusion

Having a cloud exit strategy is critical for any organization. It allows businesses to proactively manage risks, maintain data sovereignty, and adapt to evolving business needs.

By following the steps outlined in this blog post and with the help of ClusterControl, businesses can plan and execute a seamless transition from one cloud provider to another or from the cloud to an on-premises environment.

Want to take ClusterControl’s UI for a test drive? Check out our no-commitment sandbox environment here. In the meantime, follow us on LinkedIn and Twitter to keep pace with the evolving database, orchestration tooling, and environment landscape.

What’s new in SQL Server 2022

SQL Server 2022 is the latest database version released by Microsoft in November 2022. The release builds upon previous versions to add more choice for SQL Server users when it comes to development languages, data types, replication to and from Azure, and more.

You will find full support for the release in ClusterControl, as of version 1.9.7.

Here, we will highlight the new features, functionalities, and improvements introduced in SQL Server 2022 that add value for anyone interested in using this particular version.

Let’s dive in.

New features and functionalities in SQL Server 2022

SQL Server 2022 includes a number of new features and performance enhancements. Let’s take a look at each one in more detail.

Azure Synapse Link for SQL

Azure Synapse Link for SQL 2022 is a feature that enables you to replicate data from SQL Server 2022 to Azure Synapse Analytics in near real-time. It allows you to run analytics, business intelligence, and machine learning scenarios on your operational data without impacting the performance of your source database.

Because replication happens in near real-time, you get insights into your data faster and can make better decisions.

Since Azure Synapse Link for SQL uses change tracking to replicate data from your SQL Server 2022 database, there is minimal impact on the performance of your source database.

Azure Synapse Link for SQL 2022 can be used for analytics, business intelligence, and machine learning.

Object storage integration

SQL Server 2022 can now be integrated with S3-compatible object storage and Azure Storage, allowing you to store your data more flexibly and cost-effectively. The object storage integration can be done in the following ways.

  • You can integrate SQL Server with S3-compatible storage using the Backup to URL feature, taking backups to and restoring from an s3:// endpoint over its REST API (a short sketch follows this list).
  • SQL Server 2022 now supports Data Lake Virtualization, which allows you to query and analyze data in Azure Data Lake Storage without moving the data to SQL Server.
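
To give a flavor of the Backup to URL feature mentioned above, here is a hedged sketch that sends the documented T-SQL from Python with pyodbc. The endpoint, bucket, access keys, and connection string are placeholders, and you should double-check the credential and URL syntax against Microsoft’s documentation for your build.

```python
# Hedged sketch: backing a SQL Server 2022 database up to S3-compatible object
# storage by sending the documented T-SQL through pyodbc. Endpoint, bucket,
# keys, and connection details are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=sql2022.example.com;"
    "DATABASE=master;UID=sa;PWD=change-me;TrustServerCertificate=yes",
    autocommit=True,  # BACKUP cannot run inside a user transaction
)
cur = conn.cursor()

# One-time credential; its name must match the URL prefix of the target bucket.
cur.execute("""
    CREATE CREDENTIAL [s3://minio.example.com:9000/sqlbackups]
    WITH IDENTITY = 'S3 Access Key',
         SECRET   = 'ACCESS_KEY_ID:SECRET_ACCESS_KEY';
""")

# Take the backup straight to object storage.
cur.execute("""
    BACKUP DATABASE [SalesDB]
    TO URL = 's3://minio.example.com:9000/sqlbackups/SalesDB.bak'
    WITH FORMAT, COMPRESSION;
""")
while cur.nextset():  # drain informational messages so the batch finishes cleanly
    pass

conn.close()
```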

Data Virtualization

SQL Server 2022 adds support for querying external data using PolyBase with Oracle TNS files, the MongoDB API for Cosmos DB, and ODBC.

Azure SQL Managed Instance Link

SQL Server 2022 provides the Azure SQL Managed Instance link, which enables near real-time data replication from SQL Server to Azure SQL Managed Instance. Using this functionality, you can:

  • Scale read-only workloads
  • Offload analytics and reporting to Azure
  • Migrate data to Azure cloud
  • Disaster recovery

Contained Availability Group SQL Server

Before SQL Server 2022, you had to manage users, logins, permissions, and SQL Server Agent jobs separately on each replica of an Always On availability group. SQL Server 2022 introduces contained availability groups, which provide:

  • Management of system objects such as users, logins, permissions, and SQL Server Agent jobs at the availability group level.
  • Contained system databases (master and msdb) that are specific to the availability group.

Enhanced security

SQL Server 2022 includes new security features for safeguarding the database systems. A few essential security features are below.

  • Microsoft Defender for Cloud integration: It helps protect SQL Server environments across on-premises, hybrid, and cloud deployments.
  • Microsoft Purview integration: You can apply Microsoft Purview access policies to SQL Server instances enrolled in Azure Arc and in Microsoft Purview Data Use Management.
  • Ledger: The Ledger feature provides tamper-evidence capabilities in the database, so you can prove that data has not been altered after the fact (see the sketch after this list).
  • Azure Active Directory authentication: You can configure Azure Active Directory authentication for SQL Server database connections.
  • Always Encrypted with secure enclaves: Confidential queries using secure enclaves can now include JOIN, ORDER BY, and GROUP BY clauses, with support for UTF-8 collations.
  • Support for the MS-TDS 8.0 protocol: SQL Server 2022 supports TDS 8.0 and TLS 1.3 for encrypting data in transit.
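
As a small illustration of the Ledger feature from the list above, the sketch below creates an append-only ledger table through pyodbc. Connection details are placeholders, and while the LEDGER = ON option follows the documented syntax, confirm it against your SQL Server 2022 edition before relying on it.

```python
# Hedged sketch: creating an append-only ledger table in SQL Server 2022 via
# pyodbc. Connection details are placeholders; the LEDGER = ON syntax follows
# Microsoft's documentation and should be verified against your edition.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=sql2022.example.com;"
    "DATABASE=AuditDB;UID=sa;PWD=change-me;TrustServerCertificate=yes",
    autocommit=True,
)
conn.execute("""
    CREATE TABLE dbo.AuditEvents (
        EventId   INT           NOT NULL,
        EventData NVARCHAR(200) NOT NULL
    )
    WITH (LEDGER = ON (APPEND_ONLY = ON));  -- rows can be inserted but never updated or deleted
""")
conn.close()
```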

Query Store hints and intelligent query processing

  • Query Store hints allow you to provide SQL Server with additional information about your queries, which can help to improve performance (a brief example follows this list).
  • SQL Server 2022 includes a number of new intelligent query processing features, such as adaptive query optimization, memory grant feedback, parameter-sensitive plan optimization, degree of parallelism (DOP) feedback, and query parallelization.
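
For example, the hedged sketch below pins a hint onto a tracked query using the sys.sp_query_store_set_hints procedure. The query_id value is purely illustrative (you would look up the real one in sys.query_store_query first), and the connection details are placeholders.

```python
# Hedged sketch: attaching a Query Store hint to a tracked query in SQL Server 2022.
# The query_id is illustrative only; look the real one up in sys.query_store_query.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=sql2022.example.com;"
    "DATABASE=SalesDB;UID=sa;PWD=change-me;TrustServerCertificate=yes",
    autocommit=True,
)
# Force a recompile on every execution of query 42 without touching application code.
conn.execute(
    "EXEC sys.sp_query_store_set_hints @query_id = ?, @query_hints = N'OPTION(RECOMPILE)';",
    42,
)
conn.close()
```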

Performance Improvements

SQL Server 2022 includes the following performance improvements.

  • Improved in-memory OLTP performance: faster startup times and improved query performance.
  • Reduced I/O: improved buffer pool management and better compression.
  • Optimized query processing: a number of optimizations, such as improved query plan caching and better adaptive query optimization.
  • Improved performance for columnstore indexes, temporal tables, and machine learning workloads.

Wrapping up

SQL Server 2022 is a significant release that includes a number of new features and enhancements that add a lot of value for anyone already using SQL Server, as well as anyone curious to try it out for the first time.

ClusterControl provides full lifecycle support for both SQL Server 2022 and the corresponding enterprise binary, which you can try in a sandbox demo of the platform.

Stay on top of all things SQL Server by subscribing to our newsletter below.

Follow us on LinkedIn and Twitter for more great content in the coming weeks. Stay tuned!

3 principles of Sovereign DBaaS and how ClusterControl supports them

In today’s era, where decision making has to be rapid and data-driven, the data that your business aggregates is increasingly critical. And as more and more regulation is passed, data sovereignty is now a significant area to address.

The concept of data sovereignty is important to incorporate into any organization’s data architecture design. Concerns over data privacy, regulatory requirements, and the ever-increasing threat of cybercrime mean you need to properly safeguard your data.

However, development speed is also considered essential to gain, or maintain, a competitive advantage. This, along with easy access to compute resources, pushes many businesses towards managed services like DBaaS (Database as a Service).

This type of offering, from the hyperscalers and others, allows you to build out data pipelines quickly. But that speed benefit can come at the cost of losing control over your data.

How to recover that control? One option is to consider a Sovereign DBaaS implementation.

Let’s explore the three core principles of Sovereign DBaaS. In doing so, we will also discuss the major differences between a traditional DBaaS and Sovereign DBaaS approach.

3 principles of Sovereign DBaaS

The concept of a Sovereign DBaaS rests upon three core principles:

  • End-user independence
  • Environment agnosticism
  • Open-source and source-available software

Ultimately, the goal of implementing a sovereign DBaaS is to regain more control over your data stack. Each principle supports this goal in its own way.

1. End-user independence

A Sovereign DBaaS user wants to be independent when it comes to the visibility and control of the database layer.

In traditional DBaaS models, organizations often have limited visibility and control over the environment. You are presented with a self-service portal where you can perform only the actions that the DBaaS provider has exposed.

For example, you have specific options around deployment, configuration, and some particular changes in the database topology.

But you don’t know what kind of environment is going to be used, how the hardware is configured, or where exactly the database nodes are located.

You have to take some things for granted. And this severely limits your visibility.

For example, how are you going to perform a root cause analysis if you do not have access to the underlying infrastructure?

How will you diagnose performance issues if you don’t know how the virtual machines and their hypervisors are acting?

You can check what kind of network traffic you generate. However, you have no means to see and verify how this traffic is affected by tens of other services that do not belong to you but still reside in that segment of the network.

Sure, the network traffic is encrypted and separated using VLANs or other means. But it still goes through the same network cables and network infrastructure, and as such, it may interfere with the traffic that you generated.

Building a Sovereign DBaaS allows you to take full control over all of the aspects of the infrastructure. You get to decide how you want to connect servers, whether you use a dedicated or shared network, and if you’ll have direct connections between servers.

With more control over the infrastructure, you receive more insight into what is happening under the hood. You can monitor every aspect of the hardware and software that you use – debugging gets easier as you have more control over the environment.

You will be the one who is going to deploy and manage all hardware and software (ideally, using some software to help you with that, but it is still your responsibility).

This gives you full control over the location of your data. You know precisely where the data is located: not only which server it sits on, but even which disk array in your SAN (Storage Area Network) stores the database directory. There is no way data can slip through the gaps and end up in a different location or country.

2. Environment agnosticism

You may not realize it at first, but with traditional DBaaS providers you can eventually find yourself locked into a particular environment or ecosystem.

For example, let’s look at AWS. Let’s say you are using AWS Aurora and you are perfectly happy with the service. But new legal restrictions are introduced that dictate you must keep control over your data.

Can you use Aurora in your local data center? Well, no. You cannot.

You could explore AWS Outpost. But firstly, it does not support Aurora as it is too tightly integrated with AWS’ own infrastructure. And secondly, AWS Outpost is pretty much a black box that is installed in your datacenter.

To what extent it would comply with the legal restrictions that you are now obligated to respect is a longer story and discussion.

Also, you have to consider how much control you have when using solutions like AWS Outpost. If you don’t fully understand how your data is transferred and where exactly it’s located, then arguably you don’t have control. And for regulatory purposes, can you ever be fully confident that your data is only located and processed in a local data center?

Make no mistake, this does not mean that AWS is malicious and sends your local data overseas.

It’s just that Outpost can be integrated quite easily with your cloud infrastructure. Then you are just a couple of keystrokes from writing some lines of code that utilize, let’s say, AWS Lambda to process the data.

This is a problem because Lambda does not run on premises (unless you specifically configure it like that) and you may not notice that your data is leaking out to the public AWS regions.

Even if you are happily using Outpost or its equivalent, are you fully in control of your data? Can you move it quickly and easily out of the AWS ecosystem if you would want to?

The answer is – no. There is no easy way out of the AWS ecosystem.

Can you take a backup and export it to any MySQL installation that you have?

Well, technically you can use mysqldump and other logical backup tools. But for larger data sets (even those at one hundred gigabytes, not to mention larger) this method is so tedious and problematic that, while it can be done, it’s far from easy.

By comparison, Sovereign DBaaS is environment-agnostic. It doesn’t matter where you want to have your database infrastructure deployed – locally, in the cloud, or a mix of both.

You are not tied to a single CSP provider or any particular ecosystem.

Sure, establishing and maintaining connectivity across multiple CSPs and on-prem infrastructure might not be trivial. But if you manage to accomplish that, then a Sovereign DBaaS implementation supported by the right tools will give you a full management solution. One that provides you with a single pane of glass to see and manage all of your infrastructure.

3. Embrace open-source and source-available software

The software that you choose to run on is an important factor to consider, and one that could lead to data sovereignty and portability issues down the road.

For example, ask yourself the following questions:

  1. Is MySQL RDS or MySQL Aurora really MySQL?
  2. Is PostgreSQL Aurora really PostgreSQL?
  3. Can you take data in a binary format from Aurora and deploy it on MySQL that you installed from the packages available for your Linux distribution?

In each case, the answer is no.

Amazon states that both MySQL and PostgreSQL Aurora are “wire-compatible” with MySQL and PostgreSQL.

What this really means is that your application can talk to the database using MySQL or PostgreSQL protocol but the underlying database itself may not have anything to do with standard open source MySQL or PostgreSQL deployments.

In fact, you cannot promise 3x performance unless you have reengineered the database code to take advantage of the AWS infrastructure.

This is a perfectly valid solution and Aurora is a great piece of software. But from an external standpoint, you have no control over the knobs and gears that are turning inside. You don’t even know what knobs and gears are there to begin with.

You just hope for the best and let AWS run your data.

If you’re not happy with those limitations, or if you need more control over your data, it may be time to explore other options available to you.

One of which is to build your own Sovereign DBaaS.

In such a setup, you are free to utilize truly open-source database software. And since these are freely available, this can help to reduce your costs.

As well as being free, these technologies are pretty popular. This means if you need to find resources familiar with MySQL or PostgreSQL, it shouldn’t be too difficult or lengthy of a process.

The benefits of open-source databases go well beyond cost savings, though.

For example, open source databases are the same no matter where you deploy them. It is the same code that is running. You can easily migrate from a Redis running on one CSP to a Redis running on another cloud, or locally in your datacenter.

The same is true for MySQL, PostgreSQL, MongoDB, and so on.

This gives you flexibility to create multi-cloud environments that span across multiple CSPs. It also lets you swap one environment to another. If you need to move data between different clouds, or from one cloud to an on-prem data center, this is perfectly doable.

Compatibility is important not just for moving your data to another location but also for interacting with your data.

In many of the managed DBaaS services offered by CSPs, compatibility becomes a problem because the software they offer is typically modified to fit their needs. PostgreSQL is not PostgreSQL anymore. It is a black box that provides a PostgreSQL API.

The problem is – this API may differ, even if slightly, across providers, making it tricky to utilize the same code to communicate with, theoretically, the same database. A bunch of ‘if X then Y’ is needed to work around those unexpected differences.

In a Sovereign DBaaS implementation that uses open-source or source-available database software, you do not have to worry about these slight differences in behavior between CSPs.

How ClusterControl supports a sovereign DBaaS implementation

ClusterControl is a full lifecycle database ops automation platform for open-source and source-available databases.

The software supports a Sovereign DBaaS implementation by providing you with a single management console for your clusters, no matter where they are deployed.

Let’s take a look at the role ClusterControl plays within a Sovereign DBaaS setup.

Monitoring

ClusterControl provides you with a set of dashboards that are intended to show the most important metrics. But it is you who are behind the steering wheel.

If you want to see more, or configure the dashboards in different ways, all you need to do is follow the available online guides to install external software like Grafana. Once installed, ClusterControl will continue to store the metrics in Prometheus, and you will be able to visualize them in the way you want to see them.

You can take the metrics directly from the database node or you can plug into the time-series datastore that ClusterControl uses – Prometheus, another open source solution, widely used across the industry.
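
As a small illustration of that openness, the sketch below reads one metric back out of Prometheus over its standard HTTP API with the requests library. The Prometheus URL and the metric name (a typical mysqld_exporter gauge) are assumptions to adjust to your own exporters.

```python
# Minimal sketch: querying the Prometheus instance behind ClusterControl's
# agent-based monitoring through the standard HTTP API. The URL and metric
# name are assumptions; adjust them to whatever your exporters expose.
import requests

PROMETHEUS_URL = "http://cc-prometheus.internal:9090"

resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query",
    params={"query": "mysql_global_status_threads_connected"},
    timeout=10,
)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    instance = series["metric"].get("instance", "unknown")
    _timestamp, value = series["value"]
    print(f"{instance}: {value} connected threads")
```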

If something fails, and we all know that eventually something will fail, you have full control over the whole environment.

It is your data center, your network gear, your servers, your hardware in those servers, and your Storage Area Network. It is also your responsibility to set up proper monitoring for that environment. After all, with great power comes great responsibility. There’s no one else who will be taking actions on any faults in your setup.

The best part is, though, if you did the preparations properly and you collect all metrics, you can perform the Root Cause Analysis down to the port in the switch or a NIC that might be having a bad day.

ClusterControl will assist you with the database part, collecting database logs and alerting on detected anomalies. But if that’s not enough, you can dig down layer after layer and see exactly why that network timed out, which resulted in a loss of connectivity between database nodes and a degradation of the database cluster.

Environment and vendor lock-in

Whether we are talking about environment lock-in or DBaaS vendor lock-in, the risk is significantly reduced with a Sovereign DBaaS implementation supported by ClusterControl.

When you choose to go with a traditional DBaaS provider, you’re limited to the database flavors that are available from their inventory.

A sovereign DBaaS implementation with ClusterControl provides you with a wider range of open-source and source-available database options.

You can deploy, monitor, and manage different flavors for specific use cases. And you’re free to run your clusters in whichever environment makes the most sense for your needs – on-premise, in the cloud, or in a hybrid setup.

This also grants you two key elements that are not present in traditional DBaaS:

1. Access to deeper levels of the database and the infrastructure layer, so that you can make changes that align your systems with your use case.

2. Portability via open-source databases and tooling that can be implemented anywhere, so that you can easily migrate to another environment if it’s ever required.

Let’s take a look at an example.

ClusterControl can be used to plan and execute a backup schedule, including features like backup verification. While doing so, ClusterControl relies on industry-standard tools such as pgBackRest, Xtrabackup, MariaBackup, or Percona Backup for MongoDB.

This means you can take every backup that you have created with ClusterControl and restore it manually on a database node. The node can be running in any kind of environment, as long as the database is the same version that you are managing using ClusterControl.

Using PostgreSQL 14? Take a backup and use it to provision data on any PostgreSQL node you may have. On prem or in any cloud that provides you with compute resources.

Are you running MariaDB 10.6? Take the backup and transfer that data to any other MariaDB 10.6 node where you can restore it just like that. There is no need for those database nodes to be managed by ClusterControl. You have the freedom to install them from scratch, by hand or using Ansible, Chef, Puppet, or a bash script that you wrote.

Environment lock-in becomes less of an issue with ClusterControl because, as a matter of fact, ClusterControl does not care where your resources are located. As long as it can connect to a database instance using SSH connectivity, it will happily manage that node.

Cluster nodes can be located wherever you choose. As long as there is network connectivity and SSH connection can be made, that’s all that’s needed.

Wrapping up

Traditional DBaaS services provide a lot of value, but you may find that they start to cause issues around the access and portability of your data. Ultimately, this leads to a lack of control.

An alternative approach is to build your own Sovereign DBaaS solution, supported by ClusterControl. This is cloud-agnostic and fully controlled by you – from the infrastructure to the database access management. It makes it easy to deploy, manage and operate a whole set of data stores, both open source and proprietary.

Your data can stay wherever you want it to, utilizing single or multiple cloud providers (or even on-prem). No vendor lock-in will limit your options and you are free to use any environment.

Stay on top of all things Sovereign DBaaS by subscribing to our newsletter below.

Follow us on LinkedIn and Twitter for more great content in the coming weeks. Stay tuned!

What’s new in MongoDB 6

MongoDB 6.0 is a major release by MongoDB launched in July 2022. In the release, MongoDB introduced a bunch of new features and improvements aimed at removing complexity – so that users don’t have to troubleshoot as often, and can ultimately build at a faster pace.

You will find full support for the release in ClusterControl, as of version 1.9.7.

Here, we will highlight seven new features in MongoDB 6.0 that add value for anyone using this particular database flavor.

Let’s dive in.

1. Extra support for time series data

Time series data is used across diverse industries, including finance, IoT, and telecommunications. With the way that modern applications have evolved, this type of data has become increasingly crucial.

In the last major release (MongoDB 5.0), the introduction of time series collections meant that you could properly collect, process, and analyze this data for access to new insights. The 5.0 release also addressed issues like high volume, storage and cost concerns, and gaps in data continuity.

Support for time series data is extended in MongoDB 6.0, where secondary and compound indexes on measurements have been introduced, consequently increasing read performance.

With this, MongoDB has opened up the potential for geo-indexing.

Geo-indexing allows you to enrich and broaden your analysis, since scenarios involving distance and location can now be included as well.

Apart from that, both query performance and sort operations have also been improved, since data points in a series can now be retrieved efficiently without scanning the entire collection.
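
To show what that looks like in practice, here is a hedged sketch using PyMongo: it creates a time series collection and then adds a compound secondary index on a metadata field and a measurement. The connection string, database, and field names are placeholders.

```python
# Hedged sketch: a MongoDB 6.0 time series collection with a compound secondary
# index, created through PyMongo. Connection string and field names are placeholders.
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
db = client["telemetry"]

# Time series collections declare their time and metadata fields at creation time.
db.create_collection(
    "sensor_readings",
    timeseries={
        "timeField": "ts",        # when the measurement was taken
        "metaField": "sensor",    # which device or site produced it
        "granularity": "minutes",
    },
)

# New in 6.0: secondary and compound indexes on metadata and measurement fields.
db["sensor_readings"].create_index(
    [("sensor.region", ASCENDING), ("temperature", ASCENDING)]
)
```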

2. Improved change streams

Modern applications like Waze or Uber rely on event-driven architectures to handle real-time data such as activity feeds and notifications. They also need recommendation engines to provide seamless user experiences.

In these instances, the data will keep changing and requires the application to identify modifications in a short amount of time.

This is where change streams, introduced in MongoDB 3.6, come into play. Change streams provide an API to stream any modifications to a MongoDB database, cluster, or collection without jeopardizing performance.

MongoDB 6.0 has added new abilities to improve the way you can work with change streams.

For example, you can get the before and after state of a document that’s modified, which allows you to send revised versions of entire documents downstream, reference deleted documents, and so on.

Additionally, change streams in 6.0 now support data definition language (DDL) operations, such as creating or dropping collections and indexes.

The outcome? More responsive, scalable, and dynamic applications that align with user expectations.
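
A hedged sketch of those capabilities with PyMongo 4.2+ is shown below: it enables pre- and post-images on a collection and then tails its change stream. It assumes a replica set is running locally; the database, collection, and connection details are placeholders.

```python
# Hedged sketch: consuming a MongoDB 6.0 change stream with both the pre- and
# post-change versions of each document (PyMongo 4.2+, replica set required).
# Database, collection, and connection details are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client["shop"]
orders = db["orders"]

# Pre/post images must be switched on per collection before they can be streamed.
db.command("collMod", "orders", changeStreamPreAndPostImages={"enabled": True})

with orders.watch(
    full_document="updateLookup",                 # document state after the change
    full_document_before_change="whenAvailable",  # document state before the change (new in 6.0)
) as stream:
    for change in stream:
        print(
            change["operationType"],
            change.get("fullDocumentBeforeChange"),
            change.get("fullDocument"),
        )
```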

3. Deeper insights and faster queries

Querying data is an essential element of database management. MongoDB 6.0 enhances querying capabilities with improvements to two key operators, $lookup and $graphLookup, which power joins and graph traversals.

Both $lookup and $graphLookup now fully support sharded deployments, in addition to a performance upgrade for $lookup.

For example, if a larger number of documents are matched, $lookup performance will be twice as fast as the previous version.

You can actually get results between five and ten times faster than previous iterations if you have a scenario where there is an index on the foreign key and a small number of documents have been matched.

MongoDB 6.0 now allows your applications to perform complicated analytical queries against a globally and transactionally consistent snapshot of your live, operational data. This is through the introduction of read concern snapshot and the optional atClusterTime parameter. With these features, the consistency of the query results returned to the users is well-conserved.

4. Additional operators for automation

Without a doubt, productivity is a number one priority for any development team. With MongoDB 6.0, you can now locate vital values in a data set thanks to an array of new operators like $maxN, $minN, or $lastN. And if you need to sort elements of an array directly in your aggregation pipelines, the new $sortArray operator comes in handy.

With these new operators, you spend less time manipulating data manually and writing code; they take over essential operations that previously required long blocks of hand-written code.

As a result, you should see a productivity boost and have more time to concentrate on other assignments.
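
As a quick, hedged illustration, the aggregation below uses $maxN and $sortArray through PyMongo; the collection and field names are invented for the example.

```python
# Hedged sketch: the $maxN accumulator and $sortArray operator in a MongoDB 6.0
# aggregation pipeline via PyMongo. Collection and field names are invented.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
games = client["arcade"]["games"]

pipeline = [
    # $maxN: keep the three highest scores per player (new operator family in 6.0).
    {"$group": {"_id": "$player", "topScores": {"$maxN": {"input": "$score", "n": 3}}}},
    # $sortArray: re-order that array ascending server-side, no client code needed.
    {"$project": {"topScores": {"$sortArray": {"input": "$topScores", "sortBy": 1}}}},
]

for doc in games.aggregate(pipeline):
    print(doc["_id"], doc["topScores"])
```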

5. Upgrades for initial sync and sharding

Downtime is a nightmare for businesses, and database maintenance can be a significant contributor. This is one of the reasons why MongoDB introduced initial sync via file copy in MongoDB 6.0, which is up to four times faster than previously available methods.

It’s worth noting that this feature is only available with MongoDB Enterprise Server. 

On a side note, ClusterControl also supports MongoDB Enterprise clusters.

Besides initial sync via file copy, MongoDB 6.0 also introduces significant improvements to sharding, the mechanism that facilitates horizontal scalability.

The default chunk size for sharded collections has been increased to 128 MB. The benefit is fewer chunk migrations and greater efficiency, both from a networking standpoint and in internal overhead at the query routing layer.

In addition, to reduce the impact of the sharding balancer, a new configureCollectionBalancing command has been introduced, allowing the defragmentation of a collection.

6. New options for data security and operational efficiency

Security and operational efficiency are critical considerations for any organization. MongoDB 6.0 introduces enhancements like extended support for KMIP-compliant key management providers for Client-Side Field-Level Encryption (CSFLE), which has been available since its launch in 2019.

This standardization streamlines the management of cryptographic objects like encryption keys and certificates while making it more efficient to secure sensitive data.

Additionally, the version’s encrypted audit events reinforce accountability, preserving the confidentiality and integrity of logs even when they are shared with central log management systems or a SIEM.

Announced at MongoDB World 2022, Queryable Encryption, a groundbreaking feature, is now available in preview. Queryable Encryption enables powerful queries on encrypted data while keeping it encrypted until it is made available to the user.

This introduces a paradigm shift, allowing efficient data querying without the need for prior decryption, thus providing data security throughout its lifecycle.

Wrapping up

MongoDB 6.0’s new features and improvements focus on optimizing development and operations, eliminating data silos, and simplifying complex architectures. This ultimately fosters a more productive environment for innovation and creation.

The version adds a lot of value for anyone already using MongoDB, as well as anyone curious to test the technology.

ClusterControl provides full lifecycle support for both MongoDB 6.0 and the MongoDB enterprise binary, which you can try in a sandbox demo of the platform.

Stay on top of all things MongoDB by subscribing to our newsletter below.

Follow us on LinkedIn and Twitter for more great content in the coming weeks. Stay tuned!

Best Practices for Scaling A Multi-Cloud Database Infrastructure

Scaling a multi-cloud database infrastructure can be a complex and challenging task, but it’s essential for businesses that require high scalability, availability, and reliability.

With multi-cloud becoming increasingly popular, organizations are leveraging the benefits of cloud providers to achieve better performance, flexibility, and cost savings. 

However, scaling a multi-cloud database infrastructure requires careful planning, execution, and monitoring to ensure optimal resource utilization, data privacy, and security. 

Before you get started, it’s important to understand the main challenges of scaling a multi-cloud database infrastructure, as well as a few best practices that you’ll need to keep in mind.

We’ll cover all of this, plus the key database design considerations and techniques for auto-scaling, load balancing, and monitoring.

Let’s dive in.

Challenges of scaling a multi-cloud database infrastructure

Multi-cloud architectures come with a number of challenges. It’s no surprise, then, that scaling a multi-cloud database infrastructure also presents its own set of difficulties that need to be understood and overcome.

Complexity

The first one is complexity. Managing a multi-cloud database infrastructure can be complex and challenging because you have to deal with multiple vendors, services, and APIs. It requires expertise in managing and configuring multiple database systems and ensuring they work together seamlessly.

As you increase the complexity of the environment and infrastructure, the probability that something unexpected happens goes up. This, again, tends to add even more complexity: you want redundant systems to survive unexpected scenarios, and redundancy adds to complexity.

Cost

The next challenge is cost. Multi-cloud database infrastructure scaling can be costly, especially if not managed properly. Additional costs may be associated with data replication, data transfer, and data storage across multiple clouds.

We have already mentioned complexity, which requires expertise. Expertise is not easy to find on the market, and when you do find it, it comes with a price tag that adds to the overall cost.

Security

Ensuring consistent security policies and access control across multiple clouds can also be challenging, especially when dealing with different vendors and services.

Every infrastructure provider has specific details that make it unique. These idiosyncrasies make it difficult to design security solutions that fit all providers at once. What’s more, any required changes down the road can potentially pose security risks if you haven’t taken every unique detail into account.

Performance

Performance issues can arise when scaling multi-cloud database infrastructures. The increased complexity of the infrastructure can cause delays in data access and transfer times.

Downtime in one cloud can impact the entire infrastructure, leading to significant disruptions in business operations. This could lead to a chain reaction with data synchronization issues, especially when data is updated on multiple clouds simultaneously.

Best practices for scaling a multi-cloud database infrastructure

Whether you need to scale because of a sudden spike in activity or a more gradual increase in workload, there should be some kind of plan in place for how you will address the scenario. There are a few best practices to keep in mind when putting that plan together.

Let’s take a look at them one by one.

Choose the right cloud service providers

First and foremost, you’ll need to choose cloud providers that are compatible with each other and suitable for your business needs. This includes evaluating each cloud provider’s cost, performance, security, and scalability.

Some cloud providers target a specific market segment. For example, if you just need a simple and straightforward cloud solution for your developers, you could start with DigitalOcean.

If you are dependent on Microsoft or Windows services like Microsoft Exchange, .NET framework, or Microsoft SQL Server, it is probably best to use Microsoft Azure Cloud solutions.

Establish a centralized management platform

Use a centralized management platform to manage and monitor your multi-cloud database infrastructure. This will allow you to view and manage all your databases and cloud resources from a single location.

For database-as-a-service, ClusterControl can manage database clusters across multiple cloud providers. The currently supported cloud providers are Microsoft Azure, Google Cloud Platform, and Amazon Web Services.

You can choose the traditional management approach via SSH (which can be achieved by exposing cloud instances or using site-to-site tunneling) or you can use the native cloud support via the cloud provider’s API and CLI tools.

ClusterControl allows you to standardize APIs to ensure compatibility across multiple clouds with its CMON RPC interface. This will enable you to manage and access your databases consistently across all your clouds.

Automate where possible

Another best practice when running on a multi-cloud infrastructure is to make use of automation tools, software, scripts, or processes to streamline your operations and reduce the risk of human error. This includes automating database monitoring, alerting, backups, scaling, and failover processes.

Monitor your multi-cloud database infrastructure closely and set up alerts for critical events. Proactive monitoring will help you identify and address issues before they become major problems.

Use replication and synchronization tools

Data consistency is essential when running on multi-cloud infrastructure.

Use replication and synchronization tools to ensure data consistency across multiple clouds. This includes ensuring that all your databases are in sync and that changes made in one cloud are reflected in all other clouds.

Optimize your databases for performance by using cloud-native services such as load balancers and auto-scaling. This will help you ensure high availability and performance across multiple clouds.

By following these best practices, you can help ensure that your multi-cloud database infrastructure scales efficiently, securely, and with optimal performance.

6 ways to scale your multi-cloud database infrastructure

Multi-cloud database infrastructure can be scaled in different ways to accommodate growing data needs, user traffic, and performance requirements. By implementing these scaling techniques, you can scale to meet business demand while maintaining optimal performance, availability, and reliability.

Here are six options for how to scale a multi-cloud database infrastructure.

1. Vertical scaling

Vertical scaling involves increasing the computing power of individual database servers by adding more CPU, memory, or storage. This can be done by upgrading the hardware of the servers or by moving to more powerful cloud instances.

Utilizing multiple cloud providers can be quite useful here, as different cloud vendors make different tiers of instances available. It may happen that, while you have maxed out the instance sizes available in one cloud, you still have the option to scale up further in another.

2. Horizontal scaling

Horizontal scaling involves adding more database servers to a cluster to distribute the workload across multiple servers. This can be done by adding more nodes to the database cluster or by deploying more instances of the database in different availability zones or cloud regions.

Multi-cloud deployments benefit from flexibility here. The cloud is not infinite, no matter what people may expect. If you need to scale out promptly, you may run out of available resources in a particular availability zone or region.

In that case, if you have another cloud provider with resources located nearby, you can utilize those instances to scale out your environment.

It is very common to have a reverse proxy tier on top of the database tier to take advantage of horizontal scaling. A reverse proxy can act as a gateway, balancer, router, and firewall to the database service as it scales in size.
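To make this concrete, below is a minimal sketch of what such a reverse proxy tier might look like with HAProxy in front of a MySQL-compatible cluster. The node names and private IP addresses are placeholders for instances running in different clouds, and the health-check settings will depend on your replication technology.

$ sudo vi /etc/haproxy/haproxy.cfg
# Route application traffic to whichever database nodes are currently reachable
listen db_cluster
    bind *:3306
    mode tcp
    balance leastconn
    option tcp-check
    server db-cloud-a 10.10.1.10:3306 check
    server db-cloud-b 10.20.1.10:3306 check
    server db-cloud-c 10.30.1.10:3306 check

In practice you would typically run one such proxy per application tier (or per cloud) so that the proxy itself does not become a single point of failure.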

3. Sharding

Sharding involves splitting a large database into smaller shards or partitions, each of which is stored on a separate database server. This allows the workload to be distributed across multiple servers and can improve query performance, especially if your application supports parallelism or is write-intensive.

With sharding, it is possible to bring the data closer to the users by splitting it across the shards using a geographical designation (country codes, postal codes, IP addresses, etc.). In a multi-cloud environment, it is easier to build the data infrastructure close to the users. For example, if one of your cloud providers does not have a data center in a particular location, it is quite likely that some other CSP will.
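As a rough illustration of the idea, MongoDB’s zone (tag-aware) sharding can pin ranges of a country-code-based shard key to shards hosted in a particular region. The database, collection, zone, and shard names below are hypothetical, and the exact key design should follow your own access patterns.

# Run against the mongos router of the sharded cluster
$ mongosh --eval '
  sh.enableSharding("appdb");
  sh.shardCollection("appdb.orders", { countryCode: 1, _id: 1 });
  // Keep documents for German customers on the shard tagged "EU"
  sh.addShardToZone("shardEU", "EU");
  sh.updateZoneKeyRange("appdb.orders",
                        { countryCode: "DE", _id: MinKey() },
                        { countryCode: "DE", _id: MaxKey() },
                        "EU");
'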

4. Replication

Replication involves creating copies of a database on multiple servers in different cloud regions. This improves read performance by allowing users to read from the copy of the database closest to them. It also provides live backup and redundancy in case the primary database server is unavailable.

Just like with sharding, using multiple cloud providers lets you bring the data closer to your users.

5. Cloud-native services

Cloud-native services such as load balancers, auto-scaling, and caching can help improve the scalability of multi-cloud database infrastructure. Load balancers can distribute traffic across multiple servers, while auto-scaling can automatically add or remove servers based on demand.

Additionally, a caching tier can be used to cache the most frequently accessed data and offload the database instances.
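As an example of the auto-scaling piece, the sketch below uses the AWS CLI to attach a target-tracking policy to an Auto Scaling group that might back a read-replica or proxy tier. The group and policy names are placeholders, and each cloud provider has its own equivalent mechanism.

# Keep the group's average CPU around 60% by adding or removing instances
$ aws autoscaling put-scaling-policy \
    --auto-scaling-group-name db-read-replicas \
    --policy-name cpu-target-60 \
    --policy-type TargetTrackingScaling \
    --target-tracking-configuration '{
        "PredefinedMetricSpecification": { "PredefinedMetricType": "ASGAverageCPUUtilization" },
        "TargetValue": 60.0
      }'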

6. Hybrid cloud

A hybrid cloud approach can be used to scale multi-cloud database infrastructure. This involves using a combination of public cloud and private cloud resources to scale the database infrastructure as needed.

There are many reasons why you may want to go hybrid, including data protection laws that require PII to be stored in a particular location or country. Having the ability to span your infrastructure not only across multiple public clouds but also on-premises may make your life easier when you have to deal with this type of regulatory requirement.

6 steps to a multi-cloud database infrastructure with auto-scaling

You can achieve a multi-cloud database cluster with auto-scaling, which can help improve your database infrastructure’s scalability, availability, and reliability while reducing costs and improving performance. 

Achieving a multi-cloud database cluster with auto-scaling involves six key steps.

1. Choose the right database system

Choose a database system that supports multi-cloud deployment and auto-scaling. Popular options include MySQL, PostgreSQL, MongoDB, and Cassandra.

Which one to choose depends on your workload and the type of data you want to store and process. This is an important step that will determine the ease of management and the performance of your database tier – make sure you make a good, well-thought-out decision.

2. Select compatible cloud providers

Opt for cloud providers that support the chosen database system and suit your business needs. This includes evaluating the cost, performance, security, and scalability of each cloud provider.

Verify what kind of interconnectivity is available for each of them – you will have to ensure that the traffic flows between them uninterrupted. You should also take into consideration the cost of that traffic.

3. Deploy your database nodes

Deploy database nodes across multiple clouds in different regions. Ensure that the database nodes are configured to communicate with each other and are synchronized with the latest data.

Keep an eye on latency – the network overhead that comes with keeping data in multiple geographical locations is one of the most important factors affecting the stability and performance of multi-cloud environments.

4. Configure a load balancer

Configure a load balancer to distribute traffic across the database nodes. This can be done using cloud-native load-balancing services or by using third-party load-balancing solutions.

No matter which you choose, make sure that the load balancers are configured properly and that the load-balancing tier itself is highly available. If you use cloud-native services, this usually comes out of the box; for external, third-party software, you may have to take extra steps, as sketched below.
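For self-managed proxies, one common approach to keeping the load-balancing tier highly available is a floating virtual IP managed by Keepalived, as in the sketch below. The interface name, addresses, and priority are placeholders, and note that many public clouds restrict VRRP and gratuitous ARP, so inside those networks a cloud-native load balancer or a routed alternative may be required instead.

$ sudo vi /etc/keepalived/keepalived.conf
# Fail the virtual IP over to a standby proxy if HAProxy stops running
vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
}

vrrp_instance VI_DB {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        10.10.1.100
    }
    track_script {
        chk_haproxy
    }
}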

5. Enable auto-scaling

Enable auto-scaling on the database nodes to automatically add or remove nodes based on the workload. This can be done using cloud-native auto-scaling services or by using third-party auto-scaling solutions.

Make sure that you test and tune the auto-scaling solution to your liking. Do you expect a slow, gradual increase in the load? Do you predict fast and sudden spikes of load? Different scenarios may require different configurations and designs.
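For predictable spikes, a scheduled action is often a better fit than purely reactive rules. A hypothetical AWS CLI example that pre-warms a replica tier ahead of business hours could look like this (names, sizes, and schedule are placeholders):

# Scale the group up every weekday morning before peak traffic
$ aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name db-read-replicas \
    --scheduled-action-name business-hours-ramp-up \
    --recurrence "0 7 * * MON-FRI" \
    --min-size 3 --max-size 10 --desired-capacity 6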

6. Monitor and optimize your cluster

Monitor the database cluster closely and optimize it for performance and cost. This includes setting up alerts for critical events, optimizing database queries, and scaling up or down based on workload patterns.

Monitoring the data tier is critical – databases underpin data availability, and data availability is a requirement for any business to operate.

Wrapping up 

Scaling a multi-cloud database infrastructure is crucial for businesses that require high scalability, availability, and reliability. However, it is complex and requires careful planning, execution, and monitoring for optimal resource utilization, data privacy, and security.

To scale efficiently, you must follow best practices around database design, auto-scaling, load balancing, and monitoring to ensure secure scaling with optimal performance.

If you’re considering a multi-cloud approach, make sure you’re aware of the common challenges and how to address them. Despite the challenges, organizations leverage the benefits of multi-cloud architecture for better performance, flexibility, and cost savings.

Stay on top of all things multi-cloud by subscribing to our newsletter below.

Follow us on LinkedIn and Twitter for more great content in the coming weeks. Stay tuned!

The post Best Practices for Scaling A Multi-Cloud Database Infrastructure appeared first on Severalnines.

]]>
ClusterControl adds Enterprise Binaries and new GUI in latest release https://severalnines.com/blog/clustercontrol-adds-enterprise-binaries-and-new-gui-in-latest-release/ Thu, 31 Aug 2023 21:43:32 +0000 https://severalnines.com/?p=28957 We’re excited to announce that ClusterControl v1.9.7 is now available! This latest release brings a major milestone —the official launch of our new web frontend, ClusterControl GUI v2. With a host of new features and improvements, this new version introduces enterprise binaries for PostgreSQL and MongoDB, new database version availability for PostgreSQL, MongoDB, MariaDB, MS […]

The post ClusterControl adds Enterprise Binaries and new GUI in latest release appeared first on Severalnines.

]]>
We’re excited to announce that ClusterControl v1.9.7 is now available! This latest release brings a major milestone — the official launch of our new web frontend, ClusterControl GUI v2. With a host of new features and improvements, this new version introduces enterprise binaries for PostgreSQL and MongoDB, new database version availability for PostgreSQL, MongoDB, MariaDB, MS SQL Server, and more.

All this is delivered via a stunning new GUI…database ops has never looked better.

We cannot wait to hear what you think about it. Let’s dive into the highlights!

New Enterprise vendor binaries

Managing enterprise-grade PostgreSQL and MongoDB clusters has never been easier. You can now choose between open-source and enterprise binaries when deploying PostgreSQL and MongoDB. This means if you have critical environments on these enterprise binaries or want to deploy new ones, you can monitor and manage EDB Postgres Advanced Server and MongoDB Enterprise clusters through their dedicated enterprise repositories when provisioning databases through ClusterControl’s single pane of glass.

ClusterControl GUI v2.0 — a modernized web frontend

A new era of database management user experience has arrived as we’ve migrated to a new innovative web frontend. ClusterControl GUI v2 is now the default web application and offers a modernized UI and improved user experience. This new version introduces an array of features to streamline your operations, including:

  • Mail notifications – Receive timely cluster alerts via email notifications for prompt responses.
  • Certificate management – Centralize SSL/TLS certificate handling, from creating new certificates to importing existing ones.
  • Advisors – ClusterControl-specific scripts for performance monitoring, giving you more control over your monitoring metrics.
  • Incident management services – Seamlessly integrate with services like PagerDuty and OpsGenie to ensure swift incident response.
  • Topology view – Glance into replication topology and access essential node actions for better management.

Upgrade to the latest database and operating system versions

ClusterControl 1.9.7 comes with support for several new database and operating system versions, including:

  • Database version updates:
    • PostgreSQL 15
    • MongoDB 6.0
    • MariaDB 10.11
    • MS SQL Server 2022
  • Operating system version updates:
    • RedHat 9
    • AlmaLinux 9
    • RockyLinux 9

Wrapping up

We’re excited for you to experience the new ClusterControl 1.9.7, so be sure to stick around as we continue to refine and expand its capabilities. For detailed notes on the latest features, including how to upgrade to ClusterControl v2 and how to start using the enterprise binaries, visit our changelog.

Thanks for being part of the ClusterControl community! Follow us on LinkedIn and Twitter, and subscribe to our newsletter to stay in the loop.

The post ClusterControl adds Enterprise Binaries and new GUI in latest release appeared first on Severalnines.

]]>
4 Major Challenges of Operating A Multi-Cloud Database https://severalnines.com/blog/challenges-of-operating-a-multi-cloud-database/ Tue, 22 Aug 2023 11:35:37 +0000 https://severalnines.com/?p=28882 Multi-cloud databases tend to be quite complex in terms of environment design. There are multiple aspects that have to be considered before deciding on the way the environment will be built. But you have to keep in mind that designing your setup is just one part of the life cycle. Once your design becomes an […]

The post 4 Major Challenges of Operating A Multi-Cloud Database appeared first on Severalnines.

]]>
Multi-cloud databases tend to be quite complex in terms of environment design. There are multiple aspects that have to be considered before deciding on the way the environment will be built.

But you have to keep in mind that designing your setup is just one part of the life cycle. Once your design becomes an actual live environment, you also have to ensure that everything is working properly.

This part is an ongoing responsibility for the years ahead.

So, what about the day to day operations? How challenging is it to keep a multi-cloud setup running?

It can feel like spinning plates at times – an intricate balancing act where you need to know how and when to adjust.

There are four main challenges that come with operating a multi-cloud database. Let’s explore them one by one.

1. Networking

Managing anything that spans across multiple cloud service providers (CSPs) means that you will have to deal with the network quite often. There are two main areas where you may experience some challenges.

Network stability

Ideally, network stability should have been taken care of as part of the architecture design. But if you have network stability issues, you need to take care of the redundancy for the inter-cloud links that you are using.

Databases do not cope well with transient network errors and loss of connectivity. Admittedly, no service likes that, but databases are special here: they store data, so their availability is critical. This is why we come up with different designs intended to keep the database up and available.

Automated failover, quorum calculation, load balancing – all of this is supposed to keep the database running. The problem is that nothing is perfect – whatever measures you implement, there is almost always some situation in which they will not be enough to protect the availability of your data.

As a result, while we write code to perform automated recovery of the services and run tests that make us confident it will work correctly, we would rather not give that code an opportunity to be tested against weird, random network failures.

Instead, you should be focused on minimizing the probability of experiencing a network failure.

Network throughput

The network also has to accommodate the data flow between the CSPs. This is why you have to be constantly on the lookout for any potential problems that could either increase the data flow between the database nodes or decrease the throughput of the network connections.

Backup processes, scaling up, rebuilding failed nodes – all those processes require data to be sent over the network. In each case, you will need to decide your approach.

For example, you need to choose whether a centralized backup server or a local backup is best. This will need to be established for every cloud that you use.

Network utilization will need to become an important part of your processes. When you build up new nodes or rebuild failed ones, they should be designed with network utilization in mind.

Your script should take extreme care to reuse local data, rather than picking random nodes and transferring the full data set over the internet.

You will need to think about the security measures that secure the network connection between clouds. Whatever you put in place needs to handle large data transfers. The speed of the transfers can also be impacted by the encryption method that you choose.
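A quick, low-tech way to see how much the cipher choice matters is to time the same transfer over the inter-cloud link with different SSH ciphers. The host name below is a placeholder, and results will vary with CPU features such as AES-NI.

# Create a 1 GB test file and compare two OpenSSH ciphers over the WAN link
$ dd if=/dev/urandom of=/tmp/sample.bin bs=1M count=1024
$ time scp -c aes128-gcm@openssh.com /tmp/sample.bin backup-host:/tmp/
$ time scp -c chacha20-poly1305@openssh.com /tmp/sample.bin backup-host:/tmp/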

2. Failover and recovery

Failover and recovery is closely related to networking, but brings its own set of considerations to the table.

Network failures, eventually, will happen – it is delusional to assume otherwise. You have to be prepared for these inevitable situations.

In most cases, there will be an automated failover method in place. You may have inherited it from previous engineers, written code that performs it yourself, or be relying on external software that provides this functionality.

It’s important to understand exactly how this process works. For example, you should be clear on how it behaves in more complex failure scenarios. After all, there are many ways in which a complex environment may experience failures. If you know your solution, you should be able to understand why the script behaved like it did and how to improve its behavior in the future.

You also need to know whether it’s quorum-aware or not, and if there is any possibility of a network split. Ultimately, it falls on you to determine if the failover process for your multi-cloud database is safe.

The ability to understand and improve the failover handling is critical when it comes to multi-cloud environments.

Automated failover is usually one of the most complex processes in database management. Even when a database performs automatic failover on its own, that only means the complexity is hidden under the hood rather than being exposed to the database administrator in full.

3. Data size

When operating a multi-cloud database, the size of the data you’re dealing with can become a problem in itself.

In most cases, data is collected and stored in the database. As more and more data is collected and stored, the database increases in size. This makes the challenge constant and, in fact, never ending.

Two areas affected by this continuous growth in data size are the network itself and your backup and recovery processes.

Network

So, you start with your initial design and you plan how the network should look. You determine what kind of throughput is required. Then you start to operate such a database.

Over time, you begin to see that every operation involving a data transfer (provisioning a new node, rebuilding a failed node, running backups) gets slower and slower. The more data you have to transfer, the longer the transfer over the network will take.

Then, the challenge becomes identifying the correct point in time to make changes and increase the network capacity. Otherwise, your Recovery Time Objective (RTO) will be impacted.

This cycle repeats itself the more your data grows.

Backup and recovery

The network is not the only element of the environment that is impacted by data size. Your backup and recovery processes are also affected.

The reason why is fairly simple – the more data you have, the longer it takes to complete the backup.

At some point, your RTO will be impacted and you have to start considering other options.

One of the solutions, which is quite in line with a multi-cloud approach, is to shard the data to make each data set smaller and keep the data close to the user. For example, you can use geographical location as the shard key and store the data in the data center closest to the region being served.

This also helps to speed up the backup process, as the total data size is split into several parts.

However, this presents a new challenge. If you shard the data, you need a way to ensure a consistent backup is possible – even when the data set is distributed across multiple shards. This is crucial if, for some reason, you would need to perform a restore of a full data set.

In that scenario, you want to start from scratch using the data from one particular point in time across all the shards. In some cases there are readily available backup tools that can be leveraged, but for most datastores you have to figure out a solution on your own.

This may involve monitoring the transaction logs (oplog, WAL, binary log) to track the state of the database and apply the transactions up to the same point in time across all of the shards.
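For MySQL-based shards, for example, a backup script might record each shard’s binary log coordinates at backup time and later replay the logs up to one common timestamp on every shard. The paths, file names, and timestamp below are purely illustrative.

# At backup time, on each shard: capture the backup plus its binlog position
$ mysql -e "SHOW MASTER STATUS\G" > /backups/shard01/binlog_position.txt
$ xtrabackup --backup --target-dir=/backups/shard01/$(date +%F)

# At restore time, on each shard: roll forward to the same point in time
$ mysqlbinlog --stop-datetime="2023-08-01 03:00:00" /var/lib/mysql/binlog.000123 | mysql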

Another challenge would be to determine where to store the backups. To stay on the safe side, the data should be stored in several locations to make sure it’s still available even if one of the data centers experiences a problem.

This idea, while sound, requires you to transfer the data between the cloud providers, over the network – which will have limited capacity.

4. Security

The security of the data is a critical aspect of day-to-day database operations in almost every environment. While it’s not trivial to ensure security within a single cloud provider, utilizing a multi-cloud database poses a significantly higher risk. That’s because data has to be protected both at rest and in transit.

With a multi-cloud database, there is a far greater amount of data that has to be in transit at any given time. We are talking about replicating the data across multiple cloud providers over the internet. This is not a network link within a single VPC, it’s an open network.

This network link between cloud providers has to be secured.

A VPN should be established, but you need to pick the right solution. The open-source world offers multiple ways in which a network can be secured. Will you use software or hardware? How expensive will it be?

Keep in mind that network utilization might be quite high, and that could limit your options.

If you are considering an open-source solution, for example, will it be efficient enough to transfer your data at a speed that matches your RTO? What kind of hardware is needed to achieve the required network throughput? What if the required throughput increases due to growth in data size? What are your options to compensate for that?
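As one possible open-source option, a lightweight point-to-point WireGuard tunnel between two cloud gateways could look like the sketch below. The addresses, port, keys, and hostname are placeholders, and you would still need to benchmark the link against your RTO-driven throughput requirements.

# Generate a key pair on each gateway
$ wg genkey | tee privatekey | wg pubkey > publickey

$ sudo vi /etc/wireguard/wg0.conf
[Interface]
Address = 10.100.0.1/24
PrivateKey = <contents of this gateway's privatekey file>
ListenPort = 51820

[Peer]
PublicKey = <contents of the remote gateway's publickey file>
Endpoint = peer-cloud-gateway.example.com:51820
AllowedIPs = 10.100.0.0/24
PersistentKeepalive = 25

$ sudo wg-quick up wg0
$ sudo wg show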

Another option is to rely on SSL encryption for replication and frontend-backend connections to provide the security. What about the other processes? Backup? How would you transfer the backup data?

You could use SSH tunneling. But are you sure that you have SSL termination properly configured? Are you sure that, if some part of the database becomes unavailable, your load balancers will reconnect to the correct backend nodes while still using an SSL connection?

As you can see, the problems pile up, and they may only surface later, when the data size or even the workload pattern changes.

Wrapping up

The main challenge in a multi-cloud environment is that problems show up that are not present in less complex scenarios. It’s only in a multi-cloud setup that you’ll need to redirect traffic across multiple datacenters, ensure the load won’t saturate the network between the data centers, and keep the connections secure.

Failure recovery is another part of the challenge as it may not be trivial in a distributed multi-cloud database. There are multiple factors that you have to consider while working with such a complex environment. But, in the end, this is the way to achieve ultimate redundancy and scalability (not to mention other perks, like cost reduction).

Stay on top of all things multi-cloud by subscribing to our newsletter below.

Follow us on LinkedIn and Twitter for more great content in the coming weeks. Stay tuned!

The post 4 Major Challenges of Operating A Multi-Cloud Database appeared first on Severalnines.

]]>
10 Considerations for Architecting a Multi-Cloud Database https://severalnines.com/blog/architecting-multi-cloud-database/ Wed, 16 Aug 2023 11:49:04 +0000 https://severalnines.com/?p=28857 Thinking about architecting a multi-cloud database? It’s an increasingly common topology nowadays. The adoption rate of a multi-cloud approach has been growing fast for years. It’s a solid option for a disaster recovery plan (DRP) and offers many other benefits, including flexibility, scalability, and high availability. However, when it comes to your database deployments, there […]

The post 10 Considerations for Architecting a Multi-Cloud Database appeared first on Severalnines.

]]>
Thinking about architecting a multi-cloud database? It’s an increasingly common topology nowadays.

The adoption rate of a multi-cloud approach has been growing fast for years. It’s a solid option for a disaster recovery plan (DRP) and offers many other benefits, including flexibility, scalability, and high availability.

However, when it comes to your database deployments, there are several elements to keep in mind.

Let’s explore ten key considerations to keep in mind when architecting a multi-cloud database, as well as how to mitigate some of the common problems that may arise.

1. Performance

In a multi-cloud environment, it’s important to consider critical things like response times, network latency, bandwidth, and the speed at which the database can process queries.

Data transfers between different cloud providers can be time-consuming and costly, so it’s important to have a database architecture that supports fast and efficient exchanges.

Replication between cloud providers should be asynchronous, if possible.

This will improve performance, but it does so at the cost of reduced high availability.

In case of failure, if you need to fail over and the data was not up to date, you will end up with data loss or data inconsistency.

Still, more often than not this is better than accepting the performance issues that come with synchronous replication.

With synchronous replication, the primary node has to wait for the remote nodes to confirm each transaction before it can be acknowledged, so WAN latency is added to every write.
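In PostgreSQL, for example, this trade-off is controlled by a couple of settings. The rough sketch below uses a placeholder standby name, and remote_apply is only one of several possible synchronous_commit levels.

# Synchronous: the primary waits for the named standby before acknowledging commits
$ psql -c "ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby_cloud_b)'"
$ psql -c "ALTER SYSTEM SET synchronous_commit = 'remote_apply'"
$ psql -c "SELECT pg_reload_conf()"

# Asynchronous: commits return immediately, at the cost of possible data loss on failover
$ psql -c "ALTER SYSTEM SET synchronous_standby_names = ''"
$ psql -c "SELECT pg_reload_conf()"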

2. High availability and scalability

No matter what cloud provider you are using, you will need the option to scale your database topology in a horizontal or vertical way:

  • Horizontal scaling (scale-out) is performed by adding more database nodes, creating or expanding a database cluster.
  • Vertical scaling (scale-up) is performed by adding more hardware resources (CPU, Memory, Disk) to an existing database node.

You can scale either way manually if you are expecting, or already experiencing, more traffic for any reason.

Some cloud providers also allow you to configure this automatically.

This means you won’t need to worry about how much traffic you are receiving to know how many replicas you need to add.

The cloud provider will add it for you according to the rules that you create. Then, you just need to pay the bill every month.

Different cloud providers offer different levels of scalability, and it’s important to ensure that your multi-cloud database can scale to meet changing needs.

The cost of scaling should be taken into account, as it may impact the overall budget.

3. Availability and durability policies

Some cloud providers offer SLAs of at least 99.99% uptime.

Even so, it’s always good to check their SLA on the different offerings for availability and durability.

Cloud providers might offer higher-priced tiers to achieve higher availability and durability, and depending on the business, it could be necessary to use a different solution than the default one.

You should make sure that your database will be available all (or almost all) of the time, so you should be able to deploy it across different regions.

In case of a critical failure, like a data center issue, you will be able to switch to another region to keep your systems working.

Keep in mind that if you are using synchronous replication, the latency of having your databases running in different geographical regions could be a problem.

4. Data consistency

Maintaining data consistency across multiple cloud providers can be a challenge.

It’s important to consider how data will be synchronized and whether there are any potential bottlenecks that could impact performance.

Split-brain is a common issue in multi-cloud environments.

This occurs when more than one primary node is active at the same time and the situation is not handled properly by the application or the database technology, allowing the application to write to both nodes.

In this case, you will have different information on each node, which generates data inconsistency in the cluster.

Fixing this issue is extremely difficult as you must merge data, which is not always possible.

5. Data sovereignty and compliance

Depending on the nature of the data being stored, the laws of different countries and regions may come into play.

It’s important to understand where your data will reside and whether it will be subject to specific legal requirements, as different cloud providers have data centers located in different locations.

The cloud provider should follow privacy laws and comply with regulations to provide maximum data protection.

The EU’s General Data Protection Regulation (GDPR) has strict regulations on storing sensitive data. Also, several EU members don’t allow sensitive data to be stored outside of national borders.

Make sure your multi-cloud setup is compliant with any regulations that apply to your business.

6. Security

Security is a top priority for organizations, and multi-cloud databases are no exception.

It’s important to consider the security features offered by each cloud provider, including data encryption, authentication, and access control.

For security reasons, the communication between the cloud providers must be encrypted, and you must restrict traffic to known sources only to reduce the risk of unauthorized access to your network.

Using a VPN, SSH, or firewall rules (or a combination of them) is a must here.
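At the host level, that can be as simple as firewall rules that only accept database traffic from the other clouds’ known address ranges. The CIDR blocks and port below are placeholders for your actual VPC/VNet ranges and database port.

# Allow MySQL traffic only from the peer clouds' subnets and reject the rest
$ sudo ufw allow from 10.20.0.0/16 to any port 3306 proto tcp
$ sudo ufw allow from 10.30.0.0/16 to any port 3306 proto tcp
$ sudo ufw deny 3306/tcp
$ sudo ufw enable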

Also, the cloud provider should offer encryption for data at rest and in transit. This protects the data from being accessed by unauthorized parties, both while it is stored in the cloud and while it moves between locations.

7. Easy management

The cloud providers should provide an easy management console where you can configure, manage, and monitor your databases running in the cloud.

Otherwise, you can turn a simple task into a complex one, which doesn’t make sense.

8. Support

Each cloud provider has its own support structure, and it’s important to consider how this will impact your multi-cloud database.

This includes the level of support available, response times, and the availability of experts to help with any issues.

9. Cost

Running a multi-cloud database can be expensive, especially if the same database is being run on multiple platforms.

This could be the most crucial point, and the most complicated one, as cloud providers often present their pricing in a way that looks cheap at a glance.

In general, cloud providers charge you for the amount of traffic you generate hourly/monthly, and in a multi-cloud environment, depending on the size of the data, the invoice could be huge.

10. Vendor lock-in

A cloud provider’s own database product will almost always run better on that provider than a comparable open-source product.

This is because it was designed and tested to run on the cloud provider’s infrastructure.

Its performance will often be considerably better than that of the self-managed, open-source alternative.

The problem is that if you are using more than one cloud provider and rely on one provider’s product, you will most probably run into a technology lock-in problem, as the product is only available from that cloud provider and is not compatible with the others.

So try to avoid built-in products if possible and go for an open-source solution.

Wrapping up

When it comes to architecting a multi-cloud database, there are several important considerations to keep in mind.

By taking the time to evaluate your needs and the options available, you can ensure that your multi-cloud database is optimized for performance, cost-effectiveness, and security.

With the right strategy in place, you can enjoy the benefits of a multi-cloud database and remain competitive in today’s rapidly changing business environment.

Do you run a multi-cloud setup already? Try ClusterControl for free for 30 days to see its power in uniting instances across clouds here. Need more time to decide? Here’s a recent look at our clients’ experience managing multiple clouds with ClusterControl.

Stay on top of all things multi-cloud by subscribing to our newsletter below.

Follow us on LinkedIn and Twitter for more great content in the coming weeks. Stay tuned!

The post 10 Considerations for Architecting a Multi-Cloud Database appeared first on Severalnines.

]]>
Enabling enterprise logging with Elasticsearch, Kibana, and ClusterControl https://severalnines.com/blog/enabling-enterprise-logging-with-elasticsearch-kibana-and-clustercontrol/ Tue, 30 May 2023 19:00:52 +0000 https://severalnines.com/enabling-enterprise-logging-with-elasticsearch-kibana-and-clustercontrol/ Today’s typical enterprise database deployment footprint consists of different types of databases, each suited to the applications serving specific business needs. This may include SQL (MySQL/MariaDB, PostgreSQL, Redis, SQL Server, etc) and NoSQL (MongoDB, Redis, etc).  Having tens if not hundreds or even thousands of database instances running in an enterprise, depending on the size […]

The post Enabling enterprise logging with Elasticsearch, Kibana, and ClusterControl appeared first on Severalnines.

]]>
Today’s typical enterprise database deployment footprint consists of different types of databases, each suited to the applications serving specific business needs. This may include SQL (MySQL/MariaDB, PostgreSQL, SQL Server, etc.) and NoSQL (MongoDB, Redis, etc.) systems.

Having tens, if not hundreds or even thousands, of database instances running, depending on the size of the enterprise, makes it a nightmare to centralize database logs. Centralizing them, however, makes it convenient to analyze the logs for multiple purposes, including, but not limited to, root-cause analysis of performance problems and threat analysis and detection.

In this post, we will show how to accomplish this by combining ClusterControl with Elasticsearch to (a) centralize logs from database instances and (b) analyze those logs. Let’s start with covering why enterprises often opt for Elasticsearch to handle logging duties.

Why Elasticsearch for enterprise logging?

Elasticsearch is a scalable document store that is capable of storing unstructured documents and performing searches over those documents in a highly efficient and performant manner. This makes it a suitable candidate for storing database logs and subsequently searching those logs. 

Enterprises typically store database logs from all the databases of their enterprise-wide fleet in a single centralized Elasticsearch store and allow analysts to explore and search for useful information (e.g. detect potential unauthorized access or other related cyber threats and root-cause potential performance problems with the database) in those logs.

Why ClusterControl for Elasticsearch ops?

ClusterControl is a database orchestration platform to manage and monitor database operations on-premises and in public, private, or hybrid cloud environments. It covers the full ops milieu, such as database provisioning, performance monitoring, fault detection, disaster recovery, and much more.

For a full list of capabilities, please refer to the pricing page.

Now that you know what ClusterControl is and how it’ll actually mediate Elasticsearch operations, and why you want to use Elasticsearch for log aggregation, let’s look at the process for deploying and configuring Elasticsearch for log aggregation, Kibana for visualization, and of course Filebeat for transferring those logs.

Setting up Elasticsearch for log aggregation

We will use ClusterControl to deploy an Elasticsearch (version 7.x) cluster which will be used to store the logs from multiple database instances in an enterprise. The following screencap shows how to deploy an Elasticsearch cluster.

We need to take note of some details in order to be able to access the Elasticsearch instance; we will use them in the quick connectivity check shown after this list:

  1. Coordinates of the endpoint: http://<host>:<port> (e.g. http://h5:9200)
  2. Login credentials (username and password). E.g. “esadmin” and “es7Admin”
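A quick way to confirm both the endpoint and the credentials is a simple cluster health call (using the example values above):

$ curl -u esadmin:es7Admin "http://h5:9200/_cluster/health?pretty"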

Setting up Kibana for log visualization

We will also set up Kibana, on the same host as the Elasticsearch host, in order to be able to search and explore logs stored in our Elasticsearch. Here are instructions to install and configure Kibana. (NOTE: instructions are for Ubuntu/Debian. Please perform the equivalent steps for RHEL and similar distributions.)

$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
$ sudo apt-get install apt-transport-https
$ echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
$ sudo apt-get update && sudo apt-get install kibana
$ sudo update-rc.d kibana defaults 95 10
$ sudo /bin/systemctl daemon-reload
$ sudo /bin/systemctl enable kibana.service


$ sudo cp /etc/kibana/kibana.yml /etc/kibana/kibana.yml.orig


$ sudo vi /etc/kibana/kibana.yml 
# search the following arguments and set the corresponding values as shown below
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://h5:9200"]
kibana.index: ".kibana"
kibana.defaultAppId: "home"
elasticsearch.username: "esadmin"
elasticsearch.password: "es7Admin"


$ sudo systemctl start kibana

Setting up Filebeat for database logs

Filebeat will be used to push logs from the database nodes/hosts to the Elasticsearch. Filebeat should be installed on each of the database nodes.

General filebeat installation irrespective of the database type is as follows. We will install the OSS edition of filebeat. You have the option to install the enterprise version as well (see commented line below). 

NB: instructions are for Ubuntu/Debian. Please perform the equivalent steps for RHEL and similar distributions.

$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
$ sudo apt-get install apt-transport-https
# echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
$ echo "deb https://artifacts.elastic.co/packages/oss-7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
$ sudo apt-get update && sudo apt-get install filebeat
$ sudo update-rc.d filebeat defaults 95 10
$ sudo systemctl daemon-reload
$ sudo systemctl enable filebeat


$ sudo vi /etc/filebeat/filebeat.yml
setup.kibana:
  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "h5:5601"


output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["h5:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "esadmin"
  password: "es7Admin"




$ filebeat test output
$ filebeat modules list


# Note: enable the appropriate module for the database type on the host, e.g. "filebeat modules enable mysql".
# The enterprise version of filebeat has support for Oracle, MSSQL, etc.
$ filebeat modules enable <redis | mysql | postgresql | mongodb>
$ filebeat setup -e
$ filebeat setup --dashboards


$ sudo systemctl start filebeat

Once installed on a database node, Filebeat should start pushing database logs from the database host to Elasticsearch. This allows us to explore and analyze those logs using Kibana.

PostgreSQL log path modification

Modify the Filebeat manifest for PostgreSQL to include the additional log path (the “/var/lib/postgresql/<version>/main/log/postgresql-*.log*” entry in the listing below). Substitute the appropriate PostgreSQL version for “<version>”. Then restart Filebeat (e.g. “systemctl restart filebeat”).

root@h7:/var/log# vi /usr/share/filebeat/module/postgresql/log/manifest.yml 
module_version: "1.0"
var:
  - name: paths
    default:
      - /var/log/postgresql/postgresql-*-*.log*
      - /var/log/postgresql/postgresql-*-*.csv*
      - /var/lib/postgresql/<version>/main/log/postgresql-*.log*
    os.darwin:
      - /usr/local/var/postgres/*.log*
      - /usr/local/var/postgres/*.csv
    os.windows:
      - "c:/Program Files/PostgreSQL/*/logs/*.log*"
      - "c:/Program Files/PostgreSQL/*/logs/*.csv"
ingest_pipeline:
  - ingest/pipeline.yml
  - ingest/pipeline-log.yml
  - ingest/pipeline-csv.yml
input: config/log.yml

MongoDB modification

Modify the Filebeat manifest for MongoDB to include the corrected log path (the “/var/log/mongodb/mongod.log” entry below; the default “mongodb.log” path is commented out). Then restart Filebeat (e.g. “systemctl restart filebeat”).

# For MongoDB
vi /usr/share/filebeat/module/mongodb/log/manifest.yml
module_version: 1.0


var:
  - name: paths
    default:
      #- /var/log/mongodb/mongodb.log
      - /var/log/mongodb/mongod.log
    os.windows:
      - c:\data\log\mongod.log


ingest_pipeline:
  - ingest/pipeline.yml
  - ingest/pipeline-plaintext.yml
  - ingest/pipeline-json.yml
input: config/log.yml

Logging into Kibana to visualize logs

Point your browser to the Kibana host on port 5601 and log into Kibana using the credentials set during installation and setup (e.g. http://<kibana-host>:5601). Once successfully logged in to Kibana, navigate to “Analytics => Discover”. There should be an index already created in Elasticsearch called “filebeat-*”, and you should be able to view the logs coming through from the databases as shown below.
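If nothing shows up in Discover, you can check from the Elasticsearch side whether Filebeat has created any indices yet (again using the example endpoint and credentials from earlier):

$ curl -u esadmin:es7Admin "http://h5:9200/_cat/indices/filebeat-*?v"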

There are also custom dashboards for visualizing logs for specific database types. These dashboards are available under “Dashboards”. I’ve prepared some predefined log visualization dashboards for MySQL/MariaDB, PostgreSQL, MongoDB, and Redis and pointed out some of their specific log types below.

MySQL/MariaDB

The built-in dashboard for MySQL (and/or MariaDB) allows visualization and exploration of logs specific to MySQL and MariaDB. These include slow queries from the slow-query log, the MySQL error log, and the regular MySQL daemon log. You can explore these logs to root-cause potential problems and gather information on database access patterns.

PostgreSQL

The built-in dashboard for PostgreSQL allows you to explore access logs produced by the pgAudit extension as well as the general PostgreSQL daemon log.

Here’s a link to enabling slow queries in Postgres.
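As a rough sketch of what that involves (assuming the pgAudit package is already installed, and with an arbitrary 500 ms threshold):

# Log every statement that runs longer than 500 ms
$ psql -c "ALTER SYSTEM SET log_min_duration_statement = '500ms'"
$ psql -c "SELECT pg_reload_conf()"
# Load pgAudit (requires a server restart), then tell it to log write and DDL activity
$ psql -c "ALTER SYSTEM SET shared_preload_libraries = 'pgaudit'"
$ sudo systemctl restart postgresql
$ psql -c "ALTER SYSTEM SET pgaudit.log = 'write, ddl'"
$ psql -c "SELECT pg_reload_conf()"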

Visualize Postgres logs, including those for slow queries and audit events, as shown below.

MongoDB

Redis

Wrapping up

We have shown how one can deploy an Elasticsearch cluster using Severalnines ClusterControl and subsequently push database logs to that Elasticsearch cluster using Filebeat and analyze the logs using Kibana.

Using a tool like ClusterControl to orchestrate the operations of your Elasticsearch clusters and the rest of your polyglot, open-source database footprint gives you a true single pane of glass over your entire estate, regardless of environment. Try it out commitment-free for 30 days.

Not looking for a solution at the moment? In the meantime, follow us on LinkedIn & Twitter and subscribe to our newsletter to get more content on open-source database operations best practices and the latest Severalnines updates.

The post Enabling enterprise logging with Elasticsearch, Kibana, and ClusterControl appeared first on Severalnines.

]]>
Common use cases for multi-cloud databases https://severalnines.com/blog/common-use-cases-for-multi-cloud-databases/ Tue, 30 May 2023 16:52:24 +0000 https://severalnines.com/?p=27268 Multi-cloud database has become a common term heard in the IT world, but what does it really mean? What does it entail, and why is it an important topic of discussion? Is it something every organization needs? In this blog, we’ll dive into these questions and explore various use cases that might point organizations toward […]

The post Common use cases for multi-cloud databases appeared first on Severalnines.

]]>
Multi-cloud database has become a common term heard in the IT world, but what does it really mean? What does it entail, and why is it an important topic of discussion? Is it something every organization needs?

In this blog, we’ll dive into these questions and explore various use cases that might point organizations toward a multi-cloud infrastructure.

What is a multi-cloud database?

Let us start by trying to understand what a multi-cloud database is. 

Multi-cloud implies multiple different clouds, so we are talking about a database spanning multiple clouds. The majority of cloud service providers come with some kind of DBaaS (database-as-a-service) solution (Amazon RDS, for example). The thing is, such solutions are not good in terms of interoperability with other cloud providers. Typically you can implement some sort of replication, but the main reason for this feature is not building a scalable database but importing the data into the DBaaS. There are significant limits on what you can and cannot do.

If not the DBaaS, then we have to focus on compute instances as the building blocks for our multi-cloud database — this is the most common pattern. Using compute instances allows for the most flexibility regarding how to approach the inter-cloud connectivity, how to configure the databases, and how to implement automated recovery for the cluster. This makes it possible to build different types of environments, either relying to some extent on the tools made available by the CSP or relying solely on the open-source software, building a cloud-agnostic setup.

Is building and managing a multi-cloud database easy?

The short answer is it is not. Even the fact that you must work with multiple nodes separated by WAN links makes the process challenging. How to deal with network splits? How to handle failures of one or more data centers?

Those questions are not easy to answer. They will require knowledge and experience in building WAN-spanning networks and databases. Operating such a database is a challenge on its own as well. The question may arise, why are we even talking about such a concept? What are the advantages of it that would overshadow the disadvantages and challenges?

Multi-cloud databases and their use cases

Let’s talk about some of the reasons why organizations around the world go through the paces to build those complex environments. As you may think, there are many reasons for this to happen.

Disaster recovery and survivability

Disaster recovery is, by far, the most common reason why people decide to go multi-cloud. 

Data is one of (if not the most) important assets of any organization; therefore, its well-being and safety are very important. We want the data to be safe, and we want to be able to recover it should something happen.

People set up replication to mitigate the risk of hardware failure. We utilize multiple availability zones to ensure that the infrastructure (power, cooling, network) is redundant and that the failure of one of its elements will not have a negative impact on the availability of the organization’s data. Then we are talking about utilizing multiple regions to protect the data from even the most serious hazards (hurricanes flooding the data center, uncontrolled fires, or less dramatic problems like an excavator cutting the main fiber lines leading to the data center).

This escalation ladder goes further. At one of its highest levels, we finally have an environment spanning multiple cloud providers, ensuring that even the complete outage or closure of a single CSP will not impact the availability of your data. Of course, the more protection you want, the more expensive it will be. It all depends on how critical and valuable your data is. In some cases, where the infrastructure is required to be available all the time, this is a viable option.

Data sovereignty

Another very important reason to implement multi-cloud environments is to have complete control over where your data is stored. 

As you may know, today’s world is full of regulations that govern where and how particular types of data can be stored. If your organization is dealing with sensitive data, you may have to comply with standards like HIPAA, PCI DSS, or GDPR that define what you can and cannot do with your data. This typically involves knowing where the data is allowed to be located. You may not be allowed to store data that belongs to an organization located in a European country in a data center that is located in another country (United States, for example). 

In some cases, the problem becomes even larger. You may be forbidden to store your data in a European data center that belongs to an American company. In this example, you are practically banned from using the infrastructure of the main hyperscalers like Amazon Web Services, Google Cloud Platform, or Azure, even if you have the infrastructure in one of the data centers in the European Union.

Country-level law might be even stricter. If you are a government entity or working closely with one, you may not be authorized to store the data outside the country. This realistically forces you to use one of the service providers based in that particular country that has data centers within it.

Those regulations pose some challenges. Let’s assume that our organization provides services to multiple customers from different countries and works with data of varying levels of confidentiality. In such a case, you probably cannot use a one-size-fits-all solution. If you go ahead and build your infrastructure on AWS, you won’t be able to provide services to some of your customers (or you will limit the pool of potential customers that would be legally allowed to use your services).

It doesn’t mean that you cannot use AWS at all. In some cases, it might be a perfectly fine solution for some of your customers. You have to keep the other issues in mind, though. For those, you will probably need to utilize other, smaller, more local cloud service providers that allow you to build services meeting the security requirements of your clients.

While building it, you must always know how you process your data. Connecting your AWS infrastructure with the local “branches” from particular countries is perfectly fine. The challenge is that you cannot process the “local” data with software located in AWS. In most cases, it is also perfectly fine to build the “control plane” of your software solution and host it with one of the big cloud providers. You can then use it to manage the rest of your solution, as long as the data stays in the local “branches” and is never transferred outside the data center it has been stored in.

Cost-awareness

Cloud infrastructure cost reduction is another common reason for utilizing a multi-cloud setup. Large hyperscalers provide a huge variety of services, different types of data stores, and numerous kinds of solutions to process the data. This allows organizations to quickly build complex environments tailored to particular data processing needs. It has another, darker side, though: CSP lock-in and the price tag attached to those services. Vendor lock-in is a topic for another discussion, and you can avoid it by skipping custom services and utilizing compute resources to build your data pipeline with open-source software.

No matter what service you use, you will quite commonly find that large CSPs are expensive. Sure, custom managed services are never cheap, but even simple VMs are priced higher than at smaller competitors. This is something organizations try to exploit to their advantage. It is not uncommon to see one service used in a particular cloud because, let’s say, there is no easy way to build an alternative utilizing open-source technology; maybe the team lacks the knowledge, or the alternative requires a complex setup and expensive maintenance. In that case, it might be perfectly reasonable to use a managed service for that particular technology while reducing expenses by using a cheaper cloud service provider’s compute resources to build the rest of the data processing infrastructure.

Another common reason to utilize multiple cloud services is to have the option to quickly and easily migrate between them in case it makes sense financially. Prices change over time, and what was a good decision a year ago may be a wrong decision to stick to in the long run. Building your environment across multiple clouds lets you move the resources from one CSP to another if it helps you to reduce expenses.

Scale out

Finally, let’s talk about one more reason to go multi-cloud. If you are a large organization, you are utilizing a large fleet of compute instances. We are talking about thousands of instances. Let’s say that, for some reason, you expect a significant increase in load on your systems. It could be some event that will bring you more traffic; it can be a marketing effort. What’s important is that you have to scale out and do it fast.

The problem is that the cloud, despite what marketing says, is just someone else's computer. A CSP may be able to spin up a couple of thousand VMs for you, but it's not something you can take for granted; smaller CSPs in particular may struggle to provide such infrastructure on request. Organizations that have built their environment across multiple clouds are in a better position to deal with such limitations. The more clouds your environment spans, the easier it is to perform a large scale-out in a short time frame, because more CSPs mean more resources available for you to use.

To sum up, while building a multi-cloud environment is not easy, it offers several benefits that may outweigh the drawbacks and challenges. It is something every organization should consider at the planning stage. It comes with a great deal of flexibility and presents you with more options to choose from should you encounter technical and financial challenges.

Wrapping up

Multi-cloud databases are complex systems with many operational challenges, yet organizations are adopting them due to benefits like improved disaster recovery, data sovereignty, and compliance. Building a multi-cloud environment allows greater control over data storage, security, and legal compliance. Despite challenges and costs, the potential advantages make this a crucial topic in the IT world.

If you’re considering moving towards a multi-cloud implementation, check out how to address some of the common challenges of multi-cloud architectures, and download our free multi-cloud guide for a more in-depth look at the what, why, and how of building a multi-cloud setup.To stay in the loop on all things multi-cloud, don’t forget to subscribe to our newsletter and follow us on LinkedIn and Twitter, as we’ll be sharing more great content in the coming weeks. Stay tuned!

The post Common use cases for multi-cloud databases appeared first on Severalnines.

]]>
Failover Modes for SQL Server cluster on Linux https://severalnines.com/blog/failover-modes-for-sql-server-cluster-on-linux/ Mon, 01 May 2023 17:48:00 +0000 https://severalnines.com/?p=26662 In a previous article, we explored two different high-availability and disaster recovery solutions for SQL Server on Linux—log shipping and SQL Server Always On Availability Groups. If you’re using Availability Groups (AG), it’s essential to understand its supported failover modes prior to AG configuration. In this post, we’ll cover cluster configurations for high availability, the […]

The post Failover Modes for SQL Server cluster on Linux appeared first on Severalnines.

]]>
In a previous article, we explored two different high-availability and disaster recovery solutions for SQL Server on Linux—log shipping and SQL Server Always On Availability Groups.

If you’re using Availability Groups (AG), it’s essential to understand its supported failover modes prior to AG configuration.

In this post, we’ll cover cluster configurations for high availability, the failover process for each cluster type supported in SQL Server Linux, and how to perform failover for SQL Server Linux using ClusterControl.

Failover options in SQL Server Always On Availability Groups

From SQL Server 2017 onwards, the Linux version supports the following cluster types (a brief T-SQL sketch follows the list):

  • None: The cluster type None configures Availability Groups (AG) between standalone nodes. It does not use Pacemaker configurations. It supports manual failover from primary to secondary replicas.
  • External: The External cluster type requires Pacemaker underneath the AG to manage the cluster. It supports both automatic and manual failover.
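
As a rough illustration of the difference, below is a minimal T-SQL sketch of how the cluster type is chosen when the availability group is created. The group name ag1, the node names, and the endpoint URLs are placeholders rather than values from this article; with CLUSTER_TYPE = EXTERNAL you would also set FAILOVER_MODE = EXTERNAL and configure Pacemaker separately.

    -- Hypothetical example: an AG between two standalone Linux nodes,
    -- no Pacemaker involved, hence CLUSTER_TYPE = NONE and manual failover
    CREATE AVAILABILITY GROUP [ag1]
        WITH (CLUSTER_TYPE = NONE)
        FOR REPLICA ON
            N'mssql1' WITH (
                ENDPOINT_URL = N'tcp://mssql1:5022',
                AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL,
                SEEDING_MODE = AUTOMATIC),
            N'mssql2' WITH (
                ENDPOINT_URL = N'tcp://mssql2:5022',
                AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL,
                SEEDING_MODE = AUTOMATIC);

    -- Databases are then added on the primary with:
    --   ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [db1];
    -- For a Pacemaker-managed cluster you would instead create the AG
    -- WITH (CLUSTER_TYPE = EXTERNAL) and FAILOVER_MODE = EXTERNAL.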

SQL Server cluster configurations for high availability

SQL Server on Linux supports cluster configurations for high availability. The concept behind cluster configuration on Linux is similar to a Windows Server Failover Cluster. A few characteristics of clustering in SQL Server on Linux are listed below (a query to inspect the configured replicas follows the list):

  • SQL Server Linux requires Pacemaker for clustering configurations.
  • The Standard edition supports two replicas: one primary and one secondary. The secondary replica is used for high availability purposes.
  • You cannot read data from the secondary replica in the Standard edition.
  • The Enterprise edition supports nine replicas: one primary and eight secondary replicas.
  • The Enterprise edition supports readable secondary replicas. You can have up to two synchronous secondary replicas out of the eight secondary replicas.
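
If you want to verify how your replicas are configured, including whether the secondaries are readable, you can query the availability group catalog views and DMVs. This is a generic sketch, not tied to any specific availability group:

    -- List each replica with its availability mode, readability setting,
    -- current role, and synchronization health
    SELECT ag.name AS ag_name,
           ar.replica_server_name,
           ar.availability_mode_desc,
           ar.secondary_role_allow_connections_desc,
           ars.role_desc,
           ars.synchronization_health_desc
    FROM sys.availability_groups ag
    JOIN sys.availability_replicas ar
         ON ar.group_id = ag.group_id
    LEFT JOIN sys.dm_hadr_availability_replica_states ars
         ON ars.replica_id = ar.replica_id;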

Let’s understand in detail the failover process in the None cluster type. As we know, it involves two standalone nodes for a high-availability configuration. It supports the following two failover modes:

  • Manual failover without data loss: In many cases, you might need to plan brief downtime on the primary replica, for example to apply OS patches or hotfixes or to make configuration changes. In these cases, you can perform a manual failover without any data loss.

The steps for manual failover without data loss are as follows (a T-SQL sketch follows the list):

  1. Use the ALTER AVAILABILITY GROUP statement to put the current primary and the target secondary replicas into SYNCHRONOUS_COMMIT mode. Synchronous commit ensures the primary and secondary replicas are fully in sync.
  2. Take the primary replica offline to prepare for the role switch.
  3. Promote the target secondary replica to be the new primary replica.
  4. Update the old primary's role to SECONDARY.
  5. Resume data movement between the new primary and the secondary replicas.
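
Assuming an availability group named ag1 containing a database db1, with the current primary on mssql1 and the target secondary on mssql2 (all placeholder names), the steps above map roughly onto the following T-SQL. Treat it as a sketch of the documented procedure for the None cluster type rather than a copy-and-paste script:

    -- 1. On the current primary (mssql1): make both replicas synchronous
    ALTER AVAILABILITY GROUP [ag1]
        MODIFY REPLICA ON N'mssql1' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
    ALTER AVAILABILITY GROUP [ag1]
        MODIFY REPLICA ON N'mssql2' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

    -- 2. On mssql1: take the AG offline so no new writes arrive during the switch
    ALTER AVAILABILITY GROUP [ag1] OFFLINE;

    -- 3. On mssql2: promote it; nothing is lost because the replicas are
    --    synchronized and the old primary is offline
    ALTER AVAILABILITY GROUP [ag1] FORCE_FAILOVER_ALLOW_DATA_LOSS;

    -- 4. On mssql1: demote the old primary to the secondary role
    ALTER AVAILABILITY GROUP [ag1] SET (ROLE = SECONDARY);

    -- 5. On mssql1: resume data movement for every database in the AG
    ALTER DATABASE [db1] SET HADR RESUME;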
  • Manual failover with data loss: Suppose your primary replica crashes due to a hardware failure. The primary replica is not available, and it will take time to recover the server. Your application will not be available because it connects to the primary replica to execute queries. Your secondary replica is up, but its database is not accessible (it is in restoring mode). In this situation, you can force a failover to the secondary replica and point the application at it so it can start serving requests again. This type of forced failover can involve data loss, so you should use it only if the primary instance is unavailable.

The steps for the forced manual failover with data loss are as follows (a sketch follows the list):

  1. Initiate the forced failover from the secondary replica.
  2. After the forced failover, remove the original primary replica from the availability group on the new primary (the old secondary).
  3. Point the application to the new primary replica.
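
With the same placeholder names (ag1, the crashed primary mssql1, the surviving secondary mssql2), a forced failover sketch could look like the following; the last step is an application-side change rather than T-SQL:

    -- 1. On mssql2 (secondary): force the failover, accepting possible data loss
    ALTER AVAILABILITY GROUP [ag1] FORCE_FAILOVER_ALLOW_DATA_LOSS;

    -- 2. On mssql2 (now primary): remove the unreachable original primary
    ALTER AVAILABILITY GROUP [ag1] REMOVE REPLICA ON N'mssql1';

    -- 3. Repoint the application connection strings at mssql2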

If the original primary replica comes back online, it will try to take over the primary role. Therefore, we must remove the availability group configuration from the original replica and re-configure it.

The steps to perform if the original primary replica comes back online are as follows (a sketch follows the list):

  1. Take the availability group offline on the original primary.
  2. Remove the availability group from the original primary.
  3. Add the node back as a secondary replica from the new primary replica.
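
A hedged sketch of that cleanup, again using ag1 and mssql1 as placeholders; the exact options (endpoint URL, seeding mode) depend on how the group was originally created:

    -- 1. On mssql1 (original primary, back online): take its stale AG offline
    ALTER AVAILABILITY GROUP [ag1] OFFLINE;

    -- 2. On mssql1: drop the stale availability group definition
    DROP AVAILABILITY GROUP [ag1];

    -- 3. On the new primary: add mssql1 back as a secondary replica ...
    ALTER AVAILABILITY GROUP [ag1]
        ADD REPLICA ON N'mssql1' WITH (
            ENDPOINT_URL = N'tcp://mssql1:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC);

    -- ... then on mssql1: join the group and allow automatic seeding
    ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = NONE);
    ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;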

SQL Server Linux failover using ClusterControl

ClusterControl version 1.9.2 or higher supports high-availability SQL Server 2019 clusters. Users can deploy a cluster with one primary and up to eight secondary replicas in asynchronous replication. Using ClusterControl to manage SQL Server on Linux provides the following benefits:

  • Graphical installation for the SQL Server instances
  • Automatic configurations for SQL Server Agent, backups, certificates, backup retention, and restoration with proper backup chain
  • Monitoring and alerting
  • Security and Compliance
  • High availability and disaster recovery (DR) with automatic failover

ClusterControl supports manual failover using graphical controls with and without data loss. 

  • Manual failover without data loss: You can initiate a manual failover without any data loss. The cluster logs show the commands, the progress of the manual failover, and its status.
  • Forced failover with data loss: If the primary instance is unavailable, ClusterControl automatically performs a forced failover and promotes a secondary replica to be the new primary replica. It creates a new node as a new secondary replica, so you end up with a connected primary and secondary replica.

Let’s understand how the AG failover works using the ClusterControl for SQL Server on Linux. In my demo environment, I have three AG replicas:

Primary: mssql1

Secondary: mssql2 and mssql3

Now, due to some maintenance activity, we need to perform a failover and switch the mssql1 node's role from primary to secondary. The mssql2 node should become the new primary replica.

Click the ellipsis menu for the mssql2 node and select Promote Replica.

It opens the Promote Replica page, where you will see a Force promotion of replica option. Do not enable this option unless your current primary is down.

Click on Promote to start a new job that will perform the SQL Server Linux AG failover. After this, the node mssql2 should be the primary. 

You can go to the activity center and view the job messages, the status, and the SQL script used to perform the AG failover.

The job shows a green tick and logs the message "successfully promoted a new master", indicating that ClusterControl has performed the AG failover.

You can view the status of the nodes, which shows the following:

New Primary: mssql2

Current secondaries (including the old primary, mssql1): mssql1 and mssql3
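
If you want to double-check these roles directly on the instances rather than in the ClusterControl UI, a simple DMV query (generic T-SQL, not specific to ClusterControl) shows which node currently holds the primary role:

    -- Run on a replica: show each replica's current role and connection state
    SELECT ar.replica_server_name,
           ars.role_desc,
           ars.connected_state_desc,
           ars.operational_state_desc
    FROM sys.availability_replicas ar
    JOIN sys.dm_hadr_availability_replica_states ars
         ON ars.replica_id = ar.replica_id;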

Let’s say due to some issues, the primary replica is down. In this case, you required manual forced failover. 

Now we want to promote mssql3 as the new primary after a forced failover. Click Promote Replica from the mssql3 ellipsis menu.

Toggle the switch to enable force promotion of the replica, as shown below. 

View the job activity details as shown below. 

After the forced failover, the mssql3 node takes the role of the new primary replica. The old primary replica, mssql2, is still in a failed state.

If the old primary replica becomes available again, you need to remove it from the cluster and add it back as a secondary AG replica. Later, if required, you can perform a failover to switch its role back to primary.

Wrapping up

When it comes to implementing a successful high availability and disaster recovery solution for SQL Server on Linux, understanding the failover modes available, how they work, and how to configure them is key. Hopefully, this article has helped clearly outline the steps involved in each failover process.

If you’re just getting started with SQL Server Always On Availability Groups, check out these steps to set up AG on Linux. Want more SQL Server content? Subscribe to our newsletter to have the latest posts delivered straight to your inbox, and follow us on LinkedIn and Twitter for regular database operation management tips.

The post Failover Modes for SQL Server cluster on Linux appeared first on Severalnines.

]]>