Intro to Redis Cluster Sharding – Advantages, Limitations, Deploying & Client Connections


Redis Cluster is the native sharding implementation available within Redis that allows you to automatically distribute your data across multiple nodes without having to rely on external tools and utilities. At ScaleGrid, we recently added support for Redis Clusters on our platform through our fully managed Redis hosting plans. In this post, we introduce Redis Cluster sharding, discuss its advantages and limitations, explain when you should deploy it, and show how to connect to your Redis Cluster.

Sharding with Redis Cluster

The entire keyspace in Redis Clusters is divided into 16384 slots (called hash slots), and these slots are assigned to multiple Redis nodes. A given key is mapped to one of these slots, and the hash slot for a key is computed as:

HASH_SLOT = CRC16(key) mod 16384

Multi-key operations are supported on Redis Clusters as long as all the keys involved in a single command execution belong to the same hash slot. This can be ensured using the concept of hash tags.
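
The slot computation and the effect of hash tags can be sketched in a few lines of Python. This is a minimal illustration, assuming the CRC16 variant (XMODEM, polynomial 0x1021) and the hash tag rule described in the Redis Cluster Specification. The function names crc16_xmodem and key_hash_slot and the sample keys are our own, not part of Redis.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Return the hash slot (0..16383) for a key, honoring hash tags."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Both keys share the hash tag {user1000}, so only "user1000" is hashed
# and the two keys map to the same slot.
same_slot = key_hash_slot("{user1000}.following") == key_hash_slot("{user1000}.followers")
print(same_slot)
```

Because only the text inside the braces is hashed, keys that share a tag always land in the same slot, which is what makes multi-key commands on them possible.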

The Redis Cluster Specification is the definitive guide to understanding the internals of the technology, while the Redis Cluster Tutorial provides deployment and administration guidelines.


Check out the top advantages of Redis Clusters to see how they can benefit your deployments:

High Performance

Redis Cluster promises the same level of performance as standalone Redis deployments.

High Availability

Redis Cluster ...

Read More on Datafloq
MySQL Tutorial – Understanding The Seconds Behind Master Value


In a MySQL hosting replication setup, the parameter Seconds_Behind_Master (SBM), as displayed by the SHOW SLAVE STATUS command, is commonly used as an indication of the current replication lag of the slave. In this blog post, we examine how to understand and interpret the MySQL Seconds Behind Master value in various situations.

Possible Values of Seconds Behind Master

The value of SBM, as explained in the MySQL documentation, depends on the state of the MySQL slave in general, and the states of MySQL slave SQL_THREAD and IO_THREAD in particular. While IO_THREAD connects with the master and reads the updates, SQL_THREAD applies these updates on the slave. Let’s examine the possible values of SBM during different states of the MySQL Slave.

When SBM Value is Null

SBM is always NULL if your slave is stopped, or your SQL Thread is stopped (or not running).
SBM will also be NULL if the IO Thread is stopped, provided the SQL Thread has already processed all events from the relay log. A sample output of SHOW SLAVE STATUS (trimmed to show only values of interest) demonstrates this:



Slave_IO_Running: No
Slave_SQL_Running: Yes
Seconds_Behind_Master: NULL
Master_UUID: 23b326b1-a452-11e8-91ca-000d3a065e8e
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Retrieved_Gtid_Set: 23b326b1-a452-11e8-91ca-000d3a065e8e:818-389213
Executed_Gtid_Set: 23b326b1-a452-11e8-91ca-000d3a065e8e:1-389213
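
The NULL cases above can be expressed as a small decision function. This is a rough sketch, assuming one row of SHOW SLAVE STATUS has already been fetched into a Python dict with NULL mapped to None. The function name explain_sbm and the wording of its messages are hypothetical, and the logic covers only the situations described in this post.

```python
from typing import Optional


def explain_sbm(status: dict) -> str:
    """Explain Seconds_Behind_Master for one SHOW SLAVE STATUS row."""
    sbm: Optional[int] = status.get("Seconds_Behind_Master")
    io_running = status.get("Slave_IO_Running") == "Yes"
    sql_running = status.get("Slave_SQL_Running") == "Yes"

    if sbm is not None:
        return f"SBM is {sbm}"
    if not sql_running:
        return "NULL: SQL thread is stopped"
    if not io_running:
        # Simplification: assumes the SQL thread has applied the whole relay log.
        return "NULL: IO thread is stopped and the relay log is fully applied"
    return "NULL: cause not covered by the cases above"


# The trimmed sample output shown above, as a dict.
sample = {
    "Slave_IO_Running": "No",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": None,
}
print(explain_sbm(sample))
```

For the sample row, the function reports the second NULL case: the IO thread is stopped while the SQL thread has nothing left to apply.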

When SBM Value is Zero or Positive

SBM is going ...

Read More on Datafloq
How to Connect Your MongoDB Deployments to Robo 3T GUI


Robo 3T (formerly Robomongo) is a popular desktop graphical user interface (GUI) for your MongoDB hosting deployments that allows you to interact with your data through visual indicators instead of a text-based interface. This open source tool has cross-platform support and actually embeds the mongo shell within its interface to provide both shell and GUI-based interaction.

Because Robo 3T is a highly popular GUI among our MongoDB hosting customers, we’re providing this tutorial on how to quickly connect your ScaleGrid deployment with Robo 3T.


Identify Client Machine

The first thing we need to do is identify a machine on which to install Robo 3T. You can either create a new instance or pick an existing instance which has access to the ScaleGrid MongoDB cluster you’d like to connect to. For new ScaleGrid users, create a free 30-day trial account and set up your first MongoDB cluster (Robo 3T is supported on all plans).

For our MongoDB Bring Your Own Cloud (BYOC) AWS deployments that are not open to the internet, this may mean selecting an instance that is allowed to connect to the Security Group and also has VPN connectivity to the Virtual Private Cloud (VPC).
For AWS customers who have their deployments open to the internet, this will involve adding the identified machine’s IP address to the ScaleGrid cluster’s whitelist.

Install Robo 3T

Robo 3T ...

Read More on Datafloq
MySQL High Availability Framework Explained – Part II


In Part I, we introduced a High Availability (HA) framework for MySQL hosting and discussed various components and their functionality. Now in Part II, we will discuss the details of MySQL semi-synchronous replication and the related configuration settings that help us ensure redundancy and consistency of the data in our HA setup. Make sure to check back in for Part III where we will review various failure scenarios that could arise and the way the framework responds and recovers from these conditions.

What is MySQL Semi-synchronous Replication?

Simply put, in a MySQL semi-synchronous replication configuration, the master commits transactions to the storage engine only after receiving an acknowledgement from at least one of the slaves. The slaves would provide acknowledgement only after the events are received and copied to the relay logs and also flushed to the disk. This guarantees that for all transactions committed and returned to the client, the data exists on at least 2 nodes. The term ‘semi’ in semi-synchronous (replication) is due to the fact that the master commits the transactions once the events are received and flushed to relay log, but not necessarily committed to the data files on the slave. This is in contrast to fully synchronous replication, where the transaction would have ...

Read More on Datafloq
MySQL High Availability Framework Explained – Part I


In this two-part blog series, we will explain the details and functionality of a High Availability (HA) framework for MySQL hosting using MySQL semi-synchronous replication and the Corosync plus Pacemaker stack. In Part I, we’ll walk you through the basics of High Availability, the components of an HA framework, and then introduce you to the HA framework for MySQL.

What is High Availability?

The availability of a computer system is the percentage of time its services are up during a period of time. It’s generally expressed as a series of 9s. For example, the table below shows availability and the corresponding downtime measured over one year.

The meaning of High Availability varies depending on the requirements of your application and business. For example, if you cannot afford a downtime of more than a few minutes per year in your service, we say that the service needs to have 99.999% High Availability.
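
The relationship between the number of 9s and yearly downtime is simple arithmetic, and a short Python sketch makes it concrete. The function name yearly_downtime_minutes is our own, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def yearly_downtime_minutes(availability_percent: float) -> float:
    """Downtime per year, in minutes, allowed by a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Print the allowed downtime for two to five 9s.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {yearly_downtime_minutes(pct):.2f} minutes/year")
```

At 99.999%, the allowance works out to a little over five minutes per year, which matches the "few minutes per year" figure above.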

Components of an HA Framework

The essence of being highly available is the ability to instantly recover from failures that can happen in any part of a system. There are four highly essential components in any HA framework that need to work together in an automated fashion to enable this recoverability. Let’s review ...

Read More on Datafloq
Managing High Availability in PostgreSQL – Part I


Managing high availability in your PostgreSQL hosting is very important to ensuring your clusters maintain exceptional uptime and strong operational performance so your data is always available to your application. In an earlier blog post, we introduced you to configuring high availability for PostgreSQL using streaming replication, and now we’re going to show you how to best manage PostgreSQL high availability.

There are multiple tools available for managing the high availability of your PostgreSQL clusters using streaming replication. These solutions offer automatic failover capabilities, monitoring, replication, and other useful administrative tasks. Some of the prominent open source solutions include:

PostgreSQL Automatic Failover by ClusterLabs

Replication Manager for PostgreSQL Clusters by repmgr (2ndQuadrant)

Patroni by Zalando

Each of these tools provides its own way of managing the clusters. In our three-part series of posts on high availability for PostgreSQL, we’ll share an overview, the prerequisites, and the working and test results for each of these three tools. Here in Part I, we’ll deep dive into the PostgreSQL Automatic Failover (PAF) solution by ClusterLabs.

PostgreSQL Automatic Failover

PostgreSQL Automatic Failover (PAF) is a high availability management solution for PostgreSQL by ClusterLabs. PAF makes use of the popular, industry-standard Pacemaker and Corosync stack. With Pacemaker and Corosync together, you’ll be able to detect failures in the system and act accordingly.

Pacemaker is capable of managing many resources, and ...

Read More on Datafloq
Top 5 Benefits of Shared MongoDB Hosting


Shared MongoDB hosting is one of the most cost-effective and easy-to-set-up options for deploying MongoDB in the cloud, and is used by thousands of companies around the world to host their databases. In this post, we outline the top five benefits of using shared MongoDB hosting to help you decide whether it’s right for your business.

Shared MongoDB hosting plans are typically best suited for everything from startups to medium-sized businesses that need to move fast, develop their customer scenarios, or host a development or testing environment for their application. The most important thing to look for is a shared hosting solution for MongoDB that is fully managed, so you have the necessary expertise on hand to help you monitor, back up, and troubleshoot your database operations. Without that expertise, database problems can significantly impact the security or stability of your application, and consequently, the longevity of your business. A fully managed solution also puts you and your team in a position to focus on building out your application, not getting bogged down by unforeseen database issues.

MongoDB Hosting Configurations for Shared Clusters

Each MongoDB process is run in a separate Docker container, and the amount of RAM allocated to each container is 1/10th of the disk size or storage you use. The minimum supported size is 2GB of storage (200MB RAM), ...

Read More on Datafloq
The Future of the Application Stack – Kubernetes, PaaS & DBaaS


Containers are eating the world. If you have built and deployed an application in production over the last few years, the odds are that you have deployed your code in containers. You might have created and deployed individual containers (Docker, Linux LXC, etc.) directly in the beginning, but quickly switched over to a container orchestration technology like Kubernetes (K8s) or Swarm when you needed to coordinate multi-node deployments and high availability (HA). In this container-driven world, what will the future of the application stack look like? Let’s start with what we need from this “future” application stack.

What Do We Need From This Future Application Stack?

Cloud Agnostic

We want to be cloud agnostic with the ability to deploy to any cloud of our choice. Ideally, we can even mix in various providers in a single deployment.


We need to be able to run our application stack on-premise with our own custom hardware, private cloud, and internally managed datacenters.

Language Agnostic

It almost goes without saying, but I’ll add it in for completeness. The future open stack needs to support all of the popular programming languages.

The Future Application Stack

The future application stack will be composed of a triad of technologies – K8s, Platform-as-a-Service (PaaS), and Database-as-a-Service (DBaaS):


Kubernetes is a portable, extensible open-source platform for managing containerized workloads ...

Read More on Datafloq
MongoDB Acquires mLab — What Are The Different MongoDB Hosting Alternatives?


If you’re an mLab customer, you’ve likely heard the news that they’ve been acquired by MongoDB and your clusters are going to be migrated to MongoDB Atlas sometime in the next 12 months. While some are excited, others indifferent, and a few are just waiting to see how it pans out, there are a good number of users who are concerned about what this means for their MongoDB deployments.

If you’ve been monitoring this announcement on Twitter or the various developer forums, you’ve likely seen many wondering what other alternatives there are for MongoDB hosting. Well, you’re in luck. There are many different options out there, and you may be pleasantly surprised to find that you could be better off with a different MongoDB DBaaS solution. Let’s take a look at the different MongoDB hosting alternatives for mLab and MongoDB Atlas.

Compare MongoDB DBaaS Providers

So what else is available other than MongoDB Atlas and mLab? The top 3 other MongoDB DBaaS providers include ScaleGrid, Compose and ObjectRocket — all of whom provide MongoDB hosting, management, monitoring, and free support. Which one is right for you? This, of course, depends on what you’re looking for out of a MongoDB-as-a-Service solution. Check out this MongoDB Hosting Provider Comparison to ...

Read More on Datafloq
Latest PostgreSQL Trends: Most Time-Consuming Tasks & Important Metrics to Track


PostgreSQL, the fourth most popular database and DBMS of the Year in 2017, has exploded in popularity amongst the development and database communities across the world. Stealing market share from leaders Oracle, MySQL, and Microsoft SQL Server, PostgreSQL hosting is also highly leveraged by new businesses in exciting spaces like IoT, e-commerce, SaaS, analytics, and more. Read the Latest PostgreSQL Trends report.

So What’s Trending in PostgreSQL Management?

We attended PostgresOpen in San Francisco last month to uncover the latest trends from the experts themselves.

Most Time-Consuming PostgreSQL Management Tasks

So, what’s eating up your time on the PostgreSQL management front? While there are thousands of tasks involved with managing your PostgreSQL production deployments, managing queries was the clear leader, cited by over 30% of respondents.

Managing space was a distant second with 15% of PostgreSQL users finding it their most difficult task, followed by replication, upgrades, and monitoring. 23% of PostgreSQL users fell into the “All others” category, consisting of tasks like patching, recoveries, partitioning, and migrations.

Managing PostgreSQL Queries Breakdown

With managing PostgreSQL queries far in the lead, we dove deeper to see which specific tasks were consuming users’ time. The results spread across the entire process of managing queries, from structuring at setup to optimizing after analysis.

To explain this further, ...

Read More on Datafloq
How To Choose The Best MongoDB Hosting For Your Business


So you’re planning a MongoDB database for your business (or you already have one), and you’re hunting for hosting. What are you looking for, exactly? How will you know when you’ve found the right host? Choose poorly, and you’ll sabotage your future efforts in a big way, but choose smartly, and you’ll have a supportive partner to help you grow.

Well, have no fear: in this piece, we’re going to review what makes MongoDB different, explain what makes hosting so important, cover the specific MongoDB hosting elements you need to investigate, and give you a clear indication of how to proceed. Let’s get started.

What makes MongoDB different?

There are many different viable approaches to data storage and management in the business world, but the most common option for holding and sorting data is some form of relational database. A relational database links tables together by primary and foreign keys, avoiding redundancies and maintaining a rigid structure, something that works well in many cases, but certainly not all of them.

MongoDB, however, is a non-relational database. By storing data in flexible documents associated with objects (instead of tables or fields), it’s massively stronger for ...

Read More on Datafloq
Blockchain: What Is It, How It Works, And What It Means For Big Data


In this new digital transformation era, blockchain comes hand-in-hand as one of the fastest growing technologies to help secure and protect data through cryptography. Learn more about blockchain and what it means for big data.

What is Blockchain?

Blockchain is a secure, shared, decentralized, distributed, immutable database that maintains a continuously growing list of records called blocks. Seebacher & Schüritz further describe the distributed database as “shared among and agreed upon a peer-to-peer network. It consists of a linked sequence of blocks (a storage unit of a transaction), holding timestamped transactions that are secured by public-key cryptography (i.e., “hash”) and verified by the network community. Once an element is appended to the blockchain, it cannot be altered, turning a blockchain into an immutable record of past activity.”

How Does Blockchain Work?

At the core of the blockchain technology is a distributed ledger with groups of transactions collected into a block. The block is validated by a third party (miner) and is locked. Each participant in the global network keeps a copy of the ledger, and every time a new block is created, it is broadcast to all the participants, who add it to their local copy of the ledger. The process of “hashing” transforms assets which are ...

Read More on Datafloq
Getting Started with PostgreSQL Streaming Replication


In this blog post, we dive into the nuts and bolts of setting up Streaming Replication (SR) in PostgreSQL. Streaming replication is the fundamental building block for achieving high availability in your PostgreSQL hosting, and is produced by running a master-slave configuration.


Master-Slave Terminology

Master/Primary Server

The server that can take writes.
Also called read/write server.

Slave/Standby Server

A server where the data is kept in sync with the master continuously.
Also called backup server or replica.
A warm standby server is one that cannot be connected to until it is promoted to become a master server.
In contrast, a hot standby server can accept connections and serves read-only queries. For the rest of this discussion, we will be focusing only on hot standby servers.

Data is written to the master server and propagated to the slave servers. If there is an issue with the existing master server, one of the slave servers will take over and continue to take writes, ensuring availability of the system.

WAL Shipping-Based Replication

What is WAL?

WAL stands for Write-Ahead Logging.
It is a log file where all the modifications to the database are written before they’re applied/written to data files.
WAL is used for recovery after a database crash, ensuring data integrity.
WAL is used in ...

Read More on Datafloq
Digital Transformation: It All Starts With Data Thinking


Every few years, there’s a paradigm shift in technology patterns and digital levers. Recently, it is the age of digital disruption caused by a fundamental need for organizations to digitally transform to stay in the game. New technology platforms and services such as the Internet of Things (IoT), Artificial Intelligence, Robotics Process Automation, Machine Learning, and Blockchain are already paying dividends, enriching the digital transformation journey, and have created the new rock star: data. Even new roles such as Chief Data Officer, Chief Digital Officer, and the like, have cropped up to harness the power of data.

Nowadays, every organizational decision around a digital transformation strategy is driven by data. Whether it’s to optimize inventory stock levels, reduce lead times from suppliers, or design the pricing and promotions strategy for a customer segment, all decisions require data to understand what can be improved to gain a competitive edge for the organization. As businesses digitize, digitalize, and digitally transform (yes, there’s a difference) at breakneck speed, as new businesses and business models evolve, and as the lines between business processes and technology blur, one element remains the common denominator. You guessed it: data.

Just as organizational decision-making has evolved, data has undergone significant shape-shifting as well. It has multiplied, exploded and become ...

Read More on Datafloq
MongoDB Ruby Driver 2.5.x Case-Sensitivity Issues with Hostnames on Replica Sets


Having trouble connecting to MongoDB replica sets after upgrading the MongoDB Ruby driver to 2.5.x? We've recently received a few inquiries about this issue with the latest MongoDB Ruby driver version and wrote this post to share our findings on the problem and cause.

The error message encountered on the connection attempt was:

No server is available matching preference: #<Mongo::ServerSelector::Primary:...>

The issue has already been reported to MongoDB and is being tracked here. We spent some time investigating the issue and its root cause, which was introduced in the driver code in 2.5.x.

MongoDB Ruby Driver 2.5.x Issue Summary

The issue exists in the 2.5.x version(s) of the MongoDB Ruby driver and is encountered when the hostnames of the replica set members contain uppercase characters, for example,

Possible workarounds are:

Downgrade to 2.4.x, or upgrade to 2.6.x once it's made available.
Change the hostnames of all the members of the replica set to lowercase. For example, change the hostname in the example above to

Details On The Issue

Enabling detailed logging on Ruby provided a clue on what was happening:

#19140] DEBUG -- : MONGODB | Topology type 'replica set' initializing.
#19140] DEBUG -- : MONGODB | Server initializing.
#19140] DEBUG -- : MONGODB | Server description for changed from 'unknown' to 'unknown'.
#19140] DEBUG -- : MONGODB ...

Read More on Datafloq
The Top 6 Free Redis Memory Analysis Tools


When it comes to analyzing the memory usage of a Redis instance, there are lots of free and open-source tools on the market, along with a smattering of paid products. Some of the most popular ones are jacks of all trades, but if you’re looking for a deeper analysis of your memory problems, you might be better off with one of the more targeted, lesser-known tools.

In this post, we've compiled a list of the top 6 free tools we found most useful in analyzing memory usage of our Redis instances:

Redis Memory Analyzer (RMA)
Redis Sampler
RDB Tools
Redis Toolkit


1) Redis Memory Analyzer

Redis Memory Analyzer (RMA) is one of the most comprehensive FOSS memory analyzers available for Redis. It supports three different levels of details:

Global - Overview of memory usage information.
Scanner - Highest level keyspace/prefix level memory usage information - in other words, the shortest common prefix is used.
RAM - Lowest level keyspace/prefix - in other words, the longest common prefix is used.

Each mode has its own uses; you can get further details in the RMA ReadMe.

RMA - Global Mode

In the global mode, RMA provides some high-level statistics, like the number of keys, ...

Read More on Datafloq


Copyright © 2019 BBBT - All Rights Reserved