
Avoid the Most Common Database Performance-monitoring Mistakes


TL;DR: Finding the causes of slow queries and other operations in a database begins with knowing where to find and how to collect the performance data you need to analyze in order to find a solution. The metrics built into MySQL and other modern databases and development tools allow you to zero in on the source of system slowdowns using simple counters and basic mathematical operations.

If a database's poor performance has you scratching your head, start your troubleshooting by understanding the causes of system slow-downs, and by avoiding the most common performance-metrics mistakes.

Measuring response time in any query-processing system depends on Little's law, which is key to recording and reporting response times in multithreaded systems of all types. In an April 2011 post on Percona's MySQL Performance blog, Baron Schwartz explains how Little's law applies to MySQL. Schwartz uses the example of creating two counters to measure a system's busy time, then adding a third counter to measure weighted busy time, which accounts for periods when more than one query is executing simultaneously.

The busy-time example highlights the four fundamental metrics: observation interval; number of queries per interval; total active-query time (busy time); and total execution time of all queries (weighted busy time). Combining these with Little's law creates four long-term average metrics: throughput; execution time; concurrency; and utilization.

With these metrics, you can use the Universal Scalability Law to model scalability. The metrics can also be used for queueing analysis and capacity planning -- all with simple addition, subtraction, multiplication, and division of numbers collected by the counters.
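To make the arithmetic concrete, here is a minimal sketch of how the four averages fall out of the counters, writing T for the observation interval, N for the number of queries completed during it, B for total busy time, and W for total weighted busy time (the symbols are ours, not Schwartz's):

    throughput       X = N / T
    execution time   R = W / N
    concurrency      C = W / T
    utilization      U = B / T

Little's law ties them together: concurrency equals throughput multiplied by execution time (C = X * R), which is what lets these simple counters feed directly into scalability and queueing models.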

A MySQL performance-monitor primer

The MySQL Reference Manual explains how to use the Benchmark() function to measure the speed of a specific expression or function. The approximate time required for the statement to execute is displayed below the return value, which in this case will always be zero.

 

MySQL's Benchmark() function can be used to display a statement's approximate execution time. Source: MySQL Manual
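For instance, a minimal invocation might look like the following (the expression being timed is arbitrary):

    -- Runs MD5() one million times; BENCHMARK() itself always returns 0.
    -- The elapsed time appears in the mysql client's status line after the result.
    SELECT BENCHMARK(1000000, MD5('some string'));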

The MySQL Performance Schema is intended for monitoring MySQL Server execution at a low level by inspecting the server's internal execution at runtime. It monitors server events, where an "event" is anything the server does that takes time and has been instrumented so that timing information can be collected. For example, by examining the events_waits_current table in the performance_schema database, you get a snapshot of what the server is doing right now.

The events_waits_current table can be examined in the performance_schema database to show the server's current activity. Source: MySQL Manual
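A sketch of that kind of snapshot query (the column list is trimmed for readability):

    SELECT THREAD_ID, EVENT_NAME, SOURCE, TIMER_WAIT
    FROM performance_schema.events_waits_current
    LIMIT 10;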

The MySQL Reference Manual provides a Performance Schema Quick-Start Guide and a section on Using the Performance Schema to Diagnose Problems.

Ensuring the accuracy of your metrics data

Any troubleshooting approach relies first and foremost on the accuracy of the performance data being collected. Azul Systems CTO Gil Tene explained at an Oracle OpenWorld 2013 conference session that how you measure latency in a system is as important as what you measure. TheServerSide.com's Maxine Giza reports on Tene's OpenWorld presentation in a September 2013 post.

For example, a response-time metric may report a single measurement rather than a value aggregated over time to represent both peak and low demand. Conversely, the database's metrics may fail to record or report the critical outlier latencies and rely instead on mean and standard-deviation figures, which discards the very data that's most useful in addressing the latency.

Another mistake many DBAs make, according to Tene, is running load-generator tests in ways that don't represent real-world conditions, so important results are omitted. Tene says this "coordinated omission" in the results is significant and leads to bad reporting. He recommends that organizations establish latency-behavior requirements and pass/fail criteria to avoid wasting time and resources.

The monitoring tools on the Morpheus database-as-a-service (DBaaS) let you troubleshoot heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases through a single console using a fast and simple point-and-click interface. Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. A free full replica set of every database instance and automated daily backups ensure that your data is available when you need it.

Morpheus supports a range of tools for connecting, configuring, and managing your databases. Visit the Morpheus site to create a free account.


Big Data for Business Intelligence: Game Changer or New Name for the Same Old Techniques?


TL;DR: IT managers have grown accustomed to the industry's continual reinvention as new technologies have their day in the sun. Organizations concerned about being left in the competition's dust as the big-data wave sweeps across the IT landscape can relax. Considered from a business perspective, big-data analysis bears a striking resemblance to tried-and-true methods. Still, realizing the benefits of big data requires seeing your company's data in a less-structured and more-flexible light.

Technology cycles have been around for so long that they're predictable. First, the breakthrough is hyped as the cure for everything from the common cold to global warming. As reality sets in, the naysayers come to the fore, highlighting all the problems and downplaying any potential benefits. Finally, the new technology is integrated into existing systems and life goes on -- until the next cycle.

At the moment, big data is positioned somewhere in the hype cycle between cure-all and hogwash. As big-data principles find their way into established information-technology models and processes, it becomes easier to separate the technology's benefits to organizations from its potential pitfalls.

A Gartner survey released in September 2014 indicates that 73 percent of organizations have invested in big data or plan to do so in the next 24 months. Source: Gartner

Topping the list of big-data misconceptions is what exactly constitutes "big data." In an October 16, 2014, post on the Digital Book World site, Marcello Vena lists the three defining characteristics of big data:

  • It is "big" as in "unstructured," not as in "lots of it." If a standard RDBMS can be used to analyze it, it's not "big data."
  • Big data does entail high volume -- up to and beyond exabyte level, which is the equivalent of 1,000 petabytes, 1 million terabytes, or 1 billion gigabytes. Entirely new processes are required to capture, ingest, curate, analyze, model, and visualize such data stores. These large data repositories also present unique storage and transfer challenges in terms of security, reliability, and accessibility.
  • Analyzing huge data stores to extract business intelligence requires scaling the data initially to identify the specific data sets that relate to the particular problem/opportunity being analyzed. The challenge is to scale the data without losing the extra information created by seeing the individual data drops in relation to the ocean they are a part of.

With big data, quality is more important than ever

Conventional data systems characterize data elements at the point of collection, but big data requires the ability to apply data attributes in the context of a specific analysis at the time the analysis occurs. The same data element could be represented very differently by various analyses, and sometimes by the same analysis at a different time.

Big data doesn't realize its value to organizations until the speed and analysis abilities are joined by business applications. Source: O'Reilly Radar

Big data doesn't make data warehouses obsolete. Data warehouses ensure the integrity and quality of the data being analyzed. They also provide context for the data, which makes most analysis more efficient. ZDNet's Toby Wolpe explains in a September 26, 2014, article that big-data analysis tools are still playing catchup with their data-warehouse counterparts.

High volume can't replace high quality. In an October 8, 2014, article, TechTarget's Nicole Laskowski uses the example of geolocation data collected by cell phones. In theory, you could use this data to plot the movements of millions of people. In fact, we often travel without taking our cell phones, or the phones may be turned off or have location tracking disabled.

Focus on the business need you're attempting to address rather than on the data you intend to analyze or the tools you will use for the analysis. The shift to big data is evolutionary rather than revolutionary. Off-the-shelf data-analysis tools have a proven track record and can generally be trusted to clean up unreliable data stores regardless of size.

An example of a technology that bridges the gap between the new and old approaches to database management is the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. The service's straightforward interface lets you provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch databases with just a few clicks.

Morpheus supports a range of tools for connecting to, deploying, and monitoring your databases. A free full replica set and automatic daily backups are provided for each database instance. Visit the Morpheus site to create a free account.

How to Store Large Lists in MongoDB


TL;DR: When storing large lists in MongoDB, a common thought is to place items into an array. In most situations, arrays take up less disk space than objects with keys and values. However, the way MongoDB handles constantly growing arrays can cause performance problems over time.

Data Storage Decisions

When storing data, you may decide to store information that is regularly updated with new information. For example, you may store every post a user makes to an online forum. One way to do this would be to include an array in a document to store the content of each post, as in the following example:

 

An example MongoDB document with an array.

In most cases, this would seem like an excellent way of storing the data. In programming, arrays are often a very efficient means of storing related values—they tend to be lightning fast for both data storage and retrieval.

How MongoDB Handles Growing Arrays

In MongoDB, arrays work a little differently than they do in a programming language. There are several key points to consider when using arrays for storage in MongoDB:

  • Expansion: An array that expands often will also increase the size of its containing document. Rather than being rewritten in place, the document will be moved on disk. In MongoDB, this type of movement tends to be slow, because it requires every index to be updated as well.
  • Indexing: If an array field in MongoDB is indexed, then each document in the collection contributes a distinct entry to that index for every single element in the array. This means the indexing work required to insert or delete a document with an indexed array is roughly equivalent to indexing as many documents as the array has elements—a lot of additional work for the database.
  • BSON Format: Finding one or more elements at the end of a large array can take quite a long time, because the BSON data format uses a linear memory scan to manipulate documents.

Addressing the Array Issues

One suggestion for alleviating these performance issues is to model the data differently, so that you do not end up with a single, ever-growing array. One option is to use nested subdocuments for data storage, as the following example shows:

 

Nested subdocuments used for data storage. Source: MongoSoup

This method improves performance by dramatically decreasing the amount of MongoDB storage space needed for the data, as shown in the following comparison:

Storage space required for several data models. Option 1: plain array; Option 2: array with documents; Option 3: document with subdocuments. Source: MongoSoup

As you can see, the storage space for the nested subdocuments (Option 3) was far less than the single array (Option 1).

Where to Get MongoDB

MongoDB is a great database for applications with large amounts of data that needs to be queried quickly. One way to set up a MongoDB database easily is to have it hosted remotely as a cloud service.

Using Morpheus, you can get MongoDB (and several other databases) as a service in the cloud. It runs on a high performance infrastructure with Solid State Drives, and also has an easy setup as well as automatic backups and replication. Why not open a free account today?

MySQL's Index Hints Can Improve Query Performance, But Only If You Avoid the 'Gotchas'


In most cases, MySQL's optimizer chooses the fastest index option for queries automatically, but now and then it may hit a snag that slows your database queries to a crawl. You can use one of the three index hints -- USE INDEX, IGNORE INDEX, or FORCE INDEX -- to specify which indexes the optimizer uses or doesn't use. However, there are many limitations to using the hints, and most query-processing problems can be resolved by making things simpler rather than by making them more complicated.

The right index makes all the difference in the performance of a database server. Indexes let your queries focus on the rows that matter, and they allow you to set your preferred search order. Covering indexes (also called index-only queries) speed things up by responding to database queries without having to access data in the tables themselves.

Unfortunately, MySQL's optimizer doesn't always choose the most-efficient query plan. As the MySQL Manual explains, you can view the optimizer's statement execution plan by preceding the SELECT statement with the keyword EXPLAIN. When the plan it reveals is a poor one, you can use index hints to specify which index to use for the query.

The three syntax options for hints are USE INDEX, IGNORE INDEX, and FORCE INDEX: The first instructs MySQL to use only the index listed; the second prevents MySQL from using the indexes listed; and the third has the same effect as the first option, but with the added limitation that table scans occur only when none of the given indexes can be used.

MySQL's index_hint syntax lets you specify the index to be used to process a particular query. Source: MySQL Reference Manual

Why use FORCE INDEX at all? There may be times when you want to keep table scans to an absolute minimum. Any database is likely to field some queries that can't be satisfied without having to access some data residing only in the table, and outside any index.

To modify the index hint, apply a FOR clause: FOR JOIN, FOR ORDER BY, or FOR GROUP BY. The first applies hints only when MySQL is choosing how to find table rows or process joins, while the second and third apply them only when sorting or grouping rows, respectively. Note that whenever the optimizer chooses a covering index to access the table, it uses that index even if an ORDER BY or GROUP BY modifier tells it to ignore the covering index.
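Here is a sketch of the three hints in context; the table and index names (orders, idx_customer, idx_created) are hypothetical, and prepending EXPLAIN to any of these statements shows the plan the optimizer actually chooses:

    -- Consider only idx_customer when resolving this query
    SELECT * FROM orders USE INDEX (idx_customer)
    WHERE customer_id = 42;

    -- Never consider idx_created
    SELECT * FROM orders IGNORE INDEX (idx_created)
    WHERE customer_id = 42;

    -- Use idx_created for sorting only; fall back to a table scan
    -- only if the index cannot be used at all
    SELECT * FROM orders FORCE INDEX FOR ORDER BY (idx_created)
    WHERE customer_id = 42
    ORDER BY created_at;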

Are you sure you need that index hint?

Once you get the hang of using index hints to improve query performance, you may be tempted to overuse the technique. In a Stack Overflow post from July 2013, a developer wasn't able to get MySQL to list his preferred index when he ran EXPLAIN. He was looking for a way to force MySQL to use that specific index for performance tests.

In this case, it was posited that no index hint was needed. Instead, he could simply change the column order in the composite index so that the left-most column is the one used for row restriction. (While this approach is preferred in most situations, the particulars of the database in question made this solution impractical.)

SQL performance guru Markus Winand classifies optimizer hints as either restricting or supporting. Winand uses restricting hints only reluctantly because they create potential points of failure in the future: a new index could be added that the optimizer is then unable to consider, or an object name used as a parameter could be changed at some point.

Supporting hints add useful information that makes the optimizer's job easier, but Winand claims such hints are rare. For example, a query may ask for only a specified number of rows in its result, so the optimizer can choose an index that would be impractical to use across all rows.

Troubleshooting the performance of your databases doesn't get simpler than using the point-and-click interface of the Morpheus database-as-a-service (DBaaS). You can provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases via Morpheus's single dashboard. Each database instance includes a free full replica set and daily backups.

Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. The service lets you use a range of tools for monitoring and optimizing your databases. Visit the Morpheus site to create a free account.

MySQL Query Cache and Common Trouble Shooting Issues


For some applications, the MySQL query cache can be a handy way to improve the performance of your queries. On the other hand, the query cache can become problematic for performance when certain situations arise. To understand how to troubleshoot these issues, you will first want to see what the MySQL query cache does.

What is the MySQL Query Cache?

The MySQL query cache will cache a parsed query and its entire result set. It is excellent when you have numerous small queries that return small data sets, because the query cache makes the results available immediately rather than rerunning the query each time it is issued.

Gathering information about the query cache

To see if the query cache is indeed on, and that it has a non-zero size, you can issue a quick command which will return a result set with this information. This is shown in the example below:

Getting information about the query cache. Source: Database Journal

If the cache is set to off or has no size, then it is not working at all. Setting the query_cache_type to “ON” and the query_cache_size to a number other than zero should get it started.
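A sketch of those checks and settings follows; the 16MB size is only an example, and in some MySQL versions query_cache_type cannot be switched on at runtime if the server was started with it off, in which case the value belongs in the configuration file instead:

    SHOW VARIABLES LIKE 'query_cache%';       -- is the cache on, and how big is it?
    SET GLOBAL query_cache_size = 16777216;   -- 16MB; pick a size that fits your workload
    SET GLOBAL query_cache_type = ON;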

Some common issues that can cause performance problems are that the query cache size is too small, the query cache is fragmented, and that the query cache has grown too large. Each of these can be problematic in its own way, so being able to troubleshoot these issues can help you keep your database running smoothly.

Query cache size is too small

Ideally, the query cache will have a hit ratio near 100%. This ratio is determined by dividing the number of query cache hits (Qcache_hits) by the sum of query cache hits and Com_select, the counter for statements that were not served from the cache. These values can be obtained by issuing a couple of commands, as shown below:

 

Getting information for the query cache hit ratio. Source: Database Journal

If the ratio is low, you can increase the query cache size and recompute the numbers. Once the ratio approaches the ideal, the query cache is large enough for the queries being cached.
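For example, the two counters can be read with SHOW STATUS and combined by hand (a sketch; Qcache_hits and Com_select are the standard status variable names):

    SHOW GLOBAL STATUS LIKE 'Qcache_hits';
    SHOW GLOBAL STATUS LIKE 'Com_select';
    -- hit ratio = Qcache_hits / (Qcache_hits + Com_select)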

Query cache is fragmented

If the memory block allocated for a cached query is larger than the query's result set, the difference is empty memory that cannot be used by other cached queries. This creates fragmentation. To fix it, you likely just need to adjust your query_cache_min_res_unit value.
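As an illustration, you might lower the minimum block size and then defragment the cache (4096 bytes is the default for query_cache_min_res_unit; the right value depends on your typical result-set size):

    SET GLOBAL query_cache_min_res_unit = 2048;  -- default is 4096 bytes
    FLUSH QUERY CACHE;  -- defragments the cache without removing cached queries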

Query cache has grown too large

If you are no longer using small queries or are no longer returning small result sets, then having the query cache enabled can actually become counterproductive. Instead of providing quick results, it can create a slowdown. In such cases, you will want to disable the query cache to restore performance.

Getting a MySQL Database

To get started with your own MySQL database, you can use a service such as Morpheus, which offers databases such as MySQL, MongoDB, and others as a service on the cloud. In addition, all databases are automatically backed up, replicated, and archived, and are deployed on a high performance infrastructure with Solid State Drives.  Open a free account today!

A New Twist to Active Archiving Adds Cloud Storage to the Mix


Companies large and small are taking a fresh look at their data archives, particularly how to convert them into active archives that deliver business intelligence while simultaneously reducing infrastructure costs. A new approach combines tape-to-NAS, or tNAS, with cloud storage to take advantage of tape's write speed and local availability, and also the cloud's cost savings, efficiency, and reliability.

Archival storage has long been the ugly step-sister of information management. You create data archives because you have to, whether to comply with government regulations or as your backup of last resort. About the only time you would need to access an archive is in response to an emergency.

Data archives were perceived as both a time sink for the people who have to create and manage the old data, and as a hardware expense because you have to pay for all those tape drives (usually) or disk drives (occasionally). Throw in the need to maintain a remote location to store the archive and you've got a major money hole in your IT department's budget.

This way of looking at your company's data archive went out with baggy jeans and flip phones. Today's active archives bear little resemblance to the dusty racks of tapes tucked into even-dustier closets of some backwater remote facility.

The two primary factors driving the adoption of active archiving are the need to extract useful business intelligence from the archives (thus treating the archive as a valuable resource); and the need to reduce storage costs generally and hardware purchases specifically.

Advances in tape-storage technology, such as Linear Tape Open (LTO) generations 6, 7, and beyond, promise to extend tape's lifespan, as IT Jungle's Alex Woodie explains in a September 15, 2014, article. However, companies are increasingly using a mix of tape, disk (solid state and magnetic), and cloud storage to create their active archives.

Tape as a frontend to your cloud-based active archive

Before your company trusts its archive to cloud storage, you have to consider worst-case scenarios: What if you can't access your data? What if uploads and downloads are too slow? What if the storage provider goes out of business or otherwise places your data at risk?

To address these and other possibilities, Simon Watkins of the Active Archive Alliance proposes using tape-to-NAS (tNAS) as a frontend to a cloud-based active archive. In a December 1, 2014, article on the Alliance's blog, Watkins describes a tape-library tNAS that runs NAS gateway software and stores data in the Linear Tape File System (LTFS) format.

The tNAS approach addresses bandwidth congestion by configuring the cloud as a tNAS tier: data is first written quickly to tape and subsequently transferred to the cloud archive when bandwidth is available. Similarly, you always have an up-to-date copy of your data to use should the cloud archive become unavailable for any reason. This also facilitates transferring your archive to another cloud service.

A white paper published by HP in October 2014 presents a tNAS architecture that is able to replicate the archive concurrently to both tape and cloud storage. The simultaneous cloud/tape replication can be configured as a mirror or as tiers.

 

This tNAS design combines tape and cloud archival storage and supports concurrent replication. Source: HP

To mirror the tape and cloud replication, place both the cloud and tape volumes behind the cache, designating one primary and the other secondary. Data is sent from the cache to both volumes either at a threshold you set or when the cache becomes full.

Tape-cloud tiering takes advantage of tape's fast write speeds and is best when performance is paramount. In this model, tape is always the primary archive, and users are denied access to the cloud archive.

With the Morpheus database-as-a-service (DBaaS) replication of your MySQL, MongoDB, Redis, and ElasticSearch databases is automatic -- and free. Morpheus is the first and only DBaaS that supports SQL, NoSQL, and in-memory databases.

Morpheus lets you monitor all your databases from a single dashboard. The service's SSD-backed infrastructure ensures high availability and reliability. Visit the Morpheus site to create a free account.

Cloud-based Disaster Recovery: Data Security Without Breaking the Bank


The necessity of having a rock-solid disaster-recovery plan in place has been made abundantly clear by recent high-profile data breaches. Advances in cloud-based DR allow organizations of all sizes to ensure they'll be up and running quickly after whatever disaster may happen their way.

It just got a lot easier to convince senior management at your company that they should allocate some funds for implementation of an iron-clad disaster-recovery program. That may be one of the few silver linings of the data breach that now threatens to bring down Sony Pictures Entertainment.

It has always been a challenge for IT managers to make a business case for disaster-recovery spending. Computing UK's Mark Worts explains in a December 1, 2014, article that because DR is all about mitigating risks, senior executives strive to reduce upfront costs and long-term contracts. Cloud-based DR addresses both of these concerns by being inexpensive to implement, and by allowing companies to pay for only the resources they require right here, right now.

Small and midsized businesses -- and departments within enterprises -- are in the best position to benefit from cloud-based DR, according to Worts. Because of their complex, distributed infrastructures, it can be challenging for enterprises to achieve reasonable recovery time objectives (RTO) and recovery point objectives (RPO) while relying primarily on cloud DR services.

Components of a cloud-based DR configuration

Researchers from the University of Massachusetts and AT&T Labs developed a model for a low-cost cloud-based DR service (PDF) that has the potential to enhance business continuity over existing methods. The model depends on warm standby replicas (standby servers are available but take minutes to get running) rather than hot standby (synchronous replication for immediate availability) or cold standby (standby servers are not available right away, so recovery may take hours or days).

The first challenge is for the system to know when a failure has occurred; transient failures or network segmentation can trigger false alarms, for example. Cloud services can help detect system failures by monitoring across distributed networks. The system must also know when to fall back once the primary system has been restored.

 

A low-cost cloud-based disaster recovery system configured with three web servers and one database at the primary site. Source: University of Massachusetts

The researchers demonstrate that their RUBiS system offers significant cost savings over use of a colocation facility. For example, only one "small" virtual machine is required to run the DR server in the cloud's replication mode, while colocation DR entails provisioning four "large" servers to run the application during failover.

 

The cloud-based RUBiS DR solution is much less expensive to operate than a colocation approach for a typical database server implementation. Source: University of Massachusetts

A key cloud-DR advantage: Run your apps remotely

The traditional approaches to disaster recovery usually entail tape storage in some musty, offsite facility. Few organizations can afford the luxury of dual data centers, which duplicate all data and IT operations automatically and offer immediate failover. The modern approach to DR takes advantage of cloud services' ability to replicate instances of virtual machines, as TechTarget's Andrew Reichman describes in a November 2014 article.

By combining compute resources with the stored data, cloud DR services let you run your critical applications in the cloud while your primary facilities are restored. SIOS Technology's Jerry Melnick points out in a December 10, 2014, EnterpriseTech post that business-critical applications such as SQL Server, Oracle, and SAP do not tolerate downtime, data loss, or performance slowdowns.

It's possible to transfer the application failover of locally managed server clusters to their cloud counterparts by using SANless clustering software to synchronize storage in cloud cluster nodes. In such instances, efficient synchronous or asynchronous replication creates virtualized storage with the characteristics of SAN failover software.

Failover protection is a paramount feature of the Morpheus database-as-a-service (DBaaS). Morpheus includes a free full replica set with every database instance you create. The service supports MySQL, MongoDB, Redis, and ElasticSearch databases; it is the first and only DBaaS that works with SQL, NoSQL, and in-memory databases.

With Morpheus's single-click provisioning, you can monitor all your databases via a single dashboard. Automatic daily backups are provided for MySQL and Redis databases, and your data is safely stored on the service's SSD-backed infrastructure. Visit the Morpheus site to create a free account.

The Fastest Way to Import Text, XML, and CSV Files into MySQL Tables


One of the best ways to improve the performance of MySQL databases is to determine the optimal approach for importing data from other sources, such as text files, XML, and CSV files. The key is to correlate the source data with the table structure.

Data is always on the move: from a Web form to an order-processing database, from a spreadsheet to an inventory database, or from a text file to a customer list. One of the most common MySQL database operations is importing data from such an external source directly into a table. Data importing is also one of the tasks most likely to create a performance bottleneck.

The basic steps entailed in importing a text file to a MySQL table are covered in a Stack Overflow post from November 2012: create the destination table, then import the file with the LOAD DATA INFILE command.

The basic MySQL commands for creating a table and importing a text file into the table. Source: Stack Overflow

Note that you may need to enable the parameter "--local-infile=1" to get the command to run. You can also specify which columns the text file loads into:

This MySQL command specifies the columns into which the text file will be imported. Source: Stack Overflow

In this example, each line of the file is read into the variables "@col1, @col2, @col3," so "myid" is loaded from column 1, "mydecimal" is loaded from column 3, and column 2 is left null.
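Put together, the statement described above might look like this sketch; the file path and table name are hypothetical, while myid, mydecimal, and the @col variables come from the Stack Overflow example:

    LOAD DATA LOCAL INFILE '/tmp/data.txt'
    INTO TABLE mytable
    (@col1, @col2, @col3)
    SET myid = @col1, mydecimal = @col3;
    -- @col2 is read from the file but never assigned to a table column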

The table resulting when LOAD DATA is run with the target column specified. Source: Stack Overflow

The fastest way to import XML files into a MySQL table

As Database Journal's Rob Gravelle explains in a March 17, 2014, article, stored procedures would appear to be the best way to import XML data into MySQL tables, but after version 5.0.7, MySQL's LOAD XML INFILE and LOAD DATA INFILE statements can't run within a Stored Procedure. There's also no way to map XML data to table structures, among other limitations.

However, you can get around most of these limitations if each procedure targets an XML file with a rigid, known structure. The example Gravelle presents uses an XML file whose rows are all contained within a single parent element, and whose columns are represented by named attributes:

You can use a stored procedure to import XML data into a MySQL table if you specify the table structure beforehand. Source: Database Journal

The table you're importing to has an int ID and two varchars: because the ID is the primary key, it can't have nulls or duplicate values; last_name allows duplicates but not nulls; and first_name allows up to 100 characters of nearly any data type.

The MySQL table into which the XML file will be imported has the same three fields as the file. Source: Database Journal

Gravelle's approach for overcoming MySQL's import restrictions uses the "proc-friendly" Load_File() and ExtractValue() functions.

MySQL's XML-import limitations can be overcome by using the Load_file() and ExtractValue() functions. Source: Database Journal
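The rough idea, reduced to a sketch rather than Gravelle's actual procedure, is to read the whole file into a variable with Load_File() and then pull values out with XPath expressions via ExtractValue(); the file path, element names, and attribute name below are hypothetical, and Load_File() requires the FILE privilege and a path the server can read:

    SET @xml := LOAD_FILE('/tmp/people.xml');
    SELECT ExtractValue(@xml, 'count(/people/row)');          -- how many rows are in the file?
    SELECT ExtractValue(@xml, '/people/row[1]/@first_name');  -- one attribute from the first row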

Benchmarking techniques for importing CSV files to MySQL tables

When he tested various ways to import a CSV file into MySQL 5.6 and 5.7, Jaime Crespo discovered a technique that he claims improves the import time for MyISAM by 262 percent to 284 percent, and for InnoDB by 171 percent to 229 percent. The results of his tests are reported in an October 8, 2014, post on Crespo's MySQL DBA for Hire blog.

Crespo's test file was more than 3GB in size and had nearly 47 million rows. One of the fastest methods in Crespo's tests was grouping queries into a multi-insert statement, the approach used by "mysqldump". Crespo also attempted to improve LOAD DATA performance by augmenting the key_cache_size and by disabling the Performance Schema.

Crespo concludes that the fastest way to load CSV data into a MySQL table without using raw files is to use LOAD DATA syntax. Also, using parallelization for InnoDB boosts import speeds.
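For reference, the LOAD DATA form for a typical CSV file looks roughly like the following; the file path, table name, and header-skipping IGNORE clause are assumptions about the file's layout:

    LOAD DATA INFILE '/tmp/import.csv'
    INTO TABLE mytable
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;  -- skip the header row, if the file has one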

You won't find a more straightforward way to monitor your MySQL, MongoDB, Redis, and ElasticSearch databases than by using the dashboard interface of the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases.

You can provision, deploy, and host your databases from a single dashboard. The service includes a free full replica set for each database instance, as well as automatic daily backups of MySQL and Redis databases. Visit the Morpheus site for pricing information and to create a free account.


Sony's Two Big Mistakes: No Encryption, and No Backup


Even if you can't prevent all unauthorized access to your organization's networks, you can mitigate the damage -- and prevent most of it -- by using two time-proven, straightforward security techniques: encrypt all data storage and transmissions; and back up your data to the cloud or other off-premises site. Best of all, both security measures can be implemented without relying on all-too-human humans.

People are the weak link in any data-security plan. It turns out we're more fallible than the machines we use. Science fiction scenarios aside, the key to protecting data from attacks such as the one that threatens to put Sony out of business is to rely on machines, not people.

The safest things a company can do are to implement end-to-end encryption, and back up all data wherever it's stored. All connections between you and the outside world need to be encrypted, and all company data stored anywhere -- including on employees' mobile devices -- must be encrypted and backed up automatically.

A combination of encryption and sound backup as cornerstones of a solid business-continuity plan would have saved Sony's bacon. In a December 17, 2014, post on Vox, Timothy B. Lee writes that large companies generally under-invest in security until disaster strikes. But Sony has been victimized before: in 2011, hackers stole the personal information of millions of members of the Sony PlayStation network.

User authentication: The security hole that defies plugging

Most hackers get into their victim's networks via stolen user IDs and passwords. The 2014 Verizon Data Breach Investigations Report identifies the nine attack patterns that accounted for 93 percent of all network break-ins over the past decade. DarkReading's Kelly Jackson Higgins presents the report's findings in an April 22, 2014, article.

The 2014 Data Breach Investigations Report identifies nine predominant patterns in security breaches over the past decade. Source: Verizon

In two out of three breaches, the crook gained access by entering a user ID and password. The report recorded 1,367 data breaches in 2013, compared to only 621 in 2012. In 422 of the attacks in 2013, stolen credentials were used; 327 were due to data-stealing malware; 245 were from phishing attacks; 223 from RAM scraping; and 165 from backdoor malware.

There's just no way to keep user IDs and passwords out of the hands of data thieves. You have to assume that eventually, crooks will make it through your network defenses. When that happens, the only way to protect your data is by encrypting it so that even if it's stolen, it can't be read without the decryption key.

If encryption is such a data-security magic bullet, why haven't organizations been using it for years already? In a June 10, 2014, article on ESET's We Live Security site, Stephen Cobb warns about the high cost of not encrypting your business's data. Concentra had just reached a $1,725,220 settlement with the U.S. government following a HIPAA violation that involved the loss of unencrypted health information.

A 2013 Ponemon Institute survey pegged the average cost of a data breach in the U.S. at $5.4 million. Source: Ponemon Institute/Symantec

Encryption's #1 benefit: Minimizing the human factor

Still, as many as half of all major corporations don't use encryption, according to a survey conducted in 2012 by security firm Kaspersky Labs. The company lists the five greatest benefits of data encryption:

  1. Complete data protection, even in the event of theft
  2. Data is secured on all devices and distributed nodes
  3. Data transmissions are protected
  4. Data integrity is guaranteed
  5. Regulatory compliance is assured

Backups: Where data security starts and ends

Vox's Timothy B. Lee points out in his step-by-step account of the Sony data breach that the company's networks were "down for days" following the November 24, 2014, attack. (In fact, the original network breach likely occurred months earlier, as Wired's Kim Zetter reports in a December 15, 2014, post.)

Any business-continuity plan worth its salt prepares the company to resume network operations within hours or even minutes after a disaster, not days. A key component of your disaster-recovery plan is your recovery time objective. While operating dual data centers is an expensive option, it's also the safest. More practical for most businesses are cloud-based services such as the Morpheus database-as-a-service (DBaaS).

Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. When you choose Morpheus to host your MySQL, MongoDB, Redis, and ElasticSearch databases, you get a free full replica with each database instance. Morpheus also provides automatic daily backups of your MySQL and Redis databases.

The Morpheus dashboard lets you provision, deploy, and host your databases and monitor performance using a range of database tools. Visit the Morpheus site to create a free account.

Cloud Computing + Data Analytics = Instant Business Intelligence


Only by using cloud services will companies be able to offer their employees and managers access to big data, as well as the tools they'll need to analyze the information without being data scientists. A primary advantage of moving data analytics to the cloud is its potential to unleash the creativity of the data users, although a level of data governance is still required.

Data analytics are moving to the edge of the network, starting at the point of collection. That's one result of our applications getting smarter. According to the IDC FutureScape for Big Data and Analytics 2015 Predictions, apps that incorporate machine learning and other advanced or predictive analytics will grow 65 percent faster in 2015 than software without such abilities.

There's only one way to give millions of people affordable access to the volumes of data now being collected in real time, not to mention the easy-to-use tools they'll need to make productive use of the data. And that's via the cloud.

IDC also predicts a shortage of skilled data analysts: by 2018 there will be 181,000 positions requiring deep-analytics skills, and five times that number requiring similar data-interpretation abilities. Another of IDC's trends for 2015 is the booming market for visual data discovery tools, which are projected to grow at 2.5 times the rate of other business-intelligence sectors.

As software gets smarter, more data conditioning and analysis is done automatically, which facilitates analysis by end users. Source: Software Development Times

When you combine smarter software, a shortage of experts, and an increase in easy-to-use analysis tools, you get end users doing their own analyses, with the assistance of intelligent software. If all the pieces click into place, your organization can benefit by tapping into the creativity of its employees and managers.

The yogurt-shop model for data analytics

In a November 19, 2014, article, Forbes' Bill Franks compares DIY data analytics to self-serve yogurt shops. In both cases the value proposition is transferred to the customer: analyzing the data becomes an engaging, rewarding experience, similar to choosing the type and amount of toppings for your cup of frozen yogurt.

More importantly, you can shift to the self-serve model without any big outlays for infrastructure, training, or other costs. You might even find your costs reduced, just as self-serve yogurt shops save on labor and other costs, particularly by tapping into the efficiency and scalability of the cloud.

Employees are more satisfied with their data-analytics roles when their companies use cloud-based big data analytics. Source: Aberdeen Group (via Ricoh)

Last but not least, when you give people direct access to data and offer them tools that let them mash up the data as their creativity dictates, you'll generate valuable combinations you may never have come up with yourself.

Determining the correct level of oversight for DIY data analysts

Considering the value of the company's data, it's understandable that IT managers would hesitate to turn employees loose on the data without some supervision. As Timo Elliott explains in a post from April 2014 on the Business Analytics blog, data governance remains the responsibility of the IT department.

Elliott defines data governance as "stopping people from doing stupid things with data." The concept encompasses security, data currency, and reliability, but it also entails ensuring that information in the organization gets into the hands of the people who need it, when they need it.

You'll see aspects of DIY data analytics in the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. You use a single console to provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch. Every database instance is deployed with a free full replica set, and your MySQL and Redis databases are backed up.

Morpheus supports a range of tools for configuring and managing your databases, which are monitored continuously by the service's staff and advanced bots. Visit the Morpheus site for pricing information and to create a free account.

Devise an Attribute-based Access Control Plan that Won't Affect Database Performance


Basing data-access rights on the attributes of users, data, resources, and environments helps keep your data safe from thieves by preventing nearly all brute-force access attempts. However, applying attribute-based access controls to existing database systems requires careful consideration of potential performance bottlenecks you may be creating.

Attribute-based access controls (ABAC) for databases tend to be as simple or complex as the organization using them. For a 40-person office, it isn't particularly difficult to establish the roles, policies, rules, and relationships you'll use to determine users' data access rights.

For example, to ensure that only managers in the finance department have access to the company's financial data, create Role=Manager and Department=Finance. Then require these attributes in the permissions of any user who requests financial data.

As you can imagine, access controls are rarely that simple. When creating enterprise-wide ABACs, an attribute-management platform and machine-enforced policies may be required. IT Business Edge's Guide to Attribute Based Access Control (ABAC) Definition and Considerations (registration required) outlines the major components of such an attribute-management system:

  • Enterprise policy development and distribution
  • Enterprise identity and subject attributes
  • Subject attribute sharing
  • Enterprise object attributes
  • Authentication
  • Access control mechanism deployment and distribution

The nuts and bolts of policy-based access controls

ABAC policies are established using the eXtensible Access Control Markup Language (XACML). As explained by ABAC vendor Axiomatics, attributes are assigned to subjects, actions, resources, and environments. By evaluating the attributes in conjunction with the rules of your policies, access to the data or resource is allowed or denied.

ABAC applies rules to access requests based on the attributes and policies you establish for subjects, environments, resources, and actions. Source: Axiomatics

Applying an ABAC system to existing RDBMSs can be problematic, as exemplified in a post from March 2014 on the Stack Exchange Information Security forum. In particular, what effect will implementing fine-grain access have on database performance? And can existing Policy Enforcement Points (PEPs) implement XACML?

Axiomatics' David Brossard replies that performance is most affected by the PEP-to-PDP (Policy Decision Point) communication link, and the PDP-to-PIP (Policy Information Point) link. In particular, how you expose the authorization service PDP helps determine performance: if exposed as a SOAP service, you invite SOAP latency, and if exposed via Apache Thrift or another binary protocol, you'll likely realize better performance.

Brossard identifies six areas where ABAC performance can be enhanced:

  1. How policies are loaded into memory
  2. How policies are evaluated
  3. How attribute values are fetched from PIPs
  4. How authorization decisions are cached by PEPs
  5. How authorization requests are bundled to reduce roundtrips between the PEP and PDP
  6. How the PEP and PDP communicate

Six potential performance bottlenecks in a typical ABAC implementation. Source: Stack Exchange Information Security

Database performance monitoring needn't be so complicated, however. With the Morpheus database-as-a-service, you can use a single dashboard to monitor your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. You can invoke a new instance of any SQL, NoSQL, or in-memory database quickly and simply via the Morpheus dashboard.

Your databases are deployed on an SSD-backed infrastructure for fast performance, and direct patches to Amazon EC2 ensure the lowest latency available. Visit the Morpheus site to create a free account.

The Most Important Server Parameters for MySQL Databases


When installing MySQL, it is a good idea to set some key parameters during setup to ensure that your database will run smoothly and efficiently. Setting these ahead of time helps you avoid having to change settings after your database has grown substantially and the application is already in production.

What are Parameters?

Parameters are values that are stored in the MySQL configuration file. This file is called my.cnf, and the location will vary from system to system. You will need to check the installation on your system to determine the location of the file.

 

One possible method of finding the location of the MySQL configuration file. Source: Stack Overflow.

 

Keep in mind, however, that many parameters can be set temporarily by running the SET GLOBAL or SET SESSION MySQL queries. It is a good idea to do this first (provided the parameter is part of the group of dynamic system variables) to ensure the changes are helpful before making them permanent in the more static configuration file.
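For example, a dynamic variable can be changed on the fly and verified afterward (max_connections is used here purely as an illustration, and the value is arbitrary; SET GLOBAL requires the SUPER privilege):

    SET GLOBAL max_connections = 300;        -- takes effect for new connections
    SHOW VARIABLES LIKE 'max_connections';   -- confirm the running value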

With that in mind, here are some key parameters you can set in your MySQL configuration.

query_cache_size

Because the query cache can actually make things slower in many cases, it is often recommended to disable it by setting this parameter to 0 (zero).

max_connections

With a default setup, you may often end up getting the "Too many connections" error because this parameter is set too low. However, setting it too high can also become problematic, so you will want to test different settings to see what works best for your setup and applications.

 

One possible method of finding the current number of max connections. Source: Stack Overflow.

innodb_buffer_pool_size

If you are using InnoDB, then this is a very important parameter to set, as this buffer pool is where the system caches indexes and data. A higher setting allows more reads to be served from memory rather than from disk, which improves performance. The setting will depend on your available RAM; for example, if you have 128GB of RAM, a typical setting would be between 100 and 120GB.

innodb_log_file_size

This log file size determines how large the redo logs will be. Often, this will be set between 512 MB (for common usage) and 4GB (for applications that will do a large number of write operations).

log_bin

If you do not want the MySQL server to be a replication master, then the standard recommendation is to keep this disabled by commenting out all lines that begin with log_bin or expire_log_days in the MySQL configuration file.
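Pulling the parameters above together, a my.cnf fragment might look like the following sketch; every value is an example to be adjusted for your own hardware and workload, and innodb_buffer_pool_size in particular depends on available RAM:

    [mysqld]
    query_cache_size        = 0
    max_connections         = 300
    innodb_buffer_pool_size = 100G
    innodb_log_file_size    = 512M
    # log_bin stays commented out because this server is not a replication master
    # log_bin = mysql-bin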

Get Your Own MySQL Database

If you do not already have a MySQL database up and running, one way to easily obtain one is to use a service such as Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily set up one or more databases (including MySQL, MongoDB, and others).

In addition, all databases are deployed on a high performance infrastructure with Solid State Drives, and backed up, replicated, and archived. Open a free account today!

When Is Elasticsearch the Right Tool for Your Job?


Choosing a tool for information search or storage can be a difficult task. Some tools are better at creating relations among data, some excel at quickly accessing large amounts of data, and others make it easier when attempting to search through a vast array of information. Where does ElasticSearch fit into this, and when is it the right tool for your job?

What is ElasticSearch?

Elasticsearch is an open source search and analytics engine (based on Lucene) designed to operate in real time. It is built for distributed environments, providing flexibility and scalability.

Instead of the typical full-text search setup, ElasticSearch offers ways to extend searching capabilities through the use of APIs and query DSLs. There are clients available so that it can be used with numerous programming languages, such as Ruby, PHP, JavaScript and others.

What are some advantages of ElasticSearch?

ElasticSearch has some notable features that can be helpful to an application:

Distributed approach - Indices can be divided into shards, with each shard able to have any number of replicas. Routing and rebalancing operations are done automatically when new documents are added.

Based on Lucene - Lucene is an open source library for information retrieval that may already be familiar to developers. ElasticSearch makes numerous features of the Lucene library available through its API and JSON.

 

An example of an index API call. Source: ElasticSearch.

Use of faceting - A faceted search is more robust than a typical text search, allowing users to apply a number of filters on the information and even have a classification system based on the data. This allows better organization of the search results and allows users to better determine what information they need to examine.

Structured search queries - While searches can still be done using a text string, more robust searches can be structured using JSON objects.

 

 

An example structured query using JSON. Source: Slant.

When is ElasticSearch the right tool?

If you are seeking a database for saving and retrieving data outside of searching, you may find a NoSQL or relational database a better fit, since they are designed for those types of queries. While ElasticSearch can serve as a NoSQL solution, it lacks some of the capabilities of purpose-built databases, so you will need to be able to handle that limitation.

On the other hand, if you want a solution that is effective at quickly and dynamically searching through large amounts of data, then ElasticSearch is a good solution. If your application will be search-intensive, such as with GitHub, where it is used to search through 2 billion documents from all of its code repositories, then ElasticSearch is an ideal tool for the job.

Get ElasticSearch or a Database

If you want to try out ElasticSearch, one way to do so is to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily set up one or more databases (including ElasticSearch, MongoDB, MySQL, and more). In addition, databases are deployed on a high performance infrastructure with Solid State Drives, replicated, and archived. 

Find the Best Approach for Entering Dates in MySQL Databases


A function as straightforward as entering dates in a MySQL database should be nearly automatic, but the process is anything but foolproof. MySQL's handling of invalid date entries can leave developers scratching their heads. In particular, the globalization of IT means you're never sure where the server hosting your database will be located -- or relocated. Plan ahead to ensure your database's date entries are as accurate as possible.

DBAs know that if they want their databases to function properly, they have to follow the rules. The first problem is, some "rules" are more like guidelines, allowing a great deal of flexibility in their application. The second problem is, it's not always easy to determine which rules are rigid, and which are more malleable.

An example of a rule with some built-in wiggle room is MySQL's date handling. Database Journal's Rob Gravelle explains in a September 8, 2014, post that MySQL automatically converts numbers and strings into a correct Date whenever you add or update data in a DATE, DATETIME, or TIMESTAMP column. The string has to be in the "yyyy-mm-dd" format, but you can use any punctuation to separate the three date elements, such as "yyyy&mm&dd", or you can skip the separators altogether, as in "yyyymmdd".
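For instance, all three of the following statements store the same DATE value (the table and column names are hypothetical):

    INSERT INTO events (event_date) VALUES ('2014-09-08');
    INSERT INTO events (event_date) VALUES ('2014&09&08');  -- any punctuation can separate the elements
    INSERT INTO events (event_date) VALUES ('20140908');    -- or the separators can be skipped entirely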

So what happens when a Date record has an invalid entry, or no entry at all? MySQL inserts its special zero date of "0000-00-00" and warns you that it has encountered an invalid date, as shown below.

 

Only the first of the four Date records is valid, so MySQL warns that there is an invalid date after entering the zero date of "0000-00-00". Source: Database Journal

To prevent the zero date from being entered, you can use NO_ZERO_DATE in strict mode, which generates an error whenever an invalid date is entered; or NO_ZERO_IN_DATE mode, which allows no month or day entry when a valid year is entered. Note that both of these modes have been deprecated in MySQL 5.7.4 and rolled into strict SQL mode.

Other options are to enable ALLOW_INVALID_DATES mode, which permits an application to store the year, month, and date in three separate fields, for example, or to enable TRADITIONAL SQL Mode, which acts more like stricter database servers by combining STRICT_TRANS_TABLES, STRICT_ALL_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, and NO_AUTO_CREATE_USER.
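These modes are switched through the sql_mode system variable, for example (session scope shown; as noted above, NO_ZERO_DATE and NO_ZERO_IN_DATE are deprecated as standalone modes from 5.7.4 onward):

    SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE';
    -- or emulate stricter database servers in one step:
    SET SESSION sql_mode = 'TRADITIONAL';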

Avoid using DATETIME at all? Not quite

Developer Eli Billauer posits on his personal blog that it is always a mistake to use the MySQL (and SQL) DATETIME column type. He qualifies his initial blanket pronouncement to acknowledge that commenters to the post give examples of instances where use of DATETIME is the best approach.

Billauer points out that many developers use DATETIME to store the time of events, as in this example:

 

Using the DATETIME and NOW() functions creates problems because you can't be sure of the local server's time, or the user's timezone. Source: Eli Billauer

Because DATETIME relies on the time of the local server, you can't be sure where the web server hosting the app is going to be located. One way around this uncertainty is to apply a SQL function that converts timezones, but this doesn't address such issues as daylight savings time and databases relocated to new servers. (Note that the UTC_TIMESTAMP() function provides the UTC time.)

There are several ways to get around these limitations, one of which is to use "UNIX time," as in "UNIX_TIMESTAMP(thedate)." This is also referred to as "seconds since the Epoch." Alternatively, you can store the integer itself in the database; Billauer explains how to obtain Epoch time in Perl, PHP, Python, C, and Javascript.
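
A brief sketch of the Unix-time approach, assuming a hypothetical log table with an unsigned integer column for the timestamp:

  CREATE TABLE log (id INT AUTO_INCREMENT PRIMARY KEY, created_at INT UNSIGNED);

  -- Store the current moment as seconds since the Epoch (a UTC-based value,
  -- independent of the server's local time zone)
  INSERT INTO log (created_at) VALUES (UNIX_TIMESTAMP());

  -- Convert back to a readable date/time (rendered in the session's time zone) when querying
  SELECT id, FROM_UNIXTIME(created_at) AS created FROM log;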

Troubleshooting and monitoring the performance of your MySQL, MongoDB, Redis, and ElasticSearch databases is a piece of cake when you use the Morpheus database-as-a-service (DBaaS). Morpheus provides a single, easy-to-use dashboard. In addition to a free full replica set of each database instance, you get backups of your MySQL and Redis databases.

Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. The service's SSD-backed infrastructure ensures peak performance, and direct links to EC2 guarantee ultra-low latency. Visit the Morpheus site to create a free account.

Has Node.js Adoption Peaked? If So, What's Next for Server-Side App Development?


The general consensus of the experts is that Node.js will continue to play an important role in web app development despite the impending release of the io.js forked version. Still, some developers have decided to switch to the Go programming language and other alternatives, which they consider better suited to large, distributed web apps.

The developer community appears to be tiring of the constant churn in platforms and toolkits. Jimmy Breck-McKye points out in a December 1, 2014, post on his Lazy Programmer blog that it has been only a couple of years since Node.js, the JavaScript runtime for developing server-side apps quickly and simply, rose to prominence.

Soon Node.js was followed by Backbone.js/Grunt, Require.js/Handlebars, and most recently, Angular, Gulp, and Browserify. How is a programmer expected to invest in any single set of development tools when the tools are likely to be eclipsed before the developer can finish learning them?

Node.js still has plenty of supporters, despite the recent forking of the product with the release of io.js by a group of former Node contributors. In a December 29, 2014, post on the LinkedIn Pulse blog, Kurt Cagle identifies Node as one of the Ten Trends in Data Science for 2015. Cagle nearly gushes over the framework, calling it "the nucleus of a new stack that is likely going to relegate Ruby and Python to has-been languages." Node could even supplant PHP someday, according to Cagle.

The internal thread architecture of Node.js handles incoming requests to the http server similar to SQL requests. Source: Stack Overflow

Taking the opposite view is Shiju Varghese, who writes in an August 31, 2014, post on his Medium blog that after years of developing with Node, he has switched to using Go for Web development and as a "technology ecosystem for building distributed apps." Among Node's shortcomings, according to Varghese, are its error handling, debugging, and usability.

More importantly, Varghese claims Node is a nightmare to maintain for large, distributed apps. For anyone building RESTful apps on Node.js, he recommends the Hapi.js framework created by WalMart. Varghese predicts that the era of using dynamic languages for "performance-critical" web apps is drawing to a close.

The Node.js fork may -- or may not -- be temporary

When io.js was released in late November 2014, developers feared they would be forced to choose between the original version of the open-source framework supported by Joyent, and the new version created by former Node contributors. As ReadWrite's Lauren Orsini describes in a December 10, 2014, article, the folks behind io.js were unhappy with Joyent's management of the framework.

Io.js is intended to have "an open governance model," according to the framework's readme file. It is described as an "evented IO for V8 JavaScript." Node.js and io.js are both server-side frameworks that allow web apps to handle user requests in real time, and the io.js development team reportedly intends to maintain compatibility with the "Node ecosystem."

At present, most corporate developers are taking a wait-and-see approach to the Node rift, according to InfoWorld's Paul Krill. In a December 8, 2014, article, Krill writes that many attendees at Intuit's Node Day conference see the fork as a means of pressuring Joyent to "open up a little bit," as one conference-goer put it. Many expect the two sides to reconcile before long -- and before parallel, incompatible toolsets are released.

Still, the io.js fork is expected to be released in January 2015, according to InfoQ's James Chester in a December 9, 2014, post. Isaac Z. Schlueter, one of the Node contributors backing io.js, insists in an FAQ that the framework is not intended to compete with Node, but rather to improve it.

Regardless of the outcome of the current schism, the outlook for Node developers looks rosy. Indeed.com's recent survey of programmer job postings indicates that the number of openings for Node developers is on the rise, although it still trails jobs for Ruby and Python programmers.

Openings for developers who can work with Node.js are on the rise, according to Indeed.com. Source: FreeCodeCamp

Regardless of your preferred development framework, you can rest assured that your MySQL, MongoDB, Redis, and ElasticSearch databases are accessible when you need them on the Morpheus database-as-a-service (DBaaS). Morpheus supports a range of tools for connecting to, configuring, and managing your databases.

You can provision, deploy, and host all your databases on Morpheus with just a few clicks using the service's dashboard. Visit the Morpheus site to create a free account!


Alternative Approaches to Assigning User Privileges in MySQL


Until more sophisticated User Role-type controls are added to MySQL, developers will have to use GRANT and REVOKE statements to manage user privileges, or the Administrative Roles options provided in MySQL Workbench. Troubleshooting table-creation glitches in MySQL can be the source of much developer frustration, particularly when trying to assign privileges in a single database.

MySQL is not noted for the ease with which you can determine which users can access which features and functions. As Database Journal's Rob Gravelle explains in a February 13, 2014, article, SQL-type User Role controls were originally anticipated in MySQL 5.0, but Oracle has postponed the feature to MySQL 7.0.

Gravelle describes three tools that add User Roles to MySQL: Google's aptly named google_mysql_tools, the Open Source project SecuRich, and MySQL Workbench, whose Administrative Roles feature is described below. (Note that google_mysql_tools is written in Python and thus requires the MySQLdb connector.)

The MySQL Reference Manual presents the basics on how to use MySQL's GRANT statements to assign privileges to user accounts, including access to secure connections and server resources. As you might expect, the REVOKE statement is used to revoke privileges. The typical scenario is to create an account using CREATE USER, and then define its privileges and characteristics using GRANT.

The standard method of assigning user privileges in MySQL is to use the GRANT statement. Source: MySQL Reference Manual
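
A minimal sketch of that create-then-grant sequence (the account name, host, password, and database below are placeholders):

  CREATE USER 'appuser'@'localhost' IDENTIFIED BY 'changeme';

  -- Grant read/write privileges on a single database
  GRANT SELECT, INSERT, UPDATE, DELETE ON mydb.* TO 'appuser'@'localhost';

  -- Take a privilege back later with REVOKE
  REVOKE DELETE ON mydb.* FROM 'appuser'@'localhost';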

Privileges can be granted globally using "ON *.*" syntax, at the database level using "ON db_name.*", at the table level using "ON db_name.tbl_name", and at the column level using the following syntax:

Assign user privileges at the column level in MySQL by enclosing the column or columns within parentheses. Source: MySQL Reference Manual
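
A hedged sketch of all four scopes, using placeholder database, table, and column names:

  GRANT SELECT ON *.* TO 'appuser'@'localhost';                           -- global
  GRANT SELECT ON mydb.* TO 'appuser'@'localhost';                        -- database level
  GRANT SELECT ON mydb.customers TO 'appuser'@'localhost';                -- table level
  GRANT SELECT (name, email) ON mydb.customers TO 'appuser'@'localhost';  -- column level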

Other privileges apply to stored routines and proxy users. The "WITH" clause is used to allow one user to grant privileges to other users, to limit the user's access to resources, and to require that a user use secure connections in a specific way.

Assigning Administrative Roles via MySQL Workbench

Applying roles to users in MySQL Workbench is as easy as selecting the user account, choosing the Administrative Roles tab, and checking the boxes, as shown in the image below.

MySQL Workbench's Administrative Roles tab lets you assign user privileges by checking the appropriate boxes. Source: MySQL Reference Manual

Likewise, choose the Schema Privileges tab to assign such privileges as the ability to create temporary tables.

The inability to create tables can be a thorny problem for MySQL developers. A Stack Overflow post from February 2011 highlights several possible solutions to a recalcitrant create-table command. The first proposed solution is to grant all privileges via "GRANT ALL PRIVILEGES ON mydb.* TO 'myuser'@'%' WITH GRANT OPTION;". Such a "Super User" account is not recommended for production databases, however, nor for granting privileges on a single database.

Alternatively, you could use the following syntax to limit the privilege to a particular database:

Grant a MySQL user the ability to create tables in a single database by using the "@%" and "@localhost" qualifiers. Source: Stack Overflow
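
That suggestion boils down to something like the following sketch (the user and database names are placeholders, and the original answer may grant a broader privilege list; the duplicate grant covers both remote and local connections):

  GRANT CREATE ON mydb.* TO 'myuser'@'%';
  GRANT CREATE ON mydb.* TO 'myuser'@'localhost';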

A similar problem encountered when trying to allow MySQL users to create tables is presented in a Stack Exchange post from July 2014. The developer wants the user to be able to create, update, and delete tables, but to be prevented from changing the password or viewing all the records in the database. (The default setting in MySQL allows users to change their own passwords, but only administrators can change other users' passwords.)

Using MySQL Workbench, you can open the Users and Privileges options and create a role that has no administrative privileges but "all object rights" and "DDL rights" for the specific schema. Limiting users to a single schema prevents them from viewing or changing any other table except the information_schema administrative schema.

Much of the pain of managing MySQL, MongoDB, Redis, and ElasticSearch databases is mitigated by using the Morpheus database-as-a-service. Morpheus lets you provision, deploy, and host your databases in just seconds using a simple point-and-click interface, and backups are provided for MySQL and Redis databases.

Morpheus is the first and only DBaaS that supports SQL, NoSQL, and in-memory databases. The service lets you use a range of database tools for connecting, configuring, and managing your databases. Visit the Morpheus site to create a free account.

Is MySQL the Right Tool for Your Job?


When searching for the right database to use for a particular application, you have a number of determinations to make. Depending on the structure of your data, how much data you have, how fast queries need to be, and other considerations, a MySQL database may just be the tool that best fits the job at hand.

What is MySQL?

MySQL is a popular open-source relational database management system (RDBMS), which means the database model is a set of relations. The idea is to have a very organized structure with data that is always consistent, preferably with no duplication. This can be achieved by properly normalizing the database.

What are some advantages of MySQL?

Consistent data – A normalized MySQL database is quite reliable when it comes to having accurate data when queried. Since there is no duplicate data stored in another location, any query for a piece of data will return the most current and correct data.

Use of SQL – SQL (Structured Query Language) is a very popular means of writing queries that can add, update, or retrieve stored data. This means that many developers and database administrators will already be familiar with the query syntax that will be needed when working with MySQL.

An example of SQL syntax. Source: 1KeyData.
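
For readers new to the syntax, a simple query might look like this sketch (the table and columns are hypothetical):

  SELECT first_name, last_name
  FROM customers
  WHERE country = 'US'
  ORDER BY last_name;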

ACID model – ACID stands for Atomicity, Consistency, Isolation, and Durability. This helps to ensure that all database transactions are reliable. For example, atomicity means that if any part of a database transaction should fail, then the entire transaction fails (even if some parts of it would succeed). This helps prevent the potential problems that can occur if partial transactions are executed.
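
A minimal sketch of atomicity in practice, assuming a hypothetical accounts table stored in a transactional engine such as InnoDB; either both updates take effect or neither does:

  START TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
  COMMIT;  -- on any error, issue ROLLBACK instead and both updates are undone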

When is MySQL the right tool?

MySQL can be more difficult to scale than a NoSQL database, so if you have a very large amount of data that will consistently be growing in size, you may want to consider a NoSQL solution, which allows for quick storage and queries with fewer round trips to the database.

On the other hand, MySQL is typically the right tool in situations where you need your data and any transactions dealing with the data to be consistent and reliable. This is certainly true when you are dealing with sensitive data such as financial or confidential information, which needs to be accurate at all times.

Of course, there are also cases where you deal with both big data and sensitive data. In such instances, you can get both MySQL and a NoSQL system to work together to use the best features of each database where they are needed.

An example of MySQL and MongoDB modeling the same data. Source: ScaleBase.

Get a Hosted MySQL Database

If you want to use MySQL for your application, one way to do so is to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily set up one or more databases (including MySQL, MongoDB, ElasticSearch, and more).

In addition to this, all databases are deployed on a high performance infrastructure with Solid State Drives, and are backed up, replicated, and archived. Open a free account today!

How to Ensure Peak Performance When Using Hash-based Sharding in MongoDB


Striking the perfect balance between write and query performance in a MongoDB database distributed between clustered servers depends on choosing an appropriate hash-based shard key. Conversely, choosing the wrong key can slow writes and reads to a crawl, in addition to squandering server storage space.

Hash-based sharding was introduced in version 2.4 of MongoDB as a way to allow shards to be distributed efficiently among clustered servers. As the MongoDB manual explains, your choice of shard key -- and the resulting data partitioning -- is a balancing act between write and query performance.

Using a randomized shard key offers many benefits for scaling write operations in a cluster, but random shard keys can cause query performance to suffer because they don't support query isolation, thus mongos must query all or nearly all shards. Step-by-step instructions for creating a hashed index and for sharding a collection using a hashed shard key are provided in the MongoDB manual.

As straightforward as the concept of hash-based sharding appears, implementing the technique on a live MongoDB database can be anything but trouble-free. A post on the MongoDB blog highlights the tradeoffs required to establish the optimal sharding balance for a specific database.

Once you've named the collection to be sharded and the hashed "identifier" field for the documents in the collection, you create the hashed index on the field and then shard the collection using the same field. The post uses as an example a collection named "mydb.webcrawler" and an identifier field named "url".

After naming the collection and the hashed identifier field, you create the field's hashed index. Source: MongoDB Blog

Next, use the same field to shard the collection. Source: MongoDB Blog
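
Put together, the sequence looks roughly like this in the mongo shell, using the collection and field named above (ensureIndex was the 2.4-era helper; later releases call it createIndex):

  // Enable sharding for the database if it isn't already enabled
  sh.enableSharding("mydb")

  // Create a hashed index on the identifier field (run against the mydb database via mongos)
  db.webcrawler.ensureIndex( { url: "hashed" } )

  // Shard the collection on the same hashed field
  sh.shardCollection( "mydb.webcrawler", { url: "hashed" } )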

While it's best to pre-split chunks and shard a collection before adding data, when you shard an existing collection the balancer automatically repositions chunks to ensure even distribution of the data. The split and moveChunk operations apply to hash-based shard keys, but use of the "find" mechanism can be problematic because the specifier document is hashed to locate the containing chunk. The solution is to use the "bounds" parameter when manipulating chunks or entire collections manually.

When hash-based sharding impedes performance

The consequences of choosing the wrong shard key when you hash a MongoDB collection are demonstrated in a Stack Overflow post from September 2013. After the poster sharded a collection on the hashed _id field, the resulting _id_hashed index was taking up nearly a gigabyte of space. The poster asked whether the index could be deleted because only the _id field is used to query the documents.

Hash-based sharding requires a hashed index on the shard key, which is used to determine the shard used for all subsequent queries. In this case, the optimizer is using the _id index because it is unique and generates a more efficient plan, but it still requires the _id_hashed index.

In an October 14, 2014, post on the Wenda.io site, the process of applying a hash-based shard to a particular field is explained. The goal is to allow the application to generate a static hash for the field value so that the hash will always be the same if the value is the same.

When you designate a field in a document as a hashed shard key, a hash value for that field is generated automatically just before the document is written or read. Outgoing queries are assigned the same hash value, which is then used for shard targeting. However, this approach can affect default chunk balancing and depends on selecting an appropriate hash function.

Much of the hassle of managing MongoDB collections -- as well as MySQL, Redis, and ElasticSearch databases -- is eliminated by the simple interface of the Morpheus database-as-a-service (DBaaS). Morpheus lets you provision, deploy, and host heterogeneous databases via a single console.

Morpheus is the first and only DBaaS that supports SQL, NoSQL, and in-memory databases. A free full replica set is deployed with each database instance you provision, and your MySQL and Redis databases are backed up. Visit the Morpheus site to create a free account.

Is MongoDB the Right Tool for Your Job?



With many tools available for information storage, sometimes it can be difficult to determine the best one to use for a particular case. Find out when MongoDB may be the right tool for the job.

TL;DR: MongoDB has become quite popular in recent years, but is it the right tool to use for your application? When choosing a database, it is a good idea to pick one that has the features you most need and performs well in your particular situation. This way, you are less likely to be hit with surprises down the road.

What is MongoDB?

MongoDB is a NoSQL, document-oriented database. This means that it does not use SQL (Structured Query Language) for queries, and also does not use the relational tables used in traditional relational databases. Instead, it stores related information in a single document using a JSON-like structure (called BSON).

What are some advantages of MongoDB?

Big Data - Since MongoDB is easily scalable and can search through large amounts of data quickly in most cases, it is a good database to use when you have massive amounts of data. Its scalability helps when you are consistently adding more data to the mix.

BSON - BSON (Binary JSON) is a binary method of storing simple data structures using the same type of format as JSON (JavaScript Object Notation). Given that numerous programmers understand JSON already, using the BSON format for documents makes it easy for programmers to access the needed data.

An example MongoDB query using the BSON format. Source: MongoDB.

Document-Oriented - Unlike relational databases, which need to be normalized to try to eliminate duplicate data, MongoDB stores data in as few documents as possible instead. This means that related data is usually easier to put together and to locate later, making it more user-friendly in that area.

An example MongoDB document. Source: MongoDB.
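
As a rough sketch (the collection, fields, and values here are made up), a document and a query against it might look like this:

  // Related data lives together in one document rather than across normalized tables
  db.users.insert({
      name: "Sally",
      email: "sally@example.com",
      orders: [
          { item: "notebook", qty: 2 },
          { item: "pen", qty: 10 }
      ]
  })

  // Query criteria use the same JSON-like (BSON) structure
  db.users.find({ "orders.item": "notebook" })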

When is MongoDB the right tool?

While being document-oriented is more user-friendly, the cost is that there will likely be some duplicated data, which must be kept in sync so that queries return the most recent and correct values. With that in mind, a normalized relational database is typically better when you are storing sensitive information (such as personal or financial information).

On the other hand, MongoDB is often a great database when you are dealing with big data and need to be able to make speedy queries on that data. For example, eBay uses MongoDB to store their media metadata, which is quite a large amount of information.

Of course, there are also cases where you deal with both big data and sensitive data. In such instances, you can get both MongoDB and a relational database to work together to use the best features of each database where they are needed.

Get a Hosted MongoDB Database

If you want to use MongoDB for your application, one way to do so is to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily set up one or more databases (including MongoDB, MySQL, and more).

In addition to all of this, databases are deployed on a high performance infrastructure with Solid State Drives, and are backed up, replicated, and archived. You can learn more by viewing pricing information, or you can even open a free account now!

The Most Important Takeaways from MySQL Prepared Statements


Since MySQL's traditional protocol sends queries to the server and returns data in text format, each query must be fully parsed and the result set must be converted to strings before being sent to the client. This overhead can cause performance issues, so MySQL introduced a new feature called Prepared Statements when it released version 4.1.

What is a MySQL prepared statement?

A MySQL prepared statement is a method that can be used to pass a query containing one or more placeholders to the MySQL server. Prepared statements make use of the client/server protocol that works between a MySQL client and server, allowing for a quicker response time than the typical text/parse/conversion exchange.

Here is an example query that demonstrates how a placeholder can be used (this is similar to using a variable in programming):

 

Example of a MySQL placeholder
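
In its simplest form, such a query might look like the following sketch (the table and column are hypothetical):

  SELECT * FROM users WHERE username = ?  -- the ? is a placeholder filled in at execution time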

This query does not need to be fully parsed, since different values can be used for the placeholder. This provides a performance boost for the query, which is even more pronounced if the query is used numerous times.

In addition to enhanced performance, the placeholder can help you avoid a number of SQL injection vulnerabilities, since you are defining the placeholder rather than having it sent as a text string that can be more easily manipulated.

Using MySQL Prepared Statements

A prepared statement in MySQL is essentially performed using four keywords:

  1. PREPARE - This prepares the statement for execution
  2. SET - Sets a value for the placeholder
  3. EXECUTE - This executes the prepared statement
  4. DEALLOCATE PREPARE - This deallocates the prepared statement from memory.

With that in mind, here is an example of a MySQL prepared statement:

 

 

Example of a MySQL prepared statement
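
A sketch along those lines, using the names from the walkthrough that follows (statement_user, a users table, and the @username variable):

  -- 1. Prepare the statement, with ? standing in for the user name
  PREPARE statement_user FROM 'SELECT * FROM users WHERE username = ?';

  -- 2. Set a value for the placeholder
  SET @username = 'sally_224';

  -- 3. Execute the prepared statement using that value
  EXECUTE statement_user USING @username;

  -- 4. Release the prepared statement from memory
  DEALLOCATE PREPARE statement_user;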

Notice how the four keywords are used to complete the prepared statement:

  1. The PREPARE statement defines a name for the prepared statement and a query to be run.
  2. The SELECT statement that is prepared will select all of the user data from the users table for the specified user. A question mark is used as a placeholder for the user name, which will be defined next.
  3. A variable named @username is set and is given a value of 'sally_224'. The EXECUTE statement is then used to execute the prepared statement using the value in the placeholder variable.
  4. To end everything and ensure the statement is deallocated from memory, the DEALLOCATE PREPARE statement is used with the name of the prepared statement that is to be deallocated (statement_user in this case).

Get your own MySQL Database

To use prepared statements, you will need to have a MySQL database set up and running. One way to easily obtain a database is to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily and quickly set up your choice of several databases (including MySQL, MongoDB, and more). In addition, databases are backed up, replicated, and archived, and are deployed on a high performance infrastructure with Solid State Drives. 
