
How to Handle Huge Database Tables


TL;DR: Get a jump on query optimization in your databases by designing tables with speed in mind. This entails choosing the best data types for table fields, choosing the correct fields to index, and knowing when and how to split your tables. It also helps to be able to distinguish table partitioning from sharding.

It's a problem as old as databases themselves: large tables slow query performance. Out of this relatively straightforward problem has sprung an industry of indexing, tuning, and optimizing methodologies. The big question is, which approach is best for your database system?

For MySQL databases in particular, query performance starts with the design of the table itself. Justin Ellingwood explains the basics of query optimization in MySQL and MariaDB in a Digital Ocean article published November 11, 2013, and updated May 30, 2014.

For example, data elements that will be updated frequently should be in their own table to prevent the query cache from being dumped and rebuilt repeatedly. Generally speaking, the smaller the table, the faster the updates.

Similarly, by limiting data sizes up front you avoid wasted storage space, such as by using the "enum" type rather than "varchar" when a field that takes string values has a limited number of valid entries.
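
As a minimal sketch of that advice (the table and its columns are hypothetical), a field with a fixed set of valid values is declared as ENUM rather than VARCHAR:

    CREATE TABLE orders (
        id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        -- ENUM stores one of a small, known set of values far more compactly
        -- than an open-ended VARCHAR column would
        status ENUM('pending', 'shipped', 'delivered', 'cancelled') NOT NULL
    ) ENGINE=InnoDB;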

There's more than one way to 'split' a table

Generally speaking, the bigger the database table, the longer it takes to access and modify. Unfortunately, database performance optimization isn't as simple as dividing big tables into several smaller ones. Morgan Tocker describes 10 ways to improve the speed of large MySQL tables in an October 24, 2013, post on his Master MySQL blog.

One of the 10 methods is to use partitioning to reduce the size of indexes by creating several "tables" out of one. This minimizes index lock contention. Tocker also recommends using InnoDB rather than MyISAM even though MyISAM can be faster at inserts to the end of a table. MyISAM's table locking restricts updates and deletes, and its use of a single lock to protect the key buffer when loading or removing data from disk causes contention.
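
A rough illustration of range partitioning (table, columns, and partition boundaries are made up for the example): each partition carries its own smaller index, so a query that hits only recent data touches only one slice of the table.

    CREATE TABLE access_log (
        id        BIGINT NOT NULL,
        logged_at DATETIME NOT NULL,
        message   VARCHAR(255),
        -- the partitioning column must be part of every unique key
        PRIMARY KEY (id, logged_at)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (YEAR(logged_at)) (
        PARTITION p2013 VALUES LESS THAN (2014),
        PARTITION p2014 VALUES LESS THAN (2015),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );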

Much confusion surrounds the concept of database table partitioning, particularly how partitioning is distinguished from sharding. When the question was posed on Quora, Mosaic CTO Tony Bako explained that partitioning divides logical data elements into multiple entities to improve performance, availability, and maintainability.

Conversely, sharding is a form of horizontal partitioning that creates replicas of the schema and then divides the data stored in each shard by the shard key. This requires that DBAs distribute load and space evenly across shards based on data-access patterns and space considerations.

Sharding uses horizontal partitioning to store data in physically separate databases; here a user table is sharded by values in the "s_age" field. Source: CUBRID

With the Morpheus database-as-a-service (DBaaS) you can monitor your MySQL, MongoDB, Redis, and ElasticSearch databases via a single dashboard. Morpheus lets you bring up a new instance of any SQL, NoSQL, or in-memory database with a single click. Automatic daily backups and free live replica sets for each provisioned database ensure that your data is secure.

In addition, database performance is optimized via Morpheus's SSD-backed infrastructure and direct patching into EC2 for ultra-low latency. Visit the Morpheus site to create a free account.


How to Determine the Best Approach to Removing Duplicates in Joined Tables


TL;DR: Combining tables frequently results in duplicate entries that can be removed in several ways. The trick is knowing which way is best for a given situation. Often the only way to determine the best approach is by testing several and comparing their effect on database performance.

It is one of the most common operations in database management: merge two tables that use different schemas while also removing duplicate entries. Yet there are nearly as many approaches to this problem as there are types of database tables, and just as many potential glitches.

Here's a look at three ways to address the situation in SQL and MySQL.

All the news that's fit to merge

Combining multiple tables with similar values often creates duplicate entries. Several methods are available for eliminating duplicate values in SQL, but it can be tricky to determine which is best in a given situation.

In a StackOverflow post from October 2012, a number of approaches were proposed for removing duplicates from joined tables. The first was to convert an inner query to a common table expression (CTE):

A common table expression for an inner join often has a lower impact on performance than using the DISTINCT keyword to eliminate duplicates. Source: StackOverflow
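
As a generic illustration of the pattern rather than the exact query from the post (table and column names are hypothetical, and CTEs with window functions require MySQL 8.0 or later; SQL Server and PostgreSQL have supported them for longer):

    -- Rank the joined rows per customer, then keep only one row each
    WITH ranked AS (
        SELECT c.id, c.name,
               ROW_NUMBER() OVER (PARTITION BY c.id
                                  ORDER BY o.created_at DESC) AS rn
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
    )
    SELECT id, name
    FROM ranked
    WHERE rn = 1;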

The second approach was to use the DISTINCT keyword, which one poster claims performs better in some cases. Other suggestions included the string_agg function and the GROUP BY clause.

Getting up close and personal with the UNION clause

One of the basic elements in the SQL toolbox is the UNION operator, which combines the results of two queries and returns only distinct rows, so data from both tables is stored without duplicates:

Insert rows from a second table when their values don't match those of the joined table, or create a new table that doesn't affect either of the original tables. Source: StackOverflow
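
A minimal sketch of the basic behavior (table names are hypothetical): UNION without the ALL keyword de-duplicates the merged result automatically.

    SELECT id, email FROM customers_2013
    UNION
    SELECT id, email FROM customers_2014;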

Alternatively, you can use the SELECT INTO command to create a new table from the contents of two separate tables in a way that removes duplicates:

The SELECT INTO command creates a new table from the content of two others, leaving out the duplicates; the original tables are not affected. Source: StackOverflow
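
A hedged sketch in SQL Server syntax (table names are placeholders); in MySQL the equivalent is CREATE TABLE ... AS SELECT, since MySQL does not create tables with SELECT INTO:

    -- INTO creates the new table from the de-duplicated UNION result
    SELECT id, email
    INTO merged_customers
    FROM customers_2013
    UNION
    SELECT id, email
    FROM customers_2014;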

Combining multiple gigabyte-size tables without a performance hit

It isn't unusual for database tables to become massive. Imagine merging a dozen tables with a total of nearly 10 million separate records and more than 3GB of data. The first suggestion on StackOverflow was to create a new table with a unique constraint on the set of columns that establish a row's uniqueness, then to use INSERT IGNORE INTO ... SELECT FROM to move rows from the old tables to the new one, and finally to truncate the old tables and use INSERT INTO ... SELECT FROM to return the de-duplicated rows to the original tables.
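
A condensed sketch of that suggestion (table and column names are hypothetical):

    -- 1) new table with a unique key on the columns that define uniqueness
    CREATE TABLE records_dedup (
        source_id INT NOT NULL,
        email     VARCHAR(255) NOT NULL,
        UNIQUE KEY uniq_row (source_id, email)
    ) ENGINE=InnoDB;

    -- 2) move rows in; duplicates violate the unique key and are skipped
    INSERT IGNORE INTO records_dedup (source_id, email)
    SELECT source_id, email FROM records_old;

    -- 3) empty the old table and copy the de-duplicated rows back
    TRUNCATE TABLE records_old;
    INSERT INTO records_old (source_id, email)
    SELECT source_id, email FROM records_dedup;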

Another proposed solution was to create a specific view that combines the results of the 12 tables, and then to filter the results by querying on the view you just created.

The Morpheus database-as-a-service (DBaaS) makes analyzing and optimizing databases much more efficient. Morpheus lets you provision, host, and deploy MySQL, MongoDB, Redis, and ElasticSearch databases via a single dashboard. It is the only DBaaS to support SQL, NoSQL, and in-memory databases.

In addition to automatic daily backups, each database instance is deployed with a full replica set for failover and fault tolerance. Morpheus's solid state disk infrastructure, direct patches into EC2, and colocation with fast peering points ensure peak database performance.

Visit the Morpheus site to create a free account. Few things in the database universe are as straightforward as Morpheus's DBaaS.

How (And Why) to Make Read-only Versions of Your SQL and MySQL Databases


TL;DR: Just a little bit of pre-lock planning ensures that a SQL or MySQL database you convert to read-only status performs as expected and is accessible by the right group of users. Doing so also helps guarantee the database can be safely unlocked when and if it should ever need to be updated or otherwise altered.

There's something about setting a database to read-only that is comforting for DBAs. It's almost as if the database is all grown up and ready to be kicked out of the house, er, I mean, sent out to make its own way in the world.

Of course, there are as many reasons to set a database to read-only -- temporarily or permanently -- as there are databases. Here's a rundown on the ways to lock the content of a SQL or MySQL database while allowing users to access its contents.

As Atif Shehzad explains on the MSSQLTips site, before you lock the database, you have to optimize it to ensure it's running at peak performance. You can't update the statistics of a read-only database, for example, nor can you create or defragment indexes. Also, you can't add extended properties to the database's objects, edit its permissions, or add/remove users.

Shehzad provides an eight-step pre-lock script to run through prior to converting a database to read-only. The checklist covers everything from creating a transaction log backup to modifying permissions and updating statistics.

An eight-step pre-lock checklist ensures your database is optimized and backed up prior to being switched to read-only. Source: MSSQLTips

Once the database is optimized and backed up, use either the ALTER DATABASE [database name] SET READ_ONLY command or the system stored procedure sp_dboption (the former is recommended because the stored procedure has been removed from recent versions of SQL Server). Alternatively, you can right-click the database in SSMS, choose Properties > Options, and set the Database Read-Only state to True. The database icon and name will change in SSMS to indicate its read-only state.
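
The ALTER DATABASE statement looks like this (the database name is hypothetical):

    -- Lock the database as read-only
    ALTER DATABASE SalesArchive SET READ_ONLY;

    -- ...and unlock it again if updates ever become necessary
    ALTER DATABASE SalesArchive SET READ_WRITE;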

Converting a MySQL database to read-only -- and back again

A primary reason for setting a MySQL database as read-only is to ensure no updates are lost during a backup. The MySQL Documentation Library provides instructions for backing up master and slave servers in a replication setup via a global read lock and manipulation of the read_only system variable.

The instructions assume a replication setup with a master server (M1), slave server (S1), and clients (C1 connected to M1, and C2 connected to S1). The statements that put the master in a read-only state and that restore it to a normal operational state once the backup is complete are shown below. (Note that in some versions, "ON" becomes "1" and "OFF" becomes "0".)

The first set of statements switches the database to read-only, and the second reverts it to its normal state once the backup is complete. Source: MySQL Documentation Library
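
Paraphrasing the documented sequence, run on the master before and after the backup:

    FLUSH TABLES WITH READ LOCK;
    SET GLOBAL read_only = ON;

    -- ...perform the backup of the master here...

    SET GLOBAL read_only = OFF;
    UNLOCK TABLES;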

In its read-only state, the database can be queried but not updated. An August 23, 2013, post on StackOverflow explains how to revoke and then reinstate DML privileges for specific users, which is less likely to affect the performance of the entire database.
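
A hedged example of that per-user alternative (the account and schema names are placeholders):

    -- Take away write privileges for one account only
    REVOKE INSERT, UPDATE, DELETE ON appdb.* FROM 'app_user'@'%';

    -- ...and restore them when the read-only period ends
    GRANT INSERT, UPDATE, DELETE ON appdb.* TO 'app_user'@'%';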

The Morpheus database as a service (DBaaS) lets you make these and other changes to your database as simply as pointing and clicking. Morpheus's single dashboard can be used to provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. It is the first and only DBaaS that supports SQL, NoSQL, and in-memory databases.

In addition to automatic daily backups, Morpheus provides a free live replica set for each database instance. Developers can use their choice of tools for connecting, configuring, and managing their databases. Visit the Morpheus site to create a free account. Think of all you can accomplish in the time you'll save when you no longer have to worry about backups!

Morpheus Lessons: Best Practices for Upgrading MySQL


TL;DR: Thinking about upgrading your MySQL database? When performing an upgrade, there are some factors you need to consider and some best practices that can be followed to help ensure the process goes as smoothly as possible. You will need to consider whether an upgrade is necessary, whether it is a minor or a major upgrade, and how the new version changes query syntax, results, and performance.

Do You Need to Upgrade?

The need to upgrade is based on the risk versus the reward. Any upgrade carries with it the risk of losing functionality (breaks something) or data (catastrophic loss). With that in mind, you may be running into bugs that are resolved in a later release, performance problems, or growing concerns about the security of the database as the current version continues to age. Any of these factors could cause an upgrade to be necessary, so you will need to follow some best practices to help mitigate as much risk as possible.

An example MySQL setup. Source: Programming Notes

Will the Upgrade be Minor or Major?

A minor upgrade is typically one where there is a small change in the third release number. For example, upgrading version 5.1.22 to 5.1.25 would be considered a minor upgrade. As long as the difference is relatively small, the risk to upgrade will be relatively low.

A major upgrade, on the other hand, involves a change in the second or the first number. For example, upgrading version 5.1.22 to 5.3.1 or 4.1.3 to 5.1.0 would usually be considered a major upgrade. In such cases, the risk becomes higher because more changes to the system have been implemented.

Consider the Changes

Before upgrading, it is best to examine the changes that have been made between the two versions. Changes to query syntax or the results of queries can cause your application to have erroneous data, errors, or even stop working. It is important to know what changes will need to be made in your queries to ensure that your system continues to function after the upgrade takes place.

Also, an upgrade could either cause increased or decreased performance, depending on what has changed and the system on which MySQL is running. If the upgrade could cause a decrease in performance, you will certainly want to consider if this is the right time to update.

Performance on a single thread comparison. Source: PERCONA

Performing the Upgrade

Typically, the best practice when upgrading is to follow this procedure:

  • Dump your user grant data
  • Dump your regular data
  • Restore your regular data in the new version
  • Restore your user grant data in the new version

Doing this, you significantly reduce your risk of losing data, since you will have backup dump files. In addition, since you are using the MySQL dump and restore, the restore process will use the format of the new MySQL version, which helps mitigate compatibility issues.
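
One way to script that sequence with the stock client tools, as a rough sketch only (database names, credentials, and file names are placeholders):

    # 1) and 2): dump the grant tables (the mysql system schema) and the application data
    mysqldump -u root -p mysql > grants_backup.sql
    mysqldump -u root -p --databases appdb > data_backup.sql

    # 3) and 4): on the upgraded server, restore the data first, then the grants
    mysql -u root -p < data_backup.sql
    mysql -u root -p mysql < grants_backup.sql
    mysql -u root -p -e "FLUSH PRIVILEGES"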

Easy Upgrades

If you want to upgrade even more easily, consider using a database as a service in the cloud. Such services make it easy to provision, replicate and archive your database, and make upgrading easier via the use of available tools.

One such service is Morpheus, which offers not only MySQL, but also lets you use MongoDB, ElasticSearch, or Redis. In addition, all databases are deployed on a high performance infrastructure with Solid State Drives and are automatically backed up, replicated, and archived. So, take a look at pricing information or open a free account today to begin taking advantage of this service!

Password Encryption: Keeping Hackers from Obtaining Passwords in Your Database


TL;DR: When dealing with a user password, you want to be very careful in how this information is saved. Passwords stored in plain text within your database are a serious security risk both to you and your users, especially if your business is working with any of your users' financial or personal information. To keep from saving passwords in plain text, you can encrypt them using a salt and a hashing algorithm.

Plain Text Password Problems

While storing plain-text passwords can be handy when making prototypes and testing various systems, it can be disastrous when done in a production database. If an attacker somehow gains access to the database and its records, the hacker can instantly make use of every user account. The reason: the passwords are all right there in plain text for the taking!

Back in 2006, the web site Reddit, a discussion forum, had a backup copy of its database stolen. Unfortunately, all of the passwords were stored in plain text. The person who had the data could have easily taken over any of the accounts that were stored in the backup database by making use of the user names and passwords available.

This may not seem like a major problem for a discussion forum. If the administrator and moderator passwords were changed quickly, the intruder likely would only be able to post spam or other types of messages the user would not normally write. However, these same users may have used the same login information for other tasks, such as online banking or credit card accounts. This would indeed be a problem for the user once a hacker had access to such an account!

Plain text passwords are not a game, they are a security risk! Source: MacTrast

Salting and Hashing a Password

To avoid having plain-text passwords in your database, you need to store a value that has been altered in a way that will be very difficult to crack. The first step is to add a salt, which is a random string that is added to the password. This value can be either prepended or appended to the password, and should be long and randomly generated in order to provide the best security.

After the password is salted, it should then be hashed. Hashing takes the salted password and turns it into a string of characters that can be placed into the database instead of the plain-text password. A number of hashing algorithms are available, such as SHA-256 and SHA-512.

While implementing salted password hashing can be more time consuming, it could save your users from having their passwords exposed or stolen. It is definitely a good idea to do this as a safeguard for the people using your services.
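
A minimal illustration using MySQL's built-in SHA2() function (table, columns, and values are made up; real applications usually perform this step in application code with a dedicated password-hashing library):

    -- Store a salted hash instead of the password itself
    INSERT INTO users (username, salt, password_hash)
    VALUES ('alice',
            '8f2b1c9e4d7a6035',
            SHA2(CONCAT('8f2b1c9e4d7a6035', 's3cretPassw0rd'), 256));

    -- Verify a login attempt by recomputing the hash with the stored salt
    SELECT COUNT(*) FROM users
    WHERE username = 'alice'
      AND password_hash = SHA2(CONCAT(salt, 's3cretPassw0rd'), 256);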

An example of password creation and verification with salting and hashing in place. Source: PacketLife

Further Protection

Another way to help protect your users is to make sure the database itself is secure. Keeping the database on site may be difficult for your business, but there are companies that offer databases as a service in the cloud.

One such company is Morpheus, which includes VPN connections to databases and online monitoring to help keep your database secure. In addition, databases are backed up, replicated, and archived automatically on an SSD-backed infrastructure. So, give Morpheus a try and get a secure, reliable database for your business!

Making Software Development Simpler: Look for Repeatable Results, Reusable APIs and DBaaS


TL;DR: Software is complex -- to design, develop, deliver, and maintain. Everybody knows that, right? New app-development approaches and fundamental changes to the way businesses of all types operate are challenging the belief that meeting customers' software needs requires an army of specialists working in a tightly managed hierarchy. Focusing on repeatable results and reusable APIs helps take the complexity out of the development process.

What's holding up software development? Seven out of 10 software development teams have workers in different locations, and four out of five are bogged down by having to accommodate legacy systems. Life does not need to be like this. The rapidly expanding capabilities of cloud-based technologies and external services (like our very own Database-As-A-Service) allow developers to focus more time on application development. The end result: better software products. 

The results of the 2014 State of the IT Union survey are presented in a September 9, 2014, article in Dr. Dobb's Journal. Among the findings are that 58 percent of development teams consist of 10 or fewer people, while 36 percent work in groups of 11 to 50 developers. In addition, 70 percent of the teams have members working in different geographic locations, but that drops to 61 percent for agile development teams.

A primary contributor to the complexity of software development projects is the need to accommodate legacy software and data sources: 83 percent of the survey respondents reported having to deal with "technical debt" (obsolete hardware and software), which increases risk and development time. Software's inherent complexity is exacerbated by the realities of the modern organization: teams working continents apart, dealing with a tangled web of regulations and requirements, while adapting to new technologies that are turning development on its head.

The survey indicates that agile development projects are more likely to succeed because they focus on repeatable results rather than processes. It also highlights the importance of flexibility in managing software projects, each of which is as unique as the product it delivers.

Successful agile development requires discipline

Organizations are realizing the benefits of agile development, but often in piecemeal fashion as they are forced to accommodate legacy systems. There's more to agile development than new tools and processes, however. As Ben Linders points out in an October 16, 2014, article on InfoQ, the key to success for agile teams is discipline.

The misconception is that agile development operates without a single methodology. In fact, it is even more important to adhere to the framework the team has selected -- whether SCRUM, Kanban, Extreme Programming (XP), Lean, Agile Modeling, or another -- than it is when using traditional waterfall development techniques.

The keys to successfully managing an agile development team have little to do with technology and much to do with communication. Source: CodeProject

Focusing on APIs helps future-proof your apps

Imagine building the connections to your software before you build the software itself. That's the API-first approach some companies are taking in developing their products. Tech Target's Crystal Bedell describes the API-first approach to software development in an October 2014 article.

Bedell quotes Jeff Kaplan, a market analyst for ThinkStrategies, who sees APIs as the foundation for interoperability. In fact, your app's ability to integrate with the universe of platforms and environments is the source of much of its value.

Another benefit of an API-centric development strategy is the separation of all the functional components of the app, according to Progress Software's Matt Robinson. As new standards arise, you can reuse the components and adapt them to specific external services.

Morpheus was created with the express purpose of being that adaptable external service to scale with your needs. The Morpheus database-as-a-service also future-proofs your apps by being the first service to support SQL, NoSQL, and in-memory databases. You can provision, host, and deploy MySQL, MongoDB, Redis, and ElasticSearch using a simple, single-click interface. Visit the Morpheus site now to create a free account.

"Too Many Connections": How to Increase the MySQL Connection Count To Avoid This Problem


If you don't have enough connections open to your MySQL server, your users will begin to receive a "Too many connections" error while trying to use your service. To fix this, you can increase the maximum number of connections to the database that are allowed, but there are some things to take into consideration before simply ramping up this number.

Items to Consider

Before you increase the connections limit, you will want to ensure that the machine on which the database is housed can handle the additional workload. The maximum number of connections that can be supported depends on the following variables:

  • The available RAM - The system will need to have enough RAM to handle the additional workload.
  • The thread library quality of the platform - This will vary based on the platform. For example, Windows can be limited by the Posix compatibility layer it uses (though the limit no longer applies to MySQL v5.5 and up). However, memory usage remains a concern depending on the architecture (x86 vs. x64) and how much memory can be consumed per application process.
  • The required response time - Increasing the number could increase the amount of time to respond to requests. This should be tested to ensure it meets your needs before going into production.
  • The amount of RAM used per connection - Again, RAM is important, so you will need to know whether the RAM used per connection will overload the system or not.
  • The workload required for each connection - The workload will also factor into what system resources are needed to handle the additional connections.

Another issue to consider is that you may also need to increase the open files limit, so that enough file handles are available for the additional connections.

Checking the Connection Limit

To see what the current connection limit is, you can run the following from the MySQL command line or from many of the available MySQL tools such as phpMyAdmin:

 

The show variables command.
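
That check is a single statement:

    SHOW VARIABLES LIKE 'max_connections';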

This will display a nicely formatted result for you:

 

Example result of the show variables command.
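
On a stock installation the output looks something like this (151 is the compiled-in default in recent MySQL versions):

    +-----------------+-------+
    | Variable_name   | Value |
    +-----------------+-------+
    | max_connections | 151   |
    +-----------------+-------+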

Increasing the Connection Limit

To increase the global number of connections temporarily, you can run the following from the command line:

 

 

An example of setting the max_connections global.
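
The statement takes this form (250 is an arbitrary value; SET GLOBAL requires administrative privileges, and the change lasts only until the next server restart):

    SET GLOBAL max_connections = 250;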

If you want to make the increase permanent, you will need to edit the my.cnf configuration file. You will need to determine the location of this file for your operating system (Linux systems often store the file in the /etc folder, for example). Open this file and add a line that includes max_connections, followed by an equal sign, followed by the number you want to use, as in the following example:

 

An example of setting max_connections in my.cnf.
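
The line belongs in the [mysqld] section of the file (again, 250 is just an example value):

    [mysqld]
    max_connections = 250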

The next time you restart MySQL, the new setting will take effect and will remain in place unless or until this is changed again.

Easily Scale a MySQL Database

Instead of worrying about these settings on your own system, you could opt to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily and quickly set up your choice of several databases (including MySQL, MongoDB, Redis, and Elasticsearch).

In addition, MySQL and Redis have automatic backups, and each database instance is replicated, archived, and deployed on a high performance infrastructure with Solid State Drives. You can start a free account today to begin taking advantage of this service!

Sony's Two Big Mistakes: No Encryption, and No Backup


Even if you can't prevent all unauthorized access to your organization's networks, you can mitigate the damage -- and prevent most of it -- by using two time-proven, straightforward security techniques: encrypt all data storage and transmissions; and back up your data to the cloud or other off-premises site. Best of all, both security measures can be implemented without relying on all-too-human humans.

People are the weak link in any data-security plan. It turns out we're more fallible than the machines we use. Science fiction scenarios aside, the key to protecting data from attacks such as the one that threatens to put Sony out of business is to rely on machines, not people.

The safest things a company can do are to implement end-to-end encryption, and back up all data wherever it's stored. All connections between you and the outside world need to be encrypted, and all company data stored anywhere -- including on employees' mobile devices -- must be encrypted and backed up automatically.

A combination of encryption and sound backup as cornerstones of a solid business-continuity plan would have saved Sony's bacon. In a December 17, 2014, post on Vox, Timothy B. Lee writes that large companies generally under-invest in security until disaster strikes. But Sony has been victimized before. In 2011, hackers stole the personal information of millions of members of the Sony PlayStation network.

User authentication: The security hole that defies plugging

Most hackers get into their victim's networks via stolen user IDs and passwords. The 2014 Verizon Data Breach Investigations Report identifies the nine attack patterns that accounted for 93 percent of all network break-ins over the past decade. DarkReading's Kelly Jackson Higgins presents the report's findings in an April 22, 2014, article.

The 2014 Data Breach Investigations Report identifies nine predominant patterns in security breaches over the past decade. Source: Verizon

In two out of three breaches, the crook gained access by entering a user ID and password. The report recorded 1,367 data breaches in 2013, compared to only 621 in 2012. In 422 of the attacks in 2013, stolen credentials were used; 327 were due to data-stealing malware; 245 were from phishing attacks; 223 from RAM scraping; and 165 from backdoor malware.

There's just no way to keep user IDs and passwords out of the hands of data thieves. You have to assume that eventually, crooks will make it through your network defenses. In this case, the only way to protect your data is by encrypting it so that even if it's stolen, it can't be read without the decryption key.

If encryption is such a data-security magic bullet, why haven't organizations been using it for years already? In a June 10, 2014, article on ESET's We Live Security site, Stephen Cobb warns about the high cost of not encrypting your business's data. Concentra had just reached a $1,725,220 settlement with the U.S. government following a HIPAA violation that involved the loss of unencrypted health information.

A 2013 Ponemon Institute survey pegged the average cost of a data breach in the U.S. at $5.4 million. Source: Ponemon Institute/Symantec

Encryption's #1 benefit: Minimizing the human factor

Still, as many as half of all major corporations don't use encryption, according to a survey conducted in 2012 by security firm Kaspersky Labs. The company lists the five greatest benefits of data encryption:

  1. Complete data protection, even in the event of theft
  2. Data is secured on all devices and distributed nodes
  3. Data transmissions are protected
  4. Data integrity is guaranteed
  5. Regulatory compliance is assured

Backups: Where data security starts and ends

Vox's Timothy B. Lee points out in his step-by-step account of the Sony data breach that the company's networks were "down for days" following the November 24, 2014, attack. (In fact, the original network breach likely occurred months earlier, as Wired's Kim Zetter reports in a December 15, 2014, post.)

Any business-continuity plan worth its salt prepares the company to resume network operations within hours or even minutes after a disaster, not days. A key component of your disaster-recovery plan is your recovery time objective. While operating dual data centers is an expensive option, it's also the safest. More practical for most businesses are cloud-based services such as the Morpheus database-as-a-service (DBaaS).

Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. When you choose Morpheus to host your MySQL, MongoDB, Redis, and ElasticSearch databases, you get a free full replica with each database instance. Morpheus also provides automatic daily backups of your MySQL and Redis databases.

The Morpheus dashboard lets you provision, deploy, and host your databases and monitor performance using a range of database tools. Visit the Morpheus site to create a free account.


A New Twist to Active Archiving Adds Cloud Storage to the Mix


Companies large and small are taking a fresh look at their data archives, particularly how to convert them into active archives that deliver business intelligence while simultaneously reducing infrastructure costs. A new approach combines tape-to-NAS, or tNAS, with cloud storage to take advantage of tape's write speed and local availability, and also the cloud's cost savings, efficiency, and reliability.

Archival storage has long been the ugly step-sister of information management. You create data archives because you have to, whether to comply with government regulations or as your backup of last resort. About the only time you would need to access an archive is in response to an emergency.

Data archives were perceived as both a time sink for the people who have to create and manage the old data, and as a hardware expense because you have to pay for all those tape drives (usually) or disk drives (occasionally). Throw in the need to maintain a remote location to store the archive and you've got a major money hole in your IT department's budget.

This way of looking at your company's data archive went out with baggy jeans and flip phones. Today's active archives bear little resemblance to the dusty racks of tapes tucked into even-dustier closets of some backwater remote facility.

The two primary factors driving the adoption of active archiving are the need to extract useful business intelligence from the archives (thus treating the archive as a valuable resource); and the need to reduce storage costs generally and hardware purchases specifically.

Advances in tape-storage technology, such as Linear Tape Open (LTO) generations 6, 7, and beyond, promise to extend tape's lifespan, as IT Jungle's Alex Woodie explains in a September 15, 2014, article. However, companies are increasingly using a mix of tape, disk (solid state and magnetic), and cloud storage to create their active archives.

Tape as a frontend to your cloud-based active archive

Before your company trusts its archive to cloud storage, you have to consider worst-case scenarios: What if you can't access your data? What if uploads and downloads are too slow? What if the storage provider goes out of business or otherwise places your data at risk?

To address these and other possibilities, Simon Watkins of the Active Archive Alliance proposes using tape-to-NAS (tNAS) as a frontend to a cloud-based active archive. In a December 1, 2014, article on the Alliance's blog, Watkins describes a tape-library tNAS that runs NAS gateway software and stores data in the Linear Tape File System (LTFS) format.

The tNAS approach addresses bandwidth congestion by configuring the cloud as a tNAS tier: data is written quicker to tape, and subsequently transferred to the cloud archive when bandwidth is available. Similarly, you always have an up-to-date copy of your data to use should the cloud archive become unavailable for any reason. This also facilitates transferring your archive to another cloud service.

A white paper published by HP in October 2014 presents a tNAS architecture that is able to replicate the archive concurrently to both tape and cloud storage. The simultaneous cloud/tape replication can be configured as a mirror or as tiers.

 

This tNAS design combines tape and cloud archival storage and supports concurrent replication. Source: HP

To mirror the tape and cloud replication, place both the cloud and tape volumes behind the cache, designating one primary and the other secondary. Data is sent from the cache to both volumes either at a threshold you set or when the cache becomes full.

Tape-cloud tiering takes advantage of tape's fast write speeds and is best when performance is paramount. In this model, tape is always the primary archive, and users are denied access to the cloud archive.

With the Morpheus database-as-a-service (DBaaS), replication of your MySQL, MongoDB, Redis, and ElasticSearch databases is free. Morpheus is the first and only DBaaS that supports SQL, NoSQL, and in-memory databases.

Morpheus lets you monitor all your databases from a single dashboard. The service's SSD-backed infrastructure ensures high availability and reliability. Visit the Morpheus site to create a free account.

Cloud-based Disaster Recovery: Data Security Without Breaking the Bank


The necessity of having a rock-solid disaster-recovery plan in place has been made abundantly clear by recent high-profile data breaches. Advances in cloud-based DR allow organizations of all sizes to ensure they'll be up and running quickly after whatever disaster may happen their way.

It just got a lot easier to convince senior management at your company that they should allocate some funds for implementation of an iron-clad disaster-recovery program. That may be one of the few silver linings of the data breach that now threatens to bring down Sony Pictures Entertainment.

It has always been a challenge for IT managers to make a business case for disaster-recovery spending. Computing UK's Mark Worts explains in a December 1, 2014, article that because DR is all about mitigating risks, senior executives strive to reduce upfront costs and long-term contracts. Cloud-based DR addresses both of these concerns by being inexpensive to implement, and by allowing companies to pay for only the resources they require right here, right now.

Small and midsized businesses, and departments within enterprises are in the best position to benefit from cloud-based DR, according to Worts. Because of their complex, distributed infrastructures, it can be challenging for enterprises to realize reasonable recovery time objectives (RTO) and recovery point objectives (RPO) relying primarily on cloud DR services.

Components of a cloud-based DR configuration

Researchers from the University of Massachusetts and AT&T Labs developed a model for a low-cost cloud-based DR service (PDF) that has the potential to enhance business continuity over existing methods. The model depends on warm standby replicas (standby servers are available but take minutes to get running) rather than hot standby (synchronous replication for immediate availability) or cold standby (standby servers are not available right away, so recovery may take hours or days).

The first challenge is for the system to know when a failure has occurred; transient failures or network segmentation can trigger false alarms, for example. Cloud services can help detect system failures by monitoring across distributed networks. The system must also know when to fall back once the primary system has been restored.

 

A low-cost cloud-based disaster recovery system configured with three web servers and one database at the primary site. Source: University of Massachusetts

Using the RUBiS benchmark application, the researchers demonstrate that the cloud-based approach offers significant cost savings over use of a colocation facility. For example, only one "small" virtual machine is required to run the DR server in the cloud's replication mode, while colocation DR entails provisioning four "large" servers to run the application during failover.

 

The cloud-based RUBiS DR solution is much less expensive to operate than a colocation approach for a typical database server implementation. Source: University of Massachusetts

A key cloud-DR advantage: Run your apps remotely

The traditional approaches to disaster recovery usually entail tape storage in some musty, offsite facility. Few organizations can afford the luxury of dual data centers, which duplicate all data and IT operations automatically and offer immediate failover. The modern approach to DR takes advantage of cloud services' ability to replicate instances of virtual machines, as TechTarget's Andrew Reichman describes in a November 2014 article.

By combining compute resources with the stored data, cloud DR services let you run your critical applications in the cloud while your primary facilities are restored. SIOS Technology's Jerry Melnick points out in a December 10, 2014, EnterpriseTech post that business-critical applications such as SQL Server, Oracle, and SAP do not tolerate downtime, data loss, or performance slowdowns.

It's possible to transfer the application failover of locally managed server clusters to their cloud counterparts by using SANless clustering software to synchronize storage in cloud cluster nodes. In such instances, efficient synchronous or asynchronous replication creates virtualized storage with the characteristics of SAN failover software.

Failover protection is a paramount feature of the Morpheus database-as-a-service (DBaaS). Morpheus includes a free full replica set with every database instance you create. The service supports MySQL, MongoDB, Redis, and ElasticSearch databases; it is the first and only DBaaS that works with SQL, NoSQL, and in-memory databases.

With Morpheus's single-click provisioning, you can monitor all your databases via a single dashboard. Automatic daily backups are provided for MySQL and Redis databases, and your data is safely stored on the service's SSD-backed infrastructure. Visit the Morpheus site to create a free account.

Your Options for Optimizing the Performance of MySQL Databases


A database can never be too optimized, and DBAs will never be completely satisfied with the performance of their creations. As your MySQL databases grow in size and complexity, taking full advantage of the optimizing tools built into the MySQL Workbench becomes increasingly important.

DBAs have something in common with NASCAR pit crew chiefs: No matter how well your MySQL database is performing, there's always a little voice in your head telling you, "I can make it go faster."

Of course, you can go overboard trying to fine-tune your database's performance. In reality, most database tweaking is done to address a particular performance glitch or to prevent the system from bogging down as the database grows in size and complexity.

One of the tools in the MySQL Workbench for optimizing your database is the Performance Dashboard. When you mouse over a graph or other element in the dashboard, you get a snapshot of server, network, and InnoDB metrics.

The Performance Dashboard in the MySQL Workbench provides at-a-glance views of key metrics of network traffic, server activity, and InnoDB storage. Source: MySQL.com

Other built-in optimization tools are Performance Reports for analyzing IO hotspots, high-cost SQL statements, Wait statistics, and InnoDB engine metrics; Visual Explain Plans that offer graphical views of SQL statement execution; and Query Statistics that report on client timing, network latency, server execution timing, index use, rows scanned, joins, temporary storage use, and other operations.

A maintenance release of the MySQL Workbench, version 6.2.4, was announced on November 20, 2014, and is described on the MySQL Workbench Team Blog. Among the new features in MySQL Workbench 6.2 are a spatial data viewer for graphing data sets with GEOMETRY data; enhanced Fabric Cluster connectivity; and a Metadata Locks View for finding and troubleshooting threads that are blocked or stuck waiting on a lock.

Peering deeper into your database's operation

One of the performance enhancements in MySQL 5.7 is the new Cost Model, as Marcin Szalowicz explains in a September 25, 2014, post on the MySQL Workbench blog. For example, Visual Explain's interface has been improved to facilitate optimizing query performance.

MySQL 5.7's Visual Explain interface now provides more insight for improving the query processing of your database. Source: MySQL.com

The new query results panel centralizes information about result sets, including Result Grid, Form Editor, Field Types, Query Stats, Spatial Viewer, and both traditional and Visual Execution Plans. Also new is the File > Run SQL Script option that makes it easy to execute huge SQL script files.

Attempts to optimize SQL tables automatically via the OPTIMIZE TABLE command often go nowhere. A post from March 2011 on Stack Overflow demonstrates that you may end up with slower performance and more storage space used rather than less. The best approach is to use "mysqlcheck" at the command line:

Run "mysqlcheck" at the command line to optimize a single database or all databases at once. Source: Stack Overflow

Alternatively, you could run a php script to optimize all the tables in a database:

A php script can be used to optimize all the tables in a database at one time. Source: Stack Overflow

A follow-up to the above post on DBA StackExchange points out that MySQL Workbench has a "hidden" maintenance tool called the Schema Inspector that opens an editor area in which you can inspect and tweak several pages at once.

What is evident from these exchanges is that database optimization remains a continuous process, even with the arrival of new tools and techniques. A principal advantage of the Morpheus database-as-a-service (DBaaS) is the use of a single dashboard to access statistics about all your MySQL, MongoDB, Redis, and ElasticSearch databases.

With Morpheus you can provision, deploy, and host SQL, NoSQL, and in-memory databases with a single click. The service supports a range of tools for connecting, configuring, and managing your databases, and running backups for MySQL and Redis.

Visit the Morpheus site to create a free account. Database optimization has never been simpler!

The Three Most Important Considerations in Selecting a MongoDB Shard Key


The efficient operation of your MongoDB database depends on which field in the documents you designate as the shard key. Since you have to select the shard key up front and can't change it later, you need to give the choice due consideration. For query-focused apps, the key should be limited to one or a few shards; for apps that entail a lot of scaling between clusters, create a key that writes efficiently.

The outlook is rosy for MongoDB, the most popular NoSQL DBMS. Research and Markets' March 2014 report entitled Global NoSQL Market 2014-2018 predicts that the overall NoSQL market will grow at a compound annual rate of 53 percent between 2013 and 2018. Much of the increase will be driven by increased use of big data in organizations of all sizes, according to the report.

Topping the list of MongoDB's advantages over relational databases are efficiency, easy scalability, and "deep query-ability," as Tutorialspoint's MongoDB Tutorial describes it. As usual, there's a catch: MongoDB's efficient data storage, scaling, and querying depend on sharding, and sharding depends on the careful selection of a shard key.

As the MongoDB Manual explains, every document in a collection has an indexed field or compound indexed field that determines how the collection's documents are distributed among a cluster's shards. Sharding allows the database to scale horizontally across commodity servers, which costs less than scaling vertically by adding processors, memory, and storage.

A mini-shard-key-selection vocabulary

As a sharded MongoDB collection grows, the database splits its documents into chunks based on ranges of values in the shard key. Keep in mind that once you choose a shard key, you're stuck with it: you can't change it later.

The characteristic that makes a chunk easy to divide is cardinality. The MongoDB Manual recommends that your shard keys have a high degree of randomness to ensure the cluster's write operations are distributed evenly, which is referred to as write scaling. Conversely, when a field has a high degree of randomness, it becomes a challenge to target specific shards. By using a shard key that is tied to a single shard, queries run much more efficiently; this is called query isolation.

When a collection doesn't have a field suitable to use as a shard key, a compound shard key can be used, or a field can be added to serve as the key.

Choice of shard key depends on the nature of the collection

How do you know which field to use as the shard key? A post by Goran Zugic from May 2014 explains the three types of sharding MongoDB supports:

  • Range-based sharding splits collections based on shard key value.
  • Hash-based sharding determines hash values based on field values in the shard key.
  • Tag-aware sharding ties shard key values to specific shards and is commonly used for location-based applications.

The primary consideration when deciding which shard key to designate is how the collection will be used. Zugic presents it as a balancing act between query isolation and write scaling: the former is preferred when queries are routed to one shard or a small number of shards; the latter when efficient scaling of clusters between servers is paramount.

MongoDB ensures that all replica sets have the same number of chunks, as Conrad Irwin describes in a March 2014 post on the BugSnag site. Irwin lists three factors that determine choice of shard key:

  • Distribution of reads and writes: split reads evenly across all replica sets to scale working set size linearly among several shards, and to avoid writing to a single machine in a cluster.
  • Chunk size: make sure your shard key isn't used by so many documents that your chunks grow too large to move between shards.
  • Query hits: if your queries have to hit too many servers, latency increases, so craft your keys so queries run as efficiently as possible.

Irwin provides two examples. The simplest approach is to use a hash of the _id of your documents:

Source: BugSnag

In addition to distributing reads and writes efficiently, the technique guarantees that each document will have its own shard key, which maximizes chunk-ability.

The other example groups related documents in the index by project while also applying a hash to distinguish shard keys:

Source: BugSnag

A mini-decision tree for shard-key selection might look like this:

  • Hash the _id if there isn't a good candidate to serve as a grouping key in your application.
  • If there is a good grouping-key candidate in the app, go with it and use the _id to prevent your chunks from getting too big.
  • Be sure to distribute reads and writes evenly with whichever key you use to avoid sending all queries to the same machine.

This and other aspects of optimizing MongoDB databases can be handled through a single dashboard via the Morpheus database-as-a-service (DBaaS). Morpheus lets you provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and Elasticsearch databases. It is the first and only DBaaS that supports SQL, NoSQL, and in-memory databases. Visit the Morpheus site to sign up for a free account!

Cloud Computing + Data Analytics = Instant Business Intelligence


Only by using cloud services will companies be able to offer their employees and managers access to big data, as well as the tools they'll need to analyze the information without being data scientists. A primary advantage of moving data analytics to the cloud is its potential to unleash the creativity of the data users, although a level of data governance is still required.

Data analytics are moving to the edge of the network, starting at the point of collection. That's one result of our applications getting smarter. According to the IDC FutureScape for Big Data and Analytics 2015 Predictions, apps that incorporate machine learning and other advanced or predictive analytics will grow 65 percent faster in 2015 than software without such abilities.

There's only one way to give millions of people affordable access to the volumes of data now being collected in real time, not to mention the easy-to-use tools they'll need to make productive use of the data. And that's via the cloud.

IDC also predicts a shortage of skilled data analysts: by 2018 there will be 181,000 positions requiring deep-analytics skills, and five times that number requiring similar data-interpretation abilities. Another of IDC's trends for 2015 is the booming market for visual data discovery tools, which are projected to grow at 2.5 times the rate of other business-intelligence sectors.

As software gets smarter, more data conditioning and analysis is done automatically, which facilitates analysis by end users. Source: Software Development Times

When you combine smarter software, a shortage of experts, and an increase in easy-to-use analysis tools, you get end users doing their own analyses, with the assistance of intelligent software. If all the pieces click into place, your organization can benefit by tapping into the creativity of its employees and managers.

The yogurt-shop model for data analytics

In a November 19, 2014, article, Forbes' Bill Franks compares DIY data analytics to self-serve yogurt shops. In both cases the value proposition is transferred to the customer: analyzing the data becomes an engaging, rewarding experience, similar to choosing the type and amount of toppings for your cup of frozen yogurt.

More importantly, you can shift to the self-serve model without any big expenses in infrastructure, training, or other costs. You might even find your costs reduced, just as self-serve yogurt shops save on labor and other costs, particularly by tapping into the efficiency and scalability of the cloud.

Employees are more satisfied with their data-analytics roles when their companies used cloud-based big data analytics. Source: Aberdeen Group (via Ricoh)

Last but not least, when you give people direct access to data and offer them tools that let them mash up the data as their creativity dictates, you'll generate valuable combinations you may never have come up with yourself.

Determining the correct level of oversight for DIY data analysts

Considering the value of the company's data, it's understandable that IT managers would hesitate to turn employees loose on the data without some supervision. As Timo Elliott explains in a post from April 2014 on the Business Analytics blog, data governance remains the responsibility of the IT department.

Elliott defines data governance as "stopping people from doing stupid things with data." The concept encompasses security, data currency, and reliability, but it also entails ensuring that information in the organization gets into the hands of the people who need it, when they need it.

You'll see aspects of DIY data analytics in the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. You use a single console to provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch, and every database instance is deployed with a free full replica set.

Morpheus supports a range of tools for configuring and managing your databases, which are monitored continuously by the service's staff and advanced bots. Visit the Morpheus site for pricing information and to create a free account.

MySQL's Index Hints Can Improve Query Performance, But Only If You Avoid the 'Gotchas'


In most cases, MySQL's optimizer chooses the fastest index option for queries automatically, but now and then it may hit a snag that slows your database queries to a crawl. You can use one of the three index hints -- USE INDEX, IGNORE INDEX, or FORCE INDEX -- to specify which indexes the optimizer uses or doesn't use. However, there are many limitations to using the hints, and most query-processing problems can be resolved by making things simpler rather than by making them more complicated.

The right index makes all the difference in the performance of a database server. Indexes let your queries focus on the rows that matter, and they allow you to set your preferred search order. Covering indexes (also called index-only queries) speed things up by responding to database queries without having to access data in the tables themselves.

Unfortunately, MySQL's optimizer doesn't always choose the most-efficient query plan. As the MySQL Manual explains, you view the optimizer's statement execution plan by preceding the SELECT statement with the keyword EXPLAIN. When the plan shows the optimizer making a poor choice, you can use index hints to specify which index to use for the query.

The three syntax options for hints are USE INDEX, IGNORE INDEX, and FORCE INDEX: The first instructs MySQL to use only the index listed; the second prevents MySQL from using the indexes listed; and the third has the same effect as the first option, but with the added limitation that table scans occur only when none of the given indexes can be used.

MySQL's index_hint syntax lets you specify the index to be used to process a particular query. Source: MySQL Reference Manual
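
A hypothetical example of the syntax (the table and index names are made up):

    -- USE INDEX restricts the optimizer to the listed index
    SELECT order_id, order_total
    FROM orders USE INDEX (idx_customer)
    WHERE customer_id = 42;

    -- FORCE INDEX has the same form, but a table scan happens only if the
    -- named index cannot be used at all
    SELECT order_id, order_total
    FROM orders FORCE INDEX (idx_customer)
    WHERE customer_id = 42;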

Why use FORCE INDEX at all? There may be times when you want to keep table scans to an absolute minimum. Any database is likely to field some queries that can't be satisfied without having to access some data residing only in the table, and outside any index.

To modify the index hint, apply a FOR clause: FOR JOIN, FOR ORDER BY, or FOR GROUP BY. The first applies hints only when MySQL is choosing how to find table rows or process joins; while the second and third apply the hints only when sorting or grouping rows, respectively. Note that whenever a covering index is used to access the table, the optimizer ignores attempts by the ORDER BY and GROUP BY modifiers to have it ignore the covering index.

Are you sure you need that index hint?

Once you get the hang of using index hints to improve query performance, you may be tempted to overuse the technique. In a Stack Overflow post from July 2013, a developer wasn't able to get MySQL to list his preferred index when he ran EXPLAIN. He was looking for a way to force MySQL to use that specific index for performance tests.

In this case, it was posited that no index hint was needed. Instead, he could just change the order of the specified indexes so that the left-most column in the index is used for row restriction. (While this approach is preferred in most situations, the particulars of the database in question made this solution impractical.)

SQL performance guru Markus Winand classifies optimizer hints as either restricting or supporting. Winand uses restricting hints only reluctantly because they create potential points of failure down the road: a better index added later may go unused because the hint rules it out, or an object name used as a hint parameter could change at some point.

Supporting hints add useful information that helps the optimizer do its job better, but Winand notes that such hints are rare. For example, a query may ask for only a limited number of result rows, so the optimizer can choose an index that would be impractical to use across all rows.

Troubleshooting the performance of your databases doesn't get simpler than using the point-and-click interface of the Morpheus database-as-a-service (DBaaS). You can provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases via Morpheus's single dashboard. 

Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. The service lets you use a range of tools for monitoring and optimizing your databases. Visit the Morpheus site to create a free account.

The Fastest Way to Import Text, XML, and CSV Files into MySQL Tables

One of the best ways to improve the performance of MySQL databases is to determine the optimal approach for importing data from other sources, such as text files, XML, and CSV files. The key is to correlate the source data with the table structure.

Data is always on the move: from a Web form to an order-processing database, from a spreadsheet to an inventory database, or from a text file to a customer list. One of the most common MySQL database operations is importing data from such an external source directly into a table. Data importing is also one of the tasks most likely to create a performance bottleneck.

The basic steps entailed in importing a text file to a MySQL table are covered in a Stack Overflow post from November 2012: create the destination table, then use the LOAD DATA INFILE command to load the file into it.

The basic MySQL commands for creating a table and importing a text file into the table. Source: Stack Overflow
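
A rough sketch of that pattern (the table name, column definitions, file path, and delimiters below are assumptions, not the Stack Overflow poster's):

    CREATE TABLE mytable (
      myid      INT NOT NULL,
      mydecimal DECIMAL(10,2)
    );

    LOAD DATA LOCAL INFILE '/tmp/mydata.txt'
    INTO TABLE mytable
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n';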

Note that you may need to enable the parameter "--local-infile=1" to get the command to run. You can also specify which columns the text file loads into:

This MySQL command specifies the columns into which the text file will be imported. Source: Stack Overflow

In this example, the file's text is placed into variables "@col1, @col2, @col3," so "myid" appears in column 1, "mydecimal" appears in column 3, and column 2 has a null value.
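
Such a statement might look roughly like this (the file path and delimiter are assumptions):

    LOAD DATA LOCAL INFILE '/tmp/mydata.txt'
    INTO TABLE mytable
    FIELDS TERMINATED BY '\t'
    (@col1, @col2, @col3)          -- read the file's three columns into variables
    SET myid      = @col1,         -- first file column -> myid
        mydecimal = @col3;         -- third file column -> mydecimal; @col2 is simply not loaded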

The table resulting when LOAD DATA is run with the target column specified. Source: Stack Overflow

The fastest way to import XML files into a MySQL table

As Database Journal's Rob Gravelle explains in a March 17, 2014, article, stored procedures would appear to be the best way to import XML data into MySQL tables, but after version 5.0.7, MySQL's LOAD XML INFILE and LOAD DATA INFILE statements can't run within a Stored Procedure. There's also no way to map XML data to table structures, among other limitations.

However, you can work around most of these limitations if the XML file you're targeting has a rigid, known structure for each procedure. The example Gravelle presents uses an XML file in which each row is wrapped in its own element and each column is represented by a named attribute:

You can use a stored procedure to import XML data into a MySQL table if you specify the table structure beforehand. Source: Database Journal

The table you're importing into has an int ID and two varchars: the ID is the primary key, so it can't contain nulls or duplicate values; last_name allows duplicates but not nulls; and first_name accepts up to 100 characters.

The MySQL table into which the XML file will be imported has the same three fields as the file. Source: Database Journal

Gravelle's approach for overcoming MySQL's import restrictions uses the "proc-friendly" Load_File() and ExtractValue() functions.

 

MySQL's XML-import limitations can be overcome by using the Load_file() and ExtractValue() functions. Source: Database Journal
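
A minimal sketch of the technique (not Gravelle's actual procedure): it assumes the file sits at /tmp/names.xml, that each row is a "row" element with "first_name" and "last_name" attributes under a "rows" root, and that the target table is named names.

    -- LOAD_FILE() requires the FILE privilege; the path must also satisfy secure_file_priv.
    SET @xml := LOAD_FILE('/tmp/names.xml');
    SET @row_count := ExtractValue(@xml, 'count(/rows/row)');
    SET @i := 1;

    -- Inside a stored procedure this would run in a WHILE @i <= @row_count loop;
    -- a single iteration is shown here.
    INSERT INTO names (first_name, last_name)
    VALUES (ExtractValue(@xml, '/rows/row[$@i]/@first_name'),
            ExtractValue(@xml, '/rows/row[$@i]/@last_name'));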

Benchmarking techniques for importing CSV files to MySQL tables

When he tested various ways to import a CSV file into MySQL 5.6 and 5.7, Jaime Crespo discovered a technique that he claims improves the import time for MyISAM by 262 percent to 284 percent, and for InnoDB by 171 percent to 229 percent. The results of his tests are reported in an October 8, 2014, post on Crespo's MySQL DBA for Hire blog.

Crespo's test file was more than 3GB in size and had nearly 47 million rows. One of the fastest methods in his tests was grouping queries into a multi-insert statement, the approach "mysqldump" uses. Crespo also tried to improve LOAD DATA performance by increasing key_cache_size and by disabling the Performance Schema.

Crespo concludes that the fastest way to load CSV data into a MySQL table without using raw files is to use LOAD DATA syntax. Also, using parallelization for InnoDB boosts import speeds.
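
The exact statements Crespo benchmarked aren't reproduced here, but a basic CSV load along these lines is the usual starting point (the file path, table name, and options are assumptions):

    LOAD DATA LOCAL INFILE '/tmp/data.csv'
    INTO TABLE mytable
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;               -- skip a header row, if the file has one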

You won't find a more straightforward way to monitor your MySQL, MongoDB, Redis, and ElasticSearch databases than by using the dashboard interface of the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases.

You can provision, deploy, and host your databases from a single dashboard. The service includes a free full replica set for each database instance, as well as regular backups of MySQL and Redis databases. Visit the Morpheus site to create a free account!


Is Elasticsearch the Right Solution for You?

Choosing a tool for information search or storage can be a difficult task. Some tools are better at creating relations among data, some excel at quickly accessing large amounts of data, and others make it easier when attempting to search through a vast array of information. Where does ElasticSearch fit into this, and when is it the right tool for your job?

What is ElasticSearch?

Elasticsearch is an open source search and analytics engine (based on Lucene) designed to operate in real time. It was built for distributed environments, offering flexibility and scalability.

Instead of the typical full-text search setup, ElasticSearch offers ways to extend searching capabilities through the use of APIs and query DSLs. There are clients available so that it can be used with numerous programming languages, such as Ruby, PHP, JavaScript and others.

What are some advantages of ElasticSearch?

ElasticSearch has some notable features that can be helpful to an application:

Distributed approach - Indices can be divided into shards, with each shard able to have any number of replicas. Routing and rebalancing operations are done automatically when new documents are added.

Based on Lucene - Lucene is an open source information-retrieval library that may already be familiar to developers. ElasticSearch makes numerous features of the Lucene library available through its API and JSON-based queries.

 

An example of an index API call. Source: ElasticSearch.

Use of faceting - A faceted search is more robust than a plain text search: users can apply a number of filters to the information and even build a classification system on top of the data. This organizes search results better and helps users pinpoint the information they need to examine.

Structured search queries - While searches can still be done using a text string, more robust searches can be structured using JSON objects.

 

 

An example structured query using JSON. Source: Slant.

When is ElasticSearch the right tool?

If you are seeking a database for saving and retrieving data outside of searching, you may find a NoSQL or relational database a better fit, since they are designed for those types of queries. While ElasticSearch can serve as a NoSQL solution, it lacks distributed transactions, so you will need to be able to handle that limitation. 

On the other hand, if you want a solution that is effective at quickly and dynamically searching through large amounts of data, then ElasticSearch is a good solution. If your application will be search-intensive, such as with GitHub, where it is used to search through 2 billion documents from all of its code repositories, then ElasticSearch is an ideal tool for the job.

Get ElasticSearch or a Database

If you want to try out ElasticSearch, one way to do so is to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily set up one or more databases (including ElasticSearch, MongoDB, MySQL, and more). In addition, databases are deployed on a high performance infrastructure with Solid State Drives, replicated, and archived. 

Find the Best Approach for Entering Dates in MySQL Databases

A function as straightforward as entering dates in a MySQL database should be nearly automatic, but the process is anything but foolproof. MySQL's handling of invalid date entries can leave developers scratching their heads. In particular, the globalization of IT means you're never sure where the server hosting your database will be located -- or relocated. Plan ahead to ensure your database's date entries are as accurate as possible.

DBAs know that if they want their databases to function properly, they have to follow the rules. The first problem is, some "rules" are more like guidelines, allowing a great deal of flexibility in their application. The second problem is, it's not always easy to determine which rules are rigid, and which are more malleable.

An example of a rule with some built-in wiggle room is MySQL's date handling. Database Journal's Rob Gravelle explains in a September 8, 2014, post that MySQL automatically converts numbers and strings into a correct Date whenever you add or update data in a DATE, DATETIME, or TIMESTAMP column. The string has to be in the "yyyy-mm-dd" format, but you can use any punctuation to separate the three date elements, such as "yyyy&mm&dd", or you can skip the separators altogether, as in "yyyymmdd".
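
For example, all three of these inserts store the same date (the events table and its event_date column are hypothetical):

    INSERT INTO events (event_date) VALUES ('2014-09-08');
    INSERT INTO events (event_date) VALUES ('2014&09&08');
    INSERT INTO events (event_date) VALUES ('20140908');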

So what happens when a Date record has an invalid entry, or no entry at all? MySQL inserts its special zero date of "0000-00-00" and warns you that it has encountered an invalid date, as shown below.

 

Only the first of the four date records is valid; for the others, MySQL enters the zero date "0000-00-00" and issues an invalid-date warning. Source: Database Journal

To prevent the zero date from being entered, you can use NO_ZERO_DATE in strict mode, which generates an error whenever an invalid date is entered, or NO_ZERO_IN_DATE mode, which rejects dates that have a valid year but a zero month or day. Note that both of these modes were deprecated in MySQL 5.7.4 and rolled into strict SQL mode.

Other options are to enable ALLOW_INVALID_DATES mode, which lets an application that collects the year, month, and day in three separate fields store whatever the user entered, or to enable TRADITIONAL SQL mode, which acts more like stricter database servers by combining STRICT_TRANS_TABLES, STRICT_ALL_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, and NO_AUTO_CREATE_USER.
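
A quick sketch of switching these modes on for the current session:

    -- Reject zero dates and zero month/day parts outright:
    SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE';

    -- Or opt into the whole stricter bundle at once:
    SET SESSION sql_mode = 'TRADITIONAL';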

Avoid using DATETIME at all? Not quite

Developer Eli Billauer posits on his personal blog that using the MySQL (and SQL) DATETIME column type is always a mistake. He later qualifies that blanket pronouncement, acknowledging that commenters on the post give examples of cases where DATETIME is the best approach.

Billauer points out that many developers use DATETIME to store the time of events, as in this example:

 

Using a DATETIME column with the NOW() function creates problems because you can't be sure of the local server's time or the user's timezone. Source: Eli Billauer

Because DATETIME relies on the local server's clock, you can't be sure where the web server hosting the app will be located. One way around this uncertainty is to apply a SQL function that converts timezones, but that doesn't address issues such as daylight saving time or databases relocated to new servers. (Note that the UTC_TIMESTAMP() function returns the UTC time.)

There are several ways to get around these limitations, one of which is to use "UNIX time," as in "UNIX_TIMESTAMP(thedate)." This is also referred to as "seconds since the Epoch." Alternatively, you can store the integer itself in the database; Billauer explains how to obtain Epoch time in Perl, PHP, Python, C, and Javascript.
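
A minimal sketch of the epoch-time approach (the table and column names are assumptions):

    CREATE TABLE events (
      id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      created_at INT UNSIGNED NOT NULL      -- seconds since the Epoch, always UTC
    );

    INSERT INTO events (created_at) VALUES (UNIX_TIMESTAMP());   -- store the current time
    SELECT id, FROM_UNIXTIME(created_at) FROM events;            -- convert back for display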

Troubleshooting and monitoring the performance of your MySQL, MongoDB, Redis, and ElasticSearch databases is a piece of cake when you use the Morpheus database-as-a-service (DBaaS). Morpheus provides a single, easy-to-use dashboard. In addition to a free full replica set of each database instance, you get automatic daily backups of your MySQL and Redis databases.

Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. The service's SSD-backed infrastructure ensures peak performance, and direct links to EC2 guarantee ultra-low latency. Visit the Morpheus site to create a free account.

The Most Important Takeaways from MySQL Prepared Statements

Because MySQL traditionally sends queries to the server and returns data in text format, each query must be fully parsed and the result set converted to strings before being sent to the client. This overhead can cause performance issues, so MySQL introduced a feature called prepared statements when it released version 4.1.

What is a MySQL prepared statement?

A MySQL prepared statement is a method for passing a query containing one or more placeholders to the MySQL server. Prepared statements take advantage of the client/server protocol that runs between a MySQL client and server, allowing a quicker response time than the typical text parse-and-conversion exchange.

Here is an example query that demonstrates how a placeholder can be used (this is similar to using a variable in programming):

 

Example of a MySQL placeholder

The query does not need to be fully parsed each time it runs, since different values can be bound to the placeholder. This provides a performance boost for the query, which is even more pronounced if the query is run numerous times.

In addition to enhanced performance, placeholders help you avoid a number of SQL injection vulnerabilities, since values are bound to the placeholder rather than concatenated into a text string that can be more easily manipulated.

Using MySQL Prepared Statements

A prepared statement in MySQL is essentially performed using four keywords:

  1. PREPARE - This prepares the statement for execution
  2. SET - Sets a value for the placeholder
  3. EXECUTE - This executes the prepared statement
  4. DEALLOCATE PREPARE - This deallocates the prepared statement from memory.

With that in mind, here is an example of a MySQL prepared statement:

 

 

Example of a MySQL prepared statement

Notice how the four keywords are used to complete the prepared statement:

  1. The PREPARE statement defines a name for the prepared statement and a query to be run.
  2. The SELECT statement that is prepared will select all of the user data from the users table for the specified user. A question mark is used as a placeholder for the user name, which will be defined next.
  3. A variable named @username is set and is given a value of 'sally_224'. The EXECUTE statement is then used to execute the prepared statement using the value in the placeholder variable.
  4. To end everything and ensure the statement is deallocated from memory, the DEALLOCATE PREPARE statement is used with the name of the prepared statement that is to be deallocated (statement_user in this case).
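
Putting the four steps together, the sequence described above can be reconstructed roughly as follows (the users table's column name is an assumption):

    PREPARE statement_user FROM
      'SELECT * FROM users WHERE username = ?';

    SET @username = 'sally_224';

    EXECUTE statement_user USING @username;

    DEALLOCATE PREPARE statement_user;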

Get your own MySQL Database

To use prepared statements, you will need to have a MySQL database set up and running. One way to easily obtain a database is to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily and quickly set up your choice of several databases (including MySQL, MongoDB, and more). In addition, databases are backed up, replicated, and archived, and are deployed on a high-performance infrastructure with Solid State Drives.

Morpheus Data Honored as Coolest Cloud Computing Vendor by CRN

Morpheus was recently acknowledged by the industry-leading publication CRN as one of its "100 Coolest Cloud Computing Vendors". We are proud of the hard work we have put into the product, and excited about the positive feedback we continue to receive from our users.

A note about the award is below: 

Morpheus Data, LLC, developer of the industry's first and only database provisioning, management, backup, and monitoring platform for private, public, and hybrid clouds, today announced it has earned a place on The Channel Company's CRN list of the 100 Coolest Cloud Computing Vendors of 2015. The annual list recognizes some of the most innovative cloud companies supporting the IT channel today. Morpheus was named among only 20 vendors in the Cloud Storage category.

The 100 Coolest Cloud Computing Vendors honor is presented to companies based on their approach to creating innovative products, services, or partner programs that have helped channel partners transform into true solution providers, ultimately helping customers take advantage of the ease of use, flexibility, scalability, and savings that cloud computing offers. Morpheus is recognized for Morpheus Virtual Appliance, its database provisioning and management platform for hybrid clouds, along with BitCan, a cloud-based database and file backup and restore service for MySQL, MongoDB, Linux, Unix, and Windows servers.

"We are honored to be selected as one of CRN's Coolest Cloud Computing Vendors of 2015," said Ashish Mohindroo, General Manager, Morpheus. "Being recognized by a leading channel publication as a top vendor in this space, is further validation that Morpheus is successfully addressing the demanding enterprise scale business requirements for provisioning, management, backups, and monitoring of heterogeneous databases across public, private, and hybrid clouds."

Morpheus Virtual Appliance, along with BitCan, offers companies the ability to provision, manage, back up, and restore highly scalable and reliable SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds. With point-and-click provisioning, automated backups, and a single management console, channel partners can onboard and manage new customers with ease. Morpheus empowers channel partners with expansive opportunities in the rapidly growing database-as-a-service and backup market through a channel program that has very high margins and incentives, sales and marketing support, and no fees.

"Companies are rapidly moving towards a hybrid cloud model, where workloads are shared across on-premise, private and public clouds.  Gaining visibility, control, and management of these systems across multiple clouds is a significant challenge for these companies," said Jeff Drazan, Managing Partner, Bertram Capital and Morpheus Chairman. "Morpheus, with its unique next-generation cloud architecture, has the platform which is essential for provisioning, managing and monitoring databases and IT systems in hybrid clouds."

This year's 100 Coolest Cloud Computing honorees are identified across five major categories including platforms, infrastructure, storage, security, and software. This list is an effort to help solution providers navigate through the ambiguity of the cloud market to identify the vendors, products, and services that can help elevate their cloud services offerings. Companies were chosen based on data and information gathered from solution provider nominations along with input from the CRN editorial team.

"Widespread demand for cloud computing solutions is increasing across businesses of all shapes and sizes. As organizations become more willing to adopt cloud technology and services, experienced solution providers are in greater demand." said Robert Faletra, CEO, The Channel Company. "The 100 Coolest Cloud Computing Vendors list enables solution providers to engage with companies who are poised to help them capitalize on these opportunities."  

 

Coverage of the 100 Coolest Cloud Computing Vendors will be featured in the February 2015 issue of CRN and online at www.CRN.com.

New Approaches to Data Deduplication Extend Beyond Backups

Data deduplication efforts have traditionally focused on backups and archives, but new unstructured data types and the continuing surge in data volumes are straining hardware and other resources. By applying deduplication when the data elements are created, you can improve storage efficiency, although you risk slowing the performance of your applications.

The more storage options your company uses, the more likely you're paying to store duplicate copies of text files, media, and other resources. This makes data deduplication more important than ever for efficient storage management. At the same time, finding and removing duplicate data has never been more challenging.

Deduplication has long been the province of data backups and archives. But storage executive Arun Taneja recommends deduping data when it is created. Taneja is quoted in an October 2014 TechTarget article written by Alex Barrett.

Organizations may hesitate to run deduplication on their primary data, but the advantages in terms of space savings and processor efficiency make the practice worthwhile. For example, text files and virtual desktop environments can realize 40:1 deduplication rates, although 6:1 deduplication rates are considered a good average by storage vendors.

 

Deduplication promises to reduce storage costs and improve efficiency, but the tradeoff is the potential of slowing application performance. Source: TechTarget

On the other end of the spectrum, encrypted backups and video files aren't amenable to hash-based in-line deduplication. And compressed files in general dedupe poorly or not at all.

In-line deduplication puts efficient storage front and center

Running deduplication on primary data introduces the possibility that applications will have to wait as data is written or read. If application performance is paramount, deduplication is shifted to post-processing. This keeps processor cycles free for the application, but it requires more primary storage, which negates one of deduplication's principal advantages: storage savings.

In a September 2014 TechTarget article, Brien Posey offers two reasons why in-line, software-based deduplication is worth the possible performance hit: first, your existing hardware likely can handle the added processing load without any noticeable degradation; and second, any performance reduction that results may be worth the savings in other areas, particularly in improved data transmission rates.

Today's deduplication challenges will only become more thorny as new types of data -- and much more of it -- find their way into your business's day-to-day operations. In an April 24, 2014, post on Wired, Sepaton executive Amy Lipton points out that efficient data management now requires scalability beyond simply buying more hardware to expand existing data silos.
