
Making Software Development Simpler: Look for Repeatable Results, Reusable APIs and DBaaS


TL;DR: Software is complex -- to design, develop, deliver, and maintain. Everybody knows that, right? New app-development approaches and fundamental changes to the way businesses of all types operate are challenging the belief that meeting customers' software needs requires an army of specialists working in a tightly managed hierarchy. Focusing on repeatable results and reusable APIs helps take the complexity out of the development process.

What's holding up software development? Seven out of 10 software development teams have workers in different locations, and four out of five are bogged down by having to accommodate legacy systems. Life does not need to be like this. The rapidly expanding capabilities of cloud-based technologies and external services (like our very own Database-As-A-Service) allow developers to focus more time on application development. The end result: better software products. 

The results of the 2014 State of the IT Union survey are presented in a September 9, 2014, article in Dr. Dobb's Journal. Among the findings: 58 percent of development teams are composed of 10 or fewer people, while 36 percent work in groups of 11 to 50 developers. In addition, 70 percent of the teams have members working in different geographic locations, though that figure drops to 61 percent for agile development teams.

A primary contributor to the complexity of software development projects is the need to accommodate legacy software and data sources: 83 percent of the survey respondents reported having to deal with "technical debt" (obsolete hardware and software), which increases risk and development time. Software's inherent complexity is exacerbated by the realities of the modern organization: teams working continents apart, dealing with a tangled web of regulations and requirements, while adapting to new technologies that are turning development on its head.

The survey indicates that agile development projects are more likely to succeed because they focus on repeatable results rather than processes. It also highlights the importance of flexibility in managing software projects, each of which is as unique as the product it delivers.

Successful agile development requires discipline

Organizations are realizing the benefits of agile development, but often in piecemeal fashion as they are forced to accommodate legacy systems. There's more to agile development than new tools and processes, however. As Ben Linders points out in an October 16, 2014, article on InfoQ, the key to success for agile teams is discipline.

A common misconception is that agile development means operating without any set methodology. In fact, it is even more important to adhere to the framework the team has selected -- whether Scrum, Kanban, Extreme Programming (XP), Lean, Agile Modeling, or another -- than it is when using traditional waterfall development techniques.

The keys to successfully managing an agile development team have little to do with technology and much to do with communication. Source: CodeProject

Focusing on APIs helps future-proof your apps

Imagine building the connections to your software before you build the software itself. That's the API-first approach some companies are taking in developing their products. TechTarget's Crystal Bedell describes the API-first approach to software development in an October 2014 article.

Bedell quotes Jeff Kaplan, a market analyst for ThinkStrategies, who sees APIs as the foundation for interoperability. In fact, your app's ability to integrate with the universe of platforms and environments is the source of much of its value.

Another benefit of an API-centric development strategy is the separation of all the functional components of the app, according to Progress Software's Matt Robinson. As new standards arise, you can reuse the components and adapt them to specific external services.

The Morpheus database-as-a-service also future-proofs your apps by being the first service to support SQL, NoSQL, and in-memory databases. You can provision, host, and deploy MySQL, MongoDB, Redis, and ElasticSearch using a simple, single-click interface. Visit the Morpheus site now to create a free account.


How to Use the $type Query Operator and Array in MongoDB


TL;DR: MongoDB provides a $type operator that can be helpful when you need to select documents where the value of the fields is of a certain type. This can be really helpful, but when it comes to selecting array values, things can get a bit tricky. It is best to use some form of workaround to get the selection you need in such cases.

What is the $type Operator?

The $type operator allows you to make your query selection more specific by selecting only documents that have a field containing a value that is a certain data type. Suppose you had a collection like this:

 

An example MongoDB collection.
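
The original screenshot isn't reproduced here; a minimal sketch of the kind of documents described (the object value in the first document is illustrative):

    { "_id" : 1, "name" : { "first" : "Anne", "last" : "Smith" } }
    { "_id" : 2, "name" : "George" }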

If you want to get only the documents where the "name" field contains a string value, you could use the following query:

 

Using the $type operator to check for a string.
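
A minimal sketch of such a query, assuming the collection is named users:

    db.users.find({ name: { $type: 2 } })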

Since 2 is the code for a string data type, the query will only return the second document: the one with the string "George" in the "name" field. Since the first document has an object type, it won’t be returned.

MongoDB offers codes for a number of data types, as shown in the chart below:

The types that can be used by the $type operator. Source: MongoDB Documentation

Oddities with Arrays

While the $type operator generally works as expected, determining whether a field contains an array can be a bit tricky. This is because when you are dealing with an array field, the $type operator checks the type against the array elements themselves instead of the field. This means that for an array field, it will only return documents where the array contains another array.

This is helpful if you are looking for multidimensional arrays, but does not help the case where you need to know if the field itself is an array, which was possible when looking for a string value. Finding out if the field is an array will require a workaround.

How to Find a Field of Type Array

One method that works is supplied in the MongoDB documentation. It suggests using the JavaScript isArray() method, as in the following code:

 

Using isArray() to check for the array type.
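
A sketch of that approach using a $where clause; the field name tags is illustrative:

    db.users.find({ $where: "Array.isArray(this.tags)" })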

This will do the trick, but it does come with a substantial performance decrease when running the query.

A workaround that avoids this is to use $elemMatch to check for the existence of an array, like this:

 

Using $elemMatch to check for the array type.
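
A sketch of the $elemMatch workaround, again assuming a field named tags; it matches documents where tags is an array with at least one element:

    db.users.find({ tags: { $elemMatch: { $exists: true } } })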

This will do the trick, except in the case where you need to also include empty arrays. To do this, you can add a sort that will allow empty arrays to be returned as well:

 

Using $elemMatch with a sort to include empty arrays.

With this in place, you can now determine if the field is an array. If you need to find out if it has inner arrays, you can simply use the $type operator to check for this. You are now able to perform the check with MongoDB and without the performance penalty of cycling over the collection in JavaScript!

How to Get MongoDB

MongoDB is well-suited for numerous applications, especially in cases of big data that needs to be queried quickly. One way to easily set up a MongoDB database is to have it remotely hosted as a service in the cloud.

One company that offers this is Morpheus, where you can get MongoDB (and several other databases) as a service in the cloud. With easy setup and replication, and a high performance infrastructure running on Solid State Drives, why not open a free account today?

The Three Most Important Considerations in Selecting a MongoDB Shard Key


TL;DR: The efficient operation of your MongoDB database depends on which field in the documents you designate as the shard key. Since you have to select the shard key up front and can't change it later, you need to give the choice due consideration. For query-focused apps, the key should be limited to one or a few shards; for apps that entail a lot of scaling between clusters, create a key that writes efficiently.

The outlook is rosy for MongoDB, the most popular NoSQL DBMS. Research and Markets' March 2014 report entitled Global NoSQL Market 2014-2018 predicts that the overall NoSQL market will grow at a compound annual rate of 53 percent between 2013 and 2018. Much of the increase will be driven by increased use of big data in organizations of all sizes, according to the report.

Topping the list of MongoDB's advantages over relational databases are efficiency, easy scalability, and "deep query-ability," as Tutorialspoint's MongoDB Tutorial describes it. As usual, there's a catch: MongoDB's efficient data storage, scaling, and querying depend on sharding, and sharding depends on the careful selection of a shard key.

As the MongoDB Manual explains, every document in a collection has an indexed field or compound indexed field that determines how the collection's documents are distributed among a cluster's shards. Sharding allows the database to scale horizontally across commodity servers, which costs less than scaling vertically by adding processors, memory, and storage.

A mini-shard-key-selection vocabulary

When a MongoDB collection grows too large for a single server, MongoDB splits its documents into chunks based on ranges of values in the shard key. Keep in mind that once you choose a shard key, you're stuck with it: you can't change it later.

The characteristic that makes chunks easy to split is cardinality: the more distinct values the shard key field has, the more finely the data can be divided. The MongoDB Manual recommends that your shard keys have a high degree of randomness to ensure the cluster's write operations are distributed evenly, which is referred to as write scaling. Conversely, when a field has a high degree of randomness, it becomes a challenge to target specific shards. A shard key that routes queries to a single shard lets those queries run much more efficiently; this is called query isolation.

When a collection doesn't have a field suitable to use as a shard key, a compound shard key can be used, or a field can be added to serve as the key.

Choice of shard key depends on the nature of the collection

How do you know which field to use as the shard key? A post by Goran Zugic from May 2014 explains the three types of sharding MongoDB supports:

  • Range-based sharding splits collections based on shard key value.
  • Hash-based sharding determines hash values based on field values in the shard key.
  • Tag-aware sharding ties shard key values to specific shards and is commonly used for location-based applications.

The primary consideration when deciding which shard key to designate is how the collection will be used. Zugic presents it as a balancing act between query isolation and write scaling: the former is preferred when queries are routed to one shard or a small number of shards; the latter when efficient scaling of clusters between servers is paramount.

MongoDB ensures that all replica sets have the same number of chunks, as Conrad Irwin describes in a March 2014 post on the BugSnag site. Irwin lists three factors that determine choice of shard key:

  • Distribution of reads and writes: split reads evenly across all replica sets to scale working set size linearly among several shards, and to avoid writing to a single machine in a cluster.
  • Chunk size: make sure your shard key isn't used by so many documents that your chunks grow too large to move between shards.
  • Query hits: if your queries have to hit too many servers, latency increases, so craft your keys so queries run as efficiently as possible.

Irwin provides two examples. The simplest approach is to use a hash of the _id of your documents:

Source: BugSnag
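
A sketch of how a hashed _id shard key is declared from the mongo shell (the database and collection names are illustrative):

    sh.shardCollection("mydb.events", { _id: "hashed" })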

In addition to distributing reads and writes efficiently, the technique guarantees that each document will have its own shard key, which maximizes chunk-ability.

The other example groups related documents in the index by project while also applying a hash to distinguish shard keys:

Source: BugSnag
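
A hedged sketch of that pattern, assuming the application stores a precomputed hash of each document's _id in a field such as idHash; the names are illustrative rather than BugSnag's actual schema:

    sh.shardCollection("mydb.events", { projectId: 1, idHash: 1 })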

A mini-decision tree for shard-key selection might look like this:

  • Hash the _id if there isn't a good candidate to serve as a grouping key in your application.
  • If there is a good grouping-key candidate in the app, go with it and use the _id to prevent your chunks from getting too big.
  • Be sure to distribute reads and writes evenly with whichever key you use to avoid sending all queries to the same machine.

This and other aspects of optimizing MongoDB databases can be handled through a single dashboard via the Morpheus database-as-a-service (DBaaS). Morpheus lets you provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and Elasticsearch databases. It is the first and only DBaaS that supports SQL, NoSQL, and in-memory databases. Visit the Morpheus site to sign up for a free account!

"Too Many Connections": How to Increase the MySQL Connection Count To Avoid This Problem


If you don't have enough connections open to your MySQL server, your users will begin to receive a "Too many connections" error while trying to use your service. To fix this, you can increase the maximum number of connections to the database that are allowed, but there are some things to take into consideration before simply ramping up this number.

Items to Consider

Before you increase the connections limit, you will want to ensure that the machine on which the database is housed can handle the additional workload. The maximum number of connections that can be supported depends on the following variables:

  • The available RAM – The system will need to have enough RAM to handle the additional workload.
  • The thread library quality of the platform - This will vary based on the platform. For example, Windows can be limited by the POSIX compatibility layer it uses (though the limit no longer applies to MySQL v5.5 and up). However, there remain memory-usage concerns depending on the architecture (x86 vs. x64) and how much memory can be consumed per application process.
  • The required response time - Increasing the number of connections could increase the amount of time needed to respond to requests. This should be tested to ensure it meets your needs before going into production.
  • The amount of RAM used per connection - Again, RAM is important, so you will need to know if the RAM used per connection will overload the system or not.
  • The workload required for each connection - The workload will also factor in to what system resources are needed to handle the additional connections.

Another issue to consider is that you may also need to increase the open files limit, so that enough file handles are available for the additional connections.

Checking the Connection Limit

To see what the current connection limit is, you can run the following from the MySQL command line or from many of the available MySQL tools such as phpMyAdmin:

 

The show variables command.
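
The check is a single statement along these lines:

    SHOW VARIABLES LIKE 'max_connections';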

This will display a nicely formatted result for you:

 

Example result of the show variables command.
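
The output resembles the following (151 is the default in recent MySQL versions; your value may differ):

    +-----------------+-------+
    | Variable_name   | Value |
    +-----------------+-------+
    | max_connections | 151   |
    +-----------------+-------+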

Increasing the Connection Limit

To increase the global number of connections temporarily, you can run the following from the command line:

 

 

An example of setting the max_connections global.
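
For example (the value 250 is illustrative; the change requires the SUPER privilege and lasts only until the server restarts):

    SET GLOBAL max_connections = 250;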

If you want to make the increase permanent, you will need to edit the my.cnf configuration file. You will need to determine the location of this file for your operating system (Linux systems often store the file in the /etc folder, for example). Open this file and add a line that includes max_connections, followed by an equal sign, followed by the number you want to use, as in the following example:

 

An example of setting max_connections in my.cnf.
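
For example, in the [mysqld] section:

    [mysqld]
    max_connections = 250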

The next time you restart MySQL, the new setting will take effect and will remain in place unless or until this is changed again.

Easily Scale a MySQL Database

Instead of worrying about these settings on your own system, you could opt to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily and quickly set up your choice of several databases (including MySQL, MongoDB, Redis, and Elasticsearch).

In addition, MySQL and Redis have automatic backups, and each database instance is replicated, archived, and deployed on a high performance infrastructure with Solid State Drives. You can start a free account today to begin taking advantage of this service!

Your Options for Optimizing the Performance of MySQL Databases


A database can never be too optimized, and DBAs will never be completely satisfied with the performance of their creations. As your MySQL databases grow in size and complexity, taking full advantage of the optimizing tools built into the MySQL Workbench becomes increasingly important.

DBAs have something in common with NASCAR pit crew chiefs: No matter how well your MySQL database is performing, there's always a little voice in your head telling you, "I can make it go faster."

Of course, you can go overboard trying to fine-tune your database's performance. In reality, most database tweaking is done to address a particular performance glitch or to prevent the system from bogging down as the database grows in size and complexity.

One of the tools in the MySQL Workbench for optimizing your database is the Performance Dashboard. When you mouse over a graph or other element in the dashboard, you get a snapshot of server, network, and InnoDB metrics.

The Performance Dashboard in the MySQL Workbench provides at-a-glance views of key metrics of network traffic, server activity, and InnoDB storage. Source: MySQL.com

Other built-in optimization tools are Performance Reports for analyzing IO hotspots, high-cost SQL statements, Wait statistics, and InnoDB engine metrics; Visual Explain Plans that offer graphical views of SQL statement execution; and Query Statistics that report on client timing, network latency, server execution timing, index use, rows scanned, joins, temporary storage use, and other operations.

A maintenance release of the MySQL Workbench, version 6.2.4, was announced on November 20, 2014, and is described on the MySQL Workbench Team Blog. Among the new features in MySQL Workbench 6.2 are a spatial data viewer for graphing data sets with GEOMETRY data; enhanced Fabric Cluster connectivity; and a Metadata Locks View for finding and troubleshooting threads that are blocked or stuck waiting on a lock.

Peering deeper into your database's operation

One of the performance enhancements in MySQL 5.7 is the new Cost Model, as Marcin Szalowicz explains in a September 25, 2014, post on the MySQL Workbench blog. For example, Visual Explain's interface has been improved to facilitate optimizing query performance.

MySQL 5.7's Visual Explain interface now provides more insight for improving the query processing of your database. Source: MySQL.com

The new query results panel centralizes information about result sets, including Result Grid, Form Editor, Field Types, Query Stats, Spatial Viewer, and both traditional and Visual Execution Plans. Also new is the File > Run SQL Script option that makes it easy to execute huge SQL script files.

Attempts to optimize SQL tables automatically via the OPTIMIZE TABLE command often go nowhere. A post from March 2011 on Stack Overflow demonstrates that you may end up with slower performance and more storage space used rather than less. The best approach is to use "mysqlcheck" at the command line:

Run "mysqlcheck" at the command line to optimize a single database or all databases at once. Source: Stack Overflow

Alternatively, you could run a php script to optimize all the tables in a database:

A php script can be used to optimize all the tables in a database at one time. Source: Stack Overflow
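
A minimal sketch of such a script (not the original Stack Overflow code; the host, credentials, and database name are placeholders):

    <?php
    // Connect to the target database (placeholder credentials).
    $db = new mysqli('localhost', 'dbuser', 'dbpassword', 'mydatabase');

    // Run OPTIMIZE TABLE against every table in the database.
    $tables = $db->query('SHOW TABLES');
    while ($row = $tables->fetch_row()) {
        $db->query('OPTIMIZE TABLE `' . $row[0] . '`');
        echo 'Optimized ' . $row[0] . "\n";
    }
    $db->close();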

A follow-up to the above post on DBA StackExchange points out that MySQL Workbench has a "hidden" maintenance tool called the Schema Inspector that opens an editor area in which you can inspect and tweak several pages at once.

What is evident from these exchanges is that database optimization remains a continuous process, even with the arrival of new tools and techniques. A principal advantage of the Morpheus database-as-a-service (DBaaS) is the use of a single dashboard to access statistics about all your MySQL, MongoDB, Redis, and ElasticSearch databases.

With Morpheus you can provision, deploy, and host SQL, NoSQL, and in-memory databases with a single click. The service supports a range of tools for connecting, configuring, and managing your databases, and automated backups for MySQL and Redis.

Visit the Morpheus site to create a free account. Database optimization has never been simpler!

Avoid the Most Common Database Performance-monitoring Mistakes


TL;DR: Finding the causes of slow queries and other operations in a database begins with knowing where to find and how to collect the performance data you need to analyze in order to find a solution. The metrics built into MySQL and other modern databases and development tools allow you to zero in on the source of system slowdowns using simple counters and basic mathematical operations.

If a database's poor performance has you scratching your head, start your troubleshooting by understanding the causes of system slow-downs, and by avoiding the most common performance-metrics mistakes.

Measuring response time in any query-processing system depends on Little's law, which is key to recording and reporting response times in multithreading systems of all types. In an April 2011 post on Percona's MySQL Performance blog, Baron Schwartz explains how Little's law applies to MySQL. Schwartz uses the example of creating two counters to measure a system's busy time, and then adding a third counter to measure weighted busy time, or times during which more than one query is processing simultaneously.

The busy-time example highlights the four fundamental metrics: observation interval; number of queries per interval; total active-query time (busy time); and total execution time of all queries (weighted busy time). Combining these with Little's law creates four long-term average metrics: throughput; execution time; concurrency; and utilization.

With these metrics, you can use the Universal Scalability Law to model scalability. The metrics can also be used for queueing analysis and capacity planning -- all with simple addition, subtraction, multiplication, and division of numbers collected by the counters.
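
A sketch of that arithmetic, using T for the observation interval, C for the number of queries completed during it, B for total busy time, and W for total weighted busy time:

    throughput   X = C / T
    execution    R = W / C    (average response time per query)
    concurrency  N = W / T    (Little's law: N = X * R)
    utilization  U = B / T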

A MySQL performance-monitor primer

The MySQL Reference Manual explains how to use the Benchmark() function to measure the speed of a specific expression or function. The approximate time required for the statement to execute is displayed below the return value, which in this case will always be zero.

 

MySQL's Benchmark() function can be used to display a statement's approximate execution time. Source: MySQL Manual
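
A typical invocation (the expression being timed is illustrative); the function returns 0 and the client displays the elapsed time:

    SELECT BENCHMARK(1000000, MD5('test string'));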

The MySQL Performance Schema is intended for monitoring MySQL Server execution at a low level by inspecting the internal operation of the server at runtime. It monitors server "events": anything the server does that takes time and has been instrumented so that timing information can be collected. For example, by examining the events_waits_current table in the performance_schema database, you get a snapshot of what the server is doing right now.

The events_waits_current table can be examined in the performance_schema database to show the server's current activity. Source: MySQL Manual
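
For example, a query along these lines returns the wait events currently in progress:

    SELECT EVENT_NAME, TIMER_WAIT
    FROM performance_schema.events_waits_current;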

The MySQL Reference Manual provides a Performance Schema Quick-Start Guide and a section on Using the Performance Schema to Diagnose Problems.

Ensuring the accuracy of your metrics data

Any troubleshooting approach relies first and foremost on the accuracy of the performance data being collected. Azul Systems CTO Gil Tene explained at an Oracle OpenWorld 2013 conference session that how you measure latency in a system is as important as what you measure. TheServerSide.com's Maxine Giza reports on Tene's OpenWorld presentation in a September 2013 post.

For example, response time may report a single instance rather than a value aggregated over time to represent both peak and low demand. Conversely, the database's metrics may not record or report critical latencies, and may depend instead on mean and standard-deviation measures. This causes you to lose the data that's most useful in addressing the latency.

Another mistake many DBAs make, according to Tene, is to run load-generator tests in ways that don't represent real-world conditions, so important results are omitted. Tene states that the coordinated omission in results is "significant" and leads to bad reporting. He recommends that organizations establish latency-behavior requirements and pass/fail criteria to avoid wasting time and resources.

The monitoring tools on the Morpheus database-as-a-service (DBaaS) let you troubleshoot heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases through a single console using a fast and simple point-and-click interface. Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. A free full replica set of every database instance and automated daily backups ensure that your data is available when you need it.

Morpheus supports a range of tools for connecting, configuring, and managing your databases. Visit the Morpheus site to create a free account.

Big Data for Business Intelligence: Game Changer or New Name for the Same Old Techniques?


TL;DR: IT managers have grown accustomed to the industry's continual reinvention as new technologies have their day in the sun. Organizations concerned about being left in the competition's dust as the big-data wave sweeps across the IT landscape can relax. Considered from a business perspective, big-data analysis bears a striking resemblance to tried-and-true methods. Still, realizing the benefits of big data requires seeing your company's data in a less-structured and more-flexible light.

Technology cycles have been around for so long that they're predictable. First, the breakthrough is hyped as the cure for everything from the common cold to global warming. As reality sets in, the naysayers come to the fore, highlighting all the problems and downplaying any potential benefits. Finally, the new technology is integrated into existing systems and life goes on -- until the next cycle.

At the moment, big data is positioned somewhere in the hype cycle between cure-all and hogwash. As big-data principles find their way into established information-technology models and processes, it becomes easier to separate the technology's benefits to organizations from its potential pitfalls.

A Gartner survey released in September 2014 indicates that 73 percent of organizations have invested in big data or plan to do so in the next 24 months. Source: Gartner

Topping the list of big-data misconceptions is what exactly constitutes "big data." In an October 16, 2014, post on the Digital Book World site, Marcello Vena lists the three defining characteristics of big data:

  • It is "big" as in "unstructured," not as in "lots of it." If a standard RDBMS can be used to analyze it, it's not "big data."
  • Big data does entail high volume -- up to and beyond exabyte level, which is the equivalent of 1,000 petabytes, 1 million terabytes, or 1 billion gigabytes. Entirely new processes are required to capture, ingest, curate, analyze, model, and visualize such data stores. These large data repositories also present unique storage and transfer challenges in terms of security, reliability, and accessibility.
  • Analyzing huge data stores to extract business intelligence requires scaling the data initially to identify the specific data sets that relate to the particular problem/opportunity being analyzed. The challenge is to scale the data without losing the extra information created by seeing the individual data drops in relation to the ocean they are a part of.

With big data, quality is more important than ever

Conventional data systems characterize data elements at the point of collection, but big data requires the ability to apply data attributes in the context of a specific analysis at the time it occurs. The same data element could be represented very differently by various analyses, and sometimes by the same analysis at a different time.

Big data doesn't realize its value to organizations until the speed and analysis abilities are joined by business applications. Source: O'Reilly Radar

Big data doesn't make data warehouses obsolete. Data warehouses ensure the integrity and quality of the data being analyzed. They also provide context for the data, which makes most analysis more efficient. ZDNet's Toby Wolpe explains in a September 26, 2014, article that big-data analysis tools are still playing catchup with their data-warehouse counterparts.

High volume can't replace high quality. In an October 8, 2014, article, TechTarget's Nicole Laskowski uses the example of geolocation data collected by cell phones. In theory, you could use this data to plot the movements of millions of people. In fact, we often travel without taking our cell phones, or the phones may be turned off or have location tracking disabled.

Focus on the business need you're attempting to address rather than on the data you intend to analyze or the tools you will use for the analysis. The shift to big data is evolutionary rather than revolutionary. Off-the-shelf data-analysis tools have a proven track record and can generally be trusted to clean up unreliable data stores regardless of size.

An example of a technology that bridges the gap between the new and old approaches to database management is the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. The service's straightforward interface lets you provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch databases with just a few clicks.

Morpheus supports a range of tools for connecting to, deploying, and monitoring your databases. A free full replica set and automatic daily backups are provided for each database instance. Visit the Morpheus site to create a free account.

How to Store Large Lists in MongoDB


TL;DR: When storing large lists in MongoDB, a common approach is to place the items into an array. In most situations, arrays take up less disk space than objects with keys and values. However, the way MongoDB handles constantly growing arrays can cause performance problems over time.

Data Storage Decisions

When storing data, you may decide to store information that is regularly updated with new information. For example, you may store every post a user makes to an online forum. One way to do this would be to include an array in a document to store the content of each post, as in the following example:

 

An example MongoDB document with an array.
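
A sketch of such a document (the field names and content are illustrative):

    {
        "_id" : 1,
        "username" : "forum_user",
        "posts" : [
            "Text of the first post",
            "Text of the second post",
            "Text of the third post"
        ]
    }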

In most cases, this would seem like an excellent way of storing the data. In programming, arrays are often a very efficient means of storing related values—they tend to be lightning fast for both data storage and retrieval.

How MongoDB Handles Growing Arrays

In MongoDB, arrays work a little differently than they do in a programming language. There are several key points to consider when using arrays for storage in MongoDB:

  • Expansion: An array that expands often will also increase the size of its containing document. Rather than being rewritten, the document will instead be moved on the disk. In MongoDB, this type of movement tends to be slower, because it requires every index to also be updated.
  • Indexing: If an array field in MongoDB is indexed, then each document in the collection contributes a distinct index entry for every single element in the array. This means that inserting or deleting an indexed array requires roughly as much indexing work as inserting or deleting as many documents as there are elements in the array, which is a lot of additional work for the database.
  • BSON Format: Finding one or more elements at the end of a large array can take quite a long time, because the BSON data format uses a linear memory scan to manipulate documents.

Addressing the Array Issues

One suggestion for alleviating these performance issues is to model the data differently, so that you do not simply have an ever-growing single array. An example of this is to use nested subdocuments for data storage like the following example shows:

 

Nested subdocuments used for data storage. Source: MongoSoup
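
A hedged sketch of that idea, bucketing posts into nested subdocuments by year and month; the grouping keys are illustrative, not necessarily MongoSoup's exact model:

    {
        "_id" : 1,
        "username" : "forum_user",
        "posts" : {
            "2014" : {
                "11" : [ "Text of the first post", "Text of the second post" ],
                "12" : [ "Text of the third post" ]
            }
        }
    }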

This method improves performance by dramatically decreasing the amount of MongoDB storage space needed for the data, as shown in the following comparison:

Storage space required for several data models. Option 1: plain array; Option 2: array of documents; Option 3: document with nested subdocuments. Source: MongoSoup

As you can see, the storage space for the nested subdocuments (Option 3) was far less than the single array (Option 1).

Where to Get MongoDB

MongoDB is a great database for applications with large amounts of data that needs to be queried quickly. One way to set up a MongoDB database easily is to have it hosted remotely as a cloud service.

Using Morpheus, you can get MongoDB (and several other databases) as a service in the cloud. It runs on a high performance infrastructure with Solid State Drives, and also has an easy setup as well as automatic backups and replication. Why not open a free account today?


MySQL's Index Hints Can Improve Query Performance, But Only If You Avoid the 'Gotchas'


In most cases, MySQL's optimizer chooses the fastest index option for queries automatically, but now and then it may hit a snag that slows your database queries to a crawl. You can use one of the three index hints -- USE INDEX, IGNORE INDEX, or FORCE INDEX -- to specify which indexes the optimizer uses or doesn't use. However, there are many limitations to using the hints, and most query-processing problems can be resolved by making things simpler rather than by making them more complicated.

The right index makes all the difference in the performance of a database server. Indexes let your queries focus on the rows that matter, and they allow you to set your preferred search order. Covering indexes (also called index-only queries) speed things up by responding to database queries without having to access data in the tables themselves.

Unfortunately, MySQL's optimizer doesn't always choose the most efficient query plan. As the MySQL Manual explains, you can view the optimizer's statement execution plan by preceding the SELECT statement with the keyword EXPLAIN. When the plan isn't what you expect, you can use index hints to specify which index the optimizer should use for the query.

The three syntax options for hints are USE INDEX, IGNORE INDEX, and FORCE INDEX: The first instructs MySQL to use only the index listed; the second prevents MySQL from using the indexes listed; and the third has the same effect as the first option, but with the added limitation that table scans occur only when none of the given indexes can be used.

MySQL's index_hint syntax lets you specify the index to be used to process a particular query. Source: MySQL Reference Manual
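
A sketch of the three hints in use (the table, index, and column names are illustrative):

    SELECT * FROM orders USE INDEX (idx_customer)   WHERE customer_id = 42;
    SELECT * FROM orders IGNORE INDEX (idx_created) WHERE customer_id = 42;
    SELECT * FROM orders FORCE INDEX (idx_customer) WHERE customer_id = 42;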

Why use FORCE INDEX at all? There may be times when you want to keep table scans to an absolute minimum. Any database is likely to field some queries that can't be satisfied without having to access some data residing only in the table, and outside any index.

To modify the index hint, apply a FOR clause: FOR JOIN, FOR ORDER BY, or FOR GROUP BY. The first applies hints only when MySQL is choosing how to find table rows or process joins; while the second and third apply the hints only when sorting or grouping rows, respectively. Note that whenever a covering index is used to access the table, the optimizer ignores attempts by the ORDER BY and GROUP BY modifiers to have it ignore the covering index.

Are you sure you need that index hint?

Once you get the hang of using index hints to improve query performance, you may be tempted to overuse the technique. In a Stack Overflow post from July 2013, a developer wasn't able to get MySQL to list his preferred index when he ran EXPLAIN. He was looking for a way to force MySQL to use that specific index for performance tests.

In this case, it was posited that no index hint was needed. Instead, he could simply reorder the columns in the index so that the left-most column is the one used for row restriction. (While this approach is preferred in most situations, the particulars of the database in question made this solution impractical.)

SQL performance guru Markus Winand classifies optimizer hints as either restricting or supporting. Restricting hints Winand uses only reluctantly because they create potential points of failure in the future: a new index could be added that the optimizer can't access, or an object name used as a parameter could be changed at some point.

Supporting hints add some useful information that make the optimizer run better, but Winand claims such hints are rare. For example, the query may be asking for only a specified number of rows in a result, so the optimizer can choose an index that would be impractical to run on all rows.

Troubleshooting the performance of your databases doesn't get simpler than using the point-and-click interface of the Morpheus database-as-a-service (DBaaS). You can provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases via Morpheus's single dashboard. Each database instance includes a free full replica set and daily backups.

Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. The service lets you use a range of tools for monitoring and optimizing your databases. Visit the Morpheus site to create a free account.

MySQL Query Cache and Common Troubleshooting Issues


For some applications, the MySQL query cache can be a handy way to improve the performance of your queries. On the other hand, the query cache can become problematic for performance when certain situations arise. To understand how to troubleshoot these issues, you will first want to see what the MySQL query cache does.

What is the MySQL Query Cache?

The MySQL query cache caches a parsed query along with its entire result set. It is excellent when you have numerous small queries that return small data sets, because the cached results are returned immediately rather than the query being rerun each time it occurs.

Gathering information about the query cache

To see if the query cache is indeed on, and that it has a non-zero size, you can issue a quick command which will return a result set with this information. This is shown in the example below:

Getting information about the query cache. Source: Database Journal
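
A command along these lines (not necessarily Database Journal's exact query) returns the relevant settings:

    SHOW VARIABLES LIKE 'query_cache%';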

If the cache is set to off or has no size, then it is not working at all. Setting the query_cache_type to “ON” and the query_cache_size to a number other than zero should get it started.

Some common issues that can cause performance problems are that the query cache size is too small, the query cache is fragmented, and that the query cache has grown too large. Each of these can be problematic in its own way, so being able to troubleshoot these issues can help you keep your database running smoothly.

Query cache size is too small

Ideally, the query cache will have a hit ratio near 100%. This ratio is determined by dividing the number of query cache hits (Qcache_hits) by the sum of Qcache_hits and Com_select. These values can be obtained by issuing a couple of commands, as shown below:

 

Getting information for the query cache hit ratio. Source: Database Journal
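
For example, using the server status counters:

    SHOW GLOBAL STATUS LIKE 'Qcache_hits';
    SHOW GLOBAL STATUS LIKE 'Com_select';

    -- hit ratio = Qcache_hits / (Qcache_hits + Com_select)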

If the ratio is low, you can increase the query cache size and rerun the numbers. Once the ratio is where you want it, the query cache size is large enough for the cached queries.

Query cache is fragmented

If the memory block allocated for a cached query is larger than the query's result set, the difference is empty memory that cannot be used by other cached queries. This creates fragmentation. To fix this, you likely just need to adjust your query_cache_min_res_unit value.
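
A hedged example of adjusting it at runtime; the right value depends on the typical size of your result sets:

    SET GLOBAL query_cache_min_res_unit = 2048;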

Query cache has grown too large

If you are no longer using small queries or are no longer returning small result sets, then having the query cache enabled can actually become counterproductive. Instead of providing quick results, it can create a slowdown. In such cases, you will want to disable the query cache to restore performance.

Getting a MySQL Database

To get started with your own MySQL database, you can use a service such as Morpheus, which offers databases such as MySQL, MongoDB, and others as a service on the cloud. In addition, all databases are automatically backed up, replicated, and archived, and are deployed on a high performance infrastructure with Solid State Drives.  Open a free account today!

A New Twist to Active Archiving Adds Cloud Storage to the Mix


Companies large and small are taking a fresh look at their data archives, particularly how to convert them into active archives that deliver business intelligence while simultaneously reducing infrastructure costs. A new approach combines tape-to-NAS, or tNAS, with cloud storage to take advantage of tape's write speed and local availability, and also the cloud's cost savings, efficiency, and reliability.

Archival storage has long been the ugly step-sister of information management. You create data archives because you have to, whether to comply with government regulations or as your backup of last resort. About the only time you would need to access an archive is in response to an emergency.

Data archives were perceived as both a time sink for the people who have to create and manage the old data, and as a hardware expense because you have to pay for all those tape drives (usually) or disk drives (occasionally). Throw in the need to maintain a remote location to store the archive and you've got a major money hole in your IT department's budget.

This way of looking at your company's data archive went out with baggy jeans and flip phones. Today's active archives bear little resemblance to the dusty racks of tapes tucked into even-dustier closets of some backwater remote facility.

The two primary factors driving the adoption of active archiving are the need to extract useful business intelligence from the archives (thus treating the archive as a valuable resource); and the need to reduce storage costs generally and hardware purchases specifically.

Advances in tape-storage technology, such as Linear Tape Open (LTO) generations 6, 7, and beyond, promise to extend tape's lifespan, as IT Jungle's Alex Woodie explains in a September 15, 2014, article. However, companies are increasingly using a mix of tape, disk (solid state and magnetic), and cloud storage to create their active archives.

Tape as a frontend to your cloud-based active archive

Before your company trusts its archive to cloud storage, you have to consider worst-case scenarios: What if you can't access your data? What if uploads and downloads are too slow? What if the storage provider goes out of business or otherwise places your data at risk?

To address these and other possibilities, Simon Watkins of the Active Archive Alliance proposes using tape-to-NAS (tNAS) as a frontend to a cloud-based active archive. In a December 1, 2014, article on the Alliance's blog, Watkins describes a tape-library tNAS that runs NAS gateway software and stores data in the Linear Tape File System (LTFS) format.

The tNAS approach addresses bandwidth congestion by configuring the cloud as a tNAS tier: data is written quickly to tape and subsequently transferred to the cloud archive when bandwidth is available. Similarly, you always have an up-to-date copy of your data to use should the cloud archive become unavailable for any reason. This also facilitates transferring your archive to another cloud service.

A white paper published by HP in October 2014 presents a tNAS architecture that is able to replicate the archive concurrently to both tape and cloud storage. The simultaneous cloud/tape replication can be configured as a mirror or as tiers.

 

This tNAS design combines tape and cloud archival storage and supports concurrent replication. Source: HP

To mirror the tape and cloud replication, place both the cloud and tape volumes behind the cache, designating one primary and the other secondary. Data is sent from the cache to both volumes either at a threshold you set or when the cache becomes full.

Tape-cloud tiering takes advantage of tape's fast write speeds and is best when performance is paramount. In this model, tape is always the primary archive, and users are denied access to the cloud archive.

With the Morpheus database-as-a-service (DBaaS) replication of your MySQL, MongoDB, Redis, and ElasticSearch databases is automatic -- and free. Morpheus is the first and only DBaaS that supports SQL, NoSQL, and in-memory databases.

Morpheus lets you monitor all your databases from a single dashboard. The service's SSD-backed infrastructure ensures high availability and reliability. Visit the Morpheus site to create a free account.

Cloud-based Disaster Recovery: Data Security Without Breaking the Bank


The necessity of having a rock-solid disaster-recovery plan in place has been made abundantly clear by recent high-profile data breaches. Advances in cloud-based DR allow organizations of all sizes to ensure they'll be up and running quickly after whatever disaster may happen their way.

It just got a lot easier to convince senior management at your company that they should allocate some funds for implementation of an iron-clad disaster-recovery program. That may be one of the few silver linings of the data breach that now threatens to bring down Sony Pictures Entertainment.

It has always been a challenge for IT managers to make a business case for disaster-recovery spending. Computing UK's Mark Worts explains in a December 1, 2014, article that because DR is all about mitigating risks, senior executives strive to reduce upfront costs and long-term contracts. Cloud-based DR addresses both of these concerns by being inexpensive to implement, and by allowing companies to pay for only the resources they require right here, right now.

Small and midsized businesses, and departments within enterprises are in the best position to benefit from cloud-based DR, according to Worts. Because of their complex, distributed infrastructures, it can be challenging for enterprises to realize reasonable recovery time objectives (RTO) and recovery point objectives (RPO) relying primarily on cloud DR services.

Components of a cloud-based DR configuration

Researchers from the University of Massachusetts and AT&T Labs developed a model for a low-cost cloud-based DR service (PDF) that has the potential to enhance business continuity over existing methods. The model depends on warm standby replicas (standby servers are available but take minutes to get running) rather than hot standby (synchronous replication for immediate availability) or cold standby (standby servers are not available right away, so recovery may take hours or days).

The first challenge is for the system to know when a failure has occurred; transient failures or network segmentation can trigger false alarms, for example. Cloud services can help detect system failures by monitoring across distributed networks. The system must also know when to fall back once the primary system has been restored.

 

A low-cost cloud-based disaster recovery system configured with three web servers and one database at the primary site. Source: University of Massachusetts

The researchers demonstrate that their RUBiS system offers significant cost savings over use of a colocation facility. For example, only one "small" virtual machine is required to run the DR server in the cloud's replication mode, while colocation DR entails provisioning four "large" servers to run the application during failover.

 

The cloud-based RUBiS DR solution is much less expensive to operate than a colocation approach for a typical database server implementation. Source: University of Massachusetts

A key cloud-DR advantage: Run your apps remotely

The traditional approaches to disaster recovery usually entail tape storage in some musty, offsite facility. Few organizations can afford the luxury of dual data centers, which duplicate all data and IT operations automatically and offer immediate failover. The modern approach to DR takes advantage of cloud services' ability to replicate instances of virtual machines, as TechTarget's Andrew Reichman describes in a November 2014 article.

By combining compute resources with the stored data, cloud DR services let you run your critical applications in the cloud while your primary facilities are restored. SIOS Technology's Jerry Melnick points out in a December 10, 2014, EnterpriseTech post that business-critical applications such as SQL Server, Oracle, and SAP do not tolerate downtime, data loss, or performance slowdowns.

It's possible to transfer the application failover of locally managed server clusters to their cloud counterparts by using SANless clustering software to synchronize storage in cloud cluster nodes. In such instances, efficient synchronous or asynchronous replication creates virtualized storage with the characteristics of SAN failover software.

Failover protection is a paramount feature of the Morpheus database-as-a-service (DBaaS). Morpheus includes a free full replica set with every database instance you create. The service supports MySQL, MongoDB, Redis, and ElasticSearch databases; it is the first and only DBaaS that works with SQL, NoSQL, and in-memory databases.

With Morpheus's single-click provisioning, you can monitor all your databases via a single dashboard. Automatic daily backups are provided for MySQL and Redis databases, and your data is safely stored on the service's SSD-backed infrastructure. Visit the Morpheus site to create a free account.

The Fastest Way to Import Text, XML, and CSV Files into MySQL Tables


One of the best ways to improve the performance of MySQL databases is to determine the optimal approach for importing data from other sources, such as text files, XML, and CSV files. The key is to correlate the source data with the table structure.

Data is always on the move: from a Web form to an order-processing database, from a spreadsheet to an inventory database, or from a text file to a customer list. One of the most common MySQL database operations is importing data from such an external source directly into a table. Data importing is also one of the tasks most likely to create a performance bottleneck.

The basic steps entailed in importing a text file to a MySQL table are covered in a Stack Overflow post from November 2012: first create the target table, then use the LOAD DATA INFILE command.

The basic MySQL commands for creating a table and importing a text file into the table. Source: Stack Overflow
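
A sketch of those two steps (the table, column, and file names are illustrative, and the delimiters depend on your file):

    CREATE TABLE foo (
        myid      INT,
        myvalue   VARCHAR(50),
        mydecimal DECIMAL(10,2)
    );

    LOAD DATA LOCAL INFILE '/tmp/data.txt'
    INTO TABLE foo
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';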

Note that you may need to enable the parameter "--local-infile=1" to get the command to run. You can also specify which columns the text file loads into:

This MySQL command specifies the columns into which the text file will be imported. Source: Stack Overflow
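
A sketch of that form of the statement; it matches the description below, with the table's middle column left unset:

    LOAD DATA LOCAL INFILE '/tmp/data.txt'
    INTO TABLE foo
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (@col1, @col2, @col3)
    SET myid = @col1, mydecimal = @col3;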

In this example, the file's text is placed into variables "@col1, @col2, @col3," so "myid" appears in column 1, "mydecimal" appears in column 3, and column 2 has a null value.

The table resulting when LOAD DATA is run with the target column specified. Source: Stack Overflow

The fastest way to import XML files into a MySQL table

As Database Journal's Rob Gravelle explains in a March 17, 2014, article, stored procedures would appear to be the best way to import XML data into MySQL tables, but after version 5.0.7, MySQL's LOAD XML INFILE and LOAD DATA INFILE statements can't run within a Stored Procedure. There's also no way to map XML data to table structures, among other limitations.

However, you can get around most of these limitations if you can target an XML file with a rigid, known structure per procedure. The example Gravelle presents uses an XML file whose rows are each contained within a parent element, and whose columns are represented by named attributes:

You can use a stored procedure to import XML data into a MySQL table if you specify the table structure beforehand. Source: Database Journal
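
A hedged sketch of such a file; the element and attribute names are illustrative:

    <rows>
        <row id="1" first_name="Anne"   last_name="Smith" />
        <row id="2" first_name="George" last_name="Jones" />
    </rows>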

The table you're importing to has an int ID and two varchars: because the ID is the primary key, it can't have nulls or duplicate values; last_name allows duplicates but not nulls; and first_name allows up to 100 characters of nearly any data type.

The MySQL table into which the XML file will be imported has the same three fields as the file. Source: Database Journal
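
Based on that description, the table definition would look something like this (the table and column names are assumptions):

    CREATE TABLE people (
        id         INT NOT NULL PRIMARY KEY,
        last_name  VARCHAR(100) NOT NULL,
        first_name VARCHAR(100)
    );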

Gravelle's approach for overcoming MySQL's import restrictions uses the "proc-friendly" Load_File() and ExtractValue() functions.

MySQL's XML-import limitations can be overcome by using the Load_file() and ExtractValue() functions. Source: Database Journal
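
A minimal sketch of the core idea (the file path, element names, and XPath expressions are illustrative; Gravelle's stored procedure wraps this in a loop that walks every row and inserts it into the table):

    SET @xml = LOAD_FILE('/tmp/people.xml');

    SELECT ExtractValue(@xml, 'rows/row[1]/@id')         AS id,
           ExtractValue(@xml, 'rows/row[1]/@first_name') AS first_name,
           ExtractValue(@xml, 'rows/row[1]/@last_name')  AS last_name;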

Benchmarking techniques for importing CSV files to MySQL tables

When he tested various ways to import a CSV file into MySQL 5.6 and 5.7, Jaime Crespo discovered a technique that he claims improves the import time for MyISAM by 262 percent to 284 percent, and for InnoDB by 171 percent to 229 percent. The results of his tests are reported in an October 8, 2014, post on Crespo's MySQL DBA for Hire blog.

Crespo's test file was more than 3GB in size and had nearly 47 million rows. One of the fastest methods in Crespo's tests was by grouping queries in a multi-insert statement, which is used by "mysqldump". Crespo also attempted to improve LOAD DATA performance by augmenting the key_cache_size and by disabling the Performance Schema.

Crespo concludes that the fastest way to load CSV data into a MySQL table without using raw files is to use LOAD DATA syntax. Also, using parallelization for InnoDB boosts import speeds.

You won't find a more straightforward way to monitor your MySQL, MongoDB, Redis, and ElasticSearch databases than by using the dashboard interface of the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases.

You can provision, deploy, and host your databases from a single dashboard. The service includes a free full replica set for each database instance, as well as automatic daily backups of MySQL and Redis databases. Visit the Morpheus site for pricing information and to create a free account.

Sony's Two Big Mistakes: No Encryption, and No Backup


Even if you can't prevent all unauthorized access to your organization's networks, you can mitigate the damage -- and prevent most of it -- by using two time-proven, straightforward security techniques: encrypt all data storage and transmissions; and back up your data to the cloud or other off-premises site. Best of all, both security measures can be implemented without relying on all-too-human humans.

People are the weak link in any data-security plan. It turns out we're more fallible than the machines we use. Science fiction scenarios aside, the key to protecting data from attacks such as the one that threatens to put Sony out of business is to rely on machines, not people.

The safest things a company can do are to implement end-to-end encryption, and back up all data wherever it's stored. All connections between you and the outside world need to be encrypted, and all company data stored anywhere -- including on employees' mobile devices -- must be encrypted and backed up automatically.

A combination of encryption and sound backup as cornerstones of a solid business-continuity plan would have saved Sony's bacon. In a December 17, 2014, post on Vox, Timothy B. Lee writes that large companies generally under-invest in security until disaster strikes. But Sony has been victimized before: in 2011, hackers stole the personal information of millions of members of the Sony PlayStation network.

User authentication: The security hole that defies plugging

Most hackers get into their victim's networks via stolen user IDs and passwords. The 2014 Verizon Data Breach Investigations Report identifies the nine attack patterns that accounted for 93 percent of all network break-ins over the past decade. DarkReading's Kelly Jackson Higgins presents the report's findings in an April 22, 2014, article.

The 2014 Data Breach Investigations Report identifies nine predominant patterns in security breaches over the past decade. Source: Verizon

In two out of three breaches, the crook gained access by entering a user ID and password. The report recorded 1,367 data breaches in 2013, compared to only 621 in 2012. In 422 of the attacks in 2013, stolen credentials were used; 327 were due to data-stealing malware; 245 were from phishing attacks; 223 from RAM scraping; and 165 from backdoor malware.

There's just no way to keep user IDs and passwords out of the hands of data thieves. You have to assume that eventually, crooks will make it through your network defenses. In that case, the only way to protect your data is to encrypt it so that even if it's stolen, it can't be read without the decryption key.
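
In database terms, even simple column-level encryption illustrates the point. The sketch below is illustrative only and uses MySQL's built-in AES_ENCRYPT() and AES_DECRYPT() functions; the table, column, and key are hypothetical, and in practice the key should come from a key-management system rather than being embedded in SQL:

  -- Illustrative only: encrypt a sensitive column so that stolen rows are
  -- unreadable without the key. Table, column, and key are hypothetical.
  CREATE TABLE customer_secrets (
    customer_id INT PRIMARY KEY,
    ssn_enc     VARBINARY(256) NOT NULL
  );

  INSERT INTO customer_secrets (customer_id, ssn_enc)
  VALUES (1, AES_ENCRYPT('123-45-6789', 'replace-with-managed-key'));

  SELECT customer_id,
         CAST(AES_DECRYPT(ssn_enc, 'replace-with-managed-key') AS CHAR) AS ssn
  FROM customer_secrets;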

If encryption is such a data-security magic bullet, why haven't organizations been using it for years already? In a June 10, 2014, article on ESET's We Live Security site, Stephen Cobb warns about the high cost of not encrypting your business's data. Concentra had just reached a $1,725,220 settlement with the U.S. government following a HIPAA violation that involved the loss of unencrypted health information.

A 2013 Ponemon Institute survey pegged the average cost of a data breach in the U.S. at $5.4 million. Source: Ponemon Institute/Symantec

Encryption's #1 benefit: Minimizing the human factor

Still, as many as half of all major corporations don't use encryption, according to a survey conducted in 2012 by security firm Kaspersky Lab. The company lists the five greatest benefits of data encryption:

  1. Complete data protection, even in the event of theft
  2. Data is secured on all devices and distributed nodes
  3. Data transmissions are protected
  4. Data integrity is guaranteed
  5. Regulatory compliance is assured

Backups: Where data security starts and ends

Vox's Timothy B. Lee points out in his step-by-step account of the Sony data breach that the company's networks were "down for days" following the November 24, 2014, attack. (In fact, the original network breach likely occurred months earlier, as Wired's Kim Zetter reports in a December 15, 2014, post.)

Any business-continuity plan worth its salt prepares the company to resume network operations within hours or even minutes after a disaster, not days. A key component of your disaster-recovery plan is your recovery time objective. While operating dual data centers is an expensive option, it's also the safest. More practical for most businesses are cloud-based services such as the Morpheus database-as-a-service (DBaaS).

Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. When you choose Morpheus to host your MySQL, MongoDB, Redis, and ElasticSearch databases, you get a free full replica with each database instance. Morpheus also provides automatic daily backups of your MySQL and Redis databases.

The Morpheus dashboard lets you provision, deploy, and host your databases and monitor performance using a range of database tools. Visit the Morpheus site to create a free account.

Cloud Computing + Data Analytics = Instant Business Intelligence


Only by using cloud services will companies be able to offer their employees and managers access to big data, as well as the tools they'll need to analyze the information without being data scientists. A primary advantage of moving data analytics to the cloud is its potential to unleash the creativity of the data users, although a level of data governance is still required.

Data analytics are moving to the edge of the network, starting at the point of collection. That's one result of our applications getting smarter. According to the IDC FutureScape for Big Data and Analytics 2015 Predictions, apps that incorporate machine learning and other advanced or predictive analytics will grow 65 percent faster in 2015 than software without such abilities.

There's only one way to give millions of people affordable access to the volumes of data now being collected in real time, not to mention the easy-to-use tools they'll need to make productive use of the data. And that's via the cloud.

IDC also predicts a shortage of skilled data analysts: by 2018 there will be 181,000 positions requiring deep-analytics skills, and five times that number requiring similar data-interpretation abilities. Another of IDC's trends for 2015 is the booming market for visual data discovery tools, which are projected to grow at 2.5 times the rate of other business-intelligence sectors.

As software gets smarter, more data conditioning and analysis is done automatically, which facilitates analysis by end users. Source: Software Development Times

When you combine smarter software, a shortage of experts, and an increase in easy-to-use analysis tools, you get end users doing their own analyses, with the assistance of intelligent software. If all the pieces click into place, your organization can benefit by tapping into the creativity of its employees and managers.

The yogurt-shop model for data analytics

In a November 19, 2014, article, Forbes' Bill Franks compares DIY data analytics to self-serve yogurt shops. In both cases the value proposition is transferred to the customer: analyzing the data becomes an engaging, rewarding experience, similar to choosing the type and amount of toppings for your cup of frozen yogurt.

More importantly, you can shift to the self-serve model without big investments in infrastructure, training, or other overhead. You might even find your costs reduced -- just as self-serve yogurt shops save on labor -- particularly if you tap into the efficiency and scalability of the cloud.

Employees are more satisfied with their data-analytics roles when their companies use cloud-based big data analytics. Source: Aberdeen Group (via Ricoh)

Last but not least, when you give people direct access to data and offer them tools that let them mash up the data as their creativity dictates, you'll generate valuable combinations you may never have come up with yourself.

Determining the correct level of oversight for DIY data analysts

Considering the value of the company's data, it's understandable that IT managers would hesitate to turn employees loose on the data without some supervision. As Timo Elliott explains in a post from April 2014 on the Business Analytics blog, data governance remains the responsibility of the IT department.

Elliott defines data governance as "stopping people from doing stupid things with data." The concept encompasses security, data currency, and reliability, but it also entails ensuring that information in the organization gets into the hands of the people who need it, when they need it.

You'll see aspects of DIY data analytics in the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. You use a single console to provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch. Every database instance is deployed with a free full replica set, and your MySQL and Redis databases are backed up.

Morpheus supports a range of tools for configuring and managing your databases, which are monitored continuously by the service's staff and advanced bots. Visit the Morpheus site for pricing information and to create a free account.


Devise an Attribute-based Access Control Plan that Won't Affect Database Performance


Basing data-access rights on the attributes of users, data, resources, and environments helps keep your data safe from thieves by preventing nearly all brute-force access attempts. However, applying attribute-based access controls to existing database systems requires careful consideration of potential performance bottlenecks you may be creating.

Attribute-based access controls (ABAC) for databases tend to be as simple or complex as the organization using them. For a 40-person office, it isn't particularly difficult to establish the roles, policies, rules, and relationships you'll use to determine users' data access rights.

For example, to ensure that only managers in the finance department have access to the company's financial data, create Role=Manager and Department=Finance. Then require these attributes in the permissions of any user who requests financial data.
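
How that check is enforced depends on your access-control layer, but the idea can be sketched directly in SQL. The example below is a simplified, hypothetical emulation that assumes a user_attributes table storing each user's role and department; a real ABAC deployment would evaluate these attributes in a policy engine rather than inside the query:

  -- Simplified illustration, not a full ABAC implementation.
  -- Return financial data only if the requesting user carries both
  -- required attributes. Table and column names are hypothetical.
  SELECT f.*
  FROM financial_data AS f
  WHERE EXISTS (
    SELECT 1
    FROM user_attributes AS ua
    WHERE ua.user_id = @requesting_user_id
      AND ua.role = 'Manager'
      AND ua.department = 'Finance'
  );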

As you can imagine, access controls are rarely that simple. When creating enterprise-wide ABACs, an attribute-management platform and machine-enforced policies may be required. IT Business Edge's Guide to Attribute Based Access Control (ABAC) Definition and Considerations (registration required) outlines the major components of such an attribute-management system:

  • Enterprise policy development and distribution
  • Enterprise identity and subject attributes
  • Subject attribute sharing
  • Enterprise object attributes
  • Authentication
  • Access control mechanism deployment and distribution

The nuts and bolts of policy-based access controls

ABAC policies are established using the eXtensible Access Control Markup Language (XACML). As explained by ABAC vendor Axiomatics, attributes are assigned to subjects, actions, resources, and environments. By evaluating the attributes in conjunction with the rules of your policies, access to the data or resource is allowed or denied.

ABAC applies rules to access requests based on the attributes and policies you establish for subjects, environments, resources, and actions. Source: Axiomatics

Applying an ABAC system to existing RDBMSs can be problematic, as exemplified in a March 2014 post on the Stack Exchange Information Security forum. In particular, what effect will implementing fine-grained access control have on database performance? And can existing Policy Enforcement Points (PEPs) implement XACML?

Axiomatics' David Brossard replies that performance is most affected by the PEP-to-PDP (Policy Decision Point) communication link, and the PDP-to-PIP (Policy Information Point) link. In particular, how you expose the authorization service PDP helps determine performance: if exposed as a SOAP service, you invite SOAP latency, and if exposed via Apache Thrift or another binary protocol, you'll likely realize better performance.

Brossard identifies six areas where ABAC performance can be enhanced:

  1. How policies are loaded into memory
  2. How policies are evaluated
  3. How attribute values are fetched from PIPs
  4. How authorization decisions are cached by PEPs
  5. How authorization requests are bundled to reduce roundtrips between the PEP and PDP
  6. How the PEP and PDP communicate

Six potential performance bottlenecks in a typical ABAC implementation. Source: Stack Exchange Information Security

Database performance monitoring needn't be so complicated, however. With the Morpheus database-as-a-service, you can use a single dashboard to monitor your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. You can invoke a new instance of any SQL, NoSQL, or in-memory database quickly and simply via the Morpheus dashboard.

Your databases are deployed on an SSD-backed infrastructure for fast performance, and direct connections to Amazon EC2 ensure the lowest latency available. Visit the Morpheus site to create a free account.

The Most Important Server Parameters for MySQL Databases


When installing MySQL, it is a good idea to set some key parameters at setup time to ensure that your database runs smoothly and efficiently. Setting them ahead of time helps you avoid having to change settings after the database has grown substantially and the application is already in production.

What are Parameters?

Parameters are values that are stored in the MySQL configuration file. This file is called my.cnf, and the location will vary from system to system. You will need to check the installation on your system to determine the location of the file.

One possible method of finding the location of the MySQL configuration file. Source: Stack Overflow.

Keep in mind, however, that many parameters can be set temporarily by running SET GLOBAL or SET SESSION statements. It is a good idea to do this first (provided the parameter is one of the dynamic system variables) to confirm that a change helps before making it permanent in the configuration file.
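
For example, two of the dynamic variables discussed below can be adjusted on a running server and verified before you make the change permanent (the values shown are placeholders, not recommendations):

  -- Try a change on the running server first (dynamic variables only).
  -- The values are placeholders; tune them for your own workload.
  SET GLOBAL max_connections = 300;
  SET GLOBAL query_cache_size = 0;

  -- Confirm the new settings.
  SHOW VARIABLES LIKE 'max_connections';
  SHOW VARIABLES LIKE 'query_cache_size';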

With that in mind, here are some key parameters you can set in your MySQL configuration.

query_cache_size

Because the query cache can actually make queries slower in many cases, it is often recommended to disable it by setting this parameter to 0 (zero).

max_connections

With a default setup, you may often end up getting the Too many connections error, due to this parameter being set too low. However, setting it too high can also become problematic, so you will want to test at different settings to see what works best for your setup and applications.

One possible method of finding the current number of max connections. Source: Stack Overflow.

innodb_buffer_pool_size

If you are using InnoDB, this is one of the most important parameters to set, as the buffer pool is where InnoDB caches data and indexes. A higher setting allows more reads to be served from memory rather than from disk, which improves performance. The right value depends on your available RAM; for example, on a dedicated server with 128GB of RAM, a typical setting would be between 100GB and 120GB.

innodb_log_file_size

This log file size determines how large the redo logs will be. Often, this will be set between 512 MB (for common usage) and 4GB (for applications that will do a large number of write operations).

log_bin

If you do not want the MySQL server to act as a replication master, the standard recommendation is to keep binary logging disabled by commenting out any lines that begin with log_bin or expire_logs_days in the MySQL configuration file.
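
Pulled together, a my.cnf fragment covering the parameters above might look like the following sketch. The values are illustrative only; the right numbers depend on your RAM, workload, and replication requirements:

  # Illustrative [mysqld] settings only -- adjust for your own hardware and workload.
  [mysqld]
  query_cache_size        = 0       # query cache disabled
  max_connections         = 300     # raise from the default only after testing
  innodb_buffer_pool_size = 100G    # most of the RAM on a dedicated 128GB server
  innodb_log_file_size    = 512M    # larger (up to 4G) for write-heavy workloads
  # log_bin and expire_logs_days are left commented out, so binary logging stays disabled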

Get Your Own MySQL Database

If you do not already have a MySQL database up and running, one way to easily obtain one is to use a service such as Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily set up one or more databases (including MySQL, MongoDB, and others).

In addition, all databases are deployed on a high performance infrastructure with Solid State Drives, and backed up, replicated, and archived. Open a free account today!

When Is ElasticSearch the Right Tool for Your Job?


Choosing a tool for information search or storage can be a difficult task. Some tools are better at creating relations among data, some excel at quickly accessing large amounts of data, and others make it easier when attempting to search through a vast array of information. Where does ElasticSearch fit into this, and when is it the right tool for your job?

What is ElasticSearch?

Elasticsearch is an open source search and analytics engine (based on Lucene) designed to operate in real time. It was designed to be used in distributed environments by providing flexibility and scalability.

Instead of the typical full-text search setup, ElasticSearch offers ways to extend searching capabilities through the use of APIs and query DSLs. There are clients available so that it can be used with numerous programming languages, such as Ruby, PHP, JavaScript and others.

What are some advantages of ElasticSearch?

ElasticSearch has some notable features that can be helpful to an application:

Distributed approach - Indices can be divided into shards, with each shard able to have any number of replicas. Routing and rebalancing operations are done automatically when new documents are added.

Based on Lucene - Lucene is an open source library for information retrieval that may already be familiar to developers. ElasticSearch makes numerous features of the Lucene library available through its API and JSON.

An example of an index API call. Source: ElasticSearch.

Use of faceting - A faceted search is more robust than a typical text search, allowing users to apply a number of filters on the information and even have a classification system based on the data. This allows better organization of the search results and allows users to better determine what information they need to examine.

Structured search queries - While searches can still be done using a text string, more robust searches can be structured using JSON objects.

An example structured query using JSON. Source: Slant.

When is ElasticSearch the right tool?

If you are seeking a database for saving and retrieving data outside of searching, you may find a NoSQL or relational database a better fit, since they are designed for those types of queries. While ElasticSearch can serve as a NoSQL solution, it lacks features such as transactions, so you will need to be able to work around that limitation.

On the other hand, if you want a solution that is effective at quickly and dynamically searching through large amounts of data, then ElasticSearch is a good solution. If your application will be search-intensive, such as with GitHub, where it is used to search through 2 billion documents from all of its code repositories, then ElasticSearch is an ideal tool for the job.

Get ElasticSearch or a Database

If you want to try out ElasticSearch, one way to do so is to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily set up one or more databases (including ElasticSearch, MongoDB, MySQL, and more). In addition, databases are deployed on a high performance infrastructure with Solid State Drives, replicated, and archived. 

Find the Best Approach for Entering Dates in MySQL Databases


A function as straightforward as entering dates in a MySQL database should be nearly automatic, but the process is anything but foolproof. MySQL's handling of invalid date entries can leave developers scratching their heads. In particular, the globalization of IT means you're never sure where the server hosting your database will be located -- or relocated. Plan ahead to ensure your database's date entries are as accurate as possible.

DBAs know that if they want their databases to function properly, they have to follow the rules. The first problem is, some "rules" are more like guidelines, allowing a great deal of flexibility in their application. The second problem is, it's not always easy to determine which rules are rigid, and which are more malleable.

An example of a rule with some built-in wiggle room is MySQL's date handling. Database Journal's Rob Gravelle explains in a September 8, 2014, post that MySQL automatically converts numbers and strings into a correct Date whenever you add or update data in a DATE, DATETIME, or TIMESTAMP column. The string has to be in the "yyyy-mm-dd" format, but you can use any punctuation to separate the three date elements, such as "yyyy&mm&dd", or you can skip the separators altogether, as in "yyyymmdd".

So what happens when a Date record has an invalid entry, or no entry at all? MySQL inserts its special zero date of "0000-00-00" and warns you that it has encountered an invalid date, as shown below.

Only the first of the four Date records is valid, so MySQL warns that there is an invalid date after entering the zero date of "0000-00-00". Source: Database Journal

To prevent the zero date from being entered, you can use NO_ZERO_DATE in strict mode, which generates an error whenever an invalid date is entered, or NO_ZERO_IN_DATE mode, which rejects dates that combine a valid year with a zero month or day. Note that both of these modes were deprecated in MySQL 5.7.4 and rolled into strict SQL mode.

Other options are to enable ALLOW_INVALID_DATES mode, which checks only that the month falls between 1 and 12 and the day between 1 and 31 (useful when an application collects the year, month, and day in three separate fields), or to enable TRADITIONAL SQL mode, which acts more like stricter database servers by combining STRICT_TRANS_TABLES, STRICT_ALL_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, and NO_AUTO_CREATE_USER.
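
Switching among these behaviors is a matter of setting the sql_mode variable, either in the configuration file or at runtime. A quick sketch (the mode combinations shown are examples, not recommendations):

  -- Reject invalid and zero dates outright:
  SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE';

  -- Or opt into the stricter combined behavior in one step:
  SET SESSION sql_mode = 'TRADITIONAL';

  -- Check what is currently in effect:
  SELECT @@SESSION.sql_mode;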

Avoid using DATETIME at all? Not quite

Developer Eli Billauer posits on his personal blog that it is always a mistake to use the MySQL (and SQL) DATETIME column type. He qualifies his initial blanket pronouncement to acknowledge that commenters to the post give examples of instances where use of DATETIME is the best approach.

Billauer points out that many developers use DATETIME to store the time of events, as in this example:

Using the DATETIME and NOW() functions creates problems because you can't be sure of the local server's time, or the user's timezone. Source: Eli Billauer

Because DATETIME relies on the time of the local server, you can't be sure where the web server hosting the app is going to be located. One way around this uncertainty is to apply a SQL function that converts timezones, but this doesn't address such issues as daylight savings time and databases relocated to new servers. (Note that the UTC_TIMESTAMP() function provides the UTC time.)

There are several ways to get around these limitations, one of which is to use "UNIX time," as in "UNIX_TIMESTAMP(thedate)." This is also referred to as "seconds since the Epoch." Alternatively, you can store the integer itself in the database; Billauer explains how to obtain Epoch time in Perl, PHP, Python, C, and Javascript.
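
In MySQL terms, the epoch approach can be as simple as the sketch below, where the column holds an unsigned integer and conversion happens only at the edges (the table and column names are illustrative):

  -- Store event times as seconds since the Epoch.
  -- Table and column names are illustrative.
  CREATE TABLE events (
    id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    created_at INT UNSIGNED NOT NULL
  );

  INSERT INTO events (created_at) VALUES (UNIX_TIMESTAMP());

  -- Convert back to a readable date and time (in the session time zone) when needed.
  SELECT id, FROM_UNIXTIME(created_at) AS created_local FROM events;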

Troubleshooting and monitoring the performance of your MySQL, MongoDB, Redis, and ElasticSearch databases is a piece of cake when you use the Morpheus database-as-a-service (DBaaS). Morpheus provides a single, easy-to-use dashboard. In addition to a free full replica set of each database instance, you get backups of your MySQL and Redis databases.

Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. The service's SSD-backed infrastructure ensures peak performance, and direct links to EC2 guarantee ultra-low latency. Visit the Morpheus site to create a free account.

Has Node.js Adoption Peaked? If So, What's Next for Server-Side App Development?


The general consensus of the experts is that Node.js will continue to play an important role in web app development despite the impending release of the io.js forked version. Still, some developers have decided to switch to the Go programming language and other alternatives, which they consider better suited to large, distributed web apps.

The developer community appears to be tiring of the constant churn in platforms and toolkits. Jimmy Breck-McKye points out in a December 1, 2014, post on his Lazy Programmer blog that it has been only two years since the arrival of Node.js, the JavaScript framework for developing server-side apps quickly and simply.

Soon Node.js was followed by Backbone.js/Grunt, Require.js/Handlebars, and most recently, Angular, Gulp, and Browserify. How is a programmer expected to invest in any single set of development tools when the tools are likely to be eclipsed before the developer can finish learning them?

Node.js still has plenty of supporters, despite the recent forking of the product with the release of io.js by a group of former Node contributors. In a December 29, 2014, post on the LinkedIn Pulse blog, Kurt Cagle identifies Node as one of the Ten Trends in Data Science for 2015. Cagle nearly gushes over the framework, calling it "the nucleus of a new stack that is likely going to relegate Ruby and Python to has-been languages." Node could even supplant PHP someday, according to Cagle.

The internal thread architecture of Node.js handles incoming requests to the HTTP server similarly to SQL requests. Source: Stack Overflow

Taking the opposite view is Shiju Varghese, who writes in an August 31, 2014, post on his Medium blog that after years of developing with Node, he has switched to using Go for Web development and as a "technology ecosystem for building distributed apps." Among Node's shortcomings, according to Varghese, are its error handling, debugging, and usability.

More importantly, Varghese claims Node is a nightmare to maintain for large, distributed apps. For anyone building RESTful apps on Node.js, he recommends the Hapi.js framework created by WalMart. Varghese predicts that the era of using dynamic languages for "performance-critical" web apps is drawing to a close.

The Node.js fork may -- or may not -- be temporary

When io.js was released in late November 2014, developers feared they would be forced to choose between the original version of the open-source framework supported by Joyent, and the new version created by former Node contributors. As ReadWrite's Lauren Orsini describes in a December 10, 2014, article, the folks behind io.js were unhappy with Joyent's management of the framework.

Io.js is intended to have "an open governance model," according to the framework's readme file. It is described as an "evented IO for V8 JavaScript." Node.js and io.js are both server-side frameworks that allow web apps to handle user requests in real time, and the io.js development team reportedly intends to maintain compatibility with the "Node ecosystem."

At present, most corporate developers are taking a wait-and-see approach to the Node rift, according to InfoWorld's Paul Krill. In a December 8, 2014, article, Krill writes that many attendees at Intuit's Node Day conference see the fork as a means of pressuring Joyent to "open up a little bit," as one conference-goer put it. Many expect the two sides to reconcile before long -- and before parallel, incompatible toolsets are released.

Still, the io.js fork is expected to be released in January 2015, according to InfoQ's James Chester in a December 9, 2014, post. Isaac Z. Schlueter, one of the Node contributors backing io.js, insists in an FAQ that the framework is not intended to compete with Node, but rather to improve it.

Regardless of the outcome of the current schism, the outlook for Node developers looks rosy. Indeed.com's recent survey of programmer job postings indicates that the number of openings for Node developers is on the rise, although it still trails jobs for Ruby and Python programmers.

Openings for developers who can work with Node.js are on the rise, according to Indeed.com. Source: FreeCodeCamp

Regardless of your preferred development framework, you can rest assured that your MySQL, MongoDB, Redis, and ElasticSearch databases are accessible when you need them on the Morpheus database-as-a-service (DBaaS). Morpheus supports a range of tools for connecting to, configuring, and managing your databases.

You can provision, deploy, and host all your databases on Morpheus with just a few clicks using the service's dashboard. Visit the Morpheus site to create a free account!
