
Contain(er) Yourself: Separating Docker Hype from the Tech's Reality


TL;DR: Even jaded IT veterans are sitting up and taking notice of the potential benefits of Docker's microservice model of app development, deployment, and maintenance. By containerizing the entire runtime environment, Docker ensures apps will function smoothly on any platform. By separating app components at such a granular level, Docker lets you apply patches and updates seamlessly without having to shut down the entire app.

The tech industry is noted for its incredibly short "next-big-thing" cycles. After all, "hype" and "tech" go together like mashed potatoes and gravy. Every now and then, though, one of these overheated breakthroughs actually lives up to all the blather.

Docker is a Linux-based development environment designed to make it easy to create distributed applications. The Docker Engine is the packaging tool that containerizes all resources comprising the app's runtime environment. The Docker Hub is a cloud service for sharing application "artifacts" via public and private repositories, and automating the build pipeline.

Because Docker lets developers ship their code in a self-contained runtime environment, their apps run on any platform without the portability glitches that often drive sysadmins crazy when a program hiccups on a platform other than the one on which it was created.

How Docker out-virtualizes VMs

It's natural to compare Docker's microservice-based containers to virtual machines. As Lucas Carlson explains in a September 30, 2014, article on VentureBeat, you can fit 10 to 100 times as many containers as VMs on a single server. More importantly, containers dispense with the hypervisor intermediation layer required to manage VMs on the physical hardware, as Docker VP of Services James Turnbull describes in a July 9, 2014, interview with Jodi Biddle on OpenSource.com.

Virtual Machines and Docker

Docker containers are faster and more efficient than virtual machines in part because they require no guest OS or separate hypervisor management layer. Source: Docker

Because Docker offers virtualization at the operating system level, containers run in user space atop the OS kernel, according to Turnbull, which makes them incredibly lightweight and fast. Carlson's September 14, 2014, article on JavaWorld compares Docker development to a big Lego set of other people's containers that you can combine without worrying about incompatibilities.
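
To get a feel for how lightweight this is in practice, here is a minimal sketch using the Docker SDK for Python (the docker package); the image name and command are illustrative, not anything Docker prescribes:

    import docker

    client = docker.from_env()  # connect to the local Docker daemon

    # Run a throwaway container; remove=True cleans it up on exit.
    logs = client.containers.run("alpine:latest",
                                 ["echo", "hello from a container"],
                                 remove=True)
    print(logs.decode())

Because the container ships its entire runtime environment, the same snippet behaves identically on a laptop, a test server, or a production cluster.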

You get many of the same plug-and-play capabilities when you choose to host your apps with the Morpheus cloud database-as-a-service (DBaaS). Morpheus lets you provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch on a single dashboard. The service's SSD-based infrastructure and automatic daily backups ensure the reliability and accessibility your data requires.

Morpheus deploys all your database instances with a free full replica set, and the service's single-click DB provisioning allows you to bring up a new instance of any SQL, NoSQL, or in-memory database in seconds. Visit the Morpheus site for pricing information or to sign up for a free trial account.

Services such as Morpheus deliver the promise of burgeoning technologies such as Docker while allowing you to preserve your investment in existing database technologies. In a time of industry transition, it's great to know you can get the best of both worlds, minus the departmental upheaval.


Could Database as a Service Be What Saves Microsoft's Bacon?


TL;DR: Among the Big Name software companies, Microsoft appears to be making the smoothest transition to a cloud-centric data universe. The early reviews of the company's Azure DocumentDB database-as-a-service indicate that Microsoft is going all in -- at least in part at the expense of the company's database mainstays. In fact, DBaaS may serve as the cornerstone of Microsoft's reconstitution into a developer-services provider.

Microsoft is getting a lot of press lately -- most of it not so good. (Exhibit A: CEO Satya Nadella stated at a recent women-in-tech conference that women should count on hard work and "good karma" to earn them a raise rather than asking for one directly. FastCompany's Lydia Dishman reports in an October 14, 2014, article that Nadella's gaffe will ultimately be a great help to women who work for Microsoft and other tech firms.)

In one area, at least, Microsoft is getting solid reviews: the burgeoning database-as-a-service industry. The company's Azure DocumentDB earned a passing grade from early adopter Xomni, which provides cloud services to retailers. In a September 29, 2014, article, InformationWeek's Doug Henschen describes what Xomni liked and didn't like about DocumentDB.


Microsoft's Azure DocumentDB uses a resource model in which resources under a database account are addressable via a logical and stable URI. Source: Microsoft

That's not to say DocumentDB doesn't have some very rough edges. As Xomni CTO Daron Yondem points out, there's no built-in search function or connection to Microsoft's new Azure Search. Another DocumentDB area in need of improvement is its software development kit, according to Yondem. While you can't expect much in the way of development tools in a preview release, Xomni relied on third-party tools to add a search function to DocumentDB.

On the plus side, Yondem points to DocumentDB's tuning feature for balancing transactional consistency and performance, as well as its support for SQL queries.

Microsoft embraces open source, icicles spotted in hell

Another sign that the Great Microsoft Makeover may be more than hype is the company's 180 on open source. Not only are Microsoft's new cloud services based on open source, the company is making it easier to contribute changes to its open-source code repositories via pull requests.

Readwrite's Matt Asay explains in an October 9, 2014, article that Microsoft is slowly winning over developers, who account for an increasing percentage of the technology buying in organizations. CIOs have long been convinced of the ability of Microsoft products to boost worker productivity, and now developers are warming to the company as well. Asay asserts that Microsoft will succeed because of its ability to keep it simple and keep it safe.

That's precisely the secret to the success of the Morpheus database-as-a-service. Morpheus lets you provision a new instance of any SQL, NoSQL, or in-memory database with a single click. Your databases are automatically backed up each day and provisioned with a free live replica for failover and fault tolerance.

Your MongoDB, MySQL, Redis, and ElasticSearch databases are protected via VPN connections and monitored from a single dashboard. You can use your choice of developer tools to connect, configure, and manage your databases. Visit the Morpheus site for pricing information or to create a free account.

The New Reality: Microservices Apply the Internet Model to App Development


TL;DR: As software becomes the force driving industries of all types and sizes, the nature of app development and management is changing fundamentally. Gone are the days of centralized control via complex, interdependent, hierarchical architectures. Welcome to the Internet model of software: small pieces, loosely joined via the microservice architecture. At the forefront of the new software model are business managers, who base software-design decisions on existing and future business processes.

Anyone who works in technology knows change is constant. But change is also hard -- especially the kind of transformational change presently occurring in the software business with the arrival of the microservices model of app development, deployment, and maintenance. As usual, not everybody gets it.

Considering how revolutionary the microservices approach to software design is, the misconceptions surrounding the technology are understandable. Diginomica's Phil Wainewright gets to the heart of the problem in a September 30, 2014, article. When Wainewright scanned the agenda for an upcoming conference on the software-defined enterprise, he was flabbergasted to see all the focus on activities within the data center: virtualization, containerization, and software-defined storage and networking.

As Wainewright points out, the last thing you want to do is add a layer of "efficient flexibility underneath a brittle and antiquated business infrastructure." That's the approach that doomed the service-oriented architectures of a decade ago. Instead, the data center must be perceived as merely one component of a configurable and extensible software-defined enterprise. The foundation of tomorrow's networks is simple, easily exchangeable microservices that permeate the organization rather than residing in a single, central repository.

Microservices complete the transition from tightly coupled components through SOA's loose coupling to complete decoupling to facilitate continuous delivery. Source: PricewaterhouseCoopers

To paraphrase a time-worn axiom, if you love your software, let it go. The company's business managers must drive the decisions about technology spending based on what they know of the organization's goals and assets.

Microservices: fine-grained, stateless, self-contained

Like SOA, microservices are designed to be more responsive and adaptable to business processes and needs. What doomed SOA approaches was the complexity they added to systems management by applying a middleware layer to software development and deployment. As ZDNet's Joe McKendrick explains in a September 30, 2014, article, the philosophy underlying microservices is to keep it simple.

The services are generally constructed using Node.js or other Web-oriented languages, or in functional languages such as Scala or the Clojure Lisp library, according to PricewaterhouseCoopers analysts Galen Gruman and Alan Morrison in their comprehensive microservices-architecture overview. Another defining characteristic is that microservices are a perfect fit for the APIs and RESTful services that are increasingly the basis for enterprise functions.

Microservice architectures are distinguished from service-oriented architectures in nearly every way. Source: PricewaterhouseCoopers

In the modern business world, "rapid" development simply isn't fast enough. The goal for app developers is continuous delivery of patches, updates, and enhancements. The discrete, self-contained, and loosely coupled nature of microservices allows them to be swapped out or ignored without affecting the performance of the application.

The March 25, 2014, microservices overview written by Martin Fowler and James Lewis provides perhaps the most in-depth examination of the technology. Even more important than the technical aspects of the microservices approach is the organizational changes the technology represents. In particular, development shifts from a project model, where the "team" hands off the end result and disbands, to a product model, where the people who build the app take ownership of it: "You build it, you run it."

The same development-maintenance integration is evident in the Morpheus database as a service, which allows you to provision, deploy, and host MySQL, MongoDB, Redis, and Elasticsearch databases using a single, simple console. The ability to spin up instances for elastic scalability based on the demands of a given moment, whether growing rapidly or shrinking marginally, means that your instances will be far more productive and efficient. In addition to residing on high-performance solid-state drives, your databases are provisioned with free live replicas for fault tolerance and failover. Visit the Morpheus site to create a free account.

Morpheus Technical FAQ


Morpheus Technical FAQ

What databases are currently supported?

The following is a list of the currently supported databases:

  • Redis-2.8.13
  • Elasticsearch-1.2.2
  • MongoDB-2.6.3
  • MySQL-5.6.19

What security measures does Morpheus take to protect instances?

Each instance has its own firewall.

How does scaling actually work?

To add more storage, set your notification level by clicking your instance’s Settings and choosing a notification storage limit. 

When this threshold is reached, we send an email letting you know the storage capacity that remains. When you receive the email, you can reduce your storage needs or go to the Dashboard’s Instance page and add an instance.

 

Future plans will allow in-place upgrades to larger plans.

How often are Morpheus instances backed up?

At this time, only Redis and MySQL are backed up, once per day, with four days of backups retained. Backups are taken just after midnight, Pacific Standard Time.

How do I access these backups?

From the dashboard, select your Redis or MySQL instance, then click Backups.

Click the download icon to save a backup to your hard drive, or click the restore icon to make that backup the current cloud version.

Can I change the frequency of the backups?

No. At this time there are no configuration options for backups. However, it's easy to connect your Morpheus DB to a comprehensive cloud backup service such as BitCan (http://gobitcan.com/).

Can I check my logs? 

Yes. From the Morpheus dashboard, select the database type then from the Instances page, select the instance. Click on Logs to display log messages. 

This is useful for a quick overview of your instance's connections and activities. If you want a more finely tuned log experience, integrate the Morpheus-created instance with a log manager such as OohLaLog (http://www.oohlalog.com/).

Can I check the status of my instances?

Yes. Use the Check Status button on your Instance page. 

Servers are listed by IP address. If a server is not running, you can send a restart signal from here. 

 

Can I access the live replicas created?  

Yes and no; it depends on the type of instance.

In the case of MongoDB, you cannot directly connect to the three replicated data nodes. If any one of them fails, the others continue to operate in its place.

 

MySQL has two replicas. You can connect to either running instance. To connect, view both IP addresses on the check status screen. Failover needs to be handled from the driver in the application. Check the MySQL documentation at http://dev.mysql.com/doc/ for information about how to do this.
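
As a rough sketch of what driver-level failover can look like in Python with MySQL Connector/Python (the replica IP addresses are hypothetical stand-ins for the ones on your check status screen):

    import mysql.connector
    from mysql.connector import Error

    REPLICAS = [{"host": "10.0.0.11"}, {"host": "10.0.0.12"}]  # hypothetical

    def connect_with_failover(user, password, database):
        """Try each replica in turn and return the first live connection."""
        for node in REPLICAS:
            try:
                return mysql.connector.connect(
                    host=node["host"], user=user, password=password,
                    database=database, connection_timeout=5)
            except Error:
                continue  # this replica is down; try the next one
        raise RuntimeError("no MySQL replica reachable")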

 

For Elasticsearch, you can connect to either of two clustered nodes, viewable from the check status screen. Failover is automatic if you are using the standard Elasticsearch node transport to connect from an application. Elasticsearch connects in two ways: HTTP or a node client.

HTTP -- everything, including searches, is done using standard URLs. The downside is that a URL can connect to only one address at a time. The node transport, by contrast, essentially runs a cluster proxy inside your application: you connect through the client, which knows the health of the cluster and load-balances each search to any available node.

 

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html for information about straight HTTP.
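
For example, a straight-HTTP search in Python with the requests library might look like this (the node address and index name are hypothetical):

    import requests

    ES_NODE = "http://10.0.0.21:9200"  # one node from the check status screen

    # Everything, including searches, goes through standard URLs.
    resp = requests.post(ES_NODE + "/products/_search",
                         json={"query": {"match": {"name": "widget"}}})
    resp.raise_for_status()
    for hit in resp.json()["hits"]["hits"]:
        print(hit["_source"])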

 

Node client -- instantiating a node-based client is the simplest way to get a client that can execute operations against Elasticsearch. See http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html for more information about the ES node client.

With Redis, you can connect to either your master or your slave. Currently, failover needs to be handled from the application, or you can provide your own Redis sentinel service. In the future, we plan to add Redis Sentinel to our standard Redis offering.
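
A minimal sketch of both options in Python with redis-py (the addresses and the service name are hypothetical):

    import redis
    from redis.sentinel import Sentinel

    # Direct connection to the master from the check status screen.
    master = redis.Redis(host="10.0.0.31", port=6379)
    master.set("greeting", "hello")

    # With your own Sentinel service, the driver discovers master and slave.
    sentinel = Sentinel([("10.0.0.33", 26379)], socket_timeout=1)
    master = sentinel.master_for("mymaster", socket_timeout=1)
    replica = sentinel.slave_for("mymaster", socket_timeout=1)
    print(replica.get("greeting"))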

Why use Morpheus when there’s Amazon Web Services?

Simply put, getting the same high-performance database instance that Morpheus spins up in minutes would cost a lot of time, money, effort, and expertise. Morpheus lets you manage all your databases from a simple dashboard, leaving our experts to handle the scaling/descaling, load balancing, disaster recovery, and security knob-twisting. Another benefit is that Morpheus-created instances are easy to integrate with many third-party add-ons, such as log management and more finely tuned backups.

JP Morgan Data Breach



In yet another data breach, JP Morgan lost gigabytes of customer data, including some account information. Find out the technical details of the attack that allowed it to be successful.

TL;DR: JP Morgan recently became the latest large bank to fall victim to a data breach in which customer data was stolen. In spite of the company already having some very sophisticated security measures in place, the attackers were able to get into the database by exploiting a vulnerability they discovered in the JP Morgan web site. From there, writing some custom malware allowed them to obtain gigabytes of customer data over the course of roughly two months.

Security Measures Already in Place

The bank already had a strong security system in place, with very sophisticated attack detection systems. Two months before the breach, JP Morgan announced that they would begin spending approximately $250 million per year on cybersecurity and would have roughly 1,000 people working on this part of their infrastructure.

This would seemingly be a tough structure to bypass for intruders looking to gain access to the bank’s data. Unfortunately for the bank, attackers managed to find a way to do so.

The Beginning of the Breach

The breach began in early June, when the attackers discovered a flaw in one of the JP Morgan web sites. The intruders used this flaw to begin writing custom programs that could be used to attack the bank's corporate network. The malware was tailor-made for infiltrating the JP Morgan network and digging deep into its systems.

The attackers are thought to have succeeded by finding a number of zero-day vulnerabilities: flaws unknown prior to the attack that allowed them to gain control of the systems they were after. Because the flaws were unknown, programmers also had zero time to create any patches that could be used to counter the infiltration.

Example of a zero-day attack. Source: FireEye

The Data Collection

With their custom malware in place, the attackers were able to slowly gather customer data. For more than two months, their attack programs avoided the bank's extremely sophisticated detection alarms, which were specifically designed to spot stolen data being pulled from its systems.

To help avoid detection, the malware was designed to route through computers in a number of foreign countries, most often redirecting to a site in Russia. During this two-month period, the attackers used the redirection to obtain gigabytes of customer data from the bank undetected. When JP Morgan eventually found the breach, the bank was able to quickly put an end to it using its security measures.

Example of malware detection and reaction. Source: Securosis

Securing Big Data

Trying to secure large amounts of data can be a challenging task, especially if you do not have a large and sophisticated system in place like JP Morgan. One way to help with this is to find a company that offers a database as a service on the cloud.

One such service is Morpheus, which offers numerous security features to help protect important data, including online monitoring and VPN connections to databases. In addition, all databases are backed up, archived, and replicated on an SSD-backed infrastructure automatically.

With Morpheus, you can choose from several databases, including MySQL, MongoDB, and others, plus all databases are easy to install and scale based on your needs. So, visit the Morpheus site for pricing information or to try it out with a free account!

How to Keep Coders Happy



Things move so quickly in the coder-sphere that to keep up you have to give developers what they need and then stand back.

TL;DR: Development cycles continue to shrink. Companies have to adapt their processes or risk being left in their competitors' dust. Ensuring the software developers in your organization have what they need to deliver apps that meet your needs requires giving coders the kid-glove treatment: Let them use whichever tools and methods they prefer, and give them plenty of room to operate in.

Managing software developers is easy. All you have to do is think like they think -- and stay out of their way.

That's a big part of the message of Adrian Cockcroft, who presented at Monktoberfest 2014 in Portland, Maine. Cockcroft formerly ran cloud operations at Netflix, and he is now a fellow at Battery Ventures Technology. Readwrite's Matt Asay reports on Cockcroft's presentation in an October 6, 2014, article.

You have to be fast, whatever you do. Cockcroft says it's most efficient to let developers use the tools they prefer and to work in the manner that best suits them. He recommends that companies let cloud services do the "heavy lifting" in lieu of buying and maintaining the traditional hardware, software, and app-development infrastructure.

Ultimately the organization adopts an amorphous structure based on constant, iterative development of apps comprised of loosely coupled microservices. Distinctions between development and operations blur because various parts of the apps are under construction at any time -- without preventing the apps from working properly.

A tight, iterative app-development process loses the traditional hierarchy of roles and becomes cyclical. Source: Adrian Cockcroft

Coders never stop learning new techniques, technologies

You'd be challenged to find a profession that changes faster than software development, which means coders are always on the lookout for a better way to get their work done. Code School founder and CEO Gregg Pollack claims that companies don't always do enough to encourage developers to pursue new and different interests.

In an October 16, 2014, article on TechCrunch, Pollack describes several ways organizations can encourage continuing education of their coders. One method that has met with success is pair programming, which combines work and learning because the two developers alternate between instructor and student roles, according to Pollack.

Considering the sheer volume of software development innovations, it's impossible to be up on all the latest and greatest. In an October 18, 2014, article, TechCrunch's Jon Evans describes a condition he calls "developaralysis." This might cause a coder who is fluent in only eight programming languages to feel insecure. And for each language there are an untold number of frameworks, toolkits, and libraries to master.

One solution to developaralysis is the Morpheus database-as-a-service (DBaaS), which is unique in supporting SQL, NoSQL, and in-memory databases. Morpheus lets you provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and ElasticSearch. Access to key statistics across all databases is available via a single console.

Morpheus offers free daily backups and replication of all database instances, and VPN connections protect your data from the public Internet. Visit the Morpheus site for pricing information or to create a free account.

Developers walk the thin line between relying on what they know and searching out new, better approaches -- ones that their competitors might use to run rings around them. As Readwrite's Asay points out, developers are ultimately creators, so managers need to allow coders' creativity to flow -- often simply by making themselves scarce.

How eBay Solves the Database Scaling Problem with MongoDB



When you have a vast amount of data, scaling your database can be very difficult. See how eBay solved the potential problems involved in providing users with search suggestions by using MongoDB.

TL;DR: eBay uses MongoDB to perform a number of tasks involving large amounts of data. Such projects include search suggestions, cloud management, storage of metadata, and the categorization of merchandise. The search suggestions are a key feature of their web site, and MongoDB provides them with a way to provide these suggestions to users quickly.

What are search suggestions?

eBay's search suggestions at work. Source: AuctionBytes Blog

When you begin typing a query into eBay's search box, a list of suggested completed queries appears underneath the box. If one of these suggestions matches what you planned to type, you can immediately select it with the mouse or your arrow keys rather than having to type out the remainder of your search query.

This is a great feature for users: not only may it complete the intended query, it can also bring up a similar query the user may prefer over the original. The suggestions feature provides the user with a convenient and helpful way of searching for particular items of interest.

What has to be done

Providing such assistance requires storing a large number of possible suggestions, and these must be returned extremely quickly to be even remotely useful. eBay determined that any query to the database to return suggestions must make the round trip in less than 60-70 milliseconds!

This could be very challenging with a traditional relational database. eBay instead decided to try out a document store, MongoDB, to see if they could achieve the needed performance.

How eBay implemented Mongo

eBay made each search suggestion list a MongoDB document, indexed by word prefix and also by certain pieces of metadata, such as product category. The multiple indexes provided flexibility in looking up suggestions and also kept the queries speedy.

eBay was able to use a single replica set, which made sharding unnecessary. In addition, the data was kept in memory, which again provided a speed boost for the queries.
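
A simplified sketch of this design in Python with pymongo (the field names and data are invented for illustration, not taken from eBay):

    from pymongo import MongoClient, ASCENDING

    suggestions = MongoClient()["search"]["suggestions"]

    # Compound index on word prefix plus category metadata.
    suggestions.create_index([("prefix", ASCENDING),
                              ("category", ASCENDING)])

    # One document stores the whole suggestion list for a prefix.
    suggestions.insert_one({"prefix": "ipho", "category": "phones",
                            "completions": ["iphone 5", "iphone 5 case",
                                            "iphone charger"]})

    # Lookup is a single indexed read -- no joins, no scans.
    doc = suggestions.find_one({"prefix": "ipho", "category": "phones"})
    print(doc["completions"])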

Database sharding visualized. Source: Cubrid Shard

Mongo’s Performance

With all this in place, could the queries to the database still return suggestions to the user in the allotted time (less than 60-70 milliseconds)? As it turned out, MongoDB was able to make the round trip in less than 1.4 milliseconds!

Given this incredible performance, eBay was able to safely rely on MongoDB to provide speedy search suggestions to its users.

Could your business do the same?

If your business needs to query a large amount of data quickly, MongoDB may be a good choice for you. One way to easily get MongoDB working for you quickly is to use a provider that offers the database as a service.

Morpheus provides MongoDB (and several other popular databases) as a service, with easy setup and maintenance. The service is easily scalable, allowing you to add or remove space as your needs change. Additional services include online monitoring, VPN connections to databases, and excellent support.

All databases are backed up, replicated, and archived automatically on an SSD-backed infrastructure, ensuring you do not lose any of your important data. So, try out Morpheus today and get your data into a fast, secure, scalable database!

How to Handle Huge Database Tables



Design your huge database tables to ensure they can handle queries without slowing the database's performance to a crawl.

TL;DR: Get a jump on query optimization in your databases by designing tables with speed in mind. This entails choosing the best data types for table fields, choosing the correct fields to index, and knowing when and how to split your tables. It also helps to be able to distinguish table partitioning from sharding.

It's a problem as old as databases themselves: large tables slow query performance. Out of this relatively straightforward problem has sprung an industry of indexing, tuning, and optimizing methodologies. The big question is, Which approach is best for your database system?

For MySQL databases in particular, query performance starts with the design of the table itself. Justin Ellingwood explains the basics of query optimization in MySQL and MariaDB in a Digital Ocean article from November 11, 2013, and updated on May 30, 2014.

For example, data elements that will be updated frequently should be in their own table to prevent the query cache from being dumped and rebuilt repeatedly. Generally speaking, the smaller the table, the faster the updates.

Similarly, by limiting data sizes up front you avoid wasted storage space, such as by using the "enum" type rather than "varchar" when a field that takes string values has a limited number of valid entries.
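
As a sketch, the choice looks like this in MySQL DDL, run here through MySQL Connector/Python (the table and credentials are hypothetical):

    import mysql.connector

    conn = mysql.connector.connect(user="app", password="secret",
                                   host="localhost", database="shop")
    cur = conn.cursor()

    # ENUM restricts the column to a fixed set of valid entries and stores
    # them compactly; an open-ended VARCHAR would waste space here.
    cur.execute("""
        CREATE TABLE orders (
            id     INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
            status ENUM('pending', 'shipped', 'delivered') NOT NULL,
            note   VARCHAR(255)
        )""")
    conn.commit()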

There's more than one way to 'split' a table

Generally speaking, the bigger the database table, the longer it takes to access and modify. Unfortunately, database performance optimization isn't as simple as dividing big tables into several smaller ones. Michael Tocker describes 10 ways to improve the speed of large MySQL tables in an October 24, 2013, post on his Master MySQL blog.

One of the 10 methods is to use partitioning to reduce the size of indexes by creating several "tables" out of one. This minimizes index lock contention. Tocker also recommends using InnoDB rather than MyISAM, even though MyISAM can be faster at inserts to the end of a table. MyISAM's table locking restricts updates and deletes, and its use of a single lock to protect the key buffer when loading or removing data from disk causes contention.

Much confusion surrounds the concept of database table partitioning, particularly how partitioning is distinguished from sharding. When the question was posed on Quora, Mosaic CTO Tony Bako explained that partitioning divides logical data elements into multiple entities to improve performance, availability, and maintainability.

Conversely, sharding is a form of horizontal partitioning that creates replicas of the schema and then divides the data stored in each shard by the shard key. This requires that DBAs distribute load and space evenly across shards based on data-access patterns and space considerations.

Sharding uses horizontal partitioning to store data in physically separate databases; here a user table is sharded by values in the "s_age" field. Source: CUBRID
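
By contrast, range partitioning keeps one logical table whose storage is split into pieces. A hypothetical MySQL sketch, borrowing the "s_age" column from the figure:

    # One logical table, several physical partitions; queries that filter
    # on s_age touch only the relevant partition.
    CREATE_PARTITIONED_USERS = """
        CREATE TABLE users (
            id    INT UNSIGNED NOT NULL,
            s_age INT NOT NULL
        )
        PARTITION BY RANGE (s_age) (
            PARTITION p0 VALUES LESS THAN (30),
            PARTITION p1 VALUES LESS THAN (60),
            PARTITION p2 VALUES LESS THAN MAXVALUE
        )"""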

With the Morpheus database-as-a-service (DBaaS) you can monitor your MySQL, MongoDB, Redis, and ElasticSearch databases via a single dashboard. Morpheus lets you bring up a new instance of any SQL, NoSQL, or in-memory database with a single click. Automatic daily backups and free live replica sets for each provisioned database ensure that your data is secure.

In addition, database performance is optimized via Morpheus's SSD-backed infrastructure and direct patching into EC2 for ultra-low latency. Visit the Morpheus site for pricing information or to create a free account.


Merge Databases with Different Schema and Duplicate Entries



Removing duplicate entries from merged database tables can be anything but routine -- and the source of performance woes.

TL;DR: Combining tables frequently results in duplicate entries that can be removed in several ways. The trick is knowing which way is best for a given situation. Often the only way to determine the best approach is by testing several and comparing their effect on database performance.

It is one of the most common operations in database management: Merge two tables that use different schema while also removing duplicate entries. Yet there are as many approaches to this problem as there are types of database tables. There are also as many potential glitches.

Here's a look at three ways to address the situation in SQL and MySQL.

All the news that's fit to merge

Combining multiple tables with similar values often creates duplicate entries. Several methods are available for eliminating duplicate values in SQL, but it can be tricky to determine which is best in a given situation.

In a StackOverflow post from October 2012, a number of approaches were proposed for removing duplicates from joined tables. The first was to convert an inner query to a common table expression (CTE):

A common table expression for an inner join often has a lower impact on performance than using the DISTINCT keyword to eliminate duplicates. Source: StackOverflow

The second approach was to use the DISTINCT keyword, which one poster claims performs better in some cases. Also suggested were the string_agg function and the GROUP BY clause.
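
The queries from the thread aren't reproduced here, but a rough sketch of the two approaches against hypothetical tables (the CTE form requires a database with window-function support, such as SQL Server or PostgreSQL) might read:

    # Option 1: a CTE with ROW_NUMBER keeps one row per duplicate key.
    DEDUPE_WITH_CTE = """
        WITH ranked AS (
            SELECT o.id, c.name, o.order_date,
                   ROW_NUMBER() OVER (PARTITION BY c.name, o.order_date
                                      ORDER BY o.id) AS rn
            FROM orders o
            JOIN customers c ON c.id = o.customer_id
        )
        SELECT id, name, order_date FROM ranked WHERE rn = 1"""

    # Option 2: DISTINCT collapses fully identical result rows.
    DEDUPE_WITH_DISTINCT = """
        SELECT DISTINCT c.name, o.order_date
        FROM orders o
        JOIN customers c ON c.id = o.customer_id"""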

Getting up close and personal with the UNION clause

One of the basic elements in the SQL toolbox is the UNION operator, which checks for duplicates and returns only distinct rows, letting you combine the data from both tables without duplication:

Insert rows from a second table when their values don't match those of the joined table, or create a new table that doesn't affect either of the original tables. Source: StackOverflow

Alternatively, you can use the SELECT INTO command to create a new table from the contents of two separate tables in a way that removes duplicates:

The SELECT INTO command creates a new, duplicate-free table from the content of two others without altering the original tables. Source: StackOverflow
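
A sketch of both techniques against hypothetical tables (SELECT INTO is SQL Server syntax; MySQL would use CREATE TABLE ... AS SELECT instead):

    # UNION (without ALL) returns only distinct rows from both tables.
    MERGE_WITH_UNION = """
        SELECT name, email FROM customers_2013
        UNION
        SELECT name, email FROM customers_2014"""

    # SELECT INTO materializes the deduplicated result as a new table,
    # leaving both source tables untouched.
    MERGE_INTO_NEW_TABLE = """
        SELECT name, email
        INTO customers_merged
        FROM (SELECT name, email FROM customers_2013
              UNION
              SELECT name, email FROM customers_2014) AS u"""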

Combining multiple gigabyte-size tables without a performance hit

It isn't unusual for database tables to become massive. Imagine merging a dozen tables holding a total of nearly 10 million separate records and more than 3GB of data. The first suggestion on StackOverflow was to create a new table with a unique constraint on the set of columns that establish a row's uniqueness, use INSERT IGNORE INTO ... SELECT FROM to move rows from the old table to the new one, and finally truncate the old table and use INSERT INTO ... SELECT FROM to return the deduplicated rows to the original table.
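
A sketch of that recipe with hypothetical table and column names:

    # The unique key defines what "duplicate" means; INSERT IGNORE then
    # silently drops any row that violates it.
    DEDUPE_STEPS = [
        "CREATE TABLE events_clean LIKE events",
        """ALTER TABLE events_clean
           ADD UNIQUE KEY uq_event (user_id, event_time, event_type)""",
        "INSERT IGNORE INTO events_clean SELECT * FROM events",
        "TRUNCATE TABLE events",
        "INSERT INTO events SELECT * FROM events_clean",
    ]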

Another proposed solution was to create a specific view that combines the results of the 12 tables, and then to filter the results by querying on the view you just created.

The Morpheus database-as-a-service (DBaaS) makes analyzing and optimizing databases much more efficient. Morpheus lets you provision, host, and deploy MySQL, MongoDB, Redis, and ElasticSearch databases via a single dashboard. It is the only DBaaS to support SQL, NoSQL, and in-memory databases.

In addition to automatic daily backups, each database instance is deployed with a full replica set for failover and fault tolerance. Morpheus's solid-state disk infrastructure, direct patches into EC2, and colocation with fast peering points ensure peak database performance.

Visit the Morpheus site for pricing information and to create a free account. Few things in the database universe are as straightforward as Morpheus's DBaaS.

Container-based Development: Separating Hype from Substance



Big-name development-tool vendors are rushing to support the container model for efficient, VM-based app creation.

TL;DR: Google, Microsoft, and other software giants have taken aim at the burgeoning market for container-focused application development by supporting the technology on their platforms. However, the true potential of the technology may be realized only by embracing the revolutionary approach to container development proposed by startups such as Terminal.

Applications are becoming atomized. No longer are they composed of lines and lines of code painstakingly written, compiled, debugged, tested, deployed, and updated. Now programs are pieced together using microservices like Lego blocks to build unique software creations directly in your browser.

What began about a year ago with the landscape-changing introduction of the Docker open-source platform for creating, deploying, and running distributed applications has quickly become a mainstream movement involving the biggest names in the software industry.

The most recent arrival on the container scene is the Kubernetes-based Google Container Engine, which the company announced in a November 4, 2014, blog post. As InfoWorld's Serdar Yegulalp explains in a November 5, 2014, article, Container Engine groups virtual-machine instances into nodes that are themselves combined into clusters. The clusters run "Dockerized" versions of the apps at scale, performing all load balancing and handling communication between containers.

The introduction of Google Container Engine comes on the heels of last month's announcement by Microsoft that the next version of Windows Server will support Docker containers, as ZDNet's Mary Jo Foley reported in an October 15, 2014, article. Docker CTO Solomon Hykes sees Microsoft's support for the technology as "a strong message to the IT community" that Docker is "a mainstream part of the IT toolbox." Hykes was interviewed by ZDNet's Toby Wolpe for an October 28, 2014, story.

Native support for the Docker client in Microsoft's Windows Server 2015 allows the same standard Docker client and interface to be used on multiple development environments. Source: Microsoft

Containers' impact on app dev more revolutionary than evolutionary

It's no surprise that Google, Microsoft, and other established development-tool vendors would attempt to integrate the mini-VM container approach with their existing platforms. However, the long-term impact of the technology may be in how it turns the existing development model on its head by creating a private server for each app from a thin slice of the cloud.

That's how Terminal co-founders Joseph Perna and Varun Ganapathi explain the approach their new company is taking to container-based app development. Ganapathi is quoted by TechCrunch's Kim-Mai Cutler in an October 24, 2014, post as predicting that everyone will have a cloud computer on which they run and interoperate apps securely. Because you can run multiple processes simultaneously from within a container, you can quickly stitch together dozens of virtual machines to create an app that scales instantly, without requiring a reboot.

Scalability is both one of the greatest benefits of container-based development, and one of the technology's greatest challenges. According to NetworkWorld's Pete Bartolik, the problem is that CIOs continue to overpay for capacity. In an October 16, 2014, article, Bartolik cites a recent study by Computerworld UK that found companies are using only half of the cloud server capacity they have provisioned. It seems IT managers have gotten into the habit of over-provisioning their own servers to accommodate surges in demand.

Of course, this cancels out one of the principal benefits of cloud computing: the cost savings realized by auto-scaling and elasticity. Containers are perceived as a means of instituting a true usage-based model. Yet you needn't wait for the container wave to take advantage of the efficiency cloud services deliver. The Morpheus database-as-a-service (DBaaS) ensures that you're paying for only the resources you're using.

Morpheus's straightforward interface lets you provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch from a single dashboard. Your databases are backed up automatically and deployed with a free full replica set for fault tolerance and failover. Visit the Morpheus site for pricing information and to sign up for a free account.

How (and Why) to Make Read-Only Versions of Your SQL and MySQL Databases


TL;DR: Just a little bit of pre-lock planning ensures that a SQL or MySQL database you convert to read-only status performs as expected and is accessible by the right group of users. Doing so also helps guarantee the database can be safely unlocked when and if it should ever need to be updated or otherwise altered.

There's something about setting a database to read-only that is comforting for DBAs. It's almost as if the database is all grown up and ready to be kicked out of the house, er, I mean, sent out to make its own way in the world.

Of course, there are as many reasons to set a database to read-only -- temporarily or permanently -- as there are databases. Here's a rundown on the ways to lock the content of a SQL or MySQL database while allowing users to access its contents.

As Atif Shehzad explains on the MSSQLTips site, before you lock the database, you have to optimize it to ensure it's running at peak performance. You can't update the statistics of a read-only database, for example, nor can you create or defragment indexes. Also, you can't add extended properties to the database's objects, edit its permissions, or add/remove users.

Shehzad provides an eight-step pre-lock script to run through prior to converting a database to read-only. The checklist covers everything from creating a transaction log backup to modifying permissions and updating statistics.

An eight-step pre-lock checklist ensures your database is optimized and backed up prior to being switched to read-only. Source: MSSQLTips

Once the database is optimized and backed up, use either the ALTER DATABASE [database name] SET READ_ONLY command or the system stored procedure sp_dboption (the former is recommended because the stored procedure has been removed from recent versions of SQL Server). Alternatively, you can right-click the database in SSMS, choose Properties > Options, and set the Database Read-Only state to True. The database icon and name will change in SSMS to indicate its read-only state.

Converting a MySQL database to read-only -- and back again

A primary reason for setting a MySQL database as read-only is to ensure no updates are lost during a backup. The MySQL Documentation Library provides instructions for backing up master and slave servers in a replication setup via a global read lock and manipulation of the read_only system variable.

The instructions assume a replication setup with a master server (M1), slave server (S1), and clients (C1 connected to M1, and C2 connected to S1). The statements that put the master in a read-only state and that restore it to a normal operational state once the backup is complete are shown below. (Note that in some versions, "ON" becomes "1" and "OFF" becomes "0".)

The first statements switch the database to read-only, and the second revert it to its normal state after completing the backup. Source: MySQL Documentation Library
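
Per the MySQL documentation, the sequence issued on the master looks like the following; here it is wrapped in a Python maintenance script (connection details are hypothetical):

    import mysql.connector

    master = mysql.connector.connect(host="m1.example.com", user="admin",
                                     password="secret")
    cur = master.cursor()

    # Block writes and flag the server read-only before the backup.
    cur.execute("FLUSH TABLES WITH READ LOCK")
    cur.execute("SET GLOBAL read_only = ON")

    # ... perform the backup of M1 here ...

    # Restore normal operation once the backup completes.
    cur.execute("SET GLOBAL read_only = OFF")
    cur.execute("UNLOCK TABLES")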

In its read-only state, the database can be queried but not updated. An August 23, 2013, post on StackOverflow explains how to revoke and then reinstate DML privileges for specific users, which is less likely to affect the performance of the entire database.

The Morpheus database as a service (DBaaS) lets you make these and other changes to your database as simply as pointing and clicking. Morpheus's single dashboard can be used to provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. It is the first and only DBaaS that supports SQL, NoSQL, and in-memory databases.

In addition to automatic daily backups, Morpheus provides a free live replica set for each database instance. Developers can use their choice of tools for connecting, configuring, and managing their databases. Visit the Morpheus site to create a free account. Think of all you can accomplish in the time you'll save when you no longer have to worry about backups!

Morpheus Lessons: Best Practices for Upgrading MySQL


TL;DR: Thinking about upgrading your MySQL database? When performing an upgrade, there are some factors you need to consider and some best practices that can be followed to help ensure the process goes as smoothly as possible. You will need to consider if an upgrade is necessary, whether it is a minor or major upgrade, and changes to query syntax, results, and performance.

Do You Need to Upgrade?

The need to upgrade is based on the risk versus the reward. Any upgrade carries with it the risk of losing functionality (breaks something) or data (catastrophic loss). With that in mind, you may be running into bugs that are resolved in a later release, performance problems, or growing concerns about the security of the database as the current version continues to age. Any of these factors could cause an upgrade to be necessary, so you will need to follow some best practices to help mitigate as much risk as possible.

An example MySQL setup. Source: Programming Notes

Will the Upgrade be Minor or Major?

A minor upgrade is typically one where there is a small change in the third release number. For example, upgrading version 5.1.22 to 5.1.25 would be considered a minor upgrade. As long as the difference is relatively small, the risk to upgrade will be relatively low.

A major upgrade, on the other hand, involves a change in the second or the first number. For example, upgrading version 5.1.22 to 5.3.1 or 4.1.3 to 5.1.0 would usually be considered a major upgrade. In such cases, the risk becomes higher because more changes to the system have been implemented.

Consider the Changes

Before upgrading, it is best to examine the changes that have been made between the two versions. Changes to query syntax or the results of queries can cause your application to have erroneous data, errors, or even stop working. It is important to know what changes will need to be made in your queries to ensure that your system continues to function after the upgrade takes place.

Also, an upgrade could cause either increased or decreased performance, depending on what has changed and the system on which MySQL is running. If the upgrade could decrease performance, you will certainly want to consider whether this is the right time to upgrade.

Performance on a single thread comparison. Source: PERCONA

Performing the Upgrade

Typically, the best practice when upgrading is to follow this procedure:

  • Dump your user grant data
  • Dump your regular data
  • Restore your regular data in the new version
  • Restore your user grant data in the new version

Doing this, you significantly reduce your risk of losing data, since you will have backup dump files. In addition, since you are using the MySQL dump and restore, the restore process will use the format of the new MySQL version, which helps mitigate compatibility issues.
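
A sketch of those four steps driven from Python with the standard mysqldump and mysql clients (host names and database names are hypothetical; -p prompts for the password interactively):

    import subprocess

    # 1-2: dump the grant tables (the mysql schema) and the regular data.
    subprocess.run(["mysqldump", "-u", "root", "-p", "mysql",
                    "--result-file=grants.sql"], check=True)
    subprocess.run(["mysqldump", "-u", "root", "-p", "--databases", "appdb",
                    "--result-file=appdata.sql"], check=True)

    # 3-4: restore both dumps into the new server, regular data first.
    for dump in ("appdata.sql", "grants.sql"):
        with open(dump) as f:
            subprocess.run(["mysql", "-h", "new-db.example.com",
                            "-u", "root", "-p"], stdin=f, check=True)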

Easy Upgrades

If you want to upgrade even more easily, consider using a database as a service in the cloud. Such services make it easy to provision, replicate and archive your database, and make upgrading easier via the use of available tools.

One such service is Morpheus, which offers not only MySQL, but also lets you use MongoDB, ElasticSearch, or Redis. In addition, all databases are deployed on a high performance infrastructure with Solid State Drives and are automatically backed up, replicated, and archived. So, take a look at pricing information or open a free account today to begin taking advantage of this service!

Password Encryption: Keeping Hackers from Obtaining Passwords in Your Database


TL;DR: When dealing with a user password, you want to be very careful in how this information is saved. Passwords stored in plain text within your database are a serious security risk both to you and your users, especially if your business is working with any of your users' financial or personal information. To keep from saving passwords in plain text, you can encrypt them using a salt and a hashing algorithm.

Plain Text Password Problems

While storing plain-text passwords can be handy when making prototypes and testing various systems, they can be disastrous when used in a production database. If an attacker somehow gains access to the database and its records, the hacker now can instantly make use of every user account. The reason: the passwords are all right there in plain text for the taking!

Back in 2006, the discussion-forum web site Reddit had a backup copy of its database stolen. Unfortunately, all of the passwords were stored in plain text. The person who had the data could easily have taken over any of the accounts stored in the backup database by making use of the available user names and passwords.

This may not seem like a major problem for a discussion forum. If the administrator and moderator passwords were changed quickly, the intruder likely would only be able to post spam or other types of messages the user would not normally write. However, these same users may have used the same login information for other tasks, such as online banking or credit card accounts. This would indeed be a problem for the user once a hacker had access to such an account!

Plain text passwords are not a game, they are a security risk! Source: MacTrast

Salting and Hashing a Password

To avoid having plain-text passwords in your database, you need to store a value that has been altered in a way that will be very difficult to crack. The first step is to add a salt, which is a random string that is added to the password. This value can be either prepended or appended to the password, and should be long in order to provide the best security.

After the password is salted, it should then be hashed. Hashing will take the salted password and turn it into a string of characters that can be placed into the database instead of the plain-text password. There are a number of hashing algorithms, such as SHA256, SHA512, and more.
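
A minimal sketch in Python using only the standard library (the parameter choices are illustrative; in production, a deliberately slow algorithm such as bcrypt or PBKDF2 is preferable to a single SHA256 pass):

    import hashlib
    import secrets

    def hash_password(password):
        """Return a (salt, digest) pair to store instead of the plain text."""
        salt = secrets.token_hex(32)  # long random salt, unique per user
        digest = hashlib.sha256((salt + password).encode()).hexdigest()
        return salt, digest

    def verify_password(password, salt, stored_digest):
        candidate = hashlib.sha256((salt + password).encode()).hexdigest()
        return secrets.compare_digest(candidate, stored_digest)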

While implementing salted password hashing can be more time consuming, it could save your users from having their passwords exposed or stolen. It is definitely a good idea to do this as a safeguard for the people using your services.

An example of password creation and verification with salting and hashing in place. Source: PacketLife

Further Protection

Another way to help protect your users is to make sure the database itself is secure. Keeping the database on site may be difficult for your business, but there are companies that offer databases as a service in the cloud.

One such company is Morpheus, which includes VPN connections to databases and online monitoring to help keep your database secure. In addition, databases are backed up, replicated, and archived automatically on an SSD-backed infrastructure. So, give Morpheus a try and get a secure, reliable database for your business!

Quick and Simple Ways to Migrate Password Hashes Without Bugging Users


TL;DR: In the past, it was standard practice to require that users follow a link in an email to migrate their password to a new system. Now administrators are hesitant to take up any more of their users' workday than necessary. Fortunately, automating the password-hash migration process is relatively easy to accomplish in all modern development environments.

DBAs generally don't like requiring that users reset their passwords. But sometimes a security upgrade or other major system change entails migration of hashed passwords.

When this happens, many admins don't hesitate to require all users to re-register on the new system via a link sent to them via email. That's the approach recommended in a four-year-old post on Webmasters Stack Exchange.

Today DBAs have many options for migrating password hashes without requiring any effort by users apart from logging in. The transparent approach relies on retaining the old password in the new system just long enough for each user to verify it when they sign into the new system for the first time.

A simple example of this password-migration approach is described in a Stack Overflow post from January 2013:

  1. Verify the password with the new hash algorithm;
  2. If the password doesn't match, compare it with the old hash algorithm;
  3. If the password matches the old hash, calculate and store the new hash for the original password.
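
A minimal sketch of that flow in Python; the verify_new_hash, verify_old_hash, and compute_new_hash helpers are hypothetical stand-ins for whatever your old and new systems actually use:

    def check_login(user, password):
        """Verify against the new hash, falling back to the old one once."""
        if verify_new_hash(password, user.password_hash):
            return True
        if user.legacy_hash and verify_old_hash(password, user.legacy_hash):
            # Transparent upgrade: store the new hash, clear the old one.
            user.password_hash = compute_new_hash(password)
            user.legacy_hash = None
            user.save()
            return True
        return False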

Variations on the hash-migration theme for Linux, PHP, others

Password migration is an important concern when considering whether to adopt a new platform. For example, supporters of the Discourse open source discussion system have devised a password-migration plug-in that stores the original password hash in a custom field.

The first time the user signs in with what the new system considers an incorrect password, the original hash method is used to calculate and compare the hash with the stored value. When there's a match, the "new" password is set automatically and the original hash is cleared.

Password migration is presented as a three-step process in a September 17, 2014, post on the Ole Aass site. First, create a table called users_migrate that has three columns: id, username, and password. Next, execute a query on the server that copies the id, username, and password data from the original user tables into the new table.

Run a query on the server that copies the user values from the original tables to the new password-migration table. Source: Ole Aass

Of course, it's also possible to overthink the problem. In a post from February 2013 on Stack Exchange's Super User site, someone pointed out that if there aren't that many users, it might be fastest to copy the hashes to the new system manually, one-by-one. Someone else recommended the chpasswd tool, and a third person suggested using lastlog to generate a list of users and then grep:

To migrate password hashes in Linux, generate a list of users with lastlog, and then grep them in /etc/shadow. Source: Super User

An even simpler approach to password management and other database-migration tasks is to take advantage of the simplicity and efficiency of the Morpheus database-as-a-service (DBaaS). Morpheus lets you provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch databases from a single dashboard.

With the Morpheus DBaaS, you can invoke a new instance of any SQL, NoSQL, or in-memory database in seconds, and each instance is deployed with a free full replica set, in addition to automatic daily backups. Visit the Morpheus site to create a free account.

The Importance of Schema Design in 'Schema-less' MongoDB


TL;DR: A common misconception about the document-based MongoDB NoSQL database is that it requires no schema at all. In fact, the first step in designing in MongoDB is selecting a schema that matches the database users' needs. Choosing the right schema allows you to take full advantage of the system's performance, efficiency, and scalability benefits.

Most of the people designing MongoDB databases come from a relational-database background. Transferring from a world of tables, joins, and normalization to the document-based approach of MongoDB and other NoSQL databases can be liberating and daunting at the same time.

MongoDB is designed for speed: You can embed all sorts of data types and structures in collections of documents that are easy to query. To realize the performance potential of MongoDB, you have to design collections to match the app's most common access patterns.

In an August 2013 post on Percona's MySQL Performance Blog, Stephane Combaudon uses the example of a simple passport database. In MySQL, you would typically create a "people" table with "id" and "name" columns, and a "passport" table with "id", "people_id", "country", and "valid_until" columns. Then you would use joins between the tables to run queries.

A basic passport database in MySQL might use joins between two separate tables to query the database. Source: MySQL Performance Blog
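The figure isn't reproduced here; a sketch of the two tables and a typical join, with column types assumed, might look like this:

  CREATE TABLE people (
    id INT NOT NULL PRIMARY KEY,
    name VARCHAR(100)
  );

  CREATE TABLE passport (
    id INT NOT NULL PRIMARY KEY,
    people_id INT,
    country VARCHAR(100),
    valid_until DATE
  );

  SELECT pe.name, pa.country, pa.valid_until
  FROM people pe
  JOIN passport pa ON pa.people_id = pe.id;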

In contrast, a MongoDB database for the same purpose could use a single collection to store all the passport information, but this makes it difficult to determine which attributes are associated with which objects.

The same passport database in MongoDB could place all data elements in a single collection. Source: MySQL Performance Blog
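In mongo-shell terms, such a single flat collection might hold documents like these (the collection and field names are assumptions):

  db.passports.insert({ "people_id": 1, "name": "Sally", "country": "US", "valid_until": ISODate("2020-01-01") })
  db.passports.insert({ "people_id": 2, "name": "John", "country": "FR", "valid_until": ISODate("2021-06-30") })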

Alternatively, you could embed the passport information inside the people information, or vice-versa, although this could be a problem if some people don't have passports, such as "Cinderella" in the example below.

A MongoDB passport database could embed the people information inside the passport information, though this design likely doesn't optimize performance. Source: MySQL Performance Blog
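Sketching the other direction of embedding (field names again assumed) shows the problem: a person with no passport is simply a document with nothing to embed:

  db.people.insert({
    "name": "Sally",
    "passport": { "country": "US", "valid_until": ISODate("2020-01-01") }
  })
  db.people.insert({ "name": "Cinderella" })  // no passport to embed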

In this example, you're much more likely to access people information than passport information, so keeping two separate collections makes sense because it keeps less data in memory. When you need the passport data, the app can perform the equivalent of a join across the two collections.

The dangers of attempting 1:1 conversions of relational DBs

Many of the skills you learned in developing relational databases transfer smoothly to MongoDB's document-based model, but the principal exception is schema design, as InfoWorld's Andrew C. Oliver explains in a January 14, 2014, article. If you attempt a 1:1 port of an RDBMS schema to MongoDB, you're almost certain to run into performance problems.

Oliver points out that most of the complaints about MongoDB are by people whose choice of schema was all wrong for a document-focused database. A 1:1 table-to-document port is prone to cause missed joins, lost atomicity (although you can have atomic writes within a single MongoDB document), more required operations, and a failure to realize the performance benefits of parallelism.

Because MongoDB doesn't enforce a schema on a document or collection the way pre-defined schemas are required in RDBMSs, your databases are theoretically easier to develop and modify. In practice, things don't always work out this way. Among the MongoDB gotchas examined by Russell Smith in a Rainforest Blog post from November 2012 and updated on July 29, 2014, is failure to give schema design the attention it deserves.

Of course, MongoDB databases don't exist in isolation. Services such as the Morpheus database-as-a-service (DBaaS) are geared to meet the real-world needs of organizations that rely on a mix of SQL, NoSQL, and in-memory databases. In fact, Morpheus is the first and only DBaaS that lets you provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases.

With Morpheus, you can bring up an instance of any database, monitor it, and optimize its performance in just seconds via a single dashboard. And all database instances include a free full replica set. Visit the Morpheus site to sign up for a free account.


Making Software Development Simpler: Look for Repeatable Results, Reusable APIs and DBaaS


TL;DR: Software is complex -- to design, develop, deliver, and maintain. Everybody knows that, right? New app-development approaches and fundamental changes to the way businesses of all types operate are challenging the belief that meeting customers' software needs requires an army of specialists working in a tightly managed hierarchy. Focusing on repeatable results and reusable APIs helps take the complexity out of the development process.

What's holding up software development? Seven out of 10 software development teams have workers in different locations, and four out of five are bogged down by having to accommodate legacy systems. Life does not need to be like this. The rapidly expanding capabilities of cloud-based technologies and external services (like our very own Database-As-A-Service) allow developers to focus more time on application development. The end result: better software products. 

The results of the 2014 State of the IT Union survey are presented in a September 9, 2014, article in Dr. Dobb's Journal. Among the findings: 58 percent of development teams consist of 10 or fewer people, while 36 percent work in groups of 11 to 50 developers. In addition, 70 percent of the teams have members working in different geographic locations, though that figure drops to 61 percent for agile development teams.

A primary contributor to the complexity of software development projects is the need to accommodate legacy software and data sources: 83 percent of the survey respondents reported having to deal with "technical debt" (obsolete hardware and software), which increases risk and development time. Software's inherent complexity is exacerbated by the realities of the modern organization: teams working continents apart, dealing with a tangled web of regulations and requirements, while adapting to new technologies that are turning development on its head.

The survey indicates that agile development projects are more likely to succeed because they focus on repeatable results rather than processes. It also highlights the importance of flexibility in managing software projects, each of which is as unique as the product it delivers.

Successful agile development requires discipline

Organizations are realizing the benefits of agile development, but often in piecemeal fashion as they are forced to accommodate legacy systems. There's more to agile development than new tools and processes, however. As Ben Linders points out in an October 16, 2014, article on InfoQ, the key to success for agile teams is discipline.

A common misconception is that agile development operates without a set methodology. In fact, it is even more important to adhere to the framework the team has selected -- whether Scrum, Kanban, Extreme Programming (XP), Lean, Agile Modeling, or another -- than it is when using traditional waterfall development techniques.

The keys to successfully managing an agile development team have little to do with technology and much to do with communication. Source: CodeProject

Focusing on APIs helps future-proof your apps

Imagine building the connections to your software before you build the software itself. That's the API-first approach some companies are taking in developing their products. Tech Target's Crystal Bedell describes the API-first approach to software development in an October 2014 article.

Bedell quotes Jeff Kaplan, a market analyst for ThinkStrategies, who sees APIs as the foundation for interoperability. In fact, your app's ability to integrate with the universe of platforms and environments is the source of much of its value.

Another benefit of an API-centric development strategy is the separation of all the functional components of the app, according to Progress Software's Matt Robinson. As new standards arise, you can reuse the components and adapt them to specific external services.

The Morpheus database-as-a-service also future-proofs your apps by being the first service to support SQL, NoSQL, and in-memory databases. You can provision, host, and deploy MySQL, MongoDB, Redis, and ElasticSearch using a simple, single-click interface. Visit the Morpheus site now to create a free account.

How to Use the $type Query Operator and Array in MongoDB


TL;DR: MongoDB provides a $type operator that can be helpful when you need to select documents where the value of the fields is of a certain type. This can be really helpful, but when it comes to selecting array values, things can get a bit tricky. It is best to use some form of workaround to get the selection you need in such cases.

What is the $type Operator?

The $type operator allows you to make your query selection more specific by selecting only documents that have a field containing a value that is a certain data type. Suppose you had a collection like this:

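The screenshot isn't reproduced here; a collection along the following lines fits the discussion below (the collection name and values are assumptions):

  > db.example.find()
  { "_id" : 1, "name" : { "first" : "Sally", "last" : "Jones" } }
  { "_id" : 2, "name" : "George" }
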
An example MongoDB collection.

If you want to get only the documents where the "name" field contains a string value, you could use the following query:

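Reconstructed from the caption (the collection name is assumed):

  db.example.find({ "name": { $type: 2 } })
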
Using the $type operator to check for a string.

Since 2 is the code for a string data type, the query will only return the second document: the one with the string "George" in the "name" field. Since the first document has an object type, it won’t be returned.

MongoDB offers codes for a number of data types, as shown in the chart below:

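The chart isn't reproduced here, but the codes most often used with $type in that era of the MongoDB documentation include: Double = 1, String = 2, Object = 3, Array = 4, Binary data = 5, ObjectId = 7, Boolean = 8, Date = 9, Null = 10, Regular expression = 11, JavaScript = 13, 32-bit integer = 16, Timestamp = 17, and 64-bit integer = 18.
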
The types that can be used by the $type operator. Source: MongoDB Documentation

Oddities with Arrays

While the $type operator generally works as expected, determining whether a field contains an array can be a bit tricky. This is because when you are dealing with an array field, the $type operator checks the type against the array elements themselves instead of the field. This means that for an array field, it will only return documents where the array contains another array.

This is helpful if you are looking for multidimensional arrays, but does not help the case where you need to know if the field itself is an array, which was possible when looking for a string value. Finding out if the field is an array will require a workaround.

How to Find a Field of Type Array

One method that works is supplied in the MongoDB documentation. It suggests using the JavaScript isArray() method, as in the following code:

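That method relies on the $where operator, which runs a JavaScript predicate against every document (the collection and field names here are assumptions):

  db.example.find({ $where: "Array.isArray(this.name)" })
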
Using isArray() to check for the array type.

This will do the trick, but it does come with a substantial performance decrease when running the query.

A workaround that avoids this is to use $elemMatch to check for the existence of an array, like this:

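A reconstruction of the workaround (names assumed): $elemMatch only ever matches against array elements, so an $exists test inside it succeeds only when the field is an array with at least one element:

  db.example.find({ "name": { $elemMatch: { $exists: true } } })
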
Using $elemMatch to check for the array type.

This will do the trick, except when you also need to include empty arrays: an empty array has no elements for $elemMatch to test, so such documents are never matched. Covering them requires extending the query:

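The original snippet can't be recovered from the page; a commonly cited way to cover empty arrays, which we substitute here, is an $or clause with a $size test:

  db.example.find({
    $or: [
      { "name": { $elemMatch: { $exists: true } } },
      { "name": { $size: 0 } }
    ]
  })
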
Using $elemMatch together with an empty-array check.

With this in place, you can now determine if the field is an array. If you need to find out if it has inner arrays, you can simply use the $type operator to check for this. You are now able to perform the check with MongoDB and without the performance penalty of cycling over the collection in JavaScript!

How to Get MongoDB

MongoDB is well-suited to numerous applications, especially those that need to query big data quickly. One way to set up a MongoDB database easily is to have it remotely hosted as a service in the cloud.

One company that offers this is Morpheus, where you can get MongoDB (and several other databases) as a service in the cloud. With easy setup and replication, all running on a high-performance infrastructure with solid-state drives, why not open a free account today?

The Three Most Important Considerations in Selecting a MongoDB Shard Key


TL;DR: The efficient operation of your MongoDB database depends on which field in the documents you designate as the shard key. Since you have to select the shard key up front and can't change it later, you need to give the choice due consideration. For query-focused apps, choose a key that routes each query to one shard or only a few shards; for apps that must scale writes across the cluster, choose a key that distributes writes evenly.

The outlook is rosy for MongoDB, the most popular NoSQL DBMS. Research and Markets' March 2014 report entitled Global NoSQL Market 2014-2018 predicts that the overall NoSQL market will grow at a compound annual rate of 53 percent between 2013 and 2018. Much of the increase will be driven by increased use of big data in organizations of all sizes, according to the report.

Topping the list of MongoDB's advantages over relational databases are efficiency, easy scalability, and "deep query-ability," as Tutorialspoint's MongoDB Tutorial describes it. As usual, there's a catch: MongoDB's efficient data storage, scaling, and querying depend on sharding, and sharding depends on the careful selection of a shard key.

As the MongoDB Manual explains, every document in a collection has an indexed field or compound indexed field that determines how the collection's documents are distributed among a cluster's shards. Sharding allows the database to scale horizontally across commodity servers, which costs less than scaling vertically by adding processors, memory, and storage.

A mini-shard-key-selection vocabulary

As a sharded MongoDB collection grows, MongoDB "chunkifies" its documents based on ranges of values in the shard key. Keep in mind that once you choose a shard key, you're stuck with it: you can't change it later.

The characteristic that makes a collection easy to divide into chunks is the shard key's cardinality: the more distinct values the key has, the more finely MongoDB can split the data. The MongoDB Manual also recommends that your shard keys have a high degree of randomness to ensure the cluster's write operations are distributed evenly, which is referred to as write scaling. Conversely, when a field has a high degree of randomness, it becomes a challenge to target specific shards. Using a shard key that is tied to a single shard lets queries run much more efficiently; this is called query isolation.

When a collection doesn't have a field suitable to use as a shard key, a compound shard key can be used, or a field can be added to serve as the key.

Choice of shard key depends on the nature of the collection

How do you know which field to use as the shard key? A post by Goran Zugic from May 2014 explains the three types of sharding MongoDB supports:

  • Range-based sharding splits collections based on shard key value.
  • Hash-based sharding determines hash values based on field values in the shard key.
  • Tag-aware sharding ties shard key values to specific shards and is commonly used for location-based applications.

The primary consideration when deciding which shard key to designate is how the collection will be used. Zugic presents it as a balancing act between query isolation and write scaling: the former is preferred when queries are routed to one shard or a small number of shards; the latter when efficient scaling of clusters between servers is paramount.

MongoDB keeps the number of chunks roughly balanced across shards (each shard typically being a replica set), as Conrad Irwin describes in a March 2014 post on the BugSnag site. Irwin lists three factors that determine choice of shard key:

  • Distribution of reads and writes: split reads evenly across all replica sets to scale working set size linearly among several shards, and to avoid writing to a single machine in a cluster.
  • Chunk size: make sure your shard key isn't used by so many documents that your chunks grow too large to move between shards.
  • Query hits: if your queries have to hit too many servers, latency increases, so craft your keys so queries run as efficiently as possible.

Irwin provides two examples. The simplest approach is to use a hash of the _id of your documents:

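In mongo-shell terms (the database and collection names are assumptions), that looks like:

  // Shard the collection on a hashed _id for even write distribution.
  sh.shardCollection("mydb.events", { "_id": "hashed" })
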
Source: BugSnag

In addition to distributing reads and writes efficiently, the technique guarantees that each document has its own shard key value, which maximizes chunk-ability.

The other example groups related documents in the index by project while also applying a hash to distinguish shard keys:

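Irwin's exact key isn't reproduced here; on a recent MongoDB release (4.4 and later support compound hashed shard keys) the idea could be sketched as follows, with the field names assumed:

  // Group documents by project, then hash _id to spread each project's writes.
  sh.shardCollection("mydb.events", { "projectId": 1, "_id": "hashed" })
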
Source: BugSnag

A mini-decision tree for shard-key selection might look like this:

  • Hash the _id if there isn't a good candidate to serve as a grouping key in your application.
  • If there is a good grouping-key candidate in the app, go with it and use the _id to prevent your chunks from getting too big.
  • Be sure to distribute reads and writes evenly with whichever key you use to avoid sending all queries to the same machine.

This and other aspects of optimizing MongoDB databases can be handled through a single dashboard via the Morpheus database-as-a-service (DBaaS). Morpheus lets you provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and Elasticsearch databases. It is the first and only DBaaS that supports SQL, NoSQL, and in-memory databases. Visit the Morpheus site to sign up for a free account!

"Too Many Connections": How to Increase the MySQL Connection Count To Avoid This Problem


If you don't have enough connections open to your MySQL server, your users will begin to receive a "Too many connections" error while trying to use your service. To fix this, you can increase the maximum number of connections to the database that are allowed, but there are some things to take into consideration before simply ramping up this number.

Items to Consider

Before you increase the connections limit, you will want to ensure that the machine on which the database is housed can handle the additional workload. The maximum number of connections that can be supported depends on the following variables:

  • The available RAM – The system will need to have enough RAM to handle the additional workload.
  • The thread library quality of the platform - This will vary based on the platform. For example, Windows can be limited by the POSIX compatibility layer it uses (though the limit no longer applies to MySQL v5.5 and up). However, there remain memory-usage concerns depending on the architecture (x86 vs. x64) and how much memory can be consumed per application process.
  • The required response time - Increasing the number of connections could increase the amount of time needed to respond to requests. This should be tested to ensure it meets your needs before going into production.
  • The amount of RAM used per connection - Again, RAM is important, so you will need to know if the RAM used per connection will overload the system or not.
  • The workload required for each connection - The workload will also factor in to what system resources are needed to handle the additional connections.

Another issue to consider is that you may also need to increase the open files limit, so that enough file handles are available for the additional connections.

Checking the Connection Limit

To see what the current connection limit is, you can run the following from the MySQL command line or from many of the available MySQL tools such as phpMyAdmin:

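The statement behind the missing screenshot is a standard one:

  SHOW VARIABLES LIKE 'max_connections';
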
The SHOW VARIABLES command.

This will display a nicely formatted result for you:

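The output looks something like this (151 happens to be a common default; your value will differ):

  +-----------------+-------+
  | Variable_name   | Value |
  +-----------------+-------+
  | max_connections | 151   |
  +-----------------+-------+
  1 row in set (0.00 sec)
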
Example result of the SHOW VARIABLES command.

Increasing the Connection Limit

To increase the global number of connections temporarily, you can run the following from the command line:

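A sketch of the statement (250 is an arbitrary example value; the change requires a privileged account and lasts only until the server restarts):

  SET GLOBAL max_connections = 250;
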
An example of setting the max_connections global.

If you want to make the increase permanent, you will need to edit the my.cnf configuration file. You will need to determine the location of this file for your operating system (Linux systems often store the file in the /etc folder, for example). Open this file and add a line that includes max_connections, followed by an equal sign, followed by the number you want to use, as in the following example:

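A hedged example of the relevant lines; the setting belongs in the [mysqld] section, and 250 is again an arbitrary value:

  [mysqld]
  max_connections = 250
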
An example of setting max_connections in my.cnf.

The next time you restart MySQL, the new setting will take effect and will remain in place until it is changed again.

Easily Scale a MySQL Database

Instead of worrying about these settings on your own system, you could opt to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily and quickly set up your choice of several databases (including MySQL, MongoDB, Redis, and Elasticsearch).

In addition, MySQL and Redis have automatic backups, and each database instance is replicated, archived, and deployed on a high-performance infrastructure with solid-state drives. You can start a free account today to begin taking advantage of this service!

Your Options for Optimizing the Performance of MySQL Databases


A database can never be too optimized, and DBAs will never be completely satisfied with the performance of their creations. As your MySQL databases grow in size and complexity, taking full advantage of the optimizing tools built into the MySQL Workbench becomes increasingly important.

DBAs have something in common with NASCAR pit crew chiefs: No matter how well your MySQL database is performing, there's always a little voice in your head telling you, "I can make it go faster."

Of course, you can go overboard trying to fine-tune your database's performance. In reality, most database tweaking is done to address a particular performance glitch or to prevent the system from bogging down as the database grows in size and complexity.

One of the tools in the MySQL Workbench for optimizing your database is the Performance Dashboard. When you mouse over a graph or other element in the dashboard, you get a snapshot of server, network, and InnoDB metrics.

The Performance Dashboard in the MySQL Workbench provides at-a-glance views of key metrics of network traffic, server activity, and InnoDB storage. Source: MySQL.com

Other built-in optimization tools are Performance Reports for analyzing IO hotspots, high-cost SQL statements, Wait statistics, and InnoDB engine metrics; Visual Explain Plans that offer graphical views of SQL statement execution; and Query Statistics that report on client timing, network latency, server execution timing, index use, rows scanned, joins, temporary storage use, and other operations.

A maintenance release of the MySQL Workbench, version 6.2.4, was announced on November 20, 2014, and is described on the MySQL Workbench Team Blog. Among the new features in MySQL Workbench 6.2 are a spatial data viewer for graphing data sets with GEOMETRY data; enhanced Fabric Cluster connectivity; and a Metadata Locks View for finding and troubleshooting threads that are blocked or stuck waiting on a lock.

Peering deeper into your database's operation

One of the performance enhancements in MySQL 5.7 is the new Cost Model, as Marcin Szalowicz explains in a September 25, 2014, post on the MySQL Workbench blog. For example, Visual Explain's interface has been improved to facilitate optimizing query performance.

MySQL 5.7's Visual Explain interface now provides more insight for improving the query processing of your database. Source: MySQL.com

The new query results panel centralizes information about result sets, including Result Grid, Form Editor, Field Types, Query Stats, Spatial Viewer, and both traditional and Visual Execution Plans. Also new is the File > Run SQL Script option that makes it easy to execute huge SQL script files.

Attempts to optimize SQL tables automatically via the OPTIMIZE TABLE command often go nowhere. A post from March 2011 on Stack Overflow demonstrates that you may end up with slower performance and more storage space used rather than less. The best approach is to use "mysqlcheck" at the command line:

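For example (the database name and credentials are placeholders):

  # Optimize every table in one database:
  mysqlcheck -o mydb -u root -p

  # Or optimize all databases at once:
  mysqlcheck -o --all-databases -u root -p
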
Run "mysqlcheck" at the command line to optimize a single database or all databases at once. Source: Stack Overflow

Alternatively, you could run a php script to optimize all the tables in a database:

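A minimal sketch using mysqli (credentials and database name are placeholders, with error handling kept to a bare minimum):

  <?php
  // Connect, list the tables, and run OPTIMIZE TABLE on each one.
  $db = new mysqli('localhost', 'user', 'password', 'mydb');
  if ($db->connect_error) {
      die('Connect failed: ' . $db->connect_error);
  }
  $tables = $db->query('SHOW TABLES');
  while ($row = $tables->fetch_row()) {
      $db->query('OPTIMIZE TABLE `' . $row[0] . '`');
  }
  $db->close();
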
A php script can be used to optimize all the tables in a database at one time. Source: Stack Overflow

A follow-up to the above post on DBA Stack Exchange points out that MySQL Workbench has a "hidden" maintenance tool called the Schema Inspector that opens an editor area in which you can inspect and tweak several pages at once.

What is evident from these exchanges is that database optimization remains a continuous process, even with the arrival of new tools and techniques. A principal advantage of the Morpheus database-as-a-service (DBaaS) is the use of a single dashboard to access statistics about all your MySQL, MongoDB, Redis, and ElasticSearch databases.

With Morpheus you can provision, deploy, and host SQL, NoSQL, and in-memory databases with a single click. The service supports a range of tools for connecting, configuring, and managing your databases, and automated backups for MySQL and Redis.

Visit the Morpheus site to create a free account. Database optimization has never been simpler!
