Channel: Morpheus Blog

The Fastest Way to Import Text, XML, and CSV Files into MySQL Tables


One of the best ways to improve the performance of MySQL databases is to determine the optimal approach for importing data from other sources, such as text files, XML, and CSV files. The key is to correlate the source data with the table structure.

Data is always on the move: from a Web form to an order-processing database, from a spreadsheet to an inventory database, or from a text file to a customer list. One of the most common MySQL database operations is importing data from such an external source directly into a table. Data importing is also one of the tasks most likely to create a performance bottleneck.

The basic steps entailed in importing a text file to a MySQL table are covered in a Stack Overflow post from November 2012: first, use the LOAD DATA INFILE command.

The basic MySQL commands for creating a table and importing a text file into the table. Source: Stack Overflow

Note that you may need to enable the parameter "--local-infile=1" to get the command to run. You can also specify which columns the text file loads into:

This MySQL command specifies the columns into which the text file will be imported. Source: Stack Overflow

In this example, the file's text is placed into variables "@col1, @col2, @col3," so "myid" appears in column 1, "mydecimal" appears in column 3, and column 2 has a null value.

The table resulting when LOAD DATA is run with the target column specified. Source: Stack Overflow
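
Because the original screenshots are not reproduced here, the following is a minimal Python sketch of how a statement of the kind described above could be assembled. The file path "/tmp/mydata.txt" and the table name "mytable" are hypothetical; the user variables and the "myid" and "mydecimal" columns follow the description above. The helper only builds a string and assumes every argument comes from trusted code, not user input.

```python
def build_load_data(path, table, user_vars, assignments, field_sep="\\t"):
    """Return a LOAD DATA LOCAL INFILE statement as a string.

    user_vars   -- user variables that receive each field, in file order
    assignments -- mapping of table column -> user variable stored in it
    All arguments are assumed to be trusted (hypothetical helper).
    """
    var_list = ", ".join(user_vars)
    set_list = ", ".join(f"{col} = {var}" for col, var in assignments.items())
    return (
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
        f"FIELDS TERMINATED BY '{field_sep}' "
        f"({var_list}) SET {set_list};"
    )


# Field 2 is read into @col2 but never assigned, so its column stays NULL.
stmt = build_load_data(
    "/tmp/mydata.txt",            # hypothetical path
    "mytable",                    # hypothetical table name
    ["@col1", "@col2", "@col3"],
    {"myid": "@col1", "mydecimal": "@col3"},
)
print(stmt)
```

The resulting string can be passed to whichever MySQL client or driver you use, provided local-infile support is enabled on both client and server.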

The fastest way to import XML files into a MySQL table

As Database Journal's Rob Gravelle explains in a March 17, 2014, article, stored procedures would appear to be the best way to import XML data into MySQL tables, but after version 5.0.7, MySQL's LOAD XML INFILE and LOAD DATA INFILE statements can't run within a Stored Procedure. There's also no way to map XML data to table structures, among other limitations.

However, you can get around most of these limitations if you can target the XML file using a rigid and known structure per proc. The example Gravelle presents uses an XML file whose rows are all contained within a single parent element, and whose columns are each represented by a named attribute:

You can use a stored procedure to import XML data into a MySQL table if you specify the table structure beforehand. Source: Database Journal

The table you're importing to has an int ID and two varchars: because the ID is the primary key, it can't have nulls or duplicate values; last_name allows duplicates but not nulls; and first_name allows up to 100 characters of nearly any data type.

The MySQL table into which the XML file will be imported has the same three fields as the file. Source: Database Journal

Gravelle's approach for overcoming MySQL's import restrictions uses the "proc-friendly" Load_File() and ExtractValue() functions.

MySQL's XML-import limitations can be overcome by using the Load_file() and ExtractValue() functions. Source: Database Journal
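
To illustrate the mapping that ExtractValue() performs inside the stored procedure, here is a rough Python stand-in that uses only the standard library. It is not Gravelle's procedure: the element names ("people", "person") and the sample data are hypothetical, and the checks simply mirror the table rules described above (an integer ID, a required last_name, an optional first_name).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one parent element, one child per row,
# and one named attribute per column.
SAMPLE_XML = """<people>
  <person id="1" first_name="Ada" last_name="Lovelace"/>
  <person id="2" last_name="Hopper"/>
</people>"""


def rows_from_xml(xml_text):
    """Return (id, last_name, first_name) tuples ready for insertion."""
    root = ET.fromstring(xml_text)
    rows = []
    for person in root.findall("person"):
        person_id = int(person.get("id"))      # primary key: must be an int
        last_name = person.get("last_name")
        if last_name is None:                  # column does not allow NULL
            raise ValueError(f"row {person_id} has no last_name")
        rows.append((person_id, last_name, person.get("first_name")))
    return rows


rows = rows_from_xml(SAMPLE_XML)
print(rows)
```

Validating rows on the client like this is an alternative design to the in-database approach; it trades a round trip through application code for clearer error messages.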

Benchmarking techniques for importing CSV files to MySQL tables

When he tested various ways to import a CSV file into MySQL 5.6 and 5.7, Jaime Crespo discovered a technique that he claims improves the import time for MyISAM by 262 percent to 284 percent, and for InnoDB by 171 percent to 229 percent. The results of his tests are reported in an October 8, 2014, post on Crespo's MySQL DBA for Hire blog.

Crespo's test file was more than 3GB in size and had nearly 47 million rows. One of the fastest methods in Crespo's tests was grouping queries in a multi-insert statement, the approach used by "mysqldump". Crespo also attempted to improve LOAD DATA performance by increasing the key_buffer_size and by disabling the Performance Schema.

Crespo concludes that the fastest way to load CSV data into a MySQL table without using raw files is to use LOAD DATA syntax. Also, using parallelization for InnoDB boosts import speeds.
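
When LOAD DATA is not an option, the multi-insert technique mentioned above can be sketched in Python as follows. This is not Crespo's benchmark code: the table name "nodes", its columns, and the batch size are hypothetical, and the "%s" placeholder style is the one used by common Python MySQL drivers (an assumption to check against your own driver).

```python
import csv
import io
from itertools import islice


def multi_insert_batches(rows, table, columns, batch_size=1000):
    """Yield (sql, params) pairs, each inserting up to batch_size rows."""
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    col_list = ", ".join(columns)
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        sql = (
            f"INSERT INTO {table} ({col_list}) VALUES "
            + ", ".join([row_placeholder] * len(batch))
        )
        # Flatten the rows so the values line up with the placeholders.
        params = [value for row in batch for value in row]
        yield sql, params


# Hypothetical CSV content standing in for a large import file.
csv_file = io.StringIO("1,a\n2,b\n3,c\n4,d\n5,e\n")
batches = list(
    multi_insert_batches(csv.reader(csv_file), "nodes", ["id", "name"], batch_size=2)
)
for sql, params in batches:
    print(sql, params)
```

Each pair can be handed to a driver's cursor.execute(sql, params), so the server parses one statement per batch instead of one per row.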

You won't find a more straightforward way to monitor your MySQL, MongoDB, Redis, and ElasticSearch databases than by using the dashboard interface of the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases.

You can provision, deploy, and host your databases from a single dashboard. The service includes a free full replica set for each database instance, as well as automatic daily backups of MySQL and Redis databases. Visit the Morpheus site for pricing information and to create a free account.

Sony's Two Big Mistakes: No Encryption, and No Backup


Even if you can't prevent all unauthorized access to your organization's networks, you can mitigate the damage -- and prevent most of it -- by using two time-proven, straightforward security techniques: encrypt all data storage and transmissions; and back up your data to the cloud or other off-premises site. Best of all, both security measures can be implemented without relying on all-too-human humans.

People are the weak link in any data-security plan. It turns out we're more fallible than the machines we use. Science fiction scenarios aside, the key to protecting data from attacks such as the one that threatens to put Sony out of business is to rely on machines, not people.

The safest things a company can do are to implement end-to-end encryption, and back up all data wherever it's stored. All connections between you and the outside world need to be encrypted, and all company data stored anywhere -- including on employees' mobile devices -- must be encrypted and backed up automatically.

A combination of encryption and sound backup as cornerstones of a solid business-continuity plan would have saved Sony's bacon. In a December 17, 2014, post on Vox, Timothy B. Lee writes that large companies generally under-invest in security until disaster strikes. But Sony has been victimized before. In 2011, hackers stole the personal information of millions of members of the Sony PlayStation network.

User authentication: The security hole that defies plugging

Most hackers get into their victims' networks via stolen user IDs and passwords. The 2014 Verizon Data Breach Investigations Report identifies the nine attack patterns that accounted for 93 percent of all network break-ins over the past decade. DarkReading's Kelly Jackson Higgins presents the report's findings in an April 22, 2014, article.

The 2014 Data Breach Investigations Report identifies nine predominant patterns in security breaches over the past decade. Source: Verizon

In two out of three breaches, the crook gained access by entering a user ID and password. The report recorded 1,367 data breaches in 2013, compared to only 621 in 2012. In 422 of the attacks in 2013, stolen credentials were used; 327 were due to data-stealing malware; 245 were from phishing attacks; 223 from RAM scraping; and 165 from backdoor malware.

There's just no way to keep user IDs and passwords out of the hands of data thieves. You have to assume that eventually, crooks will make it through your network defenses. In this case, the only way to protect your data is by encrypting it so that even if it's stolen, it can't be read without the decryption key.

If encryption is such a data-security magic bullet, why haven't organizations been using it for years already? In a June 10, 2014, article on ESET's We Live Security site, Stephen Cobb warns about the high cost of not encrypting your business's data. Concentra had just reached a $1,725,220 settlement with the U.S. government following a HIPAA violation that involved the loss of unencrypted health information.

A 2013 Ponemon Institute survey pegged the average cost of a data breach in the U.S. at $5.4 million. Source: Ponemon Institute/Symantec

Encryption's #1 benefit: Minimizing the human factor

Still, as many as half of all major corporations don't use encryption, according to a survey conducted in 2012 by security firm Kaspersky Lab. The company lists the five greatest benefits of data encryption:

  1. Complete data protection, even in the event of theft
  2. Data is secured on all devices and distributed nodes
  3. Data transmissions are protected
  4. Data integrity is guaranteed
  5. Regulatory compliance is assured

Backups: Where data security starts and ends

Vox's Timothy B. Lee points out in his step-by-step account of the Sony data breach that the company's networks were "down for days" following the November 24, 2014, attack. (In fact, the original network breach likely occurred months earlier, as Wired's Kim Zetter reports in a December 15, 2014, post.)

Any business-continuity plan worth its salt prepares the company to resume network operations within hours or even minutes after a disaster, not days. A key component of your disaster-recovery plan is your recovery time objective. While operating dual data centers is an expensive option, it's also the safest. More practical for most businesses are cloud-based services such as the Morpheus database-as-a-service (DBaaS).

Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. When you choose Morpheus to host your MySQL, MongoDB, Redis, and ElasticSearch databases, you get a free full replica with each database instance. Morpheus also provides automatic daily backups of your MySQL and Redis databases.

The Morpheus dashboard lets you provision, deploy, and host your databases and monitor performance using a range of database tools. Visit the Morpheus site to create a free account.

Cloud Computing + Data Analytics = Instant Business Intelligence


Only by using cloud services will companies be able to offer their employees and managers access to big data, as well as the tools they'll need to analyze the information without being data scientists. A primary advantage of moving data analytics to the cloud is its potential to unleash the creativity of the data users, although a level of data governance is still required.

Data analytics are moving to the edge of the network, starting at the point of collection. That's one result of our applications getting smarter. According to the IDC FutureScape for Big Data and Analytics 2015 Predictions, apps that incorporate machine learning and other advanced or predictive analytics will grow 65 percent faster in 2015 than software without such abilities.

There's only one way to give millions of people affordable access to the volumes of data now being collected in real time, not to mention the easy-to-use tools they'll need to make productive use of the data. And that's via the cloud.

IDC also predicts a shortage of skilled data analysts: by 2018 there will be 181,000 positions requiring deep-analytics skills, and five times that number requiring similar data-interpretation abilities. Another of IDC's trends for 2015 is the booming market for visual data discovery tools, which are projected to grow at 2.5 times the rate of other business-intelligence sectors.

As software gets smarter, more data conditioning and analysis is done automatically, which facilitates analysis by end users. Source: Software Development Times

When you combine smarter software, a shortage of experts, and an increase in easy-to-use analysis tools, you get end users doing their own analyses, with the assistance of intelligent software. If all the pieces click into place, your organization can benefit by tapping into the creativity of its employees and managers.

The yogurt-shop model for data analytics

In a November 19, 2014, article, Forbes' Bill Franks compares DIY data analytics to self-serve yogurt shops. In both cases the value proposition is transferred to the customer: analyzing the data becomes an engaging, rewarding experience, similar to choosing the type and amount of toppings for your cup of frozen yogurt.

More importantly, you can shift to the self-serve model without any big expenses in infrastructure, training, or other costs. You might even find your costs reduced, just as self-serve yogurt shops save on labor and other costs, particularly by tapping into the efficiency and scalability of the cloud.

Employees are more satisfied with their data-analytics roles when their companies use cloud-based big data analytics. Source: Aberdeen Group (via Ricoh)

Last but not least, when you give people direct access to data and offer them tools that let them mash up the data as their creativity dictates, you'll generate valuable combinations you may never have come up with yourself.

Determining the correct level of oversight for DIY data analysts

Considering the value of the company's data, it's understandable that IT managers would hesitate to turn employees loose on the data without some supervision. As Timo Elliott explains in a post from April 2014 on the Business Analytics blog, data governance remains the responsibility of the IT department.

Elliott defines data governance as "stopping people from doing stupid things with data." The concept encompasses security, data currency, and reliability, but it also entails ensuring that information in the organization gets into the hands of the people who need it, when they need it.

You'll see aspects of DIY data analytics in the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. You use a single console to provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch. Every database instance is deployed with a free full replica set, and your MySQL and Redis databases are backed up.

Morpheus supports a range of tools for configuring and managing your databases, which are monitored continuously by the service's staff and advanced bots. Visit the Morpheus site for pricing information and to create a free account.

Find the Best Approach for Entering Dates in MySQL Databases


A function as straightforward as entering dates in a MySQL database should be nearly automatic, but the process is anything but foolproof. MySQL's handling of invalid date entries can leave developers scratching their heads. In particular, the globalization of IT means you're never sure where the server hosting your database will be located -- or relocated. Plan ahead to ensure your database's date entries are as accurate as possible.

DBAs know that if they want their databases to function properly, they have to follow the rules. The first problem is, some "rules" are more like guidelines, allowing a great deal of flexibility in their application. The second problem is, it's not always easy to determine which rules are rigid, and which are more malleable.

An example of a rule with some built-in wiggle room is MySQL's date handling. Database Journal's Rob Gravelle explains in a September 8, 2014, post that MySQL automatically converts numbers and strings into a correct Date whenever you add or update data in a DATE, DATETIME, or TIMESTAMP column. The string has to be in the "yyyy-mm-dd" format, but you can use any punctuation to separate the three date elements, such as "yyyy&mm&dd", or you can skip the separators altogether, as in "yyyymmdd".

So what happens when a Date record has an invalid entry, or no entry at all? MySQL inserts its special zero date of "0000-00-00" and warns you that it has encountered an invalid date, as shown below.


Only the first of the four Date records is valid, so MySQL warns that there is an invalid date after entering the zero date of "0000-00-00". Source: Database Journal

To prevent the zero date from being entered, you can use NO_ZERO_DATE in strict mode, which generates an error whenever an invalid date is entered; or NO_ZERO_IN_DATE mode, which rejects dates that have a valid year but a zero month or day. Note that both of these modes were deprecated in MySQL 5.7.4 and rolled into strict SQL mode.

Other options are to enable ALLOW_INVALID_DATES mode, which permits an application to store the year, month, and day in three separate fields, for example, or to enable TRADITIONAL SQL mode, which acts more like stricter database servers by combining STRICT_TRANS_TABLES, STRICT_ALL_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, and NO_AUTO_CREATE_USER.
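
Whatever SQL mode the server runs in, an application can also reject bad dates before they reach the database. The following is a minimal Python sketch of that idea, not a reimplementation of MySQL's own parser: the function name is hypothetical, and it deliberately accepts only eight-digit dates with any (or no) separators.

```python
from datetime import datetime


def to_mysql_date(text):
    """Return text as 'YYYY-MM-DD', or None if it is not a real calendar date."""
    # Drop every separator, so "2014&09&08" and "20140908" are treated alike.
    digits = "".join(ch for ch in text if ch in "0123456789")
    if len(digits) != 8:
        return None
    try:
        return datetime.strptime(digits, "%Y%m%d").date().isoformat()
    except ValueError:
        # Impossible dates (February 30, month 00, year 0000) end up here.
        return None


for candidate in ("2014&09&08", "20140908", "2014-02-30", "0000-00-00"):
    print(candidate, "->", to_mysql_date(candidate))
```

A None result tells the caller to report the problem to the user instead of letting the server store a zero date.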

Avoid using DATETIME at all? Not quite

Developer Eli Billauer posits on his personal blog that it is always a mistake to use the MySQL (and SQL) DATETIME column type. He qualifies his initial blanket pronouncement to acknowledge that commenters to the post give examples of instances where use of DATETIME is the best approach.

Billauer points out that many developers use DATETIME to store the time of events, as in this example:


Using the DATETIME type with the NOW() function creates problems because you can't be sure of the local server's time, or the user's timezone. Source: Eli Billauer

DATETIME relies on the time of the local server, and you can't be sure where the web server hosting the app is going to be located. One way around this uncertainty is to apply a SQL function that converts timezones, but this doesn't address such issues as daylight saving time and databases relocated to new servers. (Note that the UTC_TIMESTAMP() function provides the UTC time.)

There are several ways to get around these limitations, one of which is to use "UNIX time," as in "UNIX_TIMESTAMP(thedate)." This is also referred to as "seconds since the Epoch." Alternatively, you can store the integer itself in the database; Billauer explains how to obtain Epoch time in Perl, PHP, Python, C, and JavaScript.
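
As a rough illustration of the store-the-integer approach in Python (a sketch using only the standard library, not Billauer's code; the function names are hypothetical), the application can generate the Epoch value itself and convert it back to an explicit UTC time when displaying it:

```python
import time
from datetime import datetime, timezone


def now_epoch():
    """Seconds since the Epoch, independent of the server's timezone."""
    return int(time.time())


def epoch_to_utc(ts):
    """Convert a stored integer back to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def utc_to_epoch(dt):
    """Convert a timezone-aware datetime to integer Epoch seconds."""
    return int(dt.timestamp())


stored = now_epoch()                  # value to save in an integer column
print(stored, epoch_to_utc(stored).isoformat())
```

Because the integer carries no timezone, moving the database or the web server to another region does not change its meaning.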

Troubleshooting and monitoring the performance of your MySQL, MongoDB, Redis, and ElasticSearch databases is a piece of cake when you use the Morpheus database-as-a-service (DBaaS). Morpheus provides a single, easy-to-use dashboard. In addition to a free full replica set of each database instance, you get backups of your MySQL and Redis databases.

Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases. The service's SSD-backed infrastructure ensures peak performance, and direct links to EC2 guarantee ultra-low latency. Visit the Morpheus site to create a free account.

Has Node.js Adoption Peaked? If So, What's Next for Server-Side App Development?


The general consensus of the experts is that Node.js will continue to play an important role in web app development despite the impending release of the io.js forked version. Still, some developers have decided to switch to the Go programming language and other alternatives, which they consider better suited to large, distributed web apps.

The developer community appears to be tiring of the constant churn in platforms and toolkits. Jimmy Breck-McKye points out in a December 1, 2014, post on his Lazy Programmer blog that it has been only two years since the arrival of Node.js, the JavaScript framework for developing server-side apps quickly and simply.

Soon Node.js was followed by Backbone.js/Grunt, Require.js/Handlebars, and most recently, Angular, Gulp, and Browserify. How is a programmer expected to invest in any single set of development tools when the tools are likely to be eclipsed before the developer can finish learning them?

Node.js still has plenty of supporters, despite the recent forking of the product with the release of io.js by a group of former Node contributors. In a December 29, 2014, post on the LinkedIn Pulse blog, Kurt Cagle identifies Node as one of the Ten Trends in Data Science for 2015. Cagle nearly gushes over the framework, calling it "the nucleus of a new stack that is likely going to relegate Ruby and Python to has-been languages." Node could even supplant PHP someday, according to Cagle.

The internal thread architecture of Node.js handles incoming requests to the http server similar to SQL requests. Source: Stack Overflow

Taking the opposite view is Shiju Varghese, who writes in an August 31, 2014, post on his Medium blog that after years of developing with Node, he has switched to using Go for Web development and as a "technology ecosystem for building distributed apps." Among Node's shortcomings, according to Varghese, are its error handling, debugging, and usability.

More importantly, Varghese claims Node is a nightmare to maintain for large, distributed apps. For anyone building RESTful apps on Node.js, he recommends the Hapi.js framework created by WalMart. Varghese predicts that the era of using dynamic languages for "performance-critical" web apps is drawing to a close.

The Node.js fork may -- or may not -- be temporary

When the io.js fork was announced in late November 2014, developers feared they would be forced to choose between the original version of the open-source framework supported by Joyent, and the new version created by former Node contributors. As ReadWrite's Lauren Orsini describes in a December 10, 2014, article, the folks behind io.js were unhappy with Joyent's management of the framework.

Io.js is intended to have "an open governance model," according to the framework's readme file. It is described as an "evented IO for V8 JavaScript." Node.js and io.js are both server-side frameworks that allow web apps to handle user requests in real time, and the io.js development team reportedly intends to maintain compatibility with the "Node ecosystem."

At present, most corporate developers are taking a wait-and-see approach to the Node rift, according to InfoWorld's Paul Krill. In a December 8, 2014, article, Krill writes that many attendees at Intuit's Node Day conference see the fork as a means of pressuring Joyent to "open up a little bit," as one conference-goer put it. Many expect the two sides to reconcile before long -- and before parallel, incompatible toolsets are released.

Still, the io.js fork is expected to be released in January 2015, according to InfoQ's James Chester in a December 9, 2014, post. Isaac Z. Schlueter, one of the Node contributors backing io.js, insists in an FAQ that the framework is not intended to compete with Node, but rather to improve it.

Regardless of the outcome of the current schism, the outlook for Node developers looks rosy. Indeed.com's recent survey of programmer job postings indicates that the number of openings for Node developers is on the rise, although it still trails jobs for Ruby and Python programmers.

Openings for developers who can work with Node.js are on the rise, according to Indeed.com. Source: FreeCodeCamp

Regardless of your preferred development framework, you can rest assured that your MySQL, MongoDB, Redis, and ElasticSearch databases are accessible when you need them on the Morpheus database-as-a-service (DBaaS). Morpheus supports a range of tools for connecting to, configuring, and managing your databases.

You can provision, deploy, and host all your databases on Morpheus with just a few clicks using the service's dashboard. Visit the Morpheus site to create a free account!

The Most Important Takeaways from MySQL Prepared Statements


Since MySQL both sends queries to the server and returns data in text format, the query must be fully parsed and the result set must be converted to a string before being sent to the client. This overhead can cause performance issues, so MySQL implemented a new feature called Prepared Statements when it released version 4.1.

What is a MySQL prepared statement?

A MySQL prepared statement is a method that can be used to pass a query containing one or more placeholders to the MySQL server. Prepared statements make use of the binary client/server protocol that works between a MySQL client and server, thus allowing them to have a quicker response time than the typical text/parse/conversion exchange.

Here is an example query that demonstrates how a placeholder can be used (this is similar to using a variable in programming):


Example of a MySQL placeholder

This query does not need to be fully parsed, since different values can be used for the placeholder. This provides a performance boost for the query, which is even more pronounced if the query is used numerous times.

In addition to enhanced performance, the placeholder can help you avoid a number of SQL injection vulnerabilities, since you are defining the placeholder rather than having it sent as a text string that can be more easily manipulated.
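
The injection-resistance point can be demonstrated with a small, self-contained Python sketch. It uses the standard library's sqlite3 module purely as a stand-in so that it runs without a MySQL server; the "?" placeholder plays the same role, but the table, columns, and sample data are hypothetical.

```python
import sqlite3

# In-memory stand-in database with one hypothetical user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("sally_224", "sally@example.com"))


def find_user(conn, username):
    """Look up a user; the value is bound to the placeholder as data only."""
    query = "SELECT username, email FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


normal = find_user(conn, "sally_224")
# A classic injection attempt is compared as a literal string, so it
# matches no rows instead of rewriting the WHERE clause.
attack = find_user(conn, "x' OR '1'='1")
print(normal, attack)
```

Had the query been built by concatenating the input into the SQL text, the second call would have returned every row in the table.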

Using MySQL Prepared Statements

A prepared statement in MySQL is essentially performed using four keywords:

  1. PREPARE - This prepares the statement for execution
  2. SET - Sets a value for the placeholder
  3. EXECUTE - This executes the prepared statement
  4. DEALLOCATE PREPARE - This deallocates the prepared statement from memory.

With that in mind, here is an example of a MySQL prepared statement:



Example of a MySQL prepared statement

Notice how the four keywords are used to complete the prepared statement:

  1. The PREPARE statement defines a name for the prepared statement and a query to be run.
  2. The SELECT statement that is prepared will select all of the user data from the users table for the specified user. A question mark is used as a placeholder for the user name, which will be defined next.
  3. A variable named @username is set and is given a value of 'sally_224'. The EXECUTE statement is then used to execute the prepared statement using the value in the placeholder variable.
  4. To end everything and ensure the statement is deallocated from memory, the DEALLOCATE PREPARE statement is used with the name of the prepared statement that is to be deallocated (statement_user in this case).
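
Since the screenshot is not reproduced here, the four steps can be sketched as the following sequence, held in a Python list so that it can be sent through any DB-API cursor connected to MySQL. The statement name, variable, and value come from the description above; the "username" column name and the run_script helper are assumptions made for illustration.

```python
# The four statements, in the order described in the list above.
PREPARED_STATEMENT_SCRIPT = [
    # 1-2. Name the statement and supply the query with a ? placeholder.
    "PREPARE statement_user FROM 'SELECT * FROM users WHERE username = ?'",
    # 3. Set the variable, then execute using it as the placeholder value.
    "SET @username = 'sally_224'",
    "EXECUTE statement_user USING @username",
    # 4. Release the prepared statement.
    "DEALLOCATE PREPARE statement_user",
]


def run_script(cursor, statements=PREPARED_STATEMENT_SCRIPT):
    """Send each statement in order; return the rows produced by EXECUTE.

    `cursor` is assumed to be a DB-API cursor from a MySQL driver.
    """
    rows = None
    for statement in statements:
        cursor.execute(statement)
        if statement.startswith("EXECUTE"):
            rows = cursor.fetchall()
    return rows
```

In application code, most drivers expose placeholders directly, so the explicit PREPARE and DEALLOCATE steps are mainly useful in the mysql client and in stored programs.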

Get your own MySQL Database

To use prepared statements, you will need to have a MySQL database set up and running. One way to easily obtain a database is to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily and quickly set up your choice of several databases (including MySQL, MongoDB, and more). In addition, databases are backed up, replicated, and archived, and are deployed on a high performance infrastructure with Solid State Drives. 

Diagnose and Optimize MySQL Performance Bottlenecks


A common source of MySQL performance problems is tables with outdated, redundant, and otherwise-useless data. Slow queries can be fixed by optimizing one or all tables in your database in a way that doesn't lock users out any longer than necessary.

MySQL was originally designed to be the little database that could, yet MySQL installations keep getting bigger and more complicated: larger databases (often running in VMs), and larger and more widely disparate clusters. As database configurations increase in size and complexity, DBAs are more likely to encounter performance slowdowns. Yet the bigger and more complex the installation, the more difficult it is to diagnose and address the speed sappers.

The MySQL Reference Manual includes an overview of factors that affect database performance, as well as sections explaining how to optimize SQL statements, indexes, InnoDB tables, MyISAM tables, MEMORY tables, locking operations, and MySQL Server, among other components.

At the hardware level, the most common sources of performance hits are disk seeks, disk reading and writing, CPU cycles, and memory bandwidth. Of these, memory management generally and disk I/O in particular top the list of performance-robbing suspects. In a June 16, 2014, article, ITworld's Matthew Mombrea focuses on the likelihood of encountering disk thrashing (a.k.a. I/O thrashing) when hosting multiple virtual machines running MySQL Server, each of which contains dozens of databases.

Data is constantly being swapped between RAM and disk, and obviously it's faster to access data in system memory than data on disk. When insufficient RAM is available to MySQL, dozens or hundreds of concurrent queries to disk will result in I/O thrashing. Comparing the server's load value to its CPU utilization will confirm this: a high load value and low CPU utilization indicate high disk I/O wait times.
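
The load-versus-CPU comparison can be expressed as a simple rule of thumb. The Python sketch below is only an illustration: the function name and both thresholds are hypothetical and should be tuned for your own servers. On Unix-like systems the load value can be read with os.getloadavg() and the core count with os.cpu_count(), while CPU utilization would come from your monitoring tool.

```python
def looks_io_bound(load_avg, cpu_count, cpu_util_pct,
                   load_factor=1.0, cpu_ceiling=50.0):
    """Return True when load per core is high while the CPUs are mostly idle.

    load_factor and cpu_ceiling are hypothetical thresholds, not values
    taken from the article.
    """
    load_per_core = load_avg / cpu_count
    return load_per_core > load_factor and cpu_util_pct < cpu_ceiling


# Hypothetical readings: load of 12 on 4 cores with CPUs only 15% busy.
print(looks_io_bound(12.0, 4, 15.0))
# Same load with the CPUs saturated points at CPU-bound work instead.
print(looks_io_bound(12.0, 4, 95.0))
```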

Determining how frequently you need to optimize your tables

The key to a smooth-running database is ensuring your tables are optimized. Striking the right balance between optimizing too often and optimizing too infrequently is a challenge for any DBA working with large MySQL databases. This quandary was presented in a Stack Overflow post from February 2012.

For a statistical database having more than 2,000 tables, each of which has approximately 100 million rows, how often should the tables be optimized when only 60 percent of them are updated every day (the remainder are archives)? You need to run OPTIMIZE on the table in three situations:

  • When its datafile is fragmented on disk
  • When many of its rows are updated or change size
  • When deleting many records and not adding many others

Run CHECK TABLE when you suspect the table's data is corrupted, and then REPAIR TABLE when corruption is reported. Use ANALYZE TABLE to update index cardinality.

In a separate Stack Overflow post from March 2011, the perils of optimizing too frequently are explained. Many databases use InnoDB with a single file rather than separate files per table. Optimizing in such situations can cause more disk space to be used rather than less. (Also, tables are locked during optimization, so large tables may be inaccessible for long periods.)

From the command line, you can use mysqlcheck to optimize one or all databases:

Run "mysqlcheck" from the command line to optimize one or all of your databases quickly. Source: Stack Overflow
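
For scheduled jobs, the same command line can be assembled in Python and handed to subprocess.run(). This is a minimal sketch: the helper name and the sample database name are hypothetical, and the password is deliberately left out on the assumption that credentials live in a MySQL option file.

```python
def mysqlcheck_optimize_args(database=None, user=None):
    """Build the argument list for a mysqlcheck optimize run.

    With no database given, every database on the server is optimized.
    """
    args = ["mysqlcheck", "--optimize"]
    if user:
        args.append(f"--user={user}")
    if database is None:
        args.append("--all-databases")
    else:
        args.append(database)
    return args


print(mysqlcheck_optimize_args())             # all databases
print(mysqlcheck_optimize_args("inventory"))  # hypothetical database name
```

Passing the list to subprocess.run(args, check=True) avoids shell quoting problems and raises an exception if mysqlcheck exits with an error.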

Alternatively, you can run this PHP script to optimize all the tables in your database:

This PHP script will optimize all the tables in a database in one fell swoop. Source: Stack Overflow

Other suggestions are to implode the table names into one string so that you need only one optimize table query, and to use MySQL Administrator in the MySQL GUI Tools.
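
The single-query suggestion can be sketched in Python as follows. The helper and the sample table names are hypothetical; the only MySQL behavior relied on is that OPTIMIZE TABLE accepts a comma-separated list of tables.

```python
def optimize_tables_sql(table_names):
    """Join table names into one OPTIMIZE TABLE statement."""
    if not table_names:
        raise ValueError("at least one table name is required")
    # Quote each identifier with backticks, doubling any embedded backtick.
    quoted = ", ".join(
        "`" + name.replace("`", "``") + "`" for name in table_names
    )
    return f"OPTIMIZE TABLE {quoted}"


print(optimize_tables_sql(["orders", "order_items"]))
```

Keep in mind that the tables are still locked while they are optimized, so a single statement covering many large tables is best run during a maintenance window.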

Monitoring and optimizing your MySQL, MongoDB, Redis, and ElasticSearch databases is a point-and-click process in the new Morpheus Virtual Appliance. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds. You can provision your database with astounding ease, and each database instance includes a free full replica set. The service supports a range of database tools and lets you analyze all your databases from a single dashboard. Visit the Morpheus site to create a free account.

How to Ensure Your SSL-TLS Connections Are Secure


Encryption is becoming an essential component of nearly all applications, but managing the Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates that are at the heart of most protected Internet connections is anything but simple. A new tool from Google can help ensure your apps are protected against man-in-the-middle attacks.

In the not-too-distant past, only certain types of Internet traffic were encrypted, primarily online purchases and any transmission of sensitive business information. Now the push is on to encrypt everything -- or nearly everything -- that travels over the Internet. While some analysts question whether the current SSL/TLS encryption standards are up to the task, certificate-based encryption isn't likely to be replaced anytime soon.

The Electronic Frontier Foundation's Let's Encrypt program proposes a new certificate authority (CA) intended to make HTTPS the default on all websites. The EFF claims the current CA system for HTTPS is too complex, too costly, and too easy for the bad guys to beat.

Nearly every web user has encountered a warning or error message generated by a misconfigured certificate. The pop-ups are usually full of techno-jargon that can confuse engineers, let alone your typical site visitors. In fact, a recent study by researchers at Google and the University of Pennsylvania entitled Improving SSL Warnings: Comprehension and Adherence (pdf) found that 66 percent of people using the Chrome browser clicked right through the CA warnings.

As Threatpost's Brian Donahue reports in a February 3, 2015, article, redesigning the messages to provide better visual cues and more dire warnings convinced 62 percent of users to choose the preferred, safe response, compared to only 37 percent who did so when confronted with the old warnings. The "opinionated design" concept combines a plain-English explanation ("Your connection is not private" in red letters) with added steps required to continue despite the warning.

Researchers were able to increase the likelihood that users would make the safe choice by redesigning SSL certificate warnings from cryptic (top) to straightforward (bottom). Source: Sophos Naked Security

Best practices for developing SSL-enabled apps

SSL has become a key tool in securing IT infrastructures. Because SSL certificates are valid only for the time they specify, monitoring the certificates becomes an important part of app management. A Symantec white paper entitled SSL for Apps: Best Practices for Developers (pdf) outlines the steps required to secure your apps using SSL/TLS.
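Because certificates expire, a monitoring job usually boils down to comparing each certificate's "notAfter" date against the current time. The following is a minimal sketch of that check, not a method taken from the Symantec paper. It assumes the expiry string is in the format Python's ssl module reports (for example, from getpeercert()); the function names and the 30-day warning window are hypothetical.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a certificate's notAfter time.

    `not_after` is expected in the form "Jan  5 09:34:43 2018 GMT", which is
    how Python's ssl module renders certificate validity dates.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """Flag a certificate that is expired or within `warn_days` of expiring."""
    return days_until_expiry(not_after, now) <= warn_days


# Hypothetical expiry date and "current" time, fixed so the result is repeatable.
sample_not_after = "Jan  5 09:34:43 2018 GMT"
check_time = datetime(2017, 12, 20, tzinfo=timezone.utc)
print(needs_renewal(sample_not_after, check_time))
```

In a real monitor you would pass datetime.now(timezone.utc) as the current time and run the check on a schedule for every certificate your app depends on.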

When establishing an SSL connection, the server returns one or more certificates to create a "chain of trust." The certificates may not be received in a predictable order. Also, the server may return more than necessary or require that the client look for necessary certificates elsewhere. In the latter case, a certificate with a caIssuers entry in its authorityInfoAccess extension will list a protocol and location for the issuing certificate.

Once you've determined the end-entity SSL certificate, you verify that the chain from the end-entity certificate to the trusted root certificate or intermediate certificate is valid.
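To see why arrival order doesn't matter, here is a rough sketch of walking a chain by matching each certificate's issuer to the next certificate's subject. It models certificates as plain dictionaries with hypothetical names, and it deliberately leaves out the checks a real validator must also make (signatures, validity dates, revocation), so treat it as an illustration of the ordering step only.

```python
def build_chain(certs, leaf_subject, trusted_roots):
    """Return subjects from the end-entity certificate up to a trusted issuer.

    `certs` may arrive in any order and may include extras; each item is a
    dict with "subject" and "issuer" names. `trusted_roots` is the set of
    issuer names the client already trusts.
    """
    by_subject = {cert["subject"]: cert for cert in certs}
    if leaf_subject not in by_subject:
        raise ValueError("end-entity certificate not found")

    chain = [by_subject[leaf_subject]]
    seen = {leaf_subject}
    # Follow issuer -> subject links until we reach an issuer we already trust.
    while chain[-1]["issuer"] not in trusted_roots:
        parent = by_subject.get(chain[-1]["issuer"])
        if parent is None or parent["subject"] in seen:
            raise ValueError("chain is incomplete or loops back on itself")
        chain.append(parent)
        seen.add(parent["subject"])
    return [cert["subject"] for cert in chain]


# Hypothetical certificates, deliberately out of order and with one extra.
received = [
    {"subject": "Example Intermediate CA", "issuer": "Example Root CA"},
    {"subject": "Unrelated CA", "issuer": "Some Other Root"},
    {"subject": "www.example.com", "issuer": "Example Intermediate CA"},
]
print(build_chain(received, "www.example.com", {"Example Root CA"}))
```

If the intermediate certificate were missing from the list, the function would raise an error, which is the point at which a client would have to fetch the issuer from the location given in the caIssuers entry.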

To help developers ensure their apps are protected against man-in-the-middle attacks resulting from corrupted SSL certificates, Google recently released a tool called nogotofail. As PC World's Lucian Constantin explains in a November 4, 2014, article, apps become vulnerable to such attacks because of bad client configurations or unpatched libraries that may override secure default settings.

Nogotofail simulates man-in-the-middle attacks using deep packet inspection to track all SSL/TLS traffic rather than monitoring only the two ports usually associated with secure connections, such as port 443. The tool can be deployed as a router, VPN server, or network proxy.

Security is at the heart of the new Morpheus Virtual Appliance, which lets you seamlessly provision and manage SQL, NoSQL, and in-memory databases across hybrid clouds. Each database instance you create includes a free full replica set for built-in fault tolerance and failover. You can administer your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases from a single dashboard via a simple point-and-click interface. 

Visit the Morpheus site to sign up for a FREE Trial!

The Benefits of Virtual Appliances Expand to Encompass Nearly All Data Center Ops


Virtual appliances deliver the potential to enhance data security and operational efficiency in IT departments of all shapes, sizes, and types. As the technology expands to encompass ever more data-center operations, it becomes nearly impossible for managers to exclude virtual appliances from their overall IT strategies.

Why have virtual appliances taken the IT world by storm? They just make sense. By combining applications with just as much operating system and other resources as they need, you're able to minimize overhead and maximize processing efficiency. You can run the appliances on standard hardware or in virtual machines.

At the risk of sounding like a late-night TV commercial, "But wait, there's more!" The Turnkey Linux site summarizes several other benefits of virtual appliances: they streamline complicated, labor-intensive processes; they make software deployment a breeze by encapsulating all the app's dependencies, thus precluding conflicts due to incompatible OSes and missing libraries; and last but not least, they enhance security by running in isolation, so a problem with or breach of one appliance doesn't affect any other network components.

In fact, the heightened focus in organizations of all sizes on data security is the impetus that will lead to a doubling of the market for virtual security appliances between 2013 and 2018, according to research firm Infonetics. The company forecasts that revenues from virtual security appliances will total $1.2 billion in 2018, as cited by Fierce IT's Fred Donovan in a November 11, 2014, article.


Growth in the market for virtual appliances will be spurred in large part by increased emphasis on data security in organizations. Source: Infonetics Research, via Fierce IT

In particular, virtual appliances are seen as the primary platform for implementation of software-defined networks and network functions virtualization, both of which are expected to boom starting in 2016, according to Infonetics.

The roster of top-notch virtual appliances continues to grow

There are now virtual appliances available for such core functions as ERP, CRM, content management, groupware, file serving, help desks, and domain controllers. TechRepublic's Jack Wallen lists 10 of his favorite virtual appliances, which include the Drupal appliance, LAMP stack, Zimbra appliance, Openfiler appliance, and the Opsview Core Virtual Appliance.

If you prefer the DIY approach, the TKLDev development environment for Turnkey Linux appliances claims to make building Turnkey Core from scratch as easy as running make.

The TKLDev tool lets you build Turnkey Core simply by running make. Source: Turnkey Linux

The source code for all the appliances in the Turnkey library is available on GitHub, as are all other repositories and the TKLDev documentation.

Also available are the Turnkey LXC (LinuX Containers) and Turnkey LXC appliance. Turnkey LXC is described by Turnkey Linux's Alon Swartz in a December 19, 2013, post as a "middle ground between a chroot on steroids and a full fledged virtual machine." The environment allows multiple isolated containers to be run on a single host.

The most recent addition to the virtual-appliance field is the Morpheus Virtual Appliance, which is the first and only database provisioning and management platform that supports private, public, and hybrid clouds. Morpheus offers the simplest way to provision heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases.

The Morpheus Virtual Appliance offers real-time monitoring and analysis of all your databases via a single dashboard to provide instant insight into consumption and availability of system resources. A free full replica set is provisioned for each database instance, and backups are created for your MySQL and Redis databases.

Visit the Morpheus site to create a free trial account. You'll also find out how to get started using Morpheus, which is the only database-as-a-service to support SQL, NoSQL, and in-memory databases.

When One Data Model Just Won't Do: Database Design that Supports Polyglot Persistence


The demands of modern database development mandate an approach that matches the model (structured or unstructured) to the nature of the underlying data, as well as the way the data will be used. Choice of data model is no longer an either/or proposition: now you can have your relational and key-value, too. The multimodel approach must be applied deliberately to reduce operational complexity and ensure reliability.

"When your only tool is a hammer, all your problems start looking like nails." Too often that old adage has applied to database design: When your only tool is a relational DBMS, all your data starts to look structured.

Well, today's proliferation of data types defies squeezing it all into a single model. The age of the multimodel database has arrived, and developers are responding by adopting designs that apply the most appropriate model to the various data types that comprise their diverse databases.

In a January 6, 2015, article on InfoWorld, FoundationDB's Stephen Pimentel explains that the rise in NoSQL, JSON, graphs, and other non-SQL data models is the result of today's applications needing to work with various data types and storage requirements. Rather than creating multiple distinct databases, developers are increasingly basing their databases on a single backend that supports multiple data models.

Data scientist Martin Fowler describes polyglot persistence as the ability of applications to manage their own data using various technologies based on the characteristics and use of that data. Rather than selecting the tool first and then fitting the data to the tool, developers will determine how the various data elements will be manipulated and then will choose the appropriate tools for those specific purposes.

Multimodel databases apply different data models in a single database based on the characteristics of various data elements. Source: Martin Fowler

Multimodel databases are by definition more complicated than their single-model counterparts. Managing this complexity is the principal challenge of developers, primarily because each data storage mechanism requires its own interface and creates a potential performance bottleneck. However, the alternative of attempting to apply the relational model to NoSQL-type unstructured data will require a tremendous amount of development and maintenance effort.

Putting the multimodel database design into practice

John P. Wood highlights the primary shortcoming of RDBMSs in clustered environments: the way they enforce data integrity places inordinate demands on processing power and storage requirements. RDBMSs depend on fast, simple access to all data continually to prevent duplicates, enforce constraints, and otherwise maintain the database.

While you can scale out relational databases via master-slave replication, sharding, and other approaches, doing so increases the app's complexity. More importantly, a key-value store is often a better fit for that data than an RDBMS's rows and columns, even with object-relational mapping tools.

Wood describes two scenarios in which polyglot persistence improves database performance: when performing complex calculations on massive data sets; and when needing to store data that varies greatly from document to document, or that is constantly changing structure. In the first instance, data is moved from the relational to the NoSQL database and then processed by the application to maximize the benefits of clustering. In the second, structure is applied to the document on the fly to allow data inside the document to be queried.

The basic relational (SQL) model compared to the document (NoSQL) model. Source: Aaron Stannard
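Wood's second scenario, applying structure to a document on the fly, can be illustrated with a few lines of code. The sketch below is an assumption-laden example rather than anything from Wood's article: the documents, field names, and the project() helper are all hypothetical. It shows documents with differing shapes being read through a small schema at query time.

```python
import json

# Hypothetical documents whose structure varies from one to the next.
RAW_DOCS = [
    '{"sku": "A1", "price": 9.5, "tags": ["red"]}',
    '{"sku": "B2", "price": "12", "dimensions": {"w": 3, "h": 4}}',
    '{"sku": "C3"}',
]

# The "schema" applied at read time: field name -> type to coerce to.
SCHEMA = {"sku": str, "price": float}


def project(doc, schema):
    """Pick the schema's fields out of a document, coercing types and
    filling in None where a document has no value for a field."""
    row = {}
    for name, caster in schema.items():
        value = doc.get(name)
        row[name] = caster(value) if value is not None else None
    return row


rows = [project(json.loads(raw), SCHEMA) for raw in RAW_DOCS]

# With structure applied, the documents can be queried like rows.
cheap = [r["sku"] for r in rows if r["price"] is not None and r["price"] < 10]
print(cheap)
```

The stored documents are never rewritten; only the view the application sees is structured, which is what lets the underlying data keep changing shape.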

The trend toward supporting multiple data models in a single database is evident in the new Morpheus Virtual Appliance, which supports heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. Morpheus lets you monitor and analyze all your databases using a single dashboard to provide instant insight into consumption and availability of system resources.

The Morpheus Virtual Appliance is the first and only database provisioning and management platform that works with private, public, and hybrid clouds. A free full replica set is provisioned for each database instance, and backups are created for your MySQL and Redis databases.

Visit the Morpheus site to create a free trial account!

Relational or Graph: Which Is Best for Your Database?


Choosing between the structured relational database model and the "unstructured" graph model is less and less an either-or proposition. For some organizations, the best approach is to process their graph data using standard relational operators, while others are better served by migrating their relational data to a graph model.

The conventional wisdom is that relational is relational and graph is graph, and never the twain shall meet. In fact, relational and graph databases now encounter each other all the time, and both can be better off for it.

The most common scenario in which "unstructured" graph data coexists peaceably with relational schema is placement of graph content inside relational database tables. Alekh Jindal of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) points out in a July 9, 2014, post on the Intel Science and Technology Center for Big Data blog that most graph data originates in an RDBMS.

Rather than extract the graph data from the RDBMS for import to a graph processing system, Jindal suggests applying the graph-analytics features of the relational database. When a graph is stored as a set of nodes and a set of edges in an RDBMS, built-in relational operators such as selection, projection, and join can be applied to capture node/edge access, neighborhood access, graph traversal, and other basic graph operations. Combining these basic operations makes possible more complex analytics.
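As a rough illustration of the idea (not Jindal's own code), the sketch below stores a tiny graph as a nodes table and an edges table in SQLite, using Python's built-in sqlite3 module. The table names, column names, and sample data are assumptions. Neighborhood access is a join, two-hop access is a self-join on the edge table, and traversal uses a recursive query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (src INTEGER, dst INTEGER);
    INSERT INTO nodes VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO edges VALUES (1, 2), (2, 3), (3, 4), (1, 3);
""")


def neighbors(node_id):
    """Neighborhood access: selection on edges, join to nodes, projection."""
    rows = conn.execute(
        "SELECT n.name FROM edges e JOIN nodes n ON n.id = e.dst "
        "WHERE e.src = ? ORDER BY n.name",
        (node_id,),
    )
    return [row[0] for row in rows]


def two_hop(node_id):
    """Two-hop access: a self-join of the edge table."""
    rows = conn.execute(
        "SELECT DISTINCT n.name FROM edges e1 "
        "JOIN edges e2 ON e2.src = e1.dst "
        "JOIN nodes n ON n.id = e2.dst "
        "WHERE e1.src = ? ORDER BY n.name",
        (node_id,),
    )
    return [row[0] for row in rows]


def reachable(node_id):
    """Graph traversal: a recursive query; UNION drops repeats, so cycles end."""
    rows = conn.execute(
        "WITH RECURSIVE reach(id) AS ("
        "  SELECT dst FROM edges WHERE src = ?"
        "  UNION"
        "  SELECT e.dst FROM edges e JOIN reach r ON e.src = r.id"
        ") SELECT n.name FROM reach JOIN nodes n ON n.id = reach.id "
        "ORDER BY n.name",
        (node_id,),
    )
    return [row[0] for row in rows]


print(neighbors(1), two_hop(1), reachable(1))
```

The same three queries would run largely unchanged on a full RDBMS that supports recursive common table expressions.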

Similarly, stored procedures can be used as driver programs to capture the iterative operations of graph algorithms. The down side of expressing graph analytics as SQL queries is the performance hit resulting from multiple self-joins on tables of nodes and edges. Query pipelining and other parallel-processing features of RDBMSs can be used to mitigate any resulting slowdowns.
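The driver-program pattern can be sketched with a loop that issues one set-based SQL statement per iteration. In the example below, Python plays the role a stored procedure would play inside the database (SQLite has no stored procedures), and the algorithm is a simple unweighted shortest-path search rather than the PageRank or ShortestPath implementations Jindal tested. Table and function names are hypothetical.

```python
import sqlite3


def shortest_hops(edge_list, source):
    """Return {node: hop count from source} using one SQL join per iteration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.execute("CREATE TABLE dist (node INTEGER PRIMARY KEY, hops INTEGER)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edge_list)
    conn.execute("INSERT INTO dist VALUES (?, 0)", (source,))

    level = 0
    while True:
        # One relational step: join the current frontier to the edge table and
        # record every newly reached node at distance level + 1.
        cursor = conn.execute(
            "INSERT INTO dist (node, hops) "
            "SELECT DISTINCT e.dst, ? FROM edges e "
            "JOIN dist d ON d.node = e.src "
            "WHERE d.hops = ? AND e.dst NOT IN (SELECT node FROM dist)",
            (level + 1, level),
        )
        if cursor.rowcount == 0:
            break  # no new nodes were reached, so the iteration has converged
        level += 1

    result = dict(conn.execute("SELECT node, hops FROM dist"))
    conn.close()
    return result


# Hypothetical edge list containing a cycle (4 -> 1).
print(shortest_hops([(1, 2), (2, 3), (3, 4), (1, 3), (4, 1)], source=1))
```

Each pass through the loop is a join of the distance table against the edge table, which is the repeated self-join cost the paragraph above describes.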

When Jindal compared the performance of a column-oriented relational database and Apache Giraph on PageRank and ShortestPath, the former outperformed the latter in two graph-analytics datasets: one from LiveJournal with 4.8 million nodes and 68 million edges; and one from Twitter with 41 million nodes and 1.4 billion edges.


A column-oriented RDBMS matched or exceeded the performance of a native graph database in processing two graph datasets. Source: Alekh Jindal, MIT CSAIL

When migrating data from relational to graph makes sense

While there are many instances in which extending the relational model to accommodate graph-data processing is the best option, there are others where a switch to the graph model is called for. One such case is the massive people database maintained by Whitepages, which resided for many years in siloed PostgreSQL, MySQL, and Oracle databases.

As explained in a November 12, 2014, post on Linkurious, Whitepages discovered that many of its business customers were using the directory to ask graph-like questions, primarily for fraud prevention. In particular, the businesses wanted to know whether a particular phone number was associated with a real person at a physical address, and what other phone numbers and addresses have been associated with a particular person.
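Questions like these map naturally onto a graph in which people, phone numbers, and addresses are all nodes. The sketch below is a toy in-memory model with made-up data, not the Whitepages schema or API; the ContactGraph class and its method names are hypothetical.

```python
from collections import defaultdict


class ContactGraph:
    """Undirected graph whose nodes are (kind, value) tuples."""

    def __init__(self):
        self._adjacent = defaultdict(set)

    def link(self, a, b):
        """Record that two entities have been associated with each other."""
        self._adjacent[a].add(b)
        self._adjacent[b].add(a)

    def associated(self, entity, kind):
        """Return neighbors of `entity` that are of the given kind."""
        return sorted(n for n in self._adjacent.get(entity, ()) if n[0] == kind)

    def phone_has_person_at_address(self, phone):
        """Is this number tied to at least one person who has an address?"""
        people = self.associated(phone, "person")
        return any(self.associated(person, "address") for person in people)


# Made-up sample data.
graph = ContactGraph()
pat = ("person", "Pat Example")
graph.link(pat, ("phone", "555-0100"))
graph.link(pat, ("phone", "555-0199"))
graph.link(pat, ("address", "1 Main St"))

print(graph.phone_has_person_at_address(("phone", "555-0100")))
print(graph.associated(pat, "phone"))
```

Both questions are answered by following one or two edges from a starting node, with no joins across separate tables.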

The development team hired by Whitepages used the Titan scalable graph database to meet the company's need for scalability, availability, high performance (processing 30,000 vertices per second), and high ingest rate (greater than 200 updates per second). The resulting graph schema more accurately modeled the way Whitepages customers were querying the database: from location to location, and number to number.


The Whitepages graph schema tracks people as they change physical address and telephone number, among other attributes. Source: Linkurious

Whitepages has made its graph infrastructure available to the public via the WhitePages PRO API 2.0.

Whether you find your organization's data better suited to either the graph or relational model, the Morpheus Virtual Appliance will help you with real-time database and system operational insights. Get your MongoDB, MySQL, Elasticsearch, or Redis databases provisioned with a simple point-and-click interface, and manage SQL, NoSQL, and In-Memory databases across hybrid clouds. 

New Compilers Streamline Optimization and Enhance Code Conversion


Researchers are developing compiler technologies that optimize and regenerate code in multiple languages and for many different platforms in only one or a handful of steps. While much of their work focuses on Java and JavaScript, their innovations will impact developers working in nearly all programming languages.

Who says you can't teach an old dog new tricks? One of the staples of any developer's code-optimization toolkit is a compiler, which checks your program's syntax, semantics, and other aspects for errors and otherwise optimizes its performance.

Infostructure Associates' Wayne Kernochan explains in an October 2014 TechTarget article that compilers are particularly adept at improving the performance of big data and business-critical online transaction processing (OLTP) applications. As recent developments in compiler technology point out, the importance of the programs goes far beyond these specialty apps.

Google is developing two new Java compilers named Jack (Java Android Compiler Kit) and Jill (Jack Intermediate Library Linker) that are part of Android SDK 21.1. I Programmer's Harry Fairhead writes in a December 12, 2014, article that Jack compiles Java code directly to a .dex Dalvik Executable rather than using the standard javac compiler to convert the source code to Java bytecode and then to Dalvik bytecode by feeding it through the dex compiler.

In addition to skipping the conversion to Java bytecode, Jack also optimizes and applies Proguard's obfuscation in a single step. The .dex code Jack generates can be fed to either the Dalvik engine or the new ART Android RunTime Engine, which uses Ahead-of-Time compilation to improve speed.

Jill converts .jar library files into the .jack library format to allow it to be merged with the rest of the object code.

Google's new Jack and Jill Java compilers promise to speed up compilation by generating Dalvik bytecode without first having to convert it from Java bytecode. Source: I Programmer

In addition to streamlining compilation, Jack and Jill reduce Google's reliance on Java APIs, which are the subject of the company's ongoing lawsuit with Oracle. At present, the compilers don't support Java 8, but in terms of retaining compatibility with Java, it appears Android has become the tail wagging the dog.

Competition among open-source compiler infrastructures heats up

The latest versions of the LLVM and GNU Compiler Collection (GCC) are in a race to see which can out-perform the other. Both open-source compiler infrastructures generate object code from any kind of source code; they support C/C++, Objective-C, Fortran, and other languages. InfoWorld's Serdar Yegulalp reports in a September 8, 2014, article that testing conducted by Phoronix of LLVM 3.5 and a pre-release version of GCC 5 found that LLVM recorded faster C/C++ compile times. However, LLVM trailed GCC when processing some encryption algorithms and other tests.

Version 3.5 of the LLVM compiler infrastructure outperformed GCC 5 in some of Phoronix's speed tests but trailed in others, including audio encoding. Source: Phoronix

The ability to share code between JavaScript and Windows applications is a key feature of the new DuoCode compiler, which supports cross-compiling of C# code into JavaScript. InfoWorld's Paul Krill describes the new compiler in a January 22, 2015, article. DuoCode uses Microsoft's Roslyn compiler for code parsing, syntactic tree (AST) generation, and contextual analysis. DuoCode then handles the code translation and JavaScript generation, including source maps.

Another innovative approach to JavaScript compiling is the experimental Higgs compiler created by University of Montreal researcher Maxime Chevalier-Boisvert. InfoWorld's Yegulalp describes the project in a September 19, 2014, article. Higgs differs from other just-in-time (JIT) JavaScript compilers such as Google's V8, Mozilla's SpiderMonkey, and Apple's LLVM-backed FTLJIT project in that it has a single level rather than being multitiered, and it accumulates type information as machine-level code rather than using type analysis.

When it comes to optimizing your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases, the new Morpheus Virtual Appliance makes it as easy as pointing and clicking in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with a single click, and each instance includes a free full replica set for failover and fault tolerance. Your MySQL and Redis databases are backed up and you can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

The Key to Selecting a Programming Language: Focus


There isn't a single best programming language. Rather than flitting from one language to the next as each comes into fashion, determine the platform you want to develop apps for -- the web, mobile, gaming, embedded systems -- and then focus on the predominant language for that area.

"Which programming languages do you use?"

In many organizations, that has become a loaded question. There is a decided trend toward open source development tools, as indicated by the results of a Forrester Research survey of 1,400 developers. ZDNet's Steven J. Vaughan-Nichols reports on the study in an October 29, 2014, article.

Conventional wisdom says open-source development tools are popular primarily because they cost less than their proprietary counterparts. That belief is turned on its head by the Forrester survey, which found performance and reliability are the main reasons why developers prefer to work with open-source tools. (Note that Windows still dominates on the desktop, while open source leads on servers, in data centers, and in the cloud.)

Then again, "open source" encompasses a universe of different development tools for various platforms: the web, mobile, gaming, embedded systems -- the list goes on. A would-be developer can waste a lot of time bouncing from Rails to Django to Node.js to Scala to Clojure to Go. As Quincy Larson explains in a November 14, 2014, post on the FreeCodeCamp blog, the key to a successful career as a programmer is to focus.

Larson recounts his seven months of self-study of a half-dozen different programming languages before landing his first job as a developer -- in which he used none of them. Instead, his team used Ruby on Rails, a relative graybeard among development environments. The benefits of focusing on a handful of tools are many: developers quickly become experts, productivity is enhanced because people can collaborate without a difference in tools getting in the way, and programmers aren't distracted by worrying about missing out on the flavor of the month.

Larson recommends choosing a single type of development (web, gaming, mobile) and sticking with it; learning only one language (JavaScript/Node.js, Rails/Ruby, or Django/Python); and following a single online curriculum (such as FreeCodeCamp.com or NodeSchool.io for JavaScript, TheOdinProject.com or TeamTreehouse.com for Ruby, and Udacity.com for Python).


A cutout from Lifehacker's "Which Programming Language?" infographic lists the benefits of languages by platform. Source: Lifehacker

Why basing your choice on potential salary is a bad idea

Just because you can make a lot of money developing in a particular language doesn't mean it's the best career choice. Readwrite's Matt Asay points out in a November 28, 2014, article that a more rewarding criterion in the long run is which language will ensure you can find a job. Asay recommends checking RedMonk's list of popular programming languages.

Boiling the decision down to its essence, the experts quoted by Asay suggest JavaScript for the Internet, Go for the cloud, and Swift (Apple) or Java (Android) for mobile. Of course, as with most tech subjects, opinions vary widely. In terms of job growth, Ruby appears to be fading, Go and Node.js are booming, and Python is holding steady.

But don't bail on Ruby or other old-time languages just yet. According to Quartz's programmer salary survey, Ruby on Rails pays best, followed by Objective-C, Python, and Java.

While Ruby's popularity may be on the wane, programmers can still make good coin if they know Ruby on Rails. Source: Quartz, via Readwrite

Also championing old-school languages is Readwrite's Lauren Orsini in a September 1, 2014, article. Orsini cites a study by researchers at Princeton and UC Berkeley that found inertia is the primary driver of developers' choice of language. People stick with a language because they know it, not because of any particular features of the language. Exhibits A, B, C, and D of this phenomenon are PHP, Python, Ruby, and JavaScript -- and that doesn't even include the Methuselah of languages: C.

No matter your language of choice, you'll find it combines well with the new Morpheus Virtual Appliance, which lets you monitor and manage heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases from a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with a single click, and each instance includes a free full replica set for failover and fault tolerance. Your MySQL and Redis databases are backed up and you can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

MongoDB Poised to Play a Key Role in Managing the Internet of Things


Rather than out-and-out replacing their relational counterparts, MongoDB and other NoSQL databases will coexist with traditional RDBMSs. However, as more -- and more varied -- data swamps companies, the scalability and data-model flexibility of NoSQL will make it the management platform of choice for many of tomorrow's data-analysis applications.

There's something comforting in the familiar. When it comes to databases, developers and users are warm and cozy with the standard, nicely structured tables-and-rows relational format. In the not-too-distant past, nearly all of the data an organization needed fit snugly in the decades-old relational model.

Well, things change. What's changing now is the nature of a business's data. Much time and effort has been spent converting today's square-peg unstructured data into the round hole of relational DBMSs. But rather than RDBMSs being modified to support the characteristics of non-textual, non-document data, companies are now finding it more effective to adapt databases designed for unstructured data to accommodate traditional data types.

Two trends are converging to make this transition possible: NoSQL databases such as MongoDB are maturing to add the data-management features businesses require; and the amount and types of data are exploding with the arrival of the Internet of Things (IoT).

Heterogeneous DBs are the wave of the future

As ReadWrite's Matt Asay reports in a November 28, 2014, article, any DBAs who haven't yet added a NoSQL database or two to their toolbelt are in danger of falling behind. Asay cites a report by Machina Research that found relational and NoSQL databases are destined to coexist in the data center: the former will continue to be used to process "structured, highly uniform data sets," while the latter will manage the unstructured data created by "millions and millions of sensors, devices, and gateways."

Relational databases worked for decades because you could predict the characteristics of the data they held. One of the distinguishing aspects of IoT data is its unpredictability: you can't be sure where it will come from, or what forms it will take. Managing this data requires a new set of skills, which has led some analysts to caution that a severe shortage of developers trained in NoSQL may impede the industry's growth.

The expected increase in NoSQL-based development in organizations could be hindered by a shortage of skilled staff. Source: VisionMobile, via ReadWrite

The ability to scale to accommodate data elements measured in the billions is a cornerstone of NoSQL databases, but Asay points out the feature that will drive NoSQL adoption is flexible data modeling. Whatever devices or services are deployed in the future, NoSQL is ready for them.

Document locking one sign of MongoDB's growing maturity

According to software consultant Andrew C. Oliver -- a self-described "unabashed fan of MongoDB" -- the highlight of last summer's MongoDB World conference was the announcement that document-level locking is now supported. Oliver gives his take on the conference happenings in a July 3, 2014, article on InfoWorld.

Oliver compares MongoDB's document-level locking to row-level locking in an RDBMS, although documents may contain much more data than a row in an RDBMS. Some conference-goers projected that multiple documents may one day be written with ACID consistency, even if done so "locally" to a single shard.
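The practical benefit of finer-grained locking is that writers to different documents no longer wait on each other. The toy store below illustrates the general idea with one lock per document; it is a conceptual sketch in plain Python, not how MongoDB or WiredTiger implements concurrency control, and the class and method names are hypothetical.

```python
import threading


class DocumentStore:
    """Toy in-memory store that holds one lock per document id."""

    def __init__(self):
        self._entries = {}  # doc_id -> (lock, document dict)
        self._registry_lock = threading.Lock()  # guards the entry table only

    def _entry(self, doc_id):
        # Held briefly, just long enough to look up or create the entry.
        with self._registry_lock:
            if doc_id not in self._entries:
                self._entries[doc_id] = (threading.Lock(), {})
            return self._entries[doc_id]

    def increment(self, doc_id, field, amount=1):
        lock, doc = self._entry(doc_id)
        with lock:  # blocks only writers to this same document
            doc[field] = doc.get(field, 0) + amount

    def get(self, doc_id):
        lock, doc = self._entry(doc_id)
        with lock:
            return dict(doc)


def hammer(store, doc_id, times):
    for _ in range(times):
        store.increment(doc_id, "hits")


store = DocumentStore()
threads = [
    threading.Thread(target=hammer, args=(store, doc_id, 500))
    for doc_id in ("a", "b")
    for _ in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(store.get("a"), store.get("b"))
```

With a single store-wide lock, the threads updating document "a" would also have to wait for every update to document "b"; here they contend only with writers to the same document.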

Another indication of MongoDB becoming suitable for a wider range of applications is the release of the SlamData analytics tool that works without having to export data via ETL from MongoDB to an RDBMS or Hadoop. InfoWorld's Oliver describes SlamData in a December 11, 2014, article.

In contrast to the Pentaho business-intelligence tool that also supports MongoDB, SlamData CEO Jeff Carr states that the company's product doesn't require a conversion of document databases to the RDBMS format. SlamData is designed to allow people familiar with SQL to analyze data based on queries of MongoDB document collections via a notebook-like interface.

The SlamData business-intelligence tool for MongoDB uses a notebook metaphor for charting data based on collection queries. Source: InfoWorld

There's no simpler or more-efficient way to manage heterogeneous databases than by using the point-and-click interface of the new Morpheus Virtual Appliance, which lets you monitor and analyze heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with one click, and each instance includes a free full replica set for failover and fault tolerance. You can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

MongoDB 3.0 First Look: Faster, More Storage Efficient, Multi-model


Document-level locking and pluggable storage APIs top the list of new features in MongoDB 3.0, but the big-picture view points to a more prominent role for NoSQL databases in companies of all types and sizes. The immediate future of databases is relational, non-relational, and everything in between -- sometimes all at once.

Version 3.0 of MongoDB, the leading NoSQL database, is being touted as the first release that is truly ready for the enterprise. The new version was announced in February and shipped in early March. At least one early tester, Adam Comerford, reports that MongoDB 3.0 is indeed more efficient at managing storage, and faster at reading compressed data.

The new feature in MongoDB 3.0 gaining the lion's share of analysts' attention is the addition of the WiredTiger storage engine and pluggable API that MongoDB acquired in December 2014. JavaWorld's Andrew C. Oliver states in a February 3, 2015, article that WiredTiger will likely boost performance over MongoDB's default MMapV1 engine in apps where reads don't greatly outnumber writes.

Oliver points out that WiredTiger's B-tree and Log Structured Merge (LSM) algorithms benefit apps with large caches (B-tree) and with data that doesn't cache well (LSM). WiredTiger also promises data compression that reduces storage needs by up to 80 percent, according to the company.

The addition of the WiredTiger storage engine is one of the new features in MongoDB 3.0 that promises to improve performance, particularly for enterprise customers. Source: Software Development Times

Other enhancements in MongoDB 3.0 include the following:

  • Document-level locking for concurrency control via WiredTiger
  • Collection-level concurrency control and more efficient journaling in MMapV1
  • A pluggable API for integration with in-memory, encrypted, HDFS, hardware-optimized, and other environments
  • The Ops Manager graphical management console in the enterprise version

Computing's John Leonard emphasizes in a February 3, 2015, article that MongoDB 3.0's multi-model functionality via the WiredTiger API positions the database to compete with DataStax' Apache Cassandra NoSQL database and Titan graph database. Leonard also highlights the new version's improved scalability.


Putting MongoDB 3.0 to the (performance) test

MongoDB 3.0's claims of improved performance were borne out by preliminary tests conducted by Adam Comerford and reported on his Adam's R&R blog in posts on February 4, 2015, and February 5, 2015. Comerford repeated compression tests with the WiredTiger storage engine in release candidate 7 (RC7) -- expected to be the last before the final version comes out in March -- that he ran originally using RC0 several months ago. The testing was done on an Ubuntu 14.10 host with an ext4 file system.

The results showed that WiredTiger's on-disk compression reduced storage to 24 percent of non-compressed storage, and to only 16 percent of the storage space used by MMapV1. Similarly, the defaults for WiredTiger with MongoDB (the WT/snappy bar below) used 50 percent of non-compressed WiredTiger and 34.7 percent of MMapV1.
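As a rough cross-check of the percentages quoted above, the short sketch below works out what each pair of figures implies about the size of non-compressed WiredTiger relative to MMapV1. The variable names are made up for illustration, and the inputs are only the four percentages reported in the text, not Comerford's raw measurements.

```python
# Figures quoted above, expressed as fractions of a baseline size.
best_vs_uncompressed_wt = 0.24    # strongest compression vs. non-compressed WiredTiger
best_vs_mmapv1 = 0.16             # strongest compression vs. MMapV1
snappy_vs_uncompressed_wt = 0.50  # WiredTiger default (snappy) vs. non-compressed WiredTiger
snappy_vs_mmapv1 = 0.347          # WiredTiger default (snappy) vs. MMapV1


def implied_uncompressed_ratio(vs_mmapv1, vs_uncompressed_wt):
    """Size of non-compressed WiredTiger as a fraction of MMapV1,
    implied by one pair of reported percentages."""
    return vs_mmapv1 / vs_uncompressed_wt


from_best = implied_uncompressed_ratio(best_vs_mmapv1, best_vs_uncompressed_wt)
from_snappy = implied_uncompressed_ratio(snappy_vs_mmapv1, snappy_vs_uncompressed_wt)

print(f"implied by strongest compression: {from_best:.1%}")
print(f"implied by snappy default:        {from_snappy:.1%}")
```

Both pairs suggest that even non-compressed WiredTiger needed only about two-thirds to 70 percent of the space used by MMapV1; the small gap between the two results is presumably rounding in the reported figures.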

Testing WiredTiger storage (compressed and non-compressed) compared to MMapV1 storage showed a tremendous advantage for the new MongoDB storage engine. Source: Adam Comerford

Comerford's tests of the benefits of compression for reads when available I/O capacity is limited demonstrated much faster performance when reading data compressed with either snappy or zlib. A relatively slow external USB 3.0 drive was used to simulate "reasonable I/O constraints." The times indicate how long it took to read the entire 16GB test dataset produced by the on-disk testing into memory from that same disk.

Read tests from compressed and non-compressed disks in a simulated limited-storage environment indicate faster reads with WiredTiger in all scenarios. Source: Adam Comerford


All signs point to a more prominent role in organizations of all sizes for MongoDB in particular and NoSQL in general. Running relational and non-relational databases side-by-side is becoming the rule rather than the exception. The new Morpheus Virtual Appliance puts you in a good position to be ready for multi-model database environments. It supports rapid provisioning and deployment of MongoDB v3.0 across public, private, and hybrid clouds. Sign Up for a Free Trial now!

Preparing Developers for a Multi-language Multi-paradigm Future


Tried-and-true languages such as Java, C++, Python, and JavaScript continue to dominate the most popular lists, but modern app development requires a multi-language approach to support diverse platforms and links to backend servers. The future will see new languages being used in conjunction with the old reliables.

Every year, new programming languages are developed. Recent examples are Apple's Swift and Carnegie Mellon University's Wyvern. Yet for more than a decade, the same handful of languages have retained their popularity with developers -- Java, JavaScript, C/C++/C#/Objective-C, Python, Ruby, PHP -- even though each is considered to have serious shortcomings for modern app development.

According to TIOBE Software's TIOBE Index for January 2015, JavaScript recorded the greatest increase in popularity in 2014, followed by PL/SQL and Perl.

The same old programming languages dominate the popularity polls, as shown by the most-recent TIOBE Index. Source: TIOBE Software

Of course, choosing the best language for any development project rarely boils down to a popularity contest. When RedMonk's Donnie Berkholz analyzed GitHub language trends in May 2014, aggregating new users, issues, and repositories, he concluded that only five languages have mattered on GitHub since 2008: JavaScript, Ruby, Java, PHP, and Python.


An analysis of language activity on GitHub between 2008 and 2013 indicates growing fragmentation. Source: RedMonk

Two important caveats to Berkholz's analysis are that GitHub focused on Ruby on Rails when it launched but has since gone more mainstream; and that Windows and iOS development barely register because both are generally open source-averse. As IT World's Phil Johnson points out in a May 7, 2014, article, while it's dangerous to draw conclusions about language popularity based on this or any other single analysis, it seems clear the industry is diverging rather than converging.

Today's apps require a multi-language, multi-paradigm approach

Even straightforward development projects require expertise in multiple languages. TechCrunch's Danny Crichton states in a July 10, 2014, article that creating an app for the web and mobile entails HTML, CSS, and JavaScript for the frontend (others as well, depending on the libraries required); Java and Objective-C (or Swift) for Android and iPhone, respectively; and for links to backend servers, Python, Ruby, or Go, as well as SQL or other database query languages.

Crichton identifies three trends driving multi-language development. The first is faster adoption of new languages: GitHub and similar sites encourage broader participation in developing libraries and tutorials; and developers are more willing to learn new languages. Second, apps have to run on multiple platforms, each with unique requirements and characteristics. And third, functional programming languages are moving out of academia and into the mainstream.

Researcher Benjamin Erb suggests that rather than functional languages replacing object-oriented languages, the future will be dominated by multi-paradigm development, in particular to address concurrency requirements. In addition to supporting objects, inheritance, and imperative code, multi-paradigm languages incorporate higher-order functions, closures, and restricted mutability.

One way to future-proof your SQL, NoSQL, and in-memory databases is by using the new Morpheus Virtual Appliance, which lets you manage heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with one click, and each instance includes a free full replica set for failover and fault tolerance. You can administer your databases using your choice of tools. Visit the Morpheus site for pricing information and to create a free account.

Troubleshooting Problems with MySQL Replication


One of the most common MySQL operations is replicating databases between master and slave servers. While most such connections are straightforward to establish and maintain, on occasion something goes amiss: some master data may not replicate on the slave, or read requests may be routed to the master rather than to the slave, for example. Finding a solution to a replication failure sometimes requires a little extra detective work.

Replication is one of the most basic operations in MySQL -- and any other database: it's used to copy data from one database server (the master) to one or more others (the slaves). The process improves performance by allowing loads to be distributed among multiple slave servers for reads, and by limiting the master server to writes.

Additional benefits of replication are security via slave backups; analytics, which can be performed on the slaves without affecting the master's performance; and widespread data distribution, which is accomplished without requiring access to the master. (See the MySQL Reference Manual for more on replication.)

As with any other aspect of database management, replication doesn't always proceed as expected. The Troubleshooting Replication section of the MySQL Reference Manual instructs you to check for messages in your error log when something goes wrong with replication. If the error log doesn't point you to the solution, ensure that binary logging is enabled in the master by issuing a SHOW MASTER STATUS statement. If it's enabled, "Position" is nonzero; if it isn't, make sure the master is running with the --log-bin option.

The manual offers several other replication-troubleshooting steps:

  • The master and slave must both start with the --server-id option, and each server must have a unique ID value;
  • Run SHOW SLAVE STATUS to ensure the Slave_IO_Running and Slave_SQL_Running values are both "yes";
  • Run SHOW PROCESSLIST and look in the State column to verify that the slave is connecting to the master;
  • If a statement succeeded on the master but failed on the slave, the nuclear option is to do a full database resynchronization, which entails deleting the slave's database and copying a new snapshot from the master. (Several less-drastic alternatives are described in the MySQL manual.)
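The checks in the list above can be sketched as a small function. This is only an illustration: diagnose_replication is a hypothetical helper, and it assumes the caller has already fetched one row each from SHOW MASTER STATUS and SHOW SLAVE STATUS as plain dictionaries (column name to value), however that is done in your environment.

```python
def diagnose_replication(master_status, slave_status, master_server_id, slave_server_id):
    """Return a list of problems found by the basic replication checks.

    master_status / slave_status: dicts holding one row of
    SHOW MASTER STATUS / SHOW SLAVE STATUS output (assumed input format).
    """
    problems = []

    # An empty result or a zero Position suggests binary logging is off.
    if not master_status or int(master_status.get("Position", 0)) == 0:
        problems.append("binary logging appears disabled; start the master with --log-bin")

    # Each server needs its own --server-id value.
    if master_server_id == slave_server_id:
        problems.append("master and slave share the same server ID")

    # Both replication threads must report "Yes".
    for column in ("Slave_IO_Running", "Slave_SQL_Running"):
        if str(slave_status.get(column, "")).lower() != "yes":
            problems.append(f"{column} is not 'Yes'")

    return problems


# Example with made-up values: the SQL thread has stopped on the slave.
report = diagnose_replication(
    {"File": "mysql-bin.000003", "Position": 107},
    {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "No"},
    master_server_id=1,
    slave_server_id=2,
)
print(report)
```

An empty list means none of these basic checks failed; it does not prove that replication is healthy, so the error log remains the first place to look.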

Solutions to real-world MySQL replication problems

What do you do when MySQL indicates the master-slave connection is in order, yet some data on the master isn't being copied to the slave? That's the situation described in a Stack Overflow post from March 2010.

Even though replication appears to be configured correctly, data is not being copied from the master to the slave. Source: Stack Overflow

The first step is to run "show master status" or "show master status\G" on the master database to get the correct values for the slave. The slave status above indicates the slave is connected to the master and awaiting log events. Synching the correct log file position should restore copying to the slave.

To ensure a good sync, stop the master, dump the database, record the master log file positions, restart the master, import the database to the slave, and start the slave in slave mode with the correct master log file position.
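Pointing the slave at the recorded position is done with a CHANGE MASTER TO statement. The sketch below only formats that statement from the file name and position noted on the master; build_change_master is a hypothetical helper, the sample coordinates are invented, and actually running the statement (normally with the slave threads stopped beforehand and restarted afterward) is left to whatever client you use.

```python
import re


def build_change_master(log_file, log_pos):
    """Format a CHANGE MASTER TO statement from recorded master coordinates."""
    # Binary log names are plain file names; reject anything unexpected
    # rather than interpolating it into SQL.
    if not re.fullmatch(r"[\w.\-]+", log_file):
        raise ValueError(f"unexpected binary log file name: {log_file!r}")
    if not isinstance(log_pos, int) or log_pos < 0:
        raise ValueError(f"log position must be a non-negative integer: {log_pos!r}")
    return (
        f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', "
        f"MASTER_LOG_POS={log_pos};"
    )


# Values as they might be read from SHOW MASTER STATUS (made up here).
statement = build_change_master("mysql-bin.000003", 107)
print(statement)
```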

Another Stack Overflow post from March 2014 presents a master/slave setup using JDBC drivers in which transactions marked as read-only were still pinging the master. Since the MySQL JDBC driver was managing the connections to the physical servers -- master and slave -- the connection pool and Spring transaction manager weren't aware that the database connection was linking to multiple servers.

The solution is to return control to Spring, after which the transaction on the connection will be committed. The transaction debug message will indicate that queries will be routed to the slave server so long as the connection is in read-only mode. By resetting the connection before it is returned to the pool, the read-only mode is cleared and the last log message will show that queries are now being routed to the master server.

The point-and-click dashboard in the new Morpheus Virtual Appliance makes it a breeze to diagnose and repair replication errors -- and other hiccups -- in your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. Morpheus lets you seamlessly provision, monitor, and analyze SQL, NoSQL, and in-memory databases across hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and failover.

With the Morpheus database-as-a-service (DBaaS), you can migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to create a free account.

"Too Many Connections": How to Increase the MySQL Connection Count To Avoid This Problem


If your MySQL server doesn't allow enough simultaneous connections, your users will begin to receive a "Too many connections" error while trying to use your service. To fix this, you can increase the maximum number of connections the database allows, but there are some things to take into consideration before simply ramping up this number.

Items to Consider

Before you increase the connections limit, you will want to ensure that the machine on which the database is housed can handle the additional workload. The maximum number of connections that can be supported depends on the following variables:

  • The available RAM - The system will need to have enough RAM to handle the additional workload.
  • The thread library quality of the platform - This will vary based on the platform. For example, Windows can be limited by the Posix compatibility layer it uses (though the limit no longer applies to MySQL v5.5 and up). However, there remain memory usage concerns depending on the architecture (x86 vs. x64) and how much memory can be consumed per application process.
  • The required response time - Increasing the number could increase the amount of time needed to respond to requests. This should be tested to ensure it meets your needs before going into production.
  • The amount of RAM used per connection - Again, RAM is important, so you will need to know if the RAM used per connection will overload the system or not.
  • The workload required for each connection - The workload will also factor in to what system resources are needed to handle the additional connections.

Another issue to consider is that you may also need to increase the open files limit so that enough file handles are available.
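To make the RAM considerations above concrete, here is a back-of-the-envelope sketch. It assumes a simple model in which memory use is a fixed amount for global buffers plus a fixed amount per connection; the function names, the 80 percent headroom, and every number in the example are hypothetical and should be replaced with measurements from your own server.

```python
def estimate_peak_memory_mb(global_buffers_mb, per_connection_mb, max_connections):
    """Rough worst case: shared buffers plus every connection's buffers at once."""
    return global_buffers_mb + per_connection_mb * max_connections


def fits_in_ram(global_buffers_mb, per_connection_mb, max_connections,
                available_ram_mb, headroom=0.8):
    """True if the estimate stays within a fraction (headroom) of available RAM."""
    estimate = estimate_peak_memory_mb(global_buffers_mb, per_connection_mb, max_connections)
    return estimate <= available_ram_mb * headroom


# Hypothetical server: 1 GB of global buffers, about 3 MB per connection.
print(estimate_peak_memory_mb(1024, 3, 500))
print(fits_in_ram(1024, 3, 500, available_ram_mb=4096))
print(fits_in_ram(1024, 3, 500, available_ram_mb=2048))
```

A model like this ignores workload and response time, so it can only rule out limits that are clearly too high; load testing is still needed before going into production.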

Checking the Connection Limit

To see what the current connection limit is, you can run the following from the MySQL command line or from many of the available MySQL tools such as phpMyAdmin:


The show variables command.

This will display a nicely formatted result for you:


Example result of the show variables command.

Increasing the Connection Limit

To increase the global number of connections temporarily, you can run the following from the command line:



An example of setting the max_connections global.
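The checking and setting steps described above could be expressed as the following statements, built here as Python strings so the new value can be validated first. This is a sketch: set_limit_sql is a hypothetical helper, the value 250 is arbitrary, and the bounds reflect the range MySQL documents for max_connections. Changing a global variable also requires an account with sufficient privileges.

```python
MIN_CONNECTIONS = 1
MAX_CONNECTIONS = 100000  # upper bound MySQL documents for max_connections

# Statement that reports the current limit.
SHOW_LIMIT_SQL = "SHOW VARIABLES LIKE 'max_connections';"


def set_limit_sql(new_limit):
    """Build the statement that raises the limit until the next restart."""
    if not isinstance(new_limit, int):
        raise TypeError("new_limit must be an integer")
    if not MIN_CONNECTIONS <= new_limit <= MAX_CONNECTIONS:
        raise ValueError(f"new_limit must be between {MIN_CONNECTIONS} and {MAX_CONNECTIONS}")
    return f"SET GLOBAL max_connections = {new_limit};"


print(SHOW_LIMIT_SQL)
print(set_limit_sql(250))  # 250 is an arbitrary example value
```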

If you want to make the increase permanent, you will need to edit the my.cnf configuration file. You will need to determine the location of this file for your operating system (Linux systems often store the file in the /etc folder, for example). Open this file and, in the [mysqld] section, add a line that includes max_connections, followed by an equal sign, followed by the number you want to use, as in the following example:


An example of setting max_connections in my.cnf.

The next time you restart MySQL, the new setting will take effect and will remain in place unless or until this is changed again.
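For anyone scripting the permanent change, the sketch below uses Python's standard configparser module to set max_connections in the [mysqld] section of my.cnf-style text. It is a simplified illustration under several assumptions: set_max_connections is a hypothetical helper, the sample file contents are invented, comments in the file are not preserved, and real configuration files may contain directives (such as include lines) that this parser does not handle.

```python
import configparser
import io


def set_max_connections(cnf_text, new_limit):
    """Return my.cnf-style text with max_connections set under [mysqld]."""
    # allow_no_value accepts bare options such as skip-name-resolve;
    # interpolation=None keeps any % characters untouched.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        parser.add_section("mysqld")
    parser.set("mysqld", "max_connections", str(new_limit))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Invented sample contents; 250 is an arbitrary example value.
sample = "[mysqld]\nskip-name-resolve\nmax_connections = 151\n"
updated = set_max_connections(sample, 250)
print(updated)
```

Working on a copy of the file and comparing it with the original before restarting MySQL is a sensible precaution with any automated edit like this.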

Easily Scale a MySQL Database

Instead of worrying about these settings on your own system, you could opt to use a service like Morpheus, which offers databases as a service on the cloud. With Morpheus, you can easily and quickly set up your choice of several databases (including MySQL, MongoDB, Redis, and Elasticsearch).

In addition, MySQL and Redis have automatic backups, and each database instance is replicated, archived, and deployed on a high-performance infrastructure with solid-state drives. You can start a free account today to begin taking advantage of this service!

How to Minimize Data Wrangling and Maximize Data Intelligence


It's not unusual for data analysts to spend more than half their time cleaning and converting data rather than extracting business intelligence from it. As data stores grow in size and data types proliferate, a new generation of tools is arriving that promises to put sophisticated analysis into the hands of non-data scientists.

One of the hottest job titles in technology is Data Scientist, perhaps surpassed only by the newest C-level position: Chief Data Scientist. IT's long-standing skepticism about such trends is evident in the joke cited by InfoWorld's Yves de Montcheuil that a data scientist is a business analyst who lives in California.

There's nothing funny about every company's need to translate its data into business intelligence. That's where data scientists take the lead role, but as the amount and types of data proliferate, data scientists find themselves spending the bulk of their time cleaning and converting data rather than analyzing and communicating it to business managers.

A recent survey of data scientists (registration required) conducted by IT-project crowdsourcing firm CrowdFlower found that two out of three analysts claim cleaning and organizing data is their most time-consuming task, and 52 percent report their biggest obstacle is poor quality data. While the respondents named 48 different technologies they use in their work, the most popular is Excel (55.6 percent), followed by the open source language R (43.1 percent) and the Tableau data-visualization software (26.1 percent).

Data scientists identify their greatest challenges as time spent cleaning data, poor data quality, lack of time for analysis, and ineffective data modeling. Source: CrowdFlower

What's holding data analysis back? The data scientists surveyed cite a lack of tools required to do their job effectively (54.3 percent), failure of their organizations to state goals and objectives clearly (52.3 percent), and insufficient investment in training (47.7 percent).

A dearth of tools, unclear goals, and too little training are reported as the principal impediments to data scientists' effectiveness. Source: CrowdFlower

New tools promise to 'consumerize' big data analysis

It's a common theme in technology: In the early days, only an elite few possess the knowledge and tools required to understand and use it, but over time the products improve and drop in price, businesses adapt, and the technology goes mainstream. New data-analysis tools are arriving that promise to deliver the benefits of the technology to non-scientists.

Steve Lohr profiles several of these products in an August 17, 2014, article in the New York Times. For example, ClearStory Data's software combines data from multiple sources and converts it into charts, maps, and other graphics. Taking a different approach to the data-preparation problem is Paxata, which offers software that retrieves, cleans, and blends data for analysis by various visualization tools.

The not-for-profit Open Knowledge Labs bills itself as a community of "civic hackers, data wranglers and ordinary citizens intrigued and excited by the possibilities of combining technology and information for good." The group is seeking volunteer "data curators" to maintain core data sets such as GDP and ISO-codes. OKL's Rufus Pollock describes the project in a January 3, 2015, post.

Open Knowledge Labs is seeking volunteer coders to curate core data sets as part of the Frictionless Data Project. Source: Open Knowledge Labs

There's no simpler or more straightforward way to manage your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases than by using the new Morpheus Virtual Appliance. Morpheus lets you seamlessly provision, monitor, and analyze SQL, NoSQL, and in-memory databases across hybrid clouds via a single point-and-click dashboard. Each database instance you create includes a free full replica set for built-in fault tolerance and failover.

With the Morpheus database-as-a-service (DBaaS), you can migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to create a free account.