
Upcoming OpenSSL Security Overhaul Is Long Overdue


The many OpenSSL vulnerabilities coming to light in recent months have motivated a thorough audit of the open-source project's code. But this hasn't prevented companies from implementing proprietary SSL alternatives, including application delivery controllers running a streamlined, closed SSL stack, and Google's own BoringSSL implementation.

It's only March, but it has already been a rough year for OpenSSL security. On January 8, the OpenSSL Project issued updates that addressed eight separate security holes, two of which were rated as "moderate" in severity. SC Magazine's Adam Greenberg reports on the patches in a January 8, 2015, article.

Then in the first week of March, the FREAK vulnerability was disclosed, which made one-fourth of all SSL-encrypted sites susceptible to man-in-the-middle attacks, as Informationweek Dark Reading's Kelly Jackson Higgins explains in a March 3, 2015, article.

Now site managers are sweating yet another OpenSSL patch for a security hole that could be just as serious as FREAK. In a mailing list notice posted on March 16, 2015, the OpenSSL Project's Matt Caswell announced the March 19, 2015, release of a patch for multiple OpenSSL vulnerabilities, at least one of which is classified as "high" severity. The Register's Darren Pauli reports in a March 17, 2015, post that the updates apply to OpenSSL versions 1.0.2a, 1.0.1m, 1.0.0r, and 0.9.8zf.
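
Site managers unsure which release they are running can check from a shell; a minimal sketch, assuming the openssl binary is on the PATH (whether that release is patched depends on your vendor's backports):

```sh
# Prints the installed release, e.g. "OpenSSL 1.0.1m 19 Mar 2015".
openssl version
```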

Web giants finance long-overdue OpenSSL security audit

The alert about the new OpenSSL vulnerability comes just more than a week after it was announced that the NCC Group security firm would be conducting an audit of OpenSSL code. The goal of the audit is to spot errors in the code before they are discovered in the wild, as ZDNet's Steven J. Vaughan-Nichols writes in a March 7, 2015, article.

NCC Group principal security engineer Thomas Ritter states that the OpenSSL codebase is now stable enough to undergo a thorough analysis and revision. The focus of the NCC Group audit will be on Transport Layer Security attacks related to protocol flow, state transitions, and memory management. Preliminary results are expected by early summer 2015, according to Ritter.

OpenSSL is only one of the many Secure Sockets Layer/Transport Layer Security implementations for encrypting web content. Source: Ale Agostini

Serious vulnerabilities discovered in the recent past, including Heartbleed and Early CCS in OpenSSL (and Shellshock in Bash), caused sites to rush to apply patches. The OpenSSL audit is the first project under the Linux Foundation's Core Infrastructure Initiative, which is funded in large part by contributions from Google, Amazon, Cisco Systems, Microsoft, and Facebook, as the Register's Pauli notes in a March 10, 2015, article.

Proprietary SSL implementations as a safer alternative to OpenSSL

The number and severity of OpenSSL security holes have caused some organizations to build their own proprietary SSL stack on application delivery controllers, as FirstPost's Shibu Paul describes in a March 9, 2015, article. ADCs are a new type of advanced load balancer for frontend servers that use a streamlined version of the SSL stack designed to be small enough to execute in the kernel.

An advantage of proprietary SSL stacks is that hackers don't have access to the code the way they do for open systems. If an organization discovers a vulnerability in its proprietary SSL stack, it can address the problem without the public being aware of it. It's also why companies running proprietary stacks on their ADCs weren't exposed to Heartbleed or FREAK-style man-in-the-middle attacks: those flaws live in OpenSSL code that such devices don't run.

Application delivery controllers are touted as a more-secure alternative to OpenSSL because they rely on a proprietary SSL stack. Source: Scope Middle East

Google's response to the many OpenSSL vulnerabilities was to create its own fork of the OpenSSL code, called BoringSSL. As Matthew McKenna writes in a February 25, 2015, post on the TechZone 360 site, having to manage more than 70 OpenSSL patches was making it difficult for the company to maintain consistency across its multiple code bases.

Maintaining security without impacting manageability is a key precept of the new Morpheus Virtual Appliance, which lets you provision, deploy, and monitor heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases from a single point-and-click console. With the Morpheus database-as-a-service (DBaaS) you can manage all your SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and fail over.

In addition, the service allows you to migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site for pricing information and to create a free account.


MySQL vs. MongoDB: The Pros and Cons When Building a Social Network


Choosing the right database for a project is an extremely important step in planning and development. Picking the wrong setup can cost quite a bit of time and money, and can leave you with numerous upset users in the process. Both MongoDB and MySQL are excellent databases when used in their expected ways, but which one is better for building a social network?

What is MongoDB?

MongoDB is a NoSQL database, which means that related data gets stored in single documents for fast retrieval. This is often a good model for when data won’t need to be duplicated in multiple documents (which can cause inconsistencies).

An example of a MongoDB document. Source: MongoDB.

MongoDB is easily scalable in such cases, so the database can have rapid horizontal growth while automatically keeping things in order. This can be especially good when you have large amounts of data and need a quick response time.
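
A minimal sketch of what such an embedded document might look like, entered via the mongo shell (the database, collection, and field names here are hypothetical, not MongoDB's pictured example):

```sh
mongo social --eval '
  db.users.insert({
    name: "Jane Doe",
    posts: [                                 // related data embedded in one document
      { title: "Hello world", likes: 3 },
      { title: "Second post", likes: 7 }
    ]
  })'
```

Because the user's posts live inside the same document, a single read fetches everything.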

What is MySQL?

MySQL is a relational database, which means that data gets stored (preferably) in normalized tables so that there is no duplication. This is a good model when you need data to be consistent and reliable at all times (such as personal information or financial data).

 

An example of a MySQL table. Source: MySQL.

While horizontal scaling can be more difficult, it does adhere to the ACID model (atomicity, consistency, isolation, durability), which means you have far fewer worries about data reliability.
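
A minimal sketch of the normalized equivalent (the database name and schema are hypothetical): each fact is stored once and referenced by key:

```sh
mysql -u root -p social -e "
  CREATE TABLE users (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
  );
  CREATE TABLE posts (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,                 -- references users.id; no duplication
    title   VARCHAR(200) NOT NULL,
    FOREIGN KEY (user_id) REFERENCES users(id)
  );"
```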

How Does Social Networking Work?

Social networks offer different ways for people to connect. Whether it is through a mutual friendship, a business associate, or following a well-known person or business for updates, there are numerous methods of getting information out over social networks.

The key ingredient is the connections: once logged in, a user typically sees updates from everyone he or she is connected with or following.

An example social network relationship diagram. Source: SarahMei.com.

Comparison of the Databases

Given that social data is heavily relational -- users connect to other users in particular -- it lends itself better to a relational database over time. Even though a NoSQL solution like MongoDB can seem like a great way to retrieve lots of data quickly, the relational nature of users in a social network can cause lots of duplication to occur.

Such duplication lends itself to data becoming inconsistent and/or unreliable over time, or to queries becoming much more difficult to handle if the duplication is removed (since documents will likely need to point to other documents, which is not optimal for a NoSQL type of database).
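
Continuing the hypothetical schema sketched earlier, a relational database keeps the social graph in a single join table, so follow relationships never duplicate user data and a feed is one join query rather than documents pointing at other documents:

```sh
mysql -u root -p social -e "
  CREATE TABLE follows (
    follower_id INT NOT NULL,
    followee_id INT NOT NULL,
    PRIMARY KEY (follower_id, followee_id)
  );
  -- Feed for user 42: posts from everyone that user follows.
  SELECT p.title
  FROM   follows f
  JOIN   posts   p ON p.user_id = f.followee_id
  WHERE  f.follower_id = 42;"
```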

As a result, MySQL would be the better recommendation, since it has the data reliability and relational tools necessary to handle the interactions and relationships among numerous users. You may also decide to use MySQL and MongoDB together to take advantage of the best features of each database.

Get MySQL or MongoDB

Whether you decide to use one or both databases, the new Morpheus Virtual Appliance seamlessly provisions and manages both SQL and NoSQL databases across private and public (or even hybrid) clouds. With its easy-to-use interface, you can have a new instance of a database up and running in seconds.

Visit the Morpheus site to create a free account.

New Compilers Streamline Optimization and Enhance Code Conversion


Researchers are developing compiler technologies that optimize and regenerate code in multiple languages and for many different platforms in only one or a handful of steps. While much of their work focuses on Java and JavaScript, their innovations will impact developers working in nearly all programming languages.

Who says you can't teach an old dog new tricks? One of the staples of any developer's code-optimization toolkit is a compiler, which checks your program's syntax and semantics for errors and optimizes the performance of the code it generates.

Infostructure Associates' Wayne Kernochan explains in an October 2014 TechTarget article that compilers are particularly adept at improving the performance of big data and business-critical online transaction processing (OLTP) applications. As recent developments in compiler technology point out, the importance of the programs goes far beyond these specialty apps.

Google is developing two new Java compilers named Jack (Java Android Compiler Kit) and Jill (Jack Intermediate Library Linker) that ship with the Android SDK Build Tools 21.1. I Programmer's Harry Fairhead writes in a December 12, 2014, article that Jack compiles Java source code directly to a .dex Dalvik Executable rather than using the standard javac compiler to convert the source code to Java bytecode and then feeding it through the dex compiler to produce Dalvik bytecode.

In addition to skipping the conversion to Java bytecode, Jack also optimizes the code and applies ProGuard's obfuscation in a single step. The .dex code Jack generates can be fed to either the Dalvik engine or the new ART (Android RunTime) engine, which uses ahead-of-time compilation to improve speed.

Jill converts .jar library files into the .jack library format so they can be merged with the rest of the object code.

Google's new Jack and Jill Java compilers promise to speed up compilation by generating Dalvik bytecode without first having to convert it from Java bytecode. Source: I Programmer

In addition to streamlining compilation, Jack and Jill reduce Google's reliance on Java APIs, which are the subject of the company's ongoing lawsuit with Oracle. At present, the compilers don't support Java 8, but in terms of retaining compatibility with Java, it appears Android has become the tail wagging the dog.

Competition among open-source compiler infrastructures heats up

The latest versions of LLVM and the GNU Compiler Collection (GCC) are in a race to see which can outperform the other. Both open-source compiler infrastructures generate object code from many source languages, including C/C++, Objective-C, and Fortran. InfoWorld's Serdar Yegulalp reports in a September 8, 2014, article that testing conducted by Phoronix of LLVM 3.5 and a pre-release version of GCC 5 found that LLVM recorded faster C/C++ compile times. However, LLVM trailed GCC when processing some encryption algorithms and other tests.

Version 3.5 of the LLVM compiler infrastructure outperformed GCC 5 in some of Phoronix's speed tests but trailed in others, including audio encoding. Source: Phoronix
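
Benchmarks like Phoronix's boil down to timing identical builds under each toolchain. A rough sketch of the idea (the file name and flags here are illustrative, not Phoronix's methodology):

```sh
time gcc   -O2 -c bench.c -o bench-gcc.o    # GCC compile time
time clang -O2 -c bench.c -o bench-clang.o  # LLVM/Clang compile time
```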

The ability to share code between JavaScript and Windows applications is a key feature of the new DuoCode compiler, which supports cross-compiling of C# code into JavaScript. InfoWorld's Paul Krill describes the new compiler in a January 22, 2015, article. DuoCode uses Microsoft's Roslyn compiler for code parsing, abstract syntax tree (AST) generation, and contextual analysis. DuoCode then handles the code translation and JavaScript generation, including source maps.

Another innovative approach to JavaScript compiling is the experimental Higgs compiler created by University of Montreal researcher Maxime Chevalier-Boisvert. InfoWorld's Yegulalp describes the project in a September 19, 2014, article. Higgs differs from other just-in-time (JIT) JavaScript compilers such as Google's V8, Mozilla's SpiderMonkey, and Apple's LLVM-backed FTLJIT project in that it has a single level rather than being multitiered, and it accumulates type information as machine-level code rather than using type analysis.

When it comes to optimizing your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases, the new Morpheus Virtual Appliance makes it as easy as pointing and clicking in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with a single click, and each instance includes a free full replica set for failover and fault tolerance. Your MySQL and Redis databases are backed up and you can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

How Three Companies Improved Performance and Saved Money with MongoDB


Too often the increased efficiencies and performance improvements promised by new data technologies seem to vanish into thin air when the systems hit the production floor. Not so for these three companies that implemented MongoDB databases in very different environments, but realized very similar benefits: faster app speeds and lower overall system costs.

A hedge fund reduced its software licensing costs by a factor of 40, and its data storage by 40 percent. In addition, its quantitative analysts' modeling is now 25 times faster.

A retailer has installed in-store touch screens that give its customers an enjoyable, interactive shopping experience. The company can create and modify its online catalogs in just minutes to keep pace with ever-changing fashion trends.

A firm that provides affiliate-marketing and partner-management services for enterprises was able to expand without incurring the added expenses for hardware and services it anticipated. The company's customers realized improved performance because the new system's compression and other storage enhancements allowed more of their report requests to be processed in RAM.

All three of these success stories were made possible by converting the companies' traditional databases to MongoDB.

Hedge fund adopts a self-service model for financial analyses

In the past, whenever British hedge fund AHL Man Group wanted to add a new data source, the work became a long, drawn-out process that piled onto the IT department's busy workload. As ComputerWeekly's Brian McKenna reports in a January 21, 2015, article, AHL decided to standardize on Python in 2012, and subsequently discovered that Python interfaced very smoothly with its MongoDB databases.

By the end of 2013 the company had completed a proof-of-concept project, after which it was able to finalize its transition to MongoDB by the end of May 2014, at which time its legacy-system licenses expired. The result was a 40-fold decrease in licensing costs, and a 40 percent reduction in disk-storage requirements. In addition, the switch to a self-service model has allowed some of the firm's analysts to perform their "quant" modeling up to 25 times faster than previously.

Retailer's in-store tablets keep pace with fashion trends

Another January 21, 2015, article on the Apparel site recounts how retailer Chico's FAS developed a MongoDB-based application for its in-store touch-screen Tech Tablets that customers use as virtual catalogs. In addition to highlighting Chico's latest styles, the tablets show product videos and testimonials. The key benefit of the MongoDB application is the ability to create and adapt catalogs in minutes rather than the weeks required previously.

It took Chico's only five months to develop and implement the MongoDB-based app, which easily scaled to meet the retailer's increased demand in the holiday shopping season. More importantly, the app created an interactive, personalized shopping experience that's sure to bring its customers back for more.

MongoDB distro lets expanding marketer avoid high hardware costs

As a company grows, its data networks have to grow along with it, which often increases cost and complexity exponentially. Affiliate-marketing and partner-management firm Performance Horizon Group (PHG) faced skyrocketing hardware expenses as it grew its operations supporting enterprise clients in more than 150 countries.

By implementing Tokutek's TokuMX distribution of MongoDB, PHG reduced its need for new servers by a factor of eight, according to PHG CTO Pete Cheyne. In addition, each of the new servers required only half the RAM of its existing machines while accommodating a growing number of data sets. PHG's implementation of TokuMX is described in a December 2, 2014, Tokutek press release.

Any organization can improve the efficiency of its database-management operations by adopting the new Morpheus Virtual Appliance, which lets you manage heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with one click, and each instance includes a free full replica set for failover and fault tolerance. You can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

The Key to Selecting a Programming Language: Focus


There isn't a single best programming language. Rather than flitting from one language to the next as each comes into fashion, determine the platform you want to develop apps for -- the web, mobile, gaming, embedded systems -- and then focus on the predominant language for that area.

"Which programming languages do you use?"

In many organizations, that has become a loaded question. There is a decided trend toward open source development tools, as indicated by the results of a Forrester Research survey of 1,400 developers. ZDNet's Steven J. Vaughan-Nichols reports on the study in an October 29, 2014, article.

Conventional wisdom says open-source development tools are popular primarily because they cost less than their proprietary counterparts. That belief is turned on its head by the Forrester survey, which found performance and reliability are the main reasons why developers prefer to work with open-source tools. (Note that Windows still dominates on the desktop, while open source leads on servers, in data centers, and in the cloud.)

Then again, "open source" encompasses a universe of different development tools for various platforms: the web, mobile, gaming, embedded systems -- the list goes on. A would-be developer can waste a lot of time bouncing from Rails to Django to Node.js to Scala to Clojure to Go. As Quincy Larson explains in a November 14, 2014, post on the FreeCodeCamp blog, the key to a successful career as a programmer is to focus.

Larson recounts his seven months of self-study of a half-dozen different programming languages before landing his first job as a developer -- in which he used none of them. Instead, his team used Ruby on Rails, a relative graybeard among development environments. The benefits of focusing on a handful of tools are many: developers quickly become experts, productivity is enhanced because people can collaborate without a difference in tools getting in the way, and programmers aren't distracted by worrying about missing out on the flavor of the month.

Larson recommends choosing a single type of development (web, gaming, mobile) and sticking with it; learning only one language (JavaScript/Node.js, Rails/Ruby, or Django/Python); and following a single online curriculum (such as FreeCodeCamp.com or NodeSchool.io for JavaScript, TheOdinProject.com or TeamTreehouse.com for Ruby, and Udacity.com for Python).

 

A cutout from Lifehacker's "Which Programming Language?" infographic lists the benefits of languages by platform. Source: Lifehacker

Why basing your choice on potential salary is a bad idea

Just because you can make a lot of money developing in a particular language doesn't mean it's the best career choice. Readwrite's Matt Asay points out in a November 28, 2014, article that a more rewarding criterion in the long run is which language will ensure you can find a job. Asay recommends checking RedMonk's list of popular programming languages.

Boiling the decision down to its essence, the experts quoted by Asay suggest JavaScript for the Internet, Go for the cloud, and Swift (Apple) or Java (Android) for mobile. Of course, as with most tech subjects, opinions vary widely. In terms of job growth, Ruby appears to be fading, Go and Node.js are booming, and Python is holding steady.

But don't bail on Ruby or other old-time languages just yet. According to Quartz's programmer salary survey, Ruby on Rails pays best, followed by Objective C, Python, and Java.

While Ruby's popularity may be on the wane, programmers can still make good coin if they know Ruby on Rails. Source: Quartz, via Readwrite

Also championing old-school languages is Readwrite's Lauren Orsini in a September 1, 2014, article. Orsini cites a study by researchers at Princeton and UC Berkeley that found inertia is the primary driver of developers' choice of language. People stick with a language because they know it, not because of any particular features of the language. Exhibits A, B, C, and D of this phenomenon are PHP, Python, Ruby, and JavaScript -- and that doesn't even include the Methuselah of languages: C.

No matter your language of choice, you'll find it combines well with the new Morpheus Virtual Appliance, which lets you monitor and manage heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases from a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with a single click, and each instance includes a free full replica set for failover and fault tolerance. Your MySQL and Redis databases are backed up and you can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

MongoDB Poised to Play a Key Role in Managing the Internet of Things


Rather than out-and-out replacing their relational counterparts, MongoDB and other NoSQL databases will coexist with traditional RDBMSs. However, as more -- and more varied -- data swamps companies, the scalability and data-model flexibility of NoSQL will make it the management platform of choice for many of tomorrow's data-analysis applications.

There's something comforting in the familiar. When it comes to databases, developers and users are warm and cozy with the standard, nicely structured tables-and-rows relational format. In the not-too-distant past, nearly all of the data an organization needed fit snugly in the decades-old relational model.

Well, things change. What's changing now is the nature of a business's data. Much time and effort has been spent converting today's square-peg unstructured data into the round hole of relational DBMSs. But rather than RDBMSs being modified to support the characteristics of non-textual, non-document data, companies are now finding it more effective to adapt databases designed for unstructured data to accommodate traditional data types.

Two trends are converging to make this transition possible: NoSQL databases such as MongoDB are maturing to add the data-management features businesses require; and the amount and types of data are exploding with the arrival of the Internet of Things (IoT).

Heterogeneous DBs are the wave of the future

As ReadWrite's Matt Asay reports in a November 28, 2014, article, any DBAs who haven't yet added a NoSQL database or two to their toolbelt are in danger of falling behind. Asay cites a report by Machine Research that found relational and NoSQL databases are destined to coexist in the data center: the former will continue to be used to process "structured, highly uniform data sets," while the latter will manage the unstructured data created by "millions and millions of sensors, devices, and gateways."

Relational databases worked for decades because you could predict the characteristics of the data they held. One of the distinguishing aspects of IoT data is its unpredictability: you can't be sure where it will come from, or what forms it will take. Managing this data requires a new set of skills, which has led some analysts to caution that a severe shortage of developers trained in NoSQL may impede the industry's growth.

The expected increase in NoSQL-based development in organizations could be hindered by a shortage of skilled staff. Source: VisionMobile, via ReadWrite

The ability to scale to accommodate data elements measured in the billions is a cornerstone of NoSQL databases, but Asay points out the feature that will drive NoSQL adoption is flexible data modeling. Whatever devices or services are deployed in the future, NoSQL is ready for them.

Document locking one sign of MongoDB's growing maturity

According to software consultant Andrew C. Oliver -- a self-described "unabashed fan of MongoDB" -- the highlight of last summer's MongoDB World conference was the announcement that document-level locking is now supported. Oliver gives his take on the conference happenings in a July 3, 2014, article on InfoWorld.

Oliver compares MongoDB's document-level locking to row-level locking in an RDBMS, although documents may contain much more data than a row in an RDBMS. Some conference-goers projected that multiple documents may one day be written with ACID consistency, even if done so "locally" to a single shard.

Another indication of MongoDB becoming suitable for a wider range of applications is the release of the SlamData analytics tool that works without having to export data via ETL from MongoDB to an RDBMS or Hadoop. InfoWorld's Oliver describes SlamData in a December 11, 2014, article.

In contrast to the Pentaho business-intelligence tool that also supports MongoDB, SlamData CEO Jeff Carr states that the company's product doesn't require a conversion of document databases to the RDBMS format. SlamData is designed to allow people familiar with SQL to analyze data based on queries of MongoDB document collections via a notebook-like interface.

The SlamData business-intelligence tool for MongoDB uses a notebook metaphor for charting data based on collection queries. Source: InfoWorld

There's no simpler or more-efficient way to manage heterogeneous databases than by using the point-and-click interface of the new Morpheus Virtual Appliance, which lets you monitor and analyze heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with one click, and each instance includes a free full replica set for failover and fault tolerance. You can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

Troubleshoot Lost MySQL Remote Connections


There aren't many MySQL databases that don't need to support users' remote connections. While most failed remote connections can be traced to a misconfigured my.cnf file, the nuances of MySQL remote links make troubleshooting a dropped network connection anything but straightforward.

Some Linux administrators were rattled this week to learn of the discovery by Qualys of a bug in the GNU C Library (glibc) that could render affected systems vulnerable to a remote code execution attack. In a January 27, 2015, article, The Register's Neil McAllister describes the dangers posed by Ghost to Linux and a handful of other OSes.

Ghost affects versions of glibc back to 2.2, which was released in 2000, but as threats go, this one appears to be pretty mild. For one thing, the routines involved are old and rarely used these days. Even when they are used, they aren't called in a manner that the vulnerability could exploit. Still, Linux vendors Debian, Red Hat, and Ubuntu have released patches for Ghost.
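
Whether a given system is exposed depends on its glibc build. A quick first check from a shell (distributions backport fixes, so the version number alone isn't conclusive):

```sh
# ldd ships with glibc, so its version reflects the installed C library.
ldd --version
```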

As Ars Technica's Dan Goodin explains in a January 27, 2015, article, Ghost may affect MySQL servers, Secure Shell servers, form-submission apps, and mail servers beyond the Exim server on which Qualys demonstrated the remote code execution attack. However, Qualys has confirmed that Ghost does not impact Apache, Cups, Dovecot, GnuPG, isc-dhcp, lighttpd, mariadb/mysql, nfs-utils, nginx, nodejs, openldap, openssh, postfix, proftpd, pure-ftpd, rsyslog, samba, sendmail, sysklogd, syslog-ng, tcp_wrappers, vsftpd, or xinetd.

Finding a solution to common MySQL remote-connection glitches

While protecting against Ghost may be as simple as applying a patch, managing remote connections on MySQL servers and clients can leave DBAs pounding their keyboards. ITworld's Stephen Glasskeys writes in a December 19, 2014, post about the hoops he had to jump through to find the cause of a failed remote connection on a Linux MySQL server.

After using the ps command to list processes, Glasskeys found that the --skip-networking option was enabled, which tells MySQL not to listen for remote TCP/IP connections. Running KDE's Find Files/Folders tool determined that rc.mysqld was the only script file containing the text "--skip-networking".

Diagnosing the cause of failed remote connections on a MySQL server led to the file rc.mysqld. Source: ITworld

To restore remote connections, open rc.mysqld and comment out the option by placing a pound sign (#) at the beginning of the line. Then edit the MySQL configuration file /etc/my.cnf as follows, making sure bind-address is set to 0.0.0.0:

Edit the /etc/my.cnf file to ensure bind-address is set to 0.0.0.0. Source: ITworld
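
The screenshot isn't reproduced here; a minimal sketch of the relevant lines, assuming the standard my.cnf layout (back up the file first):

```sh
# 0.0.0.0 makes mysqld listen on all interfaces; ensure any
# skip-networking line stays commented out.
cat >> /etc/my.cnf <<'EOF'
[mysqld]
bind-address = 0.0.0.0
EOF
```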

Finally, use the "iptables --list" command to make sure the Linux server is set to accept requests on MySQL's port 3306, and the "iptables" command to enable them if it's not. After you restart MySQL, you can test the remote connection using the credentials and other options as they appear in the my.cnf file on the Linux server.
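
A sketch of those firewall checks (rule ordering and persistence vary by distribution):

```sh
iptables --list -n | grep 3306                   # is port 3306 already accepted?
iptables -A INPUT -p tcp --dport 3306 -j ACCEPT  # if not, accept MySQL traffic
```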

When MySQL's % wildcard operator leaves a remote connection hanging

A Stack Overflow post from April 2013 describes a situation where MySQL's % wildcard failed to allow a user ("user@%") to connect remotely. Such remote connections require that my.cnf on each machine bind MySQL's port 3306 to an IP address the remote host can reach. Also, the user has to be created both at localhost and at the % wildcard, and permissions granted on all databases. (You may also need to open port 3306, depending on your OS.)

Enable remote connections in the MySQL my.cnf file by adding each machine's IP and creating users in localhost and the % wildcard. Source: ITworld
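
A minimal sketch of the user setup (the account name and password here are hypothetical):

```sh
mysql -u root -p -e "
  CREATE USER 'appuser'@'localhost' IDENTIFIED BY 'secret';
  CREATE USER 'appuser'@'%'         IDENTIFIED BY 'secret';
  GRANT ALL PRIVILEGES ON *.* TO 'appuser'@'localhost';
  GRANT ALL PRIVILEGES ON *.* TO 'appuser'@'%';
  FLUSH PRIVILEGES;"
```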

Diagnosing failed remote connections and other database glitches is facilitated by the point-and-click interface of the new Morpheus Virtual Appliance, which lets you manage heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with one click, and each instance includes a free full replica set for failover and fault tolerance. You can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

MongoDB 3.0 First Look: Faster, More Storage Efficient, Multi-model


Document-level locking and pluggable storage APIs top the list of new features in MongoDB 3.0, but the big-picture view points to a more prominent role for NoSQL databases in companies of all types and sizes. The immediate future of databases is relational, non-relational, and everything in between -- sometimes all at once.

Version 3.0 of MongoDB, the leading NoSQL database, is being touted as the first release that is truly ready for the enterprise. The new version was announced in February and shipped in early March. At least one early tester, Adam Comerford, reports that MongoDB 3.0 is indeed more efficient at managing storage, and faster at reading compressed data.

The new feature in MongoDB 3.0 gaining the lion's share of analysts' attention is the addition of the WiredTiger storage engine -- which MongoDB acquired in December 2014 -- and a pluggable storage API. JavaWorld's Andrew C. Oliver states in a February 3, 2015, article that WiredTiger will likely boost performance over MongoDB's default MMapV1 engine in apps where reads don't greatly outnumber writes.

Oliver points out that WiredTiger's B-tree and Log Structured Merge (LSM) algorithms benefit apps with large caches (B-tree) and with data that doesn't cache well (LSM). WiredTiger also promises data compression that reduces storage needs by up to 80 percent, according to the company.

The addition of the WiredTiger storage engine is one of the new features in MongoDB 3.0 that promises to improve performance, particularly for enterprise customers. Source: Software Development Times
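
A minimal sketch of opting in, assuming MongoDB 3.0 binaries (MMapV1 remains the default engine in 3.0):

```sh
# Start mongod with the WiredTiger engine. Data files are engine-specific,
# so point --dbpath at a fresh directory rather than an existing MMapV1 one.
mongod --dbpath /data/db-wt --storageEngine wiredTiger
```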

Other enhancements in MongoDB 3.0 include the following:

  • Document-level locking for concurrency control via WiredTiger
  • Collection-level concurrency control and more efficient journaling in MMapV1
  • A pluggable API for integration with in-memory, encrypted, HDFS, hardware-optimized, and other environments
  • The Ops Manager graphical management console in the enterprise version

Computing's John Leonard emphasizes in a February 3, 2015, article that MongoDB 3.0's multi-model functionality via the WiredTiger API positions the database to compete with DataStax' Apache Cassandra NoSQL database and Titan graph database. Leonard also highlights the new version's improved scalability.

 

Putting MongoDB 3.0 to the (performance) test

MongoDB 3.0's claims of improved performance were borne out by preliminary tests conducted by Adam Comerford and reported on his Adam's R&R blog in posts on February 4, 2015, and February 5, 2015. Comerford repeated compression tests with the WiredTiger storage engine in release candidate 7 (RC7) -- expected to be the last before the final version comes out in March -- that he ran originally using RC0 several months ago. The testing was done on an Ubuntu 14.10 host with an ext4 file system.

The results showed that WiredTiger's on-disk compression reduced storage to 24 percent of non-compressed storage, and to only 16 percent of the storage space used by MMapV1. Similarly, the defaults for WiredTiger with MongoDB (WiredTiger with snappy compression) used 50 percent of the storage of non-compressed WiredTiger and 34.7 percent of MMapV1.

Testing WiredTiger storage (compressed and non-compressed) compared to MMapV1 storage showed a tremendous advantage for the new MongoDB storage engine. Source: Adam Comerford

Comerford's tests of the benefits of compression for reads when available I/O capacity is limited demonstrated much faster performance when reading compressed data using snappy and zlib, respectively. A relatively slow external USB 3.0 drive was used to simulate "reasonable I/O constraints." The times indicate how long it took to read the entire 16GB test dataset from the on-disk testing into memory from the same disk.

Read tests from compressed and non-compressed disks in a simulated limited-storage environment indicate faster reads with WiredTiger in all scenarios. Source: Adam Comerford

 

All signs point to a more prominent role in organizations of all sizes for MongoDB in particular and NoSQL in general. Running relational and non-relational databases side-by-side is becoming the rule rather than the exception. The new Morpheus Virtual Appliance puts you in a good position to be ready for multi-model database environments. It supports rapid provisioning and deployment of MongoDB v3.0 across public, private, and hybrid clouds. Sign Up for a Free Trial now!


Preparing Developers for a Multi-language Multi-paradigm Future


Tried-and-true languages such as Java, C++, Python, and JavaScript continue to dominate the most popular lists, but modern app development requires a multi-language approach to support diverse platforms and links to backend servers. The future will see new languages being used in conjunction with the old reliables.

Every year, new programming languages are developed. Recent examples are Apple's Swift and Carnegie Mellon University's Wyvern. Yet for more than a decade, the same handful of languages have retained their popularity with developers -- Java, JavaScript, C/C++/C#/Objective-C, Python, Ruby, PHP -- even though each is considered to have serious shortcomings for modern app development.

According to TIOBE Software's TIOBE Index for January 2015, JavaScript recorded the greatest increase in popularity in 2014, followed by PL/SQL and Perl.

The same old programming languages dominate the popularity polls, as shown by the most-recent TIOBE Index. Source: TIOBE Software

Of course, choosing the best language for any development project rarely boils down to a popularity contest. When RedMonk's Donnie Berkholz analyzed GitHub language trends in May 2014, aggregating new users, issues, and repositories, he concluded that only five languages have mattered on GitHub since 2008: JavaScript, Ruby, Java, PHP, and Python.

 

An analysis of language activity on GitHub between 2008 and 2013 indicates growing fragmentation. Source: RedMonk

Two important caveats to Berkholz's analysis are that GitHub focused on Ruby on Rails when it launched but has since gone more mainstream; and that Windows and iOS development barely register because both are generally open source-averse. As IT World's Phil Johnson points out in a May 7, 2014, article, while it's dangerous to draw conclusions about language popularity based on this or any other single analysis, it seems clear the industry is diverging rather than converging.

Today's apps require a multi-language, multi-paradigm approach

Even straightforward development projects require expertise in multiple languages. TechCrunch's Danny Crichton states in a July 10, 2014, article that creating an app for the web and mobile entails HTML, CSS, and JavaScript for the frontend (others as well, depending on the libraries required); Java and Objective-C (or Swift) for Android and iPhone, respectively; and for links to backend servers, Python, Ruby, or Go, as well as SQL or other database query languages.

Crichton identifies three trends driving multi-language development. The first is faster adoption of new languages: GitHub and similar sites encourage broader participation in developing libraries and tutorials; and developers are more willing to learn new languages. Second, apps have to run on multiple platforms, each with unique requirements and characteristics. And third, functional programming languages are moving out of academia and into the mainstream.

Researcher Benjamin Erb suggests that rather than functional languages replacing object-oriented languages, the future will be dominated by multi-paradigm development, in particular to address concurrency requirements. In addition to supporting objects, inheritance, and imperative code, multi-paradigm languages incorporate higher-order functions, closures, and restricted mutability.

One way to future-proof your SQL, NoSQL, and in-memory databases is by using the new Morpheus Virtual Appliance, which lets you manage heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with one click, and each instance includes a free full replica set for failover and fault tolerance. You can administer your databases using your choice of tools. Visit the Morpheus site for pricing information and to create a free account.

Don't Go Into Elasticsearch Production without this Checklist


Elasticsearch is a great tool to provide fast and powerful search services to your web sites or applications, but care should be taken when moving from development to production. By following the checklist below, you can avoid some issues that may arise if you use development settings in a production environment!

Configure Your Log and Data Paths

To minimize the chances of data loss in a production environment, it is highly recommended that you change your log and data paths from the default paths to something that is less likely to be accidentally overwritten.

You can make these changes in the configuration file (which uses YAML syntax) under path, as in the following example, which uses suggested production paths from the Elasticsearch team:

Suggested settings for the log and data paths. Source: Elasticsearch.
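
The suggested settings aren't reproduced above; a sketch of what the path block typically looks like (the directories here are illustrative and must exist and be writable by the Elasticsearch user):

```sh
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
path.data: /var/data/elasticsearch   # index data, away from the install dir
path.logs: /var/log/elasticsearch    # log files
EOF
```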

Configure Your Node and Cluster Names

When you are looking for a particular node or cluster, it is a good idea to use names that describe what each one does and distinguish it from the others.

The default cluster name of "elasticsearch" could allow any node with default settings to join the cluster, even if this was not intended. Thus, it is a good idea to give the cluster a distinct identifier instead.

The default node names are chosen randomly from a set of roughly 3000 Marvel character names. While this wouldn't be so bad for a node or two, it could get quite confusing as you add more nodes. The better option is to use descriptive names from the beginning to avoid potential confusion as nodes are added later.
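
A minimal sketch of both settings (the values here are examples, not recommendations):

```sh
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
cluster.name: orders-prod        # a distinct name keeps stray nodes from joining
node.name: orders-prod-node-01   # descriptive names scale past a few nodes
EOF
```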

Configure Memory Settings

If the operating system swaps memory, the Elasticsearch process itself can be swapped out, which severely degrades performance in production. Suggestions from the Elasticsearch team to fix this include disabling swapping, configuring swapping to run only in emergency conditions, or (for Linux/Unix users) using mlockall to lock the address space of the process into RAM and keep it from being swapped, as sketched below.
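
A minimal sketch of the mlockall route, with setting names as they existed in Elasticsearch 1.x (the node may also need a raised memlock ulimit):

```sh
# Ask Elasticsearch to lock its address space in RAM...
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
bootstrap.mlockall: true
EOF

# ...then, after a restart, verify the lock took hold ("mlockall": true).
curl 'http://localhost:9200/_nodes/process?pretty'
```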

Configure Virtual Memory Settings

Elasticsearch indices use mmapfs/niofs, but the default mmap count on many operating systems is too low for them. If so, you will end up with errors such as "out of memory" exceptions. To fix this, you can raise the limit to accommodate Elasticsearch indices. The following example shows how the Elasticsearch team recommends increasing this limit on Linux systems (run the command as root):

Suggested command to increase the mmap count for Linux systems. Source: Elasticsearch.
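
The command isn't reproduced above; its widely documented form is:

```sh
# Run as root; 262144 is the value commonly recommended for Elasticsearch.
sysctl -w vm.max_map_count=262144
# To persist across reboots, also add "vm.max_map_count=262144" to /etc/sysctl.conf.
```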

Ensure Elasticsearch Is Monitored

It is a good idea to monitor your Elasticsearch installation so that you can see the status or be alerted if or when something goes wrong. A service such as Happy Apps can provide this type of monitoring for you (and can monitor the rest of your app as well).

Get ElasticSearch in the Cloud

When you launch your application that uses Elasticsearch, you will want reliable and stable database hosting. Morpheus Virtual Appliance is a tool that allows you to manage heterogeneous databases in a single dashboard.

With Morpheus, you have support for SQL, NoSQL, and in-memory databases like Redis across public, private, and hybrid clouds. So, visit the Morpheus site for pricing information or to create a free account today!

Troubleshooting Problems with MySQL Replication


One of the most common MySQL operations is replicating databases between master and slave servers. While most such connections are straightforward to establish and maintain, on occasion something goes amiss: some master data may not replicate on the slave, or read requests may be routed to the master rather than to the slave, for example. Finding a solution to a replication failure sometimes requires a little extra detective work.

Replication is one of the most basic operations in MySQL -- and any other database: it's used to copy data from one database server (the master) to one or more others (the slaves). The process improves performance by allowing loads to be distributed among multiple slave servers for reads, and by limiting the master server to writes.

Additional benefits of replication are security via slave backups; analytics, which can be performed on the slaves without affecting the master's performance; and widespread data distribution, which is accomplished without requiring access to the master. (See the MySQL Reference Manual for more on replication.)

As with any other aspect of database management, replication doesn't always proceed as expected. The Troubleshooting Replication section of the MySQL Reference Manual instructs you to check for messages in your error log when something goes wrong with replication. If the error log doesn't point you to the solution, ensure that binary logging is enabled in the master by issuing a SHOW MASTER STATUS statement. If it's enabled, "Position" is nonzero; if it isn't, make sure the master is running with the --log-bin option.
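
A minimal sketch of that first check, run on the master (credentials are illustrative):

```sh
# A nonzero Position confirms binary logging is on; note the File and
# Position values, since the slave needs them to synchronize.
mysql -u root -p -e "SHOW MASTER STATUS\G"
```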

The manual offers several other replication-troubleshooting steps:

  • The master and slave must both start with the --server-id option, and each server must have a unique ID value;
  • Run SHOW SLAVE STATUS to ensure the Slave_IO_Running and Slave_SQL_Running values are both "yes";
  • Run SHOW PROCESSLIST and look in the State column to verify that the slave is connecting to the master (see the sketch after this list);
  • If a statement succeeded on the master but failed on the slave, the nuclear option is to do a full database resynchronization, which entails deleting the slave's database and copying a new snapshot from the master. (Several less-drastic alternatives are described in the MySQL manual.)
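
A minimal sketch of the slave-side checks from a shell (credentials are illustrative):

```sh
# Run on the slave; look for Slave_IO_Running: Yes and Slave_SQL_Running: Yes.
mysql -u root -p -e "SHOW SLAVE STATUS\G"

# The State column shows whether the slave's I/O thread has connected to the master.
mysql -u root -p -e "SHOW PROCESSLIST"
```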

Solutions to real-world MySQL replication problems

What do you do when MySQL indicates the master-slave connection is in order, yet some data on the master isn't being copied to the slave? That's the situation described in a Stack Overflow post from March 2010.

Even though replication appears to be configured correctly, data is not being copied from the master to the slave. Source: Stack Overflow

The first step is to run "show master status" or "show master status\G" on the master database to get the correct values for the slave. The slave status above indicates the slave is connected to the master and awaiting log events. Synching the correct log file position should restore copying to the slave.

To ensure a good sync, stop the master, dump the database, record the master log file positions, restart the master, import the database to the slave, and start the slave in slave mode with the correct master log file position.
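
A sketch of one common variant of that sequence (the database and log file names here are illustrative); mysqldump's --master-data option records the master's log coordinates in the dump itself:

```sh
# On the master: dump with the binary-log coordinates embedded as a comment.
mysqldump -u root -p --single-transaction --master-data=2 mydb > mydb.sql

# On the slave: import, then point the slave at the recorded coordinates.
mysql -u root -p mydb < mydb.sql
mysql -u root -p -e "
  CHANGE MASTER TO
    MASTER_LOG_FILE='mysql-bin.000123',  -- value from the dump header
    MASTER_LOG_POS=4567;                 -- value from the dump header
  START SLAVE;"
```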

Another Stack Overflow post from March 2014 presents a master/slave setup using JDBC drivers in which transactions marked as read-only were still pinging the master. Since the MySQL JDBC driver was managing the connections to the physical servers -- master and slave -- the connection pool and Spring transaction manager weren't aware that the database connection was linking to multiple servers.

The solution is to return control to Spring, after which the transaction on the connection will be committed. The transaction debug message will indicate that queries will be routed to the slave server so long as the connection is in read-only mode. By resetting the connection before it is returned to the pool, the read-only mode is cleared and the last log message will show that queries are now being routed to the master server.

The point-and-click dashboard in the new Morpheus Virtual Appliance makes it a breeze to diagnose and repair replication errors -- and other hiccups -- in your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. Morpheus lets you seamlessly provision, monitor, and analyze SQL, NoSQL, and in-memory databases across hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and fail over.

With the Morpheus database-as-a-service (DBaaS), you can migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to create a free account.

Avoid Being Locked into Your Cloud Services


Before you sign on the dotted line for a cloud service supporting your application development or other core IT operation, make sure you have an easy, seamless exit strategy in place. Just because an infrastructure service is based on open-source software doesn't mean you won't be locked in by the service's proprietary APIs and other specialty features.

In the quest for ever-faster app design, deployment, and updating, developers increasingly turn to cloud infrastructure services. These services promise to let developers focus on their products rather than on the underlying servers and other exigencies required to support the development process.

However, when you choose cloud services to streamline development, you run the risk of being locked in, at either the code level or the architecture level. Florian Motlik, CTO of continuous-integration service Codeship, writes in a February 21, 2015, article on Gigaom that infrastructure services mask the complexities underlying cloud-based development.

Depending on the type of cloud infrastructure service you choose, the vendor may manage more or less of your data operations. Source: Crucial

Even when the services you use adhere strictly to open systems, there is always a cost associated with switching providers: transfer the data, change the DNS, and thoroughly test the new setup. Of particular concern are services such as Google App Engine that lock you in at the code level. However, Amazon Web Services Lambda, Heroku, and other infrastructure services that let you write Node.js functions and invoke them either via an API or on specific events in S3, Kinesis, or DynamoDB entail a degree of architecture lock-in as well.

To minimize lock-in, Motlik recommends using a micro-services architecture based on technology supported by many different providers, such as Rails or Node.

Cloud Computing Journal's Gregor Petri identifies four types of cloud lock-in:

  • Horizontal locks you into a specific product and prevents you from switching to a competing service;
  • Vertical limits your choices at other levels of the stack, such as database or OS;
  • Diagonal locks you into a single vendor's family of products, perhaps in exchange for reduced management and training costs, or to realize a substantial discount;
  • Generational prevents you from adopting new technologies as they become available.

Gregor Petri identifies four types of cloud lock-in: horizontal, vertical, diagonal, and generational. Source: Cloud Computing Journal

Will virtualization bring about the demise of cloud lock-in?

Many cloud services are addressing the lock-in trap by making it easier for potential customers to migrate their data and development tools/processes from other platforms to the services' own environments. Infinitely Virtual founder and CEO Adam Stern claims that virtualization has "all but eliminated" lock-in related to operating systems and open source software. Stern is quoted by Linux Insider's Jack M. Germain in an article from November 2013.

Alsbridge's Rick Sizemore points out that even with the availability of tools for migrating data between VMware, OpenStack, and Amazon Web Services, customers may be locked in by contract terms that limit when they can remove their data. Sizemore also cautions that services may combine open source tools in a proprietary way that locks in your data.

In a February 9, 2015, article in Network World, HotLink VP Jerry McLeod points out that you can minimize the chances of becoming locked into a particular service by ensuring that you can move hybrid workloads seamlessly between disparate platforms. McLeod warns that vendors may attempt to lock in their customers by requiring that they sign long-term contracts.

Seamless workload migration and customer-focused contract terms are only two of the features that make the new Morpheus Virtual Appliance a "lock-in free" zone. With the Morpheus database-as-a-service (DBaaS) you can provision, deploy, and monitor your MongoDB, Redis, MySQL, and ElasticSearch databases from a single point-and-click console. Morpheus lets you work with SQL, NoSQL, and in-memory databases across hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and fail over.

In addition, the service allows you to migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to create a free account.

Don't Go Into Elasticsearch Production without this Checklist

$
0
0

Elasticsearch is a great tool to provide fast and powerful search services to your web sites or applications, but care should be taken when moving from development to production. By following the checklist below, you can avoid some issues that may arise if you use development settings in a production environment!

Configure Your Log and Data Paths

To minimize the chances of data loss in a production environment, it is highly recommended that you change your log and data paths from the default paths to something that is less likely to be accidentally overwritten.

You can make these changes in the configuration file (which uses YAML syntax) under path, as in the following example, which uses suggested production paths from the Elasticsearch team:

Suggested settings for the log and data paths. Source: Elasticsearch.

Configure Your Node and Cluster Names

When you are looking for a node or a cluster, it is a good idea to have a name which describes what you will need to find and separates one from another.

The default cluster name of "elasticsearch " could allow any nodes to join the cluster, even if this was not intended. Thus, it is a good idea to give the cluster a distinct identifier instead.

The default node names are chosen randomly from a set of roughly 3000 Marvel character names. While this wouldn't be so bad for a node or two, this could get quite confusing as you add more than a few nodes. The better option is to use a descriptive name from the beginning to avoid potential confusion as nodes are added later.

Configure Memory Settings

Memory swapping used on systems could potentially cause the elasticsearch process to be swapped, which would not be good while running in production. Suggestions from the Elasticsearch team to fix this include disabling swapping, configuring swapping to only run in emergency conditions, or (for Linux/Unix users) using mlockall to try to lock the address space of the process into RAM to keep it from being swapped.

Configure Virtual Memory Settings

Elasticsearch indices use mmapfs/niofs, but the default mmap count on operating systems can potentially be too low. If so, you will end up with errors such as "out of memory" exceptions. To fix this, you can up the limit to accommodate Elasticsearch indices. The following example shows how the Elasticsearch team recommends increasing this limit on Linux systems (run the command as root):

Suggested command to increase the mmap count for Linux systems. Source: Elasticsearch.

Ensure Elasticsearch Is Monitored

It is a good idea to monitor your Elasticsearch installation so that you can see the status or be alerted if or when something goes wrong. A service such as Happy Apps can provide this type of monitoring for you (and can monitor the rest of your app as well).

Get ElasticSearch in the Cloud

When you launch your application that uses Elasticsearch, you will want reliable and stable database hosting. Morpheus Virtual Appliance is a tool that allows you manage heterogeneous databases in a single dashboard.

With Morpheus, you have support for SQL, NoSQL, and in-memory databases like Redis across public, private, and hybrid clouds. So, visit the Morpheus site for pricing information or to create a free account today!

Avoid Being Locked into Your Cloud Services


Before you sign on the dotted line for a cloud service supporting your application development or other core IT operation, make sure you have an easy, seamless exit strategy in place. Just because an infrastructure service is based on open-source software doesn't mean you won't be locked in by the service's proprietary APIs and other specialty features.

In the quest for ever-faster app design, deployment, and updating, developers increasingly turn to cloud infrastructure services. These services promise to let developers focus on their products rather than on the underlying servers and other exigencies required to support the development process.

However, when you choose cloud services to streamline development, you run the risk of being locked in, at either the code level or the architecture level. Florian Motlik, CTO of continuous-integration service Codeship, writes in a February 21, 2015, article on Gigaom that infrastructure services mask the complexities underlying cloud-based development.

Depending on the type of cloud infrastructure service you choose, the vendor may manage more or less of your data operations. Source: Crucial

Even when the services you use adhere strictly to open systems, there is always a cost associated with switching providers: you must transfer the data, change the DNS, and thoroughly test the new setup. Of particular concern are services such as Google App Engine that lock you in at the code level. Amazon Web Services Lambda, Heroku, and other infrastructure services that let you write Node.js functions and invoke them via an API or on specific events in S3, Kinesis, or DynamoDB entail a degree of architecture lock-in as well.

To minimize lock-in, Motlik recommends using a micro-services architecture based on technology supported by many different providers, such as Rails or Node.

Cloud Computing Journal's Gregor Petri identifies four types of cloud lock-in:

  • Horizontal: locks you into a specific product and prevents you from switching to a competing service.
  • Vertical: limits your choices in other levels of the stack, such as database or OS.
  • Diagonal: locks you into a single vendor's family of products, perhaps in exchange for reduced management and training costs, or to realize a substantial discount.
  • Generational: prevents you from adopting new technologies as they become available.

Gregor Petri identifies four types of cloud lock-in: horizontal, vertical, diagonal, and generational. Source: Cloud Computing Journal

Will virtualization bring about the demise of cloud lock-in?

Many cloud services are addressing the lock-in trap by making it easier for potential customers to migrate their data and development tools/processes from other platforms to the services' own environments. Infinitely Virtual founder and CEO Adam Stern claims that virtualization has "all but eliminated" lock-in related to operating systems and open source software. Stern is quoted by Linux Insider's Jack M. Germain in an article from November 2013.

Alsbridge's Rick Sizemore points out that even with the availability of tools for migrating data between VMware, OpenStack, and Amazon Web Services, customers may be locked in by contract terms that limit when they can remove their data. Sizemore also cautions that services may combine open source tools in a proprietary way that locks in your data.

In a February 9, 2015, article in Network World, HotLink VP Jerry McLeod points out that you can minimize the chances of becoming locked into a particular service by ensuring that you can move hybrid workloads seamlessly between disparate platforms. McLeod warns that vendors may attempt to lock in their customers by requiring that they sign long-term contracts.

Seamless workload migration and customer-focused contract terms are only two of the features that make the new Morpheus Virtual Appliance a "lock-in free" zone. With the Morpheus database-as-a-service (DBaaS) you can provision, deploy, and monitor your MongoDB, Redis, MySQL, and Elasticsearch databases from a single point-and-click console. Morpheus lets you work with SQL, NoSQL, and in-memory databases across hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and failover.

In addition, the service allows you to migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to create a free account.

Three Different Approaches to Hardening MySQL Security


The new MySQL 5.7.6 developer milestone 16 features noteworthy security upgrades, but others propose more radical approaches to database security. One method puts applications in charge of testing and reporting on their own security, while another separates the app from all security responsibility by placing each in its own virtual machine.

When a database release claims to improve performance over its predecessors by a factor of two to three times, you take notice. That's what Kay Ewbank claims in a March 12, 2015, post on the iProgrammer site about MySQL 5.7.6 developer milestone 16. The new version was released on March 9, 2015, and is available for download (its source code can be downloaded from GitHub).

In a March 10, 2015, post on the MySQL Server blog, Geir Hoydalsvik lists milestone 16's many new features and fixes. (Prepare to give your mouse scroll wheel a workout: the list is long.) Ewbank points in particular to the InnoDB data engine's CREATE TABLESPACE syntax for creating general tablespaces in which you can choose your own mapping between tables and tablespaces. This allows you to group all the tables of one customer in a single tablespace, for example.

(Note Hoydalsvik's warning that the milestone release is "for use at your own risk" and may require data format changes or a complete data dump.)

One of the update's security enhancements relates to the way the server checks the validity of the secure_file_priv system variable, which is intended to limit the effects of data import and export operations. In the new release, secure_file_priv can be set to null to disable all data imports and exports. Also, the default value now depends on the INSTALL_LAYOUT CMake option.

The default value of the secure_file_priv system variable is platform specific in MySQL 5.7.6 developer milestone 16. Source: MySQL Release Notes
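
As a sketch of how you might lock this down, the my.cnf entries below either disable file operations entirely or restrict them to one directory; the directory path is a placeholder:

    # my.cnf
    [mysqld]
    # disable all file import/export operations:
    secure_file_priv = NULL
    # ...or restrict them to a single directory instead:
    # secure_file_priv = /var/lib/mysql-files

From a client, SHOW VARIABLES LIKE 'secure_file_priv'; confirms the value the running server is using.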

Apps that continuously test and report on their own security

Data security generally entails scanning applications to spot problems and missing patches. In a March 5, 2015, article on InformationWeek's Dark Reading site, Jeff Williams proposes building security into the application via "instrumentation," which entails continuous testing and reporting by the app of its own security status.

Instrumentation collects security information from the apps without requiring scans because the programs test their own security and report the results back to the server. Williams provides the example of reports identifying all non-parameterized queries in an organization based on a common SQL injection defense: requiring that all queries be parameterized.
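
The defense being checked for is simple to state in MySQL terms. A minimal sketch using server-side prepared statements, with a hypothetical table and column:

    -- the placeholder keeps user input out of the SQL text entirely
    PREPARE find_user FROM 'SELECT id, name FROM users WHERE email = ?';
    SET @email = 'alice@example.com';
    EXECUTE find_user USING @email;
    DEALLOCATE PREPARE find_user;

An instrumented app would flag any query built by string concatenation instead of this pattern.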

The opposite extreme: Separating security from the application

Another novel approach to database security is exemplified in Waratek AppSecurity for Java, which is reviewed by SC Magazine's Peter Stephenson in a March 2, 2015, article. The premise is that security is too important to be left to the application's developers. Instead, create a sandbox for Java, similar to a firewall but without the tendency to report false positives.

Waratek's product assigns each app the equivalent of its own virtual container, complete with a hypervisor. The container holds its own security rules, which frees developers to focus solely on their applications. Stephenson offers the example of a container that defends against a SQL injection attack on a MySQL database.

Waratek AppSecurity for Java creates a secure virtual machine that applies security rules from outside the application. Source: Waratek

Application security is at the core of the new Morpheus Virtual Appliance. With the Morpheus database-as-a-service (DBaaS) you can provision, deploy, and monitor heterogeneous MySQL, MongoDB, Redis, and Elasticsearch databases from a single point-and-click console. Morpheus lets you work with your SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and failover.

In addition, the service allows you to migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to create a free account.


How to Minimize Data Wrangling and Maximize Data Intelligence


It's not unusual for data analysts to spend more than half their time cleaning and converting data rather than extracting business intelligence from it. As data stores grow in size and data types proliferate, a new generation of tools is arriving that promises to deliver sophisticated analysis tools into the hands of non-data scientists.

One of the hottest job titles in technology is Data Scientist, perhaps surpassed only by the newest C-level position: Chief Data Scientist. IT's long-standing skepticism about such trends is evident in the joke cited by InfoWorld's Yves de Montcheuil that a data scientist is a business analyst who lives in California.

There's nothing funny about every company's need to translate its data into business intelligence. That's where data scientists take the lead role, but as the amount and types of data proliferate, data scientists find themselves spending the bulk of their time cleaning and converting data rather than analyzing and communicating it to business managers.

A recent survey of data scientists (registration required) conducted by IT-project crowdsourcing firm CrowdFlower found that two out of three analysts claim cleaning and organizing data is their most time-consuming task, and 52 percent report their biggest obstacle is poor quality data. While the respondents named 48 different technologies they use in their work, the most popular is Excel (55.6 percent), followed by the open source language R (43.1 percent) and the Tableau data-visualization software (26.1 percent).

Data scientists identify their greatest challenges as time spent cleaning data, poor data quality, lack of time for analysis, and ineffective data modeling. Source: CrowdFlower

What's holding data analysis back? The data scientists surveyed cite a lack of tools required to do their job effectively (54.3 percent), failure of their organizations to state goals and objectives clearly (52.3 percent), and insufficient investment in training (47.7 percent).

A dearth of tools, unclear goals, and too little training are reported as the principal impediments to data scientists' effectiveness. Source: CrowdFlower

New tools promise to 'consumerize' big data analysis

It's a common theme in technology: In the early days, only an elite few possess the knowledge and tools required to understand and use it, but over time the products improve and drop in price, businesses adapt, and the technology goes mainstream. New data-analysis tools are arriving that promise to deliver the benefits of the technology to non-scientists.

Steve Lohr profiles several of these products in an August 17, 2014, article in the New York Times. For example, ClearStory Data's software combines data from multiple sources and converts it into charts, maps, and other graphics. Taking a different approach to the data-preparation problem is Paxata, which offers software that retrieves, cleans, and blends data for analysis by various visualization tools.

The not-for-profit Open Knowledge Labs bills itself as a community of "civic hackers, data wranglers and ordinary citizens intrigued and excited by the possibilities of combining technology and information for good." The group is seeking volunteer "data curators" to maintain core data sets such as GDP and ISO-codes. OKL's Rufus Pollock describes the project in a January 3, 2015, post.

Open Knowledge Labs is seeking volunteer coders to curate core data sets as part of the Frictionless Data Project. Source: Open Knowledge Labs

There's no simpler or more straightforward way to manage your heterogeneous MySQL, MongoDB, Redis, and Elasticsearch databases than by using the new Morpheus Virtual Appliance. Morpheus lets you seamlessly provision, monitor, and analyze SQL, NoSQL, and in-memory databases across hybrid clouds via a single point-and-click dashboard. Each database instance you create includes a free full replica set for built-in fault tolerance and failover.

With the Morpheus database-as-a-service (DBaaS), you can migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to create a free account.

The Fastest Way to Import Text, XML, and CSV Files into MySQL Tables


One of the best ways to improve the performance of MySQL databases is to determine the optimal approach for importing data from other sources, such as text files, XML, and CSV files. The key is to correlate the source data with the table structure.

Data is always on the move: from a Web form to an order-processing database, from a spreadsheet to an inventory database, or from a text file to a customer list. One of the most common MySQL database operations is importing data from such an external source directly into a table. Data importing is also one of the tasks most likely to create a performance bottleneck.

The basic steps entailed in importing a text file into a MySQL table are covered in a Stack Overflow post from November 2012: create the target table, then use the LOAD DATA INFILE command.

The basic MySQL commands for creating a table and importing a text file into the table. Source: Stack Overflow
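
Since the post's commands are shown here only as a captioned image, the following is a hedged reconstruction of the pattern; the table definition, file path, and delimiters are placeholders to adapt to your data:

    -- hypothetical target table
    CREATE TABLE foo (
      myid      INT,
      mytext    VARCHAR(50),
      mydecimal DECIMAL(10,2)
    );

    -- import, assuming comma-separated fields, one record per line,
    -- with the file's fields matching the table's columns in order
    LOAD DATA LOCAL INFILE '/path/to/file.txt'
    INTO TABLE foo
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';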

Note that you may need to enable the parameter "--local-infile=1" to get the command to run. You can also specify which columns the text file loads into:

This MySQL command specifies the columns into which the text file will be imported. Source: Stack Overflow

In this example, each line of the file is read into the variables @col1, @col2, and @col3; "myid" is loaded from the first field, "mydecimal" from the third, and the second field is read but never assigned, so the middle table column is left null.
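
A hedged reconstruction of that command, reusing the placeholder table above:

    LOAD DATA LOCAL INFILE '/path/to/file.txt'
    INTO TABLE foo
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (@col1, @col2, @col3)
    SET myid = @col1, mydecimal = @col3;
    -- @col2 is never assigned, so mytext is left at its default (NULL)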

The table resulting when LOAD DATA is run with the target column specified. Source: Stack Overflow

The fastest way to import XML files into a MySQL table

As Database Journal's Rob Gravelle explains in a March 17, 2014, article, stored procedures would appear to be the best way to import XML data into MySQL tables, but as of version 5.0.7, MySQL's LOAD XML INFILE and LOAD DATA INFILE statements can't run within a stored procedure. There's also no way to map XML data to table structures, among other limitations.

However, you can get around most of these limitations if the XML file you're targeting has a rigid, known structure for each procedure. The example Gravelle presents uses an XML file in which each row is represented by a single element and each column by a named attribute:

You can use a stored procedure to import XML data into a MySQL table if you specify the table structure beforehand. Source: Database Journal

The table you're importing into has an int ID and two varchars: the ID is the primary key, so it can't contain nulls or duplicate values; last_name allows duplicates but not nulls; and first_name accepts up to 100 characters.

The MySQL table into which the XML file will be imported has the same three fields as the file. Source: Database Journal

Gravelle's approach for overcoming MySQL's import restrictions uses the "proc-friendly" Load_File() and ExtractValue() functions.

MySQL's XML-import limitations can be overcome by using the Load_file() and ExtractValue() functions. Source: Database Journal
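
The article's code likewise appears here only as an image, so the stored procedure below is a simplified sketch of the pattern; the file path, table, element, and attribute names are assumptions, and LOAD_FILE() remains subject to secure_file_priv and file permissions:

    DELIMITER //
    CREATE PROCEDURE import_people_xml(IN xml_path VARCHAR(255))
    BEGIN
      DECLARE xml_content LONGTEXT;
      DECLARE total INT;
      DECLARE i INT DEFAULT 1;

      -- read the whole file server-side; returns NULL if unreadable
      SET xml_content = LOAD_FILE(xml_path);
      SET total = ExtractValue(xml_content, 'count(/people/row)');

      -- walk the rows by position; $i binds the local variable in XPath
      WHILE i <= total DO
        INSERT INTO people (id, last_name, first_name)
        VALUES (
          ExtractValue(xml_content, '/people/row[$i]/@id'),
          ExtractValue(xml_content, '/people/row[$i]/@last_name'),
          ExtractValue(xml_content, '/people/row[$i]/@first_name')
        );
        SET i = i + 1;
      END WHILE;
    END //
    DELIMITER ;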

Benchmarking techniques for importing CSV files to MySQL tables

When he tested various ways to import a CSV file into MySQL 5.6 and 5.7, Jaime Crespo discovered a technique that he claims improves the import time for MyISAM by 262 percent to 284 percent, and for InnoDB by 171 percent to 229 percent. The results of his tests are reported in an October 8, 2014, post on Crespo's MySQL DBA for Hire blog.

Crespo's test file was more than 3GB in size and had nearly 47 million rows. One of the fastest methods in Crespo's tests was grouping queries into a multi-insert statement, the technique "mysqldump" uses. Crespo also attempted to improve LOAD DATA performance by augmenting the key_cache_size and by disabling the Performance Schema.

Crespo concludes that the fastest way to load CSV data into a MySQL table without using raw files is to use LOAD DATA syntax. Also, using parallelization for InnoDB boosts import speeds.
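
To make the two techniques concrete, here is a sketch with placeholder table and file names:

    -- rows grouped into one multi-insert statement, as mysqldump emits
    INSERT INTO t (id, val) VALUES
      (1, 'a'),
      (2, 'b'),
      (3, 'c');

    -- LOAD DATA, the overall winner in Crespo's tests
    LOAD DATA INFILE '/path/to/data.csv'
    INTO TABLE t
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n';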

You won't find a more straightforward way to monitor your MySQL, MongoDB, Redis, and Elasticsearch databases than by using the dashboard interface of the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases.

You can provision, deploy, and host your databases from a single dashboard. The service includes a free full replica set for each database instance, as well as automatic daily backups of MySQL and Redis databases. Visit the Morpheus site for pricing information and to create a free account.

Hybrid Cloud Management Webinar


A New Age Of Hybrid Cloud Management

Join us for this exciting webcast on how to develop, deploy, and manage your applications and databases on any cloud or infrastructure.

You’ll learn how to:

  • Provision databases and applications on any cloud (AWS, Azure, Google, Rackspace) or infrastructure (OpenStack, VMware, or bare metal)
  • Elastically scale applications and databases
  • Automatically back up and recover databases and applications
  • Log and monitor databases and applications for faster troubleshooting and SLA management
  • Clone and migrate databases and applications across hybrid clouds
  • Gain complete infrastructure and resource visibility across your on-premise servers and the cloud

Who should attend?

VPs/Directors of IT, solution architects, DBAs, cloud architects, IT ops, and datacenter managers. The LIVE webinar will be on Thursday, September 17th at 8am PDT, and there will also be a Q&A for you to ask questions in real time.

Register here: https://attendee.gotowebinar.com/register/6958670922702846210

How Cloud Services Break Down Barriers Within and Between Organizations


Business managers become techies, and techies become business managers in the modern decentralized organization.

These days, every company is a tech company, and every worker is a tech worker. Business managers are getting more tech-savvy, and IT managers are getting more business-savvy. Yet the cultural barriers in organizations of all sizes between IT departments and lines of business persist -- to the disadvantage of both.

Now there are clear signs that the fundamental nature of the relationship between IT and the business side is changing, just as the way both groups work is changing fundamentally. As every manager knows, change doesn't come easy. Yet every manager also knows that the long-term success of the company depends on embracing those changes.

A positive consequence of the technification of business and the businification of tech is that the two groups are truly collaborating in ways they rarely have in the past. Every business decision not only involves technology, it is predicated on it. Likewise, every tech decision has at its foundation the advancement of the company's short-term and long-term business goals.

As the adage goes: Easier said than done. Yet it's being done -- and done successfully -- in organizations of all types and sizes. Here's a look at what those success stories have in common.

Fusing IT with business: It's all about attitude

In a September 1, 2015, article on BPMInstitute.org, Tim Musschoot writes that successful collaboration between business departments and IT departments depends on two things: having a common understanding among all parties; and managing each party's expectations related to the other.

It's not enough for each side to be using the same terminology. To establish a mutual understanding, you must consider behavioral aspects (who does what, how do they do it, when do they do it); information aspects (what is an order form, who is the customer, how do they relate); and the business rules that apply to processes and information (when can an order be placed, how are rates established).

The quickly closing gap between business and IT is evident in a recent survey of both groups conducted by Harris Poll and sponsored by Appian. Forbes' Joe McKendrick examines the survey results in an April 27, 2015, article. One finding of the survey is that business executives are almost as enthusiastic as IT managers about the potential of cloud services to boost their organizations' profits. Similarly, business managers perceive application maintenance, app and data silos, expensive app development, and slow app deployments as drains on their companies.

The need for business and IT to collaborate more closely on app development, deployment, and maintenance is highlighted in a Forrester study from 2013. As InfoQ's Ben Linders writes, the business side often perceives IT staff as "order takers," while IT departments focus on excelling at specific central functions. Both sides have to resist the tendency to fall into the customer-provider model.

Breaking out of the business/IT silos requires replacing the hierarchical organization with one based on overlapping, double-linked circles. The circles include everyone involved in the development project, and double-linking ensures that the members of each circle are able to communicate immediately and naturally on all the organization's ongoing projects. This so-called sociocracy model ensures that the lines of communication in the company remain visible at all times.


Database Scaling Made Simple


Your database-scaling strategy should match the way the system is used, and the extent to which the database's nodes are distributed.

Not so long ago, the solution to balky database performance was to throw more hardware at it: faster servers and memory, more network bandwidth, and boom! Your database is back to bullet-train speeds.

Any system administrator will tell you that databases aren't what they used to be. For one thing, today's databases are much more widely distributed, reaching end points of all types: phones, tablets, controllers, sensors, even appliances. For another thing, a single database is more likely to run on multiple platforms and interface with systems of all types. Last but not least, the amount of data stored in a typical production database dwarfs the typical storage of just a few years ago.

So you've got more data than ever, integrating with more devices and platforms than ever, and reaching more device types than ever. Add to this the modern demands of real-time analysis and zero downtime. It's enough to make a DBA consider a midlife career change.

But before you start thinking about life as a karaoke repairman, remember that for every problem technology poses, it offers many possible solutions. The trick is to find the answer for your technical predicament, and apply it to best effect. The solution to many database performance glitches is scalability. The challenge is that traditional RDBMSs are easy to scale up by adding more and faster CPUs and memory, but they're notorious for not scaling out easily, which is a problem for today's widely distributed systems. Here's how to take much of the sting out of scaling databases.

The prototypical example of a scale-out system is a web app, which expands across low-cost servers as the number of users increases. By contrast, an RDBMS is designed to scale up: improve performance by adding processing power and more/faster memory.

Web apps are noted for scaling out to commodity servers as demand increases, while RDBMSs scale up by adding more and faster processors and memory, coping less well as the number of users grows. Source: CouchBase

Forget the backend -- Analytics now happens in the real-time data stream

If you're still analyzing data in batch mode on the back end, you could be further behind the competition than you think. InfoWorld's Andrew C. Oliver explains in a July 25, 2015, article that systems based on streaming analytics may cost more initially, but over time they make much better use of resources. One reason real-time analytics is more efficient is that you're not re-analyzing historic data the way you do with batch-mode analysis.

In a Dr. Dobb's article from November 2012, Nikita Shamgunov distinguishes the scale-out requirements of OLTP and OLAP databases: OLTP handles real-time transaction processing, while OLAP accommodates analysis of large amounts of aggregate data.

Online analytical processing emphasizes deep dives into large pools of aggregate data, while online transaction processing focuses on many simultaneous data transactions with fewer data demands and shorter durations. Source: Paris Technology

Scaling OLAP databases relies on these four characteristics:

  1. Columnar indexes and compression (not specifically scaling out, but they reduce CPU cycles per node)
  2. Intra-parallel query execution (a scale-up partitioning technique that executes subqueries in parallel)
  3. Shared-nothing architecture (prevents a single point of contention)
  4. Distributed query optimization (sends a single SQL query to multiple nodes)

Here are the four considerations for ensuring OLTP databases scale well:

  1. Don't rely on columnar indexes because OLTP is real-time, and its data records are accessed randomly, which reduces the effectiveness of indexes.
  2. Parallel query execution is likewise less effective in OLTP because few queries require processing large amounts of data.
  3. Data in OLAP systems is generally loaded in batch mode, but OLTP data is ingested in real time, so optimizing bursty traffic is less beneficial in OLTP.
  4. In OLTP, uptime is paramount; OLAP systems are more tolerant of downtime.

Simple management of SQL, NoSQL, and in-memory databases is a key feature of the Morpheus next-gen Platform as a Service software appliance built on Docker containers. Morpheus supports management of the full app and database lifecycle in a way that accommodates your unique needs. Provisioning databases, apps, and app stack components is simple and straightforward via the intuitive Morpheus interface.

The range of databases, apps, and components you can provision includes MySQL, MongoDB, Cassandra, PostgreSQL, Redis, Elasticsearch, and Memcached databases; ActiveMQ and RabbitMQ messaging; Tomcat, Java, Ruby, Go, and Python application servers and runtimes; Apache and NGINX web servers; and Jenkins, Confluence, and Nexus app stack components. Request a demo to learn more.
