Written By Phil Turner, Director, Enterprise Solutions, Datatrend Technologies, Inc.
For most people in IT, SAP is a familiar name, and many know the company now provides a wide range of software products, with its primary suite synonymous with the company name. SAP has woven itself into the corporate fabric of companies in America and around the world by providing industry-specific, highly integrated business and financial management systems that control and automate many business processes. In doing so, SAP, and the data it manages, have become vital assets for any company that employs the software.
Over time, SAP infrastructures tend to expand and evolve, encompassing more business functions and allowing greater operational integration. As they do, the interdependent systems become more critical to business operations, but they also generate more data and often more complex data relationships. The same can be true for non-SAP environments, where applications have been woven together in a sometimes less elegant fashion but still often manage the corporate “crown jewels” of data. And that data is becoming a focus of business innovation, agility, and competitive advantage.
Most organizations soon realize the challenge this integrated data and business logic platform can pose to speed and agility: things can start to slow even as your computing systems become faster. Partly this is due to more complex business logic and more integration between applications; systems waiting on systems, as it were. But one significant performance issue is becoming common across many applications: disk performance has not kept pace with computing performance for many years, and the gap is having an ever-increasing impact.
This was seen early in data warehouse, business intelligence, and decision support applications – what we generally refer to as analytics environments – because large amounts of data had to be analyzed, sorted, and parsed to produce an informational insight. As business transactions try to encompass more analytics-like data and behavior, the performance problem creeps further out into the infrastructure.
All manner of clever techniques are employed to deal with this performance gulf between disk and CPU – or between business logic and business data: everything from large disk arrays, large caches, solid state disks, and other flash technologies to completely new system architectures where disks are dedicated to specific CPU cores to minimize data movement.
But if you truly examine the components of the transaction response path, you can quickly see that no matter how closely we couple the disk (data) to the CPU (logic), a roughly 300,000-to-1 gap in “speed of operation” remains.
This chart shows the relative data access times (on a logarithmic scale) of the different technologies where data is, or can be, stored for permanent retention or temporary manipulation. The problem is that if the CPU took, for example, 1 second to perform a compare on its internal registers, it would take over 23 months to load the next data values for comparison from spinning disk. Yes, that is a 61-million-to-1 response difference. In reality it does not take the CPU 1 second to do the compare, nor does it take 23 months to load the next data item, but the relative performance still holds true. The truth is that your computing systems (business logic) spend a LOT of time waiting on data loads, and that wait shows up as CPU busy time… busy waiting, that is.
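The arithmetic behind this scaled-up analogy can be checked in a few lines of code. The latency figures below are illustrative order-of-magnitude assumptions (roughly one CPU cycle, one DRAM access, one disk seek), not measurements from any particular system:

```python
# Illustrative order-of-magnitude latencies in nanoseconds (assumptions,
# not measurements from any specific hardware).
LATENCY_NS = {
    "cpu_register": 0.17,         # roughly one cycle on a multi-GHz core
    "main_memory": 100.0,         # one DRAM access
    "spinning_disk": 10_000_000,  # ~10 ms seek plus rotational delay
}

def scaled_wait_seconds(op: str) -> float:
    """If a register operation were stretched to 1 second, how many
    seconds would this operation take at the same scale?"""
    return LATENCY_NS[op] / LATENCY_NS["cpu_register"]

disk_wait = scaled_wait_seconds("spinning_disk")
memory_wait = scaled_wait_seconds("main_memory")

print(f"disk vs. register: {disk_wait:,.0f} to 1")
print(f"disk wait at this scale: {disk_wait / (86_400 * 30):.1f} months")
print(f"memory wait at this scale: {memory_wait / 60:.1f} minutes")
```

With these round numbers the ratios land near, though not exactly on, the 61-million-to-1 and 23-month figures above; the precise values depend entirely on which latencies you assume.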
So, SAP (and others) have been working on ways to close this gap as much as possible. Through a great deal of development and several acquisitions, SAP positioned itself quite well to take advantage of a powerful trend in IT: the dramatic drop in memory costs. System memory is far denser and far faster than just a few years ago, yet also dramatically cheaper than ever before. It is now common to have dozens or hundreds of GB of memory in a server, and TB-scale memory footprints are possible and affordable. What is even more interesting is the effect this can have on your overall application platform.
These advances have made in-memory database platforms possible and, for the first time, truly practical – not just for a specific application but for more generic use. By providing a relatively low-cost way to move all of an application’s (or applications’) data into live memory, they have the potential to speed up data access by a factor of 300,000. In the prior example, where a 1-second compare waited 23 months for the next data load, moving the entire database into memory brings that wait down to only about 3.4 minutes, a dramatic improvement.
This is why SAP developed HANA, an in-memory database technology that started off as a repository for SAP analytics on the valuable corporate data and is now becoming a far more generic platform for in-memory database access.
Not only do transactions of many types run dramatically faster, this approach has the unusual benefit of letting your servers, the same servers you have now, handle more workload and do it faster at the same time. Or, if you prefer, you can reduce your CPU count for the same workload and save (or defer) costs on both hardware and software licenses. How is it possible to be both faster and cheaper?
While in-memory database technology certainly makes data access dramatically faster, it also removes a great deal of CPU wait time. Since a CPU waiting on an I/O usually cannot reallocate those cycles as free, the wait is inherently “CPU busy” time in most systems. Faster I/O operations therefore mean less CPU wait time, freeing up those “wasted” cycles to do more work. And if enough cycles are freed up, you can actually reduce the CPU count (and the associated software costs) for a given transactional workload.
So, if this is so great why isn’t everyone doing it (yet)? A few issues have needed to be resolved.
- First, the hardware costs for very large memory platforms needed to become economically attractive, which has happened in the last few years. Additionally, SAP HANA performs significant data compression on the in-memory store, allowing a 7TB database to run in as little as 1TB of memory.
- Second, the software interfaces on the in-memory database server needed to be standardized on common database access methods to avoid application changes. That is a more recent development, and SAP HANA is doing well there.
- Third, a mechanism to ensure data persistence needed to be developed. This is now available but required significant ingenuity to make the whole solution very attractive. This is where IBM added some clever innovation.
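The first point lends itself to some quick back-of-the-envelope sizing. The sketch below uses the roughly 7:1 compression figure cited above; the `headroom` multiplier for temporary working space is an illustrative assumption, not an official SAP sizing rule:

```python
def hana_memory_estimate_tb(source_db_tb: float,
                            compression_ratio: float = 7.0,
                            headroom: float = 1.0) -> float:
    """Rough in-memory footprint estimate in TB.

    compression_ratio: the roughly 7:1 figure cited in the text.
    headroom: hypothetical multiplier for temporary working space;
    an illustrative assumption, not an official SAP sizing rule.
    """
    return (source_db_tb / compression_ratio) * headroom

# A 7 TB source database at 7:1 compression fits in about 1 TB of memory.
print(hana_memory_estimate_tb(7.0))                 # 1.0
# With 2x headroom for working space, plan for about 2 TB.
print(hana_memory_estimate_tb(7.0, headroom=2.0))   # 2.0
```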
While most major server vendors have developed a platform solution or two (or three) to support SAP HANA implementations, many of those solutions take some of the shine off the promise of SAP HANA’s simplicity, performance, and scalability. SAP developed great software, and the major server manufacturers use similar memory technology, but their approaches to persistent data storage could not be more different.
Since in-memory databases operate at such high transaction rates, any persistent storage approach needs to keep pace: not easy when disk is 300,000 times slower than memory. SAP architected its software to log data changes to PCI flash drives (much faster than solid state disks), minimizing the performance impact on database operations when records are updated.
Then, changes are copied to the persistent disk storage in the background. In this way, all of the data is still stored on disk for persistence but is kept out of the transaction path, where it would hurt performance.
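The general pattern described here – commit synchronously to a fast log, persist to bulk storage in the background – can be sketched in a few lines. This is a toy illustration of the technique, not SAP’s implementation: an in-memory list stands in for the flash log and a dictionary for the slower disk store.

```python
import queue
import threading

class PersistenceSketch:
    """Toy model: commits append to a fast log (stand-in for PCI flash),
    while a background thread merges changes into slower bulk storage
    (stand-in for disk), off the transaction's critical path."""

    def __init__(self):
        self.fast_log = []             # synchronous, low-latency append
        self.bulk_store = {}           # slower persistent copy
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def commit(self, key, value):
        # Transaction path: only the fast-log append is synchronous.
        self.fast_log.append((key, value))
        self._pending.put((key, value))

    def _drain(self):
        # Background path: copy each change to bulk storage.
        while True:
            key, value = self._pending.get()
            self.bulk_store[key] = value
            self._pending.task_done()

    def flush(self):
        # Wait until the background writer has caught up.
        self._pending.join()

db = PersistenceSketch()
db.commit("order-1", {"qty": 3})
db.commit("order-2", {"qty": 5})
db.flush()
print(db.bulk_store)  # both records have reached the slow store
```

The key property the sketch demonstrates is that `commit` returns after the fast append, while durability on the slow medium arrives asynchronously.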
Since for certain environments (like analytics) it is desirable to scale the SAP HANA database across multiple servers, a way to share the persistent data store is required. For most implementations in the industry this means a shared SAN, with its associated cost and complexity – exactly what the in-memory approach, where performance comes from server memory instead of a storage subsystem, is meant to avoid.
Ingeniously, and uniquely, IBM has certified not only the lowest-cost scalable SAP HANA platform but also the simplest and fastest implementation. By exploiting a systems technology that allows a server’s internal disks to be shared across multiple servers, IBM has eliminated the need for a SAN in SAP HANA implementations, along with the associated cost and the installation and operational complexity. IBM employs its highly proven General Parallel File System (GPFS) with the File Placement Optimizer (FPO) to allow the simple, low-cost internal disks of the server nodes to be aggregated, shared, and made highly available across all the nodes in an SAP HANA platform instance. Whether your scalable SAP HANA database spans one server, 10 servers, or 100 servers, IBM provides a truly unique value proposition for SAP’s already innovative and valuable in-memory database platform, HANA. In fact, SAP’s own largest HANA demonstration platforms run on IBM.
Find out how SAP HANA running on IBM’s innovative platform can provide an appliance that dramatically improves your SAP or non-SAP application platform’s responsiveness and efficiency. To learn more, contact Datatrend, or the author, Phil Turner, at email@example.com.