Data Structures and Other DDL
Welcome to part 4 in our on-going series addressing the changing nature of DB2 for z/OS. Part 1 covered database and DBA trends; part 2 looked at changes with Universal table spaces; and part 3 discussed the role of LOBs with unstructured data. But there are other changes happening to DB2 data structures that impact the manner in which we use and manage DB2 for z/OS and today, in part 4, we will take a look at some of these changes and their impact.
Mainframe disk, or DASD, is usually equated to a 3380 or 3390. We perceive physical hardware devices with a one-to-one relationship between a disk drive and a volume. But today these physical devices are replaced by disk arrays. An array is the combination of two or more physical disk storage devices in a single logical device or multiple logical devices.
These disk arrays are typically referred to as RAID devices, where RAID stands for Redundant Array of Independent Disks. The basic idea behind RAID is to combine multiple disk devices into an array that is perceived by the system as a single disk drive. RAID offers high availability and rapid failover because the drives are hot-swappable meaning that a drive can be replaced while the array is up & running.
The IBM DS8000 series is one of the more popular disk systems for IBM System z mainframe platforms. It is built for high-speed OLTP and large database files, supporting ESCON and FICON attachment to high-speed 15K RPM FC drives.
Mainframe disk storage arrays still emulate 3390 track architectures as that’s what z/OS understands. You may well have heard of your site using 3390 Model 3 disks. The Model number refers to the old triple capacity 3390 models, which could hold just under 3GB of data. Modern disk arrays can emulate almost any volume configuration, up to an architectural limit of 65,520 cylinders.
The important thing to understand here though, is that when you are doing a read or write you are probably doing that to multiple physical disks in the disk array under the covers, instead of to a single 3390 device. The new disk architectures and devices, with concepts like log structured files and with cache in the gigabyte sizes, have a noticeable impact on database physical design considerations. Conventional database design rules based on data set placement are becoming less important and can be ignored in most cases.
SMS-managed storage. Another important storage-related change is SMS, or system managed storage. With SMS, the system determines data set placement, thereby minimizing DBA work. SMS is a requirement for DB2 10 for z/OS because the DB2 Catalog must be SMS-managed as of DB2 10. Furthermore, any table spaces or indexes with data sets larger than 4 GB require DFSMS-managed data sets.
SMS is achieves in z/OS using IBM DFSMS.
Without DFSMS, the user is responsible for distributing DB2 data sets among disks. This process needs to be reviewed periodically, either when the workload changes, or when the storage server configuration changes. Furthermore, as we just discussed, with RAID devices there is no guarantee that your data set placement attempts will actually achieve the separation you request because the RAID device spreads the data across multiple disks behind the scenes in a way that you cannot control.
With DFSMS, the DBA can automate data set placement. Many shops have moved to SMS for the majority of DB2 data set placement. It is relatively straightforward to use DFSMS to achieve all the objectives for DB2 data set placement and design. DFSMS has the necessary flexibility to support everything the DB2 administrator may want.
Of course, to achieve a successful implementation, an agreement between the storage administrator and the DB2 administrator is required so that they can together establish an environment that satisfies both their objectives. And the mindset of trying to place your data to separate access types needs to be let go.
Compression. Another storage related DB2 issue is compression and it is my contention that compression is under-utilized in most shops. With today’s DB2-provided, hardware assisted compression you can achieve high compression rates with excellent performance. A good rule of thumb is to compress any table space with at least 50 pages when DSN1COMP shows a compression ratio of at least 25%.
The CPU overhead to compress that you may recall from the early days of DB2 is not a major concern for modern table space compression. For sequential data access, such as with sequential prefetch, I/O savings can significantly improve performance. This is so because a single compressed page can contain more rows than an uncompressed page, so more data is read into the buffer pool with each I/O.
And as of DB2 9 for z/OS you can compress indexes, too. Many of the same reasons that would cause you to compress a table space can compel you to consider compressing indexes. Index compression differs significantly from table space compression, even though the end result (a reduction in storage) is the same.
The first important difference is that index compression does not use Ziv-Lempel and does not require a dictionary. This means that newly inserted index keys can be compressed without having to wait for LOAD or REORG to be run. Perhaps the most significant difference between data and index compression is how the data is brought into the buffer pool. Index leaf pages are decompressed before they are stored in the buffer pool. This differs from table space compression where data pages are read into the buffer pool compressed and then compression is performed row-by-row. Index leaf pages are compressed and decompressed at the page level. This offers better performance as opposed to repetitively decompressing multiple index keys each time the index is searched. Another difference is that DB2 compresses only index leaf pages and not other types of index pages (non-leaf, root). The largest proportion of an index is consumed by the leaf pages. DB2 compresses the index keys as well as the RID lists.
Hybrid Transaction Analytic Processing (HTAP)
With the rise of big data and analytics, more organizations are desiring to perform in-depth analysis of their data using OLAP and data mining techniques. These analytical queries differ from transactional queries in terms of data access, design, and performance requirements. But increasingly, more organizations are demanding the ability to perform both OLTP and OLAP on the same data in the same DBMS.
IBM has addressed this need with the IBM DB2 Analytics Accelerator for z/OS, or IDAA. IDAA is an appliance that integrates with your z Systems mainframe and DB2 for z/OS. It provides high-speed analytical processing – at time delivering orders of magnitude performance improvement. When installed, the DB2 Optimizer becomes aware of the IDAA and can build access plans to shuttle appropriate workload to IDAA. That means you do not have to change your programs to take advantage of the power of IDAA.
But it does mean that there are additional administration tasks such as configuring the IDAA and setting up the tasks to synchronize data between DB2 for z/OS and the IDAA appliance. Once again, proving, it’s not your Daddy’s DB2!
Other Changes Worth Noting
At this point, I’m risking this blog entry becoming too large, but there are additional data structure changes that we have seen over the course of the past few DB2 versions. But instead of describing each of them I’m going to briefly a few of the additional major changes along with a link for those wanting additional details.
DB2 10 for z/OS introduced hash-organized tables. A hash, or hash function, is an algorithm that converts a defined set of data elements into a small number, usually a single integer that can serve as an index to an array or a storage location on disk. Hashes are particularly efficient for single value lookups, such as a primary key lookup, but are inefficient for sequential access.
Temporal data support was added in DB2 10 for z/OS. With temporal tables a time period is attached to the data to indicate when it was valid or changed in the database. A traditional database stores data implied to be valid at the current point-in-time; it does not track the past or future states of the data. Temporal support makes it possible to store different database states and to query the data as of those different states. That means different DDL to support temporal data, as well as different SQL syntax to query it (time travel query).
And DB2 11 for z/OS adds transparent archiving, somewhat similar to system time temporal tables. Transparent archiving makes it easier for DBAs to separate operational data that is actively being used from inactive data that can be archived. This improves performance by reducing the size of the table space and helps to protect inactive data from inadvertent access.
There have also been new data types introduced over the past few releases, including binary, pureXML, DECFLOAT, TIMESTAMP with varying precision and TIMESTAMP WITH TIME ZONE. Not to mention that Synonyms were deprecated for DB2 11 for z/OS. That means they are still there and functional now, but will soon be removed. And that means you will soon have to find all of the places where Synonyms are used and replace them with Aliases or direct table access.
The Bottom Line
DB2’s infrastructure and the structures used to store data in DB2 are changing… rapidly. As such, organizations and their DBAs must be ready to embrace the change to ensure that effective, efficient, modern DB2 databases and applications are being built. Failure to do so can negatively impact the business and will cause you to react when the older ways of doing things are no longer supported.