With all the discussion of COBOL recently, I’ve heard a lot of talk about moving COBOL to the Cloud. I have had some experience in this area, so I could not resist writing about it.
This is another off-topic post for this blog site, but I added it as a companion to It’s all COBOL’s fault!
For those not familiar, a mainframe is just a large server. Older applications need to run in one of a small number of proprietary environments (e.g. CICS, IMS) on top of the mainframe operating system, but on most mainframes you can also run UNIX or Linux.
When you want to get rid of the mainframe, ostensibly to save money, you can port those UNIX or Linux applications to a Cloud server fairly easily, but you are then left with those pesky older applications that require the proprietary environments.
When most people refer to the mainframe, mainframe code, or even COBOL programs, they really mean applications that run in one of those proprietary environments, such as CICS, IMS, or batch jobs using JCL.
A Cloud is just a large computing environment where you can quickly spin up virtual Linux servers of varying sizes, and spin up more of them if you get a spike in demand. In fact, if you want to set up a Cloud environment, the best and cheapest solution might be a mainframe.
Taking advantage of the Cloud environment
One issue that gets overlooked in the general hype around Cloud servers is that applications have to be designed to take advantage of a Cloud environment, so that work can be shared across many servers. The Cloud environment works well for web traffic, where incoming requests from browsers can be distributed across many web servers.
Applications that were written as a monolithic process, e.g. to read through a large file and do lengthy calculations, are not good candidates to benefit from deployment to the Cloud. They were designed to run on a single large computer, and would need to be redesigned to split the workload across multiple servers and then put the results back together again at the end.
COBOL on Linux Cloud servers
Let’s put aside the issue of truly taking advantage of a Cloud environment by running on an arbitrary number of Linux servers, and just stick to the simple question of porting a mainframe COBOL application to run on a single Linux server, even though this would not in itself help your system handle spikes in load.
I hope to make clear that porting a COBOL application to a Linux server is possible but not easy, and the problems have nothing to do with COBOL.
Batch COBOL using DB2
If you have a batch application written in COBOL which processes DB2 SQL data, it is relatively simple to migrate to a Linux server. There are a number of COBOL compilers available on Linux, and you could install DB2 on Linux or you could even port to MySQL. There are some minor incompatibilities between DB2 and MySQL, but no worse than between MySQL and Postgres.
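To give a flavor of what such a batch program’s data access looks like, here is a minimal sketch using embedded SQL in COBOL with a cursor. The program, table, and column names are invented for illustration; run through a DB2 precompiler and a suitable COBOL compiler, much the same source works on the mainframe and on Linux:

```
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BATCHRPT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
           EXEC SQL INCLUDE SQLCA END-EXEC.
       01  WS-ACCT-NO   PIC X(10).
       01  WS-BALANCE   PIC S9(9)V99 COMP-3.
           EXEC SQL
               DECLARE ACCTCUR CURSOR FOR
               SELECT ACCT_NO, BALANCE
                 FROM ACCOUNTS
                WHERE STATUS = 'OPEN'
           END-EXEC.
       PROCEDURE DIVISION.
           EXEC SQL OPEN ACCTCUR END-EXEC
           PERFORM UNTIL SQLCODE NOT = 0
               EXEC SQL
                   FETCH ACCTCUR INTO :WS-ACCT-NO, :WS-BALANCE
               END-EXEC
               IF SQLCODE = 0
      *            The business calculation would go here
                   DISPLAY WS-ACCT-NO ' ' WS-BALANCE
               END-IF
           END-PERFORM
           EXEC SQL CLOSE ACCTCUR END-EXEC
           STOP RUN.
```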
One issue would be the lack of test suites – TDD was unheard of when these were written, but such a system could be tested by running workloads through the old and new systems and comparing the results.
Notice that I said ‘batch application’ above? That’s kind of a give-away – mainframes generally process huge batch runs every night, using advanced job-scheduling systems and a job control language (JCL) to run flows of jobs. These flows often pass temporary files from the output of one step into the input of the next, and use system utilities to sort said temp files any which way. Even if your entire organization’s data is stored in relational databases like DB2 or Oracle, migrating the complete JCL batch process is not an easy job, but again, it is a well-trodden path, so there are tools and techniques available.
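The JCL itself does not port as-is, but to give a flavor of what one step in such a flow does, here is a hypothetical COBOL equivalent of a utility sort step: read the temp file produced by the previous step, sort it on a key, and write it where the next step expects its input. The file names and record layout are invented:

```
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SORTSTEP.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *    step1.out is the temp file from the previous job step;
      *    step2.in feeds the next step
           SELECT IN-FILE   ASSIGN TO "step1.out".
           SELECT OUT-FILE  ASSIGN TO "step2.in".
           SELECT WORK-FILE ASSIGN TO "sortwork".
       DATA DIVISION.
       FILE SECTION.
       FD  IN-FILE.
       01  IN-REC           PIC X(80).
       FD  OUT-FILE.
       01  OUT-REC          PIC X(80).
       SD  WORK-FILE.
       01  WORK-REC.
           05  WK-ACCT-NO   PIC X(10).
           05  FILLER       PIC X(70).
       PROCEDURE DIVISION.
      *    Sort the previous step's output on account number and
      *    write the sorted copy for the next step to read
           SORT WORK-FILE ON ASCENDING KEY WK-ACCT-NO
               USING IN-FILE GIVING OUT-FILE
           STOP RUN.
```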
Online applications using DB2
For online interactive systems, text-only mainframe terminals (typically 80 columns by 24 rows) are connected to a transaction-processing system such as CICS or IMS/TM. In the simple case, where all the data is in relational databases, you’re probably using CICS.
The interactive programs are usually complex, with terminal-control and validation code intermingled with database access. They did not have to be written that way – my company was one of many that delivered well-structured systems in the 90s which kept a strict separation between front-end code and back-end business logic, but most of the CICS code that I have seen was not done this way.
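As a hypothetical illustration of that intermingling, here is a fragment in the common style, with screen handling, validation, and database access woven through a single paragraph. The map, table, and field names are all invented:

```
       ACCEPT-AND-SHOW-ACCOUNT.
      *    Screen I/O, validation, and DB access all in one place
           EXEC CICS RECEIVE MAP('ACCTMAP') MAPSET('ACCTSET') END-EXEC
           IF ACCTNOI = SPACES
               MOVE 'ACCOUNT NUMBER REQUIRED' TO MSGO
           ELSE
               EXEC SQL
                   SELECT NAME, BALANCE
                     INTO :WS-NAME, :WS-BALANCE
                     FROM ACCOUNTS
                    WHERE ACCT_NO = :ACCTNOI
               END-EXEC
               IF SQLCODE = 0
                   MOVE WS-NAME    TO NAMEO
                   MOVE WS-BALANCE TO BALO
               ELSE
                   MOVE 'ACCOUNT NOT FOUND' TO MSGO
               END-IF
           END-IF
           EXEC CICS SEND MAP('ACCTMAP') MAPSET('ACCTSET') END-EXEC.
```

Untangling code like this means deciding, line by line, which statements belong to the user interface and which are business logic – which is why the split is usually handled during a rewrite.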
This is not a criticism of mainframe developers – if you have been around long enough you’ll recall some very tangled code in C or BASIC from the 80s also – the mainframe code is unique only in that it is still used in production for important tasks by big companies.
Emulate CICS or rewrite to Web?
You could attempt to run the terminal programs in a CICS emulation on *nix – IBM still sells TXSeries, which is a capable product on Linux, but you will almost certainly have to make some modifications to the code, and you would still end up with a text UI. Most migration projects are looking for a browser interface, so unless you are happy with one of the many screen-scraping systems, you are going to have to rewrite the code.
You could attempt to split the COBOL code into purely front-end code and purely back-end code, and put it back into production before migrating, and sometimes that would be a worthwhile step. Most times though, you would bite the bullet and handle the functionality split while rewriting.
So the interactive systems could get rewritten into whatever tooling you feel like, splitting the logic into front-end and back-end code. You would probably want to target a browser, so maybe React or Angular on the front end, and Node or Java/Spring on the back end. You would need someone on the team with a good understanding of CICS to tell you what the legacy code does, but this is again a fairly straightforward task. Others have done this, and I’ve worked on a successful project like this myself.
Non-relational data: CICS VSAM or flat files
I cheated a bit in the examples above by saying that the data was in a DB2 relational/SQL DB. DB2 became common around 1990 and has been heavily used in new systems since then, but if you have older systems, you will be working with data in VSAM indexed files, or in flat files read sequentially.
Indexed files are not usually considered a database, but if you are using CICS, it supports ACID transactions for updates across multiple indexed files, and (through two-phase commit) it even extends those transactions to DB2. Mixed forms of data are not uncommon: older applications built on indexed files often have newer functionality added using DB2.
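A hypothetical sketch of such a mixed unit of work under CICS, updating a VSAM record and a DB2 row and committing both together (the file, table, and field names are invented):

```
       UPDATE-BOTH-STORES.
      *    Read the VSAM record for update, change it, rewrite it
           EXEC CICS READ FILE('ACCTVSAM')
                INTO(WS-ACCT-REC) RIDFLD(WS-ACCT-KEY) UPDATE
           END-EXEC
           ADD WS-AMOUNT TO WS-ACCT-BALANCE
           EXEC CICS REWRITE FILE('ACCTVSAM') FROM(WS-ACCT-REC)
           END-EXEC
      *    Update the related DB2 row in the same unit of work
           EXEC SQL
               UPDATE ACCT_AUDIT
                  SET LAST_AMOUNT = :WS-AMOUNT
                WHERE ACCT_NO = :WS-ACCT-KEY
           END-EXEC
      *    CICS coordinates the two-phase commit across VSAM and DB2
           EXEC CICS SYNCPOINT END-EXEC.
```

It is exactly this kind of transaction, spanning indexed files and a relational DB, that has no ready-made equivalent on a Linux server.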
Flat files are the modern incarnation of magnetic tapes – those spinning reels that you see in old TV shows. You would see the tapes moving in both directions, because it was common for a job to read a record, then back up and write it with new values.
This same read-then-rewrite functionality still exists today in many batch applications, often in programs that were never rewritten, now working against flat files on disk that are processed sequentially. It can be handled in much the same way on *nix servers, but there are not many programmers who are familiar with working with files this way.
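A minimal sketch of that pattern in COBOL, updating records in place in a sequential file opened for both input and output (the file name and record layout are hypothetical):

```
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UPDBAL.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT ACCT-FILE ASSIGN TO "accounts.dat"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  ACCT-FILE.
       01  ACCT-REC.
           05  ACCT-NO      PIC X(10).
           05  ACCT-BALANCE PIC S9(9)V99 COMP-3.
       WORKING-STORAGE SECTION.
       01  WS-EOF           PIC X VALUE 'N'.
       PROCEDURE DIVISION.
      *    Open for input AND output so records can be rewritten
      *    in place - the flat-file descendant of the tape dance
           OPEN I-O ACCT-FILE
           PERFORM UNTIL WS-EOF = 'Y'
               READ ACCT-FILE
                   AT END MOVE 'Y' TO WS-EOF
                   NOT AT END
      *                Apply the update and write the record back
                       ADD 1.50 TO ACCT-BALANCE
                       REWRITE ACCT-REC
               END-READ
           END-PERFORM
           CLOSE ACCT-FILE
           STOP RUN.
```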
When I’ve worked on migrations like this, we’ve generally gone for a complete conversion of the indexed-file data to relational as part of the migration, rather than try to come up with ways to handle transactions that span multiple indexed files. However, let me just say that converting indexed file data to a SQL DB is not for the faint of heart.
What about IMS/DB?
Now, if a mainframe developer is reading this, they would say “hold on, you are ignoring IMS DB”. That’s quite true, I am ignoring IMS DB. IMS DB is a hierarchical DB that is blazing fast, and there is nothing like it in the *nix or Windows world.
I worked on a small IMS DB to DB2 migration once, and it was complex. “Get the next child record” or “get the parent record” can be simulated in SQL using carefully designed keys, but the result is error-prone and hard to get right – not ideal.
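As a hypothetical sketch of what those clever keys look like, here is a simulated “get next child within parent” call using embedded SQL over a table keyed by the parent key plus a child sequence number (all names invented):

```
      *    IMS GNP equivalent: fetch the next child under the
      *    current parent, after the child returned last time
           EXEC SQL
               SELECT CHILD_SEQ, CHILD_DATA
                 INTO :WS-CHILD-SEQ, :WS-CHILD-DATA
                 FROM ORDER_LINES
                WHERE PARENT_KEY = :WS-PARENT-KEY
                  AND CHILD_SEQ  > :WS-LAST-CHILD-SEQ
                ORDER BY CHILD_SEQ
                FETCH FIRST 1 ROW ONLY
           END-EXEC
      *    SQLCODE +100 plays the role of the IMS 'GE'
      *    (segment not found) status code
           IF SQLCODE = 100
               MOVE 'GE' TO WS-IMS-STATUS
           END-IF
```

Every IMS call pattern in the application needs a hand-crafted translation like this, and each one is a chance to get the key logic subtly wrong.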
Trying to emulate a hierarchical structure in a SQL DB can result in code that is brittle and hard to maintain. So I would not consider migrating an existing IMS DB application – I would look at rewriting it from scratch. Of course, IMS DB is used in very high-performance applications, so the rewrite would have to be designed for high performance.
It’s the data, the whole data, and nothing but the data
I hope I’ve made it clear that the problems I have seen with migrating mainframe applications to *nix have nothing to do with the source code – the database is the key to the problems. I have seen projects that got underway with great enthusiasm, and then simply ground to a halt once they had uncovered all the issues around the data. Other projects kept going and migrated the subsystems that could be completed, but left large parts of the overall system on the mainframe. Some projects kept going until the funding ran out, and then stopped, having delivered nothing useful. I have also heard of projects that were completely successful, but very expensive.
Successful projects generally involve a mixed approach: replacing some applications with purchased equivalents, discovering that some are unnecessary, rewriting some, and migrating others. Any successful project, though, has to deal with large amounts of data in various file formats, and decide what to do with it.
The least of the worries is whether the programs were written in COBOL.