This article is from the source 'guardian' and was first published or seen on . It last changed over 40 days ago and won't be checked again for changes.

You can find the current article at its original source at http://www.guardian.co.uk/technology/2012/jun/25/how-natwest-it-meltdown

The article has changed 4 times.

How NatWest's IT meltdown developed
NatWest has admitted that it could not say exactly how much money should be in individual accounts as the crisis caused by a failed software update last week spiralled out of control for days.
The bank was quick to deny claims by the Unite union that the "offshoring" of IT jobs to locations in India had led to the problems, which appeared on Tuesday night, paralysed its systems through to Friday and have not yet been fixed.
However, a number of programmers and experts who have worked on or with NatWest systems told the Guardian that they could not imagine the problem happening in the period before the redundancies of experienced staff made since 2010.
"[NatWest owner] Royal Bank of Scotland has 40 years' experience running these systems and banks as a rule don't drop the ball like this," one said. "Somebody somewhere made a decision that has led to this."
The Guardian's investigations suggest that NatWest's problems began on Tuesday night when it updated a key piece of software – CA-7, which controls the batch processing systems that deal with retail banking transactions – ahead of the regular nightly run.
RBS/NatWest has not said what went wrong, though one programmer who has worked on RBS/NatWest's systems told the Guardian: "CA-7 is a very common and reliable product used to automate large sequences of batch mainframe work [which are usually referred to as 'jobs']. It will start jobs, wait for them to run, then start other jobs dependent on the first ones completing, and so on. RBS processes accounts overnight via thousands of jobs."
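The dependency-driven behaviour described here (start a job, wait for it to finish, then release the jobs that depend on it) can be illustrated with Python's standard-library `graphlib` module. This is a minimal sketch, not a model of how CA-7 is configured or implemented; the job names and the `run_schedule` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical overnight schedule: each job maps to the set of jobs that
# must complete before it may start. These are not real CA-7 job names.
SCHEDULE = {
    "collect_atm": set(),
    "collect_salaries": set(),
    "merge_transactions": {"collect_atm", "collect_salaries"},
    "update_master_balances": {"merge_transactions"},
    "print_statements": {"update_master_balances"},
}


def run_schedule(schedule, run_job):
    """Run each job only after all of its prerequisites have finished.

    Returns the jobs in the order they were run.
    """
    sorter = TopologicalSorter(schedule)
    sorter.prepare()  # raises graphlib.CycleError if dependencies loop
    order = []
    while sorter.is_active():
        # Every job returned by get_ready() has its prerequisites done;
        # sorting just makes the run order deterministic.
        for job in sorted(sorter.get_ready()):
            run_job(job)
            order.append(job)
            sorter.done(job)  # may release jobs that were waiting on this one
    return order
```

Passing `print` as `run_job` lists the jobs in an order that respects every dependency. A production scheduler would also run independent jobs in parallel and track failures, which this sketch leaves out.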
The jobs take transactions from various places, such as ATM withdrawals, bank-to-bank salary payments, and so on, and finish by updating the master copy of the account – in a system known as Caustic – with the definitive balance.
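A simplified sketch of that final step, assuming each feed is a list of `(account, amount)` pairs and the master copy is a dictionary of balances. The feed names and `post_to_master` are hypothetical; the Guardian does not describe Caustic's internals. Amounts are held in integer pence to avoid floating-point rounding, and the sketch simply nets each account's movements, ignoring the order in which they arrived.

```python
from collections import defaultdict

# Hypothetical feeds; amounts are integer pence (negative = money out).
atm_withdrawals = [("acc-1", -5_000), ("acc-2", -2_000)]
salary_payments = [("acc-1", 150_000)]


def post_to_master(master, *feeds):
    """Net every feed's transactions per account, then write the
    resulting balances to a new copy of the master in one final step."""
    net = defaultdict(int)
    for feed in feeds:
        for account, amount_pence in feed:
            net[account] += amount_pence
    updated = dict(master)  # the input master is left untouched
    for account, change in net.items():
        updated[account] = updated.get(account, 0) + change
    return updated
```

For example, `post_to_master({"acc-1": 10_000, "acc-2": 3_000}, atm_withdrawals, salary_payments)` returns `{"acc-1": 155_000, "acc-2": 1_000}`.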
"It seems whoever made the update to CA-7 managed to delete or corrupt the files which hold the schedule for the overnight jobs, so they did not run, or ran incorrectly," the programmer told the Guardian. "They have backed out from this change, but now are trying to play catch-up, and have been doing so for a few days."
The batch processing system, which reconciles the movement of money in and out of more than 10m NatWest and Ulster Bank accounts, did not run correctly for three nights – meaning that millions of transactions were not processed until it began running correctly on Friday. Even once it had been fixed, the batches of transactions had to be re-run in order, beginning with Tuesday's, so that nobody's account went wrongly into overdraft.
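The programmer's account suggests the job schedule itself was damaged. One generic safeguard, sketched here under the assumption that a schedule can be represented as a mapping from each job to its prerequisites, is to check it for missing definitions and circular dependencies before the nightly run starts. The `validate_schedule` function is hypothetical; the Guardian does not report what checks RBS/NatWest ran.

```python
from graphlib import CycleError, TopologicalSorter


def validate_schedule(schedule):
    """Return a list of problems found in a job schedule (empty if none).

    `schedule` maps each job name to a set of prerequisite job names.
    """
    problems = []
    if not schedule:
        problems.append("schedule is empty")
    for job, prerequisites in schedule.items():
        for dep in sorted(prerequisites):
            if dep not in schedule:
                problems.append(f"{job} depends on undefined job {dep}")
    try:
        TopologicalSorter(schedule).prepare()
    except CycleError as exc:
        # args[1] holds the list of nodes forming the detected cycle
        problems.append(f"dependency cycle: {exc.args[1]}")
    return problems
```

A run would only be released if `validate_schedule(...)` returns an empty list; an empty or truncated schedule file is reported rather than silently running nothing.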
A NatWest spokesperson, asked whether it knew how much money people had at any time, said: "All the money is safe in the bank. It's being applied to people's accounts. We can show people statements on screens if they come into branches." The bank is offering special opening hours, extending to 6pm, this week. NatWest also ran extra batches to catch up with the transaction backlog over the weekend.
NatWest, like many other banks, uses the CA-7 software, with attendant files tailored to its own needs. The problem only surfaced once the batch run was underway in the early hours of Wednesday. In February, RBS appears to have advertised in India, to which a number of its IT jobs were moved after 2010, for specialists in CA-7. "Looking for candidates having 4-7 years of experience in Batch Administration using CA7 tool," the advert read. "Urgent Requirement by RBS."
The Unite union has criticised RBS management for cutting jobs in the UK and shifting a number of them offshore. Since 2010, hundreds of IT jobs have been cut from RBS's Edinburgh headquarters and shifted abroad. RBS/NatWest has denied that it made any difference.
But some observers strongly disagree. "This was not inevitable – you can always avoid problems like this if you test sufficiently," said David Silverstone, delivery and solutions manager for NMQA, which provides automated testing software to a number of banks, though not RBS/NatWest. "But unless you keep an army of people who know exactly how the system works, there may be problems maintaining it."
One programmer who worked on the RBS/NatWest systems during the takeover in 2001-02 said that the latest problems suggested a paucity of staff on the spot with experience of what to do. "The people in India will have done their darndest to do a good job, but without the knowledge of the overall system that you get from years of experience on the ground, it's easier to see how you get a big operational failure."
Banks have for decades used huge mainframe systems to process payments such as cheques and to update customers' accounts; the transactions for each day are collected and then run in a single gigantic batch overnight, so that accounts have been credited and debited with the correct amounts by the morning. That is why internet banking transactions are not processed that night if you carry them out after a certain time: the banks' systems simply don't add them into the queue for that night's batch.
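The cut-off rule can be sketched as a function that decides which night's batch a transaction joins. The 8pm cut-off and the `batch_date` name are assumptions for illustration only; banks set their own times, and this sketch ignores weekends and bank holidays.

```python
from datetime import date, datetime, time, timedelta

# Hypothetical cut-off time; each bank sets its own.
CUTOFF = time(hour=20, minute=0)


def batch_date(submitted_at: datetime) -> date:
    """Return the date of the overnight batch a transaction joins.

    Anything submitted at or after the cut-off misses tonight's run and
    waits for the following night's batch.
    """
    if submitted_at.time() < CUTOFF:
        return submitted_at.date()
    return submitted_at.date() + timedelta(days=1)
```

A payment made at 9.30am on 19 June 2012 joins that night's batch, while one made at 9.15pm the same day is held for the batch of 20 June.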
Sources familiar with NatWest's systems, and who have also spoken to staff there, explained that the problems with the update surfaced during the batch run. NatWest confirmed on Monday that the problem first surfaced on Tuesday, and that "we confirmed the fix on Friday".
The problems with the upgrade were spotted during the overnight run ahead of Wednesday morning. "We have guardian systems which spot when things go wrong," a NatWest spokesperson said.
But by Friday, when the fix was implemented, three sets of batch runs had failed. If a batch fails badly – as here – then all of its transactions, including the payments in and out of accounts, are "rolled back" to the starting point, as if it had never run. Wednesday's transactions were then added to the pending list on Wednesday, and the run was attempted in the early hours of Thursday; that too failed. By the time the fix had been made, three days' worth of unprocessed transactions were queued up.
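The failure pattern described above can be modelled in a few lines, assuming each night's work is applied all-or-nothing and a failed day stays at the head of the queue. `run_batch`, `nightly` and the `scheduler_ok` flag are hypothetical; real mainframe batch recovery, with checkpoints and restart steps, is far more involved.

```python
from collections import deque


class BatchFailed(Exception):
    """Raised when a night's batch cannot complete."""


def run_batch(balances, transactions, scheduler_ok):
    """Apply one day's transactions all-or-nothing.

    Updates are made to a copy. If the run fails, the copy is thrown
    away, so the real balances look as if the batch never ran.
    """
    working = dict(balances)
    for account, amount_pence in transactions:
        working[account] = working.get(account, 0) + amount_pence
    if not scheduler_ok:
        # Simulate a failure late in the run, after updates were made.
        raise BatchFailed("batch aborted; rolling back")
    return working  # commit: the caller adopts the updated copy


def nightly(balances, pending, new_day, scheduler_ok):
    """Queue today's transactions, then try to clear the backlog oldest-first.

    `pending` is a deque of (day, transactions) pairs, updated in place.
    """
    pending.append(new_day)
    while pending:
        _day, transactions = pending[0]
        try:
            balances = run_batch(balances, transactions, scheduler_ok)
        except BatchFailed:
            break  # this day, and every later one, stays queued
        pending.popleft()
    return balances
```

Three failing nights leave the balances untouched and three days queued; the first successful night then clears the whole backlog, oldest day first.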
Richard Price, a Norwich-based systems developer who has worked on banking systems that linked into NatWest's, explains: "Banking systems are like a huge game of Jenga [the tower game played with interlaced blocks of wood]. Two unrelated transactions might not look related now, but 500,000 transactions from now they might have a huge relation. So everything needs to be processed in order." Thus Tuesday's batch must run before Wednesday's or Thursday's, and Wednesday's before Thursday's, to avoid, for example, penalising someone for a large payment leaving their account on Thursday that would put them into debt if processed first, even though money arriving on Wednesday would have covered it.
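Price's point can be shown with a toy account, assuming a hypothetical flat £30 fee whenever a debit takes the balance below zero. The function name and fee rule are illustrative only, not NatWest's actual charging logic.

```python
OVERDRAFT_FEE_PENCE = 3_000  # hypothetical flat £30 fee


def apply_in_sequence(balance_pence, transactions):
    """Apply amounts one by one, charging a fee each time a debit takes
    the balance below zero. Returns (final_balance, fees_charged)."""
    fees = 0
    for amount_pence in transactions:
        balance_pence += amount_pence
        if amount_pence < 0 and balance_pence < 0:
            balance_pence -= OVERDRAFT_FEE_PENCE
            fees += 1
    return balance_pence, fees


wednesday_credit = 100_000  # £1,000 arriving on Wednesday
thursday_debit = -80_000    # £800 leaving on Thursday
```

Starting from £100, processing Wednesday's credit before Thursday's debit leaves £300 with no fee, while reversing the order charges the fee and leaves £270.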
Price said that any software update would first have been subjected to quality assurance and user acceptance testing before being implemented.
CA-7 is familiar to many in the banking industry: it was originally released in 1980 by Uccel – which was then taken over by Computer Associates, which provides key software for scores of banks. Computer Associates told the Guardian: "RBS is a valued CA Technologies customer, we are offering all assistance possible to help them resolve their technical issues which are highly unique to their environment. We do not comment on customer confidential issues." However, it declined to say whether CA-7 lay at the heart of the problems.
Comments
25 June 2012 4:40PM
The sad thing is that this will cost RBS a lot to fix. You can bet that they won't just eat that out of their profits. More likely that they will save the money by outsourcing something else, cutting staff or charging customers more - all the while moaning that they need to charge retail customers more so these problems don't happen again.
I suspect that these problems won't happen for RBS customers again....I also suspect that other High Street banks have placed orders for extra account transfer forms this week.

Very good article by the way, nice to see an actual news story trying to explain what went on rather than simply re-phrasing 'glitch' and quoting a spokesperson.
25 June 2012 4:55PM
Agreed, ImperfectRex - an excellent, informative article.
25 June 2012 5:03PM
Utterly shameful that a bank owned largely by the British Government is offshoring jobs; hopefully this will show the management exactly what happens when you lose the in-house experience.
I don't bank with NatWest, but my employer does and I rather need my salary this month; let's hope it is fixed by Friday.
25 June 2012 5:06PM
As a programmer, dealing with something like this is up there with nuclear reactors as Number 1 Nightmare. Erk.
25 June 2012 5:09PM
A NatWest spokesperson, asked whether it knew how much money people had at any time, said: "All the money is safe in the bank. It's being applied to people's accounts. We can show people statements on screens if they come into branches."
Ah yes, good to know that the imaginary money is safely locked away into a database somewhere, as it would be terrible to think that this event may cause people to realise what a fantasy the whole banking system is.........
25 June 2012 5:23PM
Just a minor correction: the RBS/NatWest main retail system is called CAUSTIC (not CUSTIC). It's a bit geeky, but it's named for Customer Account Updates Statements Transactions Interest & Charges, although it no longer does Interest or Charges (the Offset program does the interest mainly now, and Service Charge Platform does the charges).
It's written in Mainframe Assembler with a bit of COBOL (and a few sceptre 4GL programs).
Pretty reliable code, but when you trash the whole CA-7 instance running it, all bets are off!
All I can say is I hope someone holds Mr Hester (and my previous management team) to account. If we still had the team in Goodman's Fields (and weren't offshored) then it would have been a whole lot better. I hope the "cost saving" was worth it!
25 June 2012 5:29PM
Maybe they really did run CUSTIC instead and omitted a big chunk of Account processing.
25 June 2012 5:33PM
EXCELLENT ARTICLE, YES I SHOUT IT LOUD, YOU DESERVE IT!
Well done a*
25 June 2012 5:34PM
Like most banks, they slap on a £30 (or more) surcharge milliseconds after you go over your overdraft limit.
This is the opposite. Effectively the bank had every customer's money for near on a week, and the customers could not access it. A sort of reverse overdraft.
Will they be reciprocating? You can be damned sure that, even though customers like me couldn't access their money, NatWest was still earning interest on it ... a tidy sum of interest given the amounts involved.
So will NatWest be paying its overdraft dues? I don't just mean reimbursing any losses customers may have incurred. I mean paying for having money that was inaccessible to the owners.
I'm not holding my breath.
25 June 2012 5:40PM
Lloyds have just done the same. It's all in the public domain if you do a search. Thousands of jobs have gone very recently.
25 June 2012 5:41PM
You see - this is what journalism is - getting to the truth! Shame we don't have more of this instead of cobbled up bits of PR puff.
25 June 2012 5:42PM
I've had it with Natwest and RBS. Which bank to move to, Co-op, First Direct? Any thoughts please.
25 June 2012 5:45PM
At last an excellent article on this fiasco.
25 June 2012 5:45PM
Really good article, so refreshing compared to the usual cut & paste of Press Releases.
I have been a customer of CA and it gives some pleasure to see that a bank is in its clutches!
25 June 2012 5:50PM
I'm trying to understand why this would have surfaced on Tuesday night rather than Monday night. (Was last Monday a banking holiday in the UK?) It's possible, of course, but if a change went in over the weekend, you'd think issues would surface after the first end-of-day batch run, which would be Monday night, rather than on Tuesday night. If they put in the change on Monday night, then they bought themselves the trouble they're in, as Monday's batch processing is HUGE, since it's collecting all the bank's business since Friday and processing it. Just some thoughts.
25 June 2012 5:53PM
1200 words of detailed guff about IT and Ca-7 and whatnot, and yet there's a note about what Jenga is. Brilliant.
25 June 2012 5:53PM
Perhaps the writer of this article should work for the bank since - NatWest confirmed on Monday that the problem first surfaced on Tuesday,
25 June 2012 5:54PM
This really is incompetence on a grand scale.
No doubt the poor fool who actually screwed up the update will get hammered.
Meanwhile, the idiot managers, who no doubt got huge bonuses as a reward for the savings they made when they got rid of the competent IT staff in the first place, will now spend even greater amounts on PR consultants to explain why this was just an unfortunate accident which could have happened to anyone, and why it does not reflect at all badly on their decision-making, or the bank's ability to look after their customers' money.
Hopefully, everyone who was affected will move their accounts to another bank immediately, and all banks will get a wake up call that the banks are not just there to skim off as much money as they can, but are there to provide a RELIABLE and STABLE service.
25 June 2012 5:58PM
Thanks for doing a story on what went wrong with the technology, and why.
25 June 2012 6:02PM
@toffer9
They're all as bad as each other. Best bet has to be to have several accounts from different and unrelated banks - Nationwide may be a good bet as they are unlikely to be sold to another bank and maybe Co-op. Just don't expect to be free from poor customer service and endless cock-ups.
25 June 2012 6:02PM
On a purely technical point, the term "glitch" is of course a misnomer in this case, since the problem is not of a transitory nature.
25 June 2012 6:07PM
Look, most of us couldn't care less how it works (or not). We have all been forced to get bank accounts, allowing them to take varying amounts of our wages for delivering nothing in return. This breakdown only adds insult to injury.
25 June 2012 6:08PM
Which bank to move to, Co-op, First Direct? Any thoughts please.

I've never had any problems with First Direct, but you miss out on the extreme feelings of self-righteousness so many Co-op customers enjoy. The Co-op are also planning a massive expansion, which comes with risks.
On the other hand, we're all relying on the successful return of RBS/NatWest to the private sector to get our £40bn back. So do you mind staying put for a bit?
25 June 2012 6:12PM
Excellent article, but probably only half the story as the CA upgrade would definitely have been over the weekend. So, it probably went like this:
1. Test environment upgrade failed on Saturday
2. Test environment fixed on Sunday
3. Big meeting Monday to discuss test environment failure and Sunday's fix. Confidence high so green light for upgrade late on Monday for implementation on Tuesday.
4. Upgrade failed on Tuesday
5. Sunday's fix on test system ineffective on live system due to differences in the test and live environments.
Please feel free to correct me if I got it wrong.
25 June 2012 6:13PM
Much of this comes from http://www.theregister.co.uk/2012/06/25/rbs_natwest_what_went_wrong/
25 June 2012 6:17PM
I have worked with a lot of transactional databases and batch scheduling in my time and this is pretty much what I expected had happened. Yes, as everyone has said, this is a great informative article. People are sick of hearing 'glitch' as some form of explanation. People aren't stupid and deserve full and proper explanations especially when it comes to the security of hard-earned cash and savings.
25 June 2012 6:22PM
My tea caddy doesn't have this kind of problem.
25 June 2012 6:28PM
'ElizabethBathory' - That's spot on. Lloyds have done the same and it will only be a matter of time before they get bitten by taking the cheap labour option. It's very short sighted and when things like this happen it shows you that you have to test properly and thoroughly.
25 June 2012 6:30PM
This is the best description of what has happened on the web.
My experience in IT goes back to before CA-7 e.g. JCL and what has been described is most likely correct.
I just hope some good comes out of this fiasco by RBS/NWB and they re-employ those sys progs let go
25 June 2012 6:30PM
It's depressing to see time and time again companies attempting to reduce overheads by offshoring their operations to poorly skilled and poorly managed companies that don't understand the businesses they are supporting.
25 June 2012 6:35PM
Why do these mainframes require a third party high level scheduler bolted on? ICL's mainframe OS George 3 had one built in back in 1975.
That was when we had a British IT industry, of course.
25 June 2012 6:35PM
Having used COBOL 35 years ago - eek - and batch processing, but having moved into a different working environment soon after, I'd assumed that with Internet banking etc all systems had gone real time. So can some kind person explain how they combine virtual real time data on Internet banking with overnight batch processing, please! Ta.
25 June 2012 6:40PM
You see - this is what journalism is - getting to the truth! Shame we don't have more of this instead of cobbled up bits of PR puff.
Well said Elizabeth, and what about the BBC 6 o'clock news this evening, where they are supposed to be journalists too? It led on the story but chose to have it fronted by the millionaire Stephen Hester - naturally he flannelled and his only line of any note was that no one would be out of pocket. As chief executive of RBS he should do the decent thing and resign immediately or, if not, he should be sacked, but of course there's no chance of either of those things happening.
25 June 2012 6:41PM
Wipro, cognizant, TCS, Accenture ( employing mostly Indians ), it doesn't matter.
Pay peanuts, you get monkeys ( I've been in IT since CA-7 was first released as UCC7 ).
Having come to the UK 6 years back from Canada, I was quite shocked how UK banks played along like they were quite serious about offshore development / support, etc. being some kind of solution - it's only a major problem (like this) waiting to happen.
TBH, very surprised Lloyds wasn't the first to have this happen. My last stint there involved working 18 hours per day over a weekend to install and fix offshore rubbish while the Indian onshore staff stood out front smoking ( as usual ) or checking the pizza website for what freebies they were going to order next.
Of course, after returning home from 4 days of hard labour, I received a call from the Indian resource manager telling me my contract was now over. The Indians stayed on board.
25 June 2012 6:42PM
Over on the CA UK website, they're still promoting something called May Mainframe Madness. You really couldn't make it up.
'The madness continues!' they assure us.
http://www.ca.com/us/lpg/May-Mainframe-Madness/May-Mainframe-Madness-2012.aspx
25 June 2012 6:45PM
This sort of thing happens all the time at banks ... and is fixed in the night shift by experienced IT people. Many banks are now shedding these people to increase bonuses completely oblivious of what is in store for them. It is not just that these people are made redundant, it is that they are unappreciated and leave voluntarily.
25 June 2012 6:47PM
If you can, think about a Credit Union. At least for some of your money.
I know they're still small in the UK, but they're growing. And the more people who join them, the bigger they'll get. (Duh ... obvious I know.)
In the USA, Canada, Australia and Ireland, Credit Unions are very significant players in the financial field. They give the big banks a good run for their money. And they score way, way higher in terms of customer satisfaction and service. And they put money back into the community, rather than lining the pockets of fat cat tax avoiders far away in the City.
25 June 2012 6:58PM
It could be a case of too many chiefs and not enough Indians? Outsourcing strategy almost definitely at fault - the loss of knowledge, banking systems are complex enough without spreading teams across continents, time zones, introducing cultural and language barriers.
25 June 2012 7:01PM
I wonder if NatWest Directors have cancelled their home insurance?
This move of offshoring their IT is just as negligent as cancelling home insurance - but yes it would improve your short term finances.
I know most managers in the IT industry dismiss local knowledge but any staff on the coal face know how crucial it is.
Also has it occurred to the NatWest/RBS (and others engaged in IT offshoring) that India has endemic corruption? What's to stop a repeat of this fiasco - a bung of a few rupees to the right people would ensure a repeat!
25 June 2012 7:05PM
My local building society had a minor computer problem a few months ago, but they were able to go back to using pen and paper until the problem was sorted out. I don't think many big banks could cope with that.
25 June 2012 7:09PM
I would have expected the FSA to demand these large banks have business continuity plans in place to allow them to continue to operate in the event of IT system failure.
25 June 2012 7:13PM
I see the photo at the top of the article is advertising a one year fixed rate bond at 6.5%. This sounds like the best deal since Icesave was offering 10%.
25 June 2012 7:16PM
It always amuses me to hear the current media buzzword 'glitch' in contexts like these. In my early days as a design engineer, a 'glitch' was a bit of electrical noise that could cause a logic gate to switch state - one of the pranks of nature that needed to be controlled.
These days it's used to cover blame for crap design, incompetent installation and inadequate testing.
25 June 2012 7:17PM
Haven't worked on banking systems (aluminium smelters and artificial blood products more my line) but this sounds depressingly familiar!
Recently worked on a project where a validation phase was supposed to prevent such issues. However, since this was outsourced to highly expensive 'validation specialists', they didn't actually know enough to find any problems that the developers didn't catch. Added hugely to the cost and nothing to the quality of the deliverables. Outsourcing is vastly appealing to the bean-counters, but when things go wrong they can go wrong on a greater scale.
25 June 2012 7:19PM
As someone working in a similar area for another bank (sorry, no names, I still value my payslip!), I have the utmost sympathy for the workers on the ground here. Management too often plough ahead with changes despite staff concerns, and take a "fix-on-fail" approach. I would like to believe ours will take heed from NatWest's catastrophe, but I'm not holding my breath...
25 June 2012 7:19PM
It is a shame for the customers, and the branch staff. I feel for them having to deal with some customers who take the opportunity to be abusive. I think the staff should get a share of the directors' bonus for having to listen to customer after customer complaining about it. The mainframe system used by NatWest isn't as good as that of other banks. It is very old, but would cost too much to change. Other banks have better systems, but of course it only takes a simple error to cause a whole load of hell!
Guardian's investigations suggest bank's problems began on Tuesday night when it updated key piece of software called CA-7
NatWest has admitted that it could not say exactly how much money should be in individual accounts as the crisis caused by a failed software update last week spiralled out of control for days.
The bank was quick to deny claims by the Unite union that the "offshoring" of IT jobs to locations in India had led to the problems which appeared on Tuesday night and which paralysed its systems through to Friday, and which have not yet been fixed.
However a number of programmers and experts who have worked on or with NatWest systems told the Guardian that they could not imagine the problem happening in the period before the redundancies of experienced staff since 2010.
"[NatWest owner] Royal Bank of Scotland has 40 years' experience running these systems and banks as a rule don't drop the ball like this," one said. "Somebody somewhere made a decision that has led to this."
The Guardian's investigations suggest that NatWest's problems began on Tuesday night when it updated a key piece of software – CA-7, which controls the batch processing systems that deal with retail banking transactions – ahead of the regular nightly run.
RBS/NatWest has not said what went wrong, though one programmer who has worked on RBS/NatWest's systems told the Guardian: "CA-7 is a very common and reliable product used to automate large sequences of batch mainframe work [which are usually referred to as 'jobs']. It will start jobs, wait for them to run, then start other jobs dependent on the first ones completing, and so on. RBS processes accounts overnight via thousands of jobs."
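The programmer's description of CA-7 (start jobs, wait for them, then start the jobs that depend on them) is a dependency-ordering problem. The sketch below is a minimal illustration of that idea using Kahn's topological sort. It is not CA-7's actual algorithm or schedule format. The job names and the `run_order` function are hypothetical, and real schedulers also run independent jobs in parallel rather than one at a time.

```python
from collections import deque


def run_order(jobs: dict[str, list[str]]) -> list[str]:
    """Return an order in which each job starts only after all the jobs it depends on.

    `jobs` maps a job name to the names of the jobs it must wait for
    (a hypothetical format, not CA-7's). Raises ValueError on an unknown
    dependency or a dependency cycle.
    """
    # Count unfinished prerequisites per job and record who waits on whom.
    waiting_on = {name: len(deps) for name, deps in jobs.items()}
    dependants: dict[str, list[str]] = {name: [] for name in jobs}
    for name, deps in jobs.items():
        for dep in deps:
            if dep not in jobs:
                raise ValueError(f"{name} depends on unknown job {dep}")
            dependants[dep].append(name)

    # Kahn's algorithm: begin with jobs that wait on nothing.
    ready = deque(sorted(n for n, count in waiting_on.items() if count == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)  # "run" the job and treat it as completed
        for nxt in dependants[job]:
            waiting_on[nxt] -= 1
            if waiting_on[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(jobs):
        raise ValueError("schedule contains a dependency cycle")
    return order


# Hypothetical nightly schedule; names are illustrative only.
schedule = {
    "LOAD_ATM": [],
    "LOAD_BACS": [],
    "SORT_TXNS": ["LOAD_ATM", "LOAD_BACS"],
    "POST_BALANCES": ["SORT_TXNS"],
    "PRINT_STATEMENTS": ["POST_BALANCES"],
}
print(run_order(schedule))
# ['LOAD_ATM', 'LOAD_BACS', 'SORT_TXNS', 'POST_BALANCES', 'PRINT_STATEMENTS']
```

Python 3.9+ also ships `graphlib.TopologicalSorter`, which does the same ordering. The hand-written version is there to make each step visible.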
The jobs take transactions from various places, such as ATM withdrawals, bank-to-bank salary payments, and so on, and finish by updating the master copy of the account – in a system known as Caustic – with the definitive balance.
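The final step described here gathers transactions from several feeds and writes a definitive balance to the master record. The sketch below is a deliberately simplified model of that step. The `Txn` class, the feed names and `post_to_master` are illustrative assumptions, and a plain dictionary stands in for Caustic, whose real structure the article does not describe.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Txn:
    account: str
    amount: Decimal  # positive = money in, negative = money out
    source: str      # illustrative feed label, e.g. "ATM" or "BACS"


def post_to_master(master: dict[str, Decimal], *feeds: list[Txn]) -> dict[str, Decimal]:
    """Apply every transaction from every feed and return the new definitive balances.

    The input `master` is left untouched; an updated copy is returned.
    """
    updated = dict(master)
    for feed in feeds:
        for txn in feed:
            updated[txn.account] = updated.get(txn.account, Decimal("0")) + txn.amount
    return updated


master = {"12345678": Decimal("100.00")}
atm_feed = [Txn("12345678", Decimal("-40.00"), "ATM")]
salary_feed = [Txn("12345678", Decimal("1500.00"), "BACS")]
print(post_to_master(master, atm_feed, salary_feed))
# {'12345678': Decimal('1560.00')}
```

`Decimal` is used instead of `float` so that amounts in pounds and pence add up exactly.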
"It seems whoever made the update to CA-7 managed to delete or corrupt the files which hold the schedule for the overnight jobs, so they did not run, or ran incorrectly," the programmer told the Guardian. "They have backed out from this change, but now are trying to play catch-up, and have been doing so for a few days."
The batch processing system, which reconciles the movement of money in and out of more than 10m NatWest and Ulster Bank accounts, did not run correctly for three nights. This meant that millions of transactions were not processed until it began running correctly on Friday. Even now that it has been fixed, the batches of transactions have had to be re-run in order, beginning with Tuesday's, so that nobody's account goes wrongly into overdraft.
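The programmer suggested that the update deleted or corrupted the files holding the overnight schedule. One safeguard against that kind of fault is to compare the schedule after an update with the one before it, ahead of the nightly run. The sketch below is hypothetical: the article does not say what checks RBS ran, and both the dictionary format and `check_schedule` are assumptions made for illustration.

```python
def check_schedule(old: dict[str, list[str]], new: dict[str, list[str]]) -> list[str]:
    """List problems in an updated schedule compared with the previous one.

    Schedules map job names to the jobs they depend on (hypothetical format).
    """
    problems = []
    # Jobs that existed before the update but have vanished.
    for job in sorted(old.keys() - new.keys()):
        problems.append(f"job {job} missing after update")
    # Jobs that now wait on something that no longer exists.
    for job, deps in sorted(new.items()):
        for dep in deps:
            if dep not in new:
                problems.append(f"job {job} depends on undefined job {dep}")
    # Jobs whose dependencies changed, which may or may not be intended.
    for job in sorted(old.keys() & new.keys()):
        if sorted(old[job]) != sorted(new[job]):
            problems.append(f"dependencies of {job} changed: {old[job]} -> {new[job]}")
    return problems


before = {"LOAD": [], "POST": ["LOAD"], "STATEMENTS": ["POST"]}
after = {"LOAD": [], "STATEMENTS": ["POST"]}  # the "POST" entry was lost in the update
for problem in check_schedule(before, after):
    print(problem)
# job POST missing after update
# job STATEMENTS depends on undefined job POST
```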
A NatWest spokesperson, asked whether it knew how much money people had at any time, said: "All the money is safe in the bank. It's being applied to people's accounts. We can show people statements on screens if they come into branches." The bank is offering special opening hours, extending to 6pm, this week. NatWest also ran extra batches to catch up with the transaction backlog over the weekend.
NatWest, like many other banks, uses the CA-7 software, configuring it and its attendant files to fit its own needs. The problem only surfaced once the batch run was under way in the early hours of Wednesday. In February, RBS appears to have advertised in India, to which a number of its IT jobs were moved after 2010, for specialists in CA-7. "Looking for candidates having 4-7 years of experience in Batch Administration using CA7 tool," the advert read. "Urgent Requirement by RBS."
The Unite union has criticised RBS management for cutting jobs in the UK and shifting a number of them offshore. Since 2010, hundreds of IT jobs have been cut from RBS's Edinburgh headquarters and shifted abroad. RBS/NatWest has denied that this made any difference.
But some observers strongly disagree. "This was not inevitable – you can always avoid problems like this if you test sufficiently," said David Silverstone, delivery and solutions manager for NMQA, which provides automated testing software to a number of banks, though not RBS/NatWest. "But unless you keep an army of people who know exactly how the system works, there may be problems maintaining it."
One programmer who worked on the RBS/NatWest systems during the takeover in 2001-02 said that the latest problems suggested a paucity of staff on the spot with experience of what to do. "The people in India will have done their darndest to do a good job, but without the knowledge of the overall system that you get from years of experience on the ground, it's easier to see how you get a big operational failure."
Banks have for decades used huge mainframe systems to process payments such as cheques and to update customers' accounts; the transactions for each day are collected and are then run in a single gigantic batch overnight, so that accounts have been credited and debited with the correct amounts by the morning. That is why internet banking transactions are not processed if you carry them out after certain times: the banks' systems simply don't add them into the queue for that night's batch.
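The cut-off rule described here can be sketched as a function that picks the overnight batch for a transaction. The 6pm cut-off is an assumed value, since each bank sets its own, and the sketch ignores weekends and bank holidays, which real systems also handle.

```python
from datetime import date, datetime, time, timedelta

CUTOFF = time(18, 0)  # assumed cut-off time; not a figure from the article


def batch_date(submitted: datetime, cutoff: time = CUTOFF) -> date:
    """Return the date of the overnight batch that will pick up a transaction.

    Transactions submitted before the cut-off join that evening's batch;
    anything at or after the cut-off waits for the following night's run.
    """
    if submitted.time() < cutoff:
        return submitted.date()
    return submitted.date() + timedelta(days=1)


print(batch_date(datetime(2012, 6, 19, 17, 30)))  # 2012-06-19 (before cut-off)
print(batch_date(datetime(2012, 6, 19, 21, 15)))  # 2012-06-20 (after cut-off)
```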
Sources familiar with NatWest's systems, and who have also spoken to staff there, explained that the problems with the update surfaced during the batch run. NatWest confirmed on Monday that the problem first surfaced on Tuesday, and that "we confirmed the fix on Friday".
The problems with the upgrade were spotted during the overnight run ahead of Wednesday morning. "We have guardian systems which spot when things go wrong," a NatWest spokesperson said.
But by Friday, when the fix was implemented, three sets of batch runs had failed. If a batch fails badly – as here – then all of the transactions, including the payments in and out of accounts, are "rolled back" to the starting point, as if it had never run. The set of transactions from Wednesday was then added to the pending list on Wednesday, and attempted to run in the early hours of Thursday; that too failed. By the time the fix had been done, there were three days' worth of unimplemented transactions queued up.
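The rollback-and-backlog behaviour described here can be modelled in a few lines. In the sketch below, each day's batch is applied to a copy of the balances, so a failure leaves the originals untouched. A failed day stays at the head of the queue, and each night's new transactions join the back. The amounts, the `fail_after` fault switch and the function names are all hypothetical, and a real bank's commit and rollback mechanics are far more involved.

```python
from collections import deque


class BatchFailed(Exception):
    """Raised when a batch run hits a fault partway through."""


def run_batch(balances, txns, fail_after=None):
    """Apply one day's transactions all-or-nothing and return the new balances.

    Changes are made to a copy, so if the run fails the caller's balances are
    untouched: the whole day is effectively rolled back. `fail_after` is a
    hypothetical switch that simulates a fault at that transaction index.
    """
    working = dict(balances)
    for i, (account, amount) in enumerate(txns):
        if fail_after is not None and i == fail_after:
            raise BatchFailed(f"fault at transaction {i}")
        working[account] = working.get(account, 0) + amount
    return working


def nightly_run(balances, pending, broken):
    """Run every queued day oldest-first; on a failure, stop and leave it queued."""
    while pending:
        day, txns = pending[0]
        try:
            balances = run_batch(balances, txns, fail_after=1 if broken else None)
        except BatchFailed:
            return balances  # rolled back: this day and later ones wait for the next run
        pending.popleft()
    return balances


# Illustrative amounts in pence for a single account "A".
days = [
    ("Tue", [("A", -2000), ("A", 5000)]),
    ("Wed", [("A", -1000), ("A", -500)]),
    ("Thu", [("A", 3000), ("A", -4000)]),
]
balances = {"A": 10000}
pending = deque()
for day, txns in days:  # three nights with the faulty update in place
    pending.append((day, txns))
    balances = nightly_run(balances, pending, broken=True)
print(balances, [d for d, _ in pending])  # {'A': 10000} ['Tue', 'Wed', 'Thu']

balances = nightly_run(balances, pending, broken=False)  # the fix is in
print(balances, [d for d, _ in pending])  # {'A': 10500} []
```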
Richard Price, a Norwich-based systems developer who has worked on banking systems that linked into NatWest's, explains: "Banking systems are like a huge game of Jenga [the tower game played with interlaced blocks of wood]. Two unrelated transactions might not look related now, but 500,000 transactions from now they might have a huge relation. So everything needs to be processed in order." Thus Tuesday's batch must run before Wednesday's or Thursday's. Otherwise, for example, a large payment leaving someone's account on Thursday might push them into debt and trigger a penalty, even though money arriving on Wednesday would have covered it.
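Price's point about ordering can be shown with a toy example. In the sketch below, the same two transactions produce an overdraft fee when Thursday's payment is applied before Wednesday's credit, and no fee when they run in date order. The £25 fee, the amounts and the `apply` function are hypothetical; real overdraft charging rules vary by bank and account.

```python
from datetime import date

FEE = 2500  # hypothetical overdraft fee, in pence


def apply(opening: int, txns: list[tuple[date, int]]) -> tuple[int, int]:
    """Apply transactions in the order given, charging FEE whenever the balance dips below zero.

    Returns (closing balance, total fees charged). Amounts are in pence.
    """
    balance, fees = opening, 0
    for _, amount in txns:
        balance += amount
        if balance < 0:
            fees += FEE
            balance -= FEE
    return balance, fees


txns = [
    (date(2012, 6, 21), -90000),   # Thursday: large payment out
    (date(2012, 6, 20), 100000),   # Wednesday: salary in
]
opening = 10000
print(apply(opening, txns))          # (17500, 2500): wrong order, fee charged
print(apply(opening, sorted(txns)))  # (20000, 0): date order, no fee
```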
Price said that any software update would first have been subjected to quality assurance and user acceptance testing before being implemented.
CA-7 is familiar to many in the banking industry: it was originally released in 1980 by Uccel – which was then taken over by Computer Associates, which provides key software for scores of banks. Computer Associates told the Guardian: "RBS is a valued CA Technologies customer, we are offering all assistance possible to help them resolve their technical issues which are highly unique to their environment. We do not comment on customer confidential issues." However it declined to say whether CA-7 lay at the heart of the problems.