[Info-vax] EMC to Hitachi Array Migration

Kerry Main kerry.main at backtothefutureit.com
Sat Mar 14 21:02:21 EDT 2015


> -----Original Message-----
> From: Info-vax [mailto:info-vax-bounces at info-vax.com] On Behalf Of
> dodecahedron99 at gmail.com
> Sent: 13-Mar-15 6:16 PM
> To: info-vax at info-vax.com
> Subject: Re: [New Info-vax] EMC to Hitachi Array Migration
> 
> On Thursday, February 26, 2015 at 2:10:36 AM UTC+11, Stephen Hoffman
> wrote:
> > On 2015-02-25 12:45:31 +0000, RGB said:
> >
> 
> < snip >
> 
> >
> > That ignorance isn't limited to OpenVMS, unfortunately.  Each new
> > generation of storage engineering and storage management eventually
> > seems to learn that disks and disk controllers don't know enough to
> > perform a reliable recovery or a reliable backup at the level of the
> > storage devices -- disks and disk controllers have no concept of
> > quiescence; of the quiet points and transaction boundaries.  Customers
> > unfortunately get to learn about this when the recovery fails.
> > Because there's typically the I/O of multiple threads and multiple
> > operations in parallel, it's not trivial for the host to tell the
> > controller "consistent and quiesced; get your snapshot", either -- and
> > that sort of host-to-controller communication is not typical.
> >
> > TL;DR: Folks that try block-level data migrations on the fly can end
> > up unhappy, depending on how the storage cut-over point matches the
> > last complete wad of I/Os from the active applications.
> >
> >
> > --
> > Pure Personal Opinion | HoffmanLabs LLC
> 
> It was many moons ago, at another place and in another time, that we
> moved from EMC to Hitachi as a test.
> 
> I'm not sure of the logistics, but I believe some type of live migration
> was tried; we then switched back to the backup/restore method to get a
> clean data snapshot, as issues appeared around Rdb. Perhaps migration
> tools have improved since then?
> 
> The Hitachis outperformed the EMC Symmetrix frame by around 40%,
> but this was a number of years ago. The Hitachi system was then
> decommissioned about 3 months later! It was nothing but politics being
> played to force EMC to be more reasonable on their pricing (so I was
> told). All pretty nasty business practices back then...
> 
> Wind the clock forward somewhat, but still a few years back: we tried
> EMC to EMC using their so-called SRDF live migration, which supposedly
> would guarantee DR-type redundancy. But as Hoff expertly states, it's
> the cut-over point, and what was in progress when that cut-over
> happened, that makes all the difference.
> SRDF has hooks for Oracle to ensure the database can help EMC switch at
> a known point. Rdb, we found, has no such hooks, and upon further
> testing, 1 in 4 cut-overs failed to create a copy on another EMC frame
> that we could reliably use to bring Rdb back up again, making it almost
> useless for a DR solution.
> 
> We ended up going for a VMS backup solution and shipping the data
> over to the DR site for trickle updating at set intervals. The DR site
> was a warm standby system; someone didn't want to pay for a full hot
> standby, so we couldn't use things like Rdb replication etc.
> 

Typically, SAN-based replication licenses like EMC's SRDF are much
more expensive than OpenVMS host-based shadowing.

> Maybe things have improved with EMC, but I doubt it; VMS is a
> diminishing fish in a large pond, and the people out there who know
> storage and who know VMS are few and far between. Even people on
> XP arrays seem to know less and less about VMS going forward too.
>
> The issue with Rdb was having a transaction cut mid-stream: Rdb, upon
> opening the database on the DR system, wanted to find the RUJ to roll
> back the failed transaction, but the RUJ was incomplete, having been
> axed mid-stream, so the rollback failed and the database then refused
> to open. I am scratching my head as to further details; I was a side
> party to the actual events.
> 

Regardless of the technology (OS-, SAN-, or appliance-based), there are
two basic types of inter-site data replication (a rough sketch of both
follows below):

1. Asynchronous - you must assume that there is a very high probability
that *some* data will be lost during a significant event. It performs
better at the local site because updates to the remote site are queued
up and the local application does not wait for a remote-site ack. The
data that might be lost depends on queuing sizes and replication
intervals. Sites can be very far apart (hundreds or thousands of km).
Also called active-passive.

2. Synchronous - writes are applied to both sites and are only
considered complete when the writes succeed at both sites. For those
with gray hair, this might also be considered in the context of
two-phase commit. Site distances depend on the read-write ratio, but
the best practice for most applications (financial shops, for example)
is that the sites should be within 100 km. Also called active-active.

OpenVMS host-based shadowing is synchronous.
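
Very roughly, the difference looks like this. This is a minimal Python
sketch only, using in-memory dicts to stand in for the two storage
arrays; the names are illustrative and no vendor or OpenVMS API is
implied:

import queue
import threading
import time

local_site = {}
remote_site = {}
pending = queue.Queue()    # async mode: remote updates waiting to be shipped

def synchronous_write(key, value):
    # The write is only complete once BOTH sites have applied it; the
    # caller pays the remote round trip on every write.
    local_site[key] = value
    remote_site[key] = value
    return "complete at both sites"

def asynchronous_write(key, value):
    # The write completes after the local update; the remote copy trails.
    local_site[key] = value
    pending.put((key, value))
    return "complete locally, remote copy pending"

def replication_worker(interval=0.5):
    # Drains the queue at set intervals.  Anything still queued when the
    # local site is lost is the data you must assume is gone.
    while True:
        time.sleep(interval)
        while not pending.empty():
            key, value = pending.get()
            remote_site[key] = value

threading.Thread(target=replication_worker, daemon=True).start()

asynchronous_write("txn-1001", "debit 50")
print(remote_site.get("txn-1001"))    # likely None until the next drain

The point of the sketch: in asynchronous mode the exposure is whatever is
sitting in the queue (queuing sizes and replication intervals); in
synchronous mode every write pays the inter-site latency, which is why
distance matters.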

> Perhaps the 'new' file system might have some hooks put into it that
> would allow third party access to trigger quiescent points on the file
> system?
> 

Imho, this feature would be more applicable to improving capabilities at
the local site, e.g. being able to fail processes over to different OS's.

Inter-site data challenges will still depend on whether an async or sync
strategy is in place.
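
The quiet-point idea itself is roughly: stop admitting new transactions,
wait for in-flight work to drain, flush the journals, and only then let
the storage layer take its copy. A minimal Python sketch, with
hypothetical hook names; no actual file system, Rdb, or array API is
implied:

import threading

_cond = threading.Condition()
_in_flight = 0
_quiescing = False

def begin_transaction():
    global _in_flight
    with _cond:
        while _quiescing:           # hold new work while a snapshot is pending
            _cond.wait()
        _in_flight += 1

def end_transaction():
    global _in_flight
    with _cond:
        _in_flight -= 1
        _cond.notify_all()

def take_consistent_snapshot(flush_journals, trigger_array_snapshot):
    # Quiesce, drain, flush, then tell the storage layer it is safe to copy.
    global _quiescing
    with _cond:
        _quiescing = True
        while _in_flight > 0:       # wait for in-flight transactions to finish
            _cond.wait()
        flush_journals()            # e.g. force journal contents to stable storage
        trigger_array_snapshot()    # the array copy now sees a consistent state
        _quiescing = False
        _cond.notify_all()

Without a hook like that, whatever the controller copies mid-transaction
is what the DR side has to recover from, which is exactly the incomplete
RUJ failure described above.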

> RGB, please post the eventual solution of what happened when it's all
> done and dusted so that others researching can be enlightened as to
> what worked and what didn't :-)

Regards,

Kerry Main
Back to the Future IT Inc.
 .. Learning from the past to plan the future

Kerry dot main at backtothefutureit dot com





