[Info-vax] Weird problem with non-booting AlphaServer 800.

Tue Jan 8 18:36:47 EST 2013

David Froble wrote 2013-01-09 00:16:
> Jan-Erik Soderholm wrote:
>> Stephen Hoffman wrote 2013-01-07 18:13:
>>> On 2013-01-07 16:49:14 +0000, Jan-Erik Soderholm said:
>>>>
>>>>  >>>boot dka100 -flags 0,1
>>>> (boot dka100.1.0.5.0 -flags 0,1)
>>>> block 0 of dka100.1.0.5.0 is a valid boot block
>>>> reading 1168 blocks from dka100.1.0.5.0
>>>> bootstrap code read in
>>>> base = 1d6000, image_start = 0, image_bytes = 92000
>>>> initializing HWRPB at 2000
>>>> initializing page table at 7ffce000
>>>> initializing machine state
>>>> setting affinity to the primary CPU
>>>> jumping to bootstrap code
>>>>
>>>> [hanging indefinitely here...]
>>>
>>> That's gotten through the primitive boot stuff and has tried to transition
>>> to what looks to be the OpenVMS Alpha V7.3-2 APB primary bootstrap.
>>>
>>> Enable the verbose boot-time diagnostics (add 30000 to the boot flags), and
>>> see what gets posted from APB, if anything.
>>>
>>
>> Result with the 30000 flags value :
>>
>>  >>>boot -flags 0,30000
>> (boot dka0.0.0.5.0 -flags 0,30000)
>> block 0 of dka0.0.0.5.0 is a valid boot block
>> reading 1168 blocks from dka0.0.0.5.0
>> bootstrap code read in
>> base = 1d6000, image_start = 0, image_bytes = 92000
>> initializing HWRPB at 2000
>> initializing page table at 7ffce000
>> initializing machine state
>> setting affinity to the primary CPU
>> jumping to bootstrap code
>>  >>>
>>
>> The ">>>" is redisplayed after a 1-2 min delay with no activity
>> at all on the console.
>>
>> I've now asked for a CD distro to be mounted in the CD to try
>> to boot from that device.
>>
>> A new (well, used, but a different) AS800 is on the way from our
>> distributor, together with a techie...
>>
>> Jan-Erik.
>>
>>
>>
>>
>>
>>> Try booting from a different disk, a distro, or booting as a diskless
>>> satellite.
>>>
>>> Guesses...
>>>
>>> Might be a device name problem (particularly if allocation classes are in
>>> play), or it could be a failing disk, or failing disk controller, or
>>> incorrect termination, etc.
>>>
>>> I'd probably replace the battery, too, as that's cheap (if that box uses a
>>> coin cell), and a failing battery can sometimes cause weird errors.
>>>
>>> Some troubleshooting <http://labs.hoffmanlabs.com/node/192> details.
>>>
>>> Or alternatively, retire this box and roll in a less-fossil Alpha; an EV6
>>> or later.
>>>
>>>
>>
>
> Why don't you just jump into the company jet and fly over there, so you can
> get your hands on the system ??

The system is back up.
It was the missing ALLOCLASS = "1" that seems to have
stopped the whole boot process.
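
For anyone not used to allocation classes, here is a rough sketch (Python,
purely illustrative) of how the node allocation class feeds into the device
name VMS uses. The function name device_name and the node name NODEA are
made up for the example; the only assumptions are the usual naming rule
(ALLOCLASS = 0 gives node$ddcu, a nonzero value gives $n$ddcu) and the
0-255 range of the parameter. It does not model port allocation classes.

```python
def device_name(unit: str, alloclass: int, node: str = "NODEA") -> str:
    """Return the cluster-visible name for a disk such as DKA100.

    unit      -- controller/unit part of the name, e.g. "DKA100"
    alloclass -- value of the ALLOCLASS system parameter (0-255)
    node      -- hypothetical node name, only used when alloclass is 0
    """
    if not 0 <= alloclass <= 255:
        raise ValueError("ALLOCLASS must be in the range 0-255")
    if alloclass == 0:
        # No allocation class: the name is prefixed with the node name.
        return f"{node}${unit}"
    # Nonzero allocation class: the name is prefixed with $n$.
    return f"${alloclass}${unit}"


if __name__ == "__main__":
    print(device_name("DKA100", 0))   # name without an allocation class
    print(device_name("DKA100", 1))   # name with ALLOCLASS = 1
```

So the same physical disk gets a different name depending on ALLOCLASS,
which is why anything that refers to the disk by name can break when the
parameter goes missing.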

The system and data disks are now shadowed.

The problem right *now* is that CDD still points to the
DKAnnn device for the integrated database, which is now
located on a DSAnnn device. I have an Oracle Support
knowledge base article with some CDO commands that I'll try
tomorrow. It also seems that one could do a new "integrate"
to sort that out, but I have never used CDD or CDO before.

The commands to do the reintegration seem easy enough.
Example from the Oracle document:

$ mcr cdo
CDO> delete generic cdd$database mydb.
$ mcr sql$
SQL> integrate database filename disk$user2:[mine]mydb.rdb
cont> create pathname cdd$default.mydb;

There is also a description of how to patch CDD directly if
"time is not available to do the integrate", as the paper
says. I have the time, so the first procedure is probably
the easier and safer one.

Continuing tomorrow...

Jan-Erik.