Chapter 7: Memory and serial ROM

18-MAR-2024

FD indication on the front panel

After that blown cap, there have been no further incidents. I put the memory back in and switch the system on. The diagnostic display on the front panel goes from FF to FD and stops.

DEC have traditionally used 8-bit codes to indicate the power-on sequence progress. On most systems they were displayed on eight red or amber LEDs at the back. In the DEC 3000 AXP family, however, the higher end models 500/800/900 are equipped with a rather fancy Lights and Switch Module (LSM), part no. 54–21145–02 (or –01 in rack mount variants), with a two digit hexadecimal LED display. The hex digit indicators look like TIL311 or DIS1417.

Quote: FD – Memory sizing completed.
              All MCRs mapped out (no memory detected - fatal error, branches to SROM
              miniconsole).
DEC 3000 Models 600/600S AXP and 800/800S AXP Service Information

FD means ‘no memory found’. The advice given in the manual is to re-seat the SIMMs. Which is what I’ve already done. As I then read on the internet, mere re-seating is often not enough and the contacts and connectors should be properly cleaned with isopropyl alcohol.

19-MAR-2024

Kilrock Anti-Viral Sanitiser

I popped into my local DIY shop for some IPA (isopropyl alcohol, that is) but found none. Just as I was about to leave, I spotted a bottle of sanitiser on sale for £1 (cash only!). From what I remember, IPA is a common ingredient in sanitisers — worth a shot, for a quid anyway.

I pull out all the memory motherboards and SIMMs and give them a thorough clean with a cotton swab soaked in sanitiser. This doesn’t help much, and the machine is again stuck at FD.

At this stage I need access to the diagnostic serial port on the system board to find out what’s going on.

Serial ROM

The Alpha chip uses just three pins to bootstrap itself. They are connected to a serial ROM (SROM). When the processor goes out of reset, the contents of the SROM are loaded directly into the instruction cache and then executed from there. This code then configures the memory and caches, loads the main System ROM from Flash into memory, and transfers control to it.

Once the contents of the serial ROM have been loaded into the instruction cache, the clock and data signals for the SROM become simple I/O pins. The SROM code uses them to implement a bit-bashed serial interface at 9600 or 19 200 baud, because the normal serial ports are provided by the Serial Communications Controller, which is not yet available at this early stage. The buffered signals are routed to the 2x5-pin connector J11 on the system board, where they can be used with any TTL level (5V-tolerant) USB–UART adapter cable or similar device for early power-on diagnostics.
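To give a feel for what the SROM code has to do when it bit-bashes a serial line, here is a minimal Python sketch of the framing and timing involved. It assumes the common 8N1 format (one start bit, eight data bits sent least significant bit first, one stop bit). The source only states the baud rates, so the frame format and the names frame_8n1 and bit_period_us are my assumptions, not something taken from the SROM itself.

```python
# Sketch of how one byte looks on a bit-banged serial line.
# Assumption: 8N1 framing (the text above only gives the baud rates).

def frame_8n1(byte):
    """Return line levels for one byte: start bit, 8 data bits LSB first, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def bit_period_us(baud):
    """Duration of a single bit in microseconds at the given baud rate."""
    return 1_000_000 / baud


if __name__ == "__main__":
    # 'S' is 0x53 = 0101 0011, sent least significant bit first.
    print(frame_8n1(ord("S")))              # [0, 1, 1, 0, 0, 1, 0, 1, 0, 1]
    print(round(bit_period_us(9600), 1))    # 104.2
    print(round(bit_period_us(19200), 1))   # 52.1
```

At 9600 baud each bit lasts roughly 104 µs, so the transmitting code has to hold every level for that long, presumably with a timed loop, before moving on to the next bit.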

SROM diagnostic port J11      SROM diagnostic port with 3.45V regulator cable plugged in

Two pins at the bottom are connected to the 3.45 V regulator module (probably for voltage monitoring). The pinout isn’t really documented anywhere. Now, a variant of this port used on AlphaStation 200 is somewhat documented, but there is no guarantee it would be the same in the DEC 3000 AXP. Besides, the pin numbering in that document is unusual and I can't guess which pin is where.

After a bit of poking about I discover the pinout.

Signal                     Pin   Pin   Signal
GND                         5    10    SROMCDAT/RX
GND                         4     9    BSROMCLK/TX
+12 V                       3     8    +5 V
shorted across to pin 7     2     7    shorted across to pin 2
+3.45 V                     1     6    GND

Since I don’t want to disturb anything related to the power supply, I leave the existing socket connector of the 3.45 V regulator in place. The remaining eight holes in it are empty, and I put four regular jumper wires through them and onto pins 4, 8, 9, and 10 of the J11 connector.

The machine shouldn’t be powered up for any substantial amount of time with the side panel removed.

CAUTION: System Unit Cover.
              If you have removed the system unit cover for any reason, be
              sure to replace it on the system unit and close it securely before
              turning on the system. It is important that the system unit
              cover and side panels be in place while the system is operating
              to insure proper cooling of the system components and devices.
DEC 3000 Model 800/800S/900 AXP Owner’s Guide

Luckily, my 20 cm jumper leads are just about long enough to make it through one of the holes in the top of the chassis, so I can put the side panel back in place.

My comms cable in SROM port      My comms cable sticking out of the chassis

Time to turn on the power and see if there is anything SROM can tell us.

DEC 3000 - M800 SROM 6.1
Powerup Sequence
ff.fd.
Seq/PC  fd000000.000017a0
        *** No usable memory detected ***
        Default Mem Cfg: Banks 0 and 6 = 8MB, both mapped to addr 0.
MCRstat 11411111.11151145
bnkSize 00000000.00000000
memSize 00000000.00000000


SROM> 

Yay! This is the first real sign of life from this system. At least the processor is alive. The memory is not, though, and the SROM code stops at FD and jumps to its ‘miniconsole’ showing the SROM> prompt.

Like the pinout of its diagnostic connector, the miniconsole isn’t documented either. I found something on the internet which describes a much later SROM for a different system. A few commands work but most don’t, including ‘mt’ for memory test.

20-MAR-2024

Memory test

SROM chip 27C512

The “serial” ROM in this system is actually implemented on an 8-bit parallel UV-erasable programmable ROM chip 27C512. It is organised as 8 jumper-selectable bit streams. Stream 0 contains the normal boot code; the others are ‘for manufacturing use’.

SROM select jumpers J1–J8

I shift the jumper through the other positions to check what else is in this SROM. Unfortunately, these systems do not come with a RESET button; instead, there’s the HALT button on the front panel, but it doesn’t do anything until the console software has been loaded from the System ROM (aka Flash). So it’s a power cycle every time I want to run another SROM image. Here’s what I’ve found:

Image   Jumper   Description
0       J8       Powerup Sequence
1       J7       Mini-Console at 19200 baud
2       J6       Mini-Console at 9600 baud
3       J5       Cache Test (longword)
4       J4       Mfg Test – bctest
5       J3       Empty (no output)
6       J2       LongWord Memory test (no cache)
7       J1       LongWord Memory test (cache on)

The baud rate is 9600 for every image except image 1. ‘bctest’ must refer to the 2 MB write-back backup cache, or Bcache, located on the System Module.

Image 6 looks like what I’m after. I put the jumper on J2 and switch the system on for a closer look. The test displays F0 on the front panel and continuously spews out memory errors:

DEC 3000 - M800 SROM 6.1
Mfg Test
ff.fd.
Seq/PC  fd000000.00001388
        *** No usable memory detected ***
        Default Mem Cfg: Banks 0 and 6 = 8MB, both mapped to addr 0.
MCRstat 11411111.11151145
bnkSize 00000000.00000000
memSize 00000000.00000000

fb.f0.
MCRstat 11411111.11151145
bnkSize 00000000.00000000
memSize 00000000.00000008

        memTest (no-cache)
        LongWord Memory Test

address:407ffdec wrote:ffffffff read:00000000
address:407ffde8 wrote:ffffffff read:00000000
address:407ffdd4 wrote:ffffffff read:00800000
address:407ffdd0 wrote:ffffffff read:00800000
address:407ffdcc wrote:ffffffff read:00800000
address:407ffdc8 wrote:ffffffff read:00800000
address:407ffdb4 wrote:ffffffff read:bb1c44cc
address:407ffdb0 wrote:ffffffff read:bb1c44cc
address:407ffdac wrote:ffffffff read:00800000
address:407ffda8 wrote:ffffffff read:00800000
address:407ffd94 wrote:ffffffff read:00000000
address:407ffd90 wrote:ffffffff read:00000000
address:407ffd8c wrote:ffffffff read:00800000
address:407ffd88 wrote:ffffffff read:00800000
address:407ffd74 wrote:ffffffff read:44e33b73
address:407ffd70 wrote:ffffffff read:44e33b73
address:407ffd6c wrote:ffffffff read:00800000
address:407ffd68 wrote:ffffffff read:00800000
(and so on).

This looks like a lot of errors, but there is a pattern to those faulty addresses. Leaving aside the 407ffd prefix for the moment and looking at the last two hex digits in binary, here’s what we get:

Bad                 Good
ec = 1110 1100
e8 = 1110 1000
                    e4 = 1110 0100
                    e0 = 1110 0000
                    dc = 1101 1100
                    d8 = 1101 1000
d4 = 1101 0100
d0 = 1101 0000
cc = 1100 1100
c8 = 1100 1000
                    c4 = 1100 0100
                    c0 = 1100 0000
                    bc = 1011 1100
                    b8 = 1011 1000
b4 = 1011 0100
b0 = 1011 0000

Half of the memory is faulty! And it’s spread across the data bus so that no memory can be used. Let’s see which address bits differentiate good memory from bad:

xxx01x00 – bad
xxx00x00 – good
xxx11x00 – good
xxx10x00 – bad

(The ‘x’ bits can be either 0 or 1.)
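The same check can be done mechanically. The following Python sketch parses a few lines copied from the memory test output above, extracts address bits 4 and 3, and compares the failing addresses with the ones that were not reported in the same window. It assumes, as I do in the text, that an address missing from the log passed the test; the names LINE, failing_addresses, and bits_4_3 are made up for this illustration.

```python
import re

# A few lines copied from the memory test output above.
LOG = """\
address:407ffdec wrote:ffffffff read:00000000
address:407ffde8 wrote:ffffffff read:00000000
address:407ffdd4 wrote:ffffffff read:00800000
address:407ffdd0 wrote:ffffffff read:00800000
address:407ffdcc wrote:ffffffff read:00800000
address:407ffdc8 wrote:ffffffff read:00800000
address:407ffdb4 wrote:ffffffff read:bb1c44cc
address:407ffdb0 wrote:ffffffff read:bb1c44cc
"""

LINE = re.compile(r"address:([0-9a-f]{8}) wrote:([0-9a-f]{8}) read:([0-9a-f]{8})")


def failing_addresses(log):
    """Return the addresses of all error lines found in the log text."""
    return [int(m.group(1), 16) for m in LINE.finditer(log)]


def bits_4_3(addr):
    """Return address bits 4 and 3 as a two-character binary string."""
    return format((addr >> 3) & 0b11, "02b")


bad = failing_addresses(LOG)
bad_patterns = {bits_4_3(a) for a in bad}

# Assumption: every longword between the lowest and highest reported address
# that does not appear in the log passed the test.
good = [a for a in range(min(bad), max(bad) + 1, 4) if a not in bad]
good_patterns = {bits_4_3(a) for a in good}

print(sorted(bad_patterns), sorted(good_patterns))   # ['01', '10'] ['00', '11']
```

In other words, within this window an address fails exactly when bits 4 and 3 differ.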

This pattern persists throughout the output of the memory test, so we seem to be onto something. Let’s review the memory organisation in the Model 800.

Memory Subsystem.
              The memory subsystem includes the following:
              * Four memory motherboards (MMB) that mount on the system module. To have an
                operational memory subsystem, all four MMBs must be present.
              * The memory arrays are spread among the four MMBs.  Each bank of memory consists
                of eight memory modules, two on each MMB.
              The memory subsystem supports up to 1 gigabyte (GB) of memory.
DEC 3000 Models 600/600S AXP and 800/800S AXP Service Information

The memory bus width is 256 data bits (plus 56 ECC bits). Within a bank, 8 SIMMs are arranged in parallel to form an 8x32 = 256 bit data bus. This means that the five least significant bits of an address select some of the 32 byte lanes on the bus, and higher order bits are used to select a bank and to form an address that goes to all the SIMMs in that bank. In all likelihood, the address is partitioned like this:

Address bits
29 … 5                            4 3 2                   1 0
Bank selector and SIMM address    SIMM selector in bank   Byte lane in SIMM

The memory test operates on four-byte longwords, so the two least significant bits of all addresses shown above are 00. Looking at our table of good and bad addresses, we can see that bits 4-3-2 suggest that memory modules 0, 1, 6, and 7 are good, and modules 2, 3, 4, and 5 are not quite.
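As a sketch of that address split, the Python below derives the number of low-order address bits from the bus width and then cuts an address into the three fields. The field boundaries follow my guess above (‘in all likelihood’), not any DEC documentation, and the names Fields and split_address are hypothetical.

```python
from typing import NamedTuple

SIMMS_PER_BANK = 8
BITS_PER_SIMM = 32
BUS_BYTES = SIMMS_PER_BANK * BITS_PER_SIMM // 8    # 256 bits = 32 bytes
LOW_BITS = BUS_BYTES.bit_length() - 1              # 5 address bits select a byte on the bus


class Fields(NamedTuple):
    upper: int       # bank selector and SIMM address (bits 29..5)
    simm: int        # SIMM selector in bank (bits 4..2)
    byte_lane: int   # byte lane in SIMM (bits 1..0)


def split_address(addr):
    """Split an address according to the assumed partitioning.

    Bits above 29 are dropped by the mask, following the diagram.
    """
    return Fields((addr >> 5) & 0x1FFFFFF, (addr >> 2) & 0x7, addr & 0x3)


print(BUS_BYTES, LOW_BITS)                 # 32 5
print(split_address(0x407ffdec).simm)      # 3
print(split_address(0x407ffdd0).simm)      # 4
```

Longword addresses always have a byte lane of 0, which is why only the SIMM selector varies between the good and bad addresses in the test output.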

Memory Bank Layout
DEC 3000 Models 600/600S AXP and 800/800S AXP Service Information

According to this diagram, the two inwards facing MMBs are somehow out of order. Perhaps those long connectors where MMBs mate with the System Module are still oxidised. After all, they weren’t easy to reach with a cotton swab and sanitiser, especially the sockets on the MMBs. And the sanitiser, as it turned out, doesn’t list any IPA on its label.

MMB female connector

21-MAR-2024

WD-40 Specialist Contact Cleaner

I venture again to Wickes in search of isopropyl alcohol. Eventually I spot this WD-40 Specialist Contact Cleaner in the automotive section for £7.80.

Starting with MMB1 on the right hand side, I pull it out and place it on the anti-static mat. Both male and female connectors get drenched in the contact cleaner as I spray it onto every pin and into every receptacle. I also mate them three times to spread the liquid before it evaporates. The SIMMs and their sockets receive a thorough soaking as well.

Before trying the memory test again, I swap this MMB with its neighbour, which previously showed no errors. In case my cleaning has been unsuccessful, this will help me understand whether the fault is on MMB1 or on the System Module. Here’s what I get now:

DEC 3000 - M800 SROM 6.1
Mfg Test
ff.fd.
Seq/PC  fd000000.00001388
        *** No usable memory detected ***
        Default Mem Cfg: Banks 0 and 6 = 8MB, both mapped to addr 0.
MCRstat 11411111.11151145
bnkSize 00000000.00000000
memSize 00000000.00000000

fb.f0.
MCRstat 11411111.11151145
bnkSize 00000000.00000000
memSize 00000000.00000008

        memTest (no-cache)
        LongWord Memory Test

address:407fbff4 wrote:ffffffff read:00000000
address:407fbff0 wrote:ffffffff read:00000000
address:407fbfd4 wrote:ffffffff read:00000000
address:407fbfd0 wrote:ffffffff read:00000000
address:407fbfb4 wrote:ffffffff read:00000000
address:407fbfb0 wrote:ffffffff read:00000000
address:407fbf94 wrote:ffffffff read:00000000
address:407fbf90 wrote:ffffffff read:00000000
address:407fbf74 wrote:ffffffff read:00000000
address:407fbf70 wrote:ffffffff read:00000000
address:407fbf54 wrote:ffffffff read:00000000
address:407fbf50 wrote:ffffffff read:00000000
address:407fbf34 wrote:ffffffff read:00000000
address:407fbf30 wrote:ffffffff read:00000000

Aha! Firstly, the errors in the 407ff…–407fc… range appear to have gone away, but that’s not important right now. I’ve found what I was hoping for: there are no longer errors at the addresses ending in ec, e8, cc, c8, ac, a8, and so on. These followed the xxx01x00 pattern pointing at SIMM 2 and SIMM 3 on MMB1 on the right hand side, which I’ve just bathed in copious amounts of contact cleaner.
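The before-and-after comparison can be written down as a small set difference. This Python sketch takes a handful of failing addresses from each of the two test runs above and maps them to SIMM numbers using bits 4..2, which is my assumed SIMM selector rather than a documented one; simm_index is a name invented for this example.

```python
def simm_index(addr):
    """Assumed SIMM selector: address bits 4..2."""
    return (addr >> 2) & 0x7


# Failing addresses copied from the first test run (before cleaning).
before = [0x407ffdec, 0x407ffde8, 0x407ffdd4, 0x407ffdd0,
          0x407ffdcc, 0x407ffdc8, 0x407ffdb4, 0x407ffdb0]

# Failing addresses copied from the second run (after cleaning one MMB).
after = [0x407fbff4, 0x407fbff0, 0x407fbfd4, 0x407fbfd0,
         0x407fbfb4, 0x407fbfb0]

failing_before = {simm_index(a) for a in before}
failing_after = {simm_index(a) for a in after}

print(sorted(failing_before))                   # [2, 3, 4, 5]
print(sorted(failing_after))                    # [4, 5]
print(sorted(failing_before - failing_after))   # [2, 3]
```

SIMMs 2 and 3 have dropped out of the failing set, which leaves 4 and 5 still to be dealt with.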

22-MAR-2024

Okay. Rinse and repeat, now with the MMB on the left hand side:
  1. Spray contact cleaner onto pins on the System Module.
  2. Spray it into holes on the MMB.
  3. Put MMB in and pull it out three times.
  4. Clean SIMMs and their slots.
  5. Optionally, swap two MMBs for further diagnostics.
And now it’s time for another test.

DEC 3000 - M800 SROM 6.1
Mfg Test
ff.fd.fb.f0.
MCRstat 11111111.11801180
bnkSize 00000200.00000500
memSize 00000040.00000040

        memTest (no-cache)
        LongWord Memory Test

....done.
....done.
....done.
....done.
....done.

Each dot takes a while to appear, and each row of dots eventually resolves into a satisfying ‘done.’ Mesmerising. I could watch this all day.