BitScope Control Plane

A cluster control plane is the sub-system that allows a cluster manager (i.e. a computer) to manage operation of the cluster as a whole. Most importantly, it enables the remote management of the power and cooling of each node individually. The control plane in BitScope Clusters also allows real-time monitoring of key system parameters including power supply voltage and current and the current and temperature of every node. It provides out-of-band communication channels to share data between nodes and/or for the manager to access nodes without using the primary network e.g. in the event of network failure or when nodes do not have network access. The control plane can perform graceful power-down, error recovery, remote logging and debugging. It uses the BIOS in the BMC built into Cluster Blade to access nodes down to the firmware, bootloader and kernel level. It supports remote access to the system console on each node, which is vital for cluster management down to the metal and for systems or kernel layer software development or debugging.

Connect

(open BMC connection)

Each BMC is accessed by its host (i.e. the Raspberry Pi) via the primary serial port. This serial port appears on GPIO 14 & 15 but the signals are (internally) connected to the BMC so you don't need to connect anything to these pins to use the BMC.

The device name used to access the serial port depends on the bootloader or operating system. For Raspberry Pi O/S it is usually named /dev/serial0. When using the BMC this serial port must not also be used for other purposes such as the system console or a connected HAT. The BMC is locked by default to prevent accidental usage collisions of this sort.

The serial port on the Raspberry Pi must be enabled.

Do not enable the system console on any host to be used as a cluster manager. If the console is enabled, you may still reach the BMC but characters will be lost and for practical purposes, the BMC will be unusable (from the host, it's always accessible via the control bus).

When the serial port is enabled (and the console disabled) the BMC may be accessed interactively with a terminal program (e.g. GTKTerm or GNU Screen):

  1. open the serial port at 115200 baud with no flow control,

    $ screen /dev/serial0 115200 # screen example
    
  2. unlock the BMC (or disable the lock) and
  3. start using the BMC.

All the commands described in the remainder of this manual assume you have the terminal open on the node you are using as the cluster manager.

  1. If no characters are echoed as you type, be sure you have unlocked the BMC correctly.
  2. If you see garbled or missing characters, check that (A) you have opened the correct serial port (/dev/serial0), (B) you are using the correct baud rate, (C) the system console is not enabled, (D) no other process has also opened the serial port and (E) no HAT or other device is connected to GPIO 14 and 15.
  3. You may prefer to configure your terminal to perform local <CR> echo upon receipt of <LF> for a prettier display (like many devices, the BMC uses only <LF> characters).

Unlock

(unlock BMC for use)

Upon BMC power-on or reset the serial port to the host is locked. No commands or characters are accepted or echoed when the port is locked. Locking the BMC ensures that if the host's bootloader, firmware or the kernel or a user program boots assuming it has control of the serial port its use of the serial port will not collide with communications with the BMC.

Unlocking simply requires a special sequence of characters to be sent to the BMC.

The default unlock sequence is UnLockMe.

In the unlikely event that this default sequence is emitted by a host in a given deployment, it can be changed to a different sequence or the BMC can be locked off altogether.
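The connect and unlock steps can be sketched in Python. This is a minimal illustration, not BitScope software: `unlock_and_probe` and `FakeBMC` are hypothetical names, and `port` is assumed to be any object with `write(bytes)` and `read(n)` methods (for example a pyserial `Serial` object opened on /dev/serial0 at 115200 baud with a read timeout). The reply text produced by `FakeBMC` is a made-up sample used only for a dry run without hardware.

```python
DEFAULT_UNLOCK = b"UnLockMe"  # default unlock sequence given in this manual


def unlock_and_probe(port, unlock=DEFAULT_UNLOCK, reply_size=64):
    """Send the unlock sequence, then '=' and return whatever comes back.

    `port` is any object with write(bytes) and read(n) -> bytes
    (an assumption; use whichever serial library you prefer).
    """
    port.write(unlock)   # nothing is echoed while the BMC is locked
    port.write(b"=")     # the status command, which is always available
    return port.read(reply_size)


class FakeBMC:
    """Hypothetical stand-in for the hardware, for dry runs only."""

    def __init__(self):
        self.locked = True
        self.seen = b""
        self.out = b""

    def write(self, data):
        for byte in data:
            ch = bytes([byte])
            if self.locked:
                # Remember the last 8 characters and compare with the sequence.
                self.seen = (self.seen + ch)[-len(DEFAULT_UNLOCK):]
                if self.seen == DEFAULT_UNLOCK:
                    self.locked = False
            elif ch == b"=":
                self.out += b"=7c 00 1 26 40\n"  # invented sample reply
        return len(data)

    def read(self, n):
        data, self.out = self.out[:n], self.out[n:]
        return data
```

An empty reply from `unlock_and_probe` on real hardware suggests the BMC is still locked or the port settings are wrong.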

Locking is configurable in four ways:

TYPE DESCRIPTION
LOCKED The node cannot unlock its own BMC. Only the cluster manager can unlock the BMC (via the control bus). This type is typically used in managed or untrusted clusters.
ANONYMOUS The receipt of up to 8 characters will unlock the BMC (8 characters is the default).
PATTERN The receipt of up to 8 user specified characters will unlock the BMC.
UNLOCKED The BMC is automatically unlocked immediately upon boot or reset.

Unlock is configured using the Unlock configuration parameter. It is an 8 character string. After boot or reset, the host must send the correct character sequence to unlock the BMC. The character codes (in hex) available to use by Unlock are as follows:

01 ~ 7f A literal character code that must be matched. Used to create a PATTERN sequence. The sequence can be shortened from 8 characters by terminating it with 00.
ff The wildcard code which matches any incoming character. Used to create ANONYMOUS or PATTERN sequences that include wildcards.
80 ~ fe A blocking code that matches no incoming character. If this code appears anywhere in the sequence the BMC is LOCKED.
00 Acceptance code that terminates the matching sequence and accepts. If it's the first character in the sequence the BMC is UNLOCKED upon boot.

Some example sequences might be:

42697453636f7065 The literal string BitScope must be received to unlock the BMC.
ffffffffffffffff Any 8 character string will unlock the BMC.
00ffffffffffffff The BMC will reboot/reset unlocked.
ff00ffffffffffff Any single character will unlock the BMC.
feffffffffffffff Nothing can unlock the BMC.
48656c6c6f2100ff The literal string Hello! will unlock the BMC.
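The code rules above can be checked with a short Python sketch. The function names are hypothetical, and two behaviours are assumptions rather than documented facts: only the codes before the first 00 are treated as part of the matching sequence, and a match anywhere in the received stream is taken to unlock the BMC.

```python
def classify_unlock(hex_sequence):
    """Return (type, active_codes) for a 16-hex-digit Unlock parameter."""
    codes = bytes.fromhex(hex_sequence)
    if len(codes) != 8:
        raise ValueError("Unlock must be exactly 8 bytes")
    # Assumption: codes after the first 00 take no part in matching.
    active = codes.split(b"\x00", 1)[0]
    if not active:
        return "UNLOCKED", active      # 00 first: unlocked upon boot
    if any(0x80 <= c <= 0xFE for c in active):
        return "LOCKED", active        # a blocking code matches nothing
    if all(c == 0xFF for c in active):
        return "ANONYMOUS", active     # wildcards only
    return "PATTERN", active           # at least one literal code


def unlocks(hex_sequence, received):
    """True if the bytes in `received` would unlock a BMC configured this way."""
    kind, pattern = classify_unlock(hex_sequence)
    if kind == "UNLOCKED":
        return True
    if kind == "LOCKED":
        return False
    n = len(pattern)
    # Assumption: the pattern may match anywhere in the incoming stream.
    for start in range(len(received) - n + 1):
        window = received[start:start + n]
        if all(p == 0xFF or p == r for p, r in zip(pattern, window)):
            return True
    return False
```

For instance, `classify_unlock("48656c6c6f2100ff")` reports a PATTERN whose active codes spell Hello!, in line with the example table.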

BMC Status

(report BMC status)

The = command reports BMC Status. It is the command most frequently used interactively. It reports the status of the local or a remote BMC. It returns:

ID MS XX YY ZZ

where

FIELD VALUES MEANING
ID 00 to 7f Node address in the cluster. Added in CB04A016.
MS 00 or ff Node status (00 master ff slave). Added in CB04A016.
XX 0 1 2 Power State (defined here)
YY U8 Current Draw (8 bit copy of the 04M current measurement defined here)
ZZ U8 Fan Speed as calculated (defined here).

This command is always available. It is useful to quickly discover whether the BMC is working locally or remotely. When used remotely, it reports which remote node is attached. In all cases it reports the prevailing processing load via the proxies of current load and fan speed.
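A status reply can be turned into named fields with a few lines of Python. This is a sketch under assumptions: the five fields are taken to arrive as whitespace-separated hexadecimal tokens (the exact framing on the wire is not specified here), and `Status` and `parse_status` are hypothetical names. Firmware earlier than CB04A016 does not report ID and MS, which this sketch does not handle.

```python
from dataclasses import dataclass

POWER_STATES = {0: "OFF", 1: "ENABLED", 2: "DISABLED"}


@dataclass
class Status:
    node_id: int      # ID: node address, 00 to 7f
    is_master: bool   # MS: 00 master, ff slave
    power_state: str  # XX: power state name
    current: int      # YY: 8 bit current draw
    fan_speed: int    # ZZ: calculated fan speed


def parse_status(reply):
    """Parse 'ID MS XX YY ZZ' (assumed whitespace-separated hex tokens)."""
    fields = [int(token, 16) for token in reply.split()]
    if len(fields) != 5:
        raise ValueError("expected 5 fields: ID MS XX YY ZZ")
    node_id, ms, power, current, fan = fields
    return Status(
        node_id=node_id,
        is_master=(ms == 0x00),
        power_state=POWER_STATES.get(power, "UNKNOWN"),
        current=current,
        fan_speed=fan,
    )
```

With the invented sample reply `7c 00 1 26 40`, this yields a master at address 7c with power ENABLED.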

Command Pipe

(become cluster master)

The Command Pipe is the mechanism by which a node becomes the bus master. When active, the pipe passes commands from the master (the cluster manager) to a slave (a managed node, via the control bus). Replies are returned from the slave back to the master. The pipe may be opened for interactive or non-interactive use. Either method achieves the same result; the master connects with a slave over the bus and all commands are thereafter executed on the slave.

The interactive pipe commands are:

| Open Pipe to forward commands to the bus and receive replies from the bus. The local BMC is not accessible when the pipe is open.
^G Close Pipe to return control to the local BMC. Only meaningful when the pipe was opened with | and is ignored otherwise. This character may be configured to be something other than ^G.

The non-interactive pipe commands are:

{ Open Pipe to forward commands to the bus and receive replies from the bus. The local BMC is not accessible when the pipe is open.
} Close Pipe to return control to the local BMC. Only meaningful when the pipe was opened with { and is ignored otherwise.

Talking to the bus on its own is not very useful. Normally, when opening the pipe a slave address is specified. For example, to open the slave at address 32 the command sequence is [20]|. The addressed slave will attach to the bus to become the target for all commands issued by the master until the pipe is closed.

The addressed slave must exist. Confirmation that the pipe has been successfully opened and the addressed slave is responding can be obtained by issuing the # command. The slave will respond with its UUID, which will be different to the master UUID. The status command = may also be used to determine the address of the attached slave and the fact that it is a slave.

Once connection is established all commands, except ^G (or } if opened with { or * if using the beta edition) issued by the master are received by the slave and all replies from the slave including reply data are returned to the master. In this way the master has full control of the slave until it sends ^G or } or * to close the pipe.

If the selected slave does not exist the pipe will still connect to the bus. Any command then issued (unless the console is opened with ~) will be echoed but no command replies will appear. For example the # will not respond with a UUID. However, the echoed # command itself confirms the pipe is open and the bus is working (this is effectively a loopback test of the bus).
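The pipe command sequences are easy to generate programmatically. The helper names below are hypothetical; the sketch only builds the character sequences described above (two hex digits in brackets for the slave address, then | or {) and assumes the default ^G close character has not been reconfigured.

```python
CLOSE_INTERACTIVE = "\x07"    # ^G, the default interactive close character
CLOSE_NON_INTERACTIVE = "}"


def open_pipe(address, interactive=True):
    """Return the sequence that opens a pipe to the slave at `address`."""
    if not 0x00 <= address <= 0x7F:
        raise ValueError("node addresses range from 00 to 7f")
    return f"[{address:02x}]" + ("|" if interactive else "{")


def piped(address, commands, interactive=False):
    """Wrap `commands` in an open/close pipe pair for one slave."""
    close = CLOSE_INTERACTIVE if interactive else CLOSE_NON_INTERACTIVE
    return open_pipe(address, interactive) + commands + close
```

For example, `piped(32, "#=")` gives `[20]{#=}`, which asks the slave at address 32 (hex 20) for its UUID and status and then releases the bus.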

Power Control

(power up/down nodes)

The primary function of a cluster control plane is to manage node power. The / and \ commands are available to do this. These commands are ignored if sent to the local BMC. They are only useful to manage power on a slave node issued from a master via the control bus using the command pipe bus mastership mechanism.

CMD COMMAND MEANING
/ POWERON Turn power on (starts node)
\ POWEROFF Turn power off (hard stop)

These commands update the power state:

TOK STATE MEANING
0 OFF Power is OFF (host may be disabled or unavailable)
1 ENABLED Power is ON (host is available and power has been applied)
2 DISABLED Power is ON but power has been manually disabled

The prevailing power state is reported by the status command =.

Upon receipt of the POWEROFF command \ the slave immediately powers off (state 0).

The \ command is a hard power down. If the slave node being powered off needs to perform housekeeping before power is removed, it's up to higher layer protocols (used to manage the cluster) to ensure this happens. If the cluster nodes do not have other means to communicate (to do this), the BMC provides a mailbox to enable protocols of this type to be built. Alternatively, the console (if enabled on the slave) can be used. The BMC does not otherwise know or define how to perform a graceful power down.

When the / command is issued, the resulting state may be 1 or 2. If 1 the node has successfully powered on. If 2 the power has been enabled but the node will not power on because the power override shunt has been applied (i.e. it's physically connected to the node) or the node may be unoccupied by a host.

The override shunt is intended as a mechanism to clamp a node from powering on regardless of the BMC. It's useful when installing or commissioning new systems or when the cluster control plane is not being used.

If using a beta edition, the / and \ commands echo the next state 1 and 0. In production editions the state is not reported. Use the status command = to determine the state instead.

Node Cooling

(manage node cooling)

The Fan F command is used to monitor and control the cooling fan. An opcode is specified upon issuing F to select whether to report or update the fan parameters.

OPCODE FUNCTION
00 Report Fan Parameters
01 Update and Report Fan Parameters

There are three fan parameters specified via parameter registers:

FIELD PARAMETER REG TYPE MEANING
XX OFFSET vmFanLo Q8 Lower bound to fan speed.
YY LIMIT vmFanHi Q8 Upper bound to fan speed.
ZZ SCALE vmFanGain U8 Scale applied to current measurement to calculate the fan speed.

Upon execution of F the command returns:

XX YY ZZ VV

It does this regardless of which opcode is selected. The VV field is the calculated Q8 fan speed value. If opcode [01]F is used the values in the registers are used to update the parameters prior to returning them. The reported VV is the value prevailing before the parameters changed. Use [00]F a short time later to see the updated VV value (it takes a short time to update).

BMC fan control operates by measuring the current draw of the node and calculating a fan speed required to dissipate the heat generated. This is the same current measurement as reported by the = command.

The SCALE parameter applies a scale FACTOR (specified via TOKEN) to scale the current value applied to the fan speed algorithm:

TOKEN FACTOR
00 4
01 2
02 1
03 1/2

This scaled value is then added to OFFSET and limited to LIMIT before being applied to control the fan.
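The calculation described above can be written out as a small Python function. It is a sketch of the documented steps (scale, add OFFSET, limit to LIMIT) and not the firmware's actual code; in particular, the use of integer arithmetic that rounds down for the 1/2 factor is an assumption, and `fan_speed` is a hypothetical name.

```python
# SCALE token -> factor, held as (numerator, denominator)
SCALE_FACTORS = {0x00: (4, 1), 0x01: (2, 1), 0x02: (1, 1), 0x03: (1, 2)}


def fan_speed(current, offset, limit, scale_token):
    """Estimate the fan speed from the 8 bit current measurement."""
    numerator, denominator = SCALE_FACTORS[scale_token]
    scaled = current * numerator // denominator  # assumption: rounds down
    return min(offset + scaled, limit)           # add OFFSET, cap at LIMIT
```

With a current reading of 26 (hex), OFFSET 10, LIMIT ff and SCALE token 01 (factor 2), the result is 0x10 + 2 * 0x26 = 0x5c.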

Console Port

(open a console)

The Console command ~ allows a master to communicate with a slave host via the slave's console port over the control bus. This enables full remote access to the host at the firmware, bootloader, kernel and user layer. This is ideal for system administration purposes or software development at the firmware, kernel, system or user level on any host on the cluster. To use the console, the master opens a pipe (to a selected slave) and then opens the console (to the selected slave's host). For example:

[7e] | [03]~

which opens a pipe to node 7e and then the console (on node 7e) at a baud rate of 57k6. From that point on, all characters sent by the master are forwarded to the slave's host via its console and all characters sent from the slave host via the console are returned to the master. The console is terminated upon closing the pipe. The available baud rates are:

CMD DESCRIPTION
[00]~ Open console at 115,200
[01]~ Open console at 9,600
[02]~ Open console at 19,200
[03]~ Open console at 57,600

Each command opens the console for bi-directional traffic immediately at the selected baud rate.

The console supports all standard 7-bit ASCII characters. That is, characters with code values from 0x01 to 0x7f. The NULL character 0x00 and extended ASCII or UNICODE characters are ignored.

Console Baud Rates

Baud rates up to 115,200 are supported. However, the console port on most hosts including Raspberry Pi is not subject to flow control. This means that if a large number of characters are sent from the host back-to-back via the console over the bus, overrun (i.e. missed characters) may occur. Whether this happens depends on what you're talking to (e.g. a bootloader, kernel or user process) on the remote host and what its actual baud rate is. Some hosts in some modes may transmit at a slightly higher rate than 115k baud (it depends on how the baud rate is generated). While Cluster Blade is designed to accommodate these abnormally higher rates, the Raspberry Pi you're using as the master node may not. If you are experiencing problems we recommend configuring your remote host to use one of the lower supported baud rates. For example, at 57k6 overrun is impossible. Note: when using an external Bus Controller, this problem will not occur (the controller is fast enough).

Open Console Mute

CB04A020

It is possible to open the console as mute. This means the console will wait to receive a certain number of characters before it sends any characters back to the master. This is useful for error recovery when the remote host has "gone rogue".

The command to do this is the same as opening the console but with an additional specifier that tells the console how many characters to receive before enabling transmit. The high digit of the baud rate specifier is used for this as follows:

CMD DESCRIPTION
[20]~ Open console at 115,200, start sending after receiving 2 characters
[11]~ Open console at 9,600, start sending after receiving 1 character
[83]~ Open console at 57,600, start sending after receiving 8 characters

Up to 15 "mute echo characters" can be sent this way. This is usually more than enough to recover most bootloaders and operating systems.
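The console specifier combines the mute count (high digit) and the baud token (low digit). A hypothetical helper makes the encoding explicit; it takes token 01 to mean 9,600 baud, as listed for the ~ entry in the command table (01 9k6).

```python
# Baud rate -> low digit of the console specifier
BAUD_TOKENS = {115200: 0x0, 9600: 0x1, 19200: 0x2, 57600: 0x3}


def console_command(baud, mute_chars=0):
    """Return the ~ command for `baud`, muted until `mute_chars` are received."""
    if baud not in BAUD_TOKENS:
        raise ValueError("supported rates: 9600, 19200, 57600, 115200")
    if not 0 <= mute_chars <= 15:
        raise ValueError("at most 15 mute echo characters")
    return f"[{mute_chars:x}{BAUD_TOKENS[baud]:x}]~"
```

Here `console_command(57600, 8)` returns `[83]~`, and `console_command(57600)` returns the plain `[03]~` used in the earlier example.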

Remote Host Error Recovery

When the console is open, a runaway process or misbehaving command (on the remote host) may send unlimited and arbitrary data back to the master. When used interactively the usual way to resolve this is to issue an interrupt ^C or termination ^D character to the runaway process to stop it. A problem may arise if the remote host sends so much data so fast that the master cannot get a termination character sent across the bus to stop it.

The solution is simple.

End the pipe (which closes the console), reconnect and reopen the console mute. At this point the console is open but the remote host cannot send any characters. Send the necessary termination (e.g. ^C) and then resume normal operation.

Nuclear Option.

It may be that the remote host has gone completely rogue and will not respond to any termination characters sent to it. In this case you still have the Nuclear Option which is to (A) close the pipe (B) reconnect to the slave and (C) power down or repower the host to (hard) reboot it or return to the bootloader (if it's been configured on that host).

Only recommended if you've completely lost control (hard reboots can corrupt disk images).

Programming Interface

CB04A016

The BitScope Control Plane is an extension of the BitScope I/O System. It is similar to the BitScope Virtual Machine but designed to manage clusters, not general purpose test and measurement. Like all BitScope VMs it uses a set of registers and commands accessed via a simple serial protocol.

Commands

CB04A016

CMD GROUP ACTION
? ID Print TID (type identifier) aka the firmware revision. Use this command to determine the BMC revision (e.g. CB04A016). Use the revision to determine which other commands are available in the BMC.
# ID Print UUID (universally unique identifier). Use UUID as a key to access information (e.g. in a database describing a particular cluster deployment) for each physical node.
     
[ Entry Commence data entry. Clears vmInput. This command is optional but it is shown used in this document for clarity.
] Entry Conclude data entry. May push vmInput onto the stack. This command is optional but it is shown used in this document for clarity.
0..9 Entry Increment vmInput by the digit specified and left-shift.
a..f Entry Increment vmInput by the hex digit specified and left-shift.
     
@ Register Set Address Register vmPointer.
+ Register Increment Value Indirect via vmPointer.
- Register Decrement Value Indirect via vmPointer.
s Register Store the value in vmInput to register vmPointer.
p Register Print register at vmPointer.
n Register Increment vmPointer.
z Register Store the value in vmInput to register vmPointer and post-increment.
     
. Execute End of Sentence (EOS). Context released. NOP.
! Execute Soft reset and terminate active operation. Requires vmInput to have the value 55 (to avoid accidental execution).
     
{ Bus Open a non-interactive pipe on the control bus to communicate with a (slave) node. The selected node is specified via vmInput prior to issuing this command.
} Bus Close an open non-interactive pipe. The selected slave detaches from the bus.
     
| Bus Open an interactive pipe on the control bus to communicate with a (slave) node. The node is selected via vmInput. Command added in CB04A016.
^G Bus Close an open interactive pipe. The selected slave detaches from the bus. If the slave's console is open, it will be closed. The character (^G) can be configured to be a different character (e.g. ^D). Command added in CB04A016. Use the * in earlier editions.
     
~ Console Open the console on a slave at a baud rate specified by vmInput (00 115k 01 9k6 02 19k2 03 57k6). The master must first open a pipe to the slave. The console is closed when the pipe (on the master) is closed. Command was added in CB04A016.
     
/ Power Turn power on (which starts node boot).
\ Power Turn power off (hard stop).
= Power Request status (what are you doing?).
     
R Memory Read SRAM.
W Memory Write SRAM.
S Memory Dump SRAM.
     
r Memory Read from EEPROM.
w Memory Write to EEPROM.
     
M Module Read an A/D channel and return as U16.
F Module Read and/or Update FAN Control.
C Module Machine Calibration Coefficients. Use the C command in beta editions.
     
` Comms Mailbox Exchange.

Registers

CB04A020

REGISTER ADDR SIZE DESCRIPTION
vmInput 00 1 Input Register
vmPointer 01 1 Address Register (VMR)
vmStore 02 1 Address Register (EEPROM)
vmAddress 08 2 Address Register (RAM)
vmCount 0a 2 Dump Size (RAM)
vmFanLo 10 1 Fan Low Speed Limit
vmFanHi 11 1 Fan High Speed Limit
vmFanGain 12 1 Fan Gain Factor
vmSlave 1c 1 Slave Status.
vmError 1d 1 Error Status.
vmStation 1e 1 Logical Node Address (live).
vmIdent 1f 1 Physical Node Address (Hardware assigned).

Station

vmStation vmIdent

Each node has a unique (geographical) location (within a cluster) at which it may be accessed. This location is referred to as the node's station (or ID in shorthand). Its default value is assigned in hardware. Literally, it is the address of the physical location of the node in the cluster. Its value may be modified by a configuration setting under certain circumstances.

Two registers report the ID for each node's BMC:

REGISTER ADDR SIZE DESCRIPTION
vmStation 1e 1 Logical Node ID (Live).
vmIdent 1f 1 Physical Node ID (Hardware assigned).

The vmStation is assigned the value of vmIdent at boot time unless (re)configured:

ADDR PARAMETER
7e Location Override. When assigned, its value will be used to set the node ID in the cluster at boot time if the blade is stand-alone or in an ad-hoc cluster. Legal values range from 00 to 77 (120 nodes). If the blade is located in a BitScope Cluster the physical location of the blade in the cluster will override this.

In this case the vmStation will be assigned the value of the configuration variable 7e, unless the blade is mounted in a BitScope Cluster, in which case the vmIdent value applies and cannot be overridden.

The BMC detects whether the Blade is part of a BitScope Cluster by checking the node ID. When a Blade is not part of a cluster, the ID for each node is virtual and assigned the following defaults:

N ID DESCRIPTION
1 7c First and (default) master node. HDMI and USB are accessible from this node.
2 7d Second node on the Blade.
3 7e Third node on the Blade.
4 7f Fourth node on the Blade.

Where N is an arbitrary node number within a cluster and ID is the station, i.e. the node's BMC address on the control plane. When a Blade is part of a cluster the ID will be automatically set to be different to these defaults. The ID for each node depends on the physical cluster and is immutable based on how the hardware has been commissioned.

When the BMC detects a node with one of the default IDs listed above, it knows the blade is operating stand-alone. In this case, the default ID may be used as is and this is recommended.

When a blade is part of an ad-hoc cluster (i.e. a custom cluster built without BitScope's cluster infrastructure) which comprises more than one blade, the IDs of the nodes in the blades must be configured to be different to each other to avoid geographic address (i.e. ID) collisions on the bus. If you don't do this, all blades will locate their nodes at the same four addresses on the bus: 7c, 7d, 7e and 7f. This of course results in collisions. This is an ad-hoc cluster commissioning responsibility which should be undertaken, one at a time, on each Blade before the blades are interconnected into a single cluster. This need only be done once and only when building ad-hoc clusters.

Errors

vmError

There are a range of potential error conditions. The set of errors that are detected are:

ERRNO VALUE MEANING
0 00 No Error. Situation Normal.
-1 ff Host frame error (Break).
-2 fe Bus frame error (Break).
-3 fd Host channel overrun.
-4 fc Host pipe overrun.
-5 fb Bus pipe overrun.
-6 fa Bus channel overrun.

Any error is reported via the vmError register.

REGISTER ADDR SIZE DESCRIPTION
vmError 1d 1 Error Status.

The vmError value must be manually reset to 00 if you seek to detect whether any error has occurred. However, none of these errors are serious so all can be ignored (simply retry the operation if it did not succeed the first time). Note: errors prior to the one reported may have occurred. However, such errors are undetectable (only the most recent error is reported).
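The VALUE column is the ERRNO as an 8 bit two's complement number, so a register reading can be decoded as sketched below. The names are hypothetical. The two command strings are assumptions derived from the @, p and s entries in the command table (point at register 1d, then print it or store 00 to it) and have not been verified on hardware.

```python
ERRORS = {
    0: "No Error. Situation Normal.",
    -1: "Host frame error (Break).",
    -2: "Bus frame error (Break).",
    -3: "Host channel overrun.",
    -4: "Host pipe overrun.",
    -5: "Bus pipe overrun.",
    -6: "Bus channel overrun.",
}

READ_ERROR = "[1d]@p"        # assumed: point at vmError and print it
CLEAR_ERROR = "[1d]@[00]s"   # assumed: point at vmError and store 00


def decode_error(value):
    """Convert a vmError byte (0..255) to (errno, meaning)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("vmError is a single byte")
    errno = value - 0x100 if value >= 0x80 else value  # two's complement
    return errno, ERRORS.get(errno, "Unknown error")
```

For example, a reading of fd decodes to errno -3, a host channel overrun.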

Reset

!

The BMC can be reset. This is not recommended in normal usage but it can be useful to reload configuration parameters (perhaps after changing them) or to recover normal operation when uncertain as to the state of the BMC. The ! command activates reset. However, as a safety precaution, it will only fire if the value 55 is loaded to the vmInput register (i.e. issue [55]! as a command sequence).

If the Power Off configuration parameter is set, issuing [55]! will power-off the node.

BMC Modules

CB04A016

Modules are sub-systems of the BMC. They are accessed and executed via one or more commands acting on zero or more registers. The commands and registers used depend on which module is being used. The modules implemented in BMC CB04A016 (and later editions) are:

MODULE COMMANDS DESCRIPTION
METER M A/D Module (Read any A/D channel and return as U16)
MAILBOX ` Token Exchange. Used to implement power control protocols, between the cluster manager and a cluster node.
SRAM R W S Shared Memory. Use is application specific.
EEPROM r w Persistent Memory. Used for Device Configuration and Boot-Time program execution.
CAL C CAL Memory (Machine Calibration Coefficients)

Measurements

M

The Meter M command is used to measure voltages, currents and temperature. An opcode is specified upon issuing M to select which signal to measure. The CODE is returned as an unsigned Q16 with the following meanings:

CMD SIGNAL UNIT RANGE CODE VALUE
[00]M Ground (VDD) V 4.096 0040 4 mV
[01]M RAW Current A 4.096 2100 528 mA
[02]M RAW Voltage V 45.056 8740 23.77 V
[03]M VRef 2.048V V 4.096 80c0 2.060 V
[04]M Node Current A 4.096 2680 616 mA
[05]M Temperature T ? 5b40 ?

The RANGE column reports the full scale value of the measurement. The VALUE column reports some typical example measurements.
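Reading CODE as an unsigned Q16 fraction of the full scale RANGE reproduces the example values for the channels with a 4.096 range: value = CODE / 65536 x RANGE. The sketch below applies that relationship; `code_to_value` is a hypothetical name, and the scaling is inferred from the table rather than taken from a stated formula. No conversion is offered for temperature, whose range is not given.

```python
def code_to_value(code, full_scale):
    """Convert an unsigned Q16 CODE to a value in the units of `full_scale`."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("CODE must fit in 16 bits")
    return code / 65536 * full_scale


# Example: the Node Current row, CODE 2680 with a 4.096 A range
node_current_amps = code_to_value(0x2680, 4.096)  # approximately 0.616 A
```

On the 4.096 range each code step is 62.5 microvolts (or microamps), which is why 0040 reads as 4 mV.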

MailBox

`

The Mailbox is a single byte exchange.

With appropriate software running on the master and slave it can be used to create graceful runtime management protocols for use with the cluster control plane.

The Mailbox provides atomic byte exchange between a slave and master across the control bus. When a master has opened a pipe and connected with a slave, this command allows it to send a byte to the slave and receive the most recent byte issued by the slave (using the same command).

On either the slave or master, typical usage is to issue XX` to send XX into the exchange. The exchange atomically returns YY as the reply data to the ` command. YY is the most recent value issued by the other side of the exchange.

The mailbox command ` is idempotent so it may be issued multiple times. Subsequent issuance of the command is used simply to return the state. It would normally be used with a polling mechanism on each side of the mailbox. For example, to implement a management protocol analogous to UNIX Init, a set of master "mailbox tokens" could be defined:

TOK COMMAND MEANING
0 HALT Halt (shut down gracefully)
1 SYSTEM Boot into Single User Mode.
2 RUNLEVEL2 Boot to runlevel 2.
3 RUNLEVEL3 Boot to runlevel 3.
4 RUNLEVEL4 Boot to runlevel 4.
5 RUNLEVEL5 Boot to runlevel 5.
6 REBOOT Reboot.
     
10 STATUS Report current status.

These would be issued by the cluster manager (master) and received by the node (slave). A set of matching slave state tokens could be defined as:

TOK STATE MEANING
0 OFF Power is OFF (host may be disabled or unavailable, unknowable)
1 ENABLED Power is ON, power has been applied and the host is available (nothing else is known)
2 DISABLED Power is ON but host is disabled or host is unavailable
3 STOPPING Power is ON and one (or more) request(s) to power down have been received
4 STOPPED Power is ON and the host has acknowledged the power down
5 RUNNING Power is ON and host has reported it's running (more detail as yet unknown)
6 ERROR Error or unknown state.
     
11 SYSTEM Single User Mode.
12 RUNLEVEL2 Runlevel 2.
13 RUNLEVEL3 Runlevel 3.
14 RUNLEVEL4 Runlevel 4.
15 RUNLEVEL5 Runlevel 5.

The master can issue STATUS at any time to ask the slave for its state. When STOPPED the master knows it is safe to POWEROFF the node. If the state never reaches STOPPED (either because the host has crashed or the host is not running software to report this to the BMC) after a (master defined) timeout, the master can POWEROFF the node.
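The master side of such a protocol might look like the following Python sketch. Everything here is illustrative: the token values come from the example tables above (with 10 read as hexadecimal, which is an assumption), `exchange` is a hypothetical callable that performs one mailbox exchange and returns the slave's most recent byte, and `poweroff` is a hypothetical callable that issues the \ command.

```python
import time

HALT, STATUS = 0x00, 0x10   # example master tokens (10 assumed to be hex)
STOPPED = 0x04              # example slave state token


def graceful_poweroff(exchange, poweroff, max_polls=30, poll_interval=1.0):
    """Ask the slave to halt, wait for STOPPED, then remove power.

    Returns True if the slave acknowledged the power down, or False if
    power was removed after the timeout without an acknowledgement.
    """
    exchange(HALT)                       # request a graceful shutdown
    for _ in range(max_polls):
        if exchange(STATUS) == STOPPED:  # slave reports it is safe
            poweroff()
            return True
        time.sleep(poll_interval)
    poweroff()                           # timeout: hard power off anyway
    return False
```

The timeout is expressed as a number of polls so the caller decides how long a crashed or silent host is allowed before power is removed.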

SRAM

R W S

The SRAM is a region of memory shared between a node and the cluster manager.

A master can use it to communicate arbitrary data with a slave. In this case the mailbox can serve as a semaphore mechanism. Alternatively, a slave may use SRAM to record information which the master may interrogate later. For example, a console debugging or logging port on the slave.

How SRAM is used is not defined in this document. How it is read and written is defined here.

The R, W and S commands are used to read, write and dump SRAM:

READ [08]@[YY]z[XX]sR Returns the value at address XXYY
  [08]@ Point to LSB of address.
  +R Returns the value at the next address XXYY + 1. Increments the LSB only.
WRITE [08]@[YY]z[XX]s[ZZ]W Write the value ZZ to the address XXYY
  [08]@ Point to LSB of address.
  [ZZ]+W Write the value ZZ to the next address XXYY + 1
DUMP [08]@[YY]z[XX]z[10]z[00]sS Dump 16 values starting at address XXYY.
  S Dump the next 16 values (starting at XXYY)

The + version of the R and W commands can be used in sequence to read or write successive values in SRAM. The - may be used to decrement through SRAM if preferred.
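The read and write sequences can be generated from a 16 bit address as sketched here. The function names are hypothetical, and the sequential read uses the uppercase R command on the assumption that SRAM reads always use R. Because + increments the LSB only, a block built this way should not cross a 256 byte boundary.

```python
def _set_address(address):
    """Load a 16 bit SRAM address: LSB to register 08, MSB to register 09."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("SRAM addresses are 16 bit")
    return f"[08]@[{address & 0xFF:02x}]z[{address >> 8:02x}]s"


def sram_read(address):
    return _set_address(address) + "R"


def sram_write(address, value):
    if not 0 <= value <= 0xFF:
        raise ValueError("values are single bytes")
    return _set_address(address) + f"[{value:02x}]W"


def sram_read_block(address, count):
    """Read `count` successive bytes: one R, then +R for each further byte."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return sram_read(address) + "[08]@" + "+R" * (count - 1)
```

For example, `sram_read_block(0x1234, 3)` yields `[08]@[34]z[12]sR[08]@+R+R`.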

EEPROM

r w

The EEPROM is used to store persistent state.

How it is used is defined in Configuration. How it is read and written is defined here. The r and w commands are used to read and write EEPROM:

READ [02]@[XX]sr Returns the value at address XX
  +r Returns the value at the next address XX + 1
WRITE [02]@[XX]s[YY]w Write the value YY to the address XX
  [ZZ]+w Write the value ZZ to the next address XX + 1

The + version of these commands can be used in sequence to read or write successive values in EEPROM. The - may be used to decrement through EEPROM if preferred.
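A pair of hypothetical Python helpers shows how these sequences are assembled, including the + form for writing successive bytes. They only build the command strings from the table above; sending them to the BMC is left to the caller.

```python
def _check_byte(value, name):
    if not 0 <= value <= 0xFF:
        raise ValueError(f"{name} must be a single byte")


def eeprom_read(address):
    """Return the sequence that reads the EEPROM byte at `address`."""
    _check_byte(address, "address")
    return f"[02]@[{address:02x}]sr"


def eeprom_write(address, *values):
    """Write one or more bytes to successive addresses from `address`."""
    _check_byte(address, "address")
    if not values:
        raise ValueError("at least one value is required")
    for value in values:
        _check_byte(value, "value")
    first, rest = values[0], values[1:]
    commands = f"[02]@[{address:02x}]s[{first:02x}]w"
    # Each further byte uses + to step to the next address before writing.
    return commands + "".join(f"[{v:02x}]+w" for v in rest)
```

For instance, `eeprom_write(0x60, 0x48, 0x69)` produces `[02]@[60]s[48]w[69]+w`.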

Coefficients

C

The Coefficients C command is used to read the BMC calibration coefficients. An address precedes the issuing of the C command to select which coefficient to read:

CMD COEF PURPOSE
[00]C MUI0 Unique Identifier
:: ::  
[08]C MUI8  
[13]C TSRL2 Temperature indicator ADC reading at 90°C (low range setting)
[16]C TSHR2 Temperature indicator ADC reading at 90°C (high range setting)
[18]C FVRA1X ADC FVR1 Output voltage for 1x setting (in mV)
[19]C FVRA2X ADC FVR1 Output voltage for 2x setting (in mV)
[1a]C FVRA4X ADC FVR1 Output voltage for 4x setting (in mV)
[1b]C FVRC1X Comparator FVR2 output voltage for 1x setting (in mV)
[1c]C FVRC2X Comparator FVR2 output voltage for 2x setting (in mV)
[1d]C FVRC4X Comparator FVR2 output voltage for 4x setting (in mV)

They are useful to calibrate M results for voltage, current and temperature measurements to better than default precision. This is not required for normal operation. A separate application note will be linked here with details about how to do this when available.

BMC Configuration

CB04A016

The EEPROM maintains persistent configuration parameters.

It is recommended that configuration changes be made when the node is powered down.

The configuration defines how the cluster operates at boot time and how it continues to operate in the event that it receives no communications from the host or (via the control bus) from the cluster manager. Any configuration parameter that has a value ff is unconfigured and ignored. If a parameter has a different value it will normally be used at boot time only to modify the operation of the BMC.

ADDR PARAMETER
7f Power Off. When assigned a value 55 the node will boot with the power disabled. Note: it is not possible to have the default master (address 7c) boot with power disabled (to do so would risk losing control of the blade on which the node is located if all the other nodes were also set to powered off at boot and the user did not have a cluster manager on the bus).
7e Address Override. When assigned, its value will be used to set the node ID in the cluster at boot time if the blade is stand-alone or in an ad-hoc cluster. Legal values range from 00 to 77 (120 nodes). If the blade is located in a BitScope Cluster the physical address of the blade in the cluster will override this.
7c Fan Scale. Overrides the factory default value of the SCALE parameter in FAN Control.
7b Fan Limit. Overrides the factory default value of the LIMIT parameter in FAN Control.
7a Fan Offset. Overrides the factory default value of the OFFSET parameter in FAN Control.
73 Close Pipe. If assigned (on a master) overrides the default end pipe character ^G.
72 No Fan. When assigned aa, upon the next boot, the node may no longer change the fan control parameters. The master may still modify the Fan parameters.
71 No SRAM. When assigned aa, upon the next boot, the node may no longer access SRAM. The master may still use SRAM (normally for hardware system logging which cannot be modified or updated by the local node).
70 Peon. When assigned aa, upon the next boot, the node will become a strict slave if the blade is part of BitScope Cluster. It cannot become master and it cannot change its own configuration parameters. Peon is ignored on a stand-alone blade or an ad-hoc cluster.
60 .. 67 Unlock. An eight byte unlock sequence. When the BMC sees the unlock sequence (from the host) the host channel is enabled. The default string is UnLockMe (case sensitive). It can be set to a different string (of up to 8 characters), set to by-pass unlock (boot unlocked) or made permanently locked so the host cannot unlock its own BMC (for managed clusters).
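The table above can be read as a small decoding rule: ff means unconfigured, 55 enables Power Off and aa enables the No Fan, No SRAM and Peon restrictions. The Python sketch below applies that rule to a set of raw parameter values. The function name and the dictionary layout are hypothetical, and raising an error for an out-of-range Address Override is a choice made for this example (how the BMC itself treats such a value is not stated here).

```python
UNCONFIGURED = 0xFF        # any parameter holding ff is ignored by the BMC
DEFAULT_CLOSE_PIPE = 0x07  # ^G, the default end pipe character


def describe_config(params):
    """Interpret raw configuration bytes given as {address: value}.

    Addresses missing from `params` are treated as unconfigured (ff).
    """
    def get(addr):
        return params.get(addr, UNCONFIGURED)

    override = get(0x7E)
    # Legal Address Override values range from 00 to 77.
    if override != UNCONFIGURED and override > 0x77:
        raise ValueError(f"address override {override:02x} is outside 00..77")

    close_pipe = get(0x73)
    return {
        "power_off_at_boot": get(0x7F) == 0x55,
        "address_override": None if override == UNCONFIGURED else override,
        "close_pipe": DEFAULT_CLOSE_PIPE if close_pipe == UNCONFIGURED else close_pipe,
        "no_fan": get(0x72) == 0xAA,
        "no_sram": get(0x71) == 0xAA,
        "peon": get(0x70) == 0xAA,
    }
```

The fan and unlock parameters are omitted because their value encodings are not described in this table.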

The r and w commands are used to read and write configuration parameters.

For example, to assert power off on boot for a node:

[02]@[7f]s[55]w

or to read the configured node address that may be assigned upon boot:

[02]@[7e]sr

When the required changes have been made, the node BMC must be rebooted for them to take effect. This is achieved by issuing [55]! (the reset command).

Issuing [55]! (reset) when Power Off is configured will power down the node!
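The two examples above follow a regular pattern which can be sketched in Python as below. The helper names are illustrative only. The sketch assumes that every argument is written as two lowercase hex digits in brackets, as in the examples, and it only builds the command strings (sending them and reading the replies is left to the caller).

```python
def hex_arg(value):
    """Format one byte in the bracketed hex form used by the examples."""
    if not 0 <= value <= 0xFF:
        raise ValueError(f"value out of range for one byte: {value}")
    return f"[{value:02x}]"


def write_param(node, addr, value):
    """Build the string that writes `value` to parameter `addr` on `node`."""
    return f"{hex_arg(node)}@{hex_arg(addr)}s{hex_arg(value)}w"


def read_param(node, addr):
    """Build the string that reads parameter `addr` on `node`."""
    return f"{hex_arg(node)}@{hex_arg(addr)}sr"


# The reset command quoted in the text. It powers the node down
# if Power Off is configured, so it is kept as a separate, explicit step.
RESET = "[55]!"
```

With these helpers write_param(0x02, 0x7f, 0x55) gives [02]@[7f]s[55]w and read_param(0x02, 0x7e) gives [02]@[7e]sr, matching the examples.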

A node may be Peon which means it does not have permission to become a cluster manager (Master) or to change its own configuration. This will be the case in a managed cluster because configuration is the exclusive domain of the cluster manager. Unless it has been locked, a node can always read its own configuration so it can know how it has been set up to run. To disable Peon, No SRAM or No Fan the master must change the parameter value to a value other than aa.

If Power Off is configured, a node may only be powered on via the control bus by the cluster manager (i.e. a master node). If that master cannot reach the slave the slave cannot be powered on. This can occur if the master does not know the slave address, or it cannot reach the slave at that address (perhaps due to a misconfigured ad-hoc cluster where an assigned slave address collides with the address of another node in the cluster). To recover, power on the blade stand-alone (i.e. not in a cluster) and repair the erroneous values on any affected nodes via the blade master (which is always at address 7c when a blade is stand-alone).

Questions and Answers

Why isn't the BMC responding?

I've followed the connect and unlock procedures but the master node BMC does not respond.

Why not?

How do I check if it's working?

The recommended way to check the status if unknown is to issue the = command.

If there is no response the BMC is probably locked.

However, there is another possibility.

The BMC may be unlocked but the master node may have a console open to a slave node, in which case the master is talking to that slave. If the slave is not responding for whatever reason (e.g. no system console is enabled or its operating system has crashed) you will receive no response to any commands you send, even though the master BMC is unlocked.

The same thing can happen if you select a non-existent slave node, usually by specifying an incorrect node address with the { or | commands. To escape from these situations send } and/or ^G to escape back to local mode and then issue = to confirm you have succeeded.

If there is no response to any of this, then there may be a serial I/O problem talking to the local BMC or there is a hardware fault on the control bus or BMC.

Have you re-checked the tips explained in the connect section?
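The checks described in this answer can be sketched as a short routine. This is an illustration only: `send` is a hypothetical callable supplied by the caller that writes a string to the BMC serial port and returns whatever text came back (an empty string if nothing did). No particular reply format for = is assumed, only whether there was a reply at all.

```python
def check_bmc(send, close_pipe="\x07"):
    """Classify the BMC state using the = command and the escape characters.

    `send(text)` must return the reply as a string, or "" if there was none.
    `close_pipe` is ^G by default; pass the configured Close Pipe character
    if it has been changed.
    """
    # First try the state command directly.
    if send("="):
        return "responding"

    # No reply: we may be attached to a silent or non-existent slave.
    # Send } and the Close Pipe character to escape back to local mode.
    send("}")
    send(close_pipe)

    if send("="):
        return "recovered"

    # Still silent: the BMC is probably locked, or there is a serial
    # I/O problem or a hardware fault on the control bus or BMC.
    return "no response"
```

A result of "no response" does not distinguish a locked BMC from a serial or hardware fault; the connect and unlock procedures still need to be checked by hand.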

Releases

CB04A020 Release Notes

Second production release.

  1. fixed 115k baud console.

    Previously the remote console feature occasionally dropped characters when a lot of data was sent from the remote host via the bus. This could occur even if the remote host was sending at precisely 115k baud. This problem is fixed and Cluster Blade can now accommodate abnormally high rates. The local host (master) may not support this however so the caveats still apply.

  2. unlock code is changed

    The previous (default) unlock sequence was BitScope. This has been changed to UnLockMe. This is just the factory default; it can be changed to whatever is required.

  3. relocated some API registers

    These changes were made to simplify the API and to accommodate changes made in support of (1). The body of this document has been updated accordingly.

CB04A019 Release Notes

Interim update (not released publicly).

  1. fixed fan idle algorithm.

    Previously, when a fan was idle, a low level "jitter noise" may have been emitted. This was audible for some people (those with particularly good hearing). Whether the noise was emitted depended on cpu load, the type of fan and the angular position of the fan when idle. This has been fixed. When idle the fan is completely silent.

  2. enabled fan when console open.

    Previously the built in fan control was not active when the console was opened. In this case the fan speed would not change until the console was closed. Benign in most cases, it did mean that if one ran a heavy workload via the console, the fan would not increase speed until the console was again closed. While closing the console after issuing a command is normal practice in cluster management, this change means the fan speed now continues to be managed even when the console is open.

  3. enabled fan when master active.

    Similarly to (2) the fan speed is now controlled when a node is being used as the cluster manager. While it's not common to run a heavy workload on a node that is the cluster manager, it is now possible to do so and have the fan respond correctly.

CB04A018 Release Notes

Interim update (not released publicly).

  1. fixed fan control algorithm.

    The fan speed is driven by the node current draw, subject to scaling, and saturates at a point set by the fan control parameters. This worked for 02 (x1) and 03 (/2) but not for 00 (x4) and 01 (x2). In the latter cases, the fan failed to increase speed when current draw went beyond 1A (with 00) or 2A (with 01). While this could be worked around (with manual override) the fact that a node can draw up to 4A meant cooling may be insufficient in high load use-cases. Fixed.

CB04A017 Release Notes

Interim update (not released publicly).

  1. added open console with mute function

    The baud rate specified is now limited to the low nybble. The high nybble specifies how many characters the console must accept from the bus before sending data to the bus.

    This mechanism allows a bus master to "force feed" up to 15 bytes to a slave host (via its console) before the slave host is allowed to send data back to the bus (via its console).

    The canonical use-case is a host vomiting data (via the console) to the bus for which a ^C or ^D is required to stop it, but, because the (simplex) bus is full of console traffic, the master can no longer reliably transmit a character back (over the bus) to the slave (because of bus collisions). Collisions will cause all nodes to drop off the bus but reconnecting to the rogue slave results in the same problem (the slave will vomit onto the bus as soon as the console is reopened).

    By opening the console with mute, the master ensures it can send up to 15 characters to the slave via the bus before the slave has a chance to flood the bus with (more) junk. Upon receipt of the terminator (^C or ^D) the slave host should stop vomiting, allowing normal operation to proceed. If it does not, the slave has gone completely AWOL and nothing the master can do can retrieve it (except avoiding opening the console). In this case, a hard reboot (power cycle) may be the only option to recover (as the problem is within the slave host).
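    The nybble packing described in this item can be sketched in Python as follows. The function name is illustrative. The baud rate codes are those listed in the CB04A016 release notes, and the sketch assumes the input register is loaded with the usual bracketed hex argument immediately before the ~ command.

```python
# Baud rate codes from the CB04A016 release notes (low nybble).
BAUD_CODES = {"115k": 0x0, "9k6": 0x1, "19k2": 0x2, "57k6": 0x3}


def console_command(baud, mute=0):
    """Build an open-console command with an optional mute count.

    `mute` is the number of characters (0..15) the console must accept
    from the bus before the slave host may send data back to the bus.
    """
    if baud not in BAUD_CODES:
        raise ValueError(f"unknown baud rate: {baud}")
    if not 0 <= mute <= 15:
        raise ValueError("mute count must fit in one nybble (0..15)")
    # High nybble carries the mute count, low nybble the baud rate code.
    value = (mute << 4) | BAUD_CODES[baud]
    return f"[{value:02x}]~"
```

    Under these assumptions console_command("57k6", 2) yields [23]~, which opens the console at 57k6 baud and lets the master send two characters (e.g. ^C and a newline) before the slave host may reply.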

  2. removed extraneous LF from / and \.

    This was a bug in CB04A016. Now fixed.

  3. modified pipe to use | and { uniquely

    • When opened with | it is closed with the Close Pipe.

      This is intended for interactive use. The default Close Pipe is ^G. It may be configured to a different character (via Close Pipe). This may be necessary if ^G is used by a slave node (via the console) for some other purpose.

    • When opened with { it is always closed with }.

      This is intended for non-interactive use. Command strings are more readable using this mechanism. The closing character cannot be replaced (it is always }). This means it may not be appropriate for use with a console (if the host uses } for other purposes).

    In both cases, a closing character is returned when the pipe is closed. Closing the pipe causes all attached nodes to detach. On the master the ^G or Close Pipe or } character will be returned (depending on how the pipe was opened and how it's configured). On the slave the ^G or Close Pipe character will be returned. In this case } will never be returned because the slave never opened a pipe.

    Normally there should only be two attached nodes (the master and slave). If there are others (which is an error condition) they will detach when any master closes its pipe or any node attempts to become master. This means the bus is always recoverable. These (extraneous) nodes will return ^G or Close Pipe to their hosts unless they (also) had a pipe opened on the bus with { in which case } will be returned.
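    The rules in this item for which closing character a host sees can be summarised in a few lines of Python. This is a sketch of the rule as described above, not firmware code, and the function name is hypothetical.

```python
DEFAULT_CLOSE_PIPE = "\x07"  # ^G


def returned_close_char(opened_with, close_pipe=DEFAULT_CLOSE_PIPE):
    """Return the character a node hands to its host when the pipe closes.

    `opened_with` is "|" or "{" for a node that opened a pipe, or None for
    a node (such as a slave) that never opened one. `close_pipe` is the
    configured Close Pipe character, ^G unless it has been changed.
    """
    if opened_with == "{":
        # A pipe opened with { always closes with }.
        return "}"
    if opened_with in ("|", None):
        # Interactive pipes and nodes without a pipe see Close Pipe.
        return close_pipe
    raise ValueError(f"not a pipe open character: {opened_with!r}")
```

    A program parsing BMC output can use this to know which character marks the end of a pipe session for the way it opened the pipe.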

CB04A016 Release Notes

First Production Release.

  1. added the host console mechanism.

    The Console allows the master to talk with a slave host via the bus. The console command is ~. The value in the input register selects the baud rate: 00 (115k), 01 (9k6), 02 (19k2) or 03 (57k6). The 115k baud rate does not work reliably for strings longer than a dozen characters (due to bus overrun). This may be fixed in a future revision (with fifos). Baud rates 01, 02 and 03 are recommended until then.

  2. defined unlock default string BitScope.

  3. added echo squash for console use.

    When an open pipe sees the console command '~' it enables squashing. When squashing, characters sent by the pipe are not echoed. The exception is end pipe (an address) which must echo (to end the pipe). This mechanism is added to enable the use of a console.

  4. modified end pipe mechanism.

    The Close Pipe character (^G by default) is no longer sent (or received) over the bus. When received (from the host, on the master) the master puts its own address on the bus instead. All nodes (including the master) detach from the bus upon receipt of an(y) address. The (releasing) master node does not (re)attach (itself) despite its own address appearing on the bus. Any consoles (if running on any slaves) also terminate and detach from the bus. No character received by a console (from its host) can cause the console to terminate. All 7 bit characters (except the Close Pipe) are sent to a slave's host (via the console).

  5. modified so high bit set characters are dropped.

    Only 7 bit ASCII is legal across the bus. High bit set values are reserved for address selection on the bus.

  6. removed Close Pipe command (i.e. * is no longer used or required).

  7. updated pipe open can use | as well as {.

    They are equivalent to each other to open the pipe. They close differently.

  8. removed / \ reply character payload.

    Use state command = to learn state.
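Item 5 above means a host program should not expect bytes with the high bit set to survive a trip across the bus. A minimal Python sketch of the same rule, applied on the host side before sending, is shown below. The function name is illustrative, and filtering on the host is a convenience suggested here rather than something the BMC requires.

```python
def bus_payload(data: bytes) -> bytes:
    """Drop every byte with the high bit set, keeping only 7 bit ASCII.

    Values 0x80..0xff are reserved for address selection on the bus, so
    they would be dropped rather than delivered as data.
    """
    return bytes(b for b in data if b < 0x80)
```

For example, UTF-8 text containing non-ASCII characters would arrive with those characters missing, so it is safer to encode such data (e.g. as hex or base64) before sending it through a pipe or console.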

CB04A015 Release Notes

Second beta release.

  1. added serial unlock mechanism.

    The default unlock word is BitScope.

CB04A014 Release Notes

First beta release.

  1. added peon mode.

    A node may be Peon which means it does not have permission to become a cluster manager (Master) or to change its own configuration. This will be the case in a managed cluster by default because configuration is the exclusive domain of the cluster manager. Asserting Peon Mode in an unmanaged cluster achieves the same result. A node can always read its own configuration so it can know how it has been set up to run. To disable Peon, No SRAM and/or No Fan the master must change the parameter value to something other than aa.

  2. fixed serial error reporting.

    It is sometimes possible for a host to overrun or experience other low level serial communications errors when talking to the BMC. This can occur if the VM protocol is violated by a user operating interactively or by bugs in programs talking to the BMC. Such errors are non-fatal but they need to be reported (for diagnostic purposes). This update fixes the reporting of these errors as documented here.

  3. relocated measurement, cooling and calibration commands.

    These commands were previously implemented as vectors. They now exist as commands which makes them easier to use interactively.

  4. fixed bus recovery mechanism.

    It was previously possible to lose control of the bus when illegal characters were sent to the BMC. This is no longer possible.

  5. added address override mechanism.

    It is now possible to relocate the address of nodes other than by the hardware mechanism built into BitScope Clusters. The new address is stored in FLASH. This allows the creation of ad-hoc clusters comprising more than one Blade without requiring the use of hardware addressing.

  6. protected the reset command.

    The reset command ! now requires a unique vector 55 to be specified before it does anything. This change makes inadvertent use of reset unlikely. Reset remains a benign operation (from the host's point of view) unless boot with power off is enabled (in which case the node will power off - caveat emptor).

  7. added boot with power off feature.

    It is now possible to configure a node to remain powered off when blade power is applied. This makes it possible for a cluster to be configured to automatically power on only those nodes that are enabled to do so when power is applied to the cluster. That is, any "boot with power off" node must be powered on by the cluster manager before use.

Impressum

Online docs.bitscope.com/control-plane
Author BitScope Products <products@bitscope.com>
Copyright © 2020-2023 MetaChip Pty. Ltd. T/A BitScope Designs. All Rights Reserved.
License Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Feedback We welcome your feedback. If you find any errors, omissions or just information that is confusing or incorrect, please email us at feedback@bitscope.com with details.
Permalink docs.bitscope.com/BRL23E7E
Date [2023-01-26 Thu 13:49]
Rev w8tt2du8jl22fgx2