BitScope Control Plane
A cluster control plane is the sub-system that allows a cluster manager (i.e. a computer) to manage operation of the cluster as a whole. Most importantly, it enables the remote management of the power and cooling of each node individually. The control plane in BitScope Clusters also allows real-time monitoring of key system parameters including power supply voltage and current and the current and temperature of every node. It provides out-of-band communication channels to share data between nodes and/or for the manager to access nodes without using the primary network e.g. in the event of network failure or when nodes do not have network access. The control plane can perform graceful power-down, error recovery, remote logging and debugging. It uses the BIOS in the BMC built into Cluster Blade to access nodes down to the firmware, bootloader and kernel level. It supports remote access to system console on each node which is vital for cluster management down to the metal and for systems or kernel layer software development or debugging.
Connect
(open BMC connection)
Each BMC is accessed by its host (i.e. the Raspberry Pi) via the primary serial port. This serial port appears on GPIO 14 & 15 but the signals are (internally) connected to the BMC so you don't need to connect anything to these pins to use the BMC.
The device name used to access the serial port depends on the bootloader or operating system. For Raspberry Pi OS it is usually named /dev/serial0. When using the BMC this serial port must not also be used for other purposes such as the system console or a connected HAT. The BMC is locked by default to prevent accidental usage collisions of this sort.
The serial port on the Raspberry Pi must be enabled (read more).
Do not enable the system console on any host to be used as a cluster manager. If the console is enabled, you may still reach the BMC but characters will be lost and, for practical purposes, the BMC will be unusable from the host (it remains accessible via the control bus).
When the serial port is enabled (and the console disabled) the BMC may be accessed interactively with a terminal program (e.g. GTKTerm or GNU Screen):
- open the serial port at 115200 baud with no flow control,
  $ screen /dev/serial0 115200 # screen example
- unlock the BMC (or disable the lock) and
- start using the BMC.
All the commands described in the remainder of this manual assume you have the terminal open on the node you are using as the cluster manager.
- If no characters are echoed as you type, be sure you have unlocked the BMC correctly.
- If you see garbled or missing characters, check that (A) you have opened the correct serial port /dev/serial0, (B) at the correct baud rate, (C) the system console is not enabled, (D) no other process has also opened the serial port and (E) no HAT or other device is connected to GPIO 14 and 15.
- You may prefer to configure your terminal to perform local <CR> echo upon receipt of <LF> for a prettier display (like many devices, the BMC uses only <LF> characters).
Unlock
(unlock BMC for use)
Upon BMC power-on or reset the serial port to the host is locked. No commands or characters are accepted or echoed when the port is locked. Locking the BMC ensures that if the host's bootloader, firmware, kernel or a user program starts up assuming it has control of the serial port, its use of the port will not collide with communications with the BMC.
Unlocking simply requires a special sequence of characters to be sent to the BMC.
The default unlock sequence is UnLockMe.
In the unlikely event that this default sequence is emitted by a host in a given deployment, it can be changed to a different sequence or the BMC can be locked off altogether.
Locking is configurable in four ways:
TYPE | DESCRIPTION |
---|---|
LOCKED | The node cannot unlock its own BMC. Only the cluster manager can unlock the BMC (via the control bus). This type is typically used in managed or untrusted clusters. |
ANONYMOUS | The receipt of up to 8 characters will unlock the BMC (8 characters is the default). |
PATTERN | The receipt of up to 8 user specified characters will unlock the BMC. |
UNLOCKED | The BMC is automatically unlocked immediately upon boot or reset. |
Unlock is configured using the Unlock configuration parameter. It is an 8 character string. After boot or reset, the host must send the correct character sequence to unlock the BMC. The character codes (in hex) available for use in Unlock are as follows:
CODE | MEANING |
---|---|
01 ~ 7f | A literal character code that must be matched. Used to create a PATTERN sequence. The sequence can be shortened from 8 characters by terminating it with 00. |
ff | The wildcard code which matches any incoming character. Used to create ANONYMOUS or PATTERN sequences that include wildcards. |
80 ~ fe | A blocking code that matches no incoming character. If this code appears anywhere in the sequence the BMC is LOCKED. |
00 | Acceptance code that terminates the matching sequence and accepts. If it is the first character in the sequence the BMC is UNLOCKED upon boot. |
Some example sequences might be:
SEQUENCE | MEANING |
---|---|
42697453636f7065 | The literal string BitScope must be received to unlock the BMC. |
ffffffffffffffff | Any 8 character string will unlock the BMC. |
00ffffffffffffff | The BMC will reboot/reset unlocked. |
ff00ffffffffffff | Any single character will unlock the BMC. |
feffffffffffffff | Nothing can unlock the BMC. |
48656c6c6f2100ff | The literal string Hello! will unlock the BMC. |
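The code rules above can be checked offline before writing a sequence to the BMC. The following Python sketch classifies an Unlock value into one of the four lock types. The function name classify_unlock is hypothetical, and it assumes that codes following the first 00 take no part in matching, which the example sequences suggest but the rules do not state explicitly.

```python
def classify_unlock(hex_sequence: str) -> str:
    """Classify an 8-byte Unlock parameter given as 16 hex digits."""
    codes = bytes.fromhex(hex_sequence)
    if len(codes) != 8:
        raise ValueError("Unlock must be exactly 8 bytes (16 hex digits)")
    # Assumption: only the codes before the first 00 are matched.
    active = codes.split(b"\x00", 1)[0]
    if not active:
        return "UNLOCKED"      # 00 first: unlocked upon boot or reset
    if any(0x80 <= code <= 0xFE for code in active):
        return "LOCKED"        # a blocking code can never be matched
    if all(code == 0xFF for code in active):
        return "ANONYMOUS"     # wildcards only: any characters unlock
    return "PATTERN"           # at least one literal must be matched
```

For instance, classify_unlock("48656c6c6f2100ff") reports PATTERN because the six literal codes for Hello! precede the terminating 00.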
BMC Status
(report BMC status)
The = command reports BMC Status. It is the command most frequently used interactively. It reports the status of the local or a remote BMC. It returns:
ID MS XX YY ZZ
where
FIELD | VALUES | MEANING |
---|---|---|
ID | 00 to 7f | Node address in the cluster. Added in CB04A016. |
MS | 00 or ff | Node status (00 master, ff slave). Added in CB04A016. |
XX | 0 1 2 | Power State (defined here) |
YY | U8 | Current Draw (8 bit copy of the 04M current measurement defined here) |
ZZ | U8 | Fan Speed as calculated (defined here). |
This command is always available. It is useful to quickly discover whether the BMC is working locally or remotely. When used remotely, it reports which remote node is attached. In all cases it reports the prevailing processing load via the proxies of current draw and fan speed.
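When scripting against the BMC, the status reply is usually the first thing to decode. The Python sketch below splits a reply into the five fields listed above. It assumes the fields arrive as whitespace-separated hex tokens on one line, which this manual does not state explicitly. The names BmcStatus and parse_status are hypothetical.

```python
from dataclasses import dataclass

POWER_STATES = {0: "OFF", 1: "ENABLED", 2: "DISABLED"}


@dataclass
class BmcStatus:
    node_id: int        # ID: node address, 00 to 7f
    is_master: bool     # MS: 00 master, ff slave
    power_state: str    # XX: power state name
    current_draw: int   # YY: 8 bit current draw
    fan_speed: int      # ZZ: calculated fan speed


def parse_status(reply: str) -> BmcStatus:
    """Decode an 'ID MS XX YY ZZ' reply (assumed space-separated hex)."""
    fields = reply.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    node_id, ms, power, current, fan = (int(field, 16) for field in fields)
    if ms not in (0x00, 0xFF):
        raise ValueError(f"unexpected MS value {ms:02x}")
    return BmcStatus(node_id, ms == 0x00, POWER_STATES[power], current, fan)
```

A reply such as "20 ff 1 4d 80" would then decode as slave node 20, power ENABLED, current draw 4d and fan speed 80 (all hex).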
Command Pipe
(become cluster master)
The Command Pipe is the mechanism by which a node becomes the bus master. When active, the pipe passes commands from the master (the cluster manager) to a slave (a managed node, via the control bus). Replies are returned from the slave back to the master. The pipe may be opened for interactive or non-interactive use. Either method achieves the same result: the master connects with a slave over the bus and all commands are thereafter executed on the slave.
The interactive pipe commands are:
CMD | ACTION |
---|---|
| | Open Pipe to forward commands to the bus and receive replies from the bus. The local BMC is not accessible when the pipe is open. |
^G | Close Pipe to return control to the local BMC. Only meaningful when the pipe was opened with | and is ignored otherwise. This character may be configured to be something other than ^G. |
The non-interactive pipe commands are:
CMD | ACTION |
---|---|
{ | Open Pipe to forward commands to the bus and receive replies from the bus. The local BMC is not accessible when the pipe is open. |
} | Close Pipe to return control to the local BMC. Only meaningful when the pipe was opened with { and is ignored otherwise. |
Talking to the bus on its own is not very useful. Normally, when opening the pipe a slave address is specified. For example, to open the slave at address 32 (hex 20) the command sequence is [20]|. The addressed slave will attach to the bus to become the target for all commands issued by the master until the pipe is closed.
The addressed slave must exist. To confirm that the pipe has been successfully opened and the addressed slave is responding, issue the # command. The slave will respond with its UUID, which will be different to the master UUID. The status command = may also be used to determine the address of the attached slave and the fact that it is a slave.
Once connection is established all commands, except ^G (or } if opened with { or * if using the beta edition), issued by the master are received by the slave and all replies from the slave, including reply data, are returned to the master. In this way the master has full control of the slave until it sends ^G or } or * to close the pipe.
If the selected slave does not exist the pipe will still connect to the bus. Any command then issued (unless the console is opened with ~) will be echoed but no command replies will appear. For example the # command will not respond with a UUID. However, the echoed # command itself confirms the pipe is open and the bus is working (this is effectively a loopback test of the bus).
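The pipe sequences described above are simple strings, so a cluster manager script can assemble them with a few helpers. This is a minimal Python sketch: the function names are hypothetical, ^G is sent as ASCII 0x07 (its default value), and the 00 to 7f address range is taken from the ID field of the status command.

```python
CLOSE_INTERACTIVE = "\x07"  # ^G (ASCII BEL), the default close character


def open_pipe(address: int, interactive: bool = True) -> str:
    """Return the sequence that opens a pipe to the slave at `address`."""
    if not 0x00 <= address <= 0x7F:
        raise ValueError("node address must be in the range 00..7f")
    return f"[{address:02x}]" + ("|" if interactive else "{")


def close_pipe(interactive: bool = True) -> str:
    """Return the character that closes a pipe opened the same way."""
    return CLOSE_INTERACTIVE if interactive else "}"


def remote_command(address: int, command: str) -> str:
    """Wrap one command in a non-interactive pipe to `address`."""
    return open_pipe(address, False) + command + close_pipe(False)
```

For example, remote_command(0x20, "=") produces [20]{=} which requests the status of the slave at address 32 and then releases the bus.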
Power Control
(power up/down nodes)
The primary function of a cluster control plane is to manage node power. The / and \ commands are available to do this. These commands are ignored if sent to the local BMC. They are only useful to manage power on a slave node issued from a master via the control bus using the command pipe bus mastership mechanism.
CMD | COMMAND | MEANING |
---|---|---|
/ | POWERON | Turn power on (starts node) |
\ | POWEROFF | Turn power off (hard stop) |
These commands update the power state:
TOK | STATE | MEANING |
---|---|---|
0 | OFF | Power is OFF (host may be disabled or unavailable) |
1 | ENABLED | Power is ON (host is available and power has been applied) |
2 | DISABLED | Power is ON but power has been manually disabled |
The prevailing power state is reported by the status command =.
Upon receipt of the POWEROFF command \ the slave immediately powers off (state 0).
The \ command is a hard power down. If the slave node being powered off needs to perform housekeeping before power is removed, it's up to higher layer protocols (used to manage the cluster) to ensure this happens. If the cluster nodes do not have other means to communicate (to do this), the BMC provides a mailbox to enable protocols of this type to be built. Alternatively, the console (if enabled on the slave) can be used. The BMC does not otherwise know or define how to perform a graceful power down.
When the / command is issued, the resulting state may be 1 or 2. If 1, the node has successfully powered on. If 2, the power has been enabled but the node will not power on because the power override shunt has been applied (i.e. it's physically connected to the node) or the node may be unoccupied by a host.
The override shunt is intended as a mechanism to clamp a node from powering on regardless of the BMC. It's useful when installing or commissioning new systems or when the cluster control plane is not being used.
If using a beta edition, the / and \ commands echo the next state 1 and 0. In production editions the state is not reported. Use the status command = to determine the state instead.
Node Cooling
(manage node cooling)
The Fan F command is used to monitor and control the cooling fan. An opcode is specified upon issuing F to select whether to report or update the fan parameters.
OPCODE | FUNCTION |
---|---|
00 | Report Fan Parameters |
01 | Update and Report Fan Parameters |
There are three fan parameters specified via parameter registers:
FIELD | PARAMETER | REG | TYPE | MEANING |
---|---|---|---|---|
XX | OFFSET | vmFanLo | Q8 | Lower bound to fan speed. |
YY | LIMIT | vmFanHi | Q8 | Upper bound to fan speed. |
ZZ | SCALE | vmFanGain | U8 | Scale applied to current measurement to calculate the fan speed. |
Upon execution of F the command returns:
XX YY ZZ VV
It does this regardless of which opcode is selected. The VV field is the calculated Q8 fan speed value. If opcode [01]F is used the values in the registers are used to update the parameters prior to returning them. The reported VV is the value prevailing before the parameters changed. Use [00]F a short time later to see the updated VV value (it takes a short time to update).
BMC fan control operates by measuring the current draw of the node and calculating a fan speed required to dissipate the heat generated. This is the same current measurement as reported by the = command.
The SCALE parameter applies a scale FACTOR (specified via TOKEN) to scale the current value applied to the fan speed algorithm:
TOKEN | FACTOR |
---|---|
00 | 4 |
01 | 2 |
02 | 1 |
03 | 1/2 |
This scaled value is then added to OFFSET and limited to LIMIT before being applied to control the fan.
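The calculation described above amounts to scaling the current, adding OFFSET and clamping at LIMIT. The Python sketch below illustrates that arithmetic only. It assumes all quantities are plain 8 bit integers and that fractional results are truncated. The firmware's actual rounding and number format (Q8) are not specified here, and the function name fan_speed is hypothetical.

```python
# SCALE token to FACTOR, as tabulated above.
SCALE_FACTORS = {0x00: 4.0, 0x01: 2.0, 0x02: 1.0, 0x03: 0.5}


def fan_speed(current: int, offset: int, limit: int, scale_token: int) -> int:
    """Estimate the fan speed value from the 8 bit current measurement."""
    # Assumption: the scaled current is truncated to an integer.
    scaled = int(current * SCALE_FACTORS[scale_token])
    # Add OFFSET (lower bound), then clamp to LIMIT (upper bound).
    return min(offset + scaled, limit)
```

With OFFSET 10 (hex), LIMIT ff and SCALE token 01, a current reading of 20 (hex) gives 0x10 + 2 * 0x20 = 0x50.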
Console Port
(open a console)
The Console command ~ allows a master to communicate with a slave host via the slave's console port over the control bus. This enables full remote access to the host at the firmware, bootloader, kernel and user layer. This is ideal for system administration purposes or software development at the firmware, kernel, system or user level on any host on the cluster. To use the console, the master opens a pipe (to a selected slave) and then opens the console (to the selected slave's host). For example:
[7e]|[03]~
which opens a pipe to node 7e and then the console (on node 7e) at a baud rate of 57k6.
From that point on, all characters sent by the master are forwarded to the slave's host via its console and all characters sent from the slave host via the console are returned to the master. The console is terminated upon closing the pipe. The available baud rates are:
CMD | DESCRIPTION |
---|---|
[00]~ | Open console at 115,200 |
[01]~ | Open console at 9,600 |
[02]~ | Open console at 19,200 |
[03]~ | Open console at 57,600 |
Each command opens the console for bi-directional traffic immediately at the selected baud rate.
The console supports all standard 7-bit ASCII characters. That is, characters with code values from 0x01 to 0x7f. The NULL character 0x00 and extended ASCII or Unicode characters are ignored.
Console Baud Rates
Baud rates up to 115,200 are supported. However, the console port on most hosts including Raspberry Pi is not subject to flow control. This means that if a large number of characters are sent from the host back-to-back via the console over the bus, overrun (i.e. missed characters) may occur. Whether this happens depends on what you're talking to (e.g. a bootloader, kernel or user process) on the remote host and what its actual baud rate is.
Some hosts in some modes may transmit at a slightly higher rate than 115k baud (it depends on how the baud rate is generated). While Cluster Blade is designed to accommodate these abnormally higher rates, the Raspberry Pi you're using as the master node may not. If you are experiencing problems we recommend configuring your remote host to use one of the lower supported baud rates. For example, at 57k6 overrun is impossible. Note: when using an external Bus Controller, this problem will not occur (the controller is fast enough).
Open Console Mute
CB04A020
It is possible to open the console as mute. This means the console will wait to receive a certain number of characters before it sends any characters back to the master. This is useful for error recovery when the remote host has "gone rogue".
The command to do this is the same as opening the console but with an additional specifier that tells the console how many characters to receive before enabling transmit. The high digit of the baud rate specifier is used for this as follows:
CMD | DESCRIPTION |
---|---|
[20]~ | Open console at 115,200, start sending after receiving 2 characters |
[11]~ | Open console at 9,600, start sending after receiving 1 character |
[83]~ | Open console at 57,600, start sending after receiving 8 characters |
Up to 15 "mute echo characters" can be sent this way. This is usually more than enough to recover most bootloaders and operating systems.
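The console specifier packs two values into one byte: the mute count in the high digit and the baud rate selector in the low digit. The following Python sketch builds the command from those two values. The function name console_command is hypothetical, and selector 01 is assumed to mean 9,600 baud, as listed for the ~ command in the Commands table (9k6).

```python
# Low digit of the specifier: baud rate selector.
BAUD_SELECTORS = {115200: 0x0, 9600: 0x1, 19200: 0x2, 57600: 0x3}


def console_command(baud: int, mute_chars: int = 0) -> str:
    """Return the ~ command for `baud`, muted for `mute_chars` characters."""
    if baud not in BAUD_SELECTORS:
        raise ValueError(f"unsupported baud rate: {baud}")
    if not 0 <= mute_chars <= 15:
        raise ValueError("mute count must be in the range 0..15")
    # High digit: characters to receive before transmit is enabled.
    return f"[{mute_chars:x}{BAUD_SELECTORS[baud]:x}]~"
```

For example, console_command(57600, 8) yields [83]~ and console_command(115200) yields the plain [00]~ form.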
Remote Host Error Recovery
When the console is open, a runaway process or misbehaving command (on the remote host) may send unlimited and arbitrary data back to the master. When used interactively the usual way to resolve this is to issue an interrupt ^C or termination ^D character to the runaway process to stop it. A problem may arise if the remote host sends so much data so fast that the master cannot get a termination character sent across the bus to stop it. The solution is simple.
End the pipe (which closes the console), reconnect and reopen the console mute. At this point the console is open but the remote host cannot send any characters. Send the necessary termination (e.g. ^C) and then resume normal operation.
Nuclear Option.
It may be that the remote host has gone completely rogue and will not respond to any termination characters sent to it. In this case you still have the Nuclear Option which is to (A) close the pipe (B) reconnect to the slave and (C) power down or repower the host to (hard) reboot it or return to the bootloader (if it's been configured on that host).
Only recommended if you've completely lost control (hard reboots can corrupt disk images).
Programming Interface
CB04A016
The BitScope Control Plane is an extension of the BitScope I/O System. It is similar to the BitScope Virtual Machine but designed to manage clusters, not general purpose test and measurement. Like all BitScope VMs, it uses a set of registers and commands accessed via a simple serial protocol.
Commands
CB04A016
CMD | GROUP | ACTION |
---|---|---|
? | ID | Print TID (type identifier) aka the firmware revision. Use this command to determine the BMC revision (e.g. CB04A016). Use the revision to determine which other commands are available in the BMC. |
# | ID | Print UUID (universally unique identifier). Use UUID as a key to access information (e.g. in a database describing a particular cluster deployment) for each physical node. |
[ | Entry | Commence data entry. Clears vmInput. This command is optional but it is shown used in this document for clarity. |
] | Entry | Conclude data entry. May push vmInput onto the stack. This command is optional but it is shown used in this document for clarity. |
0..9 | Entry | Increment vmInput by the digit specified and left-shift. |
a..f | Entry | Increment vmInput by the hex digit specified and left-shift. |
@ | Register | Set Address Register vmPointer. |
+ | Register | Increment Value Indirect via vmPointer. |
- | Register | Decrement Value Indirect via vmPointer. |
s | Register | Store the value in vmInput to register vmPointer. |
p | Register | Print register at vmPointer. |
n | Register | Increment vmPointer. |
z | Register | Store the value in vmInput to register vmPointer and post-increment. |
. | Execute | End of Sentence (EOS). Context released. NOP. |
! | Execute | Soft reset and terminate active operation. Requires vmInput to have the value 55 (to avoid accidental execution). |
{ | Bus | Open a non-interactive pipe on the control bus to communicate with a (slave) node. The selected node is specified via vmInput prior to issuing this command. |
} | Bus | Close an open non-interactive pipe. The selected slave detaches from the bus. |
| | Bus | Open an interactive pipe on the control bus to communicate with a (slave) node. The node is selected via vmInput. Command added in CB04A016. |
^G | Bus | Close an open interactive pipe. The selected slave detaches from the bus. If the slave's console is open, it will be closed. The character (^G) can be configured to be a different character (e.g. ^D). Command added in CB04A016. Use the * in earlier editions. |
~ | Console | Open the console on a slave at a baud rate specified by vmInput (00 115k, 01 9k6, 02 19k2, 03 57k6). The master must first open a pipe to the slave. The console is closed when the pipe (on the master) is closed. Command was added in CB04A016. |
/ | Power | Turn power on (which starts node boot). |
\ | Power | Turn power off (hard stop). |
= | Power | Request status (what are you doing?). |
R | Memory | Read SRAM. |
W | Memory | Write SRAM. |
S | Memory | Dump SRAM. |
r | Memory | Read from EEPROM. |
w | Memory | Write to EEPROM. |
M | Module | Read an A/D channel and return as U16. |
F | Module | Read and/or Update FAN Control. |
C | Module | Machine Calibration Coefficients. Use the C command in beta editions. |
` | Comms | Mailbox Exchange. |
Registers
CB04A020
REGISTER | ADDR | SIZE | DESCRIPTION |
---|---|---|---|
vmInput | 00 | 1 | Input Register |
vmPointer | 01 | 1 | Address Register (VMR) |
vmStore | 02 | 1 | Address Register (EEPROM) |
vmAddress | 08 | 2 | Address Register (RAM) |
vmCount | 0a | 2 | Dump Size (RAM) |
vmFanLo | 10 | 1 | Fan Low Speed Limit |
vmFanHi | 11 | 1 | Fan High Speed Limit |
vmFanGain | 12 | 1 | Fan Gain Factor |
vmSlave | 1c | 1 | Slave Status. |
vmError | 1d | 1 | Error Status. |
vmStation | 1e | 1 | Logical Node Address (live). |
vmIdent | 1f | 1 | Physical Node Address (Hardware assigned). |
Station
vmStation
vmIdent
Each node has a unique (geographical) location (within a cluster) at which it may be accessed. This location is referred to as the node's station (or ID in shorthand). Its default value is assigned in hardware. Literally, it is the address of the physical location of the node in the cluster. Its value may be modified by a configuration setting under certain circumstances.
Two registers report the ID for each node's BMC:
REGISTER | ADDR | SIZE | DESCRIPTION |
---|---|---|---|
vmStation | 1e | 1 | Logical Node ID (Live). |
vmIdent | 1f | 1 | Physical Node ID (Hardware assigned). |
The vmStation is assigned the value of vmIdent at boot time unless (re)configured:
ADDR | PARAMETER |
---|---|
7e | Location Override. When assigned, its value will be used to set the node ID in the cluster at boot time if the blade is stand-alone or in an ad-hoc cluster. Legal values range from 00 to 77 (120 nodes). If the blade is located in a BitScope Cluster the physical location of the blade in the cluster will override this. |
In this case the vmStation will be assigned the value of the configuration variable 7e, unless the blade is mounted in a BitScope Cluster, in which case the vmIdent value applies and cannot be overridden.
The BMC detects whether the Blade is part of a BitScope Cluster by checking the node ID. When a Blade is not part of a cluster, the ID for each node is virtual and assigned the following defaults:
N | ID | DESCRIPTION |
---|---|---|
1 | 7c | First and (default) master node. HDMI and USB are accessible from this node. |
2 | 7d | Second node on the Blade. |
3 | 7e | Third node on the Blade. |
4 | 7f | Fourth node on the Blade. |
Where N is an arbitrary node number within a cluster and ID is the station, i.e. the node's BMC address on the control plane. When a Blade is part of a cluster the ID will be automatically set to be different to these defaults. The ID for each node depends on the physical cluster and is immutable based on how the hardware has been commissioned.
When the BMC detects a node with one of the default IDs listed above, it knows the blade is operating stand-alone. In this case, the default ID may be used as is and this is recommended.
When a blade is part of an ad-hoc cluster (i.e. a custom cluster built without BitScope's cluster infrastructure) which comprises more than one blade, the IDs of the nodes in the blades must be configured to be different to each other to avoid geographic address (i.e. ID) collisions on the bus. If you don't do this, all blades will locate their nodes at the same four addresses on the bus: 7c, 7d, 7e and 7f. This of course results in collisions.
This is an ad-hoc cluster commissioning responsibility which should be undertaken, one blade at a time, before the blades are interconnected into a single cluster. This need only be done once and only when building ad-hoc clusters.
Errors
vmError
There is a range of potential error conditions. The errors that are detected are:
ERRNO | VALUE | MEANING |
---|---|---|
0 | 00 | No Error. Situation Normal. |
-1 | ff | Host frame error (Break). |
-2 | fe | Bus frame error (Break). |
-3 | fd | Host channel overrun. |
-4 | fc | Host pipe overrun. |
-5 | fb | Bus pipe overrun. |
-6 | fa | Bus channel overrun. |
Any error is reported via the vmError register.
REGISTER | ADDR | SIZE | DESCRIPTION |
---|---|---|---|
vmError | 1d | 1 | Error Status. |
The vmError value must be manually reset to 00 if you seek to detect whether any error has occurred. However, none of these errors are serious so all can be ignored (simply retry the operation if it did not succeed the first time). Note: errors prior to the one reported may have occurred. However, such errors are undetectable (only the most recent error is reported).
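The VALUE column is the two's complement 8 bit encoding of ERRNO, so a vmError reading can be decoded mechanically. A minimal Python sketch follows (the function name decode_error is hypothetical, and values not listed in the table are reported as unknown):

```python
ERRORS = {
    0x00: "No Error",
    0xFF: "Host frame error (Break)",
    0xFE: "Bus frame error (Break)",
    0xFD: "Host channel overrun",
    0xFC: "Host pipe overrun",
    0xFB: "Bus pipe overrun",
    0xFA: "Bus channel overrun",
}


def decode_error(value: int):
    """Return (errno, meaning) for an 8 bit vmError register value."""
    if not 0x00 <= value <= 0xFF:
        raise ValueError("vmError is a single byte")
    # Interpret the byte as a signed 8 bit number to recover ERRNO.
    errno = value - 0x100 if value >= 0x80 else value
    return errno, ERRORS.get(value, "Unknown")
```

For example, a register value of fd decodes to ERRNO -3, a host channel overrun.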
Reset
!
The BMC can be reset. This is not recommended in normal usage but it can be useful to reload configuration parameters (perhaps after changing them) or to recover normal operation when uncertain as to the state of the BMC. The ! command activates reset. However, as a safety precaution, it will only fire if the value 55 is loaded to the vmInput register (i.e. issue [55]! as a command sequence).
If the Power Off configuration parameter is set, issuing [55]! will power-off the node.
BMC Modules
CB04A016
Modules are sub-systems of the BMC. They are accessed and executed via one or more commands acting on zero or more registers. The commands and registers used depend on which module is being used. The modules implemented in BMC CB04A016 (and later editions) are:
MODULE | COMMANDS | DESCRIPTION |
---|---|---|
METER | M | A/D Module (Read any A/D channel and return as U16) |
MAILBOX | ` | Token Exchange. Used to implement power control protocols between the cluster manager and a cluster node. |
SRAM | R W S | Shared Memory. Use is application specific. |
EEPROM | r w | Persistent Memory. Used for Device Configuration and Boot-Time program execution. |
CAL | C | CAL Memory (Machine Calibration Coefficients) |
Measurements
M
The Meter M command is used to measure voltages, currents and temperature. An opcode is specified upon issuing M to select which signal to measure. The CODE is returned as an unsigned Q16 with the following meanings:
CMD | SIGNAL | UNIT | RANGE | CODE | VALUE |
---|---|---|---|---|---|
[00]M | Ground (VDD) | V | 4.096 | 0040 | 4 mV |
[01]M | RAW Current | A | 4.096 | 2100 | 528 mA |
[02]M | RAW Voltage | V | 45.056 | 8740 | 23.77 V |
[03]M | VRef 2.048V | V | 4.096 | 80c0 | 2.060 V |
[04]M | Node Current | A | 4.096 | 2680 | 616 mA |
[05]M | Temperature | T | ? | 5b40 | ? |
The RANGE column reports the full scale value of the measurement. The VALUE column reports some typical example measurements.
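The example rows suggest that a CODE converts to engineering units as CODE / 65536 × RANGE. The Python sketch below applies that conversion. This is an inference from the tabulated examples rather than a documented formula: it reproduces the 4 mV, 528 mA, 2.060 V and 616 mA rows exactly, while the RAW Voltage example evaluates to about 23.80 V rather than the 23.77 V shown. Temperature is omitted because its range is not given. The function name meter_value is hypothetical.

```python
# Full scale RANGE for each M opcode (volts or amps), from the table above.
RANGES = {0x00: 4.096, 0x01: 4.096, 0x02: 45.056, 0x03: 4.096, 0x04: 4.096}


def meter_value(opcode: int, code_hex: str) -> float:
    """Convert the hex CODE returned by [opcode]M to volts or amps."""
    code = int(code_hex, 16)
    if not 0 <= code <= 0xFFFF:
        raise ValueError("CODE must be a 16 bit value")
    # Assumption: CODE is an unsigned fraction of full scale (Q16).
    return code / 65536 * RANGES[opcode]
```

For example, meter_value(0x04, "2680") evaluates to approximately 0.616, i.e. a node current of 616 mA.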
MailBox
`
The Mailbox is a single byte exchange.
With appropriate software running on the master and slave it can be used to create graceful runtime management protocols for use with the cluster control plane.
The Mailbox provides atomic byte exchange between a slave and master across the control bus. When a master has opened a pipe and connected with a slave, this command allows it to send a byte to the slave and receive the most recent byte issued by the slave (using the same command).
On either the slave or master, typical usage is to issue XX` to send XX into the exchange. The exchange atomically replies YY as the reply data to the ` command. YY is the most recent value issued by the other side of the exchange.
The mailbox command ` is idempotent so it may be issued multiple times. Subsequent issuance of the command is used simply to return the state. It would normally be used with a polling mechanism on each side of the mailbox. For example, to implement a management protocol analogous to UNIX Init, a set of master "mailbox tokens" could be defined:
TOK | COMMAND | MEANING |
---|---|---|
0 | HALT | Halt (shut down gracefully) |
1 | SYSTEM | Boot into Single User Mode. |
2 | RUNLEVEL2 | Boot to runlevel 2. |
3 | RUNLEVEL3 | Boot to runlevel 3. |
4 | RUNLEVEL4 | Boot to runlevel 4. |
5 | RUNLEVEL5 | Boot to runlevel 5. |
6 | REBOOT | Reboot. |
10 | STATUS | Report current status. |
These would be issued by the cluster manager (master) and received by the node (slave). A set of matching slave state tokens could be defined as:
TOK | STATE | MEANING |
---|---|---|
0 | OFF | Power is OFF (host may be disabled or unavailable, unknowable) |
1 | ENABLED | Power is ON, host is available and power has been applied (nothing else is known) |
2 | DISABLED | Power is ON but host is disabled or host is unavailable |
3 | STOPPING | Power is ON and one (or more) request(s) to power down have been received |
4 | STOPPED | Power is ON and the host has acknowledged the power down |
5 | RUNNING | Power is ON and host has reported it's running (more detail as yet unknown) |
6 | ERROR | Error or unknown state. |
11 | SYSTEM | Single User Mode. |
12 | RUNLEVEL2 | Runlevel 2. |
13 | RUNLEVEL3 | Runlevel 3. |
14 | RUNLEVEL4 | Runlevel 4. |
15 | RUNLEVEL5 | Runlevel 5. |
The master can issue STATUS at any time to ask the slave for its state. When the state is STOPPED the master knows it is safe to POWEROFF the node. If the state never reaches STOPPED (either because the host has crashed or the host is not running software to report this to the BMC) after a (master defined) timeout, the master can POWEROFF the node.
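The master side of such a protocol might look like the Python sketch below. It is illustrative only: exchange and poweroff are hypothetical callables supplied by the caller (exchange(token) performs one mailbox ` exchange and returns the slave's most recent byte, poweroff() issues the \ command), and the token values follow the example tables above, which are themselves only a suggested convention. Because the mailbox command is idempotent, the sketch simply re-issues HALT while polling.

```python
import time

HALT = 0x00      # master token: shut down gracefully
STOPPED = 0x04   # slave state: host has acknowledged the power down


def graceful_poweroff(exchange, poweroff, timeout_s=30.0, poll_s=0.5,
                      sleep=time.sleep, clock=time.monotonic):
    """Ask a slave to halt, then power it off.

    Returns True if the slave reported STOPPED before the timeout,
    False if the master had to power off after the timeout expired.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        # Re-issuing HALT is harmless and returns the slave's state.
        if exchange(HALT) == STOPPED:
            poweroff()
            return True
        sleep(poll_s)
    # Host crashed or runs no mailbox software: hard power off anyway.
    poweroff()
    return False
```

The sleep and clock parameters exist only so the loop can be exercised without real delays. In normal use the defaults apply.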
SRAM
R
W
S
The SRAM is a region of memory shared between a node and the cluster manager.
A master can use it to communicate arbitrary data with a slave. In this case the mailbox can serve as a semaphore mechanism. Alternatively, a slave may use SRAM to record information which the master may interrogate later. For example, a console debugging or logging port on the slave.
How SRAM is used is not defined in this document. How it is read and written is defined here.
The R, W and S commands are used to read, write and dump SRAM:
OP | COMMAND | DESCRIPTION |
---|---|---|
READ | [08]@[YY]z[XX]sR | Returns the value at address XXYY |
 | [08]@ | Point to LSB of address. |
 | +R | Returns the value at the next address XXYY + 1. Increments the LSB only. |
WRITE | [08]@[YY]z[XX]s[ZZ]W | Write the value ZZ to the address XXYY |
 | [08]@ | Point to LSB of address. |
 | [ZZ]+W | Write the value ZZ to the next address XXYY + 1 |
DUMP | [08]@[YY]z[XX]s[10]z[00]sS | Dump 16 values starting at address XXYY. |
 | S | Dump the next 16 values (starting at XXYY) |
The + version of the R and W commands can be used in sequence to read or write successive values in SRAM. The - may be used to decrement through SRAM if preferred.
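The READ and WRITE sequences above can be generated from a 16 bit address by splitting it into its MSB (XX) and LSB (YY). The following is a minimal Python sketch. The function names are hypothetical, and hex digits are emitted in lower case to match the a..f entry commands.

```python
def _split(address: int):
    """Split a 16 bit SRAM address into (XX, YY) = (MSB, LSB)."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("SRAM address must fit in 16 bits")
    return divmod(address, 0x100)


def sram_read(address: int) -> str:
    """Return the sequence that reads the byte at `address`."""
    msb, lsb = _split(address)
    return f"[08]@[{lsb:02x}]z[{msb:02x}]sR"


def sram_write(address: int, value: int) -> str:
    """Return the sequence that writes `value` to `address`."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must be a single byte")
    msb, lsb = _split(address)
    return f"[08]@[{lsb:02x}]z[{msb:02x}]s[{value:02x}]W"
```

For example, sram_read(0x01a4) produces [08]@[a4]z[01]sR, which loads the LSB into register 08, the MSB into register 09 and then reads.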
EEPROM
r
w
The EEPROM is used to store persistent state.
How it is used is defined in Configuration. How it is read and written is defined here.
The r and w commands are used to read and write EEPROM:
OP | COMMAND | DESCRIPTION |
---|---|---|
READ | [02]@[XX]sr | Returns the value at address XX |
 | +r | Returns the value at the next address XX + 1 |
WRITE | [02]@[XX]s[YY]w | Write the value YY to the address XX |
 | [ZZ]+w | Write the value ZZ to the next address XX + 1 |
The + version of these commands can be used in sequence to read or write successive values in EEPROM. The - may be used to decrement through EEPROM if preferred.
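As with SRAM, the EEPROM sequences are easy to generate in a script. The Python sketch below builds the READ and WRITE forms shown above for a single byte address. The function names are hypothetical.

```python
def _byte(value: int, what: str) -> int:
    """Validate that `value` fits in a single byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError(f"{what} must be a single byte")
    return value


def eeprom_read(address: int) -> str:
    """Return the sequence that reads the EEPROM byte at `address`."""
    return f"[02]@[{_byte(address, 'address'):02x}]sr"


def eeprom_write(address: int, value: int) -> str:
    """Return the sequence that writes `value` to EEPROM `address`."""
    return (f"[02]@[{_byte(address, 'address'):02x}]s"
            f"[{_byte(value, 'value'):02x}]w")
```

For example, eeprom_write(0x7f, 0x55) produces [02]@[7f]s[55]w, which would set the Power Off configuration parameter at address 7f to the value 55.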
Coefficients
C
The Coefficients C command is used to read the BMC calibration coefficients. An address precedes the issuing of the C command to select which coefficient to read:
CMD | COEF | PURPOSE |
---|---|---|
[00]C |
MUI0 |
Unique Identifier |
:: | :: | |
[08]C |
MUI8 |
|
[13]C |
TSRL2 |
Temperature indicator ADC reading at 90°C (low range setting) |
[16]C |
TSHR2 |
Temperature indicator ADC reading at 90°C (high range setting) |
[18]C |
FVRA1X |
ADC FVR1 Output voltage for 1x setting (in mV) |
[19]C |
FVRA2X |
ADC FVR1 Output voltage for 2x setting (in mV) |
[1a]C |
FVRA4X |
ADC FVR1 Output voltage for 4x setting (in mV) |
[1b]C |
FVRC1X |
Comparator FVR2 output voltage for 1x setting (in mV) |
[1c]C |
FVRC2X |
Comparator FVR2 output voltage for 2x setting (in mV) |
[1d]C |
FVRC4X |
Comparator FVR2 output voltage for 4x setting (in mV) |
The coefficients are useful to calibrate M results for voltage, current and temperature
measurements to better than default precision. This is not required for normal operation.
A separate application note will be linked here with details about how to do this when
available.
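For scripted use, the coefficient table can be held as a name-to-address lookup. This is only a sketch: the dictionary and function names are hypothetical, and it assumes that the `::` row in the table means MUI1 to MUI7 occupy addresses 01 to 07. It builds the command string only and does not interpret the reply.

```python
# Coefficient names and addresses copied from the table above.
# MUI1..MUI7 at 01..07 are assumed from the ":: " continuation row.
COEFFICIENTS = {
    **{f"MUI{i}": i for i in range(9)},
    "TSRL2": 0x13,
    "TSHR2": 0x16,
    "FVRA1X": 0x18,
    "FVRA2X": 0x19,
    "FVRA4X": 0x1A,
    "FVRC1X": 0x1B,
    "FVRC2X": 0x1C,
    "FVRC4X": 0x1D,
}


def coefficient_command(name: str) -> str:
    """Return the C command string that selects the named coefficient."""
    try:
        address = COEFFICIENTS[name]
    except KeyError:
        raise ValueError(f"unknown coefficient: {name}") from None
    return f"[{address:02x}]C"


if __name__ == "__main__":
    for name in ("MUI0", "FVRA1X", "FVRC4X"):
        print(name, coefficient_command(name))
```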
BMC Configuration
CB04A016
The EEPROM maintains persistent configuration parameters.
It is recommended that configuration changes be made when the node is powered down.
The configuration defines how the cluster operates at boot time and how it continues to
operate in the event that it receives no communications from the host or (via the control
bus) from the cluster manager. Any configuration parameter that has a value ff is
unconfigured and ignored. If a parameter has a different value it will normally be used at
boot time only to modify the operation of the BMC.
ADDR | PARAMETER |
---|---|
7f | Power Off. When assigned a value 55 the node will boot with the power disabled. Note: it is not possible to configure the default master (address 7c) to boot with power disabled (doing so would risk losing control of the blade on which the node is located if all the other nodes were also set to power off at boot and the user did not have a cluster manager on the bus). |
7e | Address Override. When assigned, its value will be used to set the node ID in the cluster at boot time if the blade is stand-alone or in an ad-hoc cluster. Legal values range from 00 to 77 (120 nodes). If the blade is located in a BitScope Cluster the physical address of the blade in the cluster will override this. |
7c | Fan Scale. Overrides the factory default value of the SCALE parameter in FAN Control. |
7b | Fan Limit. Overrides the factory default value of the LIMIT parameter in FAN Control. |
7a | Fan Offset. Overrides the factory default value of the OFFSET parameter in FAN Control. |
73 | Close Pipe. If assigned (on a master) overrides the default end pipe character ^G. |
72 | No Fan. When assigned aa, upon the next boot, the node may no longer change the fan control parameters. The master may still modify the fan parameters. |
71 | No SRAM. When assigned aa, upon the next boot, the node may no longer access SRAM. The master may still use SRAM (normally for hardware system logging which cannot be modified or updated by the local node). |
70 | Peon. When assigned aa, upon the next boot, the node will become a strict slave if the blade is part of a BitScope Cluster. It cannot become master and it cannot change its own configuration parameters. Peon is ignored on a stand-alone blade or an ad-hoc cluster. |
60 .. 67 | Unlock. An eight byte unlock sequence. When the BMC sees the unlock sequence (from the host) the host channel is enabled. The default string is UnLockMe (case sensitive). It can be set to a different string (of up to 8 characters), set to bypass unlock (boot unlocked) or made permanently locked so the host cannot unlock its own BMC (for managed clusters). |
The r
and w
commands are used to read and write configuration parameters.
For example, to assert power off on boot for a node:
[02]@[7f]s[55]w
or to read the configured node address that may be assigned upon boot:
[02]@[7e]sr
When the required changes have been made, the node BMC must be rebooted for them to take
effect. This is achieved by issuing the [55]! (reset) command.
Issuing [55]!
(reset) when Power Off is configured will power down the node!
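The write-then-reset sequence can be sketched as a helper that turns a set of parameter changes into command strings. The names `config_commands` and `RESET` are hypothetical. The sketch assumes that writing ff returns a parameter to its unconfigured state, as implied by the description of ff above, and it only checks the Address Override range given in the table.

```python
RESET = "[55]!"  # reset command, needed for configuration changes to take effect


def config_commands(params: dict[int, int], reboot: bool = True) -> list[str]:
    """Build EEPROM write commands for `params` (address -> value).

    Each parameter is written with [02]@[XX]s[YY]w. When `reboot` is true
    the reset command is appended so the new values are applied.
    """
    commands = []
    for address, value in sorted(params.items()):
        if not (0 <= address <= 0xFF and 0 <= value <= 0xFF):
            raise ValueError("address and value must each fit in one byte")
        # Address Override (7e) accepts 00..77; ff is assumed to clear it.
        if address == 0x7E and value > 0x77 and value != 0xFF:
            raise ValueError("address override must be in the range 00..77")
        commands.append(f"[02]@[{address:02x}]s[{value:02x}]w")
    if reboot:
        commands.append(RESET)
    return commands


if __name__ == "__main__":
    # Configure power off at boot, then reset (which powers the node down).
    print(config_commands({0x7F: 0x55}))
```

Passing `reboot=False` is useful when several batches of changes are staged before a single reset.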
A node may be Peon which means it does not have permission to become a cluster manager
(Master) or to change its own configuration. This will be the case in a managed cluster
because configuration is the exclusive domain of the cluster manager. Unless it has been
locked, a node can always read its own configuration so it can know how it has been set up
to run. To disable Peon, No SRAM or No Fan the master must change the parameter value to a
value other than aa.
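A node that reads back its own configuration needs to interpret the raw bytes. The sketch below decodes the flag parameters using the magic values from the table (55 for Power Off, aa for No Fan, No SRAM and Peon, ff for unconfigured). The function name and the dictionary layout are hypothetical, and treating an Address Override outside 00..77 as "not set" is an assumption.

```python
UNCONFIGURED = 0xFF  # a parameter holding ff is unconfigured and ignored


def describe_config(eeprom: dict[int, int]) -> dict[str, object]:
    """Interpret EEPROM bytes (address -> value) as BMC boot settings.

    Addresses missing from `eeprom` are treated as unconfigured.
    """
    def value(address: int) -> int:
        return eeprom.get(address, UNCONFIGURED)

    override = value(0x7E)
    return {
        "power_off_at_boot": value(0x7F) == 0x55,
        # Only 00..77 are legal node addresses; anything else is ignored here.
        "address_override": override if override <= 0x77 else None,
        "no_fan": value(0x72) == 0xAA,
        "no_sram": value(0x71) == 0xAA,
        "peon": value(0x70) == 0xAA,
    }


if __name__ == "__main__":
    print(describe_config({0x7F: 0x55, 0x7E: 0x05, 0x70: 0xAA}))
```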
If Power Off is configured, a node may only be powered on via the control bus by the
cluster manager (i.e. a master node). If that master cannot reach the slave the slave
cannot be powered on. This can occur if the master does not know the slave address, or it
cannot reach the slave at that address (perhaps due to a misconfigured ad-hoc cluster
where an assigned slave address collides with the address of another node in the cluster).
To recover, power on the blade stand-alone (i.e. not in a cluster) and repair the
erroneous values on any affected nodes via the blade master (which is always at address 7c
when a blade is stand-alone).
Questions and Answers
Why isn't the BMC responding?
I've followed the connect and unlock procedures but the master node BMC does not respond.
Why not?
How do I check if it's working?
The recommended way to check the status, if unknown, is to issue the = command.
If there is no response the BMC is probably locked.
However, there is another possibility.
The BMC may be unlocked but the master node may have the console open which is connected to a slave node. The master is therefore talking to the slave node. If that slave is not responding for whatever reason, e.g. no system console enabled, operating system has crashed, then you will receive no response to any commands you send even though the master BMC is unlocked.
The same thing can happen if you select a non-existent slave node, usually by specifying
an incorrect node address with the { or | commands. To escape from these situations send }
and/or ^G to return to local mode and then issue = to confirm you have succeeded.
If there is no response to any of this, then there may be a serial I/O problem talking to the local BMC or there is a hardware fault on the control bus or BMC.
Have you re-checked the tips explained in the connect section?
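The checks described in this answer can be sketched as a short routine. Everything about the I/O here is an assumption: `transact` is a hypothetical callable that sends bytes to the BMC and returns whatever bytes arrive before a timeout, and any non-empty reply to = is treated as "responding" because the reply format is not described in this section.

```python
from typing import Callable


def diagnose(transact: Callable[[bytes], bytes], close_pipe: bytes = b"\x07") -> str:
    """Walk through the troubleshooting steps and report the outcome.

    `close_pipe` defaults to ^G; pass the configured Close Pipe character
    if it has been changed.
    """
    # Step 1: ask for status. Any reply means the local BMC is listening.
    if transact(b"="):
        return "responding"
    # Step 2: the master may be attached to a silent or non-existent slave.
    # Send } and the Close Pipe character to escape back to local mode.
    transact(b"}")
    transact(close_pipe)
    # Step 3: ask for status again to confirm the escape worked.
    if transact(b"="):
        return "recovered"
    # Still silent: BMC locked, serial I/O problem, or a hardware fault.
    return "no-response"


if __name__ == "__main__":
    # Stand-in for a real serial link: a BMC that never answers.
    print(diagnose(lambda data: b""))
```

A real `transact` would wrap the host serial port (for example /dev/serial0) with a read timeout; that wiring is deliberately left out of the sketch.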
Releases
CB04A020
Release Notes
Second production release.
1. fixed 115k baud console.
Previously the remote console feature occasionally dropped characters when a lot of data was sent from the remote host via the bus. This could occur even if the remote host was sending at precisely 115k baud. This problem is fixed and Cluster Blade can now accommodate abnormally high rates. The local host (master) may not support this however so the caveats still apply.
2. unlock code is changed.
The previous (default) unlock sequence was BitScope. This has been changed to UnLockMe. This is just the factory default; it can be changed to whatever is required.
3. relocated some API registers.
These changes were made to simplify the API and to accommodate changes made in support of (1). The body of this document has been updated accordingly.
CB04A019
Release Notes
Interim update (not released publicly).
1. fixed fan idle algorithm.
Previously, when a fan was idle, a low level "jitter noise" may have been emitted. This was audible for some people (those with particularly good hearing). Whether the noise was emitted depended on CPU load, the type of fan and the angular position of the fan when idle. This has been fixed. When idle the fan is completely silent.
2. enabled fan when console open.
Previously the built-in fan control was not active when the console was open. In this case the fan speed would not change until the console was closed. Benign in most cases, it did mean that if one ran a heavy workload via the console, the fan would not increase speed until the console was again closed. While closing the console after issuing a command is normal practice in cluster management, this change means the fan speed now continues to be managed even when the console is open.
3. enabled fan when master active.
Similarly to (2) the fan speed is now controlled when a node is being used as the cluster manager. While it's not common to run a heavy workload on a node that is the cluster manager, it is now possible to do so and have the fan respond correctly.
CB04A018
Release Notes
Interim update (not released publicly).
fixed fan control algorithm.
The fan speed is driven by the node current draw, subject to scaling, and saturates at a point set by the fan control parameters. This worked for 02 (x1) and 03 (/2) but not for 00 (x4) and 01 (x2). In the latter cases, the fan failed to increase speed when current draw went beyond 1A (with 00) or 2A (with 01). While this could be worked around (with manual override) the fact that a node can draw up to 4A meant cooling may be insufficient in high load use-cases. Fixed.
CB04A017
Release Notes
Interim update (not released publicly).
added open console with mute function
The baud rate specified is now limited to the low nybble. The high nybble specifies how many characters the console must accept from the bus before sending data to the bus.
This mechanism allows a bus master to "force feed" up to 15 bytes to a slave host (via its console) before the slave host is allowed to send data back to the bus (via its console).
The canonical use-case is a host vomiting data (via the console) to the bus for which a ^C or ^D is required to stop it, but, because the (simplex) bus is full of console traffic, the master can no longer reliably transmit a character back (over the bus) to the slave (because of bus collisions). Collisions will cause all nodes to drop off the bus but reconnecting to the rogue slave results in the same problem (the slave will vomit onto the bus as soon as the console is reopened).
By opening the console with mute, the master ensures it can send up to 15 characters to the slave via the bus before the slave has a chance to flood the bus with (more) junk. Upon receipt of the terminator (^C or ^D) the slave host should stop vomiting, allowing normal operation to proceed. If it does not, the slave has gone completely AWOL and nothing the master can do can retrieve it (except avoiding opening the console). In this case, a hard reboot (power cycle) may be the only option to recover (as the problem is within the slave host).
removed extraneous LF from / and \.
This was a bug in CB04A016. Now fixed.
modified pipe to use | and { uniquely.
- When opened with | it is closed with the Close Pipe.
This is intended for interactive use. The default Close Pipe is ^G. It may be configured to a different character (via Close Pipe). This may be necessary if ^G is used by a slave node (via the console) for some other purpose.
- When opened with { it is always closed with }.
This is intended for non-interactive use. Command strings are more readable using this mechanism. The closing character cannot be replaced (it is always }). This means it may not be appropriate for use with a console (if the host uses } for other purposes).
In both cases, a closing character is returned when the pipe is closed. Closing the pipe causes all attached nodes to detach. On the master the ^G or Close Pipe or } character will be returned (depending on how the pipe was opened and how it's configured). On the slave the ^G or Close Pipe character will be returned. In this case } will never be returned because the slave never opened a pipe.
Normally there should only be two attached nodes (the master and slave). If there are others (which is an error condition) they will detach when any master closes its pipe or any node attempts to become master. This means the bus is always recoverable. These (extraneous) nodes will return ^G or Close Pipe to their hosts unless they (also) had a pipe opened on the bus with { in which case } will be returned.
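The console-with-mute encoding described in the first item of this release packs two fields into one byte: the baud rate code in the low nybble and the mute count in the high nybble. The sketch below shows that packing. The names are hypothetical, the baud codes are those listed in the CB04A016 notes (with 115k, 9k6, 19k2 and 57k6 assumed to mean the standard 115200, 9600, 19200 and 57600 rates), and the `[xx]~` form is assumed from the way other commands take a bracketed value before the command character.

```python
# Baud rate codes from the CB04A016 release notes (low nybble).
BAUD_CODES = {115200: 0x0, 9600: 0x1, 19200: 0x2, 57600: 0x3}


def console_value(baud: int, mute: int = 0) -> int:
    """Pack the baud code (low nybble) and mute count (high nybble)."""
    if baud not in BAUD_CODES:
        raise ValueError(f"unsupported baud rate: {baud}")
    if not 0 <= mute <= 15:
        raise ValueError("mute count must be 0..15")
    return (mute << 4) | BAUD_CODES[baud]


def console_command(baud: int, mute: int = 0) -> str:
    """Build the console open command string, e.g. [f1]~ (assumed syntax)."""
    return f"[{console_value(baud, mute):02x}]~"


if __name__ == "__main__":
    # Open at 9600 baud and let the master send 15 characters first.
    print(console_command(9600, mute=15))
```

With a mute count of 0 the value reduces to the plain baud code, which matches the behaviour before mute was introduced.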
CB04A016
Release Notes
First Production Release.
- added the host console mechanism.
The Console allows the master to talk with a slave host via the bus. The console command is ~. The value in the input register selects the baud rate as 00 115k, 01 9k6, 02 19k2, 03 57k6. The 115k baud rate does not work reliably for strings longer than a dozen characters (due to bus overrun). This may be fixed in a future revision (with fifos). Baud rates 01, 02 and 03 are recommended until then.
- defined unlock default string BitScope
- added echo squash for console use.
When an open pipe sees the console command '~' it enables squashing. When squashing, characters sent by the pipe are not echoed. The exception is end pipe (an address) which must echo (to end the pipe). This mechanism is added to enable the use of a console.
- modified end pipe mechanism.
The Close Pipe character (^G by default) is no longer sent (or received) over the bus. When received (from the host, on the master) the master puts its own address on the bus instead. All nodes (including the master) detach from the bus upon receipt of an(y) address. The (releasing) master node does not (re)attach (itself) despite its own address appearing on the bus. Any consoles (if running on any slaves) also terminate and detach from the bus. No character received by a console (from its host) can cause the console to terminate. All 7 bit characters (except the Close Pipe) are sent to a slave's host (via the console).
- modified so high bit set characters are dropped.
Only 7 bit ASCII is legal across the bus. High bit set values are reserved for address selection on the bus.
- removed Close Pipe command (i.e. * is no longer used or required)
- updated pipe open can use | as well as {.
They are equivalent to each other to open the pipe. They close differently.
- removed / \ reply character payload.
Use state command = to learn state.
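The two filtering rules in these notes (high bit set characters are dropped, and the Close Pipe character ends the pipe instead of being forwarded) can be modelled on the host side to predict what a slave will actually receive. This is a simplified model, not the BMC firmware: the function name is hypothetical and it assumes that nothing after the Close Pipe character is forwarded.

```python
def forwarded_to_slave(data: bytes, close_pipe: int = 0x07) -> tuple[bytes, bool]:
    """Model which bytes of `data` would reach the slave host.

    Returns (payload, closed) where `closed` is True if the Close Pipe
    character (^G by default) was seen and ended the pipe.
    """
    payload = bytearray()
    for byte in data:
        if byte == close_pipe:
            # The master puts its own address on the bus; the pipe ends.
            return bytes(payload), True
        if byte >= 0x80:
            # Only 7 bit ASCII is legal across the bus; drop this byte.
            continue
        payload.append(byte)
    return bytes(payload), False


if __name__ == "__main__":
    print(forwarded_to_slave(b"ls\xff -l\x07ignored"))
```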
CB04A015
Release Notes
Second beta release.
added serial unlock mechanism.
The default unlock word is BitScope.
CB04A014
Release Notes
First beta release.
added peon mode.
A node may be Peon which means it does not have permission to become a cluster manager (Master) or to change its own configuration. This will be the case in a managed cluster by default because configuration is the exclusive domain of the cluster manager. Asserting Peon Mode in an unmanaged cluster achieves the same result. A node can always read its own configuration so it can know how it has been set up to run. To disable Peon, No SRAM and/or No Fan the master must change the parameter value to something other than aa.
fixed serial error reporting.
It is sometimes possible for a host to overrun or experience other low level serial communications errors when talking to the BMC. This can occur if the VM protocol is violated by a user operating interactively or by bugs in programs talking to the BMC. Such errors are non-fatal but they need to be reported (for diagnostic purposes). This update fixes the reporting of these errors as documented here.
relocated measurement, cooling and calibration commands.
These commands were previously implemented as vectors. They now exist as commands which makes them easier to use interactively.
fixed bus recovery mechanism.
It was previously possible to lose control of the bus when illegal characters were sent to the BMC. This is no longer possible.
added address override mechanism.
It is now possible to relocate the address of nodes other than by the hardware mechanism built into BitScope Clusters. The new address is stored in FLASH. This allows the creation of ad-hoc clusters comprising more than one Blade without requiring the use of hardware addressing.
protected the reset command.
The reset command ! now requires a unique vector 55 to be specified before it does anything. This change makes inadvertent use of reset unlikely. Reset remains a benign operation (from the host's point of view) unless boot with power off is enabled (in which case the node will power off - caveat emptor).
added boot with power off feature.
It is now possible to configure a node to remain powered off when blade power is applied. This makes it possible for a cluster to be configured to automatically power on only those nodes that are enabled to do so when power is applied to the cluster. That is, any "boot with power off" node must be powered on by the cluster manager before use.
Impressum
Online | docs.bitscope.com/control-plane |
---|---|
Author | BitScope Products <products@bitscope.com> |
Copyright | © 2020-2023 MetaChip Pty. Ltd. T/A BitScope Designs. All Rights Reserved. |
License | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) |
Feedback | We welcome your feedback. If you find any errors, omissions or just information that is confusing or incorrect, please email us at feedback@bitscope.com with details. |
Permalink | docs.bitscope.com/BRL23E7E |
Date | |
Rev | w8tt2du8jl22fgx2 |