The goal is to have this expert software monitor the EUVE payload autonomously 24 hours a day, 7 days a week. It will also page the Anomaly Coordinator for the ESOC (ACE) for corrective action when anomalies are detected. A by-product of the Eworks development and evaluation process is a test-bed baseline that can be applied to future software packages that could be incorporated into the ESOC's operational software.
A series of flowcharts for the EUVE payload subsystems was developed by the EUVE operations team and approved by the EUVE scientists. These approved flowcharts were coded into rules that Eworks uses in monitoring the payload. The first series of flowcharts covers the following payload subsystems, which are considered essential to monitor for the health and safety of the payload. Each subsystem is followed by the list of associated status checks that Eworks performs.
- Overall Health
Realtime telemetry arrival
(Has realtime telemetry arrived within a specified time frame?)
Top level status check
(Clock updating? RIFs and TIFs power on? CDP patched?)
Detectors functioning?
Rate shutdown functioning correctly?
- High Voltage
Programmed high voltage limit checks
Special observation configuration
Detector high voltage read-back limit check
High voltage status appropriate for Day/Night?
- Power limit checks
CDP current
Detectors voltages and currents
RIF voltages and currents
TIF voltages and currents
- Command Echoes
Command counts
(Is increase in command counter accompanied by command echoes?)
Command echoes
(Any instrument error messages received requiring attention?)
- Thermal limit checks
CDP temperature
Detectors temperatures
RIF temperatures
TIF temperatures
Mirror and focal plane differential temperatures
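As an illustration of how one of the checks listed above might be expressed as a rule, the following minimal sketch encodes the command-count check (is an increase in the command counter accompanied by command echoes?). It is written in Python rather than in the RTworks rule language, and the `Frame` type and its field names `command_count` and `echoes` are hypothetical, not actual EUVE telemetry mnemonics. Counter wraparound is ignored for simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    """One frame of engineering data (hypothetical field names)."""
    command_count: int
    echoes: List[str] = field(default_factory=list)


def check_command_echoes(prev: Frame, curr: Frame) -> Optional[str]:
    """Return an error message if the counter rose with no echoes received."""
    if curr.command_count > prev.command_count and not curr.echoes:
        return (f"command counter changed {prev.command_count}>"
                f"{curr.command_count} but there were no command echoes")
    return None


# A counter increase without echoes is flagged; one with an echo is not.
print(check_command_echoes(Frame(48), Frame(64)))
print(check_command_echoes(Frame(48), Frame(64, ["echo text"])))
```

A real rule would presumably also inspect the echo contents for instrument error messages, which is the second check listed under Command Echoes.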
The first version of Eworks was released on July 5, 1994 with limited payload monitoring functionality. Eworks is currently being run in parallel with the existing payload monitoring software which requires operator intervention during the realtime passes on a 24-hour-a-day basis.
Eworks is configured and being developed on a network of Unix workstations and servers in a distributed environment. Eworks is a collection of programs built upon Talarian Corporation's RTworks realtime data monitoring tools. The programs include the following modules:
- euvertdaq Data Acquisition module
- euvertie Inference Engine module
- euverthci Graphical Human-computer Interface module
- euverttr Telemetry Reception module
- rtserver RTworks Data Server module
The Eworks architecture is shown in figure 1. The data acquisition module reads telemetry as it is received at the ESOC, extracts the engineering health information from it and passes that on to the data server which makes it available to the inference engine, human-computer interface and telemetry reception watchdog modules.
The inference engine module monitors the health information using rules written here at the CEA and pages an ACE if any problem conditions are found.
The two telemetry reception modules watch for prolonged absences of telemetry. Presently, one module pages the ACE if the ESOC has received no telemetry for more than six consecutive hours, and the other module pages the ACE to prepare to shut down the payload if the absence of telemetry exceeds sixteen hours.
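The threshold logic of the two telemetry reception watchdogs can be sketched as follows. This is a minimal Python illustration, not the Eworks implementation: the class name, method name, and message strings are assumptions, and only the six-hour and sixteen-hour thresholds come from the text. The sketch does not suppress repeated pages once a threshold has been crossed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TelemetryWatchdog:
    """Reports when telemetry has been absent longer than `threshold`."""
    threshold: timedelta
    message: str  # hypothetical page text

    def check(self, last_telemetry: datetime, now: datetime) -> Optional[str]:
        """Return the page message if the gap exceeds the threshold."""
        if now - last_telemetry > self.threshold:
            return self.message
        return None


# Two independent watchdogs, mirroring the two modules described above.
reception_watchdog = TelemetryWatchdog(
    timedelta(hours=6), "page ACE: no telemetry received")
shutdown_watchdog = TelemetryWatchdog(
    timedelta(hours=16), "page ACE: prepare to shut down the payload")

last_seen = datetime(1994, 7, 5, 0, 0)
now = last_seen + timedelta(hours=7)
print(reception_watchdog.check(last_seen, now))  # six-hour threshold exceeded
print(shutdown_watchdog.check(last_seen, now))   # sixteen-hour threshold not yet reached
```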
The human-computer interface module can be used to display graphical representations of the state of the payload.
These modules use network protocols to communicate with each other and so can be run on separate machines if desired.
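The data flow described above, in which the data acquisition module hands health information to a data server that makes it available to the other modules, resembles a publish/subscribe pattern. The sketch below shows that pattern in a single Python process. It is not the RTworks API: the `DataServer` class, the subject name "health", and the sample record (including its value) are invented for illustration, and the real modules communicate over the network rather than through in-process callbacks.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class DataServer:
    """Toy stand-in for a data server: fan each message out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, subject: str, handler: Handler) -> None:
        """Register a handler to be called for every message on `subject`."""
        self._subscribers[subject].append(handler)

    def publish(self, subject: str, message: Dict[str, Any]) -> None:
        """Deliver `message` to every handler subscribed to `subject`."""
        for handler in self._subscribers.get(subject, []):
            handler(message)


server = DataServer()
seen_by_inference_engine: List[Dict[str, Any]] = []
seen_by_interface: List[Dict[str, Any]] = []
server.subscribe("health", seen_by_inference_engine.append)
server.subscribe("health", seen_by_interface.append)

# The data acquisition side publishes one (made-up) health record.
server.publish("health", {"monitor": "CDP temperature", "value": 21.5})
print(seen_by_inference_engine == seen_by_interface)
```

Decoupling producers from consumers in this way is what allows each module to run on a separate machine once the callbacks are replaced by network messages.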
Testing and evaluation of the released Eworks is in progress in both the development and operational environments. The test data is compiled primarily from the ESOC logbooks and the data discrepancy logs (DDLs) and classified as follows:
Type of Anomalies Descriptions
seu Single Event Upset (cosmic ray hits to payload electronics)
ool Out of Limit of Payload Monitors
eng Engineering Test Data (e.g., high voltage power supply or detector test)
obs Special Target Observations (e.g., moon, bright targets)
others Other Anomalies (e.g., power shed, clam-up)
Each test data set is compiled with a complete description of the anomaly including the time the anomaly occurred and the related errors seen in the telemetry data. This is accomplished through analyzing the anomaly with the corresponding flowcharts.
To determine the number of errors expected, we used a combination of the anomaly or test descriptions, the logbooks, and data products obtained by decommutating the data with the End-to-End System software (EES). We used that information in conjunction with the flowcharts to determine the number of expected errors.
In some cases our knowledge and experience were enough to determine the number of expected errors from the description of the anomaly. Reading the logbooks enabled us, in some cases, to determine exactly what happened, and, again, we could use experience to determine the expected errors. For the more complex problems, there simply was not enough specific information in the logs and we had to resort to looking at data products, i.e., limit transitions and the command echoes to make the determination. The number of errors expected is a function of the software that we were testing and could increase as the software becomes more sophisticated.
Eworks is tested with the realtime anomaly test data and cross checked with the production data covering the same time period when the anomalies occurred. The important things that we looked for in this first phase of stress testing were the following:
- whether a particular anomaly that occurred at a specific major frame was detected by Eworks
and Eworks accurately displayed the corresponding error and alert messages
- whether a glitch in the realtime telemetry caused Eworks to stop functioning
- and whether telemetry glitches triggered erroneous warnings
Testing of Eworks is in progress using the normal telemetry and the anomaly data set compiled since the EUVE launch. A new flowchart developed for the payload configuration during special observations has been submitted to the EUVE scientists for approval. We are resolving the outstanding issues with the Eworks flowcharts for future implementation.
The changes made to the flowcharts have been submitted to the Hardware Scientist John Vallega for his approval. These changes involve
- the identification of day, night, dawn and dusk crossing in the telemetry stream
- rate shutdown algorithm
- detector high voltages and differential temperature tolerances for the mirror and the focal plane
The following outstanding issues are being resolved:
- the ability to implement external parameters for Eworks to reference for expected events (e.g., special observations and engineering tests) and anomalies within a specified time frame
- defining and refining the ESOC's activities involved in the simulation run over the September and October 1994 period
The telemetry team will be continuing with the regression testing of Eworks. The test environment and testing procedures will be updated based on lessons learned through the first phase of Eworks testing. The TBD (To be determined) sections of the existing flowcharts will be updated as we become more knowledgeable in dealing with difficult cases. Additional flowcharts for the remaining payload subsystems will be developed in the upcoming months.
The results from stress testing Eworks with anomaly data are compiled, analyzed and presented in table 1. The first column indicates the date (year, month, day) on which the event occurred, followed by a key word referring to the type of anomaly data used in testing Eworks (see Type of Anomalies descriptions in the table above). The second column gives a brief description of the event.
The third column indicates the total number of errors Eworks is expected to detect which are the numbers compiled by analyzing the events with the EES software and the flowcharts. The fourth column indicates the number of errors detected by the current version of Eworks. In some of the tests the number of errors detected is greater than the number of errors expected. This is due to the tight tolerances in some of the limit constraints.
We found two cases in which Eworks did not catch instrument settings that changed for the testing of different configurations for special observations. This error has been corrected by introducing a new "Special Observation" flowchart to deal with special configurations.
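The comparison of expected and detected error counts described above can be sketched in a few lines of Python. The function names are hypothetical. The cell formats ("10*", "6@", "0#", "IP") are those used in table 1, and the sample rows are taken from that table.

```python
import re
from typing import Optional, Tuple


def parse_count(cell: str) -> Tuple[Optional[int], str]:
    """Split a table 1 cell such as '10*', '6@' or 'IP' into (count, flag)."""
    if cell == "IP":
        return None, "IP"
    match = re.fullmatch(r"(\d+)([*#@]?)", cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return int(match.group(1)), match.group(2)


def classify(expected: str, detected: str) -> str:
    """Compare the expected and detected error counts for one test."""
    exp, _ = parse_count(expected)
    det, _ = parse_count(detected)
    if exp is None or det is None:
        return "in progress"
    if det > exp:
        return "more than expected"
    if det < exp:
        return "fewer than expected"
    return "as expected"


rows = [
    ("920820.seu", "10*", "25"),
    ("931103.obs", "6@", "5"),
    ("931115.obs", "3@", "3"),
    ("940507.eng", "10", "IP"),
]
for name, expected, detected in rows:
    print(name, classify(expected, detected))
```

Because counts flagged with "*" are lower limits, a "more than expected" result for such a row does not by itself indicate a false alarm.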
(Notes on Table 1, which follows)
If any monitor went out of limits and the EES software reported a violation of both yellow and red limits, only one error was counted.
* We decommutated the file associated with the archiveID number in which the event occurred. This number is a lower limit.
# For the SEUs detected by the dumpallram that are listed here, the ISW that would implement the check for whether the CDP was patched had not been installed, so no errors are expected.
@ Hopr should be out of limits when detectors are at half voltage during orbital night.
IP Indicates testing in progress
Table 1: Stress Testing Anomalies
--------------------------------------------------------------------------------------------
Dates          Description                                      # of Errors  # of Errors
                                                                Expected     Detected
--------------------------------------------------------------------------------------------
920814.eng     daytime data; detectors on in day                0            0
930324.eng     WSZ Test                                         0            0
930601.eng     WSZ Test                                         0            0
930711.eng     WSZ Test                                         0            0
930731.eng     WSZ Test                                         0            0
940312.eng     WSZ Test                                         0            0
940327.eng     WSZ Test                                         0            0
940330.eng     WSZ Test                                         0            0
940429.eng     WSZ Test                                         0            0
940507.eng     HV Test                                          10           IP
931112.ool     Tif6IonC red for a few frames                    0            0
931215.ool     Tif4+Rf under red for a few secs                 1            IP
931116.others  Power shed clamup                                15           IP
920820.seu     Cdp reset, Tifs, Rifs were off                   10*          25
920927.seu     Detector 5 at 50% voltage.                       1            IP
921218.seu     Tif7 Reset.                                      3            7
930322.seu     CDP reset, TIFs, RIFs, off, Sunsensors disabled  9*           50
931107.seu     Clam-up during EOR uplink in R/T                 14*          IP
931124.seu     CDP reset, Payload in TO mode, occurred in R/T   3*           IP
940110.seu     SEU detected by dumpallram                       0#           IP
940218.seu     SEU detected by dumpallram                       0#           IP
940411.seu     CDP reset, TIFs, RIFs, off, Sunsensors disabled  16*          IP
931103.obs     Reconfigured for moon (1-3+7 off)                6@           5
931115.obs     Det1,2,3off                                      3@           3
931125.obs     Det 1,2,3 off for Feige 24                       6@           IP
931129.obs     det 1-6 off, allwsz, det7 on and blanked         12@          IP
931203.obs     Moon (1-3+7 off)                                 8@           IP
931205.obs     lower voltage for H 1504 (U Gem?) obs (1-3 off)  6@           IP
931219.obs     det 3 off for Beta CMa obs                       2@           IP
931221.obs     det 1,2,and 7 off                                6@           IP
931224.obs     det1,2,3,7 off for moon, shutdown rates chg      8@           IP
931229.obs     two rt passes/det 3,7 off at end                 4@           IP
931230.obs     det 3,7 off                                      4@           IP
940129.obs     Detector 1,2,3,7 off, rate shutdown changed      8@           IP
--------------------------------------------------------------------------------------------
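The counting convention stated in the notes to Table 1 (a monitor that violates both its yellow and red limits counts as one error) amounts to counting distinct monitors rather than individual limit violations. A minimal Python sketch follows. The function name and the (monitor, severity) tuple layout are assumptions, and the sample violations are illustrative rather than taken from a real event.

```python
from typing import Iterable, Tuple


def count_errors(violations: Iterable[Tuple[str, str]]) -> int:
    """Count out-of-limit errors, one per monitor regardless of severity."""
    return len({monitor for monitor, _severity in violations})


# Det3Hopr violates both yellow and red limits but counts only once.
sample = [
    ("Det3Hopr", "yellow"),
    ("Det3Hopr", "red"),
    ("Tif6IonC", "red"),
]
print(count_errors(sample))
```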
When anomalies are detected by Eworks, the SOCTOOLS displays are used to verify that Eworks is detecting the anomalies correctly. Eworks problems encountered in this environment are logged as an ESOC data discrepancy log (DDL).
Table 2 lists the ESOC DDLs relating to Eworks problems, with a description of each problem and the status of its resolution.
The DDL # is the data discrepancy log number assigned to the problem, and the EPR # is the EUVE Problem Report number assigned to a problem that needs further analysis. The status "Open" indicates that the problem is under investigation; "Closed" indicates that the problem has been solved and the resolution will be implemented in the next version.
Table 2: DDL Report
-----------------------------------------------------------------------------------
DDL #  Date  Status  EPR #  Description
-----------------------------------------------------------------------------------
1119   7/07  Closed         Received errors in Eworks at beginning of pass.
                     1382   CDPclock did not update as expected
                     1383   ccnt changed 48>64 but there were no command echoes
1120   7/07  Closed         Eworks errors during Frank's shift
                     1384   RTdaq: "Unable to keep up"
                     1385   RThci: "Type mismatch..."
                     1386   RTie: no license
                     1387   rttr.out: undefined symbol
1123   7/08  Closed         Eworks errors relating to DDL #1120
1124   7/08  Closed         Eworks errors relating to DDL #1119 and #1120
1126   7/09                 Eworks errors during Paul's shift
             Open    1388   "Det6HVlt wandering"
             Closed  1389   "Failure to achieve Rate Shutdown"
1127   7/09  Closed         Eworks errors relating to DDL #1126
1128   7/10  Closed  1390   ADct 2 zero for 3 frames
1129   7/09  Closed         Eworks shows Det1 ADct 0 for 3 frames
                            Error relating to DDL #1128
1133   7/16  Closed  1398   Eworks script not creating working directory
1134   7/17  EPR     1397   Eworks shows "Det3HVlt wander" warning when Det 3 HV command sent
1135   7/17  Closed  1395   Det3Hopr violating LOWER YELLOW limit
             EPR     1396   Rate shutdown change
1138   7/20  Closed  1403   Tel3 violating thermal control
1139   7/24  Closed         Rif1Cur and CDPPwrV showed a sad face state but the status flag showed green
1141   7/27  Closed  1407   RTworks did not process data for R/T pass
                            euvertdaq exited during a telemetry glitch
1143   8/03  Closed  1410   Could not bring up eworks on the SOC net
1144   8/04  Closed         Unable to kill Eworks in a directory not writable by me
                            Action: this has been fixed in a new version
1151   8/09  Open           Eworks did not notice that Det5RSrt or Det7Blnk are out of limits
1152   8/10  Closed         RTworks failed to process R/T data
                            error relating to DDL #1141
-----------------------------------------------------------------------------------
1. It would be very useful to display all the monitors with the last 10 or 15 frames of data so
that one can look at how a monitor reading has been progressing for the past 15 seconds.
Currently, only the power system has this feature.
2. Color shade the limits in the plots to give an idea of the monitor's readings relative to the limit
boundaries.
3. As a diagnostic tool, it would be useful to be able to bring up the flowchart associated with
the monitor in trouble and have the path to the troubled node in the flowchart highlighted.
This would allow us to quickly obtain useful information for speedy corrective action.
4. It would be useful to have the ability to bring up a graphical representation of the
subsystem in trouble and highlight the component associated with the anomaly.
Based on the Eworks test results compiled, Eworks is found to be able to detect most of the anomalies in the test data at the correct time. A few of the constraints were missed due to special cases not considered in our first flowchart development. We will continue optimizing the flowcharts as testing and use of Eworks progress.
Glossary
ACE Anomaly Coordinator for the ESOC
CDP Command Data and Power interface unit
DDL Data Discrepancy Log
EPR EUVE Problem Report
ESOC EUVE Science Operations Center
EUVE Extreme UltraViolet Explorer
End-to-End Design analysis tool for the EUVE project
Hopr Programmed High Voltage
ISW Instrument SoftWare
RIF Relay InterFace unit
SEU Single Event Upset
TBD To be determined
TIF Telescope InterFace unit