Eworks Development Status Report

Draft One: August 1994

Lawrence Wong, Allen Hopkins, Marty Eckert, Frank Kronberg, Paul Wang

Extreme Ultraviolet Explorer Science Operations Center

and

EUVE Software Development Team

Center for Extreme Ultraviolet Astrophysics

University of California

Berkeley California 94720 - 5030

Table of Contents

Overview
Development
Flow Chart Development
Eworks Architecture
Stress Test Environment
Tasks in Progress
Tasks Planned for the Upcoming Months
Test Results
Stress Testing in the Development Environment
Stress Testing Anomalies
Testing in the Operational Environment
DDL Report
User Interface Evaluation
Summary
Glossary
I. Overview
Eworks is a rule-based expert system for telemetry monitoring, built on the RTworks software package from Talarian. It is being developed by the EUVE software development team at the Center for Extreme Ultraviolet Astrophysics (CEA) at UC Berkeley. RTworks can be used to develop and run distributed applications for real-time monitoring, analysis and display of complex systems.

The goal is to have this expert software monitor the EUVE payload autonomously 24 hours a day, 7 days a week. It will also page the Anomaly Coordinator for the ESOC (ACE) for corrective action when anomalies are detected. A by-product of the Eworks development and evaluation process is a test-bed baseline that can be applied to future software packages that could be incorporated into the ESOC's operational software.

II. Development
1. Flow Chart Development
The EUVE operations team developed a series of flowcharts for the EUVE payload subsystems, and the EUVE scientists approved them. These approved flowcharts were coded into the rules that Eworks uses to monitor the payload. The first series covers the payload subsystems listed below, which are considered essential to the health and safety of the payload. Each subsystem is followed by the associated status checks that Eworks performs.

- Overall Health

Realtime telemetry arrival

(Has realtime telemetry arrived within a specified time frame?)

Top level status check

(Clock updating? RIFs and TIFs power on? CDP patched?)

Detectors functioning?

Rate shutdown functioning correctly?

- High Voltage

Programmed high voltage limit checks

Special observation configuration

Detector high voltage read-back limit check

High voltage status appropriate for Day/Night?

- Power limit checks

CDP current

Detector voltages and currents

RIF voltages and currents

TIF voltages and currents

- Command Echoes

Command counts

(Is increase in command counter accompanied by command echoes?)

Command echoes

(Any instrument error messages received requiring attention?)

- Thermal limit checks

CDP temperature

Detector temperatures

RIF temperatures

TIF temperatures

Mirror and focal plane differential temperatures
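Two of the checks listed above, limit checking and the command counter versus command echo comparison, can be sketched in a few lines. This is only an illustration in Python, not the RTworks rule language that Eworks actually uses. The `Limits` structure, the function names, and the numeric limits in any usage are assumptions made for the example; the sketch also ignores command counter wrap-around.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical yellow/red limit set for one monitor."""
    red_low: float
    yellow_low: float
    yellow_high: float
    red_high: float


def limit_state(value: float, lim: Limits) -> str:
    """Return 'red', 'yellow', or 'green' for a single monitor reading."""
    if value < lim.red_low or value > lim.red_high:
        return "red"
    if value < lim.yellow_low or value > lim.yellow_high:
        return "yellow"
    return "green"


def echoes_missing(prev_count: int, count: int, echoes_seen: int) -> bool:
    """True when the command counter increased but no command echoes arrived.

    Assumption: the counter only increases between frames (no wrap-around).
    """
    return count > prev_count and echoes_seen == 0
```

For instance, a command counter that moves from 48 to 64 with no echoes received would make `echoes_missing(48, 64, 0)` return `True`.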

The first version of Eworks was released on July 5, 1994, with limited payload monitoring functionality. Eworks is currently being run in parallel with the existing payload monitoring software, which requires operator intervention during the realtime passes on a 24-hour-a-day basis.

2. Eworks Architecture
Eworks is configured and being developed on a network of Unix workstations and servers in a distributed environment. Eworks is a collection of programs built upon Talarian Corporation's RTworks realtime data monitoring tools. The programs include the following modules:

- euvertdaq Data Acquisition module

- euvertie Inference Engine module

- euverthci Graphical Human-computer Interface module

- euverttr Telemetry Reception module

- rtserver RTworks Data Server module

The Eworks architecture is shown in Figure 1. The data acquisition module reads telemetry as it is received at the ESOC, extracts the engineering health information from it, and passes that on to the data server, which makes it available to the inference engine, human-computer interface, and telemetry reception watchdog modules.

The inference engine module monitors the health information using rules written here at the CEA and pages an ACE if any problem conditions are found.

The two telemetry reception modules watch for prolonged absences of telemetry. Presently, one module pages the ACE if the ESOC has received no telemetry for longer than six consecutive hours, and the other pages the ACE to prepare to shut down the payload if the absence of telemetry exceeds sixteen hours.
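The two thresholds can be illustrated with a short Python sketch. It folds both watchdogs into a single function for brevity, whereas Eworks runs them as separate modules. The function name and the returned action strings are assumptions made for the example, not part of Eworks.

```python
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the text: six and sixteen hours without telemetry.
PAGE_AFTER = timedelta(hours=6)
SHUTDOWN_PREP_AFTER = timedelta(hours=16)


def watchdog_action(last_telemetry: datetime, now: datetime) -> Optional[str]:
    """Return the (hypothetical) paging action for the current telemetry gap."""
    gap = now - last_telemetry
    if gap > SHUTDOWN_PREP_AFTER:
        return "page_ace_prepare_shutdown"
    if gap > PAGE_AFTER:
        return "page_ace_no_telemetry"
    return None  # telemetry is recent enough; nothing to do
```

Both comparisons are strict, matching "longer than six consecutive hours" and "exceeds sixteen hours" in the description above.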

The human-computer interface module can be used to display graphical representations of the state of the payload.

These modules use network protocols to communicate with each other and so can be run on separate machines if desired.

Figure 1: Eworks Architecture

III. Stress Test Environment
Testing and evaluation of the released Eworks is in progress in both the development and operational environments. The test data is compiled primarily from the ESOC logbooks and the data discrepancy logs (DDLs) and classified as follows:

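Type of Anomaly   Description
-------------------------------------------------------------------------------------------
seu               Single Event Upset (cosmic ray hits to payload electronics)
ool               Payload monitors out of limits
eng               Engineering Test Data (e.g., high voltage power supply or detector test)
obs               Special Target Observations (e.g., moon, bright targets)
others            Other Anomalies (e.g., power shed, clam-up)
-------------------------------------------------------------------------------------------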

Each test data set is compiled with a complete description of the anomaly including the time the anomaly occurred and the related errors seen in the telemetry data. This is accomplished through analyzing the anomaly with the corresponding flowcharts.

To determine the number of errors expected, we used a combination of the anomaly or test descriptions, the logbooks, and data products obtained by decommutating the data with the End-to-End System software (EES). We used that information in conjunction with the flowcharts to arrive at the expected count.

In some cases our knowledge and experience were enough to determine the number of expected errors from the description of the anomaly. In others, reading the logbooks enabled us to determine exactly what happened, and we could again rely on experience. For the more complex problems, the logs simply did not contain enough specific information, and we had to examine data products such as limit transitions and command echoes to make the determination. The number of errors expected is a function of the software being tested and could increase as the software becomes more sophisticated.
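As a rough illustration of tallying expected errors from limit transitions, the Python sketch below counts each monitor once, even when both its yellow and red limits were violated, which is the convention stated in the notes to Table 1. The input format, a list of (monitor, level) pairs, is an assumption made for the example and is not the actual EES data product format. Counting a monitor once per data set, rather than once per excursion, is also an assumption.

```python
from typing import Iterable, Tuple


def count_expected_errors(transitions: Iterable[Tuple[str, str]]) -> int:
    """Count monitors that went out of limits, one error per monitor.

    `transitions` holds (monitor_name, level) pairs, where level is
    'yellow' or 'red' for a violation; any other level is ignored.
    """
    violated = {
        monitor
        for monitor, level in transitions
        if level in ("yellow", "red")
    }
    return len(violated)


# Example: one monitor crossing yellow and then red counts once.
sample = [("Det3Hopr", "yellow"), ("Det3Hopr", "red"), ("Tif6IonC", "red")]
expected = count_expected_errors(sample)
```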

Eworks is tested with the realtime anomaly test data and cross checked with the production data covering the same time period when the anomalies occurred. The important things that we looked for in this first phase of stress testing were the following:

- whether a particular anomaly that occurred at a specific major frame was detected by Eworks and Eworks accurately displayed the corresponding error and alert messages

- whether a glitch in the realtime telemetry caused Eworks to stop functioning

- whether telemetry glitches triggered erroneous warnings

IV. Tasks in Progress
Testing of Eworks is in progress using normal telemetry and the anomaly data set compiled since the EUVE launch. A new flowchart developed for the payload configuration during special observations has been submitted to the EUVE scientists for approval. We are resolving the outstanding issues in the Eworks flowcharts for future implementation.

The changes made to the flowcharts have been submitted to the Hardware Scientist John Vallega for his approval. These changes involve:

- the identification of day, night, dawn and dusk crossings in the telemetry stream

- rate shutdown algorithm

- detector high voltages and differential temperature tolerances for the mirror and the focal plane

The following outstanding issues are being resolved:

- implementing external parameters that Eworks can reference for expected events (e.g., special observations and engineering tests) and anomalies within a specified time frame

- defining and refining the ESOC's activities involved in the simulation run over the September and October 1994 period
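One possible shape for the external parameters mentioned in the first item is a list of expected-event windows that the rules consult before raising an alert. The Python sketch below is purely illustrative. The `ExpectedEvent` structure, its field names, and the `is_expected` helper are hypothetical and do not describe how Eworks implements or will implement this feature.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ExpectedEvent:
    """Hypothetical record for a scheduled special observation or test."""
    start: datetime
    end: datetime
    monitors: FrozenSet[str]  # monitors allowed to be off-nominal


def is_expected(monitor: str, when: datetime,
                events: Iterable[ExpectedEvent]) -> bool:
    """True if `monitor` being off-nominal at `when` falls in a known window."""
    return any(
        e.start <= when <= e.end and monitor in e.monitors
        for e in events
    )
```

A rule could then skip paging the ACE when `is_expected(...)` returns `True` and still log the condition for later review.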

V. Tasks Planned for the Upcoming Months
The telemetry team will be continuing with the regression testing of Eworks. The test environment and testing procedures will be updated based on lessons learned through the first phase of Eworks testing. The TBD (To be determined) sections of the existing flowcharts will be updated as we become more knowledgeable in dealing with difficult cases. Additional flowcharts for the remaining payload subsystems will be developed in the upcoming months.

VI. Test Results
1. Stress Testing in the Development Environment
The results from stress testing Eworks with anomaly data are compiled, analyzed and presented in Table 1. The first column gives the date (year, month, day) on which the event occurred, followed by a keyword for the type of anomaly data used in testing Eworks (see the anomaly type descriptions in Section III).

The third column gives the total number of errors Eworks is expected to detect, compiled by analyzing the events with the EES software and the flowcharts. The fourth column gives the number of errors detected by the current version of Eworks. In some of the tests the number of errors detected is greater than the number expected. This is due to the tight tolerances in some of the limit constraints.

We found two cases in which Eworks did not catch instrument settings that changed for the testing of different configurations for special observations. This error has been corrected by introducing a new "Special Observation" flowchart to deal with special configurations.
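The comparison between the expected and detected columns of Table 1 can be mechanized along the following lines. This Python sketch is an illustration only. The function names and result labels are assumptions, and the footnote markers (`*`, `#`, `@`) are simply stripped before comparing, so a row whose expected count is flagged as a lower limit may legitimately show more detections.

```python
from typing import Optional


def parse_count(cell: str) -> Optional[int]:
    """Parse a Table 1 count cell such as '10*', '6@', '0#', or 'IP'."""
    cell = cell.strip()
    if cell == "IP":  # testing in progress: no number yet
        return None
    return int(cell.rstrip("*#@"))


def classify(expected: str, detected: str) -> str:
    """Label one Table 1 row by comparing expected and detected counts."""
    exp, det = parse_count(expected), parse_count(detected)
    if exp is None or det is None:
        return "in progress"
    if det > exp:
        return "more than expected"
    if det < exp:
        return "fewer than expected"
    return "match"
```

Applied to rows from Table 1, `classify("10*", "25")` gives "more than expected", `classify("6@", "5")` gives "fewer than expected", and `classify("3@", "3")` gives "match".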

(Notes on Table 1, which follows)

If any monitor went out of limits and the EES software reported a violation of both yellow and red limits, only one error was counted.

* We decommutated the file associated with the archiveID number in which the event occurred. This number is a lower limit.

# For the SEUs detected by the dumpallram that are listed here, the ISW that would implement the check for whether the CDP was patched had not been installed, so no errors are expected.

@ Hopr should be out of limits when detectors are at half voltage during orbital night.

IP Indicates testing in progress

Table 1: Stress Testing Anomalies
-------------------------------------------------------------------------------------------
Date.Type       Description                                       # of Errors   # of Errors
                                                                  Expected      Detected
-------------------------------------------------------------------------------------------
920814.eng      daytime data; detectors on in day                 0             0
930324.eng      WSZ Test                                          0             0
930601.eng      WSZ Test                                          0             0
930711.eng      WSZ Test                                          0             0
930731.eng      WSZ Test                                          0             0
940312.eng      WSZ Test                                          0             0
940327.eng      WSZ Test                                          0             0
940330.eng      WSZ Test                                          0             0
940429.eng      WSZ Test                                          0             0
940507.eng      HV Test                                           10            IP
931112.ool      Tif6IonC red for a few frames                     0             0
931215.ool      Tif4+Rf under red for a few secs                  1             IP
931116.others   Power shed clam-up                                15            IP
920820.seu      CDP reset, TIFs, RIFs were off                    10*           25
920927.seu      Detector 5 at 50% voltage                         1             IP
921218.seu      Tif7 reset                                        3             7
930322.seu      CDP reset, TIFs, RIFs off, Sunsensors disabled    9*            50
931107.seu      Clam-up during EOR uplink in R/T                  14*           IP
931124.seu      CDP reset, Payload in TO mode, occurred in R/T    3*            IP
940110.seu      SEU detected by dumpallram                        0#            IP
940218.seu      SEU detected by dumpallram                        0#            IP
940411.seu      CDP reset, TIFs, RIFs off, Sunsensors disabled    16*           IP
931103.obs      Reconfigured for moon (1-3+7 off)                 6@            5
931115.obs      Det 1,2,3 off                                     3@            3
931125.obs      Det 1,2,3 off for Feige 24                        6@            IP
931129.obs      det 1-6 off, allwsz, det7 on and blanked          12@           IP
931203.obs      Moon (1-3+7 off)                                  8@            IP
931205.obs      lower voltage for H 1504 (U Gem?) obs (1-3 off)   6@            IP
931219.obs      det 3 off for Beta CMa obs                        2@            IP
931221.obs      det 1,2, and 7 off                                6@            IP
931224.obs      det1,2,3,7 off for moon, shutdown rates chg       8@            IP
931229.obs      two rt passes/det 3,7 off at end                  4@            IP
931230.obs      det 3,7 off                                       4@            IP
940129.obs      Detector 1,2,3,7 off, rate shutdown changed       8@            IP
-------------------------------------------------------------------------------------------
2. Testing in the Operational Environment
In addition to the Eworks stress testing in the development environment, testing is also done in the operational environment where we run Eworks in parallel with the existing payload monitoring software called SOCTOOLS.

When Eworks detects an anomaly, the SOCTOOLS displays are used to verify that the detection is correct. Eworks problems encountered in this environment are recorded in the ESOC data discrepancy log (DDL).

Table 2 lists the ESOC DDLs relating to Eworks problems, with a description of each problem and the status of its resolution.

The DDL # is the data discrepancy log number assigned to the problem, and the EPR # is the EUVE Problem Report number assigned to a problem that needs further analysis. A status of "Open" indicates that the problem is under investigation; "Closed" indicates that the problem has been solved and the resolution is implemented in the next version.

Table 2: DDL Report
-----------------------------------------------------------------------------------
DDL #  Date  Status  EPR #  Description
-----------------------------------------------------------------------------------
1119   7/07  Closed         Received errors in Eworks at beginning of pass.
                     1382   CDPclock did not update as expected
                     1383   ccnt changed 48>64 but there were no command echoes
1120   7/07  Closed         Eworks errors during Frank's shift
                     1384   RTdaq: "Unable to keep up"
                     1385   RThci: "Type mismatch..."
                     1386   RTie: no license
                     1387   rttr.out: undefined symbol
1123   7/08  Closed         Eworks errors relating to DDL #1120
1124   7/08  Closed         Eworks errors relating to DDL #1119 and #1120
1126   7/09                 Eworks errors during Paul's shift
             Open    1388   "Det6HVlt wandering"
             Closed  1389   "Failure to achieve Rate Shutdown"
1127   7/09  Closed         Eworks errors relating to DDL #1126
1128   7/10  Closed  1390   ADct 2 zero for 3 frames
1129   7/09  Closed         Eworks shows Det1 ADct 0 for 3 frames
                            Error relating to DDL #1128
1133   7/16  Closed  1398   Eworks script not creating working directory
1134   7/17  EPR     1397   Eworks shows "Det3HVlt wander" warning when
                            Det 3 HV command sent
1135   7/17  Closed  1395   Det3Hopr violating LOWER YELLOW limit
             EPR     1396   Rate shutdown change
1138   7/20  Closed  1403   Tel3 violating thermal control
1139   7/24  Closed         Rif1Cur and CDPPwrV showed a sad face state
                            but the status flag showed green
1141   7/27  Closed  1407   RTworks did not process data for R/T pass
                            euvertdaq exited during a telemetry glitch
1143   8/03  Closed  1410   Could not bring up eworks on the SOC net
1144   8/04  Closed         Unable to kill Eworks in a directory not writable by me
                            Action: this has been fixed in a new version
1151   8/09  Open           Eworks did not notice that Det5RSrt or Det7Blnk
                            are out of limits
1152   8/10  Closed         RTworks failed to process R/T data
                            Error relating to DDL #1141
-----------------------------------------------------------------------------------
3. User Interface Evaluation
In addition to the systematic testing of Eworks, ESOC personnel are evaluating the functionality of the Eworks user interface. The following feedback has been compiled:

1. It would be very useful to display all the monitors with the last 10 or 15 frames of data so that one can look at how a monitor reading has been progressing for the past 15 seconds. Currently, only the power system has this feature.

2. Color shade the limits in the plots to give an idea of the monitor's readings relative to the limit boundaries.

3. As a diagnostic tool, it would be useful to be able to bring up the flowchart associated with the monitor in trouble and have the path to the troubled node in the flowchart highlighted. This would allow us to quickly obtain useful information for speedy corrective action.

4. It would be useful to have the ability to bring up a graphical representation of the subsystem in trouble and highlight the component associated with the anomaly.
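The first request, keeping the last 10 or 15 frames of every monitor available for display, maps naturally onto a fixed-length buffer. The Python sketch below shows one way this could be done. The `MonitorHistory` class is hypothetical and is not part of the Eworks or RTworks interface.

```python
from collections import deque
from typing import Deque, List


class MonitorHistory:
    """Keep only the most recent readings of one monitor (hypothetical helper)."""

    def __init__(self, max_frames: int = 15) -> None:
        # A deque with maxlen drops the oldest reading automatically.
        self._frames: Deque[float] = deque(maxlen=max_frames)

    def add(self, value: float) -> None:
        """Record the reading from the newest frame."""
        self._frames.append(value)

    def recent(self) -> List[float]:
        """Return the stored readings, oldest first."""
        return list(self._frames)
```

A display could hold one `MonitorHistory` per monitor and plot `recent()` whenever a new frame arrives.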

VII. Summary
Based on the test results compiled so far, Eworks detects most of the anomalies in the test data at the correct time. A few of the constraints were missed due to special cases not considered in our first flowchart development. We will continue optimizing the flowcharts as testing and use of Eworks progress.

Glossary

ACE Anomaly Coordinator for the ESOC

CDP Command Data and Power interface unit

DDL Data Discrepancy Log

EPR EUVE Problem Report

ESOC EUVE Science Operations Center

EUVE Extreme UltraViolet Explorer

EES End-to-End System software; design analysis tool for the EUVE project

Hopr Programmed High Voltage

ISW Instrument SoftWare

RIF Relay InterFace unit

SEU Single Event Upset

TBD To be determined

TIF Telescope InterFace unit