This page will also be used to communicate important information during dry-runs. To be notified of any changes to this page, please log in and watch the page using the button in the upper right.

What to expect during a dry-run

Dry-runs are system sessions that simulate how the DHS will behave during the DHS Move. During a dry-run, read access to the main DHS in Reading is closed. The essential data needed to continue all critical activities during the move is instead provided by a temporary DHS in Bologna.

Archive/Store operations will not be impacted and will continue as normal. Data archived during this period will be retrievable as normal.

At the start of the system session, all active/queued MARS retrieve requests and ECFS get operations on the DHS will fail. This is needed to switch MARS and ECFS to the dry-run configuration. After such a failure, we encourage users to re-run their requests/scripts once to experience the behaviour expected during the DHS Move:

  • Requests for essential/recent data should be satisfied from the temporary DHS.
  • Requests for historical data should fail with an explanatory message. We encourage users to wait until the system session has ended before re-running their requests.
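
The re-run-once guidance above can be sketched in Python as follows. This is a minimal illustration, not an ECMWF-provided tool: `rerun_once`, `mars_runner` and the outcome labels are hypothetical names. `mars_runner` assumes a `mars` client on the PATH that takes a request file as its argument and exits with a non-zero status on error.

```python
import subprocess
from typing import Callable, Tuple

# A runner executes one request and returns (exit code, combined log output).
Runner = Callable[[], Tuple[int, str]]


def rerun_once(run_request: Runner) -> str:
    """Run a request and, if it fails, re-run it exactly once.

    Returns "done" if either attempt succeeds, or "defer" if both fail,
    meaning the request should wait until the system session has ended.
    """
    for _attempt in range(2):
        exit_code, _log = run_request()
        if exit_code == 0:
            return "done"
    return "defer"


def mars_runner(request_file: str) -> Runner:
    """Build a runner around the MARS client (assumed: `mars <file>` on PATH)."""
    def run() -> Tuple[int, str]:
        proc = subprocess.run(["mars", request_file], capture_output=True, text=True)
        return proc.returncode, proc.stdout + proc.stderr
    return run
```

Under these assumptions, `rerun_once(mars_runner("request.txt"))` gives up after the second failure instead of looping, which matches the advice to wait for the end of the session before trying again.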

After the system session is over, user work can resume as normal.

Timetable of System Session on  

| Time (UTC) | Event | Impact | User action |
|---|---|---|---|
| 8:00 | Change configuration of all services using MARS/ECFS to point to the temporary DHS in Bologna | | Check service status |
| 8:00 - 8:30 | Drain all MARS/ECFS activity in DHS infrastructure in Reading | All active/queued MARS retrieve requests and ECFS get operations on the DHS in Reading will fail. | Please re-run to test behaviour expected during DHS Move. |
| 8:30 - 17:30 | DHS Move dry-run mode | Behaviour as expected during the DHS Move: only essential/recent data will be available | If the data you request is not available, wait until the end of the System Session before re-running failed requests. |
| 18:00 | Revert to use DHS infrastructure in Reading | Short interruption. | Please re-run all failed requests/scripts. |
| 18:00 - | | Longer than usual turn-around time should be expected until back-log clears | Check service status |
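
For scripts that need to decide whether to submit or hold requests, the timetable above can be encoded as a small lookup. This is a sketch only: the function name and phase labels are illustrative, and the times are taken from the timetable. The 17:30 to 18:00 interval is labelled "unspecified" because the timetable does not describe it. The caller is assumed to pass a UTC time on the day of the system session.

```python
from datetime import time


def session_phase(utc_time: time) -> str:
    """Map a UTC time of day on the session date to a phase of the dry-run."""
    if utc_time < time(8, 0):
        return "normal"        # before the configuration change
    if utc_time < time(8, 30):
        return "draining"      # active/queued retrieves on Reading fail
    if utc_time < time(17, 30):
        return "dry-run"       # only essential/recent data available
    if utc_time < time(18, 0):
        return "unspecified"   # not covered by the timetable
    return "reverted"          # back on Reading, backlog may slow turn-around
```

A job could, for instance, hold requests for historical data while `session_phase` returns "dry-run" and release them once it returns "reverted".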

Log output samples of execution at various stages

When draining MARS activity in Reading

mars - INFO   - 20220331.170650 - Server task is 809 [marsod-core]
mars - ERROR  - 20220331.170650 - Mars server task finished in error
mars - ERROR  - 20220331.170650 - UserError: DHS infrastructure in Reading disabled. Please, re-run your request. If you see this message again for the same request, please, raise a ticket [marsod-core]
mars - ERROR  - 20220331.170650 - Error code is -2


When requesting data available in temporary DHS in Bologna

mars - INFO   - 20220331.165631 - Server task is 839 [temporary-dhs-prod]
mars - INFO   - 20220331.165631 - Request cost: N fields, BBBBBBBB Mbytes online, nodes: mvr007 [temporary-dhs-prod]
mars - INFO   - 20220331.165631 - Transfering BBBBBBBBB bytes
mars - INFO   - 20220331.165631 - N fields retrieved from 'temporary-dhs'


When requesting data not available during the DHS Move

mars - INFO   - 20220331.144749 - Server task is 614 [bologna-marsod-blank]
mars - ERROR  - 20220331.144749 - Mars server task finished in error
mars - ERROR  - 20220331.144749 - UserError: This data exists but will be unavailable during the DHS Move. For more information see https://confluence.ecmwf.int/x/jSKADQ [bologna-marsod-blank]
mars - ERROR  - 20220331.144749 - Error code is -2
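
The three samples above can be told apart programmatically by matching on their message text. The following is a minimal sketch, assuming log lines keep the `mars - LEVEL - YYYYMMDD.HHMMSS - message` layout and the exact wording shown in the samples. The function names and the returned labels ("rerun", "wait", "report", "ok") are hypothetical and not part of MARS.

```python
import re
from datetime import datetime
from typing import Optional

# Layout seen in the samples: "mars - LEVEL - YYYYMMDD.HHMMSS - message"
LOG_LINE = re.compile(
    r"^mars - (?P<level>[A-Z]+)\s+- (?P<stamp>\d{8}\.\d{6}) - (?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into level, timestamp and message (None if no match)."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "level": match["level"],
        "time": datetime.strptime(match["stamp"], "%Y%m%d.%H%M%S"),
        "message": match["message"],
    }


def classify(log_text: str) -> str:
    """Suggest a next step based on the ERROR messages found in the log."""
    records = [r for r in map(parse_line, log_text.splitlines()) if r]
    errors = " ".join(r["message"] for r in records if r["level"] == "ERROR")
    if "DHS infrastructure in Reading disabled" in errors:
        return "rerun"   # drained at session start: re-run once
    if "will be unavailable during the DHS Move" in errors:
        return "wait"    # historical data: retry after the session
    return "report" if errors else "ok"
```

Matching on message text is fragile if the wording changes, so any error that is not recognised falls through to "report" and is never silently ignored.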


Available tools  

MARS

ECFS

Add Relevant user information/commands re temporary DHS for ECFS

How to report a problem

If your activity is critical and we have failed to identify it as such, please raise a support ticket (see below) and we will look at your specific requirements.

If your request/command does not behave as described above, please provide as much information as possible so that analysts can investigate and reproduce the problem, for example:

  • Service/Tool you are using (mars client, ecfs, metview, web api, verify, etc.)
  • Version of the tool
  • Host (such as ecgb, ATOS, cca or your workstation) and the environment under which it runs (interactive, batch, ...)
  • Log output produced by MARS/ECFS, including the request
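
The items above can be gathered into a text block ready to paste into a ticket. The sketch below is illustrative only: `collect_report` and `format_report` are hypothetical helpers. Batch detection assumes the scheduler sets `SLURM_JOB_ID` (Slurm) or `PBS_JOBID` (PBS), which may not cover every host. The tool name and version are passed in by the user, since no particular version flag is assumed here.

```python
import os
import platform
import sys

# Environment variables set inside Slurm and PBS batch jobs (assumed schedulers).
BATCH_VARS = ("SLURM_JOB_ID", "PBS_JOBID")


def collect_report(tool: str, version: str, log_output: str) -> dict:
    """Collect the tool, version, host and run environment for a ticket."""
    in_batch = any(var in os.environ for var in BATCH_VARS)
    interactive = sys.stdin is not None and sys.stdin.isatty()
    if in_batch:
        environment = "batch"
    else:
        environment = "interactive" if interactive else "non-interactive"
    return {
        "tool": tool,
        "version": version,
        "host": platform.node(),
        "environment": environment,
        "log": log_output,
    }


def format_report(report: dict) -> str:
    """Render the collected details as plain text for the ticket body."""
    lines = [f"{key}: {report[key]}" for key in ("tool", "version", "host", "environment")]
    lines += ["log output (including the request):", report["log"]]
    return "\n".join(lines)
```

Calling `print(format_report(collect_report("mars client", "<version>", log_text)))` from the same session that produced the failure captures the host and environment as they were when the problem occurred.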

Please report all issues via our Support Portal, mentioning "dhs dry-run" in the title of your computing problem ticket. 


