If you have time-consuming tasks, many simple tasks or a lot of files to process, you can speed up your work substantially by using a batch system on a distributed cluster, such as HybriLIT with SLURM, the NICA prototype cluster with SGE or MPD-Tier1 with Torque.
If you know how to work with SLURM (SLURM on HybriLIT), Sun Grid Engine (user guide) or Torque (doc index), you can use the sbatch or qsub command on the clusters to parallelize data processing. Simple examples of user jobs for SLURM, SGE and Torque can be found in the ‘macro/mpd_scheduler/examples/batch’ directory of our software. Otherwise, you can use MPD-Scheduler, which was developed to simplify running user tasks in parallel.

MPD-Scheduler is a module of the MpdRoot and BmnRoot software. It uses an existing batch system (SLURM, SGE and Torque are supported) to distribute user jobs on the cluster, so parallel jobs can be run without knowledge of the batch system itself. Jobs for distributed execution are described in an XML file that is passed to MPD-Scheduler:
$ mpd-scheduler my_job.xml

Example of MPD-Scheduler job:

<job name="reco_job">
   <macro path="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local">
    <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
    <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/>
    <file sim_input="energy=3,gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
   </macro>
   <run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
</job>

The XML description of a job to run in the batch system starts with the <job> tag and ends with the closing </job> tag. The ‘name’ attribute defines the name of the job, which is used to identify it.

The <macro> tag describes a macro to be executed by MpdRoot or BmnRoot:

  • path – path to the ROOT macro for distributed execution.
    Important! Since an MPD job can be started by the scheduler on any machine of the cluster, the path must point to shared space (e.g. /nica/mpd volumes on the NICA cluster). This argument is required.
  • start_event – number of the first event to process for all input files (passed as the macro’s argument after the input and output file paths, if present). This argument is optional.
  • count_event – number of events to process for all input files (passed as the macro’s argument after the start_event argument, if present). This argument is optional.
  • add_args – additional trailing arguments of the ROOT macro, if required.
  • name – conventional name of the macro, needed only if you want to use dependencies. This argument is optional and usually not used.

The <file> tag is nested inside the <macro> tag and describes the input and output files to be processed by the macro:

  • input – path to one input file, or to a set of files if a regular expression (?,*,+) is used.
  • file_input – path to a text file containing the list of input files, one per line.
  • job_input – name of one of the previous jobs whose resulting files are the input files for the current macro.
  • macro_input – name of one of the previous macros in the same job whose resulting files are the input files for the current macro.
  • sim_input – string that selects a list of input simulation files from the Unified database.
  • exp_input – string that selects a list of input experimental (raw) files from the Unified database.
  • output – path to the result files.
    Important! Since a user job can be started by MPD-Scheduler on any machine of the cluster, the path must point to shared space, e.g. an NFS volume available on all cluster nodes.
  • start_event – number of the first event to process, specific to this file. This argument is optional.
  • count_event – number of events to process, specific to this file. This argument is optional.
    Important! If start_event (or count_event) is set in both the <macro> and <file> tags, the value from the <file> tag is used.
  • parallel_mode – number of processors used to process the events of this input file in parallel. This argument is optional.
  • merge – whether to merge the partial result files produced in parallel_mode. The default value is “true”; possible values of the attribute are “false”, “true”, “nodel” and “chain”.
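
The precedence rule for start_event and count_event can be illustrated with a small sketch. The job name and the per-file values (500 and 200) are arbitrary choices for this example; the macro and file paths are the ones used in the example above.

```xml
<job name="override_job">
   <macro path="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000">
    <!-- no per-file values: the macro-level start_event="0" and count_event="1000" apply -->
    <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
    <!-- per-file values take precedence: 200 events are processed starting from event 500 -->
    <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root" start_event="500" count_event="200"/>
   </macro>
   <run mode="local" count="2" config="~/mpdroot/build/config.sh"/>
</job>
```

Setting the common values once in <macro> and overriding them only where needed keeps the file list short when most files share the same event range.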

An example of a sim_input string is “energy=3,gen=urqmd”, which selects events with a collision energy of 3 GeV produced by the UrQMD event generator. All possible parameters are described in the database utility (console) section. Since one <file> tag can correspond to many input files, special variables can be used in the output argument to define the list of output files:

${counter} = counter of the regular input file being processed; it starts with 1 and increases by 1 for each next input file.
${input} = absolute path of regular input file.
${file_name} = regular input file name without extension.
${file_name_with_ext} = regular input file name with extension.

You can also strip the first (‘~’ symbol after the colon) or last (‘-’ symbol after the colon) characters of the special variables above, e.g. ${file_name_with_ext:~N} gives the input file name with extension but without its first N characters, and ${file_name:-N} gives the input file name without its last N characters.
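
To illustrate the special variables, here is a minimal sketch of a job in which one <file> tag covers a set of input files. The job name, the ‘evetest*.root’ pattern and the ‘dst_’ output prefix are assumptions made for this example, and the sketch assumes that ‘*’ matches any run of characters in the file name.

```xml
<job name="reco_many_job">
   <macro path="$VMCWORKDIR/macro/mpd/reco.C">
    <!-- assumption: the pattern selects evetest1.root, evetest2.root, ... -->
    <!-- for the input evetest1.root, ${file_name} expands to "evetest1", -->
    <!-- so the output file should be dst_evetest1.root -->
    <file input="$VMCWORKDIR/macro/mpd/evetest*.root" output="$VMCWORKDIR/macro/mpd/dst_${file_name}.root"/>
   </macro>
   <run mode="global" count="10" config="~/mpdroot/build/config.sh"/>
</job>
```

With ${file_name:~3} in place of ${file_name}, the same input should give dst_test1.root, since the first three characters of ‘evetest1’ are dropped.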

The <run> tag describes run parameters and the resources allocated for the job. It can contain the following arguments:

  • mode – execution mode. The value ‘global’ selects distributed processing on a cluster with a batch system; ‘local’ selects multi-threaded execution on the user’s multi-core computer. The default value is ‘local’.
  • count – maximum number of processors allocated for this job. If this number is greater than the number of files to process, the number of allocated processors is set to the number of files. The default value is 1. If you set the processor count to 0, all cores will be used in local mode, or all processors of the batch system will be requested in global mode.
  • config – path to a bash file with environment variables (including ROOT environment variables) that is executed before the macro. This argument is optional.
  • logs – log file path for multi-threaded (local) mode.
  • priority – priority of the job (an integer in the range -1023 to 1024; the default value is 0).
  • queue – name of the batch system queue to be used for running the job if it is not the default one, e.g. “mpd@bfsrv.jinr-t1.ru” for the LIT T1 cluster queue for MPD.
  • hosts – comma-separated host names on which to run the job, e.g. “nc15,nc17” to run the job only on the nc15.jinr.ru and nc17.jinr.ru nodes of the LHEP farm. This parameter is rarely used.
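
As an illustration of the <run> arguments, the sketch below sends a job to a specific queue with a non-default priority. The job name and the count and priority values are arbitrary choices for this example; the queue name is the one quoted in the list above.

```xml
<job name="reco_queue_job">
   <macro path="$VMCWORKDIR/macro/mpd/reco.C">
    <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
    <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/>
   </macro>
   <!-- count="10" is only an upper limit: two files are listed, -->
   <!-- so the allocation is reduced to 2 processors -->
   <run mode="global" count="10" priority="10" queue="mpd@bfsrv.jinr-t1.ru" config="~/mpdroot/build/config.sh"/>
</job>
```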

If you want to run a non-ROOT (arbitrary) command with MPD-Scheduler, use the <command> tag instead of <macro>. Its ‘line’ argument holds the command line to be executed by the scheduling system. Example of an MPD job with the <command> tag:

<job>
   <command line="get_mpd_prod energy=5-9"/>
   <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>

 
Another example of a job, this time for local multi-threaded execution:

<job>
   <macro path="~/mpdroot/macro/mpd/reco.C">
    <file input="~/mpdroot/macro/mpd/evetest1.root" output="~/mpdroot/macro/mpd/mpddst1.root" start_event="0" count_event="0"/>
    <file input="~/mpdroot/macro/mpd/evetest2.root" output="~/mpdroot/macro/mpd/mpddst2.root" start_event="0" count_event="1000" parallel_mode="5" merge="true"/>
   </macro>
   <run mode="local" count="6" config="~/mpdroot/build/config.sh" logs="processing.log"/>
</job>

An XML file for MPD-Scheduler can contain more than one job, i.e. a list of user jobs for distributed execution. In this case the <job> tags are enclosed in a common <jobs> tag. Dependencies can be set between different jobs, so that a job depending on another job will not start until the latter has finished. To set a dependency, assign the name of the other job to the ‘dependency’ attribute of the <job> tag. An MPD-Scheduler job can also contain more than one macro to execute: just use several <macro> tags in the job.
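
A minimal sketch of such a job list is shown below. The second job waits for the first one and takes its result files as input through the job_input attribute. The job name ‘ana_job’, the macro ‘my_analysis.C’ and the output file ‘ana1.root’ are hypothetical names introduced only for this illustration.

```xml
<jobs>
   <!-- first job: reconstruction, as in the examples above -->
   <job name="reco_job">
      <macro path="$VMCWORKDIR/macro/mpd/reco.C">
       <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
      </macro>
      <run mode="global" count="1" config="~/mpdroot/build/config.sh"/>
   </job>
   <!-- second job: does not start until reco_job has finished -->
   <job name="ana_job" dependency="reco_job">
      <!-- my_analysis.C is a hypothetical user macro -->
      <macro path="$VMCWORKDIR/macro/mpd/my_analysis.C">
       <file job_input="reco_job" output="$VMCWORKDIR/macro/mpd/ana1.root"/>
      </macro>
      <run mode="global" count="1" config="~/mpdroot/build/config.sh"/>
   </job>
</jobs>
```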
 
The “macro/mpd_scheduler/examples” directory of the software contains various XML examples for the scheduling system.

Attention! MPD-Scheduler uses the Sun Grid Engine batch system by default. To use another batch system, change the corresponding (second) line of the ‘macro/mpd_scheduler/src/batch_select.h’ file and recompile the software.

MPD-Scheduler is also described in an MS PowerPoint presentation. If you have any questions about MPD-Scheduler, please email gertsen@jinr.ru.