HGS Parallelization - Best Practices

Fully integrated hydrologic simulations, such as those performed with HydroGeoSphere (HGS), involve highly nonlinear processes, so computational efficiency becomes a critical issue. HGS was parallelized by Hwang et al. (2014) to overcome this challenge.

This post summarizes how to set up a parallel HGS simulation and covers some general best practices for running one.

Parallel Model Setup

  • HGS runs in serial mode by default
  • The level of parallelization is controlled by the parallelindx.dat file
  • The parallelindx.dat file is created the first time a particular model is simulated
Default parallelindx.dat file

  • Once created, the parallelindx.dat file can be edited to adjust the level of parallelization (Note: this must be done prior to starting a simulation, and the file must be in the simulation folder).
  • To adjust the level of parallelization, edit the first 3 inputs (Number_of_CPU, Num_Domain_Partitiong, and Solver Type)
  • Number of CPU - specifies the number of CPU threads to be used for a simulation (do not exceed the number available on the machine)
  • Number of Domain Partitions - specifies into how many pieces the domain will be divided (should be the same as Number of CPU)
  • Solver Type: 1 = serial   2 = parallel
Sample parallelindx.dat file for parallelization with 6 CPUs
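
The three rules above can be checked before a run with a small script. The following is a minimal Python sketch, not part of HGS: the function name `check_parallel_settings` and its arguments are hypothetical, and it only validates the three values you intend to enter. It does not read or write parallelindx.dat itself.

```python
import os


def check_parallel_settings(num_cpu, num_partitions, solver_type, available=None):
    """Return a list of problems found in the three parallelization inputs.

    Hypothetical helper: it mirrors the rules listed in this post and does
    not touch the parallelindx.dat file.
    """
    if available is None:
        # Logical CPU threads reported by the operating system
        available = os.cpu_count() or 1

    problems = []
    if num_cpu < 1:
        problems.append("Number of CPU must be at least 1")
    if num_cpu > available:
        problems.append(
            f"Number of CPU ({num_cpu}) exceeds the {available} threads available"
        )
    if num_partitions != num_cpu:
        problems.append("Number of Domain Partitions should equal Number of CPU")
    if solver_type not in (1, 2):
        problems.append("Solver Type must be 1 (serial) or 2 (parallel)")
    return problems


if __name__ == "__main__":
    # Settings for a 6-CPU parallel run, checked against this machine
    for message in check_parallel_settings(6, 6, 2):
        print("WARNING:", message)
```

An empty list means the values are consistent with the rules in this post. It does not guarantee that the simulation itself will run.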

General Rules of Thumb and Best Practice

  • Test the model in serial mode to make sure that it starts properly before testing parallel mode
  • Use an even number of CPUs and Domain Partitions (e.g., 2, 4, 6, 8, ...)
  • Use fewer cores than you have available on your computer
  • Note: parallel efficiency degrades with increasing CPU count. Optimum parallel efficiency has been found to occur around 100,000 nodes per CPU.
  • It is a good idea to profile your parallel efficiency prior to setting up production runs. This can be done by running the model using different numbers of CPUs and assessing run time.
  • Once the optimal level of parallelization has been identified, production runs should be performed using this parallelization level for optimal efficiency
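
The rules of thumb above can be turned into a quick planning and profiling aid. The Python sketch below is illustrative only: the function names are hypothetical, the default of 100,000 nodes per CPU is simply the figure quoted above, and the run times in the example are placeholders rather than measured HGS results. Efficiency is computed with the usual definition, speedup divided by CPU count, where speedup is serial run time divided by parallel run time.

```python
def suggest_cpu_count(n_nodes, available_cores, nodes_per_cpu=100_000):
    """Suggest a starting CPU count from the rules of thumb in this post.

    Aims for about `nodes_per_cpu` nodes per CPU, stays below the number of
    cores on the machine, and prefers an even count. Returns 1 (serial) for
    small models. This is a starting point for profiling, not a final answer.
    """
    target = max(1, n_nodes // nodes_per_cpu)
    cap = max(1, available_cores - 1)  # use fewer cores than are available
    count = min(target, cap)
    if count > 1 and count % 2 == 1:
        count -= 1  # round down to an even number
    return count


def parallel_efficiency(run_times):
    """Compute speedup and efficiency from profiling runs.

    `run_times` maps CPU count to wall-clock seconds and must include the
    serial run under key 1. Returns {cpus: (speedup, efficiency)}.
    """
    serial = run_times[1]
    table = {}
    for cpus, seconds in sorted(run_times.items()):
        speedup = serial / seconds
        table[cpus] = (speedup, speedup / cpus)
    return table


if __name__ == "__main__":
    print("Suggested CPUs:", suggest_cpu_count(600_000, available_cores=8))

    # Placeholder timings in seconds; replace with your own measured run times
    timings = {1: 1200.0, 2: 660.0, 4: 400.0}
    for cpus, (speedup, eff) in parallel_efficiency(timings).items():
        print(f"{cpus} CPU: speedup {speedup:.2f}, efficiency {eff:.0%}")
```

Comparing the efficiency column across CPU counts shows where adding more CPUs stops paying off for a given model.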

 

The following video walks through running an HGS model in parallel mode.