Set up an Elastic Machine Group

Set up an Elastic Machine Group#

The ElasticMachineGroup, similarly to the standard MachineGroup, is composed of a group of homogeneous machines that work individually to run multiple simulations. The difference is that the number of active machines is scaled up and/or down automatically based on the simulations in the queue. This prevents computational resources from being idle when there are no sufficient simulations to run and allows scaling up the computational resources when the queue is full.

Note that, the elasticity is independent of each machine being preemptible, i.e., these can be combined and Inductiva API manages the simulations accordingly.

Users can configure the resource with the minimum number of active machines they want to always be active, min_machines, and the maximum number of machines that can be active at all times, max_machines.

Let’s we start an ElasticMachineGroup object:

import inductiva

# Configure an elastic machine group to start with a minimum of 1 machine up to a
# maximum of 3, each with a data disk of 30 Gb.
elastic_machine_group = inductiva.resources.ElasticMachineGroup(
    machine_type="c2-standard-30", min_machines=1,
    max_machines=3, data_disk_gb=30)

# Launch the Elastic machine group to make it available to run simulations:
elastic_machine_group.start()

Once started, simulations can be submitted to the queue of the elastic machine group.

To explore our elastic machine group, we will follow the example of running multiple simulations in parallel, based on the templating mechanism built in the Inductiva API, but now with a scalable infrastructure.

import inductiva

# Download the input files for the SWASH simulation
template_dir = inductiva.utils.download_from_url(
    "https://storage.googleapis.com/inductiva-api-demo-files/"
    "swash-template-example.zip", unzip=True)

# Initialize the SWASH simulator
swash = inductiva.simulators.SWASH()

# Explore the simulation for different water levels
water_levels_list = [3.5, 3.75, 4.0, 4.5, 5.0]

# Launch multiple simulations
for i, water_level in enumerate(water_levels_list):
    target_dir = f"./inductiva_input/swash-sim-{i}"  
    inductiva.TemplateManager.render_dir(
                            source_dir=template_dir,
                            target_dir=target_dir,
                            water_level=water_level,
                            overwrite=False)

    # Run the simulation on the dedicated MachineGroup
    task = swash.run(input_dir=target_dir,
                    sim_config_filename="input.sws",
                    on=elastic_machine_group)

As our simulations are submitted to the queue of the elastic machine group, we will use the CLI to check the status of our simulations and the scaling of the resources.

TIP 1 Open two terminal windows to monitor both the simulations and resources simultaneously.

TIP 2 With the watch method available in some OS’s, one can repeatedly apply the CLI commands to monitor the tasks and the resources.

To track the active resources we use

$ watch inductiva resources ls

and for our last 5 tasks submitted, we tracked them with

$ watch inductiva tasks list -n 5

Right after launch, the elastic machine group starts with a single active machine and picks up one simulation:

Resources monitoring

Every 2.0s: inductiva resources ls

Active Resources:

       NAME                                MACHINE TYPE         ELASTIC         TYPE           # MACHINES         DATA SIZE IN GB         SPOT         STARTED AT (UTC)
       api-t83rivz4yeh29k3hy4dofa38d       c2-standard-30       True            standard       1/3                30                      False        07 Feb, 11:47:20

Tasks monitoring

Every 2.0s: inductiva tasks list                                                                                                                   
            
       ID                              SIMULATOR         STATUS          SUBMITTED              STARTED                COMPUTATION TIME         RESOURCE TYPE
       r4kerxf4b53krgn0s3fyece3b       swash             submitted       07 Feb, 11:47:49       n/a                    n/a                      n/a
       j9qzrpiohgt7x97od3tw4wccd       swash             submitted       07 Feb, 11:47:48       n/a                    n/a                      n/a
       iqi71gonoacfj7fknox3rvnq2       swash             submitted       07 Feb, 11:47:46       n/a                    n/a                      n/a
       dxmnxdrfrv84pfbzbvm9v0dat       swash             submitted       07 Feb, 11:47:44       n/a                    n/a                      n/a
       bgtwgnnyq5qa5hecegzdx6okr       swash             started         07 Feb, 11:47:42       07 Feb, 11:48:08       *0:01:50                 c2-standard-30

Notice that on the tasks, there are 4 simulations submitted and waiting on the queue of the elastic machine group. With the awareness of a non-empty queue, after 2 minutes another machine becomes active and after 4 minutes, the elastic machine group is fully active and three tasks are running simultaneously:

Resources monitoring

Active Resources:

       NAME                                MACHINE TYPE         ELASTIC         TYPE           # MACHINES         DATA SIZE IN GB         SPOT         STARTED AT (UTC)
       api-t83rivz4yeh29k3hy4dofa38d       c2-standard-30       True            standard       3/3                30                      False        07 Feb, 11:47:20

Tasks monitoring

Every 2.0s: inductiva tasks list                                                                                                     Ivans-MacBook-Air.local: Wed Feb  7 11:54:12 2024


       ID                              SIMULATOR         STATUS          SUBMITTED              STARTED                COMPUTATION TIME         RESOURCE TYPE
       r4kerxf4b53krgn0s3fyece3b       swash             submitted       07 Feb, 11:47:49       n/a                    n/a                      n/a
       j9qzrpiohgt7x97od3tw4wccd       swash             submitted       07 Feb, 11:47:48       n/a                    n/a                      n/a
       iqi71gonoacfj7fknox3rvnq2       swash             started         07 Feb, 11:47:46       07 Feb, 11:52:44       *0:01:31                 c2-standard-30
       dxmnxdrfrv84pfbzbvm9v0dat       swash             started         07 Feb, 11:47:44       07 Feb, 11:50:27       *0:03:50                 c2-standard-30
       bgtwgnnyq5qa5hecegzdx6okr       swash             started         07 Feb, 11:47:42       07 Feb, 11:48:08       *0:06:10                 c2-standard-30

Now, as the tasks finish running, the still active machine pick-up another task from the queue until all are completed. When all are complete the elastic machine group will start scaling down until it stays with the min_machines, in our case, 1 machine.

Tasks monitoring

Every 2.0s: inductiva tasks list                                                                                        

       ID                              SIMULATOR         STATUS         SUBMITTED              STARTED                COMPUTATION TIME         RESOURCE TYPE
       r4kerxf4b53krgn0s3fyece3b       swash             success        07 Feb, 11:47:49       07 Feb, 12:00:55       0:10:29                  c2-standard-30
       j9qzrpiohgt7x97od3tw4wccd       swash             success        07 Feb, 11:47:48       07 Feb, 11:58:10       0:10:03                  c2-standard-30
       iqi71gonoacfj7fknox3rvnq2       swash             success        07 Feb, 11:47:46       07 Feb, 11:52:44       0:10:02                  c2-standard-30
       dxmnxdrfrv84pfbzbvm9v0dat       swash             success        07 Feb, 11:47:44       07 Feb, 11:50:27       0:10:20                  c2-standard-30
       bgtwgnnyq5qa5hecegzdx6okr       swash             success        07 Feb, 11:47:42       07 Feb, 11:48:08       0:09:54                  c2-standard-30

Resources monitoring

Every 2.0s: inductiva resources ls                                                                                         

Active Resources:

       NAME                                MACHINE TYPE         ELASTIC         TYPE           # MACHINES         DATA SIZE IN GB         SPOT         STARTED AT (UTC)
       api-t83rivz4yeh29k3hy4dofa38d       c2-standard-30       True            standard       3/3                30                      False        07 Feb, 11:47:20

The elastic machine group plays the trade-off between the idle time of resources and the queue time of the simulations without extra configuration by the user.

At the end, there will always be the number of min_machines active, therefore, don’t forget to terminate your resources with:

$ inductiva resources terminate --all -y