Testing the Impact of Hyperparameters#

We will now take a deeper look at one of the hyperparameters of our simulation that matters a great deal, because it influences both the fidelity of the SPH simulation and the associated computational cost: the particle radius. We need to find a value for this hyperparameter that produces a dataset with characteristics similar enough to the one used by Sanchez-Gonzalez et al., while still keeping our costs under control.

So, to be more systematic, we are once again going to use Inductiva’s templating mechanism to replace the numerical value of the particle radius in the .json configuration file with a templated variable.

Here’s what our templated configuration file looks like, where we’ve kept all hyperparameters fixed except for the particle radius:

"Configuration": {
    "stopAt": 1,
    "timeStepSize": 0.001,
    "particleRadius": {{ particle_radius | default(0.008) }},
    "simulationMethod": 4,
    "boundaryHandlingMethod": 0,
    "kernel": 1,
    "cflMethod": 1,
    "cflFactor": 0.5,
    "cflMinTimeStepSize": 0.0001,
    "cflMaxTimeStepSize": 0.005,
    "gravitation": [0, 0, -9.81],
    "gradKernel": 1,
    "enableVTKExport": true,
    "dataExportFPS": 60,
    "particleAttributes": "velocity;density"
}
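Under the hood, this is standard Jinja2 templating. If you want to preview how the placeholder gets filled in before launching anything, here is a small, optional sketch that renders the template locally with the jinja2 package; it assumes the templated file has been saved as config.json inside the template folder we use below (adjust the path to match your setup):

# Optional: preview how the Jinja2 placeholder is filled in, without calling
# the Inductiva API at all. Assumes the templated file was saved as
# config.json inside the template folder.
from jinja2 import Template

with open("splishsplash-template-dir/config.json") as f:
    template = Template(f.read())

# Passing particle_radius overrides the default(0.008) declared in the
# template; omitting it would keep the default value.
print(template.render(particle_radius=0.004))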

Now, let’s save the .json file in the local directory, inside the downloaded template folder, to prepare for running four parallel simulations with four different particle radii: 0.01, 0.008, 0.006, and 0.004 (meters).


import inductiva

# Launch a machine group with four c3-standard-4 machines,
# one for each simulation we want to run in parallel.
machine_group = inductiva.resources.MachineGroup(
    machine_type="c3-standard-4", num_machines=4)
machine_group.start()

# Assuming the template folder was downloaded to the local directory,
# set the path to it.
template_dir = "./splishsplash-template-dir"

# Define the radii of the particles (in meters)
particle_radii = [0.01, 0.008, 0.006, 0.004]

# Instantiate the SPlisHSPlasH simulator made available through the API
splishsplash = inductiva.simulators.SPlisHSPlasH()

tasks_list = []

for n, radius in enumerate(particle_radii, start=1):
    # Define the directory where the rendered templates will appear,
    # filled with the particle radius value set for this iteration.
    target_dir = f"splishsplash-hyperparameter-search_{n}"
    inductiva.TemplateManager.render_dir(source_dir=template_dir,
                                         target_dir=target_dir,
                                         particle_radius=radius,
                                         overwrite=False)

    task = splishsplash.run(input_dir=target_dir,
                            sim_config_filename="config.json",
                            on=machine_group)
    tasks_list.append(task)

For each particle radius, our API creates a new input folder, updates the configuration with the new value, and starts the corresponding simulation. This lets us run all four simulations in parallel on four different VMs, making it much faster to see how changing the particle radius affects the results.

While we wait for the simulations to complete, we can track their progress, and afterwards collect the results, using our Command Line Interface (CLI):

# Monitor the status of the tasks every 10 seconds
$ inductiva task list -n 4 --watch 10

# List the remote output folders of the 4 most recent tasks
$ inductiva storage list -m 4
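
Alternatively, the same follow-up can be done from Python. Here is a minimal sketch that reuses the tasks_list built in the script above, waits for every task, downloads its outputs, and then terminates the machine group so we stop paying for idle VMs:

# Wait for every task to finish, fetch its outputs to the local disk,
# and shut the machines down so we stop paying for them.
for task in tasks_list:
    task.wait()
    task.download_outputs()

machine_group.terminate()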

Checking the volume of data produced#

The table below shows how different particle radii affect the total number of particles and the amount of data generated by the simulation (4 seconds of simulated time). As we decrease the particle radius, more particles are needed to fill the corresponding volumes in the simulation, which increases the amount of data we generate. Specifically, halving the particle radius results in roughly an eightfold increase in the number of particles, as expected, since the particle count scales with the inverse cube of the radius (see the quick check after the table).

Table 1. Number of particles and total size of simulation output for each particle radius.

| Particle Radius (m) | Total N. of Particles | Data Produced (MB) |
|---------------------|-----------------------|--------------------|
| 0.01                | 15625                 | 166                |
| 0.008               | 29791                 | 213                |
| 0.006               | 73720                 | 525                |
| 0.004               | 244584                | 1760               |
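
As a quick sanity check on that cubic scaling, we can compare the particle counts in Table 1 against the (1/radius)^3 prediction. The numbers below are just the table values, reused for illustration:

# Particle counts from Table 1, keyed by particle radius (in meters).
particles = {0.01: 15625, 0.008: 29791, 0.006: 73720, 0.004: 244584}

for radius in (0.008, 0.006, 0.004):
    predicted = (0.01 / radius) ** 3          # cubic scaling relative to the 0.01 run
    observed = particles[radius] / particles[0.01]
    print(f"radius {radius}: predicted x{predicted:.1f}, observed x{observed:.1f}")

# Expected output (approximately):
# radius 0.008: predicted x2.0, observed x1.9
# radius 0.006: predicted x4.6, observed x4.7
# radius 0.004: predicted x15.6, observed x15.7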

For reference, the dataset produced by Sanchez-Gonzalez et al. was based on simulations with about 8k to 25k particles, so it seems that, to create a comparable dataset, we don’t need the particle radius to go below 0.008. Also, if each simulation produces hundreds of MB of data, then we probably don’t want a smaller particle radius anyway, because it would be challenging to store the data produced by thousands of simulations (let alone train the GNNs on all that data).
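
To make the storage concern concrete, here is a back-of-envelope projection of the total dataset size, assuming a purely hypothetical campaign of 10,000 simulations; the per-simulation sizes are taken from Table 1:

# Per-simulation output sizes from Table 1, in MB.
mb_per_sim = {0.01: 166, 0.008: 213, 0.006: 525, 0.004: 1760}

# Hypothetical number of simulations in the dataset (illustrative only).
num_simulations = 10_000

for radius, mb in mb_per_sim.items():
    total_tb = mb * num_simulations / 1_000_000   # MB -> TB (decimal units)
    print(f"radius {radius}: ~{total_tb:.1f} TB in total")

# radius 0.01: ~1.7 TB in total
# radius 0.008: ~2.1 TB in total
# radius 0.006: ~5.2 TB in total
# radius 0.004: ~17.6 TB in total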

Checking costs (again)#

Reducing the particle radius naturally increases the number of particles required for the simulation, and this demands more computing power. We will compare two compute options from the c3 family available on Google Cloud, representing two ends of the spectrum in terms of compute power and cost:

  • c3-standard-4 at $0.23 per hour

  • c3-standard-88 at $5.053 per hour

The tables below show how long it takes to run a simulation, and the corresponding cost, on the two types of hardware for each of the four particle radii being considered.

Table 2. Simulation runtimes and costs on a c3-standard-4 machine, priced at $0.23 per hour, as performed via the Inductiva API.

| Particle Radius (m) | Time to Run | Cost (USD) |
|---------------------|-------------|------------|
| 0.01                | 16m31s      | 0.06       |
| 0.008               | 27m11s      | 0.10       |
| 0.006               | 56m59s      | 0.22       |
| 0.004               | 6h11m27s    | 1.42       |

Table 3. Simulation runtimes and costs on a c3-standard-88 machine, priced at $5.053 per hour, as performed via the Inductiva API.

| Particle Radius (m) | Time to Run | Cost (USD) |
|---------------------|-------------|------------|
| 0.01                | 3m27s       | 0.29       |
| 0.008               | 5m16s       | 0.44       |
| 0.006               | 8m20s       | 0.70       |
| 0.004               | 1h25m04s    | 7.16       |

Obviously, as we increase the number of particles by reducing their radius, the computation times grow very quickly. Most importantly, we seem to be hitting a wall at a particle radius of 0.004, where the number of particles is so high that we may even be running into RAM limits on these machines.

Given these performance and cost numbers, it is reasonable to commit to a particle radius of 0.01 for the data generation process.

The issue now becomes: which specific VM type should we use? This is quite relevant because, if we are running 10k simulations or more, we will be spending hundreds to thousands of dollars, and the difference between hundreds and thousands comes down mainly to our choice of VM (and to how long we are willing to wait).
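
To make that concrete, here is a rough, purely illustrative projection for a hypothetical campaign of 10,000 simulations at a particle radius of 0.01, using the per-simulation costs from Tables 2 and 3:

# Per-simulation cost at particle radius 0.01, taken from Tables 2 and 3.
cost_per_sim = {"c3-standard-4": 0.06, "c3-standard-88": 0.29}

# Hypothetical number of simulations in the dataset (illustrative only).
num_simulations = 10_000

for machine_type, cost in cost_per_sim.items():
    total = cost * num_simulations
    print(f"{machine_type}: ~${total:,.0f}")

# c3-standard-4: ~$600
# c3-standard-88: ~$2,900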

So far we have only tested machines of the c3 family: are there other options that can do the job in a reasonable time for even less money? Let’s find out in the next part of this tutorial.