PhACE¶
PhACE is a physics-inspired equivariant neural network architecture. Compared to, for example, MACE and GRACE, it uses a geometrically motivated basis and a fast and elegant tensor product implementation. The tensor product used in PhACE leverages an equivariant representation that differs from the typical spherical one. You can read more about it here: https://pubs.acs.org/doi/10.1021/acs.jpclett.4c02376.
Installation¶
To install this architecture along with the metatrain package, run:
pip install metatrain[phace]
where the square brackets indicate that you want to install the optional
dependencies required for phace.
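Note that some shells (zsh, for example) give a special meaning to square brackets; if the command above is rejected by your shell, quoting the argument usually helps:
pip install "metatrain[phace]"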
Default Hyperparameters¶
The description of all the hyperparameters used in PhACE is provided
further down this page. However, here we provide a yaml file containing all
the default hyperparameters, which might be a convenient starting point for
creating your own hyperparameter files:
architecture:
  name: experimental.phace
  model:
    max_correlation_order_per_layer: 3
    num_message_passing_layers: 2
    cutoff: 5.0
    cutoff_width: 1.0
    num_element_channels: 128
    force_rectangular: false
    spherical_linear_layers: false
    radial_basis:
      max_eigenvalue: 25.0
      scale: 0.7
      optimizable_lengthscales: false
    nu_scaling: 0.1
    mp_scaling: 0.1
    overall_scaling: 1.0
    disable_nu_0: true
    use_sphericart: false
    head_num_layers: 1
    heads: {}
    zbl: false
  training:
    compile: true
    distributed: false
    distributed_port: 39591
    batch_size: 8
    num_epochs: 1000
    learning_rate: 0.01
    warmup_fraction: 0.01
    gradient_clipping: null
    log_interval: 1
    checkpoint_interval: 25
    scale_targets: true
    atomic_baseline: {}
    fixed_scaling_weights: {}
    num_workers: null
    per_structure_targets: []
    log_separate_blocks: false
    log_mae: false
    best_model_metric: rmse_prod
    loss: mse
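Assuming you save this section into your metatrain options file (for instance options.yaml, alongside the dataset and target options that metatrain expects), training is typically launched through the metatrain command-line interface:
mtt train options.yaml
The file name here is just a placeholder; see the general metatrain documentation for the full structure of the options file.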
Tuning hyperparameters¶
The default hyperparameters above will work well in most cases, but they may not be optimal for your specific use case. There is a good number of parameters to tune, both for the model and the trainer. Here, we provide a list of the parameters that are, in general, the most important for the PhACE architecture, in decreasing order of importance (a combined example follows the list):
- ModelHypers.radial_basis: RadialBasisHypers = {'max_eigenvalue': 25.0, 'optimizable_lengthscales': False, 'scale': 0.7}
Hyperparameters for the radial basis functions.
Raising ``max_eigenvalue`` from its default will increase the number of spherical irreducible representations (irreps) used in the model, which can improve accuracy at the cost of computational efficiency. Increasing this value will also increase the number of radial basis functions (and therefore internal features) used for each irrep.
- ModelHypers.num_element_channels: int = 128
Number of channels per element.
This determines the size of the embedding used to encode the atomic species, and it increases or decreases the size of the internal features used in the model.
- TrainerHypers.num_epochs: int = 1000
Number of epochs to train the model.
A larger number of epochs might lead to better accuracy. In general, if you see that the validation metrics are not much worse than the training ones at the end of training, it might be a good idea to increase this value.
- TrainerHypers.batch_size: int = 8
Batch size for training.
Decrease this value if you run into out-of-memory errors during training. You can try to increase it if your structures are very small (less than 20 atoms) and you have a good GPU.
- ModelHypers.num_message_passing_layers: int = 2
Number of message passing layers.
Increasing this value might increase the accuracy of the model (especially on larger datasets), at the expense of computational efficiency.
- TrainerHypers.learning_rate: float = 0.01
Learning rate for the optimizer.
You can try to increase this value (e.g., to 0.02 or 0.03) if training is very slow or decrease it (e.g., to 0.005 or less) if you see that training explodes in the first few epochs.
- ModelHypers.cutoff: float = 5.0
Cutoff radius for neighbor search.
This should be set to a value beyond which most interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.
- ModelHypers.force_rectangular: bool = False
Makes the number of channels per irrep the same.
This might improve accuracy with a limited increase in computational cost.
- ModelHypers.spherical_linear_layers: bool = False
Whether to perform linear layers in the spherical representation.
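As a sketch of how these could be combined in practice (the specific values below are arbitrary illustrations, not recommendations), a hyperparameter file that overrides a few of the knobs above while keeping everything else at its default could look like:
architecture:
  name: experimental.phace
  model:
    cutoff: 6.0
    num_element_channels: 256
    radial_basis:
      max_eigenvalue: 35.0
  training:
    batch_size: 4
    num_epochs: 2000
    learning_rate: 0.005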
Model hyperparameters¶
The parameters that go under the architecture.model section of the config file
are the following:
- ModelHypers.num_message_passing_layers: int = 2¶
Number of message passing layers.
Increasing this value might increase the accuracy of the model (especially on larger datasets), at the expense of computational efficiency.
- ModelHypers.cutoff: float = 5.0¶
Cutoff radius for neighbor search.
This should be set to a value beyond which most interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.
- ModelHypers.num_element_channels: int = 128¶
Number of channels per element.
This determines the size of the embedding used to encode the atomic species, and it increases or decreases the size of the internal features used in the model.
- ModelHypers.force_rectangular: bool = False¶
Makes the number of channels per irrep the same.
This might improve accuracy with a limited increase in computational cost.
- ModelHypers.spherical_linear_layers: bool = False¶
Whether to perform linear layers in the spherical representation.
- ModelHypers.radial_basis: RadialBasisHypers = {'max_eigenvalue': 25.0, 'optimizable_lengthscales': False, 'scale': 0.7}¶
Hyperparameters for the radial basis functions.
Raising ``max_eigenvalue`` from its default will increase the number of spherical irreducible representations (irreps) used in the model, which can improve accuracy at the cost of computational efficiency. Increasing this value will also increase the number of radial basis functions (and therefore internal features) used for each irrep.
Trainer hyperparameters¶
The parameters that go under the architecture.training section of the config file
are the following:
- TrainerHypers.compile: bool = True¶
Whether to use torch.compile during training.
This can lead to significant speedups, but it will cause a compilation step at the beginning of training, which might take up to 5-10 minutes, depending mainly on ``max_eigenvalue``.
- TrainerHypers.batch_size: int = 8¶
Batch size for training.
Decrease this value if you run into out-of-memory errors during training. You can try to increase it if your structures are very small (less than 20 atoms) and you have a good GPU.
- TrainerHypers.num_epochs: int = 1000¶
Number of epochs to train the model.
A larger number of epochs might lead to better accuracy. In general, if you see that the validation metrics are not much worse than the training ones at the end of training, it might be a good idea to increase this value.
- TrainerHypers.learning_rate: float = 0.01¶
Learning rate for the optimizer.
You can try to increase this value (e.g., to 0.02 or 0.03) if training is very slow or decrease it (e.g., to 0.005 or less) if you see that training explodes in the first few epochs.
- TrainerHypers.gradient_clipping: float | None = None¶
Gradient clipping value. If None, no clipping is applied.
- TrainerHypers.atomic_baseline: dict[str, float | dict[int, float]] = {}¶
The baselines for each target.
By default, metatrain will fit a linear model (CompositionModel) to compute the least-squares baseline for each atomic species for each target. However, this hyperparameter allows you to provide your own baselines. The value of the hyperparameter should be a dictionary where the keys are the target names, and the values are either (1) a single baseline to be used for all atomic types, or (2) a dictionary mapping atomic types to their baselines. For example:
- ``atomic_baseline: {"energy": {1: -0.5, 6: -10.0}}`` will fix the energy baseline for hydrogen (Z=1) to -0.5 and for carbon (Z=6) to -10.0, while fitting the baselines for the energy of all other atomic types, as well as the baselines for all other targets.
- ``atomic_baseline: {"energy": -5.0}`` will fix the energy baseline for all atomic types to -5.0.
- ``atomic_baseline: {"mtt:dos": 0.0}`` sets the baseline for the “mtt:dos” target to 0.0, effectively disabling the atomic baseline for that target.
This atomic baseline is subtracted from the targets during training, so that the main model does not need to learn atomic contributions, which likely makes training easier. When the model is used in evaluation mode, the atomic baseline is added on top of the model predictions automatically.
Note
This atomic baseline is a per-atom contribution. Therefore, if the property you are predicting is a sum over all atoms (e.g., total energy), the contribution of the atomic baseline to the total property will be the atomic baseline multiplied by the number of atoms of that type in the structure.
Note
If a MACE model is loaded through the ``mace_model`` hyperparameter, the atomic baselines in the MACE model are used by default for the target indicated in ``mace_head_target``. If you want to override them, you need to explicitly set the baselines for that target in this hyperparameter.
- TrainerHypers.fixed_scaling_weights: dict[str, float | dict[int, float]] = {}¶
Fixed scaling weights for the model.
- TrainerHypers.per_structure_targets: list[str] = []¶
List of targets to calculate per-structure losses.
- TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'rmse_prod'¶
Metric used to select the best model checkpoint.
- TrainerHypers.loss: str | dict[str, LossSpecification] = 'mse'¶
Loss function used for training.