PhACE

PhACE is a physics-inspired equivariant neural network architecture. Compared to, for example, MACE and GRACE, it uses a geometrically motivated basis and a fast and elegant tensor product implementation. The tensor product used in PhACE leverages an equivariant representation that differs from the typical spherical one. You can read more about it here: https://pubs.acs.org/doi/10.1021/acs.jpclett.4c02376.

Installation

To install this architecture along with the metatrain package, run:

pip install metatrain[phace]

where the square brackets indicate that you want to install the optional dependencies required for PhACE.

Default hyperparameters

All the hyperparameters used in PhACE are described further down this page. However, here we provide a YAML file containing all the default hyperparameters, which can serve as a convenient starting point for creating your own hyperparameter files:

architecture:
  name: experimental.phace
  model:
    max_correlation_order_per_layer: 3
    num_message_passing_layers: 2
    cutoff: 5.0
    cutoff_width: 1.0
    num_element_channels: 128
    force_rectangular: false
    spherical_linear_layers: false
    radial_basis:
      max_eigenvalue: 25.0
      scale: 0.7
      optimizable_lengthscales: false
    nu_scaling: 0.1
    mp_scaling: 0.1
    overall_scaling: 1.0
    disable_nu_0: true
    use_sphericart: false
    head_num_layers: 1
    heads: {}
    zbl: false
  training:
    compile: true
    distributed: false
    distributed_port: 39591
    batch_size: 8
    num_epochs: 1000
    learning_rate: 0.01
    warmup_fraction: 0.01
    gradient_clipping: null
    log_interval: 1
    checkpoint_interval: 25
    scale_targets: true
    atomic_baseline: {}
    fixed_scaling_weights: {}
    num_workers: null
    per_structure_targets: []
    log_separate_blocks: false
    log_mae: false
    best_model_metric: rmse_prod
    loss: mse

Tuning hyperparameters

The default hyperparameters above will work well in most cases, but they may not be optimal for your specific use case. There is a good number of parameters to tune, both for the model and the trainer. Here, we provide a list of the parameters that are in general the most important (in decreasing order of importance) for the PhACE architecture, followed by an example showing how a few of them can be overridden:

ModelHypers.radial_basis: RadialBasisHypers = {'max_eigenvalue': 25.0, 'optimizable_lengthscales': False, 'scale': 0.7}

Hyperparameters for the radial basis functions.

Raising ``max_eigenvalue`` from its default will increase the number of spherical irreducible representations (irreps) used in the model, which can improve accuracy at the cost of computational efficiency. Increasing this value will also increase the number of radial basis functions (and therefore internal features) used for each irrep.

ModelHypers.num_element_channels: int = 128

Number of channels per element.

This determines the size of the embedding used to encode the atomic species, and it also controls the size of the internal features used in the model.

TrainerHypers.num_epochs: int = 1000

Number of epochs to train the model.

A larger number of epochs might lead to better accuracy. In general, if you see that the validation metrics are not much worse than the training ones at the end of training, it might be a good idea to increase this value.

TrainerHypers.batch_size: int = 8

Batch size for training.

Decrease this value if you run into out-of-memory errors during training. You can try to increase it if your structures are very small (less than 20 atoms) and you have a good GPU.

ModelHypers.num_message_passing_layers: int = 2

Number of message passing layers.

Increasing this value might increase the accuracy of the model (especially on larger datasets), at the expense of computational efficiency.

TrainerHypers.learning_rate: float = 0.01

Learning rate for the optimizer.

You can try to increase this value (e.g., to 0.02 or 0.03) if training is very slow or decrease it (e.g., to 0.005 or less) if you see that training explodes in the first few epochs.

ModelHypers.cutoff: float = 5.0

Cutoff radius for neighbor search.

This should be set to a value beyond which most interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.

ModelHypers.force_rectangular: bool = False

Forces the number of channels to be the same for every irrep.

This might improve accuracy with a limited increase in computational cost.

ModelHypers.spherical_linear_layers: bool = False

Whether to perform linear layers in the spherical representation.
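
As an example of tuning, a minimal options file overriding just a few of the hyperparameters above could look like the following. The specific values are arbitrary and shown only for illustration, and we assume that any key left out falls back to its default:

architecture:
  name: experimental.phace
  model:
    num_element_channels: 256
    num_message_passing_layers: 3
    radial_basis:
      max_eigenvalue: 35.0
  training:
    batch_size: 4
    num_epochs: 2000
    learning_rate: 0.005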

Model hyperparameters

The parameters that go under the architecture.model section of the config file are the following:

ModelHypers.max_correlation_order_per_layer: int = 3

Maximum correlation order per layer.

ModelHypers.num_message_passing_layers: int = 2

Number of message passing layers.

Increasing this value might increase the accuracy of the model (especially on larger datasets), at the expense of computational efficiency.

ModelHypers.cutoff: float = 5.0

Cutoff radius for neighbor search.

This should be set to a value beyond which most interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.

ModelHypers.cutoff_width: float = 1.0

Width of the cutoff smoothing function.

ModelHypers.num_element_channels: int = 128

Number of channels per element.

This determines the size of the embedding used to encode the atomic species, and it also controls the size of the internal features used in the model.

ModelHypers.force_rectangular: bool = False

Forces the number of channels to be the same for every irrep.

This might improve accuracy with a limited increase in computational cost.

ModelHypers.spherical_linear_layers: bool = False

Whether to perform linear layers in the spherical representation.

ModelHypers.radial_basis: RadialBasisHypers = {'max_eigenvalue': 25.0, 'optimizable_lengthscales': False, 'scale': 0.7}

Hyperparameters for the radial basis functions.

Raising ``max_eigenvalue`` from its default will increase the number of spherical irreducible representations (irreps) used in the model, which can improve accuracy at the cost of computational efficiency. Increasing this value will also increase the number of radial basis functions (and therefore internal features) used for each irrep.

ModelHypers.nu_scaling: float = 0.1

Scaling for the nu term.

ModelHypers.mp_scaling: float = 0.1

Scaling for message passing.

ModelHypers.overall_scaling: float = 1.0

Overall scaling factor.

ModelHypers.disable_nu_0: bool = True

Whether to disable nu=0.

ModelHypers.use_sphericart: bool = False

Whether to use the sphericart library for the computation of spherical harmonics.

ModelHypers.head_num_layers: int = 1

Number of layers in the head.

ModelHypers.heads: dict[str, Literal['linear', 'mlp']] = {}

Heads to use in the model, with options being “linear” or “mlp”.
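
For example, to use an MLP head for an energy target (the target name "energy" is only illustrative), the corresponding section of the options file could look like this:

architecture:
  model:
    heads:
      energy: mlp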

ModelHypers.zbl: bool = False

Whether to use the ZBL potential in the model.

Trainer hyperparameters

The parameters that go under the architecture.training section of the config file are the following:

TrainerHypers.compile: bool = True

Whether to use torch.compile during training.

This can lead to significant speedups, but it will cause a compilation step at the beginning of training which might take up to 5-10 minutes, mainly depending on max_eigenvalue.

TrainerHypers.distributed: bool = False

Whether to use distributed training.

TrainerHypers.distributed_port: int = 39591

Port for DDP communication.

TrainerHypers.batch_size: int = 8

Batch size for training.

Decrease this value if you run into out-of-memory errors during training. You can try to increase it if your structures are very small (less than 20 atoms) and you have a good GPU.

TrainerHypers.num_epochs: int = 1000

Number of epochs to train the model.

A larger number of epochs might lead to better accuracy. In general, if you see that the validation metrics are not much worse than the training ones at the end of training, it might be a good idea to increase this value.

TrainerHypers.learning_rate: float = 0.01

Learning rate for the optimizer.

You can try to increase this value (e.g., to 0.02 or 0.03) if training is very slow or decrease it (e.g., to 0.005 or less) if you see that training explodes in the first few epochs.

TrainerHypers.warmup_fraction: float = 0.01

Fraction of training steps for learning rate warmup.

TrainerHypers.gradient_clipping: float | None = None

Gradient clipping value. If None, no clipping is applied.

TrainerHypers.log_interval: int = 1

Interval to log metrics during training.

TrainerHypers.checkpoint_interval: int = 25

Interval to save model checkpoints.

TrainerHypers.scale_targets: bool = True

Whether to scale targets during training.

TrainerHypers.atomic_baseline: dict[str, float | dict[int, float]] = {}

The baselines for each target.

By default, metatrain will fit a linear model (CompositionModel) to compute the least squares baseline for each atomic species for each target.

However, this hyperparameter allows you to provide your own baselines. The value of the hyperparameter should be a dictionary where the keys are the target names, and the values are either (1) a single baseline to be used for all atomic types, or (2) a dictionary mapping atomic types to their baselines. For example:

  • atomic_baseline: {"energy": {1: -0.5, 6: -10.0}} will fix the energy baseline for hydrogen (Z=1) to -0.5 and for carbon (Z=6) to -10.0, while fitting the baselines for the energy of all other atomic types, as well as fitting the baselines for all other targets.

  • atomic_baseline: {"energy": -5.0} will fix the energy baseline for all atomic types to -5.0.

  • atomic_baseline: {"mtt:dos": 0.0} sets the baseline for the “mtt:dos” target to 0.0, effectively disabling the atomic baseline for that target.

This atomic baseline is subtracted from the targets during training, which avoids the need for the main model to learn atomic contributions and typically makes training easier. When the model is used in evaluation mode, the atomic baseline is automatically added on top of the model predictions.
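
As a sketch, the first example above would look like this inside an options file (the atomic numbers and baseline values are purely illustrative):

architecture:
  training:
    atomic_baseline:
      energy:
        1: -0.5    # hydrogen
        6: -10.0   # carbon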

Note

This atomic baseline is a per-atom contribution. Therefore, if the property you are predicting is a sum over all atoms (e.g., total energy), the contribution of the atomic baseline to the total property will be the atomic baseline multiplied by the number of atoms of that type in the structure.

Note

If a MACE model is loaded through the mace_model hyperparameter, the atomic baselines stored in the MACE model are used by default for the target indicated in mace_head_target. If you want to override them, you need to explicitly set the baselines for that target in this hyperparameter.

TrainerHypers.fixed_scaling_weights: dict[str, float | dict[int, float]] = {}

Fixed scaling weights for the model.
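
The expected structure mirrors that of atomic_baseline: keys are target names, and values are either a single weight or a mapping from atomic numbers to weights. A minimal sketch (the target name and value are illustrative, and we assume targets not listed here keep their automatically determined scaling):

architecture:
  training:
    fixed_scaling_weights:
      energy: 2.0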

TrainerHypers.num_workers: int | None = None

Number of workers for data loading.

TrainerHypers.per_structure_targets: list[str] = []

List of targets to calculate per-structure losses.

TrainerHypers.log_separate_blocks: bool = False

Whether to log per-block error during training.

TrainerHypers.log_mae: bool = False

Whether to log MAE alongside RMSE during training.

TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'rmse_prod'

Metric used to select the best model checkpoint.

TrainerHypers.loss: str | dict[str, LossSpecification] = 'mse'

Loss function used for training.
