Overview
Stalwart's spam classifier is a statistical learning system that makes explicit, auditable decisions about whether a message should be treated as spam or ham. At its core, the classifier is a linear machine-learning model trained using logistic regression. Linear models provide predictable behaviour, fast inference, and well-understood convergence properties, all of which matter for a mail system that must operate continuously and at scale.
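To make the idea of a linear model with a logistic output concrete, the following is a minimal sketch, not Stalwart's actual implementation. The feature names, weights, and function names are hypothetical and chosen only for illustration. It shows how a score is formed as a weighted sum of features, passed through the logistic function, and broken down into per-feature contributions, which is what makes a linear decision easy to audit.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spam_probability(weights: Dict[str, float], bias: float,
                     features: Dict[str, float]) -> float:
    """Linear score (bias + sum of weight * value) mapped to a probability."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return sigmoid(z)


def contributions(weights: Dict[str, float],
                  features: Dict[str, float]) -> List[Tuple[str, float]]:
    """Per-feature share of the linear score, largest magnitude first."""
    parts = [(name, weights.get(name, 0.0) * value)
             for name, value in features.items()]
    return sorted(parts, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical weights, as if learned during training.
weights = {"free": 2.0, "meeting": -1.5}
bias = -0.5

message = {"free": 1.0, "meeting": 0.0}
p = spam_probability(weights, bias, message)   # sigmoid(1.5), roughly 0.82
audit = contributions(weights, message)        # "free" contributes the most
```

Because the score is a plain sum, each feature's effect on the outcome can be read directly from its contribution, with no hidden interactions between features.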
Training
The spam classifier is trained periodically to incorporate newly collected samples into the model. By default, training runs every 12 hours, though the interval can be adjusted to better match the size, load, and dynamics of the deployment. Shorter intervals allow faster adaptation to new patterns; longer intervals reduce computational overhead.
Learning
In addition to explicitly labelled training samples provided by users or administrators, Stalwart supports automatic learning. Automatic learning allows the classifier to incorporate new samples without direct user intervention, but only under conditions where the system has high confidence in the correctness of the label. The aim is to accelerate adaptation to new patterns while minimising the risk of reinforcing incorrect classifications.
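The gating described above can be sketched as follows. This is an illustration under stated assumptions: the text does not specify which signals establish confidence or what threshold applies, so the `confidence` value, the `MIN_CONFIDENCE` constant, and the `maybe_auto_learn` function are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical threshold; the real conditions are not specified here.
MIN_CONFIDENCE = 0.99

Sample = Tuple[str, str]  # (message text, label), where label is "spam" or "ham"


def maybe_auto_learn(message: str, proposed_label: str, confidence: float,
                     queue: List[Sample],
                     min_confidence: float = MIN_CONFIDENCE) -> Optional[Sample]:
    """Queue a sample for training only when the label is trusted enough.

    Returns the queued sample, or None when the message is skipped.
    """
    if proposed_label not in ("spam", "ham"):
        raise ValueError("label must be 'spam' or 'ham'")
    if confidence < min_confidence:
        return None  # Too uncertain: do not risk reinforcing a wrong label.
    sample = (message, proposed_label)
    queue.append(sample)
    return sample


training_queue: List[Sample] = []
maybe_auto_learn("You won a prize", "spam", 0.999, training_queue)  # queued
maybe_auto_learn("Lunch tomorrow?", "spam", 0.60, training_queue)   # skipped
```

Skipping uncertain messages slows adaptation slightly, but it keeps mislabelled samples out of the training set, which is the trade-off the paragraph above describes.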
Hyperparameters
Hyperparameters are configuration values that control how the classifier model is structured and how learning is performed. Unlike model parameters, which are learned from data during training, hyperparameters are fixed in advance and determine properties such as the size of the feature space, the behaviour of the optimiser, and how input features are scaled. Correct hyperparameter choices are essential for stable training and reliable classification.
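The distinction between hyperparameters and learned parameters can be sketched as below. The field names, default values, the use of hashed features, and the log scaling are assumptions made for illustration, not Stalwart's actual settings. The `Hyperparameters` object is fixed before training starts, while the `weights` list is what training changes.

```python
import math
import zlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hyperparameters:
    """Fixed before training; all names and defaults here are hypothetical."""
    feature_space_size: int = 2 ** 18  # number of weight slots
    learning_rate: float = 0.1         # optimiser step size
    l2_penalty: float = 1e-4           # pulls weights towards zero
    log_scale_counts: bool = True      # input scaling: log(1 + count)


def hash_features(tokens: List[str], hp: Hyperparameters) -> Dict[int, float]:
    """Map tokens to slots in a fixed-size feature space, then scale counts."""
    counts: Dict[int, float] = {}
    for token in tokens:
        index = zlib.crc32(token.encode("utf-8")) % hp.feature_space_size
        counts[index] = counts.get(index, 0.0) + 1.0
    if hp.log_scale_counts:
        counts = {i: math.log1p(c) for i, c in counts.items()}
    return counts


def sgd_step(weights: List[float], features: Dict[int, float],
             label: int, hp: Hyperparameters) -> None:
    """One gradient step of logistic regression (label: 1 = spam, 0 = ham)."""
    z = sum(weights[i] * v for i, v in features.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    error = 1.0 / (1.0 + math.exp(-z)) - label
    for i, v in features.items():
        weights[i] -= hp.learning_rate * (error * v + hp.l2_penalty * weights[i])


hp = Hyperparameters(feature_space_size=16)  # tiny space for demonstration
weights = [0.0] * hp.feature_space_size      # model parameters, learned
features = hash_features(["free", "free", "money"], hp)
sgd_step(weights, features, label=1, hp=hp)
```

A smaller `feature_space_size` saves memory but makes unrelated tokens share a weight more often, and a larger `learning_rate` adapts faster at the cost of less stable training, which is why these values need to be chosen with care.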