📄️ Overview
Stalwart’s spam classifier is a statistical learning system designed to make explicit, auditable decisions about whether a message should be treated as spam or ham. At its core, the classifier is a linear machine learning model trained using logistic regression. Linear models provide predictable behavior, fast inference, and well-understood convergence properties, which are essential for a mail system that must operate continuously and at scale.
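In a linear logistic model, inference is a weighted sum of feature values plus a bias, passed through the logistic function to get a spam probability. The sketch below shows only that idea. The feature names, weights, and the `spam_probability` helper are hypothetical and are not Stalwart's internal API.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function: maps any real score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)

def spam_probability(weights: dict[str, float], bias: float,
                     features: dict[str, float]) -> float:
    """Linear score w·x + b turned into a probability (hypothetical helper)."""
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
    return sigmoid(score)
```

For example, with weights `{"has_url": 1.5, "sender_known": -2.0}`, bias `-0.5`, and a message whose only active feature is `has_url = 1.0`, the score is `1.0` and the probability is about `0.731`. Because each weight contributes additively to the score, the influence of every feature on a decision can be read off directly, which is what makes a linear model auditable.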
📄️ Training
The spam classifier is trained periodically to incorporate newly collected training samples into the model. By default, training is performed every 12 hours, although this interval can be adjusted by the administrator to better suit the size, load, and dynamics of the deployment. Shorter intervals allow faster adaptation to new patterns, while longer intervals reduce computational overhead.
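Each periodic training run folds the newly collected samples into the model's weights. As an illustration only, the sketch below uses one pass of plain stochastic gradient descent on the logistic (log) loss; the choice of optimizer, the `train_epoch` name, the sample format, and the learning rate are assumptions and may differ from what Stalwart actually does.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_epoch(weights: dict[str, float], bias: float,
                samples: list[tuple[dict[str, float], int]],
                learning_rate: float = 0.1) -> tuple[dict[str, float], float]:
    """One SGD pass over labeled samples (label 1 = spam, 0 = ham).

    Hypothetical sketch: updates `weights` in place and returns (weights, bias).
    """
    for features, label in samples:
        z = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
        error = sigmoid(z) - label  # derivative of the log loss with respect to z
        for k, v in features.items():
            weights[k] = weights.get(k, 0.0) - learning_rate * error * v
        bias -= learning_rate * error
    return weights, bias
```

Running this more often lets the weights track new patterns sooner, at the cost of more computation, which is the same trade-off the training interval controls.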
📄️ Learning
In addition to explicitly labeled training samples provided by users or administrators, Stalwart supports automatic learning. Automatic learning allows the classifier to incorporate new training samples without direct user intervention, but only under conditions where the system has high confidence in the correctness of the label. The purpose of this mechanism is to accelerate adaptation to new patterns while minimizing the risk of reinforcing incorrect classifications.
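One simple way to express a "high confidence" condition is a pair of thresholds on the classifier's own spam probability: only messages scored very close to 1 or very close to 0 are turned into automatic training samples. The sketch below assumes that rule; the threshold values and the `auto_learn_label` function are hypothetical, and the conditions Stalwart actually applies may involve other signals.

```python
from typing import Optional

# Hypothetical thresholds: only probabilities this extreme are trusted for auto-learning.
SPAM_THRESHOLD = 0.95
HAM_THRESHOLD = 0.05

def auto_learn_label(spam_probability: float,
                     spam_threshold: float = SPAM_THRESHOLD,
                     ham_threshold: float = HAM_THRESHOLD) -> Optional[int]:
    """Return 1 (spam) or 0 (ham) if the message is confident enough to be
    learned automatically, or None to leave it out of the training set."""
    if not ham_threshold < spam_threshold:
        raise ValueError("ham_threshold must be below spam_threshold")
    if spam_probability >= spam_threshold:
        return 1
    if spam_probability <= ham_threshold:
        return 0
    return None  # uncertain: do not risk reinforcing a possible misclassification
```

Everything between the two thresholds is skipped, so borderline messages never feed back into the model; tightening the thresholds lowers the risk of learning from mistakes but also reduces how many samples are collected.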
📄️ Hyperparameters
Hyperparameters are configuration values that control how the model is structured and how learning is performed. Unlike model parameters, which are learned from data during training, hyperparameters are fixed in advance and determine properties such as the size of the feature space, the behavior of the optimizer, and how input features are scaled. Correct hyperparameter choices are essential for stable training and reliable classification behavior.
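The three kinds of hyperparameters mentioned above can be made concrete with a small sketch. It assumes that the feature space is bounded by hashing feature names into `2**feature_bits` slots (the hashing trick), that the optimizer takes a learning rate, and that raw counts are log-scaled. All field names and default values here are hypothetical and are not Stalwart's configuration keys.

```python
import math
import zlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyperparameters:
    feature_bits: int = 18      # feature space size is 2**feature_bits slots (hypothetical default)
    learning_rate: float = 0.1  # optimizer step size, passed to the training loop (hypothetical)
    log_scale: bool = True      # compress large raw counts with log1p (hypothetical choice)

def feature_index(name: str, hp: Hyperparameters) -> int:
    """Map a feature name into a fixed-size index space via a CRC32 hash."""
    return zlib.crc32(name.encode("utf-8")) % (1 << hp.feature_bits)

def scale(value: float, hp: Hyperparameters) -> float:
    """Scale a non-negative raw feature value before it enters the model."""
    return math.log1p(value) if hp.log_scale else value
```

Because these values are fixed before training, changing `feature_bits` changes which slot each feature hashes to, so in this sketch a model trained with one setting cannot be reused with another without retraining.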