Fig. 1

Overview of the steps taken to develop and validate the proposed prognostic models, and of their respective datasets. The Study of Thoracic CT in COVID-19 (STOIC) challenge dataset consisted of one public set and four private sets, used in several training (T) and validation (V) steps. For each set, the number of samples is given, with the number of COVID-positive patients in parentheses. The icovid dataset consisted of COVID-positive patients acquired at Universitair Ziekenhuis Brussel (UZB), Universitätsklinikum Heidelberg (UKHD) and Centre Hospitalier Universitaire de Liège (CHUL). At CHUL, three-month follow-up (FU3) data were available for a subset of 149 patients. For all images, the lungs and lung lesions were segmented and handcrafted features were extracted; these steps are omitted from the scheme for clarity. Development focused on one-month severity and three-month symptomatology. For the latter, preliminary tests were performed using three-fold cross-validation and a holdout set created from the CHUL subset with three-month follow-up. For one-month severity prediction, both a deep model and a logistic regression using handcrafted features were developed with four-fold cross-validation and tested on a holdout set created from the public STOIC data. For the logistic regression, five feature sets achieved the same area under the receiver operating characteristic (ROC) curve (AUC) in four-fold cross-validation; of these, the feature set with the highest AUC on the holdout set was selected. Next, an ensemble model was created by averaging the probabilities predicted by 20 logistic regression models, each trained on 2000 samples drawn with replacement from the public STOIC data. Internal validation was performed through the STOIC challenge and its private datasets.
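The ensemble step amounts to bootstrap aggregation (bagging) of logistic regressions. The sketch below illustrates the idea in pure Python under stated assumptions: only the ensemble size (20) and the bootstrap sample size (2000) come from the caption, while the function names, the unregularised batch gradient descent, the learning rate, the epoch count and the seed are hypothetical choices, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
               lr: float = 0.1, epochs: int = 200) -> Model:
    """Hypothetical helper: unregularised logistic regression fitted by
    batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_bagged_ensemble(X, y, n_models: int = 20, n_samples: int = 2000,
                        seed: int = 0) -> List[Model]:
    """Train n_models logistic regressions, each on n_samples rows
    drawn with replacement from (X, y)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(n_samples)]
        models.append(fit_logreg([X[i] for i in idx], [y[i] for i in idx]))
    return models


def ensemble_proba(models: List[Model], x: Sequence[float]) -> float:
    # Final prediction: unweighted mean of the member probabilities.
    return sum(predict_proba(m, x) for m in models) / len(models)
```

With the defaults, `fit_bagged_ensemble(X, y)` mirrors the 20 × 2000 configuration described above; in practice a library estimator such as scikit-learn's `LogisticRegression` would typically replace the hand-written `fit_logreg`.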
After validating both the deep model and the ensembled logistic regression on a first private validation set, preference was given to the latter method, which was then tested on a second private set. In the final stage of the challenge, the algorithm was retrained on both training sets and validated on the remaining data. External validation of the ensemble model was performed on the multicentre icovid dataset and through a comparison with the Maastricht University Model 3 (MUM3). In addition to the AUC, the precision-recall curve and its average precision (AP) were evaluated. Besides the full set of patients, two subsets were considered, acquired during the periods in which the delta and omicron variants, respectively, were most prevalent.
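The two metrics reported in the caption can be illustrated with a short pure-Python sketch. It assumes binary labels (1 = positive outcome) and one predicted probability per patient; the function names are hypothetical. The AUC is computed in its pairwise (Mann-Whitney) form, and AP as the step-wise sum over descending score thresholds, AP = Σₖ (Rₖ − Rₖ₋₁) Pₖ, without interpolation (the same definition used by scikit-learn's `average_precision_score`).

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores higher than a random
    negative; ties count as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AP: sum of (recall increase) * precision at each threshold."""
    total_pos = sum(1 for t in y_true if t == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive")
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_recall, ap, i = 0.0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this threshold before scoring it.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For example, labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]` give an AUC of 0.75 and an AP of about 0.833. Evaluating a variant-specific subset would simply mean filtering patients by acquisition date before calling these functions.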