Learning stops collecting data due to storage space



Environment

  • vFunction Server version 3.0 and later
  • vFunction Server running on a Linux VM

Issue

This issue occurs in the following circumstances:

  1. An organization starts Learning using an existing Measurement to test specific workflows that have not been previously tested
  2. Unexpectedly, the count of Resources, Functions and Domains does not grow in the vFunction Server UI
  3. A review of the vfunction-vfapi-measurement Container’s logs shows the following warning:
22/03/2024 14:35:30.192 [measurements] 
  {"level":"warn",
  "s":"measurements",
  "time":"2024-03-22T14:35:30Z",
  "caller":"/src/vfapi/services/measurements/app/router.go:2318",
  "message":"skipping part due to storage size for measurement 51edd6eb-854f-4d82-b331-688cf6828ef4 is too large at 1123044775"}

22/03/2024 14:35:30.279 [measurements] 
  {"level":"warn",
  "s":"measurements",
  "time":"2024-03-22T14:35:30Z",
  "caller":"/src/vfapi/services/measurements/app/router.go:2359",
  "message":"storage size for measurement 51edd6eb-854f-4d82-b331-688cf6828ef4 is too large at 1123044775"}
  4. After connecting over SSH to the vFunction Server’s VM, the /var/lib/docker/volumes/vfapi_measurements_storage_vol/_data/data/MEASUREMENT_UUID/ directory contains a large number of *.jsonl files with timestamps older than the start of the new Learning session. These files were expected to be cleaned up automatically but were not
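
The checks in steps 3 and 4 can be scripted. The following Python sketch is illustrative only and is not a vFunction tool: the function names parse_storage_warning and stale_jsonl_files are hypothetical, the regular expression is based solely on the warning text quoted above, and the unit of the reported size is not stated in the log, so it is returned as a plain number. It assumes the script runs on the vFunction Server's VM with permission to read the Docker volume (typically root).

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path

# Pattern derived from the warning text quoted in this article (assumption:
# other server versions may word the message differently).
WARNING_RE = re.compile(
    r"storage size for measurement (?P<uuid>[0-9a-f-]{36}) "
    r"is too large at (?P<size>\d+)"
)


def parse_storage_warning(json_line):
    """Return (measurement_uuid, size) from one JSON log record, or None."""
    try:
        record = json.loads(json_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    match = WARNING_RE.search(str(record.get("message", "")))
    if match is None:
        return None
    return match.group("uuid"), int(match.group("size"))


def stale_jsonl_files(measurement_dir, learning_started):
    """List (path, size_in_bytes) for *.jsonl files modified before learning_started."""
    cutoff = learning_started.timestamp()
    stale = []
    for path in sorted(Path(measurement_dir).glob("*.jsonl")):
        info = path.stat()
        if info.st_mtime < cutoff:
            stale.append((path, info.st_size))
    return stale
```

To use it, pass each JSON record from the vfunction-vfapi-measurement Container's log to parse_storage_warning to find the affected Measurement UUID, then call stale_jsonl_files with that Measurement's directory and a timezone-aware datetime for the moment the new Learning was started, for example datetime(2024, 3, 22, 14, 0, tzinfo=timezone.utc). The sketch only reports files; it does not delete anything, because the supported recovery is the re-import procedure described under Resolution.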

Resolution

This issue will be resolved in vFunction Server version 3.5 and later. From that version on, *.jsonl files will no longer accumulate in the Measurement directory, even in edge cases, so they will not consume storage space unnecessarily. A Measurement that is already in this state, however, must be re-imported before additional Learning can occur.

Take the following steps to resolve this issue:

  1. Upgrade the vFunction Server to version 3.5 or later
  2. Download the Measurement that is in a problematic state
  3. Import the downloaded Measurement to create a new Measurement UUID with the same Domains and Learning data
  4. Start Learning again to gather test results for the Resources, Functions and Domains that were not processed after the storage limit was exceeded