AirGradient Forum

Export raw data in original sampling frequency

Hi all,

If I connect the AirGradient to my PC and use the Arduino IDE Serial Monitor, I see measurements being reported quite frequently (maybe once per second, or even more often?).

However, if I use the dashboard export method to retrieve my time series traces (by selecting “raw”), I instead get data sampled at approximately one measurement per minute (± a few seconds).

Is there any way I can retrieve my time series data at the original sampling frequency?

Related to that, how is the data downsampled to generate the traces exported from the dashboard?

Around 20 measurements are averaged in the firmware before being sent to the server approximately every minute. So minute-level values are the highest resolution we store.
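To illustrate the scheme just described, here is a minimal sketch (hypothetical function and variable names, not the actual firmware code): collect a window of raw samples over a minute, then send only their mean.

```python
# Hypothetical sketch of the averaging scheme described above:
# buffer ~20 raw readings over a minute, then report the mean.
def average_window(samples):
    """Mean of a window of raw sensor readings."""
    return sum(samples) / len(samples)

# e.g. 20 PM2.5 readings collected over one minute
window = [10, 12, 9, 11] * 5   # 20 samples
minute_value = average_window(window)
print(minute_value)  # the single value sent to the server
```

The per-second readings exist only inside the window; only the averaged value leaves the device.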

Hi Achim, thanks for clarifying. The info about the 20-measurement average was quite helpful for finding the respective implementation in the firmware code.

However, as far as I can understand the code, this averaging is only implemented for outdoor monitors with two PM sensors. For other configurations, it seems that raw values are sent to the server every 60 s (without any averaging), and the remaining 59 s of data are only used for serial logging (Serial.printf)?

Is this intended? Or maybe I have misunderstood something here?
Below are relevant links to the code that led me to this conclusion:

@Achim_AirGradient , slightly related to this: I was just trying to export my raw data from the dashboard again, but now I’m unable to go past the date 2024-08-21. Is there an unspecified limit on the number of past days that can be exported? If I select the start date 2024-08-15, the resulting .csv only spans back to the 21st and silently ignores any days before that (without any notification).

For other sampling-frequency selections (e.g. 5 min), the export does work as expected. But then I don’t have access to the actual raw data, as this seems to be data averaged over 5-minute windows.

We currently keep only 10 days of raw data in the system. You can see the data here:

https://app.airgradient.com/settings/place

We can extend this if you need a longer period of raw data.

@MallocArray , now I’m curious to know how this works with Home Assistant / ESPHome:

(a) are we able to get AG data streamed to the local server at its original sampling frequency?

(b) is there any setting to only keep the most recent data or are users by default able to keep all their data stored?

(a) are we able to get AG data in their original sampling frequencies streamed to the local server?

Yes, but you’ll need to modify the component YAMLs to adjust update_interval and the like (just as I did). You can then observe, handle, and publish every single sample from the sensors to wherever you want.

(b) is there any setting to only keep the most recent data or are users by default able to keep all their data stored?

What do you mean exactly? The ESPHome firmware does not store any data internally. How much data Home Assistant stores depends on HA’s configuration (search for recorder). You can also send the data directly to a separate database (such as InfluxDB).


@nagisa nailed it

You can adjust the update_interval on a per-sensor basis if needed, although with the packages configuration, you may need to copy them locally, as nagisa did.

Thank you both for the further clarification and the example provided. Now I see the added value of the Home Assistant + ESPHome option. :slight_smile:

In general, what would you suggest in terms of the resources needed to store the data locally (assuming 1 Hz frequency for all sensors) for years and multiple monitors? I observed 2.3 MB of data for 10 days of recording sampled once per minute. So at 1 Hz I should expect roughly 13.8 MB of data per day, thus about 5 GB per year per monitor?
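As a quick sanity check, the arithmetic above can be reproduced from the observed figure (2.3 MB per 10 days at one sample per minute), scaling the row count by 60 for 1 Hz:

```python
# Back-of-envelope storage estimate for raw CSV exports,
# based on the observed 2.3 MB per 10 days at 1 sample/minute.
mb_per_day_at_1_per_min = 2.3 / 10                  # ~0.23 MB/day
mb_per_day_at_1hz = mb_per_day_at_1_per_min * 60    # 60x more rows at 1 Hz
gb_per_year = mb_per_day_at_1hz * 365 / 1024
print(f"{mb_per_day_at_1hz:.1f} MB/day, {gb_per_year:.1f} GB/year")
```

This confirms the estimate: about 13.8 MB/day, i.e. roughly 5 GB per year per monitor in uncompressed CSV form.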

I’m not familiar with InfluxDB, but I quickly looked it up: it seems to be a third-party time-series database, with pricing plans for various cloud-based usage tiers, is that right?

In that case, do I still need a local server to host Home Assistant (e.g. a Raspberry Pi) as an interface to get the data into InfluxDB? In principle, one should be able to set up the firmware to stream data directly to any cloud-hosted server or database, correct?

I think the ideal scenario for me would still be to set up such a cloud environment to receive the data directly from the AG monitor(s). If that can be achieved with the help of HA + ESPHome, that would be great too.

You could do direct streaming: InfluxDB ingests data through a straightforward REST API. You could also install InfluxDB locally (there’s an open source version for local deployments) alongside your HA and have HA write the data for you via an integration. Or add an intermediate service like Influx’s Telegraf and have it forward data from MQTT (which ESPHome supports natively), if you don’t want to maintain HA or build the REST integration yourself.
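As a sketch of the direct-streaming option: InfluxDB 2.x accepts writes in its line protocol via `POST /api/v2/write`. The org, bucket, token, and measurement names below are hypothetical placeholders, not values from this thread:

```python
import urllib.request

def to_line_protocol(measurement, tags, fields, ts_s):
    """Build one InfluxDB line-protocol record, e.g.
    airgradient,location=office pm02=10.5 1724200000"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_s}"

def write_point(base_url, org, bucket, token, line):
    # /api/v2/write is the standard InfluxDB 2.x ingest endpoint;
    # precision=s means the trailing timestamp is in seconds.
    url = f"{base_url}/api/v2/write?org={org}&bucket={bucket}&precision=s"
    req = urllib.request.Request(
        url, data=line.encode(), method="POST",
        headers={"Authorization": f"Token {token}"})
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 204 on success

line = to_line_protocol("airgradient", {"location": "office"},
                        {"pm02": 10.5, "rco2": 612}, 1724200000)
# Uncomment with your own server details (placeholders shown):
# write_point("http://localhost:8086", "my-org", "sensors", "my-token", line)
```

Anything that can issue an HTTP POST (including custom firmware) can write this way, which is why no intermediate HA instance is strictly required.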

Hardware sizing guidelines | InfluxDB Enterprise Documentation has some guidance on storage consumption. As a rule of thumb, you can estimate at most about 6 bytes per observation; the rest is negligible.
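Applying that rule of thumb to the 1 Hz scenario discussed earlier (the field count of 6 is an assumption for illustration, not from the thread):

```python
# Rough compressed TSDB footprint using the ~6 bytes/observation
# rule of thumb quoted above.
seconds_per_year = 60 * 60 * 24 * 365
n_fields = 6          # assumption: e.g. PM2.5, CO2, temp, RH, TVOC, NOx
bytes_per_obs = 6
mb_per_year = seconds_per_year * n_fields * bytes_per_obs / 1e6
print(f"~{mb_per_year:.0f} MB/year per monitor at 1 Hz")
```

That lands around 1.1 GB per monitor per year, noticeably less than the ~5 GB raw-CSV estimate, thanks to the database’s columnar compression.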

There are also other time-series databases that may be more palatable to you. Ultimately, though, this is time-series data, so using a tool built for that kind of data is likely to cause the fewest headaches.


@Achim_AirGradient

I’ve seen that in the newest firmware, averaged values are still only computed for monitors with two PM sensors.
Is this intended?
And if so: why?
I agree with @Guilherme that with the current implementation, only the “now” value at second 60 is stored, and the values for seconds 1 through 59 are discarded.
Thanks.

Yes. This will be implemented. There is already a GitHub issue on this.
