PMS5003 : CF=1 or CF=ATM data?

Guilherme · July 14, 2024, 6:53pm

Hello, based on a discussion in the PurpleAir Forum (What is the Difference Between CF=1, ATM, and ALT? - API - PurpleAir Community), I’ve recently learned about the different data series available from the PMS5003 sensor, namely CF=1 and CF=ATM.

Now I was wondering which series is used in the AirGradient (indoor and outdoor) monitors. In the documented correction formula (link) I could only find references to “raw” PM readings, and I was not sure which series that refers to.

It seems that the background wildfire document (link) uses CF=ATM for that same formula, but the reasoning was not entirely clear to me: “Although originally developed on the cf_1 data it is being implemented on the cf_atm data since 10-minute averages are not currently available for the cf_1 data.”

Therefore, I just wanted to clarify which one is used in the AirGradient monitors. If there is internal documentation covering these details that I may have missed, I’d be glad to read it too.

Guilherme · July 18, 2024, 8:09pm

@Achim_AirGradient - I still want to look into the open-source firmware code in the GitHub repo, but unfortunately haven’t found time for that yet. So just to confirm - do you think this is a setting one could simply change in the code? For example, choosing whether to read CF=1 or CF=ATM data from the PMS5003 sensor?

Achim_AirGradient · July 19, 2024, 4:19am

You can change the firmware and flash the adjusted firmware back to the device.
Currently we expose only CF=ATM because this is the more commonly used one.
We are aware of the problems with the Plantower algorithms and are in the process of developing better ones, which we will then apply on top of CF=ATM.

Guilherme · August 18, 2024, 7:20pm

Similar to what I’ve just described for CO2 altitude compensation (Altitude compensation for CO2 - #12 by Guilherme), I also took the opportunity to change the firmware code so I can visualize multiple variations of the PM 2.5 estimation. I’ll share below what I did and how, in case someone else would like to try this.

As I didn’t want to break the integration with the AirGradient app and dashboard (nor with the APIs), I reused fields that already exist to store all the estimations. Since I’m not currently using the other PM variants (PM1, PM10, and the 0.3 µm particle count) in the AG dashboard, I repurposed all of them here.

Below is what my updatePm function in OneOpenAir.ino (arduino/examples/OneOpenAir/OneOpenAir.ino at master · airgradienthq/arduino · GitHub) looks like after incorporating the following estimations:

CF=1 (also available from PMS5003)
CF=ATM (default AirGradient values)
EPA correction (as in Correction Algorithms)
CF=3.4 (as proposed by Dr. Lance Wallace: link1, link2):

static void updatePm(void) {
  if (ag->isOne()) {
    if (ag->pms5003.isFailed() == false) {

      /* PM 2.5 - CF = 1*/
      measurements.pm01_1 = ag->pms5003.getRawPm25Ae();

      /* PM2.5 - CF = ATM*/
      measurements.pm25_1 = ag->pms5003.getPm25Ae();

      /* PM 2.5 - EPA correction (https://www.airgradient.com/documentation/correction-algorithms/) */
      /* y={0 ≤ x < 30: 0.524x - 0.0862RH + 5.75}
         y={30 ≤ x < 50: (0.786*(x/20 - 3/2) + 0.524*(1 - (x/20 - 3/2)))x - 0.0862RH + 5.75}
         y={50 ≤ x < 210: 0.786x - 0.0862RH + 5.75}
         y={210 ≤ x < 260: (0.69*(x/50 - 21/5) + 0.786*(1 - (x/50 - 21/5)))x - 0.0862RH*(1 - (x/50 - 21/5)) + 2.966*(x/50 - 21/5) + 5.75*(1 - (x/50 - 21/5)) + 8.84*(10^{-4})x^{2}(x/50 - 21/5)}
         y={260 ≤ x: 2.966 + 0.69x + 8.84*(10^{-4})x^{2}} */
      int x = ag->pms5003.getPm25Ae();
      float RH = ag->sht.getRelativeHumidity();

      if ((x >= 0) && (x < 30)) {
        measurements.pm10_1 = (0.524 * x) - (0.0862 * RH) + 5.75;
      } else if (x < 50) {
        measurements.pm10_1 = (
          ((0.786 * (x/20 - 3/2)) + (0.524 * (1 - (x/20 - 3/2)))) * x
          - (0.0862 * RH) + 5.75
        );
      } else if (x < 210) {
        measurements.pm10_1 = (0.786 * x) - (0.0862 * RH) + 5.75;
      } else if (x < 260) {
        measurements.pm10_1 = (
          ((0.69 * (x/50 - 21/5)) + (0.786 * (1 - (x/50 - 21/5)))) * x
          - (0.0862 * RH * (1 - (x/50 - 21/5)))
          + (2.966 * (x/50 - 21/5))
          + (5.75 * (1 - (x/50 - 21/5)))
          + (8.84 * pow(10, -4) * pow(x, 2) * (x/50 - 21/5))
        );
      } else {
        measurements.pm10_1 = 2.966 + (0.69 * x) + (8.84e-4 * pow(x, 2));
      }

      /* PM 2.5 = 3.4 * (0.00030418*N1 + 0.0018512*N2 + 0.02069706*N3) */
      /* where N1, N2, and N3 are the number of particles
      per deciliter in the three smallest size
      categories 0.3-0.5 µm, 0.5-1 µm, and 1-2.5 µm. */
      measurements.pm03PCount_1 = (
        3.4 * (
          0.00030418 * (ag->pms5003.getPm03ParticleCount() - ag->pms5003.getPm05ParticleCount())  /* N1 */
          + 0.0018512 * (ag->pms5003.getPm05ParticleCount() - ag->pms5003.getPm10ParticleCount())  /* N2 */
          + 0.02069706 * (ag->pms5003.getPm10ParticleCount() - ag->pms5003.getPm25ParticleCount()))  /* N3 */
      );

      Serial.println();
      Serial.printf("PM2.5 (CF=1) ug/m3: %d\r\n", measurements.pm01_1);
      Serial.printf("PM2.5 (CF=ATM) ug/m3: %d\r\n", measurements.pm25_1);
      Serial.printf("PM2.5 (CF=EPA) ug/m3: %d\r\n", measurements.pm10_1);
      Serial.printf("PM2.5 (CF=3.4) ug/m3:: %d\r\n", measurements.pm03PCount_1);
      pmFailCount = 0;

To make this work, we also need to expose more readings from the PMS5003, so the following files need to be edited as well:

(a) New definitions in PMS5003.cpp (arduino/src/PMS/PMS5003.cpp at master · airgradienthq/arduino · GitHub):

/* @return int PM2.5 mass concentration (ug/m3) with CF = 1 */
int PMS5003::getRawPm25Ae(void) { return pms.getRaw2_5(); }

/* @return int Number of particles larger than 0.5 um per 0.1 L of air */
int PMS5003::getPm05ParticleCount(void) { return pms.getCount0_5(); }

/* @return int Number of particles larger than 1.0 um per 0.1 L of air */
int PMS5003::getPm10ParticleCount(void) { return pms.getCount1_0(); }

/* @return int Number of particles larger than 2.5 um per 0.1 L of air */
int PMS5003::getPm25ParticleCount(void) { return pms.getCount2_5(); }

(b) And declare them in the corresponding .h file (arduino/src/PMS/PMS5003.h at master · airgradienthq/arduino · GitHub):

int getRawPm25Ae(void);
int getPm05ParticleCount(void);
int getPm10ParticleCount(void);
int getPm25ParticleCount(void);

After flashing the code onto the AG indoor monitor and checking the readings with the Arduino IDE 2 Serial Monitor, I now get the following values:

PM2.5 (CF=1) ug/m3: 81
PM2.5 (CF=ATM) ug/m3: 53
PM2.5 (CF=EPA) ug/m3: 42
PM2.5 (CF=3.4) ug/m3:: 44

Temperature in C: 23.13
Relative Humidity: 59

So we do see that PMS5003 CF=1 gives a much higher PM2.5 value than the other readings (as expected, according to this: What is the Difference Between CF=1, ATM, and ALT? - #2 by Lance - API - PurpleAir Community).

And the current AG default estimation (CF=ATM from PMS5003) yields a higher value (53) than the other proposed corrections (CF=3.4 at 44 and EPA at 42) - at least for the observed PM and RH.

But this is a single observation at one moment. Now that I have things set up, I’ll keep monitoring these values and will share more insights here in the future.

Guilherme · August 25, 2024, 4:00pm

Today I was analyzing the initial results after a week of data, and I identified a bug in my implementation. While trying to reproduce the results of the EPA formula, I found issues with integer-to-float conversion in C++: integer divisions such as 3/2 truncate, and assigning a float to an int drops the fractional part (I’m more used to Python, which is less strict about this).

As I was unable to edit my post above, I’m posting the updated code snippet for the updatePm function (up to pmFailCount = 0) in OneOpenAir.ino (arduino/examples/OneOpenAir/OneOpenAir.ino at master · airgradienthq/arduino · GitHub) below:

static void updatePm(void) {
  if (ag->isOne()) {
    if (ag->pms5003.isFailed() == false) {

      /* PM 2.5 - CF = 1*/
      measurements.pm01_1 = ag->pms5003.getRawPm25Ae();

      /* PM2.5 - CF = ATM*/
      measurements.pm25_1 = ag->pms5003.getPm25Ae();

      /* PM 2.5 - EPA correction (https://www.airgradient.com/documentation/correction-algorithms/) */
      /* y={0 ≤ x < 30: 0.524x - 0.0862RH + 5.75}
         y={30 ≤ x < 50: (0.786*(x/20 - 3/2) + 0.524*(1 - (x/20 - 3/2)))x - 0.0862RH + 5.75}
         y={50 ≤ x < 210: 0.786x - 0.0862RH + 5.75}
         y={210 ≤ x < 260: (0.69*(x/50 - 21/5) + 0.786*(1 - (x/50 - 21/5)))x - 0.0862RH*(1 - (x/50 - 21/5)) + 2.966*(x/50 - 21/5) + 5.75*(1 - (x/50 - 21/5)) + 8.84*(10^{-4})x^{2}(x/50 - 21/5)}
         y={260 ≤ x: 2.966 + 0.69x + 8.84*(10^{-4})x^{2}} */
      float x = measurements.pm25_1;
      float RH = measurements.Humidity;
      float epa_pm;

      if ((x >= 0) && (x < 30)) {
        epa_pm = (0.524 * x) - (0.0862 * RH) + 5.75;

      } else if (x < 50) {
        epa_pm = (
          ((0.786 * (x/20 - 3./2)) + (0.524 * (1 - (x/20 - 3./2)))) * x
          - (0.0862 * RH) + 5.75
        );

      } else if (x < 210) {
        epa_pm = (0.786 * x) - (0.0862 * RH) + 5.75;

      } else if (x < 260) {
        epa_pm = (
          ((0.69 * (x/50 - 21./5)) + (0.786 * (1 - (x/50 - 21./5)))) * x
          - (0.0862 * RH * (1 - (x/50 - 21./5)))
          + (2.966 * (x/50 - 21./5))
          + (5.75 * (1 - (x/50 - 21./5)))
          + (8.84 * pow(10, -4) * pow(x, 2) * (x/50 - 21./5))
        );

      } else {
        epa_pm = 2.966 + (0.69 * x) + (8.84e-4 * pow(x, 2));
      }

      /* round to nearest int: adding 0.5 before the implicit truncation
         rounds correctly for non-negative values */
      measurements.pm10_1 = epa_pm + 0.5;

      /* PM 2.5 = 3.4 * (0.00030418*N1 + 0.0018512*N2 + 0.02069706*N3) */
      /* where N1, N2, and N3 are the number of particles
      per deciliter in the three smallest size
      categories 0.3-0.5 µm, 0.5-1 µm, and 1-2.5 µm. */
      float pm03_count = ag->pms5003.getPm03ParticleCount();
      float pm05_count = ag->pms5003.getPm05ParticleCount();
      float pm10_count = ag->pms5003.getPm10ParticleCount();
      float pm25_count = ag->pms5003.getPm25ParticleCount();
      measurements.pm03PCount_1 = (
        3.4 * (
          0.00030418 * (pm03_count - pm05_count)  /* N1 */
          + 0.0018512 * (pm05_count - pm10_count)  /* N2 */
          + 0.02069706 * (pm10_count - pm25_count) /* N3 */
        ) + 0.5 /* proper round float to int */
      );

      Serial.println();
      Serial.printf("PM2.5 (CF=1) ug/m3: %d\r\n", measurements.pm01_1);
      Serial.printf("PM2.5 (CF=ATM) ug/m3: %d\r\n", measurements.pm25_1);
      Serial.printf("PM2.5 (CF=EPA) ug/m3: %d\r\n", measurements.pm10_1);
      Serial.printf("PM2.5 (CF=3.4) ug/m3:: %d\r\n", measurements.pm03PCount_1);
      pmFailCount = 0;
    }

Please note that this only applies to the AirGradient indoor monitor with PMS5003. If using the outdoor monitor with PMS5003T, we would need to adjust another portion of the code.
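
For anyone who wants to check the CF=3.4 term on its own, here is a minimal standalone sketch of the same arithmetic in plain C++ (not Arduino code). The function name altCf34Pm25 and the sample counts mentioned afterwards are made up for illustration; the coefficients are the ones from the snippet above, and the inputs are assumed to be the cumulative counts per 0.1 L that the PMS5003 reports.

```cpp
#include <cassert>

// Hypothetical helper (not part of the AirGradient library): estimates PM2.5
// in ug/m3 from cumulative particle counts per 0.1 L, using the coefficients
// quoted in the post above.
float altCf34Pm25(float countOver03, float countOver05,
                  float countOver10, float countOver25) {
  // Cumulative counts must not increase with the size threshold.
  assert(countOver03 >= countOver05 && countOver05 >= countOver10 &&
         countOver10 >= countOver25);

  const float n1 = countOver03 - countOver05;  // particles in 0.3-0.5 um
  const float n2 = countOver05 - countOver10;  // particles in 0.5-1.0 um
  const float n3 = countOver10 - countOver25;  // particles in 1.0-2.5 um

  return 3.4f * (0.00030418f * n1 + 0.0018512f * n2 + 0.02069706f * n3);
}
```

With hypothetical counts of 1000, 300, 50 and 5, the bins are N1=700, N2=250 and N3=45, and the estimate comes out at about 5.46 ug/m3.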

And here is an example of corrected results after the fix:

PM2.5 (CF=1) ug/m3: 36
PM2.5 (CF=ATM) ug/m3: 33
PM2.5 (CF=EPA) ug/m3: 21
PM2.5 (CF=3.4) ug/m3:: 16

Temperature in C: 20.85
Relative Humidity: 40

Applying the EPA correction (Correction Algorithms) to the RH (40%) and CF=ATM (33 ug/m3) measurements above gives 20.89 ug/m3, which rounds to the 21 reported above.
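
To reproduce that number outside the firmware, here is a minimal standalone sketch of the piecewise formula from the code comment above, written with float arithmetic throughout. The names epaPm25 and epaPm25Rounded are my own and not from the AirGradient library, and unlike the firmware snippet this version sends negative inputs through the first segment.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: EPA piecewise correction.
// x  = PM2.5 CF=ATM reading in ug/m3, rh = relative humidity in percent.
float epaPm25(float x, float rh) {
  assert(rh >= 0.0f && rh <= 100.0f);
  if (x < 30.0f) {
    return 0.524f * x - 0.0862f * rh + 5.75f;
  }
  if (x < 50.0f) {
    const float w = x / 20.0f - 1.5f;  // blend weight: 0 at x=30, 1 at x=50
    return (0.786f * w + 0.524f * (1.0f - w)) * x - 0.0862f * rh + 5.75f;
  }
  if (x < 210.0f) {
    return 0.786f * x - 0.0862f * rh + 5.75f;
  }
  if (x < 260.0f) {
    const float w = x / 50.0f - 4.2f;  // blend weight: 0 at x=210, 1 at x=260
    return (0.69f * w + 0.786f * (1.0f - w)) * x
           - 0.0862f * rh * (1.0f - w)
           + 2.966f * w
           + 5.75f * (1.0f - w)
           + 8.84e-4f * x * x * w;
  }
  return 2.966f + 0.69f * x + 8.84e-4f * x * x;
}

// Rounds to the nearest integer instead of truncating.
int epaPm25Rounded(float x, float rh) {
  return static_cast<int>(std::lround(epaPm25(x, rh)));
}
```

For x=33 and RH=40 the blend weight is 0.15, so the slope is 0.786*0.15 + 0.524*0.85 = 0.5633 and the result is 0.5633*33 - 3.448 + 5.75, which is about 20.89; epaPm25Rounded(33, 40) then gives 21.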

I’ll try to share the comparison of all four PM2.5 estimations implemented here next week.

Guilherme · August 25, 2024, 11:31pm

Hi @Achim_AirGradient, just a note here - while looking for the newly introduced compensation code in the arduino GitHub repo, I found your recent implementation here (arduino/src/PMS/PMS.cpp at master · airgradienthq/arduino · GitHub).

However, I am afraid it may have the exact same bugs that I had in my original implementation and fixed today (as described in my previous post).

Firstly, the pm25 input argument of that function is an integer, and it is then used in some divisions (e.g. pm25 / 20). Since integer division truncates in C++, I believe pm25 should be a float for the division to work correctly.

Similarly for the hardcoded values in other divisions, e.g. “3/2” in line 292 and “21/5” in line 296. At least one operand should be a float (e.g. replace 3 with 3. , and 21 with 21.) for these to work as expected.
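
To make the effect of those two divisions concrete, here is a small standalone sketch that computes the blend weight of the 30 ≤ x < 50 segment both ways. The function names are hypothetical and this is not the code from PMS.cpp, only an illustration of the arithmetic.

```cpp
#include <cassert>

// Hypothetical helper: blend weight computed with int arithmetic only.
// Both pm25 / 20 and 3 / 2 truncate toward zero.
int blendWeightInt(int pm25) {
  assert(pm25 >= 0);  // PM readings are non-negative
  return pm25 / 20 - 3 / 2;
}

// Same weight with float literals, so both divisions keep their fraction.
float blendWeightFloat(int pm25) {
  assert(pm25 >= 0);
  return pm25 / 20.0f - 3.0f / 2.0f;
}
```

For pm25 = 33 the integer version gives 1 - 1 = 0, while the float version gives 1.65 - 1.5 = 0.15, so the integer code silently falls back to the 0.524 slope across most of that segment.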

Finally, the cast of the result back to int truncates instead of rounding. For example, the expected result 20.89 for input values of PM2.5=33 and RH=40 would be returned as 20 instead of 21. This is discussed here: floating point - C++: How to round a double to an int? - Stack Overflow
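
A minimal sketch of the three options, in case it helps with the fix. The function names are my own; std::lround comes from the standard <cmath> header, and whether it is the best choice on the target board is something I have not checked.

```cpp
#include <cassert>
#include <cmath>

// Plain cast: drops the fractional part (truncates toward zero).
int truncateToInt(float value) { return static_cast<int>(value); }

// std::lround: rounds to nearest, halfway cases away from zero.
int roundToInt(float value) { return static_cast<int>(std::lround(value)); }

// The "+ 0.5" trick from my snippet; only valid for non-negative input.
int roundHalfUpNonNegative(float value) {
  assert(value >= 0.0f);
  return static_cast<int>(value + 0.5f);
}
```

For the 20.89 example, truncateToInt returns 20 while the other two return 21. The difference between the last two only matters for negative values, which the EPA formula can produce at very low PM and high RH.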

At least these are the things I had to fix to make my implementation work, and the GitHub arduino code looks similar. I’m not that familiar with C++, though, so I may be overlooking some details in the GitHub implementation. But I thought it would be worth reporting so you can double-check.

PS: After sending this, I noticed there may be at least one more issue in the arduino code. Line 296 seems to be missing a power of 2 on a pm25 reference (the formula has x^2 in that term):

(8.84f * (1.e-4) * pm25* (pm25/50 - 21/5))

Achim_AirGradient · August 25, 2024, 11:48pm

@Guilherme Thank you for reporting. Ticket has been created on github.
Check Compensation Formula · Issue #225 · airgradienthq/arduino · GitHub

Please feel free to comment there.

Guilherme · August 31, 2024, 8:17pm

After two weeks with data collected, I can now share some initial insights with these numbers here.

TL;DR - a few key takeaways:

Reproduced the dependence of the CF_1 / CF_ATM ratio on CF_1 readings (as reported by Dr. Wallace - link)
CF=ATM (current AG default) overestimates PM 2.5 readings in comparison to CF=3.4 and EPA
When PM 2.5 is below 5 μg/m³, all methods seem to yield more similar values (including CF=ATM)
At values below 2 μg/m³, however, CF=ATM seems to underestimate PM 2.5 instead (vs EPA or CF=3.4)
CF=3.4 never seems to return zero values, whereas the others do (CF=ATM enforces this)
EPA wouldn’t return zero values either, but rounding to integer produces them

My first goal was to try to replicate the plot by Dr. Wallace of the CF_1 / CF_ATM ratio as implemented in PMS5003 sensors. His data was obtained with PurpleAir monitors, but as AirGradient indoor monitors use the same sensor, I thought this would be a nice initial sanity check for the data I’m getting.

So this looks as expected. But please note that it only works if one selects raw data to be exported from the server (this does seem to be raw, even though it is only sampled once per minute). For any other selection (e.g. 5 min), the data seem to be averaged and the plot above will look different.

Ok, after that, I wanted to compare the alternative correction formulas (CF=3.4 and EPA) with the current default readings of the AirGradient monitor (CF=ATM) - although I’m aware that the team is working to have the EPA values by default soon:

A) CF=3.4

In this case, we observe that almost all values are below the diagonal line, and thus can conclude that, within this PM 2.5 range ([0, 85] PM2.5 CF=ATM), the CF 3.4 correction consistently yields lower values.

The only exception to that observation is at very low PM 2.5 values, as zoomed in below:

So two interesting observations from the zoomed-in image: (i) for PM 2.5 values below 5, CF=3.4 produces results closer to the AG default estimation (and CF=3.4 yields the greater value when CF=ATM is below 2); (ii) CF=ATM (the current AG default) can also result in zero values, which is not the case for the CF=3.4 alternative.

The zero values from CF=ATM are discussed in the paper Cracking the code—Matching a proprietary algorithm for a low-cost sensor measuring PM1 and PM2.5. Dr. Wallace seems to have found the code behind the PMS5003 CF=ATM and CF=1 estimations, and observed that CF=ATM could result in negative values, which are then converted to [artificial] zero values.

A bonus observation here is that all values are rounded because they are treated as integers. For very low values, I believe one could argue that float numbers would be more useful.

B) EPA (Correction Algorithms)

As in the comparison with CF=3.4, we observe that the current AirGradient values may be overestimating PM 2.5 with respect to the EPA correction formula. Another similar observation is that the CF=ATM values get closer to the EPA formula at quite low values (up to 5 μg/m³). And then the pattern inverts around values of 0-2, as shown below:

As EPA also depends on the relative humidity, this is a histogram of my indoor RH for that data:

So now one final interesting comparison is to see how CF=3.4 behaves with respect to the EPA correction:

Both methods seem to yield quite consistent PM 2.5 values, I’d say, based on the spread around the diagonal line. Maybe we could say that EPA tends to return slightly higher PM values at concentrations between 5 and 40 μg/m³, and above that the pattern inverts - CF=3.4 returns slightly higher values instead. Those deviations seem to stay within 10 μg/m³ of one another, though.

So this concludes it. If anyone is interested in any further comparisons, I’d be glad to create more plots.