Airgradient rebooting itself?

Hi all, new to the forum and community. I ordered an AirGradient DIY Pro kit in mid-December, received it about a week ago, and put it together. The one issue I have been having is that multiple times a day the sensor seems to reboot for no apparent reason. It displays the WiFi connect screen for a second before switching to the “warming up the sensors” dialogue, then returns to the normal output. This does not seem to impact the data being sent to the dashboard.

From looking at the code, I can’t see any reason the board would enter those sections of code after bootup unless the ESP reset itself (the connectToWifi function and “warming up the sensors” dialogue are only in the ‘setup’ function, which afaik is only run once at boot). I have tried using a different USB power supply, and powering the ESP directly instead of through the added USB-C port. I also went back to touch up my soldering job and make sure there weren’t any shorts or bad connections, and I couldn’t find any.

I am wondering if anyone has any insight into this they could share. Is this behaviour expected? Would it indicate an issue with the power going to the board? Did I mess something up when assembling the board? Any help would be appreciated, Thanks.

I also ordered in December and received mine a week ago. I’m seeing the same issue and have been debugging it for the past week or so; testing the various scenarios takes a long time. When I have the D1 Mini connected via USB and monitor the serial port, I can see it hit a runtime exception. That’s why it reboots. This is my first time working with Arduino and the D1 Mini. The problem with the Arduino IDE is that copying stuff from the serial monitor is damn near impossible unless it fits on one screen. And there’s no exception decoder or debugger or anything that I know of.

I’ve added a small line of code to print the uptime (derived from millis()) to the screen and to the serial port, and I start a stopwatch at power-up so that I can walk away and check in later to see whether it has rebooted. I did a lot of tests. Last night, I ran just the bare D1 Mini, monitored over the serial port, and it ran for over 12 hours. Today, I plugged the D1 Mini into the PCB and powered it through the PCB (mine seems worse when powered via the PCB instead of the D1 Mini) and it ran fine for over 3 hours with no sensors, just the OLED screen. I then added the TVOC sensor and it was still good after a couple of hours. I can’t add the temp/hum sensor because of the pull-ups, so I added the S8 and it ran fine for a couple of hours too. I then plugged in the PMS, walked away, and came back to see it had last rebooted 30 minutes in. I’m not sure if it was a single reboot or several, since I only have the uptime counter to go by, but I let it continue running and it’s been up for over 72 minutes as I type this.
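For anyone wanting a more readable uptime than raw minutes, here is a minimal sketch of the idea in plain C++ so it can be tried off the board. The names `formatUptime` and `uptimeWentBackwards` are made up for illustration and are not part of the AirGradient sketch; on the device you would pass in the value returned by millis().

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the AirGradient sketch): format a
// millis()-style millisecond counter as "Dd HH:MM:SS".
std::string formatUptime(uint32_t ms) {
    const uint32_t totalSeconds = ms / 1000;
    const unsigned days    = totalSeconds / 86400;
    const unsigned hours   = (totalSeconds % 86400) / 3600;
    const unsigned minutes = (totalSeconds % 3600) / 60;
    const unsigned seconds = totalSeconds % 60;

    char buf[24];
    const int n = std::snprintf(buf, sizeof(buf), "%ud %02u:%02u:%02u",
                                days, hours, minutes, seconds);
    assert(n > 0 && n < static_cast<int>(sizeof(buf)));
    return std::string(buf);
}

// A counter that went backwards means the board restarted (or, after about
// 49.7 days of uptime, that the 32-bit millisecond counter wrapped around).
bool uptimeWentBackwards(uint32_t previousMs, uint32_t currentMs) {
    return currentMs < previousMs;
}
```

With this, the 72 minutes mentioned above would display as `0d 01:12:00`, and a logger that remembers the last value it saw can flag a reboot without anyone watching a stopwatch.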

Problem is even if it runs fine for 3-4 hours, I’m not 100% confident it’s stable because I’ve seen it reboot after a few hours. It’s difficult to know because it’s intermittent and I’m not there to catch it, so it takes so long to test.

I am working on code to remotely log over the network, but I wanted to do this basic test first with near-stock FW to see if I can detect a pattern or root cause…

Ordered in January, just got mine, same issue: frequent random reboots with my pro pre-soldered version. So @philomellia that seems to suggest it’s not your solder work.

Still not conclusive yet, but I’ve captured three exceptions so far. All are Exception 0, which is caused by an illegal instruction. From my reading, the “ctx: sys” note means the exception occurred on the SDK/system stack and not the sketch stack. The first two exceptions happened after printing the POSTURL and before printing the HTTP code and response, but the third doesn’t fit that. The first program counter is at 0x40218e08, while the 2nd and 3rd are at 0x40218e58. So far, all exceptions have happened with the PMS sensor plugged in, and I have not seen one with the PMS sensor unplugged.

Here is the beginning of each of the 3 exceptions:

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

Exception (0):
epc1=0x40218e08 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

>>>stack>>>

ctx: sys
sp: 3fffeb90 end: 3fffffb0 offset: 0190
3fffed20:  00007fff 30e2625a 40103be6 00000100  
3fffed30:  3ffea148 7fffffff 00002200 00000001  
3fffed40:  00000001 00004208 00004000 4010372c  
3fffed50:  3ffea148 00000000 00000000 e0033035  
3fffed60:  00000000 00000001 0000000e 40100c04  
3fffed70:  00000020 3fffc200 00000022 00000001  
3fffed80:  40234bed 00000030 0000001e 00000022  
3fffed90:  3fffc200 40100b40 3fffc258 4000050c  
--------------- CUT HERE FOR EXCEPTION DECODER ---------------

Exception (0):
epc1=0x40218e58 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

>>>stack>>>

ctx: sys
sp: 3fffebc0 end: 3fffffb0 offset: 0190
3fffed50:  40104d57 00000034 00000000 00040000  
3fffed60:  40104d57 00000033 00000000 00040000  
3fffed70:  00000000 40105c5b 00004000 401029c4  
3fffed80:  40103c20 00080000 3ffed2d8 e0037035  
3fffed90:  00000000 00000001 0000000e 40100c04  
--------------- CUT HERE FOR EXCEPTION DECODER ---------------

Exception (0):
epc1=0x40218e58 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

>>>stack>>>

ctx: sys
sp: 3fffeb90 end: 3fffffb0 offset: 0190
3fffed20:  00007fff 2c8dd264 3ffedbe4 4010372c  
3fffed30:  3ffea140 00000000 00000000 00000030  
3fffed40:  00007fff 2c8dd264 00004000 00000100  
3fffed50:  3ffea140 7fffffff 00000000 00033035  
3fffed60:  00000000 00000001 0000000e 40100c04  
3fffed70:  00000020 00000001 00000000 401029c4  
3fffed80:  3ffe98c0 40105c43 3ffed170 00000022  
3fffed90:  3fffc200 40100b40 3fffc258 4000050c  

@Achim_AirGradient : Is there any way to decode these exceptions? I have the full stack trace/dump. I tried Exception Stack Trace Decoder, but it’s not compatible with the newer Arduino IDEs.
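As a first step before full decoding, the faulting program counter can be pulled out of the dump and placed roughly in the memory map. The sketch below is plain C++ and only illustrative: `parseEpc1` and `regionOf` are hypothetical names, and the address ranges are approximate values from commonly published ESP8266 memory maps, stated here as an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Pull the faulting program counter out of an "epc1=0x... epc2=..." line.
// Returns false if the line does not start with an epc1 field.
bool parseEpc1(const std::string& line, uint32_t& epc1) {
    unsigned int value = 0;
    if (std::sscanf(line.c_str(), "epc1=0x%x", &value) != 1) {
        return false;
    }
    epc1 = value;
    return true;
}

// Rough region lookup. ASSUMPTION: these bounds are approximate and only
// meant to tell flash-resident code apart from RAM- or ROM-resident code.
const char* regionOf(uint32_t addr) {
    if (addr >= 0x40200000u && addr < 0x40300000u) return "flash-mapped code";
    if (addr >= 0x40100000u && addr < 0x40110000u) return "instruction RAM";
    if (addr >= 0x40000000u && addr < 0x40010000u) return "boot ROM";
    return "other";
}

// Usage check against the first dump quoted in this thread.
void checkAgainstFirstDump() {
    uint32_t pc = 0;
    assert(parseEpc1("epc1=0x40218e08 epc2=0x00000000 epc3=0x00000000", pc));
    assert(pc == 0x40218e08u);
    assert(std::string(regionOf(pc)) == "flash-mapped code");
}
```

Resolving the address to a function name still needs the toolchain’s addr2line run against the .elf from the exact same build, along the lines of `xtensa-lx106-elf-addr2line -e firmware.elf 0x40218e08`, where `firmware.elf` is a placeholder for wherever your build put the file.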

Thanks for your info @ken830. At least two of the reboots happened while I was sitting at my desk with the sensor mounted on the wall in front of me, right as it was trying to spin up the fan in the PM sensor. I forgot to mention it in my original post, but that was why I suspected a power issue: I thought the extra current draw from the fan might be tripping something and triggering the ESP to reset.

I also forgot to mention my PCB version is 3.3.

@philomellia : You mentioned spinning up the fan of the PM sensor. What do you mean by that? In my custom code, I had logic in there to sleep the PM sensor, wake up every 5 minutes, wait 30 seconds, and read for 30 seconds, and then go back to sleep. I’ve disabled that for now in my current tests.
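A duty cycle like the one described can be expressed as a pure function of elapsed time, which makes it easy to test off the board. This is a minimal sketch under assumptions, not the code from the post: the names are hypothetical, and it assumes each 5-minute cycle starts with the wake-up, followed by the 30-second wait and the 30-second read window.

```cpp
#include <cassert>
#include <cstdint>

enum class PmsPhase { WarmingUp, Reading, Sleeping };

// ASSUMPTION: one cycle is 5 minutes and begins when the sensor is woken.
constexpr uint32_t kCycleMs  = 5UL * 60UL * 1000UL;
constexpr uint32_t kWarmUpMs = 30UL * 1000UL;
constexpr uint32_t kReadMs   = 30UL * 1000UL;
static_assert(kWarmUpMs + kReadMs < kCycleMs, "active window must fit in a cycle");

// Which phase the sensor should be in at time nowMs, given the time the
// schedule started. The unsigned subtraction stays correct when a 32-bit
// millisecond counter wraps around.
PmsPhase phaseAt(uint32_t nowMs, uint32_t cycleStartMs) {
    const uint32_t elapsed = (nowMs - cycleStartMs) % kCycleMs;
    if (elapsed < kWarmUpMs) return PmsPhase::WarmingUp;
    if (elapsed < kWarmUpMs + kReadMs) return PmsPhase::Reading;
    return PmsPhase::Sleeping;
}

// Usage check: 45 s into a cycle falls inside the read window.
void checkReadWindow() {
    assert(phaseAt(45000u, 0u) == PmsPhase::Reading);
}
```

On the device, loop() would call something like this with millis() and only issue a sleep or wake command when the returned phase changes.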

Update:

I captured a lot more exceptions and it became less and less conclusive. It’s almost like there is no pattern. I tried my more customized code, which adds support for the SGP41 and a few other niceties, and it crashed really often (typically within 20 minutes). I moved to VSCode + PlatformIO to try to debug, but it turns out debug is not supported on ESP, only on Uno and Nano. I just can’t believe there is no way to debug. How would anyone be able to deploy an ESP-based solution??

ESP Exception Decoder is exactly what I need, so I finally gave up and looked into installing the older Arduino IDE v1.8.x. My concern was having both the new 2.x and the old 1.8.x installed side by side. It seems people with 1.8.x could add 2.x alongside it; going the other way is a slightly different situation, but I took a chance anyway and it seemed to work okay…? It was able to see the installed libraries in the library manager, but I had to install ESP support in the board manager to select the D1 Mini. Then I imported my code into a new project and was not able to compile at all! It wouldn’t even start. It just kept giving me an error saying it couldn’t locate “preferences.txt”, even though the preferences screen shows the exact location of the file and it is right there. After more wasted hours of Google searches, I stumbled upon someone who had the same issue. It turns out there is a bug in the ESP8266 boards package that causes this. I had to install an older version (v3.0.0), and it instantly compiled and worked.

So far, I haven’t had a chance to use the ESP Exception Decoder because something strange is happening: instead of crashing every few minutes, the board has been running non-stop for over three and a half hours as I type this. Hmmm… perhaps the new ESP8266 package is what’s causing the illegal instruction exceptions?! It’s still too early to tell and I will need to run it for at least 12 hours to be somewhat sure, but it seems like it could be the case.

If anyone else wants to try, roll back your ESP8266 board package to v3.0.0, re-compile, and see if the reboots still occur.

I should have another update tomorrow or soon after. If it turns out it’s the ESP8266 Arduino core, then I’ll have more work to do to figure out and report the issue.

@ken830 wow, thanks for the detailed analysis. This is my first hands-on project with an Arduino or ESP of any kind, so I am not confident in how to debug the software myself. Thanks so much for putting in that work.

This is what I meant by the fan “spinning up” (this is purely anecdotal and I only noticed it happening two, maybe three times):
At least on my unit, the fan on the PM sensor is not always spinning, or at least not always at full speed. The two times I observed the reboot myself, I happened to be looking at the display because I had heard the fan start to speed up for about a second, and that was when the reboot happened. The fan would speed up steadily, then after a second or two abruptly stop entirely, and the display would refresh with the setup routine.
I said spinning up because I assumed the fan was stopped and only ran occasionally to improve the sensor’s efficiency by fully cycling the air it is measuring. That may not be accurate; I am not familiar with the specifics of the sensor’s operation.

Also for reference I do not have the VOC sensor in my unit.

This is my first time with Arduino or ESP too. I’m a hardware engineer, so my software skills are meager, but I figured I have enough ability to poke around and maybe get other, more experienced, people to take notice.

So the PM sensor does seem to modulate its fan speed at times, but I don’t believe it ever shuts off unless it goes into “sleep” mode either by pulling the SET pin low or sending the 0xE4 + 0x00 sleep command. I also noticed that my custom code made it reboot much more often. And my code putting the PMS sensor to sleep and waking it up periodically was definitely interesting even though I know it wasn’t the sole cause because I noticed the rebooting from day one, before I made any changes to the code.
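For reference, here is a minimal sketch of how the 0xE4 sleep and wake command frames are assembled, based on my understanding of the Plantower command format: two start bytes, a command byte, two data bytes, then a 16-bit sum of the first five bytes. The function names are hypothetical, and the framing should be checked against your sensor’s datasheet before relying on it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Build a 7-byte command frame:
// 0x42 0x4D <cmd> <dataH> <dataL> <checksumH> <checksumL>
// ASSUMPTION: the checksum is the 16-bit sum of the first five bytes.
std::array<uint8_t, 7> buildPmsCommand(uint8_t cmd, uint8_t dataH, uint8_t dataL) {
    std::array<uint8_t, 7> frame = {0x42, 0x4D, cmd, dataH, dataL, 0, 0};
    uint16_t sum = 0;
    for (std::size_t i = 0; i < 5; ++i) {
        sum = static_cast<uint16_t>(sum + frame[i]);
    }
    frame[5] = static_cast<uint8_t>(sum >> 8);
    frame[6] = static_cast<uint8_t>(sum & 0xFF);
    return frame;
}

// 0xE4 with data 0x0000 requests sleep; data 0x0001 requests wake-up.
std::array<uint8_t, 7> pmsSleepFrame() { return buildPmsCommand(0xE4, 0x00, 0x00); }
std::array<uint8_t, 7> pmsWakeFrame()  { return buildPmsCommand(0xE4, 0x00, 0x01); }

// Usage check: the sleep frame's checksum bytes come out as 0x01 0x73.
void checkSleepFrame() {
    const std::array<uint8_t, 7> f = pmsSleepFrame();
    assert(f[5] == 0x01 && f[6] == 0x73);
}
```

On the board, the resulting seven bytes would be written to the serial port the PMS is attached to. Pulling the SET pin low is the hardware alternative mentioned above and needs no frame at all.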

Early on, I kind of suspected that it could be related to power too because it seemed worse when I powered it via the PCB and better when powered via the D1 Mini directly, but right now I can’t remember if the reboots were improved or just the screen freezing issue. At the time, I wasn’t sure if those were related, but I think I’ve root-caused the screen freezing issue and now I’m just working on the rebooting issue. By the way, you have a v3.3 board: does yours experience the issue where the screen stops updating but still seems to continue to upload to the cloud/dashboard? And does the temperature/humidity reading always work?

I’m not clear - are y’all running ESP Home? Which is an alternative software for the unit I think?

I’m running what I think of as the stock software from the installation guide. The only modifications I’ve made are to set the temperature to F, and I ran the S8_UART examples to turn off CO2 ABC and then do a manual calibration. I do have the SGP41 VOC sensor.

No, not running ESP Home. I ran bone stock and confirmed it reboots. I then did a slight modification to write the uptime to the screen and to the serial port so that I can confirm if there was a reboot by comparing it to a stopwatch I start at the same time I power up the board.

After extensive experiments with the stock code, I’m running my modified version of the stock code which crashes/reboots much more frequently to try to speed up the debug process.

If you want to play along, could you check what version of the ESP8266 board package you have installed in your Arduino IDE? If it’s newer than v3.0.0, could you try rolling it back to v3.0.0, re-compile, and see if the reboots go away? I only have one unit and it’s still running from the very first time I compiled with the older Arduino IDE v1.8.x. It’s going strong past seven and a half hours now with no crash, whereas it was crashing every few minutes the last time I loaded the same code compiled under Arduino IDE v2.x and PlatformIO. The main difference is that I was forced to use ESP8266 v3.0.0.

@makingmark : Oh, and one more thing, since you mentioned you had to modify the code to change to Fahrenheit – does that mean you have PCB v3.7? Otherwise, there is already stock firmware pre-configured for F for <v3.7 PCBs on the build instructions page.

@makingmark I am running completely stock software, I flashed the DIY_PRO example code directly to the board and have made no modifications whatsoever.

@ken830 I have not noticed any issues with the OLED freezing. I think I remember skimming through a thread about that while trying to see if anyone else was having the reboot issue, but I don’t remember it being familiar to me. I have had no dropouts in terms of uploading the data to the dashboard, the sensor has not missed a single data point on there besides when I unplugged it for maintenance. The temperature and humidity do seem to update appropriately as well, the only issue I have with those is that the temperature seems to read about half a degree higher than any other thermometer I have been able to test in the same space.

I would be more than willing to try messing around with the code and recompiling, but I’ll have to get to that tomorrow or maybe later, as I’m a bit ill at the moment. I’ll get back to you if I have a chance to try those steps, and I might try to implement some debugging measures in my sensor as well so I can at least monitor the uptime.

If you get around to it, running bone-stock FW isn’t feasible because you have to be staring at the screen the entire time waiting for a reboot. Here’s the updateOLED2() function I used in my “near-stock” experiments. This was the only change to the code. I simply squeezed the three lines closer together to make room for a fourth line and on that line, I output the uptime in minutes and also write it to the serial port. Start a stop-watch timer at the same time you plug in the board and as long as they match, you know it hasn’t re-booted.

void updateOLED2(String ln1, String ln2, String ln3) {
      // Uptime in whole minutes; currentMillis is the sketch's global millis() snapshot.
      String uptime = String(currentMillis / 60000);
      u8g2.firstPage();
      do {
          u8g2.setFont(u8g2_font_t0_16_tf);
          u8g2.drawStr(1, 10, ln1.c_str());
          u8g2.drawStr(1, 25, ln2.c_str());
          u8g2.drawStr(1, 40, ln3.c_str());
          u8g2.drawStr(1, 55, uptime.c_str());  // fourth line: uptime
      } while ( u8g2.nextPage() );
      // Log once per update, outside the page loop, which may run more than once.
      Serial.println(uptime + " minutes");
}

I am so glad someone else posted about this. I received the AG Pro upgrade kit many months ago. I used my own D1 Mini, SGP30 and PMS sensor, although I didn’t actually migrate the hardware from my original DIY kit. I haven’t bought another S8 CO2 sensor yet.

I use ESPHome and noticed the D1 Mini was rebooting very regularly, often enough that I couldn’t get a baseline for the SGP30. I tried running the D1 and SGP30 on a breadboard and it was perfectly stable, so I thought maybe it was the AG Pro board, but ultimately it seems to be related to the PMS sensor. If I unplug the PMS cable going to the board, everything is stable with all of the rest of the components connected.

I’ve been meaning to try switching to the official AirGradient sketch, but I’ve been too lazy to get around to it, but it sounds like the others in this thread may be having a similar issue even with it.

For what it is worth, here is a graph of the uptime of the D1 Mini over the last 24 hours. Every time the graph spikes down, it has rebooted.

Since I started running ESPHome, I’ve had the D1 Mini reboot once in a while. It was way worse when I had more screen functions running, which affects memory usage. I also disable the screen update after 5 minutes so the I2C bus is less busy. So I presume there could be more than one reason the ESP crashes/reboots. My unit also has the SGP30, SHT30, S8 CO2 and PMS.
My uptime can be more than a day. But eventually it will reboot.

One of the things I want to try, but haven’t gotten around to yet, is putting a Wemos C3 in it, which is an ESP32.

My code compiled with ESP8266 v3.0.0 yesterday is approaching 24 hours of uptime now, which is a drastic improvement from just a few minutes with the latest (v3.1.1).

@MallocArray & @Hendrik : I don’t know much about ESPHome, but does the code use the ESP8266 Arduino core?

Quick update:

The board uptime surpassed the 24h mark.

I’m going to call this “good enough” given the time scale I need to work with to make meaningful progress in debugging this thing.

I moved back to Arduino IDE v2.0.3, opened the exact same sketch, upgraded ESP8266 to the latest (v3.1.1) in Board Manager, re-compiled, and… it crashed with an exception within about 1 minute. I let it run for another 5 minutes and it crashed again.

I then reverted ESP8266 down to v3.0.0 in Board Manager, re-compiled, and so far it’s running for over 7 minutes with no crashing. I guess if it doesn’t crash, I’m going to let it run for at least 12-24h again. It’s a slow process, but I want to isolate this first before going in too many tangential directions.

ESPHome is its own thing. I’m not sure what exactly it does in the background, but what we see is not related to Arduino code at all. They use their own YAML format to configure sensors and displays.

If I unplug the PMS sensor, it seems perfectly stable, but I haven’t tried disabling the OLED and leaving the PMS connected to see if it might be memory related. I might try that soon.

I’m not so sure… From https://esphome.io/components/esp8266.html, it seems like it’s all going back to https://github.com/esp8266/Arduino, which is the same Arduino core used by PlatformIO and Arduino IDE.

I’m no SW expert by any stretch, but looking deeper, I see these releases:

ESP8266 Arduino Core:

  • v3.1.1 (2023-01-14) <== CRASHES
  • v3.1.0 (2023-01-06)
  • v3.0.2 (2021-07-26)
  • v3.0.1 (2021-06-26)
  • v3.0.0 (2021-05-15) <== WORKS

ESPHome uses PlatformIO. I just compiled on PlatformIO a few days ago and it crashed. PlatformIO’s latest platform release is v4.1.0:

PlatformIO platform-espressif8266:

  • v4.1.0 (2023-01-16), Arduino Core v3.1.0 <== CRASHES
  • v4.0.1 (2022-01-01), Arduino Core v3.0.2
  • v4.0.0 (2022-05-31), Arduino Core v3.0.2
  • v3.2.0 (2021-08-13), Arduino Core v3.0.2

To me, it seems like the Arduino Core update was VERY recent. Because of this, I became impatient in my testing. My Arduino Core v3.0.0 test ran for 3 hours with no crashes, which is a drastic difference from crashing within 1 to 5 minutes on v3.1.1. I’m fairly confident it will behave the same as yesterday’s 24h test, so I cut it short and decided to narrow down the change between v3.0.2 from 2021 and v3.1.0 from this year.

I’ve installed v3.0.2, recompiled, and the board is running now. 17 minutes and counting… My assumption is this will work fine (I’ll give it a few hours), then I will move to v3.1.0 and confirm it’s the recent .1 release that is the significant factor here.
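The narrowing-down plan above is a bisection over an ordered release list, with a known-good version at one end and a known-bad one at the other. Here is a minimal sketch in plain C++. The `crashes` callback is a stand-in for “flash this core version and watch it for a few hours”, so `firstBadIndex` is purely an illustration of the search order, not a tool that exists anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Returns the index of the first version that crashes, assuming the list is
// ordered oldest to newest, versions.front() is known good and
// versions.back() is known bad. Only the versions in between get tested.
std::size_t firstBadIndex(const std::vector<std::string>& versions,
                          const std::function<bool(const std::string&)>& crashes) {
    assert(versions.size() >= 2);
    std::size_t good = 0;
    std::size_t bad = versions.size() - 1;
    while (bad - good > 1) {
        const std::size_t mid = good + (bad - good) / 2;
        if (crashes(versions[mid])) {
            bad = mid;   // first bad version is at mid or earlier
        } else {
            good = mid;  // first bad version is after mid
        }
    }
    return bad;
}
```

With the five core releases listed earlier, this tests v3.0.2 first and then v3.1.0, which is the same order as the plan above. Which release actually turns out to be the first bad one is still an open question at this point in the thread.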

After that, I will move into figuring out how to decode the exception to find the culprit of the illegal operation. Might need some help there, since I’m not quite sure how to approach it at this moment. Wish I had a second board and set of sensors on-hand so I could do some of this in parallel.