@MallocArray I can reproduce the same issue.
I made a workaround to get Arduino core 3.1.x compiling with ESPHome, just to test whether it helps with the crashes. I'm not responsible for any issues on your devices, so use it at your own risk. I removed some bits to work around the softAP changes in the wifi component, which I don't use.
```yaml
esphome:
  name: "${devicename}"
  libraries:
    - uart=https://github.com/plerup/espsoftwareserial.git#8.0.1

external_components:
  - source: github://eavyon/esphome@dev
    components: [ wifi ]
    refresh: 0s

esp8266:
  board: d1_mini
  framework:
    version: 3.1.2
```
Good news. Last night I re-compiled my custom AirGradient (non-ESPHome) firmware using the new ESP8266 Arduino core v3.1.2, which includes SoftwareSerial 8.0.1. It's been running for more than 15 hours now with no exception crashes or any other issues, so I'm reasonably confident the issue is resolved.
Clocking in at 66 hours now, one of my highest uptimes. Let's continue with this version for now.
I'm not sure whether ESPHome will adopt 3.1.2 as the recommended version soon, so maybe some testing could be done with only the new espsoftwareserial 8.0.1 on 3.0.2. If that also works, then maybe we should request it for the ESPHome pmsx003 code.
I'm sad to report it just crashed after 71 hours.
Do you know what kind of crash? Exception 0? Mine is at 2d 16h 49m, so almost 65 hours.
How nice to read all your efforts.
We are also mixing up a few things at the same time: e.g. we discuss AirGradient's own software and ESPHome in the same place, and we are mixing different versions/configurations. So we have to be careful not to draw the wrong conclusions in the end.
Personally, I use esphome with graphs and mqtt. I have posted my configuration here: Esphome with graphs - #5 by argafal
I have two problems:
- Wifi re-connection frequently fails. I have found reports of other ESPHome projects with similar symptoms where the cause was i2c timing. This may or may not be the reason for my particular issue; it's hard to narrow down. I currently have to adjust the i2c settings for my AirGradient board (v3.3) to get it working, depending on which MCU I use (D1 or C3). I'll be curious to see how the v4 prototype performs.
- In addition, I have random reboots with an "Out of Memory" exception. I believe these are caused by heap fragmentation; values are around 30-40%. I have added the following debug sensors to my ESPHome YAML to understand the problem better:
```yaml
sensor:
  - platform: debug
    free:
      name: "Heap Free"
    fragmentation:
      name: "Heap Fragmentation"
    block:
      name: "Heap Max Block"
    loop_time:
      name: "Loop Time"
```
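For reading the "Heap Fragmentation" value, here is a minimal Python sketch of the metric, assuming the debug sensor reports the ESP8266 Arduino core's number. The core's source documents that number as 100 × (1 − √Σsize² / Σsize) over the free heap blocks. The block sizes below are hypothetical, because the device only reports the aggregate.

```python
import math


def heap_fragmentation(free_blocks: list[int]) -> float:
    """Fragmentation in percent over a list of free block sizes (bytes).

    0 means all free memory is one contiguous block; values approach 100
    as the free memory is split into many small holes.
    """
    total = sum(free_blocks)
    if total == 0:
        return 0.0
    # L2 (Euclidean) norm of the hole sizes, divided by their plain sum.
    l2_norm = math.sqrt(sum(size * size for size in free_blocks))
    return 100.0 - (l2_norm * 100.0) / total


# Same 4 KB free in both cases, very different usable layout.
print(heap_fragmentation([4096]))      # one block
print(heap_fragmentation([1024] * 4))  # four equal holes
```

The metric alone does not say whether a given allocation fits. The "Heap Max Block" sensor is the one that answers that question.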
@Hendrik Would you be able to catch the exact exception you are getting? Is it the same for both of us? And might it be worth starting to record heap (fragmentation) values, too?
I'm aware that ESPHome and Arduino can behave differently, but they do use the same libraries, which is most probably where the error occurs.
@ken830 @argafal
I will try to connect the D1 mini to a PC to catch any errors. I've had the debug sensors enabled for a long time, but in the end I couldn't find any correlation with my crashes. The only thing that definitely caused immediate crashes was drawing multiple graphs, because of memory limitations, so I removed all graphs.
@argafal
It seems like the i2c frequency has something to do with the wifi (re)connection issues. Mine didn't even connect when i2c was at 50 kHz or below. My feeling is that too low a frequency causes too much wait time, so the wifi process times out, especially when more sensors are on the bus and the cumulative wait time builds up. Higher speeds (100 kHz) sorted that one out for me.
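This Python sketch puts the frequency difference into numbers. It estimates only the raw time on the wire, using the fact that each I2C byte (address or data) takes 9 SCL cycles: 8 bits plus the ACK. It ignores start/stop conditions, clock stretching, retries and sensor conversion delays, so it neither confirms nor rules out the timeout theory. The per-poll byte count is hypothetical.

```python
def i2c_transfer_ms(num_bytes: int, clock_hz: int) -> float:
    """Raw bus time in milliseconds for num_bytes at the given SCL clock.

    Each byte is 8 data bits plus 1 ACK/NACK bit = 9 clock cycles.
    Start/stop conditions, clock stretching and sensor delays are ignored.
    """
    return num_bytes * 9 * 1000 / clock_hz


# Hypothetical: 20 bytes moved per polling round across all sensors.
for freq in (50_000, 100_000, 400_000):
    print(f"{freq // 1000} kHz: {i2c_transfer_ms(20, freq):.2f} ms")
```

Halving the clock doubles the raw bus time. Any larger waits seen in practice would come from the parts this sketch leaves out.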
I just had another reset and caught only the reset cause.
ets Jan 8 2013,rst cause:4, boot mode:(3,6)
This is an interesting one because it's a hardware reset and I have no stack trace. It could be a one-off. I still have memory pressure, with the max free heap block sometimes as low as 200 bytes while 3k of heap is free. Does anyone else see that? EDIT: I removed the web server, which "only" doubled the free space, but the max block size went up to 4k. That makes it far easier for ESPHome to do its thing, like generating JSON for the API.
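Why "3k free" can still fail: a single allocation needs one contiguous free block at least as large as the request. This is a minimal Python sketch of that check with hypothetical block sizes. It ignores the allocator's per-block header overhead, so real requests need slightly more than shown.

```python
def can_allocate(request: int, free_blocks: list[int]) -> bool:
    """True if some single free block can hold `request` bytes.

    The total amount of free heap is irrelevant; only contiguous
    blocks count. Allocator header overhead is ignored here.
    """
    return any(block >= request for block in free_blocks)


fragmented = [200] * 15   # hypothetical: 3000 bytes free, max block 200
compact = [4096]          # hypothetical: one 4 KB block

print(sum(fragmented), can_allocate(1024, fragmented))  # plenty free, but no 1 KB hole
print(can_allocate(1024, compact))
```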
Also, I found out that with the ESP connected directly by serial to the ESPHome Docker container and the logs open, ESPHome decodes the stack trace. So I keep it connected for any new crashes.
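For a dump captured some other way, a rough Python helper can pull out code-looking addresses and build an `addr2line` call for them. It assumes the ESP8266 GCC toolchain's `xtensa-lx106-elf-addr2line` binary and treats values in IRAM (0x401xxxxx) or mapped flash (0x402xxxxx) as code addresses. `firmware.elf` is a hypothetical path: the ELF file must come from the exact build that crashed.

```python
import re

# Values in IRAM (0x401xxxxx) or mapped flash (0x402xxxxx), with or without "0x".
# Stack dump lines print raw words like "40201abc"; reset info uses "0x40103b35".
_CODE_ADDR = re.compile(r"\b(?:0x)?(40[12][0-9a-fA-F]{5})\b")


def addr2line_command(dump: str, elf_path: str) -> list[str]:
    """Build an addr2line argument list for every unique code address in `dump`."""
    addrs = ["0x" + a.lower() for a in _CODE_ADDR.findall(dump)]
    unique = list(dict.fromkeys(addrs))  # de-duplicate, keep first-seen order
    # -p pretty-print, -f function names, -i inlined frames, -a print address, -C demangle
    return ["xtensa-lx106-elf-addr2line", "-pfiaC", "-e", elf_path, *unique]


dump = "epc1:0x40103b35 epc2:0x00000000\n3ffffe50:  40201abc 3fff0000 40201abc"
print(addr2line_command(dump, "firmware.elf"))
# Run it with e.g. subprocess.run(cmd, check=True) if the toolchain is on PATH.
```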
Definitely either an ESPHome-caused crash or your specific hardware (power?), because with the new libraries, mine has been running non-stop for 5 days, 2 hours, and 59 minutes.
At this moment I'm at a loss. I can only get hardware resets as described before, so there is no stack trace to debug.
Common causes for this are a bad power supply, the wrong wifi library being called, or wrong pin numbers in the config. The power has changed from a USB adapter to a USB port on a computer, and the latter two are more or less fixed by ESPHome. The power rail already has added capacitors for stability, so there's not much more I can do.
Does anyone else running ESPHome also get rst cause:4 (hardware reset) with the latest versions?
With the following configuration:
```yaml
esphome:
  name: "${devicename}"
  libraries:
    - uart=https://github.com/plerup/espsoftwareserial.git#8.0.1

esp8266:
  board: d1_mini

text_sensor:
  - platform: debug
    device:
      name: "Device Info"
    reset_reason:
      name: "Reset Reason"

sensor:
  - platform: debug
    free:
      name: "Heap Free"
    fragmentation:
      name: "Heap Fragmentation"
    block:
      name: "Heap Max Block"
    loop_time:
      name: "Loop Time"
```
This morning I got an unexpected reset with reason "Hardware Watchdog":
Device Info changed to 2023.3.2|Flash: 4096kB Speed:40MHz Mode:DOUT|Chip: 0x008a6cec|SDK: 2.2.2-dev(38a443e)|Core: 3.0.2|Boot: 31|Mode: 1|CPU: 80|Flash: 0x0016405e|Reset: Hardware Watchdog|Fatal exception:4 flag:1 (Hardware Watchdog) epc1:0x40103b35 epc2:0x00000000 epc3:0x00000
At that time, there was also a big spike in:
- heap free
- heap max block
- loop time

Hope it helps.
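This Python sketch splits the reset part of that "Device Info" string into fields. It assumes the text follows the ESP8266 Arduino core's reset-info format (`Fatal exception:<exccause> flag:<reason> (...) epc1:...`). In that format, `flag` is the SDK's rst_info reason code (1 = hardware watchdog) and the number after "Fatal exception" is the exception cause register, which is not the ROM's "rst cause" numbering. The epc values are code addresses that can go into addr2line, but after a hardware watchdog reset they are not guaranteed to point at the code that hung.

```python
import re

# rst_info.reason codes from the ESP8266 SDK, as reported by the Arduino core.
RESET_REASONS = {
    0: "power on",
    1: "hardware watchdog",
    2: "exception",
    3: "software watchdog",
    4: "software restart",
    5: "deep-sleep wake",
    6: "external reset",
}


def parse_reset_info(info: str) -> dict:
    """Extract reason text, exception cause, flag and epc*/excvaddr/depc registers."""
    fields: dict = {}
    reason = re.search(r"Reset: ([^|]+)", info)
    if reason:
        fields["reason"] = reason.group(1).strip()
    fatal = re.search(r"Fatal exception:(\d+) flag:(\d+)", info)
    if fatal:
        fields["exccause"] = int(fatal.group(1))
        fields["flag"] = int(fatal.group(2))
        fields["flag_meaning"] = RESET_REASONS.get(fields["flag"], "unknown")
    # Register dumps; a truncated value like "0x00000" still parses as an integer.
    for name, value in re.findall(r"\b(epc[123]|excvaddr|depc):0x([0-9a-fA-F]+)", info):
        fields[name] = int(value, 16)
    return fields


info = ("Core: 3.0.2|Reset: Hardware Watchdog|Fatal exception:4 flag:1 "
        "(Hardware Watchdog) epc1:0x40103b35 epc2:0x00000000 epc3:0x00000")
print(parse_reset_info(info))
```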
If I had more time, I would look into ESPHome… but for now, I took a quick look through the documentation: according to Espressif, the ROM prints a reset cause during power-on. Reset cause 4 is the watchdog timer, so both of you are probably seeing the same basic reset cause.
If the user program (the ESPHome firmware in this case) supports it, it can also retrieve the reset cause information:
These tables were pulled from: https://www.espressif.com/sites/default/files/documentation/esp8266_reset_causes_and_common_fatal_exception_causes_en.pdf
Unfortunately, a watchdog timer expiration doesn't tell you what went wrong, but it does mean the SoC was busy doing something for so long that it didn't get a chance to reset the WDT, for example in an infinite loop.
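This Python sketch decodes the ROM boot line posted earlier (`ets Jan 8 2013,rst cause:4, boot mode:(3,6)`). It uses the rst cause values from that Espressif document: 0 undefined, 1 power reboot, 2 external reset or deep-sleep wake, 4 hardware WDT. It also decodes the two common first boot-mode values (1 = UART download, 3 = flash boot). Any other value is reported as unknown rather than guessed, and the second boot-mode number is ignored.

```python
import re

ROM_RST_CAUSE = {
    0: "undefined",
    1: "power reboot",
    2: "external reset or deep-sleep wake",
    4: "hardware watchdog reset",
}
BOOT_MODE = {1: "UART download", 3: "flash boot"}


def decode_boot_line(line: str):
    """Return decoded rst cause / boot mode, or None if the line doesn't match."""
    m = re.search(r"rst cause:(\d+), boot mode:\((\d+),(\d+)\)", line)
    if m is None:
        return None
    cause, mode = int(m.group(1)), int(m.group(2))
    return {
        "rst_cause": ROM_RST_CAUSE.get(cause, f"unknown ({cause})"),
        "boot_mode": BOOT_MODE.get(mode, f"other ({mode})"),
    }


print(decode_boot_line("ets Jan 8 2013,rst cause:4, boot mode:(3,6)"))
```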
I believe the Arduino core library has support for a software watchdog, which can be set to expire earlier than the hardware watchdog. If the software watchdog expires, you will get a stack dump on the terminal that you can put into the decoder to pinpoint exactly which part of the code is stuck.
@Marco Thanks. I see the same: the heap decreases substantially, but in my opinion it's not running out. There's just high fragmentation, which could cause a slowdown. But debugging this is above my head. I'm trying different versions of Arduino and SoftwareSerial to see if they reproduce the same errors.
@ken830 I found that ESPHome feeds the hardware watchdog (a wdt feed) in different parts of the code. If more than about 6 seconds pass without a feed, the watchdog can kick in, and that is what happens to us. Why is really hard to tell, because many components could be stuck in a long loop (without some form of yield()) exceeding those 6 seconds. It could also be related to the high heap fragmentation slowing processes down. ESPHome does make use of the watchdogs. At this point it doesn't really help to make general statements about Arduino/ESP8266, because this only happens in ESPHome, since you no longer experience problems. And it's not that simple to adjust the watchdog settings. It would be better not to touch them at all and just prevent the watchdog from kicking in.
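The watchdog argument above can be put as a toy model. What matters is the longest stretch without a feed, not the total amount of work. This Python sketch assumes components run back to back and the watchdog is fed either after each component or only once per loop iteration. The durations are hypothetical, the 6 s timeout is the figure quoted in this thread, and this is not how ESPHome's scheduler is actually implemented.

```python
def longest_gap_ms(durations_ms: list[int], feed_after_each_component: bool) -> int:
    """Longest unfed stretch in one loop iteration under the toy model."""
    if feed_after_each_component:
        return max(durations_ms, default=0)
    return sum(durations_ms)  # fed only once, at the end of the loop


def watchdog_fires(durations_ms: list[int], timeout_ms: int,
                   feed_after_each_component: bool) -> bool:
    return longest_gap_ms(durations_ms, feed_after_each_component) > timeout_ms


components = [2000, 2500, 2000]  # hypothetical per-component run times (ms)
print(watchdog_fires(components, 6000, feed_after_each_component=False))
print(watchdog_fires(components, 6000, feed_after_each_component=True))
print(watchdog_fires([7000], 6000, feed_after_each_component=True))
```

The last case shows that more frequent feeding cannot help when a single component blocks longer than the timeout. In that situation only a yield or feed inside that component would.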
If these issues from ESPHome users are all related to the hardware watchdog, it may be better to start a new topic and discuss it there. Personally, I do want to make it work with ESPHome. Resolving this could help many people and make it easier to use AirGradient in home automation systems.
I'll add that I also see errors when reading the UART, which cause data corruption for the SenseAir, for example invalid preambles or checksum mismatches. But this happens infrequently and could happen at any time, not just before a reset. I see no correlation with the hardware resets.
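The Senseair S8 talks Modbus RTU, so a "checksum doesn't match" error most likely means the frame's CRC-16 failed to verify. This is a minimal Python sketch of that check. The expected `FE 04` prefix (0xFE "any sensor" address, function 0x04 read input registers) is an assumption about what S8 frames start with. The example frame is the commonly used S8 CO2 read request, not a captured response.

```python
def modbus_crc16(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def frame_ok(frame: bytes, expected_prefix: bytes = b"\xFE\x04") -> bool:
    """Check the leading address/function bytes and the trailing CRC (low byte first)."""
    if len(frame) < len(expected_prefix) + 2 or not frame.startswith(expected_prefix):
        return False  # the "invalid preamble" case
    crc = modbus_crc16(frame[:-2])
    return frame[-2:] == crc.to_bytes(2, "little")  # the "checksum" case


request = bytes.fromhex("FE0400030001D5C5")
print(frame_ok(request))                               # intact frame
print(frame_ok(bytes.fromhex("FE0400030002D5C5")))    # one corrupted byte
```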
Besides randomly trying various changes and hoping to get lucky, the only real way to find the issue is to get a stack dump. A software WDT will give you a stack dump. If ESPhome is already using a software WDT but weāre still rebooting on a hardware WDT, then perhaps look into places in the code that disables the software WDT? 6 seconds is like 6,000 eternities to the MCU. Have you engaged the ESPhome devs?
Please look into ESPHome before commenting any further. Your investigations earlier helped in understanding a lot of things, but this doesn't work. I don't do random things, but I'm not you. Now that I have a clue that this issue seems resolved in pure Arduino code while ESPHome is still having trouble, the next step is looking in that direction, as I already did.
Gathering more information here can help define the issue for the ESPHome devs. Only saying "my ESP8266 crashes and I have no stack trace" is not really a starting point there. Also, the people who encounter these issues are here and not (yet) on the ESPHome GitHub.
I keep responding to this thread to define the issue better, and maybe others see things I didn't encounter. If you want to help, compile ESPHome and test it just as Marco did.
@argafal What was your exception cause, precisely? I only get the hardware watchdog (no. 4).
@Hendrik Extremely sorry if I came off in a way that offended you. I did not mean to imply or insinuate anything about you or your methods, and I am very surprised by your reaction, but I can understand how it was taken that way. I was just throwing ideas out there based on my limited understanding, and the "random" comment was my way of highlighting that anything besides a stack trace is not going to pinpoint the area of code that causes the lock-up. Again, very sorry. I thought I was being helpful, but I'll keep quiet now.
My entire reason for being here is integrating air quality into my home automation system, so my hope was that these ESPHome issues would be resolved before I find the time to wade into it.
You're fine. Slowly, we have both rooted out issues with the platform so users have a more stable device experience. While ESPHome isn't the first choice for AirGradient, I think it's still useful to find its problems and hopefully resolve them. It will make it more accessible for people with already set-up home automation systems who are not very technical. I'm not in a hurry to resolve everything.
For the last few weeks I've been monitoring my system, and my device now stays up for almost a week. Most restarts are still hardware WDT, which means that something in ESPHome locks up for too long. Today ESPHome 2023.4.2 was released with a fix for i2c, so let's see if that helps.
I also found a way to disable the hardware WDT, and I will try it if the update still has problems: GitHub - epiclabs-uc/esphome-nowatchdog-component: Component to disable watchdog in ESPHome for QEMU debugging. I have not tried it yet, so I don't know if it really works.
The last remaining option is putting in an ESP32 module, but that means rewriting a bit of the config, and who knows what other problems will turn up. An outdoor AirGradient that already contains this module is coming my way soon, though, so I can test in parallel then.
I landed here because my Airgradients are frequently rebooting.
While I don't have a solution, I have observed that removing any unnecessary services helps quite a bit. I've removed the `api:`, `captive_portal:`, and `ota:` stanzas. They still reboot… but 1-5 times per day instead of the 10-20 times a day they did previously.
Changing the log level or the PM sensor update frequency, as mentioned elsewhere, did not help uptime one bit in my case. And bizarrely, changing the log level to `ERROR` or completely disabling logging (by setting `baud_rate: 0` or removing the `logger:` section entirely) renders the CO2 sensor nonfunctional.
For what it's worth, I am using @MallocArray's ESPHome config with some minor tweaks (mostly adding 2 NeoPixels to show AQI and CO2 levels respectively) on DIY AirGradient Pros. (Many thanks @MallocArray for making the config available!)
I hope necromancing this old thread helps someone. I'll probably swap out the 8266s for ESP32-C3 minis when I get the chance instead of further trial and error…
I've also based my config heavily on what @MallocArray made; however, what worked for my setup was to wire my PM sensor directly to the hardware UART pins (RX/TX). I got the idea from this GitHub comment.
I then modified the logger component to use the other, TX-only UART of the ESP8266:
```yaml
logger:
  level: DEBUG
  # baud_rate: 0
  hardware_uart: UART1
  logs:
    pmsx003: INFO
```
Then use the hardware UART pins for the PM sensor:
```yaml
uart:
  # https://esphome.io/components/uart.html#uart
  - rx_pin: D4
    tx_pin: D3
    baud_rate: 9600
    id: senseair_s8_uart
  - rx_pin: GPIO3 # previously D5
    tx_pin: GPIO1 # previously D6
    baud_rate: 9600
    id: pms5003_uart
```
Then solder the wires: the PM sensor's TX goes to RX, and its RX goes to TX.
Checking the logs confirms that the hardware serial interface is being used:
```
[02:32:03][C][uart.arduino_esp8266:102]: UART Bus:
[02:32:03][C][uart.arduino_esp8266:103]: TX Pin: GPIO1
[02:32:03][C][uart.arduino_esp8266:104]: RX Pin: GPIO3
[02:32:03][C][uart.arduino_esp8266:106]: RX Buffer Size: 256
[02:32:03][C][uart.arduino_esp8266:108]: Baud Rate: 9600 baud
[02:32:03][C][uart.arduino_esp8266:109]: Data Bits: 8
[02:32:03][C][uart.arduino_esp8266:110]: Parity: NONE
[02:32:03][C][uart.arduino_esp8266:111]: Stop bits: 1
[02:32:03][C][uart.arduino_esp8266:113]: Using hardware serial interface.
```
This let my monitors run for days with the PM sensor polling continuously, whereas previously I'd be lucky to see one last 8 hours. But I don't really know the downsides of doing this, so let me know if there are any.
Edit: I also tried a different Wemos mini, the S2. In the limited time I had to test, it ran fine for days as well, although I tested it with only a PMSA003, an HC8 CO2 sensor (also on UART), and the 0.66" OLED.
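This Python sketch shows what "polling continuously" means on the wire at the settings in the log above (9600 baud, 8 data bits, no parity, 1 stop bit). Each byte is 10 bits: start, 8 data, stop. The 32-byte frame length comes from the PMS5003 datasheet and is assumed here to apply to the PMSA003 as well. With a hardware UART, those bytes land in the RX buffer (256 bytes per the log). A software UART has to time or sample the bits itself while the frame arrives.

```python
def frame_time_ms(num_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Time a frame occupies the line; 8N1 framing = 10 bits per byte."""
    return num_bytes * bits_per_byte * 1000 / baud


print(f"one byte:           {frame_time_ms(1, 9600):.3f} ms")
print(f"32-byte PMS frame:  {frame_time_ms(32, 9600):.1f} ms")
```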