Yes, the hardware serial port will be 100% rock solid.
Yay! I’m so glad that even if I never get to the bottom of this, at least we have several effective workarounds. And I don’t think you give up anything with any of them. As you can see below, you can have the latest Arduino core if you swap in the previous-to-last version of SoftwareSerial. Or you can have everything if you build it with the debug flag. At least, that’s what my limited testing so far (~12-24h) shows. We’ll have to run for weeks to see if that holds up.
Keep in mind that my makeshift uptime display/print-out is crude and is just based on the internal Arduino millis() function, which returns an unsigned long (32 bits) that counts up the milliseconds from the beginning of time (when it boots). When it reaches its maximum value of 0xFFFF_FFFF, it’s going to roll over to 0x0000_0000. That takes exactly 2^32 milliseconds = ~49.7102696 days (not counting clock accuracy tolerances and drift). I guess we can write a bit of code to detect the overflow condition and count the rollovers in a 16-bit unsigned counter (e.g. a uint16_t, since a plain unsigned int is not necessarily 16 bits wide), which would give us a limit of 2^(32+16) milliseconds = ~8,919.59429 years.
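The rollover counting described above could look roughly like the sketch below. This is only a minimal illustration, not code from the actual project: `UptimeTracker`, `update()`, and `uptimeDays()` are hypothetical names, and the current `millis()` value is passed in as a parameter so the logic can also be compiled and checked off-target. It assumes `update()` is called at least once per ~49.7-day period, otherwise a wrap would go unnoticed.

```cpp
#include <stdint.h>

// Hypothetical helper (not part of the Arduino core): extends the 32-bit
// millis() counter by counting its rollovers in a 16-bit variable.
struct UptimeTracker {
    uint32_t lastMillis = 0;  // most recent millis() reading we have seen
    uint16_t rollovers = 0;   // how many times millis() has wrapped to zero

    // Feed in the current millis() value; returns total uptime in ms.
    // A reading smaller than the previous one means the counter wrapped.
    uint64_t update(uint32_t nowMillis) {
        if (nowMillis < lastMillis) {
            ++rollovers;
        }
        lastMillis = nowMillis;
        // Upper 16 bits: rollover count; lower 32 bits: current millis().
        return (static_cast<uint64_t>(rollovers) << 32) | nowMillis;
    }
};

// Whole days contained in an uptime given in milliseconds.
inline uint32_t uptimeDays(uint64_t uptimeMs) {
    return static_cast<uint32_t>(uptimeMs / 86400000ULL);
}
```

In a sketch, one global `UptimeTracker` instance would be fed with `millis()` from `loop()`, and the returned 64-bit value used for the uptime print-out instead of the raw `millis()` reading.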
Update for the day:
Core v3.1.1 with the older version of SoftwareSerial (v6.12.7, included in Core v3.0.x) was up for 14+ hours. I then moved on to test all the combinations of Core and SoftwareSerial, up to 8 hours each for the passing ones. Even with 13 releases between them, I took a chance and skipped all the way to the last 6.x.x release, which happens to be the previous-to-current release. This validated my hunch that it is major version 7.0.0 of SoftwareSerial that is making it crash. Here’s a handy chart of where we’ve been and where we are:
Test Configuration | Default? | Result |
---|---|---|
Core v3.1.1 + SoftwareSerial v7.0.0 | [X] | Exception 0 |
Core v3.1.1 + SoftwareSerial v6.17.1 | – | Working |
Core v3.1.1 + SoftwareSerial v6.12.7 | – | Working |
Core v3.0.2 + SoftwareSerial v7.0.0 | – | Exception 0 |
Core v3.0.2 + SoftwareSerial v6.17.1 | – | Working |
Core v3.0.2 + SoftwareSerial v6.12.7 | [X] | Working |
Core v3.1.1 + build_type = release | [X] | Exception 0 |
Core v3.1.1 + build_type = debug | – | Working |
Core v3.1.1 | | Exception 0 |
Core v3.0.2 | | Working |
Core v3.0.0 | | Working |
Next steps? Well, the Core contributors had some hunches about what could be causing the problem (related to ISRs not being in IRAM), so I’ve captured a set of ELF files that may help tell whether that is the case. There’s also a preprocessor macro that was suggested, which could hopefully make the stack dump more informative and give us a hint as to which part of the code called the circular_queue::available() function that is resulting in an Exception 0.
I may also reach out to the SoftwareSerial contributors via an issue report to see if they could make something out of it.