- Network disconnecting: I will go to load a new web page and it fails to load.
- Fail to resume after hibernation: previously resume has been very reliable.
The network problem is fairly new but difficult to pin down. Over the last week or so I have been connecting through Linewize, so obviously there is a question about whether that is responsible. The other computers seem to have no difficulty with it, and it doesn't look like the physical network interface has actually disconnected from the network. Considering that last week I used the media computer to download a large volume of YouTube stuff with no real issues, I can't understand why MainPC keeps losing web browser connections. I should run some terminal commands while the issue is happening to see, for example, what nmcli reports or whether pings get through.
I have a wireless bridge to a wireless network next door, and the bridge also acts as a router and does NAT as well. So everything from here appears at one network address as far as the wireless system next door and its Linewize system is concerned.
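As a rough sketch of the nmcli and ping checks mentioned above, something like the following could be run the moment a page fails to load. The function name net_snapshot and both addresses passed to it are placeholders of mine, not details from this setup: the first argument is meant to be the wireless bridge's own address and the second any host beyond it.

```shell
#!/bin/sh
# Sketch: print a timestamped snapshot of network state.
# Usage: net_snapshot GATEWAY REMOTE
#   GATEWAY - address of the local bridge/router (placeholder, supply your own)
#   REMOTE  - any address beyond the bridge (placeholder, supply your own)
net_snapshot() {
    gateway="$1"
    remote="$2"
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
    # Show what NetworkManager thinks each interface is doing.
    if command -v nmcli >/dev/null 2>&1; then
        nmcli device status 2>&1 || echo "nmcli failed"
    else
        echo "nmcli not found"
    fi
    # Ping the local bridge first, then something beyond it.
    for host in "$gateway" "$remote"; do
        if ping -c 2 -W 2 "$host" >/dev/null 2>&1; then
            echo "ping $host: ok"
        else
            echo "ping $host: FAILED"
        fi
    done
}
```

Appending the output to a file each time (for example net_snapshot 192.168.1.1 8.8.8.8 >> ~/net-debug.log, with your own addresses) gives something to compare across failures. If the bridge answers but the remote host does not, that would point beyond this computer rather than at its network interface, though that is only a first guess.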
The hibernation problem has been happening for a fair bit longer. The system simply freezes after completing the restore: the screens are blank and the keyboard Num Lock light can't be toggled, indicating it has frozen. It does not happen every time the system is resumed, but about every week or two.
In both cases upgrading the system with the latest software packages for everything has not resolved the problems. Whilst hibernation can be problematic at times it has been a lot more reliable before now.
After the same thing happened this morning the kernel log is full of messages like this. These messages keep repeating until the point where I pulled the plug on the computer as all I could see was a blinking text-mode cursor in the upper left corner of one of the screens:
Apr 5 09:46:19 MainPC kernel: [ 8240.834662] NVRM: API mismatch: the client has the version 375.39, but
Apr 5 09:46:19 MainPC kernel: [ 8240.834662] NVRM: this kernel module has the version 367.57. Please
Apr 5 09:46:19 MainPC kernel: [ 8240.834662] NVRM: make sure that this kernel module and all NVIDIA driver
Apr 5 09:46:19 MainPC kernel: [ 8240.834662] NVRM: components have the same version.
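To pull the two version numbers out of messages like these without reading through the whole log, a small filter along the following lines could be used. This is only a sketch: the function name nvrm_mismatch is made up, and it assumes the messages are worded exactly as in the lines above.

```shell
#!/bin/sh
# Sketch: read kernel log text on standard input and print the userspace
# ("client") and kernel-module versions named in NVRM API mismatch messages.
# Assumes the message wording shown in the log excerpt above.
nvrm_mismatch() {
    awk '
        # Return the field that follows the word "version" on this line.
        function after_version(   i) {
            for (i = 1; i < NF; i++)
                if ($i == "version") return $(i + 1)
            return ""
        }
        /NVRM: API mismatch/       { client = after_version(); sub(/[,.]$/, "", client) }
        /NVRM: this kernel module/ { module = after_version(); sub(/[,.]$/, "", module) }
        END {
            if (client != "" && module != "")
                print "client=" client " module=" module
        }
    '
}
```

It could be fed from the kernel log, for example grep NVRM /var/log/kern.log | nvrm_mismatch. While the driver is loaded, cat /proc/driver/nvidia/version shows the running module's version, and dpkg -l | grep nvidia lists the installed packages, which is another way to spot the same mismatch.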
I should have tried Ctrl-Alt-F1 to see if I could log in at a text console. I did try a lot of key combinations, possibly including that one, so it seems the system had locked out the keyboard at that point.
It therefore looks like the issue is the graphics card not initialising, and hibernation may be a red herring. I use hibernation all the time, and this seems to happen randomly on some boots but not all of them. However, it may still be linked to hibernation.
First thing to try is remove and reinstall the NVidia driver:
sudo apt purge 'nvidia-*'
sudo apt install nvidia-375
then restart
On restarting it says "System problem" with an option to report, and after clicking the button it shows another message: "Sorry Ubuntu 16.04 has experienced an internal error." The error turns out to be in xserver-xorg-core 2:1.18.4-0ubuntu0.2 and the title is "Xorg crashed with SIGABRT in OsAbort()".
The time given corresponded to the time when the kernel messages were being reported as above and the exact timestamp corresponded with the first appearance of the message in the kernel log, for that boot sequence. On the current boot those NVRM messages are not present.
We shall see if this addresses the problem. It must have persisted through at least two system upgrades, but I have not seen a similar issue on the MediaPC, which is the only other Linux system I regularly hibernate, although it has a different model of NVidia graphics card.
However I did find this referred to elsewhere on the web, where it was also noted it was not possible to open a terminal and the outcomes appeared to be the same - the X Windows server could not start so the GUI would not run. So it would seem this indeed is the issue.
UPDATE: Resume failed again a week later with no obvious cause, just as not all of the previous resume failures left anything in the logs. After seeing blank screens I pressed Ctrl-Alt-F1, logged in in text mode and had a look through a few logs, but found nothing obvious. I then typed startx and the GUI started, but none of the previous windows were restored. Will keep testing.
UPDATE 2017-04-28: Still seeing this issue and unsure what is happening. The whole session gets restored from hibernation but the desktop is not loaded. You can start a terminal and type startx which gets you a new blank desktop whilst the applications for the restored desktop are all running but that desktop is not visible. So apps that will only run one instance cannot be started because they are running in the invisible restored desktop, and when this happens the only thing that can be done is to pull the plug.
UPDATE 2017-04-31: The system was stable for about a month and then there was a random instance that it failed to restore from hibernation. The difference this time was that the system first of all had a problem mounting a disk, and came up in emergency mode. After resolving the mount issue, there was then an instance of the blank screen after resuming with the funny x shaped pointer and nothing else displayed, which is what has happened in some of the other failed instances.
The main difference is now I have learned about journalctl (a systemd component) that will give me all the kernel logs. With a configuration change this is now storing more than just the most recent boot. I am now waiting for another resume failure (there have been several lately) to help me diagnose the problem. There is a pile of updates waiting to be installed so I am putting those in as well.
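I did not note above exactly which configuration change keeps logs from earlier boots. One common way, assumed here, is setting Storage=persistent in /etc/systemd/journald.conf. The sketch below reports what a given config file sets; the function name journald_storage is made up, and it ignores any drop-in files under journald.conf.d.

```shell
#!/bin/sh
# Sketch: print the Storage= value from a journald config file.
# Usage: journald_storage [CONFIG_FILE]
# Defaults to the standard journald.conf path; drop-in directories are ignored.
journald_storage() {
    conf="${1:-/etc/systemd/journald.conf}"
    # Take the last uncommented Storage= line, if there is one.
    value=$(sed -n 's/^[[:space:]]*Storage=[[:space:]]*//p' "$conf" 2>/dev/null | tail -n 1)
    # journald falls back to "auto" when nothing is set, which only keeps
    # logs across boots if /var/log/journal already exists.
    echo "${value:-auto}"
}
```

Once logs are being kept, journalctl --list-boots shows which boots are available and journalctl -k -b -1 shows the kernel messages from the previous boot, which is the one that matters after a failed resume and power cycle.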
UPDATE 2017-06-30: After two months of no problems I installed some updates that required a restart, then the system hibernated (not shut down), and on resume I had this problem again. Once I power-cycled it, it came up with the system error dialog saying it wanted to send an error report. It lists the following bug: https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1543192
In this case the screen just showed the blinking cursor but did not allow me to go into a terminal.
I have debug mode set in the Grub command parameters and I could poke through the logs with journalctl but right now I can't be bothered.
I keep seeing the network problem with no obvious cause and have dug around without figuring out what is happening. On the other Linux PCs this never happens, so it may be specific to the chipset in this computer.
UPDATE 2017-07-08: I have had a situation where the system repeatedly failed to boot after I had connected a TV as a third screen (it normally has just two). The TV was powered on, but in the xrandr settings its "Use this screen" box was unticked, and the setting to configure new screens when they are connected was also off. On each reboot (two so far) the system failed to resume from hibernation, in that I was left with a blank screen. It would seem this setting causes problems if you use it, and in this respect the software is well behind Windows, which copes pretty well with screens being missing at startup. To resolve this I disconnected the TV's cable, and I will only plug it in when I need it. It seems the software can't handle a screen that it can detect but has been told not to use. However, I did not get the Xorg server bug error messages popping up when I restarted in this instance.
The problem persisted from this point, and only by changing a setting in the display settings could I get a reliable resume. This has been reported to Ubuntu Bugs.
https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/1703098
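For the 2017-07-08 case, it can help to see from a terminal which outputs X has detected and which of them are actually in use. The filter below is a rough sketch that reads the text printed by xrandr --query; the function name xrandr_outputs is made up, and it assumes the usual output layout in which an active output's line carries a geometry such as 1920x1080+0+0.

```shell
#!/bin/sh
# Sketch: read `xrandr --query` output on standard input and print each
# connected output with "active" (has a geometry) or "disabled" (detected
# but not in use). Assumes the usual xrandr text layout.
xrandr_outputs() {
    awk '$2 == "connected" {
        state = "disabled"
        for (i = 3; i <= NF; i++)
            if ($i ~ /^[0-9]+x[0-9]+\+[0-9]+\+[0-9]+$/) state = "active"
        print $1, state
    }'
}
```

Run as xrandr --query | xrandr_outputs from within the X session. An output reported as disabled would correspond to the TV situation described above: detected, but told not to be used.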