Opened 2 years ago

Closed 2 years ago

#4889 closed defect (notsugar)

virtual memory leak in background control panel

Reported by: quozl Owned by:
Priority: Normal Milestone: Unspecified
Component: Sugar Version: Unspecified
Severity: Minor Keywords:
Cc: Distribution/OS: OLPC
Bug Status: New

Description

Virtual memory size of the shell increases by 20 Mb for each ten launches of the background control panel.

To reproduce, capture virtual memory size at one second intervals;

while true; do
  echo $(date +%s.%N) \
    $(awk '/VmSize/ { print $2; }' /proc/$(pgrep -f jarabe)/status)
  sleep 1
done | tee memory.log

Then open the background control panel, close it, and repeat ten times. Compare the starting VmSize to the ending VmSize.

A graph to be attached.

Cause might be Fedora 18, Python, libraries, or Sugar.

Attachments (2)

memory.png (8.8 KB) - added by quozl 2 years ago.
XO-1.5 OLPC OS 13.2.5 Sugar 0.106.1 My Settings Background VmSize leak test, eleven cycles, 2015-10-06, by James Cameron
FrameCpanelMemory.png (7.0 KB) - added by SAMdroid 2 years ago.

Download all attachments as: .zip

Change History (12)

Changed 2 years ago by quozl

XO-1.5 OLPC OS 13.2.5 Sugar 0.106.1 My Settings Background VmSize leak test, eleven cycles, 2015-10-06, by James Cameron

comment:1 Changed 2 years ago by quozl

The VmSize leak rate for background control panel show and hide is of the order of 151 kB per second.

A leak occurs with all GTK+ drawing operations done by Sugar. It even reproduces when scrolling the Journal, at a rate of 24 kB per second.

Automated reproducers:

sudo yum install xdotool

# journal scrolling
# prepare by adding journal entries until scroll bar is present
function test {
  xdotool \
  mousemove 1191 801 click 1 sleep 0.2 \
  mousemove 1191 200 click 1 sleep 0.2
}

# background control panel
# prepare by using f3 then position cursor in search box
function test {
  xdotool \
  mousemove 600 450 click 3 sleep 0.7 \
  mousemove 700 660 click 1 sleep 0.7 \
  mousemove 935 270 click 1 sleep 7 \
  key Escape sleep 0.1 \
  key Escape sleep 1
}

# test runner and vmsize logger
while true; do
  echo $(date +%s.%N) \
    $(awk '/VmSize/ { print $2; }' /proc/$(pgrep -f jarabe)/status)
  test
done | tee memory.log

Log file analysis tool:

#!/usr/bin/python
import glob

for name in glob.glob('memory*.log'):
    lines = file(name, 'r').readlines()
    t0 = float(lines[0].split()[0])
    t1 = float(lines[-1].split()[0])
    v0 = float(lines[0].split()[1])
    v1 = float(lines[-1].split()[1])
    print '%.0f | %s' % (((v1 - v0) / (t1 - t0)), name)

comment:2 Changed 2 years ago by quozl

Further testing on XO-1.5 with 1 GB RAM.

According to changes in /proc/$(pgrep -f jarabe)/smaps the area of virtual memory that leaks corresponds to the heap. The virtual memory is also part of the resident set VmRSS, since there is no swap.

Eventually the heap consumes system memory and Jarabe hangs. Before it hangs, it grows slower and slower. The test above had to be changed to give time for Jarabe to respond;

function test {
  xdotool \
  mousemove 600 450 click 3 sleep 30 \
  mousemove 700 660 click 1 sleep 30 \
  mousemove 935 270 click 1 sleep 100 \
  key Escape sleep 1 \
  key Escape sleep 40
}

Next step might be to use a tool to analyse what is in heap.

comment:3 Changed 2 years ago by quozl

On an XO-1,

  • after five uses of the background control panel, free memory is fully consumed, and the user interface response times begins to increase,
  • after ten uses, user interface response times are more than double normal times, and swap space is being used.

A quick fix may be to force a restart after changing background.

comment:4 Changed 2 years ago by SAMdroid

Testing on sugar-build. I used a library python to view leaked objects [1]. I started the SummaryTracker in the show_section_view function like:

diff --git a/src/jarabe/controlpanel/gui.py b/src/jarabe/controlpanel/gui.py
index 442857d..8a51917 100644
--- a/src/jarabe/controlpanel/gui.py
+++ b/src/jarabe/controlpanel/gui.py
@@ -31,6 +31,7 @@ from jarabe.controlpanel.toolbar import MainToolbar
 from jarabe.controlpanel.toolbar import SectionToolbar
 from jarabe import config
 from jarabe.model import shell
+from pympler import tracker
 
 _logger = logging.getLogger('ControlPanel')
 
@@ -193,7 +194,9 @@ class ControlPanel(Gtk.Window):
         if self._section_view is not None:
             self._section_view.destroy()
             self._section_view = None
+            self._tr.print_diff()
 
+        self._tr = tracker.SummaryTracker()
         self._set_toolbar(self._main_toolbar)
         self._main_toolbar.show()
         self._set_canvas(self._scrolledwindow)

Here are the outputs:

                         types |   # objects |   total size
============================== | =========== | ============
                          list |        5343 |    543.05 KB
                           str |        5422 |    335.25 KB
                           int |         627 |     14.70 KB
                          code |          21 |      2.62 KB
                          dict |           1 |      2.52 KB
  <class 'gi.types.GObjectMeta |           2 |      1.77 KB
                         tuple |          12 |      1.14 KB
             method_descriptor |          12 |    864     B
                       weakref |           5 |    440     B
            wrapper_descriptor |           5 |    400     B
               function (undo) |           3 |    360     B
           function (__init__) |           2 |    240     B
             member_descriptor |           3 |    216     B
    function (do_get_property) |           1 |    120     B
    function (do_set_property) |           1 |    120     B
                                                    types |   # objects |   total size
========================================================= | =========== | ============
                                                      set |           1 |    232     B
                                                     code |           1 |    128     B
                                    function (store_info) |           1 |    120     B
                                                     cell |           2 |    112     B
                                                    float |           1 |     24     B
                                                     list |           0 |      8     B
                                                      str |          -1 |    -50     B
                                                    tuple |          -1 |    -56     B
                 <class 'sugar3.graphics.icon._IconBuffer |          -1 |    -64     B
                   <class 'pympler.tracker.SummaryTracker |          -1 |    -64     B
                                                 instance |          -1 |    -72     B
                    <class 'jarabe.model.buddy.BuddyModel |          -1 |    -80     B
                  <class 'jarabe.view.buddyicon.BuddyIcon |          -1 |    -80     B
  <class 'gi.repository.SugarGestures.LongPressController |          -1 |    -80     B
      <class 'sugar3.graphics.palettewindow.CursorInvoker |          -1 |    -80     B
                                types |   # objects |   total size
===================================== | =========== | ============
                                 dict |          36 |     30.84 KB
                                  str |          48 |      8.90 KB
                       instancemethod |          92 |      7.19 KB
  <class 'dbus.connection.SignalMatch |          37 |      6.94 KB
                          dbus.String |          19 |      5.23 KB
                                 list |          29 |      2.59 KB
                                tuple |          38 |      2.38 KB
                                 cell |          40 |      2.19 KB
                 function (signal_cb) |          17 |      1.99 KB
                  function (callback) |          17 |      1.99 KB
      <class 'dbus.bus.NameOwnerWatch |          17 |      1.06 KB
                      dbus.ObjectPath |           6 |    957     B
                                 long |          26 |    728     B
                       dbus.ByteArray |           2 |    662     B
                                  set |           0 |    512     B
                                types |   # objects |   total size
===================================== | =========== | ============
                                  str |          46 |      8.64 KB
  <class 'dbus.connection.SignalMatch |          38 |      7.12 KB
                                 cell |          94 |      5.14 KB
                                 list |          44 |      3.40 KB
                 function (signal_cb) |          19 |      2.23 KB
                  function (callback) |          19 |      2.23 KB
         function (msg_reply_handler) |          15 |      1.76 KB
      <class 'dbus.bus.NameOwnerWatch |          19 |      1.19 KB
                      dbus.ObjectPath |           7 |   1014     B
                  function (error_cb) |           7 |    840     B
                    functools.partial |           6 |    528     B
            dbus.lowlevel.PendingCall |          22 |    528     B
                      dbus.UTF8String |          10 |    417     B
                                 code |           3 |    384     B
             <class 'threading._RLock |           5 |    320     B

At first appearence, most of these seem to come from the neighbour hood view. I need to do a better test, but yeah.

[1] https://pythonhosted.org/Pympler/

comment:5 Changed 2 years ago by quozl

  • thanks, that's interesting. Looking forward to some more detailed tests. How well does the Pympler summary correlate to the VmSize change?
  • some repeated mouse click testing, monitoring VmSize while simulating mouse clicks;
    • Moon activity orientation button is unaffected (switches between southern and northern hemisphere view),
    • Moon activity grid button is unaffected (overlays a grid or not),
    • Moon activity tool bar is affected a tiny bit,
    • StopWatch activity tool bar sharing icon is affected a tiny bit,
  • an F1 F3 key press loop shows only 21 kbyte/sec leak on XO-1, not the 85 kbyte/sec leak on XO-4 found earlier, so either architecture or the number and complexity of activity icons matters,
  • for the record, some earlier discussion of this problem is in pr 520, with an analysis of potential relevant code by Gonzalo,

comment:6 Changed 2 years ago by quozl

  • the leak of virtual memory in the background control panel reproduces on Ubuntu 14.04 on x86_64, and Sugar 0.107.0, at a rate of 145 kbyte/sec, and when graphed is a linear ramp,
  • a stress test of Icon shows no leak, VmSize is stable, (Hello World was used, see the replaced activity.py),
  • increasing the number of icons in the test from 40 to 100 causes the VmSize to ramp up and down like a sawtooth wave as the LRU surface cache is overburdened; in icon.py it is set to 50 surfaces, and because the surfaces are different sizes in the test the result is expected,

comment:7 Changed 2 years ago by SAMdroid

It's not just the background cpanel code causing leaks, it is also the whole cpanel ui.

Look at the test results for the frame cpanel (attached). Opening and closing the frame cpanel using my mouse causes a 25kbyte/sec increase in memory usage. The frame cpanel contains just Gtk+ ranges and labels - no fancy pixbuf loading.

Changed 2 years ago by SAMdroid

comment:8 Changed 2 years ago by quozl

Agreed. However the Background panel consumes the most memory so is easier to test with. A mine canary.

I was starting to worry that it might be the unusual way that control panel section views are called, in show_section_view() in src/jarabe/controlpanel/gui.py:

        mod = __import__('.'.join(('cpsection', option, 'view')),
                         globals(), locals(), ['view'])
        view_class = getattr(mod, self._options[option]['view'], None)

But still a 175 kbyte/sec leak occurs if the background panel is modified to rescan and reload the images in the Gtk.ListView when the "Clear background" button is pressed. See first file changed in my test patch. That's not closing and opening the panel.

comment:9 Changed 2 years ago by quozl

Comparing an XO-1 with OLPC OS 13.2.1 with Sugar 0.98 against OLPC OS 13.2.6 with Sugar 0.107.0, using only repeated frame control panel entry and exit, automated using xdotool.

OLPC OS and SugarVmSize Leak KBytes/second
13.2.1 and 0.983.711
13.2.6 and 0.107.01.887

The leak is reduced by later OLPC OS and Sugar version.

comment:10 Changed 2 years ago by quozl

  • Resolution set to notsugar
  • Status changed from new to closed

Testing with valgrind confirms many pygobject memory leaks exist.

Interesting tickets;

Doesn't seem to be caused by Sugar. Closing ticket.

Note: See TracTickets for help on using tickets.