Opened 9 years ago

Closed 8 years ago

Last modified 8 years ago

#4061 closed defect (fixed)

Global gesture: can crash Sugar under certain circumstances

Reported by: erikos Owned by: garnacho
Priority: Immediate Milestone:
Component: Sugar Version: 0.97.x
Severity: Blocker Keywords: r+ olpc-test-passed
Cc: garnacho, dsd Distribution/OS: OLPC
Bug Status: Assigned

Description

Seen in: os7

Steps to reproduce:

  • start Sugar
  • tap on the canvas in the Home View
  • tap on the Owner (XO) icon
  • tap on the canvas in the Home View

Attachments (2)

xorg-valgrind (35.7 KB) - added by garnacho 8 years ago.
excerpt of valgrind logs for reference, all entries happen during the crashing operation
0001-Sync-TouchListener-memory-allocation-with-population.patch (1.3 KB) - added by garnacho 8 years ago.
xserver patch

Change History (18)

comment:1 Changed 9 years ago by erikos

  • Cc garnacho added

The Sugar crash does not happen if I disable the global gesture code.

comment:2 Changed 9 years ago by dsd

  • Cc dsd added

This is not a crash as such - the process exits with code 1 after the final touch is made. No signal is raised.
Nothing obvious in the logs. I suspected that libX11 might be calling exit, and indeed this is the case:

Breakpoint 3, _XIOError (warning: Unable to fetch general register.
warning: Unable to fetch general register.
dpy=dpy@entry=<unavailable>) at XlibInt.c:1601
1601	{
(gdb) bt
#0  _XIOError (dpy=dpy@entry=<unavailable>) at XlibInt.c:1601
#1  0xb5d36258 in _XEventsQueued (dpy=<unavailable>, mode=<optimized out>)
    at xcb_io.c:365
#2  0xb5d36258 in _XEventsQueued (dpy=<unavailable>, mode=<optimized out>)
    at xcb_io.c:365
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Setting a breakpoint on filter_function in sugar-gesture-grabber shows that this function is not called on the "crash touch" - the process exits before sugar-gesture-grabber is called into.

comment:3 Changed 9 years ago by garnacho

Apparently we've been looking at things from the wrong perspective... I traced that error to completely legitimate pipe errors caused by X crashing :). From what I can see, X always crashes around libc memory functions (malloc/calloc/free) in random (although touch-related) places.

In my experience this points to a memory corruption issue. It would be great to be able to start X under valgrind, although I need to learn how to 1) disable X autorestart and 2) manually start everything as if running from the session script. I'll try to replicate this on my development machine too. It looks like the presence of both a passive touch grab and an active grab triggers it.
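As an illustration of why a server crash can show up in the client as a quiet exit with code 1 and no signal, here is a minimal sketch that does not use Xlib at all. It assumes a POSIX system and that SIGPIPE is ignored, so that writing to a connection whose peer has gone away fails with EPIPE instead of killing the process. The names write_to_dead_peer and io_error_exit_status are hypothetical; the second one only mimics the behaviour seen in the trace above (report the I/O error, then exit with status 1) and is not libX11's code.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno observed when writing to a
 * connection whose peer (standing in for the crashed X server) is gone.
 * Returns -1 if the socket pair could not be created. */
int write_to_dead_peer(void)
{
    int fds[2];
    int err = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* Assumption: SIGPIPE is ignored, so the failure is reported
     * through errno rather than delivered as a signal. */
    signal(SIGPIPE, SIG_IGN);

    close(fds[1]);                  /* the "server" end disappears */

    if (write(fds[0], "x", 1) < 0)  /* the "client" keeps talking */
        err = errno;

    close(fds[0]);
    return err;
}

/* Hypothetical stand-in for a default I/O error handler: report the
 * error and return the status the process would exit with. */
int io_error_exit_status(int err)
{
    fprintf(stderr, "connection broken: %s\n", strerror(err));
    return 1;
}
```

A caller would typically do something like `exit(io_error_exit_status(write_to_dead_peer()))`. Seen from the outside, that looks like the Sugar process in this ticket: no signal is raised, and the exit status is 1.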

comment:4 Changed 9 years ago by dsd

Yes, it's an X crash. I got this trace with debuginfo installed:

(gdb) bt
#0  0xb6a4ee38 in raise () from /lib/libc.so.6
#1  0xb6a50478 in abort () from /lib/libc.so.6
#2  0xb6a8b4e0 in ?? () from /lib/libc.so.6
#3  0xb6a91c08 in ?? () from /lib/libc.so.6
#4  0xb6a94788 in ?? () from /lib/libc.so.6
#5  0xb6a97428 in calloc () from /lib/libc.so.6
#6  0x0013a9b4 in ProcXIQueryPointer (client=0x3961c8) at xiquerypointer.c:153
#7  0x0013011c in ProcIDispatch (client=<optimized out>) at extinit.c:406
#8  0x000388b8 in Dispatch () at dispatch.c:428
#9  0x00028138 in main (argc=8, argv=0x28138 <main+1052>, envp=<optimized out>)
    at main.c:295

The crash site (calloc), plus the fact that the other trace I took was different, agrees with that: this must be some kind of memory corruption.

To stop X restarting you can do:

 systemctl stop olpc-dm.service

To start X just once via the normal display-manager path (integrating with PAM and all that):

olpc-dm

To run Sugar in a more bare bones environment:

X # on one terminal
DISPLAY=:0 sugar-session # on another terminal

(I can't reproduce the crash under that environment)

Running X under valgrind doesn't seem like it is going to work on ARM. Take a simple test:

bash-4.2# valgrind X
==2638== Memcheck, a memory error detector
==2638== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==2638== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==2638== Command: X
==2638== 
disInstr(arm): unhandled instruction: 0xECECA102
                 cond=14(0xE) 27:20=206(0xCE) 4:4=0 3:0=2(0x2)
==2638== valgrind: Unrecognised instruction at address 0x4018638.
==2638==    at 0x4018638: ??? (in /usr/lib/ld-2.16.so)
==2638== Your program just tried to execute an instruction that Valgrind
==2638== did not recognise.  There are two possible reasons for this.
==2638== 1. Your program has a bug and erroneously jumped to a non-code
==2638==    location.  If you are running Memcheck and you just saw a
==2638==    warning about a bad jump, it's probably your program's fault.
==2638== 2. The instruction is legitimate but Valgrind doesn't handle it,
==2638==    i.e. it's Valgrind's fault.  If you think this is the case or
==2638==    you are not sure, please let us know and we'll try to fix it.
==2638== Either way, Valgrind will now raise a SIGILL signal which will
==2638== probably kill your program.

Maybe you will have more luck debugging this on your development machine (x86, I assume).

comment:5 Changed 8 years ago by garnacho

I've made some mild progress throughout the day: I was able to get valgrind logs of the crashing operation, and there are certainly invalid writes in dix/touch.c and Xi/exevents.c, specifically around the TouchListener array in the TouchPointInfoPtr struct. Possibly something is going out of bounds and overwriting other memory. I'm compiling an xserver with extra debugging traces.
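The class of bug suspected here, writes landing past the end of a dynamically sized listener array, can be sketched in isolation. The code below is not the xserver code and not the attached patch. The struct and function names are hypothetical stand-ins, and the growth policy (doubling, starting at 4) is an arbitrary choice for the example. It only shows the invariant that has to hold: the allocation is grown before an element is written, so the number of populated entries never exceeds the number of allocated ones.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the xserver's TouchListener or
 * TouchPointInfo definitions. */
struct listener {
    unsigned long id;
};

struct touch_point {
    struct listener *listeners;
    size_t num_listeners;   /* entries in use */
    size_t capacity;        /* entries allocated */
};

/* Append one listener. The array is grown first, so the write below
 * always lands inside the allocation. Returns 0 on success, -1 if
 * memory could not be allocated (the old array stays valid). */
int touch_point_add_listener(struct touch_point *tp, unsigned long id)
{
    if (tp->num_listeners == tp->capacity) {
        size_t new_cap = tp->capacity ? tp->capacity * 2 : 4;
        struct listener *grown =
            realloc(tp->listeners, new_cap * sizeof *grown);

        if (grown == NULL)
            return -1;
        tp->listeners = grown;
        tp->capacity = new_cap;
    }

    /* Invariant: population never runs ahead of allocation. */
    assert(tp->num_listeners < tp->capacity);
    tp->listeners[tp->num_listeners++].id = id;
    return 0;
}

/* Release the array and reset both counters together. */
void touch_point_clear(struct touch_point *tp)
{
    free(tp->listeners);
    tp->listeners = NULL;
    tp->num_listeners = 0;
    tp->capacity = 0;
}
```

If the count were bumped without growing the array, the write would go past the end of the allocation and could damage allocator metadata. That kind of damage usually surfaces later, inside an unrelated malloc/calloc/free call, which matches the scattered crash sites reported earlier in this ticket.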

Changed 8 years ago by garnacho

excerpt of valgrind logs for reference, all entries happen during the crashing operation

Changed 8 years ago by garnacho

xserver patch

comment:6 Changed 8 years ago by garnacho

This patch fixes the memory corruption. It has already been sent to the xorg-devel mailing list for review.

comment:8 Changed 8 years ago by erikos

  • Keywords r+ added

Patch accepted upstream: http://lists.x.org/archives/xorg-devel/2012-October/034166.html

Will hopefully be available soon.

comment:9 Changed 8 years ago by erikos

Not merged (see http://cgit.freedesktop.org/xorg/xserver/). Carlos, please nag them to actually include the fix as well.

comment:10 Changed 8 years ago by dsd

Shipped in https://admin.fedoraproject.org/updates/xorg-x11-server-1.13.0-7.fc18 - just waiting for an ARM build.

Also, Peter H maintains his own input tree and merges regularly with master, so I would not worry about the lack of visibility in master yet; it should happen soon.

comment:11 Changed 8 years ago by erikos

  • Owner changed from erikos to garnacho
  • Status changed from new to assigned

comment:12 Changed 8 years ago by dsd

  • Keywords olpc-test-pending added

Fixed in xorg-x11-server-1.13.0-7.fc18, leaving the ticket open until the patch appears in master.

comment:13 Changed 8 years ago by dsd

  • Resolution set to fixed
  • Status changed from assigned to closed

This is now in xserver master and proposed for inclusion in 1.13.

comment:14 Changed 8 years ago by greenfeld

  • Keywords olpc-test-passed added; olpc-test-pending removed

xorg-x11-server-Xorg-1.13.0-11.fc18 is in 13.1.0 os20 for XO-4, and I could not reproduce this issue.

comment:15 Changed 8 years ago by dnarvaez

  • Component changed from sugar-toolkit-gtk3 to Sugar

comment:16 Changed 8 years ago by dnarvaez

  • Milestone 0.98 deleted
