Opened 12 years ago

Closed 11 years ago

Last modified 11 years ago

#4061 closed defect (fixed)

Global gesture: can crash Sugar under certain circumstances

Reported by: erikos
Owned by: garnacho
Priority: Immediate
Milestone:
Component: Sugar
Version: 0.97.x
Severity: Blocker
Keywords: r+ olpc-test-passed
Cc: garnacho, dsd
Distribution/OS: OLPC
Bug Status: Assigned

Description

Seen in: os7

Steps to reproduce:

  • start Sugar
  • tap on the canvas in the Home View
  • tap on the Owner (XO) icon
  • tap on the canvas in the Home View

Attachments (2)

xorg-valgrind (35.7 KB) - added by garnacho 12 years ago.
excerpt of valgrind logs for reference, all entries happen during the crashing operation
0001-Sync-TouchListener-memory-allocation-with-population.patch (1.3 KB) - added by garnacho 12 years ago.
xserver patch

Change History (18)

comment:1 Changed 12 years ago by erikos

  • Cc garnacho added

The Sugar crash does not happen if I disable the global gesture code.

comment:2 Changed 12 years ago by dsd

  • Cc dsd added

This is not a crash as such: the process exits with code 1 after the final touch is made, and no signal is raised.
There is nothing obvious in the logs. I suspected that libX11 might be calling exit(), and indeed this is the case:

Breakpoint 3, _XIOError (warning: Unable to fetch general register.
warning: Unable to fetch general register.
dpy=dpy@entry=<unavailable>) at XlibInt.c:1601
1601	{
(gdb) bt
#0  _XIOError (dpy=dpy@entry=<unavailable>) at XlibInt.c:1601
#1  0xb5d36258 in _XEventsQueued (dpy=<unavailable>, mode=<optimized out>)
    at xcb_io.c:365
#2  0xb5d36258 in _XEventsQueued (dpy=<unavailable>, mode=<optimized out>)
    at xcb_io.c:365
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Setting a breakpoint on filter_function in sugar-gesture-grabber shows that this function is not called on the "crash touch" - the process exits before sugar-gesture-grabber is called into.

comment:3 Changed 12 years ago by garnacho

Apparently we've been looking at this from the wrong perspective... I traced that error to completely legitimate pipe errors caused by X crashing :). From what I can see, it always crashes around libc memory functions (malloc/calloc/free), in random (although touch-related) places.

In my experience this points to a memory corruption issue. It would be great to be able to start X under valgrind, but first I need to learn 1) how to disable X autorestart and 2) how to start everything manually as if it were run from the session script. I'll also try to replicate this on my development machine; it looks like the presence of both a passive touch grab and an active grab is what triggers it.

comment:4 Changed 12 years ago by dsd

Yes, it's an X crash. I got this trace with debuginfo installed:

(gdb) bt
#0  0xb6a4ee38 in raise () from /lib/libc.so.6
#1  0xb6a50478 in abort () from /lib/libc.so.6
#2  0xb6a8b4e0 in ?? () from /lib/libc.so.6
#3  0xb6a91c08 in ?? () from /lib/libc.so.6
#4  0xb6a94788 in ?? () from /lib/libc.so.6
#5  0xb6a97428 in calloc () from /lib/libc.so.6
#6  0x0013a9b4 in ProcXIQueryPointer (client=0x3961c8) at xiquerypointer.c:153
#7  0x0013011c in ProcIDispatch (client=<optimized out>) at extinit.c:406
#8  0x000388b8 in Dispatch () at dispatch.c:428
#9  0x00028138 in main (argc=8, argv=0x28138 <main+1052>, envp=<optimized out>)
    at main.c:295

The crash site (calloc), together with the fact that the other trace I took crashed somewhere different, agrees with that: this must be some kind of memory corruption.

To stop X restarting you can do:

 systemctl stop olpc-dm.service

To start X just once via the normal display-manager path (integrating with PAM and all that):

olpc-dm

To run Sugar in a more bare bones environment:

X # on one terminal
DISPLAY=:0 sugar-session # on another terminal

(I can't reproduce the crash in that environment.)

Running X under valgrind doesn't seem like it is going to work on ARM. Take a simple test:

bash-4.2# valgrind X
==2638== Memcheck, a memory error detector
==2638== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==2638== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==2638== Command: X
==2638== 
disInstr(arm): unhandled instruction: 0xECECA102
                 cond=14(0xE) 27:20=206(0xCE) 4:4=0 3:0=2(0x2)
==2638== valgrind: Unrecognised instruction at address 0x4018638.
==2638==    at 0x4018638: ??? (in /usr/lib/ld-2.16.so)
==2638== Your program just tried to execute an instruction that Valgrind
==2638== did not recognise.  There are two possible reasons for this.
==2638== 1. Your program has a bug and erroneously jumped to a non-code
==2638==    location.  If you are running Memcheck and you just saw a
==2638==    warning about a bad jump, it's probably your program's fault.
==2638== 2. The instruction is legitimate but Valgrind doesn't handle it,
==2638==    i.e. it's Valgrind's fault.  If you think this is the case or
==2638==    you are not sure, please let us know and we'll try to fix it.
==2638== Either way, Valgrind will now raise a SIGILL signal which will
==2638== probably kill your program.

Maybe you will have more luck debugging this on your development machine (x86, I assume).

comment:5 Changed 12 years ago by garnacho

I've made some modest progress today: I was able to get valgrind logs of the crashing operation, and there are definitely invalid writes in dix/touch.c and Xi/exevents.c, specifically around the TouchListener array in the TouchPointInfoPtr struct. Possibly something goes out of bounds and overwrites other memory. I'm now compiling an xserver with extra debugging traces.

Changed 12 years ago by garnacho

excerpt of valgrind logs for reference, all entries happen during the crashing operation

Changed 12 years ago by garnacho

xserver patch

comment:6 Changed 12 years ago by garnacho

This patch fixes the memory corruption. It has already been sent to the xorg-devel mailing list for review.
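The attachment name (0001-Sync-TouchListener-memory-allocation-with-population.patch) suggests the shape of the fix: keep the memory allocated for the TouchListener array in sync with the number of listeners actually written into it. As a minimal sketch of that general pattern only, and not the xserver code, the types and functions below (`demo_listener`, `demo_touch_point`, `demo_touch_add_listener`, `demo_touch_free`) are hypothetical. The append function grows the array before each write, so the population can never outrun the allocation:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical listener entry; illustrative only, NOT the xserver's TouchListener. */
typedef struct {
    int listener_id;   /* e.g. the grab or window that receives the touch */
    int is_grab;       /* nonzero for a (passive or active) grab listener */
} demo_listener;

/* Hypothetical touch point owning a dynamically sized listener array. */
typedef struct {
    demo_listener *listeners;
    size_t num_listeners;  /* entries actually populated */
    size_t capacity;       /* entries allocated */
} demo_touch_point;

/* Append one listener. The allocation is grown before writing, so
 * num_listeners can never exceed capacity. The bug class being avoided is
 * sizing the array once from one count and then populating it from another. */
int demo_touch_add_listener(demo_touch_point *tp, int id, int is_grab)
{
    if (tp->num_listeners == tp->capacity) {
        size_t new_cap = tp->capacity ? tp->capacity * 2 : 4;
        if (new_cap > SIZE_MAX / sizeof *tp->listeners)
            return -1;                        /* size computation would overflow */
        demo_listener *tmp = realloc(tp->listeners, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;                        /* old array is still valid */
        tp->listeners = tmp;
        tp->capacity = new_cap;
    }
    assert(tp->num_listeners < tp->capacity); /* population stays within allocation */
    tp->listeners[tp->num_listeners].listener_id = id;
    tp->listeners[tp->num_listeners].is_grab = is_grab;
    tp->num_listeners++;
    return 0;
}

/* Release the array and reset the bookkeeping. */
void demo_touch_free(demo_touch_point *tp)
{
    free(tp->listeners);
    tp->listeners = NULL;
    tp->num_listeners = 0;
    tp->capacity = 0;
}
```

For instance, starting from `demo_touch_point tp = {NULL, 0, 0};` and adding ten listeners leaves `tp.num_listeners == 10` with a capacity of 16 (4, then 8, then 16). A touch that collects more listeners than first expected (for example a passive touch grab plus an active grab, the combination suspected in comment:3) then grows the array instead of writing past its end.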

comment:8 Changed 12 years ago by erikos

  • Keywords r+ added

Patch accepted upstream: http://lists.x.org/archives/xorg-devel/2012-October/034166.html

Will hopefully be available soon.

comment:9 Changed 11 years ago by erikos

Not merged yet: http://cgit.freedesktop.org/xorg/xserver/ Carlos, please nag them to actually include the fix as well.

comment:10 Changed 11 years ago by dsd

Shipped in https://admin.fedoraproject.org/updates/xorg-x11-server-1.13.0-7.fc18 - just waiting for an ARM build.

Also, Peter H maintains his own input tree, which is merged into master regularly. So I would not worry about the lack of visibility in master yet; it should happen soon.

comment:11 Changed 11 years ago by erikos

  • Owner changed from erikos to garnacho
  • Status changed from new to assigned

comment:12 Changed 11 years ago by dsd

  • Keywords olpc-test-pending added

Fixed in xorg-x11-server-1.13.0-7.fc18, leaving the ticket open until the patch appears in master.

comment:13 Changed 11 years ago by dsd

  • Resolution set to fixed
  • Status changed from assigned to closed

This is now in xserver master and proposed for inclusion in 1.13.

comment:14 Changed 11 years ago by greenfeld

  • Keywords olpc-test-passed added; olpc-test-pending removed

xorg-x11-server-Xorg-1.13.0-11.fc18 is in 13.1.0 os20 for XO-4, and I could not reproduce this issue.

comment:15 Changed 11 years ago by dnarvaez

  • Component changed from sugar-toolkit-gtk3 to Sugar

comment:16 Changed 11 years ago by dnarvaez

  • Milestone 0.98 deleted
