Opened 9 years ago

Closed 8 years ago

#1940 closed defect (fixed)

failed registration breaks registration for the session

Reported by: dsd
Owned by: tomeu
Priority: Unspecified by Maintainer
Milestone: Unspecified
Component: Sugar
Version: Unspecified
Severity: Unspecified
Keywords: olpc-0.84 r+ dextrose
Cc: jasg, bernie, sridhar
Distribution/OS: Unspecified
Bug Status: Unconfirmed

Description

Confirmed on Sugar 0.82 and 0.84:

  1. Start Sugar for the first time
  2. Don't connect to a network
  3. Attempt to register (it fails, as expected)
  4. Connect to a network with a schoolserver
  5. Attempt to register

The registration still fails, with error:

Registration: cannot connect to server: [Errno -3] Temporary failure in name resolution

The registration will continue to fail until you restart Sugar, and attempt the registration only after connecting to the network.

This bug makes registration in a classroom unreasonably hard. :(

Attachments (2)

network.py.patch (863 bytes) - added by martin.langhoff 8 years ago.
sl1940-register-session-failed-fix.patch (1.2 KB) - added by bernie 8 years ago.
fix for sugar 0.88


Change History (23)

comment:1 Changed 9 years ago by bernie

  • Cc jasg bernie added

Confirmed on os140py (sugar 0.84.15). Jorge is working on it.

comment:2 Changed 9 years ago by bernie

This bug also affects the Sugar update control panel applet.

comment:3 Changed 9 years ago by martin.langhoff

This has been with us for a long time -- see http://dev.laptop.org/ticket/6857

I just spotted this patch to Anaconda, which seems to deal with the exact same prob. Any cached curl objects need to be destroyed after a change of network...

https://www.redhat.com/archives/anaconda-devel-list/2010-May/msg00329.html

comment:4 Changed 9 years ago by quozl

This happens because /etc/resolv.conf has changed since the shell first read it, and the socket.create_connection() call in httplib is still using libc's cached resolv.conf data.

jarabe.desktop.schoolserver.register_laptop() calls
xmlrpclib.ServerProxy() which calls
xmlrpclib.Transport.make_connection() which calls
httplib.HTTPConnection.connect() which calls
socket.create_connection() with the hostname.

Solutions might be:

  • convincing python to call res_init() again,
  • moving this function to a process that is created at the time it is required,
  • patching glibc to stat() resolv.conf and notice changes, which is what some distributions do.
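The second option above (doing the lookup in a process created at the time it is required) could look roughly like the sketch below. It is written for current Python 3 rather than the Python 2 that Sugar used at the time, and the helper name resolve_in_fresh_process and the inline child script are made up for illustration; nothing like this exists in Sugar. The point is only that a newly started process reads resolv.conf itself and does not inherit the shell's stale resolver state.

```python
import json
import subprocess
import sys

# Code run in the short-lived child. Being a new process, it initialises
# its own resolver state instead of reusing the parent's cached copy.
_CHILD_SCRIPT = (
    "import json, socket, sys\n"
    "infos = socket.getaddrinfo(sys.argv[1], int(sys.argv[2]),\n"
    "                           0, socket.SOCK_STREAM)\n"
    "print(json.dumps(sorted({info[4][0] for info in infos})))\n"
)


def resolve_in_fresh_process(host, port, timeout=10):
    """Resolve host in a new Python process and return its addresses.

    Hypothetical helper: raises subprocess.CalledProcessError if the
    lookup fails in the child.
    """
    output = subprocess.check_output(
        [sys.executable, "-c", _CHILD_SCRIPT, host, str(port)],
        timeout=timeout,
    )
    return json.loads(output.decode("utf-8"))
```

The cost is one extra process per lookup, which is probably acceptable for something as rare as registration.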

comment:5 follow-up: Changed 9 years ago by quozl

Thanks to a contributor, Python may be convinced to call res_init() again, and this is an unpleasant but working hack to fix the problem.

The simplified test case that fails is:

  • use Sugar to disconnect network,
  • start Terminal,
  • start an instance of Python, and type interactively:
    import socket
    host = 'schoolserver.example.com'
    port = 80
    x = socket.create_connection((host, port))
    
  • observe the error in connecting to the host, it will be "Temporary failure in name resolution",
  • without terminating the Python instance, use Sugar to connect to the network,
  • repeat the connection attempt
    x = socket.create_connection((host, port))
    
  • observe that the problem persists.

At this point, the unpleasant workaround can be used:

  • type the following into the Python instance to clear the cached resolver data:
    import ctypes
    ctypes.CDLL('libc.so.6').__res_init(None)
    
  • repeat the connection attempt,
    x = socket.create_connection((host, port))
    
  • note that it is now successful.

I'd like to know if this works as a temporary fix for deployments. The two ctypes lines would be inserted in schoolserver.py prior to the call to ServerProxy().
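As a rough illustration of how the workaround might be wrapped around a connection attempt, here is a minimal sketch. The function names reload_resolver and connect_with_resolver_reset are invented for this example and are not part of schoolserver.py; it assumes a glibc system where the C library is loadable as libc.so.6 and exports __res_init.

```python
import ctypes
import socket


def reload_resolver():
    """Ask glibc to re-read /etc/resolv.conf; return True on success.

    Hypothetical helper. Returns False if libc.so.6 or the __res_init
    symbol is unavailable (for example on a non-glibc system).
    """
    try:
        libc = ctypes.CDLL("libc.so.6")
        # Look the symbol up by string so the name is used verbatim.
        return getattr(libc, "__res_init")() == 0
    except (OSError, AttributeError):
        return False


def connect_with_resolver_reset(host, port, timeout=10):
    """Connect, retrying once after a resolver reset if the lookup fails."""
    try:
        return socket.create_connection((host, port), timeout)
    except socket.gaierror:
        # Name resolution failed: the cached resolver data may be stale.
        reload_resolver()
        return socket.create_connection((host, port), timeout)
```

Retrying only on socket.gaierror keeps the reset out of the normal path, so it runs only when name resolution has actually failed.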

comment:6 in reply to: ↑ 5 Changed 9 years ago by bernie

Replying to quozl:

I'd like to know if this works as a temporary fix for deployments. The two ctypes lines would be inserted in schoolserver.py prior to the call to ServerProxy().

We'd also need to fix the activity updater, which has the same bug.

Pretty much any place where we do hostname lookups would have to be patched :-(

comment:7 follow-up: Changed 9 years ago by bernie

  • Keywords r? olpc-0.84 added

comment:8 in reply to: ↑ 7 Changed 9 years ago by bernie

  • Keywords r? removed

Replying to bernie:

http://patchwork.sugarlabs.org/patch/42/

I'm an idiot, this patch is for Anaconda, not Sugar :-)

comment:9 Changed 8 years ago by martin.langhoff

  • Keywords r? added

Reading Quozl's findings in detail, I think that we need to reset the resolver cache when we get the message from NM that the connection has been set up correctly.

That message is caught by various classes that handle connection types, and then routed to model/network.py. NMSettings.set_connected() is the place.

Unfortunately, the ctypes syntactic sugar that lets you call ctypes.CDLL('libc.so.6').__res_init(None) directly doesn't work here. So we have to look up the pointer explicitly. Not a big deal.
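The comment does not say why the direct attribute call fails. One likely explanation, offered here as an assumption, is Python's name mangling: inside a class body, an identifier with two leading underscores is rewritten to include the class name. The sketch below shows the effect with a stand-in object rather than the real libc; FakeLibc and SettingsSketch are hypothetical names.

```python
class FakeLibc(object):
    """Stand-in for a ctypes.CDLL handle; this is not the real libc."""

    def __getattr__(self, name):
        if name == "__res_init":
            return lambda: 0
        raise AttributeError(name)


class SettingsSketch(object):  # hypothetical class, for illustration only
    def direct(self, libc):
        # Inside a class, this is compiled as libc._SettingsSketch__res_init,
        # so the lookup asks for a name the library does not have.
        return libc.__res_init()

    def explicit(self, libc):
        # A string is never mangled, so the real symbol name is looked up.
        return getattr(libc, "__res_init")()
```

At the interactive prompt there is no enclosing class, which would explain why the two-line version in the earlier test case works as typed.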

Patch attached. Passes tests when applied on top of os207.

Review?

Changed 8 years ago by martin.langhoff

comment:10 Changed 8 years ago by martin.langhoff

Quozl was asking for a test script -- this is given by Daniel in the original report. Scroll up to the original bug description :-)

comment:11 follow-up: Changed 8 years ago by dsd

This is perhaps slightly controversial because it will make sugar depend on python 2.5 or newer. I think that's fine, but we should probably advise people on the mailing list first and see if we're missing anything.

comment:12 Changed 8 years ago by sridhar

  • Cc sridhar added

comment:13 in reply to: ↑ 11 Changed 8 years ago by bernie

Replying to dsd:

This is perhaps slightly controversial because it will make sugar depend on python 2.5 or newer. I think that's fine, but we should probably advise people on the mailing list first and see if we're missing anything.

FWIW, I'd be in favor of requiring even Python 2.6 to run Sugar. The burden of backporting Sugar to ancient operating systems should be carried entirely by those who think it's a sound idea.

comment:14 Changed 8 years ago by martin.langhoff

If this is a concern... move the import into the try block. Or wrap it right where it is.

( Surprised that 0.84 is meant to work with Python 2.4 :-) )

Any other comments on the patch?

comment:15 Changed 8 years ago by tomeu

  • Keywords r+ added; r? removed

+ res_init = getattr(libc, '__res_init')

Here https://bugzilla.redhat.com/show_bug.cgi?id=354071#c9 using "res_init" instead of "__res_init" is recommended.

+ logging.error('Error calling libc.res_init')

Better use logging.exception so we get a traceback.

A bad call through ctypes can cause the whole shell process to crash, so please make sure we can be reasonably confident that that won't happen.
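A sketch of what addressing both review points might look like follows. It is not the attached patch: the logger name and the reset_resolver helper are invented for illustration. Declaring the prototype tells ctypes what arguments and return type to expect, and logging.exception records the traceback along with the message.

```python
import ctypes
import logging

logger = logging.getLogger("resolver-sketch")  # hypothetical logger name


def reset_resolver(libc_name="libc.so.6"):
    """Call glibc's __res_init defensively; return True on success."""
    try:
        libc = ctypes.CDLL(libc_name)
        res_init = getattr(libc, "__res_init")
        # Declare the C prototype, int __res_init(void), so ctypes will
        # reject stray arguments instead of passing them through.
        res_init.argtypes = []
        res_init.restype = ctypes.c_int
        return res_init() == 0
    except (OSError, AttributeError):
        # Unlike logging.error, this appends the current traceback.
        logger.exception("Error calling libc.__res_init")
        return False
```

This only covers failures that surface as Python exceptions, such as a missing library or symbol. A crash inside the C call itself cannot be caught this way.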

r+ with those concerns addressed, please push.

For the future, please attach the patch created with git-format-patch and in general follow the process in http://wiki.sugarlabs.org/go/Development_Team/Code_Review

comment:16 Changed 8 years ago by dsd

As res_init is not a symbol in current glibc (it's just a define), it can't be called with ctypes. __res_init is the actual symbol. This won't change without a change in the soversion (I hope).

comment:17 Changed 8 years ago by bernie

  • Keywords dextrose added

Tincho rebased Martin Langhoff's patch on Sugar 0.88 for Dextrose. I could test it today and it seems to work great.

comment:18 Changed 8 years ago by dsd

Please post that patch here. The one posted above doesn't apply to master branch.

Changed 8 years ago by bernie

fix for sugar 0.88

comment:19 Changed 8 years ago by bernie

Martin, your patch is ack'd.

Shall I commit my reworked version for 0.88 or would you like commit access so you can do it yourself?

comment:20 Changed 8 years ago by erikos

The patch is ack here and I tested it thoroughly in 0.84. I think we can push it.

comment:21 Changed 8 years ago by erikos

  • Resolution set to fixed
  • Status changed from new to closed