Opened 13 years ago
Closed 13 years ago
#1940 closed defect (fixed)
failed registration breaks registration for the session
Reported by: | dsd | Owned by: | tomeu |
---|---|---|---|
Priority: | Unspecified by Maintainer | Milestone: | Unspecified |
Component: | Sugar | Version: | Unspecified |
Severity: | Unspecified | Keywords: | olpc-0.84 r+ dextrose |
Cc: | jasg, bernie, sridhar | Distribution/OS: | Unspecified |
Bug Status: | Unconfirmed |
Description
Confirmed on Sugar 0.82 and 0.84:
- Start Sugar for the first time
- Don't connect to a network
- Attempt to register (it fails, as expected)
- Connect to a network with a schoolserver
- Attempt to register
The registration still fails, with error:
Registration: cannot connect to server: [Errno -3] Temporary failure in name resolution
The registration will continue to fail until you restart Sugar, and attempt the registration only after connecting to the network.
This bug makes registration in a classroom unreasonably hard. :(
Attachments (2)
Change History (23)
comment:1 Changed 13 years ago by bernie
- Cc jasg bernie added
comment:2 Changed 13 years ago by bernie
This bug also affects the Sugar update control panel applet.
comment:3 Changed 13 years ago by martin.langhoff
This has been with us for a long time -- see http://dev.laptop.org/ticket/6857
I just spotted this patch to Anaconda, which seems to deal with the exact same prob. Any cached curl objects need to be destroyed after a change of network...
https://www.redhat.com/archives/anaconda-devel-list/2010-May/msg00329.html
comment:4 Changed 13 years ago by quozl
This is because /etc/resolv.conf has changed since the shell first read it, and the socket.create_connection() call in httplib is using libc's cached resolv.conf data.
jarabe.desktop.schoolserver.register_laptop() calls
xmlrpclib.ServerProxy() which calls
xmlrpclib.Transport.make_connection() which calls
httplib.HTTPConnection.connect() which calls
socket.create_connection() with the hostname.
Solutions might be:
- convincing python to call res_init() again,
- moving this function to a process that is created at the time it is required,
- patching glibc to stat() resolv.conf and notice changes, which is what some distributions do.
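The second option above (doing the lookup in a process created when it is needed) can be sketched as follows. This is only an illustration written for current Python 3, not code from Sugar; the helper name resolve_in_child is hypothetical. A freshly started process reads resolv.conf on its first lookup, so it does not inherit the stale resolver state of the long-running shell.

```python
import subprocess
import sys

# Code run by the child: resolve the name given as its first argument.
_CHILD_CODE = 'import socket, sys; print(socket.gethostbyname(sys.argv[1]))'


def resolve_in_child(host, timeout=10):
    """Resolve host in a fresh Python process; return the address or None."""
    try:
        out = subprocess.run(
            [sys.executable, '-c', _CHILD_CODE, host],
            capture_output=True, text=True, timeout=timeout, check=True)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # The lookup failed or took too long in the child process.
        return None
    return out.stdout.strip() or None
```

The caller could then connect to the returned address instead of the hostname, at the cost of starting one extra process per lookup.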
comment:5 follow-up: ↓ 6 Changed 13 years ago by quozl
Thanks to a contributor, Python may be convinced to call res_init() again, and this is an unpleasant but working hack to fix the problem.
The simplified test case that fails is:
- use Sugar to disconnect network,
- start Terminal,
- start an instance of Python, and type interactively:
import socket
host = 'schoolserver.example.com'
port = 80
x = socket.create_connection((host, port))
- observe the error in connecting to the host, it will be "Temporary failure in name resolution",
- without terminating the Python instance, use Sugar to connect to the network,
- repeat the connection attempt
x = socket.create_connection((host, port))
- observe that the problem persists.
At this point, the unpleasant workaround can be used:
- type the following into the Python instance to clear the cached resolver data:
import ctypes
ctypes.CDLL('libc.so.6').__res_init(None)
- repeat the connection attempt,
x = socket.create_connection((host, port))
- note that it is now successful.
I'd like to know if this works as a temporary fix for deployments. The two ctypes lines would be inserted in schoolserver.py prior to the call to ServerProxy().
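The manual sequence above (fail, refresh the resolver, retry) could be wrapped roughly like this. It is a sketch under assumptions: the function names are hypothetical, the library name libc.so.6 and the __res_init symbol are glibc-specific, and the code is written for current Python 3 rather than the Python of the time.

```python
import ctypes
import socket


def refresh_resolver():
    """Ask glibc to re-read /etc/resolv.conf (assumes glibc's libc.so.6)."""
    libc = ctypes.CDLL('libc.so.6')
    # __res_init takes no arguments and returns 0 on success.
    getattr(libc, '__res_init')()


def connect_with_refresh(host, port, timeout=10):
    """Connect; on a name-resolution error, refresh the resolver and retry once."""
    try:
        return socket.create_connection((host, port), timeout)
    except socket.gaierror:
        # Name resolution failed, possibly because of stale resolver state.
        refresh_resolver()
        return socket.create_connection((host, port), timeout)
```

Retrying only on socket.gaierror keeps the refresh out of the normal path, so it runs only when name resolution has actually failed.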
comment:6 in reply to: ↑ 5 Changed 13 years ago by bernie
Replying to quozl:
I'd like to know if this works as a temporary fix for deployments. The two ctypes lines would be inserted in schoolserver.py prior to the call to ServerProxy().
We'd also need to fix the activity updater, which has the same bug.
Pretty much any place where we do hostname lookups would have to be patched :-(
comment:8 in reply to: ↑ 7 Changed 13 years ago by bernie
- Keywords r? removed
comment:9 Changed 13 years ago by martin.langhoff
- Keywords r? added
Reading Quozl's findings in detail, I think that we need to reset the resolver cache when we get the msg from NM that the connection has been set up correctly.
That message is caught by various classes that handle connection types, and then routed to model/network.py . NMSettings.set_connected() is the place.
Unfortunately, the ctypes syntactic sugar that lets you call ctypes.CDLL('libc.so.6').__res_init(None) directly doesn't work here. So we have to look up the pointer explicitly. Not a big deal.
Patch attached. Passes tests when applied on top of os207.
Review?
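The ticket does not say why the direct attribute syntax fails inside the class. One plausible cause, assumed here, is Python's private-name mangling: inside a class body, an identifier with two leading underscores is rewritten to include the class name. The sketch below shows the effect with a hypothetical stand-in object (FakeLib) instead of a real ctypes handle.

```python
class FakeLib(object):
    """Hypothetical stand-in for a ctypes library handle."""

    def __getattr__(self, name):
        # ctypes looks up the attribute name as a symbol; here we echo it back.
        return name


class Settings(object):
    def direct(self, lib):
        # Inside a class, '__res_init' is mangled to '_Settings__res_init'.
        return lib.__res_init

    def explicit(self, lib):
        # A string passed to getattr() is not mangled.
        return getattr(lib, '__res_init')


settings = Settings()
direct_name = settings.direct(FakeLib())
explicit_name = settings.explicit(FakeLib())
```

With a real library handle the mangled name would not exist as a symbol, which would explain why an explicit getattr() lookup is needed in a method but not at the interactive prompt.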
Changed 13 years ago by martin.langhoff
comment:10 Changed 13 years ago by martin.langhoff
Quozl was asking for a test script -- this is given by Daniel in the original report. Scroll up to the original bug description :-)
comment:11 follow-up: ↓ 13 Changed 13 years ago by dsd
This is perhaps slightly controversial because it will make sugar depend on python 2.5 or newer. I think that's fine, but we should probably advise people on the mailing list first and see if we're missing anything.
comment:12 Changed 13 years ago by sridhar
- Cc sridhar added
comment:13 in reply to: ↑ 11 Changed 13 years ago by bernie
Replying to dsd:
This is perhaps slightly controversial because it will make sugar depend on python 2.5 or newer. I think that's fine, but we should probably advise people on the mailing list first and see if we're missing anything.
FWIW, I'd be in favor of requiring even Python 2.6 to run Sugar. The burden of backporting Sugar to ancient operating systems should be carried entirely by those who think it's a sound idea.
comment:14 Changed 13 years ago by martin.langhoff
If this is a concern... move the import into the try block. Or wrap it right where it is.
( Surprised that 0.84 is meant to work with Python 2.4 :-) )
Any other comments on the patch?
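Moving the import into the try block, as suggested in this comment, might look like the sketch below. The helper name load_libc is hypothetical and the library name assumes glibc.

```python
import logging


def load_libc(name='libc.so.6'):
    """Return a ctypes handle for libc, or None if it cannot be loaded."""
    try:
        # Importing here lets a Python without ctypes (older than 2.5)
        # fall through to the except clause instead of failing at startup.
        import ctypes
        return ctypes.CDLL(name)
    except (ImportError, OSError):
        logging.warning('ctypes or %s unavailable; resolver refresh disabled',
                        name)
        return None
```

A caller would skip the resolver refresh when load_libc() returns None.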
comment:15 Changed 13 years ago by tomeu
- Keywords r+ added; r? removed
+ res_init = getattr(libc, '__res_init')
Here https://bugzilla.redhat.com/show_bug.cgi?id=354071#c9 using "res_init" instead of "__res_init" is recommended.
+ logging.error('Error calling libc.__res_init')
Better use logging.exception so we get a traceback.
A bad call through ctypes can cause the whole shell process to crash, so please make sure we can be reasonably confident that that won't happen.
r+ with those concerns addressed, please push.
For the future, please attach the patch created with git-format-patch and in general follow the process in http://wiki.sugarlabs.org/go/Development_Team/Code_Review
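The difference between logging.error and logging.exception mentioned in this comment can be seen in a small sketch. The function failing_res_init is a hypothetical stand-in for a ctypes call that raises a Python exception, and the logger name is arbitrary.

```python
import io
import logging


def guarded_call(func, log):
    """Call func(); log a full traceback and return None if it raises."""
    try:
        return func()
    except Exception:
        # logging.exception logs at ERROR level and appends the traceback.
        log.exception('Error calling libc.__res_init')
        return None


def failing_res_init():
    # Hypothetical stand-in for a ctypes call that raises.
    raise OSError('symbol not found')


# Capture the log output in memory so it can be inspected.
stream = io.StringIO()
log = logging.getLogger('res_init_demo')
log.addHandler(logging.StreamHandler(stream))
log.propagate = False

result = guarded_call(failing_res_init, log)
```

After this runs, stream.getvalue() contains the message followed by the traceback. Note that a try/except only covers Python-level errors; a segfault caused by a wrong ctypes call cannot be caught this way, which is the crash risk raised above.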
comment:16 Changed 13 years ago by dsd
As res_init is not a symbol in current glibc (it's just a define), it can't be called with ctypes. __res_init is the actual symbol. This won't change without a change in the soversion (I hope).
comment:17 Changed 13 years ago by bernie
- Keywords dextrose added
Tincho rebased Martin Langhoff's patch on Sugar 0.88 for Dextrose. I could test it today and it seems to work great.
comment:18 Changed 13 years ago by dsd
Please post that patch here. The one posted above doesn't apply to master branch.
comment:19 Changed 13 years ago by bernie
Martin, your patch is ack'd.
Shall I commit my reworked version for 0.88 or would you like commit access so you can do it yourself?
comment:20 Changed 13 years ago by erikos
The patch is ack'd here and I tested it thoroughly in 0.84. I think we can push it.
comment:21 Changed 13 years ago by erikos
- Resolution set to fixed
- Status changed from new to closed
Confirmed on os140py (sugar 0.84.15). Jorge is working on it.