Ticket #1407 (assigned defect)

Opened 4 years ago

Last modified 9 months ago

Blanked metadata data-store entry damage (possibly caused during a shell crash)

Reported by: garycmartin Owned by: martin.langhoff
Priority: Unspecified by Maintainer Milestone: Unspecified by Release Team
Component: sugar-datastore Version: Git as of bugdate
Severity: Critical Keywords:
Cc: sascha_silbe, erikos Distribution/OS: Unspecified
Bug Status: Needinfo

Description

I've twice seen a case where a single data-store entry is corrupted by having its metadata files zeroed out. This causes the Journal to display it with the default MIME document icon and no title, and the icon randomly cycles through different colour schemes as you move the mouse cursor over it. You cannot access its details view, and it has no palette, so you can't erase it via the UI. With a recent (today) build it's also now showing a 0% download grey bar.

Finding and looking at the data-store entry in both cases showed the data was valid and intact (one case was a TA project, the other a Labyrinth mind-map). Looking at their metadata files, all were zero bytes, except for the checksum that looked like a valid hash.
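For anyone wanting to check their own Journal for the same damage, here is a minimal sketch that walks a datastore tree and reports entries whose metadata files are all zero bytes apart from the checksum. It assumes each entry keeps its properties as individual files in a subdirectory named metadata; that layout, the function name find_blanked_entries and the ignore parameter are assumptions for illustration, not taken from the datastore code.

```python
import os


def find_blanked_entries(root, ignore=('checksum',)):
    """Yield entry directories whose metadata files are all empty.

    Assumes every entry stores its properties as separate files in a
    subdirectory called 'metadata' (hypothetical layout for this sketch).
    Files named in `ignore` are skipped, since the checksum was still
    valid in the reported cases.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) != 'metadata':
            continue
        sizes = [os.path.getsize(os.path.join(dirpath, name))
                 for name in filenames if name not in ignore]
        # An entry counts as blanked only if it has metadata files
        # and every one of them is zero bytes long.
        if sizes and all(size == 0 for size in sizes):
            yield os.path.dirname(dirpath)
```

Calling list(find_blanked_entries(path_to_datastore)) returns the affected entry directories, so the data files can be inspected or backed up before anything is removed.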

I think in both cases Sugar had previously crashed (sugar-jhbuild and either a full Fedora crash or a Xephyr black screen requiring a force quit). But I am not 100% sure, as I only spotted the Journal entries a while later. I think it likely that the specific Journal entry was also resumed at the time.

A couple of screenshots are attached. Will try to reproduce and get some logs to post if possible.

Attachments

journal_showing_broken_metadata_datastor_entry.png Download (82.2 KB) - added by garycmartin 4 years ago.
broken_metadata_datastore_entry.png Download (79.1 KB) - added by garycmartin 4 years ago.
0001-trace-update-calls.patch Download (1.1 KB) - added by sascha_silbe 4 years ago.
trace update() calls

Change History

Changed 4 years ago by garycmartin

Changed 4 years ago by garycmartin

in reply to: ↑ description ; follow-up: ↓ 2   Changed 4 years ago by sascha_silbe

  • owner changed from tomeu to sascha_silbe
  • status changed from new to accepted
  • severity changed from Major to Critical

Replying to garycmartin:

I've twice seen a case where a single data-store entry is corrupted by having its metadata files zeroed out. This causes the Journal to display it with the default MIME document icon and no title, and the icon randomly cycles through different colour schemes as you move the mouse cursor over it. You cannot access its details view, and it has no palette, so you can't erase it via the UI. With a recent (today) build it's also now showing a 0% download grey bar.

There are actually two issues then:
a) datastore entry getting corrupted somehow
b) Journal behaving funkily on entries with unexpected (but syntactically valid) metadata
I suggest opening a new bug about the second issue.

Finding and looking at the data-store entry in both cases showed the data was valid and intact (one case was a TA project, the other a Labyrinth mind-map). Looking at their metadata files, all were zero bytes, except for the checksum that looked like a valid hash.

The checksum is set by optimizer.py using metadatastore..set_property(). So there are two possible "culprits":
a) "GUI"-side (shell / activity (framework))
b) datastore

I'll prepare a patch that traces update() calls. Please apply this patch, set SUGAR_LOGGER_LEVEL to trace (or all) and report back with the update() call for the broken entry when you encounter the bug again. Any recipe for reproducing this bug would be even better, of course. :)
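To illustrate the kind of tracing meant here, the following is a rough sketch using only the standard logging module, not the attached patch itself. The TRACE level number, the logger name, the trace_calls decorator and the DataStoreStub class with its update() signature are all hypothetical stand-ins.

```python
import functools
import logging

# Assumed custom level below DEBUG, for this sketch only.
TRACE = 5
logging.addLevelName(TRACE, 'TRACE')


def trace_calls(method):
    """Log the name and arguments of every call to a method."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        # args[0] is `self`; leave it out to keep the log readable.
        logging.getLogger('datastore.trace').log(
            TRACE, '%s args=%r kwargs=%r',
            method.__name__, args[1:], kwargs)
        return method(*args, **kwargs)
    return wrapper


class DataStoreStub:
    """Hypothetical stand-in for the real datastore service."""

    @trace_calls
    def update(self, uid, props):
        return uid
```

With the root logger set to the TRACE level, each update() call leaves one log line showing the uid and the properties passed in, which is the information needed to tell whether the blank values arrived from the shell or activity side.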

I think in both cases Sugar had previously crashed (sugar-jhbuild and either a full Fedora crash or a Xephyr black screen requiring a force quit).

A full-machine (=kernel) crash would be a likely candidate for these symptoms. By default, ext3/4 only ensures metadata (=directory entries) integrity, but _not_ file content (=content of our metadata entries) integrity. I'm using data=journal for virtually all of my filesystems for exactly this reason.
We could modify metadatastore to ensure file content integrity even without data journalling (create new metadata directory, fsync() contents after writing, move new directory in place) but it would be a quite invasive patch, so
a) it won't make it into 0.86 and
b) I'm not sure it's worth the effort (with version support the impact will be much smaller and also easier to handle).
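The sequence described above (create a new metadata directory, fsync() the contents, move it into place) could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name write_metadata_dir, the 'metadata' and 'metadata.old' directory names, and the dict-of-bytes input are hypothetical, and this is not how metadatastore currently works.

```python
import os
import shutil
import tempfile


def _fsync_dir(path):
    """Flush a directory's entries to disk (works on Linux)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def write_metadata_dir(entry_dir, properties):
    """Replace entry_dir/metadata with files built from `properties`.

    `properties` maps file names to bytes values.
    """
    target = os.path.join(entry_dir, 'metadata')
    old_dir = None
    # Step 1: create the new directory next to the old one.
    new_dir = tempfile.mkdtemp(prefix='metadata.new.', dir=entry_dir)
    try:
        # Step 2: write every file and force its content to disk.
        for name, value in properties.items():
            with open(os.path.join(new_dir, name), 'wb') as f:
                f.write(value)
                f.flush()
                os.fsync(f.fileno())
        _fsync_dir(new_dir)
        # Step 3: move the old directory aside, the new one in place.
        if os.path.isdir(target):
            old_dir = target + '.old'
            if os.path.isdir(old_dir):
                shutil.rmtree(old_dir)
            os.rename(target, old_dir)
        os.rename(new_dir, target)
        _fsync_dir(entry_dir)
    except Exception:
        shutil.rmtree(new_dir, ignore_errors=True)
        # Put the previous metadata back if the swap was half done.
        if old_dir is not None and not os.path.exists(target):
            os.rename(old_dir, target)
        raise
    if old_dir is not None:
        shutil.rmtree(old_dir)
```

Note that there is still a short window between the two renames in which no 'metadata' directory exists, so a reader would have to fall back to 'metadata.old' after a crash at that point. That extra recovery logic is part of why such a change would be invasive.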

in reply to: ↑ 1   Changed 4 years ago by tomeu

Replying to sascha_silbe:

We could modify metadatastore to ensure file content integrity even without data journalling (create new metadata directory, fsync() contents after writing, move new directory in place) but it would be a quite invasive patch, so
a) it won't make it into 0.86 and
b) I'm not sure it's worth the effort (with version support the impact will be much smaller and also easier to handle).

c) if Sugar can tolerate entries without any metadata, it may be better to at least have the data file.

Though, as Gary has mentioned that he has had several cases of this particular failure, I would suspect a bug in our DS, rather than general system flakiness.

Changed 4 years ago by sascha_silbe

trace update() calls

  Changed 4 years ago by sascha_silbe

  • status_field changed from Unconfirmed to Needinfo

Patch attached; please report back once you have reproduced the issue.

  Changed 4 years ago by garycmartin

Patched and awaiting recurrence (actually F11 KP'ed on me earlier today while I was setting up this test).

  Changed 9 months ago by martin.langhoff

  • cc sascha_silbe, erikos added
  • owner changed from sascha_silbe to martin.langhoff
  • status changed from accepted to assigned

This seems to be exactly the issue addressed by http://lists.sugarlabs.org/archive/sugar-devel/2012-September/039729.html
