Opened 11 years ago

Closed 8 years ago

#1566 closed defect (fixed)

Fix metadata corruption

Reported by: sayamindu Owned by: martin.langhoff
Priority: Unspecified by Maintainer Milestone: Unspecified
Component: Sugar Version: Unspecified
Severity: Unspecified Keywords: r!
Cc: erikos, sascha_silbe Distribution/OS: Unspecified
Bug Status: Unconfirmed

Description (last modified by martin.langhoff)

(Originally titled: Call fsync() for metadata writes)

While the ds index is force flushed every 60 seconds, the metadata files can often be in a non-flushed state for an undefined time interval. A force power off in such a situation makes the Journal bring up odd looking entries.

Attachments (1)

dslo_1566.patch (468 bytes) - added by sayamindu 11 years ago.
Proposed patch

Download all attachments as: .zip

Change History (7)

Changed 11 years ago by sayamindu

Proposed patch

comment:1 Changed 11 years ago by sayamindu

  • Keywords r? added

comment:2 Changed 11 years ago by sascha_silbe

  • Keywords r! added; r? removed

While I can see some merit in this patch, I'm worried about performance on XO-1 and XO-1.5. It would be nice if you could provide some data points (i.e. how long it takes to save an entry a) with and b) without syncing).
fdatasync() should be enough, BTW, we don't rely on file system level metadata (mtime, atime, etc.) in any way.

comment:3 Changed 11 years ago by erikos

Any progress on this one?

comment:4 Changed 9 years ago by martin.langhoff

  • Cc erikos sascha_silbe added
  • Description modified (diff)
  • Owner changed from alsroot to martin.langhoff
  • Status changed from new to assigned
  • Summary changed from Call fsync() for metadata writes to Fix metadata corruption

Re-titling to reflect the problem instead of a diagnosis that was likely premature/faulty.

After working on this codebase on the patchseries addressing #2317 and related dataloss/datacorruption issues with the datastore, my tendency is to NAK the proposed patch and the whole logic behind it.

The original report talks about "the Journal bringing up odd-looking entries", I take it to mean that we are working on corrupt/mismatched metadata between index and metadatastore.

A likely root cause for this is the implementation of propery storage in metadatastore.py ; a patch addressing this is at http://lists.sugarlabs.org/archive/sugar-devel/2012-September/039729.html , and commentary on the erroneous use of try/except/finally at http://lists.sugarlabs.org/archive/sugar-devel/2012-September/039746.html . The metadatastore implementation is rather prone to data corruption / loss.

Once the writes to metadatastore are done defensively (atomically, in the patch linked), it is valid to ask whether it is worth calling fsync() to keep in rough sync with the index. The answer is no: all the filesystems used in Linux today have fsync() on a per-filesystem basis. All "dirty" buffers for the relevant filesystem are written out, not just the filehandle you passed.

So if the metadata changes are pending to be flushed when the index is flush()ed, the fsync() that Xapian invokes will _also_ flush metadata changes to disk.

I'll also note that currently we flush the Xapian db to disk every 5s, much lower than the 60s in use when originally reported.

comment:5 Changed 8 years ago by dnarvaez

  • Component changed from sugar-datastore to Sugar

comment:6 Changed 8 years ago by dnarvaez

  • Resolution set to fixed
  • Status changed from assigned to closed

It sounds like we are doing the best we can do these days.

Note: See TracTickets for help on using tickets.