Opened 14 years ago
Closed 10 years ago
#1566 closed defect (fixed)
Fix metadata corruption
Reported by: | sayamindu | Owned by: | martin.langhoff |
---|---|---|---|
Priority: | Unspecified by Maintainer | Milestone: | Unspecified |
Component: | Sugar | Version: | Unspecified |
Severity: | Unspecified | Keywords: | r! |
Cc: | erikos, sascha_silbe | Distribution/OS: | Unspecified |
Bug Status: | Unconfirmed |
Description (last modified by martin.langhoff)
(Originally titled: Call fsync() for metadata writes)
While the ds index is force flushed every 60 seconds, the metadata files can often be in a non-flushed state for an undefined time interval. A force power off in such a situation makes the Journal bring up odd looking entries.
Attachments (1)
Change History (7)
Changed 14 years ago by sayamindu
comment:1 Changed 14 years ago by sayamindu
- Keywords r? added
comment:2 Changed 14 years ago by sascha_silbe
- Keywords r! added; r? removed
While I can see some merit in this patch, I'm worried about performance on XO-1 and XO-1.5. It would be nice if you could provide some data points (i.e. how long it takes to save an entry a) with and b) without syncing).
fdatasync() should be enough, BTW, we don't rely on file system level metadata (mtime, atime, etc.) in any way.
comment:3 Changed 14 years ago by erikos
Any progress on this one?
comment:4 Changed 11 years ago by martin.langhoff
- Cc erikos sascha_silbe added
- Description modified (diff)
- Owner changed from alsroot to martin.langhoff
- Status changed from new to assigned
- Summary changed from Call fsync() for metadata writes to Fix metadata corruption
Re-titling to reflect the problem instead of a diagnosis that was likely premature/faulty.
After working on this codebase on the patchseries addressing #2317 and related dataloss/datacorruption issues with the datastore, my tendency is to NAK the proposed patch and the whole logic behind it.
The original report talks about "the Journal bringing up odd-looking entries", I take it to mean that we are working on corrupt/mismatched metadata between index and metadatastore.
A likely root cause for this is the implementation of propery storage in metadatastore.py ; a patch addressing this is at http://lists.sugarlabs.org/archive/sugar-devel/2012-September/039729.html , and commentary on the erroneous use of try/except/finally at http://lists.sugarlabs.org/archive/sugar-devel/2012-September/039746.html . The metadatastore implementation is rather prone to data corruption / loss.
Once the writes to metadatastore are done defensively (atomically, in the patch linked), it is valid to ask whether it is worth calling fsync() to keep in rough sync with the index. The answer is no: all the filesystems used in Linux today have fsync() on a per-filesystem basis. All "dirty" buffers for the relevant filesystem are written out, not just the filehandle you passed.
So if the metadata changes are pending to be flushed when the index is flush()ed, the fsync() that Xapian invokes will _also_ flush metadata changes to disk.
I'll also note that currently we flush the Xapian db to disk every 5s, much lower than the 60s in use when originally reported.
comment:5 Changed 10 years ago by dnarvaez
- Component changed from sugar-datastore to Sugar
comment:6 Changed 10 years ago by dnarvaez
- Resolution set to fixed
- Status changed from assigned to closed
It sounds like we are doing the best we can do these days.
Proposed patch