Opened 14 years ago

Closed 11 years ago

#1798 closed defect (notsugar)

Overlay corruption

Reported by: David Kergyl Owned by: sdz
Priority: Normal Milestone: Unspecified
Component: Sugar on a Stick (SoaS) Version: Unspecified
Severity: Critical Keywords:
Cc: mtd, FGrose, pbrobinson Distribution/OS: Fedora
Bug Status: Needinfo

Description

Around 3/5/10, I successfully created a Blueberry v2 SoaS on an Apacer 1 GB USB stick with a maximum overlay. I worked for about an hour, browsing, opening various activities, and (I think) downloading activities. At some point during the session, the LED on the stick stopped blinking, and I received errors along the lines of "Cannot save to Journal."

Now, when I reboot, I receive the following text error during the boot:

[drm:drm_mode_rmfb] *ERROR* tried to remove a fb that we didn't own
Boot has failed. Sleeping forever.

Any suggestions, other than to use a different stick?

The stick is still accessible in Windows XP, so please let me know if you need any files from it.

Change History (28)

comment:1 Changed 14 years ago by sascha_silbe

  • Component changed from sugar to SoaS
  • Distribution/OS changed from Fedora to SoaS
  • Keywords Lost communication USB SoaS Boot failed remove fb removed
  • Owner changed from tomeu to sdz
  • Severity changed from Blocker to Critical
  • Summary changed from Lost communication with USB SoaS during session; Can't restart to overlay corruption
  • Version changed from 0.86.x to Unspecified

Reassigning to SoaS.
This sounds like the usual "overlay corruption after running out of disk space" problem that's caused by using a DM/LVM snapshot for the overlay.

FWIW the Debian counterparts use aufs for the overlay and don't suffer from this issue (kudos to alexanderpirdy for researching that).

comment:2 follow-up: Changed 14 years ago by David Kergyl

Perhaps a simple work-around would be to issue a warning message when overlay space is critically low -- something like "Journal space is critically low; delete some journal entries now!"

comment:3 in reply to: ↑ 2 Changed 14 years ago by sascha_silbe

Replying to David Kergyl:

Perhaps a simple work-around would be to issue a warning message when overlay space is critically low -- something like "Journal space is critically low; delete some journal entries now!"

Unfortunately, deleting files takes up overlay space as well (as paradoxical as that might sound), so that wouldn't help.

comment:4 follow-up: Changed 14 years ago by David Kergyl

Thanks, Sascha. This sounds like a serious problem, if there's no way to recover overlay space. Perhaps the warning message should be "The overlay is almost full; back up your active Journal work to a new USB stick NOW, and get a new Sugar-on-a-Stick."

The wiki should also be updated to warn new stick users about the importance of limiting downloads, including updates and activities, on 1Gb sticks.

FWIW, when I erase Journal entries on a new stick, Sugar reports the Journal space as being larger (even though the overlay space must be smaller if you are correct). Is journal space not the same as overlay space?

comment:5 in reply to: ↑ 4 ; follow-up: Changed 14 years ago by sascha_silbe

Replying to David Kergyl:

Thanks, Sascha. This sounds like a serious problem, if there's no way to recover overlay space. Perhaps the warning message should be "The overlay is almost full; back up your active Journal work to a new USB stick NOW, and get a new Sugar-on-a-Stick."

AFAICT Sugar has no way of noticing that the overlay ran out of space.

The wiki should also be updated to warn new stick users about the importance of limiting downloads, including updates and activities, on 1Gb sticks.

It already does, but I agree it's not clear enough. If you have some time, it would be nice if you could improve those warnings.

FWIW, when I erase Journal entries on a new stick, Sugar reports the Journal space as being larger (even though the overlay space must be smaller if you are correct). Is journal space not the same as overlay space?

No, that's the problem. The filesystem is sized at 2GB, but reduced by using various hacks to make it fit on a smaller USB stick. Sugar sees only the original size of 2GB.

Bernie has manually installed a SoaS image directly on a USB stick, without using the size reducing hacks, thereby eliminating the issue. Unfortunately none of the tools seem to support this installation method, though.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 14 years ago by David Kergyl

Replying to sascha_silbe:

AFAICT Sugar has no way of noticing that the overlay ran out of space.

That's even worse. The user has no way to know if or when the overlay will run out and the SoaS will die, with all data lost (unless a school server has a back-up). I see this as a very serious problem that merits community attention. Is there a discussion on the wiki? If not, I'll happily create one. Where should it go?

It already does, but I agree it's not clear enough. If you have some time, it would be nice if you could improve those warnings.

Done.

Bernie has manually installed a SoaS image directly on a USB stick, without using the size reducing hacks, thereby eliminating the issue. Unfortunately none of the tools seem to support this installation method, though.

Considering how inexpensive larger USB sticks are, this seems to be the easiest and best solution, and worthy of more efforts.

comment:7 follow-ups: Changed 14 years ago by garycmartin

I don't have much hands-on experience with USB (my Mac won't boot any so far for me to test, though the old PCs I occasionally bump into work fine), but I'm confused about why folks don't skip all the live/overlay stuff (originally intended for CD-ROM demo disks) and just make a real install (I do this for my Sugar VMs). Just burn an ISO to a CD, boot it, plug in your USB device, and then run "sudo zyx-liveinstaller" from the Sugar Terminal. It walks you through partitioning and formatting, and then installs the OS to the device as a regular install. You don't even have to reboot, though you might want to, just to test the magic and confirm that the USB really boots ;-)

comment:8 in reply to: ↑ 6 Changed 14 years ago by sascha_silbe

Replying to David Kergyl:

I see this as a very serious problem that merits community attention. [...] Where should it go?

The SoaS mailing list would seem to be the best option.

comment:9 in reply to: ↑ 7 ; follow-up: Changed 14 years ago by David Kergyl

  • Summary changed from overlay corruption to Overlay corruption

Replying to garycmartin:
Thanks, Gary. I used LiveUSB because I followed the wiki instructions for creating a Sugar-on-a-Stick v2 USB on a Windows system. The instructions currently don't mention (i) the apparently inevitable problem of catastrophic corruption when the LiveUSB overlay eventually runs out of space without warning, or (ii) the option of going outside of Windows to do a regular, non-Live install.

Even in a regular, non-Live USB SoaS, though, what happens when the USB space fills up? Is there a warning to delete files?

comment:10 in reply to: ↑ 9 ; follow-up: Changed 14 years ago by garycmartin

Replying to David Kergyl:

Replying to garycmartin:
Even in a regular, non-Live USB SoaS, though, what happens when the USB space fills up? Is there a warning to delete files?

Yes, at least the last time I tested (Sugar 0.86) and/or hit it accidentally. Actually, it's not so great a solution, as what happens is that a modal dialogue with the warning information forces you to the Journal, and keeps on popping up even if you are in the Journal trying to delete things. For me, it's also often not Journal items that have caused the low disk space, but some yum install, something else in Terminal, or some large temp log file.

comment:11 in reply to: ↑ 7 Changed 14 years ago by David Kergyl

comment:12 in reply to: ↑ 10 ; follow-up: Changed 14 years ago by David Kergyl

Replying to garycmartin:

... what happens is that a modal dialogue with the warning information forces you to the Journal, and keeps on popping up even if you are in the Journal trying to delete things ...

Right, I saw an open ticket on that problem. Still, it's a step in the right direction, even if USB space management still needs improvement. At least the manual approach is available. What would be the Terminal command to check the remaining drive space?

comment:13 in reply to: ↑ 12 ; follow-up: Changed 14 years ago by garycmartin

Replying to David Kergyl:

What would be the Terminal command to check the remaining drive space?

df -h should do the trick.
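For a scripted check, the same figure can be pulled out of df's output. The following is only a minimal sketch and is not part of Sugar or SoaS: the function names parse_used_percent and is_nearly_full and the 90% default threshold are made up for illustration. Note that, per comment:5, on a LiveUSB stick with a snapshot overlay this figure describes the 2GB filesystem that Sugar sees, not the overlay itself, so it is only meaningful on a regular (non-live) install.

```shell
# Read `df -P` output on stdin and print the used percentage (digits only)
# of the first filesystem listed. `-P` selects the POSIX one-line format.
parse_used_percent() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Succeed when the used percentage ($1) is at or above the threshold
# ($2, hypothetical default: 90).
is_nearly_full() {
    [ "$1" -ge "${2:-90}" ]
}

# Example (not run here):
#   used=$(df -P / | parse_used_percent)
#   is_nearly_full "$used" && echo "Disk is ${used}% full; free some space."
```

Keeping the parsing separate from the df call means the threshold logic can be tried on canned output before trusting it on a real stick.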

comment:14 in reply to: ↑ 13 Changed 14 years ago by David Kergyl

Replying to garycmartin:
Many thanks.

comment:15 Changed 14 years ago by satellit

Bernie's USB:

http://people.sugarlabs.org/bernie/soas-2-blueberry-direct-2GB.img.xz

works great as a real ext3 filesystem on a USB stick. The 2GB .img can be expanded to any size you want: I have expanded it to fill a 4GB and an 8GB USB with gparted. Just write it to a larger USB/SD with dd. An additional benefit is that it can be edited while plugged in to a PC running Fedora.

see: http://people.sugarlabs.org/Tgillard/soas-2-blueberry-direct-cleared-3GB.txt

Text of an e-mail I sent today:
"SoaS needs a way for a SoaS USB stick to check the free space it has and do deletions of oldest journal entries and overlay files (or warn that it is almost full before writing to journal) before filling the USB up so it cannot reboot and becomes unusable."

I see that the XO-1 software does this for its NAND....

http://wiki.laptop.org/go/Tests/Journal/Nand-full

Tom Gilliard
satellit

comment:16 Changed 14 years ago by sascha_silbe

  • Distribution/OS changed from SoaS to Unspecified

Bulk change distribution=SoaS -> component=SoaS

comment:17 Changed 14 years ago by FGrose

  • Bug Status changed from Unconfirmed to New
  • Distribution/OS changed from Unspecified to Fedora

See http://wiki.sugarlabs.org/go/LiveOS_image for information on optimizing storage with a LiveOS image.

There is a new utility script named Sugar Cellar that will report on the overlay fill.
See http://wiki.sugarlabs.org/go/Sugar_on_a_Stick/Sugar_Clone.
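Independently of that script, the fill level of a device-mapper snapshot overlay (the mechanism named in comment:1) can be read from dmsetup. The sketch below is an illustration only. It assumes the status line has the form "<start> <length> snapshot <allocated>/<total> [<metadata>]", and that an exhausted snapshot reports a word such as "Invalid" instead of the counts. The function name snapshot_fill_percent is hypothetical, and the device name live-rw in the usage comment is an assumption about how the LiveOS image names its overlay device.

```shell
# Read one line of `dmsetup status <device>` output on stdin and print the
# percentage of the snapshot's copy-on-write space that is allocated.
# Prints "invalid" when the counts are missing (snapshot no longer usable).
# Exits with status 2 when the line does not describe a snapshot target.
snapshot_fill_percent() {
    awk '
        $3 != "snapshot" { exit 2 }
        $4 !~ /^[0-9]+\/[0-9]+$/ { print "invalid"; exit }
        {
            split($4, s, "/")
            if (s[2] == 0) exit 2          # avoid dividing by zero
            printf "%d\n", s[1] * 100 / s[2]
        }
    '
}

# Example (needs root; device name is an assumption, not run here):
#   dmsetup status live-rw | snapshot_fill_percent
```

Unlike df, this reports on the overlay itself, which is the space that actually runs out.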

comment:18 follow-up: Changed 14 years ago by mchua

This bug has also been reported by the CFS deployment; over half the sticks failed midway through spring semester in this manner (it took several months for the overlay to fill).

A solution to this bug is being drafted as a feature, http://wiki.sugarlabs.org/go/More_robust_iso, and taken through the soas feature process, http://wiki.sugarlabs.org/go/Sugar_on_a_Stick_release_process#Feature_process.

This feature still needs an owner to champion it.

comment:19 Changed 14 years ago by mtd

  • Cc mtd added

comment:20 Changed 14 years ago by mchua

From an email by pbrobinson to lmacken, wrt liveusb-creator fix resources:

1) bernie's stuff. Having read through this, I think it's a little more complex than it needs to be, but it's a useful reference.

http://lists.sugarlabs.org/archive/soas/2010-January/000654.html

2) martin's git repo: http://git.sugarlabs.org/projects/soas/repos/mainline/trees/blueberry

3) parted for windows. I can't find it but I'm sure I've seen it. I
might have been smoking crack! I'll have a look and follow up.

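4) My way. This isn't tested, but I will test it next week.

fdisk /dev/sdX     # then, at the fdisk prompt:
t 83               # set the partition type to 83 (Linux); 82 would be Linux swap
a                  # toggle the bootable flag
w                  # write the partition table and exit
mkdir /mnt/iso /mnt/sfs
mount -o loop liveimage.iso /mnt/iso                  # the live ISO
mount -o loop /mnt/iso/LiveOS/squashfs.img /mnt/sfs   # the squashfs inside it
dd if=/mnt/sfs/LiveOS/ext3fs.img of=/dev/sdX1         # copy the root filesystem image onto the partition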

I'm not sure the best way to do the boot loader. Might be easier to
stick with syslinux. I need to think about it. Thoughts?

comment:21 in reply to: ↑ 18 ; follow-up: Changed 14 years ago by David Kergyl

Replying to mchua:

This bug has also been reported by the CFS deployment; over half the sticks failed midway through spring semester in this manner (it took several months for the overlay to fill).

Considering the magnitude and critical nature of this problem, I'm very surprised that the LiveUSB option (rather than the non-live option) is still the standard deployment method for Sugar/SoaS. Why make an SoaS that will most likely fail within a few months?

At a minimum, both the wiki and the Creation Kit should include a clearly visible warning to LiveUSB users, stating that: (i) LiveUSB sticks may run out of space after a few months (or less) and become corrupted, (ii) large downloads to LiveUSB sticks should be avoided, (iii) available space should be checked regularly, and (iv) important work should be backed up regularly to another USB stick or school server. The current "Cautions with using Live USB devices" section is now buried several links deep, where an average user will not see it.

I also think that both the wiki and the Creation Kit should tell users that non-live USB sticks can be made, and provide clear and simple instructions to do it. (I previously edited the Windows installation instructions for Blueberry to include the option of booting into Sugar and using ZyX-LiveInstaller from Terminal to create a non-live stick (see Blueberry instructions, option 2) as suggested above by Gary Martin, but the new Mirabelle instructions don't make any mention of this option.)

If I were a parent of a child at CFS, and my child's stick died and lost his work, I'd be very upset.

comment:22 Changed 14 years ago by mchua

I wish I could respond to David's excellent points above, and want to come back to them (and hope someone does!) but I have to run in a bit and only have time to upload an update from Luke right now.

Short version: Luke can't commit to helping with this feature at the moment, but did his best to braindump and unblock us (details below) and we should consider ourselves on our own from here on out. The notes at http://bugs.sugarlabs.org/ticket/1798#comment:20 are our best bet, and he has nothing further to add.

We need to find someone(s) to just sit down and do this - I'm willing to test afterwards but don't have the skills to write the needed code fast enough. If we can't push these changes by 8/3 (Alpha freeze) we should look at implementing at least the doc changes suggested by David above.

comment:23 in reply to: ↑ 21 Changed 14 years ago by FGrose

Replying to David Kergyl:

School-based deployments may want to take advantage of the persistent /home/ directory installation instead of the persistent OS overlay. The persistent home folder is not a write-once, ever-diminishing storage space.

For consistency and stability reasons, the school may want to leave out the persistent OS overlay (using a temporary overlay on each boot) and just use the persistent /home/ directories to save the Learners' Journals and settings.

If a school-wide system change were desired, the Sticks could be upgraded with a simple exchange of the /LiveOS/squashfs.img file in the base file system of each device or stick, perhaps even triggered by the School Server.

Installation of a persistent /home/ directory is available at the present time by means of the livecd-iso-to-disk shell script (which is available within a booted Mirabelle image at /LiveOS/livecd-iso-to-disk).
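As a rough illustration of the home-only setup described above, the helper below prints (but does not run) a livecd-iso-to-disk command line that requests a persistent /home and no persistent OS overlay. Treat it as a sketch under assumptions: the --home-size-mb and --unencrypted-home options should be checked against the script's own usage text on your image, and the function name, ISO name, target partition and 512 MB default are placeholders.

```shell
# Print a livecd-iso-to-disk invocation for review before running it by hand.
# $1 = path to the ISO, $2 = target partition, $3 = /home size in MB
# (hypothetical default: 512). No --overlay-size-mb is passed, so no
# persistent OS overlay is requested.
soas_home_only_cmd() {
    printf '%s\n' "livecd-iso-to-disk --home-size-mb ${3:-512} --unencrypted-home $1 $2"
}

# Example (placeholders, not run here):
#   soas_home_only_cmd soas-mirabelle.iso /dev/sdX1
```

Printing the command rather than executing it leaves a chance to double-check the target device, since the script writes to whatever partition it is given.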

There are other features that are available in http://wiki.sugarlabs.org/go/Sugar_on_a_Stick/Sugar_Clone, and storage space optimization is discussed on this page, http://wiki.sugarlabs.org/go/LiveOS_image.

The Sugar Clone enhancements have been submitted for inclusion in the standard livecd-iso-to-disk script, see http://www.mail-archive.com/soas@lists.sugarlabs.org/msg01608.html.

comment:25 Changed 11 years ago by FGrose

See http://fedoraproject.org/wiki/LiveOS_image#Overlay_recovery for a way to recover from overlay overconsumption.

There is also a shell script to merge an overlay into a new squashfs image and refresh the overlay at http://fedoraproject.org/wiki/LiveOS_image#Merge_overlay_into_new_image.

(See also the Latest news: at http://wiki.sugarlabs.org/go/Sugar_on_a_Stick/Sugar_Clone.)

comment:26 follow-up: Changed 11 years ago by godiard

  • Cc FGrose added

FGrose, Can we close this ticket?

comment:27 in reply to: ↑ 26 Changed 11 years ago by FGrose

  • Bug Status changed from New to Needinfo
  • Cc pbrobinson added

Replying to godiard:
Comment:25 covers the situation with the current LiveOS filesystem technology.

This bug could be resolved as 'notsugar' (notsoas), as the problem is a limitation of the LiveOS technology; or 'wontfix' because changing the technology is not within the scope of the SoaS project; or left open, if there is some expectation that pending technology changes will address this situation.

For the record here, I would request that Peter Robinson kindly comment about the potential for the adoption of btrfs in future LiveOS images, and how that adoption or another change might affect the situation.

comment:28 Changed 11 years ago by godiard

  • Resolution set to notsugar
  • Status changed from new to closed

Ok, closing as notsugar, Peter can comment anyway.
