Roland van Ipenburg
To be stolen or blogged

Hyphenation rules the nation

Thursday 23 July 2009 04:17

As I mentioned earlier, I'm working on a Perl module called HTML::Hyphenate, and I'm now rolling out a test on this blog. I've added a filter to my main Bricolage web template that inserts soft hyphens throughout the body of every page. So if your browser handles soft hyphens properly, you should see some words on this page broken across two lines, and the text of the posts should be nicely justified. If your browser handles them badly, you might just see all the inserted soft hyphens as dashes and read this text very slowly. My sort of English doesn't have very long words, but since I set the hyphenation algorithm to use a minimum word length of five characters, words like "water" can already be broken into "wa-ter". Languages other than en_US should also be handled correctly, and even languages written in other scripts, like Sanskrit and Mongolian, should just work. And of course Arabic, because that doesn't have hyphenation.
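To make the idea of such a filter concrete, here is a minimal Python sketch of inserting soft hyphens (U+00AD) into words of at least five characters. It is not how HTML::Hyphenate works internally: the `BREAK_POINTS` table and the `soften` function are made up for illustration, and a real hyphenator would derive break points from language-specific patterns rather than a hand-written lookup. It also works on plain text only, so a real filter would have to skip tags and attributes.

```python
import re

SOFT_HYPHEN = "\u00ad"
MIN_WORD_LENGTH = 5  # the minimum word length mentioned in the post

# Hypothetical break-point table (character offsets inside each word).
# A real hyphenator computes these from language-specific patterns.
BREAK_POINTS = {
    "water": [2],            # wa-ter
    "hyphenation": [2, 6],   # hy-phen-ation
    "browser": [5],          # brows-er
}


def soften(text: str) -> str:
    """Insert soft hyphens at known break points in long enough words."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        points = BREAK_POINTS.get(word.lower())
        if len(word) < MIN_WORD_LENGTH or not points:
            return word  # too short or unknown: leave untouched
        pieces = []
        start = 0
        for point in points:
            pieces.append(word[start:point])
            start = point
        pieces.append(word[start:])
        return SOFT_HYPHEN.join(pieces)

    # Match runs of letters (no digits or underscores), Unicode-aware.
    return re.sub(r"[^\W\d_]+", replace, text)


if __name__ == "__main__":
    # repr() makes the otherwise invisible soft hyphens visible as \xad.
    print(repr(soften("The water in my browser")))
```

A browser that supports soft hyphens only shows a dash at a break point when it actually wraps the line there; otherwise the character stays invisible.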

Different browsers of course do different things. Opera, for example, renders the dashes that soft hyphens turn into at line breaks as if they sit outside the text box, which is ugly. Safari does it better, lynx seems to ignore the soft hyphens, and Amaya shows them all as dashes. Browsers could of course implement the algorithms themselves and do the hyphenation without the author supplying all the break points, but that won't suddenly make the web look much better. As soon as you start hyphenating, a lot of things can break. You need to specify the language for every word so that, for example, German words don't get weird breaks as if they were English, and insert nobr tags or CSS equivalents to prevent ugly stuff like "Star-ck". So now I have to look into CSS to see whether it can somehow express that words should be broken differently when they are the last word in a sentence or paragraph.
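The two safeguards mentioned above, choosing break points per language and shielding certain words from breaking, could look roughly like this in Python. Everything here is an assumption for illustration: the `BREAKS_BY_LANG` table, the `NO_BREAK` set and both function names are hypothetical, and wrapping a word in a span with `white-space: nowrap` is just one possible CSS stand-in for a nobr tag.

```python
from html import escape

SOFT_HYPHEN = "\u00ad"

# Hypothetical per-language break tables (offsets inside each word).
BREAKS_BY_LANG = {
    "en": {"water": [2]},    # wa-ter
    "de": {"wasser": [3]},   # Was-ser
}

# Hypothetical list of words that must never be broken.
NO_BREAK = {"Starck"}


def no_break_span(word: str) -> str:
    """Wrap a word so the browser keeps it on one line."""
    return f'<span style="white-space: nowrap">{escape(word)}</span>'


def hyphenate_word(word: str, lang: str) -> str:
    """Hyphenate one word using the break table for its language."""
    if word in NO_BREAK:
        return no_break_span(word)
    points = BREAKS_BY_LANG.get(lang, {}).get(word.lower(), [])
    pieces = []
    start = 0
    for point in points:
        pieces.append(word[start:point])
        start = point
    pieces.append(word[start:])
    return SOFT_HYPHEN.join(pieces)


if __name__ == "__main__":
    print(repr(hyphenate_word("Wasser", "de")))  # German table applies
    print(repr(hyphenate_word("Wasser", "en")))  # no English entry: unchanged
    print(hyphenate_word("Starck", "en"))        # protected word
```

In a real page the language would come from the nearest `lang` or `xml:lang` attribute rather than being passed in by hand.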






This work by Roland van Ipenburg is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
Permissions beyond the scope of this license may be available at mailto:ipenburg@xs4all.nl.