Hyphenation rules the nation

Thursday 23 July 2009 04:17

As I mentioned earlier, I'm working on a Perl module called HTML::Hyphenate, and I'm now rolling out a test on this blog. I've added a filter to my main Bricolage web template that inserts soft hyphens into the body of every page. So if your browser handles soft hyphens properly, you should see some words on this page broken across two lines, and the text of the posts should be nicely justified. If your browser handles them badly, you might see every inserted soft hyphen as a dash and read this text very slowly. My sort of English doesn't have very long words, but since I set the hyphenation algorithm to a minimum word length of five characters, words like "water" can already be broken into "wa-ter". Languages other than en_US should also be handled correctly, and even scripts like Sanskrit and Mongolian should just work. And of course Arabic, because that doesn't have hyphenation.
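To make the idea concrete, here is a minimal sketch in Python of what such a filter does to plain text. It is not the HTML::Hyphenate code: the `BREAKS` table is a hypothetical stand-in for real hyphenation patterns, and the function names and the `min_length` parameter are assumptions made only for this illustration.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD, shown only when a line actually breaks there

# Hypothetical break-point table standing in for real hyphenation patterns.
# Each value lists the character offsets where a break is allowed.
BREAKS = {
    "water": [2],            # wa-ter
    "hyphenation": [2, 6],   # hy-phen-ation
    "algorithm": [2, 4],     # al-go-rithm
}


def hyphenate_word(word, min_length=5):
    """Return word with soft hyphens at its known break points."""
    if len(word) < min_length:
        return word  # too short to be worth breaking
    points = BREAKS.get(word.lower())
    if not points:
        return word  # unknown word: leave it untouched
    pieces = []
    start = 0
    for point in points:
        pieces.append(word[start:point])
        start = point
    pieces.append(word[start:])
    return SOFT_HYPHEN.join(pieces)


def hyphenate_text(text, min_length=5):
    """Hyphenate every run of letters in a plain-text string."""
    return re.sub(
        r"[^\W\d_]+",
        lambda match: hyphenate_word(match.group(0), min_length),
        text,
    )
```

Calling `hyphenate_text("Water is wet")` leaves the short words alone and puts one soft hyphen inside "Water". A real filter also has to skip tag names and attribute values and touch only the text nodes, which this sketch does not attempt.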

Different browsers of course do different things. Opera, for example, renders the soft hyphens as dashes at line breaks but places them so that they appear outside the text box, which is ugly. Safari does it better, lynx seems to ignore the soft hyphens, and Amaya shows them as dashes. Browsers could of course implement the algorithms themselves and do the hyphenation without the author supplying all the break points, but that won't suddenly make the web look much better. As soon as you start hyphenating, a lot of things can break. You need to specify the language for every word so that, for example, German words don't get weird breaks as if they were English, and you need to insert nobr tags or CSS equivalents to prevent ugly stuff like "Star-ck". So now I have to look into CSS to see whether it can somehow be used to say that words should be broken differently if they are the last word in a sentence or paragraph.
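The two safeguards mentioned above, choosing break points by language and protecting words that must never break, can be sketched roughly like this in Python. The tables, the `NO_BREAK` set and the `hyphenate` function are all hypothetical and exist only for this illustration. They do not reflect how any browser or the Perl module actually works.

```python
SOFT_HYPHEN = "\u00ad"

# Hypothetical per-language break tables; a real system would load
# hyphenation patterns for each language instead.
BREAKS_BY_LANG = {
    "en": {"water": [2]},    # wa-ter
    "de": {"wasser": [3]},   # Was-ser
}

# Hypothetical list of words that must never be broken, such as names.
NO_BREAK = {"starck"}


def hyphenate(word, lang):
    """Insert soft hyphens using only the table for the given language."""
    key = word.lower()
    if key in NO_BREAK:
        return word  # protected word: never offer a break
    points = BREAKS_BY_LANG.get(lang, {}).get(key, [])
    result = word
    # Insert from the right so earlier offsets stay valid.
    for point in sorted(points, reverse=True):
        result = result[:point] + SOFT_HYPHEN + result[point:]
    return result
```

With these tables, "Wasser" gets a break point only when it is tagged as German, and "Starck" is left whole whatever the language.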


